Migrating from CloudQuery v0 to v1

October 3, 2022

Herman Schaaf (Twitter: hermanschaaf)

We are thrilled to announce the release of the first major version of CloudQuery. See our v1 announcement blog post for details! The new release comes with a range of exciting new features, and this page is here to help you migrate an existing CloudQuery installation from v0 to v1.

Changes in V1

The announcement blog post lists many of the important improvements, and we won't reiterate them all here. Most changes are internal and developer-facing, but some do affect existing CloudQuery users. Those are:

Changes to the Configuration Format

v1 introduces a new config format that is closely related to the old one, but an old config will need some adjustments to work with the CloudQuery v1 CLI. The main difference is that source and destination plugins now have separate configs, mostly because we now support multiple destinations.

Source Plugins

The new config format for source plugins is as follows:

kind: source
spec:
  ## Required. Name of the plugin to use
  name: "aws"
 
  # Required. Must be a specific version starting with v, e.g. v1.2.3
  version: "v18.0.0"
 
  ## Optional. Default: "github". Available: "local", "grpc"
  # registry: github
 
  ## Plugin path. For official plugins, this should be in the format "cloudquery/<name>", e.g. "cloudquery/aws"
  path: "cloudquery/aws"
 
  ## Required. You can use ["*"] to sync all tables or specify specific tables. Please note that syncing all tables can be slow
  ## See all tables: https://www.cloudquery.io/docs/plugins/sources/aws/tables
  tables: ["aws_s3_buckets"]
 
  ## Required. All destinations you want to sync data to.
  destinations: ["postgresql"]
 
  spec:
    # plugin specific configuration.

Check the source spec documentation for general layout, and individual plugin documentation for details on how to configure the plugin-specific spec. Generally these will be the same as in v0, and all the same authentication functionality is still supported.

Destination Plugins

The new config format for destination plugins (e.g. PostgreSQL, BigQuery, Snowflake, and more) is as follows:

kind: destination
spec:
  ## Required. Name of the plugin
  name: "postgresql"
 
  path: "cloudquery/postgresql"

  # Required. Must be a specific version starting with v, e.g. v1.2.3
  version: "v4.2.0"
 
  ## Optional. Default: "overwrite". Available: "overwrite", "append", "overwrite-delete-stale". Not all modes are 
  ## supported by all plugins, so make sure to check the plugin documentation for more details.
  write_mode: "overwrite" # overwrite, overwrite-delete-stale, append
 
  spec:
    ## plugin-specific configuration for PostgreSQL:
 
    ## Required. Connection string to your PostgreSQL instance
    connection_string: "postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable"

Check the destination spec documentation for general layout, and individual destination plugin documentation for details on how to configure the plugin-specific spec part. Generally these will be the same as in v0, and all the same authentication functionality is still supported.

Changes to the CLI Commands

Users of CloudQuery v0 will be familiar with the main commands init and fetch. Both have changed in v1: init is no longer available (you should write config files manually), and fetch has been replaced by sync.

Init

init was a command that generated a starter configuration template, but it is no longer a command in v1 of the CLI. Instead, please refer to our Quickstart guide to see how source and destination plugins should be configured.

The previous init command also generated a full list of tables to fetch. In v1, you can fetch all tables by using a wildcard entry:

tables: ["*"]

in the source configuration file. This can also be combined with the skip_tables option to fetch all tables except some subset:

tables: ["*"]
skip_tables: ["aws_accessanalyzer_analyzers", "aws_acm_certificates"]

Sync

cloudquery sync replaces the v0 cloudquery fetch command.

Functionally it is still the same: it loads data from a source to a destination. The difference is that sync supports multiple destinations, while fetch only supported PostgreSQL. With this change also comes a change in the expected config format; see Changes to the Configuration Format above for more details on this.
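To illustrate syncing to more than one destination, here is a minimal sketch of a source config that lists two of them. It assumes each entry in destinations refers to the name of a destination config loaded in the same sync. The "bigquery" entry is only illustrative and would need its own destination config, which is not shown here.

```yaml
kind: source
spec:
  name: "aws"
  path: "cloudquery/aws"
  version: "v18.0.0"
  tables: ["aws_s3_buckets"]
  # Assumption: each entry matches the `name` of a destination config.
  # "bigquery" is an illustrative second destination, not configured here.
  destinations: ["postgresql", "bigquery"]
```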

cloudquery sync needs to be passed a path to a config file or directory containing config files. So for example, to sync using all .yml files in a directory named config:

cloudquery sync config/

Or to sync using a single YAML file named config.yml:

cloudquery sync config.yml

In this case config.yml should contain at least one source and one destination config, each separated by a line containing three dashes (---). More about this in Files and Directories.

See cloudquery sync --help for more details, or check our online reference.

Files and Directories

The sync command supports loading config from files or directories, and you may combine multiple source and destination configs in a single file, using --- on its own line to separate the sections. For example:

kind: source
spec:
    name: "aws"
    version: "v18.0.0"
    # rest of source spec here
---
kind: destination
spec:
    name: "postgresql"
    version: "v4.2.0"
    # rest of destination spec here
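For reference, a filled-in version of such a file might look like the sketch below. It is assembled from the source and destination examples earlier on this page. The versions, table list and connection string are those example values and should be replaced with your own.

```yaml
# config.yml: one source and one destination, separated by ---
kind: source
spec:
  name: "aws"
  path: "cloudquery/aws"
  version: "v18.0.0"
  tables: ["aws_s3_buckets"]
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  version: "v4.2.0"
  write_mode: "overwrite"
  spec:
    # Example connection string from this page; replace with your own
    connection_string: "postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable"
```

A file like this can then be passed directly to cloudquery sync, as in the single-file example in the Sync section.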

Changes to Tables and Schemas

Finally, during our work for v1, we endeavoured to make the table schemas more consistent, predictable and aligned with their upstream APIs. As such, some breaking changes to the schema were necessary. Each source plugin has its own schema migration guide to help you make the necessary changes to your custom queries, triggers and policies.

Note that these guides are (for the most part) automatically generated, so in some cases a table may be marked as removed when it was actually renamed. Please reach out to us if you find any errors.

Start from a Clean Database

v1 introduces functionality to automatically perform backwards-compatible PostgreSQL migrations when new columns or tables are added. However, this functionality relies on v1 starting from a clean database. If you run it against a database that still contains tables from v0, there is a good chance it will fail.

Therefore, it is important that you start from a clean database. This can either mean creating a new database and pointing the v1 configuration there, or dropping all the tables in your v0 database.
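As a sketch of the first option, the destination config below points v1 at a fresh database. The database name cloudquery_v1 is hypothetical and assumed to have been created empty beforehand. The remaining values are the example values used earlier on this page.

```yaml
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  version: "v4.2.0"
  spec:
    # "cloudquery_v1" is a hypothetical, newly created empty database,
    # used here instead of the database that holds the v0 tables.
    connection_string: "postgresql://postgres:pass@localhost:5432/cloudquery_v1?sslmode=disable"
```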

Get Help / Ask Questions

If you run into issues not covered here, or have any questions about migrating or CloudQuery v1, don't hesitate to reach out on Discord. We're a friendly community and would love to help however we can.