Snowflake
Snowflake is a cloud-based data warehouse that supports both structured and semi-structured data. It is a columnar database with a unique architecture designed for the cloud. It separates compute and storage, making it a highly scalable and cost-effective solution.
Introduction
This document outlines the process of setting up a Snowflake connection and using valmi.io to synchronize data from a Snowflake database to any of the available supported destinations.
Snowflake Configuration
Required Permissions
To ensure efficient data retrieval and transfer between tables in your database and supported destinations, valmi.io calculates “diff” to identify the changes between each update. This minimizes the impact on your database’s performance and other applications’ APIs. To accomplish this, valmi.io creates and manages a set of tables in a designated bookkeeping schema, typically consisting of 2 or 3 tables per sync job.
To establish a successful connection with your Snowflake database, we recommend creating a dedicated user account exclusively for valmi.io, with a strong and unique password. This account will be used by valmi.io for authentication. To ensure proper functionality, the valmi account must be granted specific permissions as outlined below:
-
The account should have full administrative access to all tables within the bookkeeping schema. This includes the ability to create, delete, read from, and write to these tables.
-
Read-only access should be granted to any tables and views in the desired schema that you want valmi.io to publish to your service destinations.
Configuring Snowflake permissions can be complex, given the various ways permissions can be configured. The script below is known to work correctly and follows Snowflake’s best practices for creating read-only roles in a hierarchy:
Creating a Snowflake Connection on valmi.io
The picture below shows the configuration page for a Postgres connection. The following fields are required:
- Connection Name - A name for the connection.
- Snowflake Account Name - The Account identifier for your snowflake. For example, depending on your cloud region, it looks like this
ro64973.ap-south-1.aws
. For more information on your snowflake account identifier, refer to the Snowflake documentation. - Warehouse - The warehouse for compute operations performed by valmi.io. If created with the above script, it is
VALMI_WAREHOUSE
. - Username - The username to use to connect to the warehouse.
- Password - The password to use to connect to the warehouse.
Setting up a Sync
-
Selecting a table or a view to Sync
Once the connection is configured, sync creation involves selecting a source, a destination, a mapping and a schedule. Here we describe how to select a source. Selecting a Snowflake source while creating the sync requires the following steps:
- Select the Snowflake connection you created earlier.
- Select a snowflake database.
- Select a namespace.
- Select a database table or a view.
-
Supported Source Modes
Once you have chosen a data source, you will be prompted to select a destination connection. For detailed instructions on configuring the destination, please consult the specific documentation provided. After selecting the destination, you will proceed to choose the mapping for the synchronization process.
On the mapping page, you will be required to specify a source mode that governs how the source connector retrieves data from Postgres. There are two available modes to choose from:
- Full Table - This mode extracts the entire table from Snowflake. This is the default mode.
- Incremental - This mode extracts only the rows that have changed since the last sync. The data to extract will be determined by comparing the current table with a snapshot of the table taken during the last sync.
-
Selecting a Primary Key
During the configuration process, you will be prompted to choose a primary key. The primary key serves multiple purposes:
-
It acts as a unique identifier for objects in the destination. For instance, when synchronizing a Postgres table with a Salesforce object, the primary key is used to identify the corresponding Salesforce object.
-
It is utilized for determining the validity of rows by performing checks such as null and duplicate checks based on the primary key.
It is important to note that the primary key should consist of a single column. In cases where the table does not have a primary key, you have the option to select a unique column instead.