Introduction

The following documentation provides a step-by-step guide for establishing an Amazon Redshift connection and utilizing valmi.io to synchronize data from a Redshift database to any of the supported destinations.

Redshift Configuration

Required Permissions

To optimize data retrieval and transfer between tables in your database and supported destinations, valmi.io employs a “diff” calculation method to identify changes during each update. This approach minimizes the impact on your database’s performance and other applications’ APIs. As part of this process, valmi.io manages a set of tables in a designated bookkeeping schema, typically consisting of 2 or 3 tables per sync job.

To establish a successful connection with your Redshift database, it is advisable to create a dedicated user account specifically for valmi.io. This account should have a strong and unique password and will be used for authentication by valmi.io. To ensure smooth operation, the valmi account should be granted the following permissions:

  • Full administrative access to all tables within the bookkeeping schema, including the ability to create, delete, read from, and write to these tables.

  • Read-only access to any tables and views in the desired schema that you want valmi.io to publish to your service destinations.

Configuring permissions for Redshift can be intricate due to the various available configuration options. However, the script provided below has undergone thorough testing and is known to function correctly.

If you face any connection issues or unlisted tables or views, please make sure the following script is run by a user with the required permissions to create users, schemas, and grant permissions.
-- Give the valmi user the ability to sign in with a password
CREATE USER valmi WITH PASSWORD '<strong password>';

-- Create a private bookkeeping schema where valmi can store sync state
CREATE SCHEMA valmi_scratch_computed_;

-- Give the valmi user full access to the bookkeeping schema
GRANT ALL ON SCHEMA valmi_scratch_computed_ TO valmi;

-- Ensure the valmi user has access to any objects that may have already existed in the bookkeeping schema
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA valmi_scratch_computed_ TO valmi;

-- Let the valmi user see this schema
GRANT USAGE ON SCHEMA "<your schema>" TO valmi;

-- Let the valmi user read all existing tables in this schema
GRANT SELECT ON ALL TABLES IN SCHEMA "<your schema>" TO valmi;

-- Let the valmi user read any new tables added to this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA "<your schema>" GRANT SELECT ON TABLES TO valmi;

-- Let the valmi user execute any existing functions in this schema
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA "<your schema>" TO valmi;

-- Let the valmi user execute any new functions added to this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA "<your schema>" GRANT EXECUTE ON FUNCTIONS TO valmi;

Creating a Redshift Connection on valmi.io

The picture below shows the configuration page for a Redshift connection. The following fields are required:

  • Connection Name - A name for the connection.
  • Host - The hostname of the Redshift warehouse. For instance, it looks similar to valmi-wg.294727603162.us-east-1.redshift-serverless.amazonaws.com for Redshift Serverless.
  • Port - The port number of the Redshift warehouse. The default port is 5439.
  • Database - The name of the database to connect to.
  • Username - The username to use to connect to the database.
  • Password - The password to use to connect to the database.

Setting up a Sync

  1. Selecting a table or a view to Sync

After successfully configuring the connection, creating a sync involves selecting a source, a destination, a mapping, and a schedule. In this section, we will explain how to select a source.

To select a Redshift source during the sync creation process, please follow the steps provided below:

  • Select the Redshift connection you created earlier.
  • Select a namespace.
  • Select a database table or a view.

  1. Supported Source Modes

Once you have chosen a source, the next step is to select a destination connection. For detailed instructions on configuring the destination, please consult the specific documentation provided. After selecting the destination, you will proceed to choose the mapping for the synchronization process.

On the mapping page, you will be required to specify a source mode that governs how the source connector extracts data from Redshift. There are two available modes to choose from:

  • Full Table - This mode extracts the entire table from Postgres. This is the default mode.
  • Incremental - This mode extracts only the rows that have changed since the last sync. The data to extract will be determined by comparing the current table with a snapshot of the table taken during the last sync.

  1. Selecting a Primary Key

During the sync creation process, you will also need to select a primary key. The primary key serves various purposes, including:

  • Identifying the corresponding object in the destination. For instance, when synchronizing a Postgres table with a Salesforce object, the primary key is used to uniquely identify the Salesforce object.

  • Validating and identifying rows as valid or invalid. This includes performing null checks and duplicate checks based on the primary key.

It is important to note that the primary key must consist of a single column. If the table does not have a primary key, you have the option to select a unique column instead.