How to Configure Cloud Data Synchronization with Empty Database

This tutorial shows to make a copy of cloud data in an empty database and configure a bi-directional synchronization between this copy and the original cloud application. For this tutorial, we select Salesforce as an example cloud source, and its Account and Contact objects.

In Skyvia, a Replication Package is used to create a copy of a cloud data in a database.  It can create tables for data automatically and then keep the database up-to-date performing one-way synchronization periodically from the cloud source. If such synchronization in one direction is enough for you, you can take a look on our Replication Tutorials.

Synchronization Packages, on the other hand, perform bi-directional data synchronization. However, they are not intended for creating an exact data copy, and  cannot create tables for the data automatically. They are intended for synchronization of existing tables and they support synchronization of data, having different structure.

So, if you have just an empty database, and want to create a copy of cloud data in it and then sync changes between this copy and original cloud source in both direction, you have two options. One is to manually create tables in the database, which is not always convenient. Another way is to use a replication package once for creating the necessary database tables, then manually modify these tables so that they meet Skyvia's synchronization requirements, and finally, configure a synchronization package.

 

ExpandedToggleIcon        Synchronization Requirements to Consider

Note that Skyvia has certain requirements to consider for synchronization. Skyvia supports synchronization not for every object in every cloud source. It requires that these cloud objects have fields, storing the record creation and modification time. You can find the information about these limitations in the corresponding Cloud Sources topics. Some cloud sources, like Salesforce, support synchronization for most objects, and some - only for a few objects.

As for databases, Skyvia requires synchronized tables to have auto-generated primary keys. This is the reason why we should manually modify database tables after creating them via a replication package, because replication package creates primary keys that are not auto-generated.

 

ExpandedToggleIcon        Connections

For our package we need to create a connection to our cloud source and a database. You can find the information about connecting Skyvia to various cloud sources in the Cloud Sources section. In this tutorial, we create a Salesforce connection. To create a connection to Salesforce, perform the following steps:

1.Click Connections in the menu on the left side of the page to switch to the Connections page.
2.On the Connections page click the New button.
3.In the opened pane click Salesforce.
4.Specify the connection Name, for example, Salesforce1.
5.From the Environment drop-down list select the Salesforce environment type to import data to. Since this is just a sample walkthrough, the Sandbox environment is recommended.
 
salesforce-connection-sandbox
6.From the Authentication drop-down list select the authentication method for connecting to Salesforce. If you don't mind storing your Salesforce credentials on Skyvia server, select User Name & Password. If you prefer not to store your credentials, select OAuth 2.0.
7.If you have selected User Name & Password, on the previous step, specify your Salesforce account e-mail, password, and security token. Otherwise, if you have selected OAuth 2.0 authentication, click the Log In with Salesforce button and login via the Salesforce website on the opened page. The result OAuth token will be stored in the connection data. Your Salesforce credentials will not be stored on our website.
 
salesforce-login-to-sandbox    
8.Click the Save button to save the connection.
 
salesforce-connection-sandbox-filled

 

As for connecting to databases, see the information about database connections in the Databases section. Please note that Skyvia does not support synchronization for cloud data warehouse services - Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery.

 

ExpandedToggleIcon        Replication Package

To create a Replication package, perform the following actions:

1.Click Integration in the menu on the left side of the page.
2.Click the New button in the top right corner of the page.
3.Click Replication.
4.Optionally click the automatically generated package name (on this step it's "Replication Package") and specify the package name. If you omit this step, the package name will be generated later based on the names of the replicated Salesforce objects.
5.In the Source header click Select and select Salesforce in the drop-down list.
6.In the Source Connection drop-down list select the Salesforce1 connection.
 
select-source-salesforce-connection
7.In the Target header click Select and select the target database server.
 
replication-select-target-type
8.In the Target Connection drop-down list select the corresponding database connection.
9.Important! Clear the Create Foreign Keys check box. If you need a bi-directional synchronization, you must not have foreign keys, created by replication. If you don't clear this check box, you will need to drop the foreign keys manually later.
10. In the grid under Select Objects, select check boxes for objects you want to synchronize. In our example, these are Account and Contact objects.
 
replication-select-objects
11. The first run of the synchronization package will re-load all the records from Salesforce to the target database anyway. So it's not necessary that replication package actually load records. We need it only to create the tables.
 
If you don't want to spend time on additional configuration, you may omit the steps  and allow replication to load data. In this case, it may take quite some time for replication to load all the data, and you will need to delete data from the database tables before running the synchronization anyway. Besides, these extra records are added to your subscription counter.
 
But if you don't want to unnecessarily load data, you can avoid this by configuring filters for replication:
a.Click the edit link for the selected object.
b.In the replication task editor click Add Condition.
c.In the added boxes, specify a filter condition that is never true for any record. In the first box, select a field; in the second - comparison operator, and in the third - specify some value for comparison.
 
For example, for Salesforce I can use a condition Id equals 0. It cannot be true in Salesforce for any record. Thus, all the records will be filtered out by this filter, and the replication package will create a table, but won't load any records.
 
filter-id-equals-0
d.Repeat the steps a - с for all the objects you want to synchronize.
12. Save the package.
13. Run the package by clicking Run.

 

ExpandedToggleIcon        Editing Database Tables

After replication creates the tables, we need to edit them. We need to delete their primary key constraint, and add a new autogenerated one. You can either use some visual database tools to make these changes or run SQL scripts against your database, for example, with Skyvia Query.

If you want to use a database tool, you need to perform the following actions:

1.If you didn't add filters so that no data were replicated, truncate your database tables, which were created after the replication.
2.Delete the tables' primary key constraints.
3.You may optionally delete their primary key columns as well. Their values are automatically generated on the cloud side and cannot be synchronized when loading data from the database to the cloud source. They will only be synced when loading data from the cloud source to database.
4.Add a new autogenerated column for the primary key. For example, you may call it sync_id:
oIn PostgreSQL you may use the serial data type to make a column autogenerated.
oIn MySQL you may use a BIGINT data type with the AUTOINCREMENT option.
oFor SQL Server, you may use a BIGINT data type with the IDENTITY option.
oFor Oracle 12c and higher, you may use a NUMBER data type with the GENERATED ALWAYS AS IDENTITY option.
oFor lower Oracle versions, you will need to create a sequence and a trigger to assign values from this sequence to the primary key column when inserting records.

Alternatively, you may do the same, using DDL scripts.

1.Click Query in the menu on the left side of the page.
2.On the Query page, click New SQL.
3.Click Select connection and select the connection to your database, where you replicated data, in the list.
4.Enter your script to the query editor box in the top right part of the page. Here we provide scripts for our example with Account and Contact objects. You can create your own scripts, based on these example, by replacing Account and Contact with your object names:
oFor SQL Server:
 
ALTER TABLE Contact  DROP CONSTRAINT PK_Contact;
ALTER TABLE Contact  ADD Sync_id BIGINT IDENTITY PRIMARY KEY;
 
ALTER TABLE Account DROP CONSTRAINT PK_Account;
ALTER TABLE Account ADD Sync_id BIGINT IDENTITY PRIMARY KEY;
 
oFor MySQL:
 
ALTER TABLE contact DROP PRIMARY KEY;
ALTER TABLE contact ADD sync_id BIGINT AUTO_INCREMENT PRIMARY KEY;
 
ALTER TABLE account DROP PRIMARY KEY;
ALTER TABLE account ADD sync_id BIGINT AUTO_INCREMENT PRIMARY KEY;
oFor PostgreSQL:
 
ALTER TABLE "Contact"  DROP CONSTRAINT "PK_Contact";
ALTER TABLE "Contact"  ADD Sync_id SERIAL PRIMARY KEY;
 
ALTER TABLE "Account" DROP CONSTRAINT "PK_Account";
ALTER TABLE "Account" ADD Sync_id SERIAL PRIMARY KEY
oFor Oracle (12c or higher):
 
ALTER TABLE "Contact"  DROP CONSTRAINT "PK_Contact";
ALTER TABLE "Contact"  ADD Sync_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
 
ALTER TABLE "Account" DROP CONSTRAINT "PK_Account";
ALTER TABLE "Account" ADD Sync_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
 
5.Click Execute or press F9.

 

ExpandedToggleIcon        Synchronization Package

To create a Synchronization package, perform the following actions:

1.Click Integration on the menu on the left side of the page.
2.Click the New button in the top right corner of the page.
3.Click Synchronization.
4.Optionally click the automatically generated package name (on this step it's "Sync Package") and specify the package name.
5.Now it's necessary to specify source and target connections. Note that synchronization is bi-directional, and the only difference between the source and target is that source changes have a priority when solving change conflicts (when a record was changed both in source and target between synchronizations). Let Salesforce be the source in our example. Click Select in the Source header and select Salesforce in the drop-down list.
6. In the Source Connection drop-down list select the corresponding  connection.
 
select-source-salesforce-connection
7. Click Select in the Target header and select the corresponding database.
 
target-type-selection-sync
8. In the Target Connection drop-down list select the corresponding database connection.
 
select-target-connection-sync-sql
9. Click the + Add task button.
10. In the Source list select Account.
11. In the Target list select the Account table and click Next.
12. Since all the columns in source and target have the same names and types, they are mapped automatically. Click Target to Source in order for columns to be also mapped for the opposite direction.
13. Click Finish.
14. Repeat the steps 9 - 12 for all other objects that you want to sync.

 

ExpandedToggleIcon        Scheduling Package Execution

After we have created the synchronization package, we want to keep the data in sync automatically. For this, we will configure the package to run every hour during workdays.

Skyvia uses LastModifiedData and CreatedDate fields to track changes in Salesforce, and it creates its own tracking tables and triggers to track changes in the  database.

Note that since we used Salesforce connection as the source, its changes have a priority, and if a record was changed both in Salesforce and in the database, Salesforce changes are applied.

Perform the following actions to set the schedule:

1.Under Schedule (at bottom left of the package editor) click the Set schedule button.
 
schedule-empty
2.In the Run every drop-down list select Week.
3.Click all the workdays in Days of week.
 
schedule-week-days
4.Click Occurs once at and select Recur every.
 
schedule-recur-every
5.Enter "1" (without quotes) into the Recur every box and click the Set time restrictions link.
 
schedule-every-hour
6.Set Start at to 8:00 and End at to 17:59.
 
schedule-time-restrictions
7.Save the package.

After this our package will run automatically every hour between 8:00 and 17:59 of every workday.

You can also visit Scheduling Packages to get more detailed information on setting a package schedule.