Best Databricks ETL Tools in 2026: We Tested Top 4

Summary

Skyvia is the most straightforward path to Databricks for no-code SMBs
Fivetran excels in high-volume enterprise automation
Airbyte offers unmatched open-source flexibility for developers
Estuary is the top choice for near-real-time CDC streaming

In this 40-hour test of Databricks ETL tools, we found that Skyvia is best for SMBs and no-code teams, Fivetran for high-volume enterprise data, Airbyte for developer-heavy teams, and Estuary for streaming CDC requirements.

Designing an effective data pipeline into Databricks Delta Lake seems easy. But once you find yourself swamped by malfunctioning Python code, API rate limits, and problematic schema changes, it becomes clear that you need one of the leading Databricks ETL tools.

Just to make this point clear: I work with the team at Skyvia, where we have developed a no-code data integration platform, so we definitely have our bias. But here I’m going to do something different: I won’t try to claim that we are your perfect fit. Rather, I will give an honest comparison with competitors such as Fivetran and Airbyte with regard to technical limitations, pricing, and real-life experience.

Let’s begin.

How Did We Actually Test These Databricks Integration Tools?

Over a total of 40 hours, I used the 4 Databricks ETL tools to replicate Salesforce Contacts and related transactional tables stored in PostgreSQL into Databricks. For this, I used my own Salesforce Developer account and a PostgreSQL database hosted in Supabase. I also copied the same PostgreSQL database to Neon because of a problem encountered in one of the tools. You’ll see the details later. Overall, more than 20K rows were replicated, which is enough for a quick test case.

You will see how long each tool takes to replicate the rows, and I will share my experience of creating the pipeline in each tool. I used either a free tier or a trial for each tool, so limitations exist. As a developer evaluating a Databricks ETL tool, you will encounter the same limitations when using free-tier or trial accounts.

Below is the structure of the PostgreSQL database:

Also, I deliberately used a separate schema in Databricks for each tool, so I could see the differences in how they handled the data. Here’s a sample output structure for 2 of the tools I used in Databricks:

From the structure alone, you can see that Estuary and Airbyte differ. Estuary used a volume to stage the data before finalizing it into the tables.

Databricks Connection Requirements

Let’s start with Databricks. Each ETL tool may ask for a hostname, an HTTP path, and/or an access token. For Databricks Community Edition (CE), you can find the hostname and path in SQL Warehouses -> <your warehouse server>. Here’s mine:

The access tokens are found under Settings -> Developer. I made a separate access token for each tool I used here. See them below:

The tools can also connect using a Client ID and Secret, but I used a Personal Access Token for the samples.

Then, the tools will ask for a catalog name (we will use sales_sample) and a schema. These are equivalent to the database and schema names in a data warehouse or relational database.
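If you ever script a check against the same target yourself, the connection details above can be kept in one place. Here is a minimal sketch in Python. The class name, the skyvia schema, and every value are hypothetical placeholders, not my actual setup, and the keyword names assume the databricks-sql-connector package, which is not imported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabricksTarget:
    """Connection details an ETL tool typically asks for (placeholder values only)."""

    server_hostname: str  # bare hostname of the SQL warehouse, no "https://"
    http_path: str        # HTTP path shown next to the hostname
    access_token: str     # personal access token from Settings -> Developer
    catalog: str = "sales_sample"  # catalog used in this article
    schema: str = "skyvia"         # hypothetical: one schema per tool

    def connect_kwargs(self) -> dict:
        """Return keyword arguments shaped for databricks.sql.connect() (assumed API)."""
        if "://" in self.server_hostname:
            raise ValueError("server_hostname must be a hostname, not a URL")
        if not (self.http_path and self.access_token):
            raise ValueError("http_path and access_token are required")
        return {
            "server_hostname": self.server_hostname,
            "http_path": self.http_path,
            "access_token": self.access_token,
            "catalog": self.catalog,
            "schema": self.schema,
        }

    def table(self, name: str) -> str:
        """Fully qualified, backtick-quoted table name: catalog.schema.table."""
        return f"`{self.catalog}`.`{self.schema}`.`{name}`"
```

With the connector installed, something like databricks.sql.connect(**target.connect_kwargs()) should open a session against the warehouse, but treat that as an assumption to verify against the connector's documentation.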

Salesforce Connection Requirements

Except for Fivetran, each tool will ask you to log in to Salesforce to get an OAuth token, and that’s it. If you change your password, make sure to re-authenticate each tool.

Here is some of the fictitious Contact data in my Salesforce developer account:

We will also see if custom columns can be captured. I added the Preferred_Contact_Method__c custom column to the Salesforce Contact object. See below:

PostgreSQL Connection Requirements

At a minimum, you need the host, database name, username, password, and schema. If your port is anything other than 5432, you need that too.
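For readers who want to test the same credentials outside the tools, the fields above map onto a standard PostgreSQL connection URI. The sketch below is my own illustration: the function name and the sample values are made up, and passing the schema through search_path in the options parameter is one common approach, not something the tools require.

```python
from urllib.parse import quote


def postgres_uri(host, database, user, password, schema="public", port=5432):
    """Build a libpq-style connection URI from the fields the tools ask for."""
    # Percent-encode credentials so characters like '@' or ':' do not break the URI.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    # The schema is selected by setting search_path through the 'options' parameter.
    options = quote(f"-c search_path={schema}", safe="")
    return f"postgresql://{auth}@{host}:{port}/{quote(database, safe='')}?options={options}"


if __name__ == "__main__":
    # Placeholder values only; replace them with your own host and credentials.
    print(postgres_uri("db.example.com", "postgres", "app", "p@ss:word", schema="sales"))
```

The port defaults to 5432, so you only pass it when your host uses a different one.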

Let me first give you a comparison summary of the four best Databricks ETL tools in 2026.

How Do the Top Databricks ETL Tools Compare?

We initially chose the following Databricks ETL tools:

Fivetran,
Airbyte,
Talend, and
Skyvia

However, because I can’t use Databricks Community Edition with Talend, we had to replace it. Talend requires a staging area in GCS, S3, or Azure. Although I set up a GCS bucket (the only one I can use), I couldn’t make it work. It seems that Databricks CE can’t work with GCS as a staging area. Talend performs a COPY command from the staging area into my Databricks CE, and my GCS bucket is not suited to it.

So, we replaced Talend with Estuary. Below is the comparison of the four tools:

Feature / Metric	Skyvia	Fivetran	Airbyte	Estuary
Ideal Use Case	SMBs, No-code teams, SaaS integrations	Enterprise, High-volume automated ELT	Developer-heavy teams, Self-hosting	Batch and streaming in one platform
Pricing Model	Usage-based (Per record/data volume)	Monthly Active Rows (MAR)	Compute-based (Cloud) / Free (Open-source)	Per Gigabyte + Per Connector
Minimum Sync Frequency	1 minute	1 minute	5 minutes (varies by connector)	Real-time / Batch configurable
Setup Complexity	Visual Wizard (Zero code)	Visual UI (Low code)	Requires CLI/Docker knowledge (Self-hosted)	Visual Wizard
Databricks Target	Delta Lake (Direct load)	Databricks SQL / Delta	Databricks Destination Connector	Databricks Destination Connector

What Is the Best Databricks ETL Tool for SMBs and No-Code Teams?

Adopting full-scale enterprise Databricks ETL tools in 2026 can be too much for startups and small teams. So, a no-code tool could be the best fit. Enter Skyvia.

Skyvia

Skyvia is a cloud-first data platform that offers several data management services, including data integration, backup, and replication. During our testing, I found that if your team lacks dedicated data engineers to write code, Skyvia is the most straightforward path to Databricks.

Of the four tools, Skyvia is the one I can set up the fastest because I’ve been using it for quite some time now. I can set up the three connections, for PostgreSQL, Salesforce, and Databricks, in less than 5 minutes.

I used Skyvia’s free tier, so my limits were reached after I set up the replication for PostgreSQL. So, I used a second account for replicating Salesforce Contacts to Databricks.

Setting Up the Salesforce to Databricks Data Pipeline

Setting up connections in Skyvia means filling out forms with credentials. I’m reusing the PostgreSQL connection I made for this article and the Salesforce connection I made for my other article. But let me show you my setup for Databricks.

The Domain field should contain the hostname. This confused me at first, but it worked out. The Personal Access Token given by Databricks goes in the corresponding box.

And below is my data pipeline for Salesforce to Databricks replication of the Contact object.

Salesforce Databricks integration by Skyvia

Running it took 53 seconds for more than 5,000 rows. Since this was the first run, Skyvia also created the table in Databricks. Here’s a screenshot:

Salesforce Databricks integration by Skyvia monitoring

After the run, here’s the query result from Databricks’ end:

You can compare it to the Salesforce screenshot earlier and see that it’s the same.

Setting Up the PostgreSQL to Databricks Data Pipeline

We’re going to use the same Databricks connection in Skyvia. Let me show you the setup for the PostgreSQL to Databricks replication:

All 5 tables are there, and it took 37 seconds to replicate more than 18,000 rows. Check it out below:

PostgreSQL to Databricks Data Pipeline results

I checked the row counts above against the Databricks side:

Also check out some of the replicated Salesforce Contacts joined with the PostgreSQL transaction tables below:

Lastly, I checked the data from our custom column, and it’s all good. See below:

It took me around 5 minutes to set up the two pipelines. I’ve used Skyvia a lot, so the setup was fast.
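The row-count check I did by eye in the screenshots can also be scripted. Here is a minimal sketch in Python. The function name, the table names, and the counts are hypothetical and only illustrate the comparison; they are not my measured results.

```python
def compare_row_counts(source, target):
    """Return {table: (source_count, target_count)} for every table whose counts differ.

    A table that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source) | set(target)):
        s, t = source.get(table), target.get(table)
        if s != t:
            mismatches[table] = (s, t)
    return mismatches


if __name__ == "__main__":
    # Made-up counts: one table is short by 10 rows in the target.
    source_counts = {"customers": 1200, "orders": 9000}
    target_counts = {"customers": 1200, "orders": 8990}
    print(compare_row_counts(source_counts, target_counts))  # {'orders': (9000, 8990)}
```

An empty result means every table has the same number of rows on both sides. It does not prove the values match, so a spot check of a few rows, like the custom column check above, is still worthwhile.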

Best for

Skyvia is perfect for SMBs or companies that require a flexible no-code data integration tool that supports not only ELT or ETL but also reverse ETL. Skyvia will be helpful for anyone who wants to start building data pipelines immediately and get early results.

Rating

At the time of writing, below are notable reviews of Skyvia from G2 and Capterra:

G2: 300 reviewers rated 4.8/5
Capterra: 116 reviewers rated 4.9/5

Pricing

Skyvia’s pricing includes Free, Basic, Standard, Professional, and Enterprise plans. As you move up to higher plans, you get more rows per month, more scheduled integrations, more advanced integration scenarios, and improved mapping functionality.

The number of rows starts from 10,000 in the Free plan (which was used for this Skyvia evaluation), and the Basic plan costs $79/month.

Refer to the Skyvia pricing page for additional information.

Pros

The learning curve is minimal, with a clean, intuitive user interface.
The broad connector library supports the sources and targets I need.
Supports ETL, ELT, reverse ETL, backups, replication, import/export, syncs, automation, and APIs.
Documentation is sufficient for me.

Cons

The Free tier allows only 10,000 rows. I can only run 5 queries to a data source, though there are workarounds for this. And I can’t use an API Endpoint.
Not suitable for a bank or healthcare provider requiring a strictly air-gapped, on-premise installation with no internet access because of its cloud-first nature. You should look at Estuary private hosting or Airbyte Self-Hosted for these needs.

Which Databricks Integration Works Best for Enterprise & High-Volume Data?

Large enterprises operate on huge datasets, deal with complicated, changing schemas, and need reliable tools that ensure high levels of automation and security. Fivetran is our choice for such use cases, as it is designed to handle vast amounts of data, and its compliance features make it safe for big companies to use.

Fivetran

Fivetran is a managed ELT platform that focuses on security and compliance. The interface is simple: source and destination connections are configured via fill-in-the-blank forms.

The only issue I have with it is that a pipeline cannot be renamed after its connection has been tested, because that would break the name of the destination schema in Databricks.

Unlike the other tools we used here, which distinguish between sources, destinations, and connections, Fivetran has only destinations and connections; sources are set up inside the connection. Source configurations are not reusable, so credentials have to be retyped, although destination schemas can be reused.

Fivetran creates an additional pipeline (fivetran_metadata) as well as some additional tables for sources like Salesforce.

Setting Up the Salesforce to Databricks Data Pipeline

To set up Salesforce as the source, I had to create a connected app and get its Client ID and Secret as well as the Salesforce Domain URL. The other tools we used here did not require this. See my configuration below:

See my previous post for how to create the Client ID and Secret in Salesforce.

Then, I chose the Contact table for the data pipeline.

I also set up the Databricks destination for this pipeline. See it below:

Setting Up the Databricks Data Pipeline in Fivetran

I noticed that Fivetran does not include nested objects from Salesforce. When configuring a Salesforce to Databricks pipeline, also note that the first sync report will include some additional tables. See below:

It took 1 minute and 58 seconds to sync the Contact table.

Anyway, Salesforce Contact row counts matched the other tools’ reports:

It also successfully replicated the data from the custom column:

Setting Up the PostgreSQL to Databricks Data Pipeline

I made another Fivetran connection, this time for PostgreSQL. I used my Supabase credentials and specified the 5 transactional tables to be imported to the Databricks destination.

Below is my PostgreSQL connection:

I just reused the previous Databricks destination. The first sync created no extra tables, and row counts were reported per table:

It took 2 minutes and 18 seconds to sync this one.

Please check the row counts below on the Databricks side:

And also the joins for both Salesforce and PostgreSQL datasets:

All things considered, setting up the Fivetran integration took me no more than 10 minutes.
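To put the sync times reported so far side by side, here is a small Python sketch that turns them into rough rows per second. The function and variable names are my own. The row counts are lower bounds ("more than 5,000" and "more than 18,000"), and the Fivetran Salesforce sync also created additional tables, so read the results as approximate first-sync figures rather than a benchmark.

```python
def rows_per_second(rows, minutes=0, seconds=0):
    """Convert a row count and an elapsed time into rows per second."""
    elapsed = minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return rows / elapsed


# (tool, source): (minimum rows, minutes, seconds), taken from the runs reported above.
RUNS = {
    ("Skyvia", "Salesforce"): (5_000, 0, 53),
    ("Skyvia", "PostgreSQL"): (18_000, 0, 37),
    ("Fivetran", "Salesforce"): (5_000, 1, 58),
    ("Fivetran", "PostgreSQL"): (18_000, 2, 18),
}

if __name__ == "__main__":
    for (tool, source), (rows, minutes, seconds) in RUNS.items():
        rate = rows_per_second(rows, minutes, seconds)
        print(f"{tool:<9} {source:<11} at least {rate:.1f} rows/s")
```

With only 20K+ rows, fixed startup costs dominate these numbers, so the gap between tools may look very different at larger volumes.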

Best For

Teams that require strong, automated ELT pipelines with low maintenance costs, particularly when schemas change.

Perfect for big data companies, particularly those that prioritize fast scaling, broad connectivity options, and hassle-free schema management without programming.