In this article we will have a step-by-step demo on how to implement Azure Synapse Link for Azure SQL Database using Change Feed and load data into Azure Synapse Analytics.
What is Azure Synapse Link for Azure SQL:
⏸️ Azure Synapse Link for SQL Extract data incrementally from transactional databases and Load it directly into dedicated SQL pool in Azure Synapse Analytics in near Realtime using Change Feed.
- ➡️Source: Transactional databases (SQL Server 2022 & Azure SQL Database)
- ➡️Destination: Dedicated SQL pool in Azure Synapse Analytics
Change Feed of source database: Enabling Incremental ELT
- Change feed is a new feature—in both SQL Server 2022 and Azure SQL Database
- It enables incremental ELT or ETL.
- It supports data replication from source transactional system to Azure Synapse Analytics.
- It watches for changes and moves them to a temporary landing zone for processing into target dedicated SQL pool.
Azure Synapse Link for SQL is available for Azure SQL Database and SQL Server 2022. Now this Azure Synapse Link feature is in preview.
HOW it works? Architecture & Flow:
Architecture Components & Flow Sequence:
- Source Database: (SQL Server 2022 or Azure SQL Database)
- Change Feed of source database
- Temporary Landing Zone: Azure Data Lake Storage Gen2 to to minimize impact of extracting changes from source system.
- Ingestion Service: Azure Synapse Link for SQL behind the scenes deploys a cloud-based ingestion service to move data from the landing zone into the target dedicated SQL pool.
- Azure Synapse Analytics Dedicated SQL Pool : Extracted data to be loaded into dedicated SQL pool.
Once the link is set up and link is started below happens:
Initial Loading of Source data:
- 📌 Source database exports schema & data from the source tables to ADLS landing zone.
- 📌 Ingestion service get snapshots from the landing zone, creates target tables in the dedicated SQL pool & then loads initial data.
Change Feed Replication: Incremental Load
- 📌 Using Change feed feature, source database SQL Server 2022 and Azure SQL Database watches the changes and continually publishes changes for each table to landing zone.
- 📌 The ingestion service picks up changed data and load into respective tables of dedicated SQL pool.
Prerequisites for this demo:
- Synapse workspace and dedicated SQL pool: for demo, we already created pool named synapseSQLPool.
- Azure SQL Database: along with existing table and record;
For this demo, we created Employee table and inserted few records as shown below.
Demo Set up: Azure Synapse Link for Azure SQL Database
- Configure Azure SQL Database
- Create Linked service for Azure SQL database in synapse
- Create Azure Synapse Link for Azure SQL Database
Required Link Service:
- Azure Synapse Analytics: This is already created as part synapse workspace creation.
- Azure SQL Database: we will create a link Service name as AzureSqlDatabase1
Configure Azure SQL Database:
Synapse workspace will connect to Azure SQL Server using a managed identity
- Enable System configure assigned managed identity for Azure Sql Server
- Allow Azure services and resources to access this server.
- Please refer this article on how to perform above 2 step.
For Synapse workspace to connect and access Azure SQL Database, we need to perform below steps:
- Set Active Directory admin for Azure SQL Server: Enable Azure AD authentication
- Create a contained database user and grant required permission to Managed Identity.
- Please click this link on how to perform above 2 step.
Create linked service for Azure SQL database in synapse:
- Open the synapse workspace
- Click on the Manage tab at left side and click on Linked services and
- Click on + sign to create new link Service and
- then search for Azure Sql database and select and provide required parameters details as mentioned in below image.
- Link Service name is AzureSqlDatabase1 for this demo as shown below.
- Select authentication type as System Assigned Managed Identity for Azure SQL Database authentication.
- Please click this link for step by step demo on how to perform Managed Identity Authentication for Azure SQL Database.
Create Azure Synapse Link for Azure SQL Database:
To create Link connection in Synapse workspace:
- Open the synapse workspace
- Click on Integrate tab
- Then click on + sign
- Select Link connection (Preview) to create New Link Connection as shown in below image
Once the Link connection (Preview) option is clicked, we will see the below to configure link connection:
Source database setting:
Available Data Source Type are: Azure SQL Database & SQL Server.
- Source Type: Here we will select Azure SQL Database as Source type.
- Source linked service: select AzureSqlDatabase1 as linked service that we created above.
Once we select AzureSqlDatabase1, it will fetch the available tables to be ingested initially as shown below.
Employee table is selected to be ingested initially and incrementally using change feed feature of Azure SQL Database:
Now click on Continue button present at the above image to set up target SQL pool:
Target SQL pool settings:
Select the target Synapse SQL pool that should receive the continuously replicated data.
Let’s select synapseSQLPool SQL pool as shown below image and Now click on Continue:
Compute for Replication: Connection settings
Provide a name and select compute settings for the link connection.
Let’s select below details for compute which will perform ingestion to SQL pool:
- Link service name as: sqldb2synapsesqlpool
- Core count : 2+2
- Mode: Continuous
Here for demo, we will select Continuous option so that we can test subsequent change feed loading just after initial load.
Batch option allow to perform replication of data in certain interval starting from 20 minutes.
Now Click on OK and publish.
Let’s have full view of link connection as below:
Now link connection is ready but yet to be started.
Mapping Source & Target:
Let’s review the mapping details:
- Source table: as we selected in source settings i.e. Employee table
- Target table: table name is same as source table by default; But we can update to new table name.
As part table configuration, we have below options to config:
- Distribution Type: Round-Robin, Hash-distributed, Replicated.
- Structure Type: Heap, Clustered Index, Clustered columnstore Index
For this demo, we will keep all as default values as shown in below image. Please refer Start button to start the link connection.
Review SQL table and initial data:
Before we start the link connection, let’s check what will be loaded when link in running.
Start the link connection:
Now let’s click on Start to start the link connection and it will take sometime displaying below message:
“Starting the link connection. This may take a few minutes.”
State of link connection: Starting, running, stopped
Monitoring the link connection:
we can monitor Azure Synapse Link for SQL at the link and table levels.
For each link connection, we will see the following status:
- Initial: a link connection is created but not started. You will not be charged in initial state.
- Starting: a link connection is setting up compute engines to replicate data.
- Running: a link connection is replicating data.
- Stopping: a link connection is shutting down the compute engines.
- Stopped: a link connection is stopped. You will not be charged in stopped state.
Once it is started i.e. in running state, its shows as to STOP as it is running. So current status:
- Now link service is started
- Initial copy is in progress.
- Now in monitor Link connection status is Snapshotting for initial data load
- Once initial load is done, status will be Replicating for change feed of source database for continuous replication mode
Monitoring Initial load:
Let’s navigate to Monitor and click on ‘link connection’;
Link connection status is ‘Running‘ as shown in below image.
Below image will displayed once we click on link connection name as shown in above image;
In below image, we can see time of last processed data column has date i.e. initial load is completed.
Please note the time stamp of initial load. Because will insert few records into Azure SQL Database to check when replication happens afterwards as change feed.
For each table under link connection, status will be below:
- Snapshotting: a source table is initially loaded to the destination with full snapshot.
- Replicating: any updates on source table are replicated to the destination.
- Failed: the data on source table can’t be replicated to destination due to a fatal error. If you want to retry after fixing the error, remove the table from link connection and add it back.
- Suspended: replication is suspended for this table due to an error. It will be resumed after the error is resolved.
Verify Initial load:
We can see below output for initial data load into dedicated sql pool. These records were there in source Azure SQL database.
Change Feed test:
Now let’s run below query insert few records into Employee table of source Azure SQL database.
As link connection is running, so it will replicate the new records i.e. the Change Feed of Azure SQL database into dedicated SQL pool.
INSERT INTO EMPLOYEE (EMPID, Salary) VALUES (4,4000); INSERT INTO EMPLOYEE (EMPID, Salary) VALUES (5,5000);
After few moments, the change feed will be replicated into dedicated SQL pool.
Let’s query Employee table in SQL pool; In below image we can see change feed replicated in Employee table having EMPID 4 & 5;
Now let’s monitor the time stamp of this change feed replication by link connection:
Initial load time stamp: 9:42:16 PM
Change Feed Replication time stamp:9:42:51 PM, so it took few seconds to replicate the change feed from source Azure SQL database into Synapse Analytics SQL pool.
Hope we understood the demo how Azure Synapse Link for Azure SQL Database enables replication the change feed from source transactional database into synapse analytics SQL pool.
Thanks for reading the article; Please feel free to share your query and thought.