In this article we will understand the features of MERGE T-SQL command and have a quick demo on Source table to Sync with target dimension table in Azure Synapse Analytics Dedicated SQL pool.
MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available
MERGE T-SQL feature:
- It combines INSERTs/UPDATEs/DELETEs into a single statement.
- Thus it executes in single transaction rather than 3 separate INSERTs/UPDATEs/DELETEs transaction.
- Hence it improves query performance and scalability.
- MERGE command eases table synchronization specially for Slowly Changing Dimension Type 1.
- It simplify migrations onto Synapse and improving code readability.
MERGE T-SQL command is applicable to
- Azure Synapse Analytics
- Azure SQL Database
- SQL Server (all supported versions)
For official GA-supported MERGE,
@@version
of Synapse Dedicated SQL pool should be ‘10.0.17829.0’ or beyond.
Common Use case:
- Updating Slowly Changing Dimension tables in data warehousing workloads.
- From source data to be sync with target Dimension tables in a single operation with following DML commands:
- INSERT new data which is not present at target
- UPDATE existing data of target if columns values is different
- DELETE existing data which is not present at source but in target
Quick Demo: Source table to Sync Target dimension table
Here we will sync Prod Dimension table i.e. prod.DimProductTbl based on Source Staging Dimension table i.e. staging.DimProductTbl
Prerequisite:
- Azure Synapse Analytics Dedicated SQL pool
- Source & Target table along with required data
Please note table name are same but with different schema, one is prod and another is staging.
Target : Prod Dimension table: prod.DimProductTbl
Merge statements with a WHEN NOT MATCHED [BY TARGET] clause must target a hash distributed table.
This table contains below data for ProductID 11,22,33
Source: Staging Dimension table: staging.DimProductTbl
This table contains below data for product id 11,44
Analyze current data of source and target:
Comparing above tables, we can see :
- INSERT: ProductId 44 is not present at target but at source, it will be INSERTED
- UPDATE: ProductId 11 is present at both source & target & its price is updated, it will be UPDATED
- DELETE: ProductId 22 is present at target but not at source, it will be DELETED from target
SAMPLE CODE – MERGE T-SQL command execution:
Once the below query is run successfully at Dedicated SQL Pool, we will verify the records of target table.
MERGE INTO prod.DimProductTbl AS TargetTbl
USING staging.DimProductTbl AS SourceTbl
ON SourceTbl.ProductID = TargetTbl.ProductID
WHEN MATCHED AND SourceTbl.Price <> TargetTbl.Price
-- UPDATEs for new price of existing product
THEN UPDATE SET TargetTbl.Price = SourceTbl.Price
-- INSERTs: Source has 1 new product i.e. ProductID 44
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Price)
VALUES (SourceTbl.ProductID,SourceTbl.ProductName,SourceTbl.Price)
-- DELETEs: Target has 1 ProductID 22 that Source doesn't have
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OPTION ( LABEL = 'MERGE DEMO - Update 1, Insert 1, Delete 1)' );
Verify Target table record:
Now we will run a select statement on target table as shown below.
This shows the same data what is in staging source table for
- ProductID 44 was not present at target earlier, so it is now INSERTED.
- ProductID 11 was present at both source & target & its price was updated in source, so it is UPDATED to 99.99
- ProductID 22 was present at target but not at source, it is now DELETED from target, hence not showing below.
Note: In Azure Synapse Analytics, the MERGE command has few differences compared to SQL server and Azure SQL database.
So from the above demo, we understood how MERGE T-SQL works in real scenario in Azure Synapse Analytics Dedicated Pool