MERGE command in Azure Synapse Analytics Dedicated SQL pool

In this article we will understand the features of MERGE T-SQL command and have a quick demo on Source table to Sync with target dimension table in Azure Synapse Analytics Dedicated SQL pool.

MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available

MERGE T-SQL feature:
  • It combines INSERTs/UPDATEs/DELETEs into a single statement.
  • Thus it executes in single transaction rather than 3 separate INSERTs/UPDATEs/DELETEs transaction.
  • Hence it improves query performance and scalability.
  • MERGE command eases table synchronization specially for Slowly Changing Dimension Type 1.
  • It simplify migrations onto Synapse and improving code readability.
MERGE Command in Action: GIF credit Microsoft
MERGE T-SQL command is applicable to
  • Azure Synapse Analytics
  • Azure SQL Database
  • SQL Server (all supported versions)

For official GA-supported MERGE, @@version of Synapse Dedicated SQL pool should be ‘10.0.17829.0’ or beyond.

Common Use case:
  • Updating Slowly Changing Dimension tables in data warehousing workloads.
  • From source data to be sync with target Dimension tables in a single operation with following DML commands:
    1. INSERT new data which is not present at target
    2. UPDATE existing data of target if columns values is different
    3. DELETE existing data which is not present at source but in target

Quick Demo: Source table to Sync Target dimension table

Here we will sync Prod Dimension table i.e. prod.DimProductTbl based on Source Staging Dimension table i.e. staging.DimProductTbl

Prerequisite:
  • Azure Synapse Analytics Dedicated SQL pool
  • Source & Target table along with required data

Please note table name are same but with different schema, one is prod and another is staging.

Target : Prod Dimension table: prod.DimProductTbl

Merge statements with a WHEN NOT MATCHED [BY TARGET] clause must target a hash distributed table.

This table contains below data for ProductID 11,22,33

Source: Staging Dimension table: staging.DimProductTbl

This table contains below data for product id 11,44

Analyze current data of source and target:

Comparing above tables, we can see :

  • INSERT: ProductId 44 is not present at target but at source, it will be INSERTED
  • UPDATE: ProductId 11 is present at both source & target & its price is updated, it will be UPDATED
  • DELETE: ProductId 22 is present at target but not at source, it will be DELETED from target
SAMPLE CODE – MERGE T-SQL command execution:

Once the below query is run successfully at Dedicated SQL Pool, we will verify the records of target table.

MERGE INTO prod.DimProductTbl AS TargetTbl
USING staging.DimProductTbl AS SourceTbl
ON SourceTbl.ProductID = TargetTbl.ProductID
WHEN MATCHED AND SourceTbl.Price <> TargetTbl.Price
-- UPDATEs for new price of existing product
THEN UPDATE SET TargetTbl.Price = SourceTbl.Price
-- INSERTs: Source has 1 new product i.e. ProductID 44
WHEN NOT MATCHED BY TARGET THEN
INSERT (ProductID, ProductName, Price)
VALUES (SourceTbl.ProductID,SourceTbl.ProductName,SourceTbl.Price)
-- DELETEs: Target has 1 ProductID 22 that Source doesn't have
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OPTION ( LABEL = 'MERGE DEMO - Update 1, Insert 1, Delete 1)' );
Verify Target table record:

Now we will run a select statement on target table as shown below.

This shows the same data what is in staging source table for

  • ProductID 44 was not present at target earlier, so it is now INSERTED.
  • ProductID 11 was present at both source & target & its price was updated in source, so it is UPDATED to 99.99
  • ProductID 22 was present at target but not at source, it is now DELETED from target, hence not showing below.

Note: In Azure Synapse Analytics, the MERGE command has few differences compared to SQL server and Azure SQL database.

So from the above demo, we understood how MERGE T-SQL works in real scenario in Azure Synapse Analytics Dedicated Pool

Leave a Reply