Azure Data Factory mapping data flow: ‘Source’ partitioning on Azure SQL Database

Azure Data Factory & Azure Synapse Pipelines both provides mapping data flows for the data transformation. Each data flow should contain at least one ‘source transformation‘ which configures the data source for the data flow.

We can configure a source transformation by adding a data source using ‘Add Source’ box in the data flow canvas. This article refers to the reading of data from Azure SQL database as data source.

While transforming data using Azure Data Factory mapping data flow, ‘Source’ partitioning on Azure SQL Database improves the read performance from source SQL Database.

This ‘Source’ partitioning is unique to Azure SQL Database only.

‘Source’ partitioning on Azure SQL Database

After enabling this partitioning under Optimize tab, it improves performance i.e., read the source data faster by creating parallel connections on the source SQL database.

As we can see the above diagram, while on selection of “Source partitioning”, it allows us to

  • specify the number of partitions.
  • partition source data by specifying column name or query condition.

In case of column, we should use partition column with high cardinality.

In this partitioning, generally four or five partitions is ideal else too many partitions may saturate source database.

Leave a Reply