Azure Data Factory Copy Activity: Copy Hierarchical JSON data to Azure Synapse Analytics SQL Pool

In this article we will use Azure Data Factory Copy Activity to copy the JSON data from Azure Data Lake Gen 2 to Azure Synapse Analytics SQL Pool i.e. copy from Hierarchical source to tabular sink in the form of Hierarchical Structures to Relational format.

When to use ADF Copy Activity & Data flow?

  • ADF Copy: For simple & straight forward copy of Hierarchical Structures into Relational Data i.e. no transformation like derived column etc is not required then ADF Copy Activity is fine.
  • Data Flow: For advanced hierarchical to tabular transformation, we have to use Data Flow.

This copy activity will de-normalize the JSON structures into relational format w.r.t array present in JSON. How much the denormalization will happened that depends on no of element in the array.

Azure Data Factory Copy: Source & Destination Mapping

While copying data from hierarchical source to tabular sink, ADF copy activity supports the following capabilities:

  1. Extract data from objects and arrays: Along with other element of JOSN, from object we can map each object property to the column of table.
  2. Collection References: Select or specify the JSONPath of a nested JSON array for cross-apply.

We can think ‘Collection References‘ as the master data or reference data on which the cross mapping happened converting JSON object into multiple records in tabular result.

Total No of records will be multiplication of number of records of collection reference array of the JSON. We will understood this below after copy activity below.

Note: If do not select or check the ‘Collection References’ i.e. if we do not specify an array for cross-apply, it will copy only the 1st element of that array. If we select, cross mapping will be applied.

In this article we will see how to use ‘Collection References‘ of mapping and in next article we will understand how to use “Map complex values to string” of mapping. Please refer above image to see the presence of these features.

Now let us have a quick demo to copy the JSON into synapse table:

  • Source data: Dataset is created based on the JSON file of Azure Data Lake. This JSON file is individual customer record with address and purchase details.
  • Destination: Dataset is created based on ‘Order’ table of Azure Synapse SQL Pool table. one table ‘Order’ with required columns is already created.

Let’s have a look at the source dataset and preview the data: it’s now normalized view, once copied it will be denormalized.

Image1: Azure Data Factory Copy Source JSON Dataset
Image2: Azure Data Factory Copy Source JSON Data Preview

In the preview data, we can see the customer is an object and its property will be mapped to column of synapse table.
But “orders” is an array which will be used as ‘Collection References’ and also will be used for cross mapping with rest of JSON data.

Now let’s focus only on mapping tab of Copy activity: Please refer below image and follow the step subsequently

Image3: Azure Data Factory Copy: Source & Destination Mapping
  1. First we have click on the ‘Import Schema’ to import schema of both source & target; Schema of both source/sink is mandatory.
  2. Now let’s select on the ‘Collection References’ dropdown and select the ‘orders’ for cross mapping.
  3. Now we have to map the required source and target column & remove the source column which may not be required as mentioned in below image.
  4. Once the mapping is done, now let’s trigger debug run on the pipeline and wait for the completion. Once it displayed the execution status as successful. we can check the “Order” table of Azure Synapse SQL pool.

Please note, in the above image we have not enabled Advanced editor, we will enable it later to show what will happen if we enable it.

Here we can see the inserted record into Order table of Azure Synapse Anaytics SQL pool :

Image4: Azure Synapse Analytics: View data in table of SQL Pool
  1. Customer objects’s property mapped to respective column
  2. Order array i.e. collection reference has 3 rows, so there here 1*3=3 rows are inserted into table

So the JSON data has been denormalised now. here the order number/order date/customer name etc is repeated for each order items of JSON.

In above mapping tab of Image 3, please note the Hierarchical Structures of the JSON is displayed & we mapped to target column. Hierarchical Structures of the JSON is displayed because “Advanced editor” was not checked. If we check it, Hierarchical Structures will be turned to flat mapping as shown in below image.

Image5: Azure Data Factory Copy Advanced editor
Image6: Azure Data Factory Copy Advanced editor flat mapping

So in this article we understood how to use ADF Copy to JSON data into Azure Synapse Analytics SQL pool & usage of Collection References as well.

he reading; Please feel to ask query/question in comment section below.

Leave a Reply