Microsoft recently announced ‘Map Data’ tool, a new feature of Azure Synapse Analytics.
Update: Map Data tool in Synapse Analytics is now Generally Available as on Jul 12 2022.
In this article we will demonstrate the step by step on how to use this new feature in Azure Synapse Analytics by mapping data between Azure Data Lake Gen2 & Lake database using new mapping method.
What is ‘Map Data’ of Azure Synapse Analytics?
The Map Data tool is a guided process to help users create ETL mappings and mapping data flows from their source data to Synapse lake database tables without writing code.
This Map Data tool is available within the Synapse lake database, we need to create a lake database first to use this tool.
Currently available sources & destination for the data mapping:
- Sources are Azure Data Lake Gen2 & Lake database itself;
- Destination is only lake database tables.
How to get started? step by step –
Prerequisites:
We need to have below source and destination ready before we get started for this demo.
- Source file/ database: Already available CSV file in Azure Azure Data Lake Gen2
- Destination: custom table in lake database.
To Create custom table in lake database, first select the 1 Table drop-down and select 2 ‘Custom’ option, it will displays the below 3 table template to fill with schema details & publish:
Once fill with schema details and publish the target table in synapse lake database will be created as shown in below image:
As Map Data tool is available within the Synapse lake database, let’s create a new lake database or open an existing lake database.
Now let’s open Map data tool by right clicking on database & selecting “Open (preview)” option as shown in below image:
Once we click on the “Open (preview)” option for an existing lake database, the similar layout shown in below image will be displayed:
Here in the above screenshot, we can see the “Map data(Preview)” option. This is the Map data feature of synapse.
So as mentioned above, Map Data tool is available within the Synapse lake database.
Now click on this “Map data(Preview)” option, it will ask to “Turn on data flow debug” as Data Mapping assistant requires an active data flow debug session. This will take few seconds to get the cluster ready.
Once this cluster is ready, we will get option to create new data mapping by selecting the source type as shown in below image:
Here we will select Azure Data Lake Gen2 as source, once we continue we will have to provide the setting details of source file in new data mapping; Please refer below two images:
Let’s provide/select the below details as shown in above image
- Source linked service: one linked service of Data Lake where the source data is staged.
- Dataset type: Delimited Text
- Folder Path: select the container/folder where here files are staged
- First row as header: This is important – select this option based on your file structure.
- Comma delimiter
- Row delimiter
As shown in the above image, Select the source file: This will display the all files available under the container/folder selected above in point. For each file, there is option to preview the files.
Here we preview the source CSV file data as well as schema as shown below.
Preview the data from CSV file staged in Azure data lake; Later we can verify while mapping with database.
We can see the schema of CSV file staged in Azure data lake:
We can see the schema, here for all column data tape is STRING. we need to convert the type to respective column data type while we will be mapping.
Now in the next new data mapping screen, please provide name of this Data Mapping and select the target database.
In the next page, it will ask to select the primary source table from drop-down and target table from lake database.
Initial page: Here we can see the data flow debug is ON
Here we selected CSV input file as source table and Fact_SalesMapData as target table as shown below image:
Please refer below details to understand the mapping details mentioned in above image.
New mapping: “Use the New Mapping button to add a mapping method to create a mapping or transformation”.
This layout contains several New Mapping method; the following mapping method are available:
- Direct
- Surrogate Key
- Lookup
- Unpivot
- Aggregate: Sum, Minimum, Maximum, First, Last, Standard Deviation, Average, Mean
- Derived Column: Trim, Upper, Lower, Advanced
Here in our case we used ‘Advanced‘ under ‘Derived Column‘ to convert the data type using Expressions & functions.
Please refer the source column where we have performed type conversion using method like toLong(), toTimestamp(), toInteger() etc using expression builder.
Here we have completed the manual mapping between source and target for each column.
Preview data: ‘Data Preview’ tab gives an interactive snapshot of the data of each transform.
Now let’s preview the data; it should show data what we saw while previewing the source data of Data Lake:
Additional source: In the above image ‘Additional source’ button helps us join to and add another source to the mapping.
So in this article we understood how to use and get started the Map data tool and map the data between source and target
Thanks for the reading; Please feel to ask query/question in comment section below.