The Cast transformation was recently introduced to Mapping Data Flow in Azure Data Factory and Azure Synapse Analytics.
In this article, we will walk through a quick demo of how to use the Cast transformation in a data flow and explore every option available in the Cast transformation settings.
How does the Cast transformation help data engineers?
- It allows data engineers to perform data type conversions with built-in type checking.
- While casting, it can optionally mark the rows where type casting fails.
- It also allows data engineers to format the converted data.
For the demo, our source data is sales data stored in Azure Blob Storage.
Below is the sales data from the dataset preview. The number of sales for ProductId p1 is intentionally set to ‘A’ to produce an error during type casting.
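For reference, the preview looks something like this (the column names are from the dataset; apart from p1's ‘A’, the values are illustrative rather than the exact demo data):

```
ProductId | NumOfSales | Price
p1        | A          | 150.50
p2        | 12         | 200.75
p3        | 30         | 99.99
```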
The Schema tab of the dataset shows the data type of all columns as String, but the data type of NumOfSales should be integer (or long) and the data type of Price should be decimal.
This is where we need the Cast transformation to perform the data type conversion.
Here we will cast the following columns:
- ‘NumOfSales’ to integer, and
- ‘Price’ to decimal, with formatting applied as well (see the script sketch after this list).
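For readers who prefer the script view, this kind of cast appears in the underlying data flow script roughly as follows. This is a sketch based on the documented cast syntax, not the exact script generated by this demo; the stream names source1 and cast1 are placeholders, and the precision and scale (10,2) are the values we will configure later in the walkthrough:

```
source1 cast(output(
        NumOfSales as integer,
        Price as decimal(10,2)
    ),
    errors: true) ~> cast1
```

The errors property reflects the ‘Assert type check’ option, which we will look at in a moment.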
Adding the Cast transformation to a data flow:
To add this transformation, click the + sign on the preceding transformation step (here, the Source transformation) and select the Cast transformation, as shown in the image below:
Cast transformation Settings:
Once we add this transformation to the data flow, we see the settings shown in the image below.
To include the ‘NumOfSales’ and ‘Price’ columns in the type conversion, as highlighted in the image above, we need to click the + sign at the end of the first row and select those columns along with their type and format.
Here we have the following properties to work with:
- Column: select the column to be cast. This drop-down list contains the column names from the input file.
- Type: select the data type to which the selected column should be cast. This is also where we will set the precision and scale for decimal data.
- Format: for decimal or date data, we can format the data as desired.
Relevant formats are available in the drop-down list, as shown in the image below.
Format options for Integer:
Format options for Decimal:
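If you prefer working with expressions, roughly equivalent conversions can be written in a Derived Column transformation using the data flow expression functions toInteger() and toDecimal(), which accept similar format strings (the format string below is illustrative):

```
toInteger(NumOfSales)
toDecimal(Price, 10, 2, '$###.00')
```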
Assert Type check:
This option allows ADF and Synapse to mark rows where type casting fails with a type-check error.
We can either check it or leave it unchecked, as described below:
- Assert type check enabled: if we select this option, rows where type casting fails are marked. As mentioned earlier, ProductId p1 has the invalid NumOfSales value ‘A’, which will fail the cast and produce a NULL value. This marking of failed rows can be used at a later stage.
- Assert type check disabled: if we leave it unchecked, the failed rows are not marked, but the column value still becomes NULL.
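In the data flow script, this checkbox surfaces as the errors property on the cast transformation. To the best of my reading of the documented syntax, errors: true corresponds to the box being checked and errors: false to it being unchecked (the output list is elided here for brevity):

```
cast(output(...), errors: true)  ~> cast1
cast(output(...), errors: false) ~> cast1
```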
Type casting with Cast transformation:
As we saw earlier in the dataset schema, the data type of the NumOfSales and Price columns is String. Here we will perform the following casts:
- cast the NumOfSales column to integer
- cast the Price column to decimal. We will set the precision and scale in the next step.
Set precision and scale for decimal data:
Select the ‘Edit’ option just below the ‘Type’ field to provide the precision as 10 and the scale as 2. Refer to the images below:
Once we select the Edit option, we provide the precision and scale in the Type settings that follow:
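In script form, the resulting entry for Price would look roughly like the line below; per the documented cast syntax, an optional format string can follow the type (the format string here is illustrative):

```
Price as decimal(10,2) '#.00'
```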
First, we will keep ‘Assert type check’ unchecked, as below:
Now let’s preview the data:
Here we can see successful type conversion of the NumOfSales and Price columns for all rows except ProductId p1.
Its original NumOfSales value was ‘A’, so the type cast fails, resulting in a NULL value. However, there is no marking on this row to indicate that it failed during type casting.
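With ‘Assert type check’ unchecked, one way to catch such rows downstream is simply to test for NULL, for example in a Filter transformation. The expression below (a sketch using the standard isNull() data flow function) keeps only the rows that were cast successfully:

```
!isNull(NumOfSales) && !isNull(Price)
```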
Now, with ‘Assert type check’ checked, as below:
Again, we can see successful type conversion of the NumOfSales and Price columns for all rows except ProductId p1.
Its original NumOfSales value was ‘A’, so the type cast fails, resulting in a NULL value. But this time, the row is marked to indicate that it failed during type casting.
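These marked rows can then be handled downstream. For example, assuming hasError() behaves for cast failures as it does for the Assert transformation (checking whether the row carries an assert error), a Filter transformation could route clean rows forward with a sketch like:

```
!hasError()
```

Sinks also offer error-row handling settings that can log or discard rows with assert failures, though the exact options depend on the sink type.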
In this article, we saw how to use the new Cast transformation in a data flow through a quick demo. I hope it helps you. Thanks for reading.
More articles on Azure Data Factory and Synapse:
How to create and use Flowlet transformation in Azure Data Factory and Azure Synapse pipeline
Azure Data Factory and Synapse Analytics provides Script Activity to execute DML & DDL script
Please feel free to comment below with any queries or feedback.