As part of an introduction to Azure Purview, this article walks through a simple demo that visualizes the end-to-end data lineage (in Azure Purview) of copying data from Azure SQL Database to Azure Data Lake Gen2. We will also get familiar with the Data Map and Data Catalog in Azure Purview.
Microsoft defines Azure Purview as follows:
“Azure Purview is a unified data governance solution that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification and end-to-end data lineage. Enable data consumers to find valuable, trustworthy data.”
Here we will concentrate only on the data lineage portion of Azure Purview.
Prerequisites:
We need to create the services below and have some data ready in Azure SQL Database to be copied into the Data Lake.
- Azure Purview account: to visualize data lineage
- Azure Data Factory: for data ingestion and to push lineage details to Azure Purview
- Azure SQL Database: data source
- Azure Data Lake Gen2: destination
As shown below, when the Azure Purview account is created, an Azure Storage account and an Event Hub are created along with it under a managed resource group.
Connect Azure Data Factory to Azure Purview:
First, we need to connect the Azure Purview account in Azure Data Factory by clicking on ‘connect to an Azure Purview account’, so that lineage details flow to the Azure Purview account.
Once it is connected to the Purview account, the data lineage integration appears as in the image below:
Now let’s look at the Azure Data Factory Copy activity details, as shown in the image below. Data will be copied from Azure SQL Database to Azure Data Lake Gen2:
Once the pipeline has run, we can see the run status below along with the lineage details in the ADF monitor:
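For readers who prefer to see the Copy activity as a definition rather than a screenshot, here is a minimal sketch of how such an activity looks in Data Factory's pipeline JSON, built as a Python dictionary. The activity, pipeline and dataset names are hypothetical, and the sink type assumes the Data Lake dataset is a delimited text (CSV) file, which this article does not specify.

```python
import json


def build_copy_activity(source_dataset: str, sink_dataset: str) -> dict:
    """Return a Copy activity in the shape used by ADF pipeline JSON."""
    return {
        "name": "CopySqlToDataLake",  # hypothetical activity name
        "type": "Copy",
        # Datasets are referenced by name; they are defined separately in ADF.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "AzureSqlSource"},
            # Assumption: the sink dataset is delimited text (CSV).
            "sink": {"type": "DelimitedTextSink"},
        },
    }


# Hypothetical dataset and pipeline names, for illustration only.
activity = build_copy_activity("AzureSqlTableDataset", "DataLakeCsvDataset")
pipeline = {"name": "CopyPipeline", "properties": {"activities": [activity]}}

print(json.dumps(pipeline, indent=2))
```

The input and output dataset references are what tie the activity to a source and a sink, which is the same relationship the lineage view later draws.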
Detailed flow:
Now let’s open the Azure Purview account to visualize the data lineage of this copy activity:
Azure Purview ‘Data catalog’:
“The Azure Purview Data Catalogue is an application built on Data Map for use by business data users, data engineers and stewards to discover data, identify lineage relationships and assign business context quickly and easily.”
Once we open the Purview account, we can see the ‘Data catalog’, where we can browse the assets (the entities involved) and search for any asset. We can check asset details by clicking on ‘Browse Assets’, or we can search for an individual asset in the search box.
“The Azure Purview data catalog offers a browse experience that enables users to explore what data is available to them either by collection or through traversing the hierarchy of each data source in the catalog”
Now let’s browse the catalog to see the data source assets. Here we can see the asset list by source type:
Azure Data Factory, Azure SQL Database and Azure SQL Server.
Azure Purview ‘Data map’:
Microsoft defines the Azure Purview ‘Data map’ as follows:
“The Azure Purview Data Map is the foundation for managing data governance at cloud scale. Data Map stores metadata, lineage, classifications, and other annotations associated with data assets and serves the requests for this information through Apache Atlas APIs or through the applications built on Data Map including Data Catalogue.”
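Since the Data Map serves requests through Apache Atlas APIs, lineage can also be requested programmatically. The sketch below only builds the request URL for the Atlas v2 lineage endpoint and does not send it. The host pattern and the `/catalog/api/atlas/v2` prefix are assumptions about how a Purview account exposes the Atlas API, the account name and GUID are placeholders, and a real call would also need an Azure AD bearer token, which is not shown here.

```python
from urllib.parse import quote, urlencode


def lineage_url(account_name: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Build the URL of an Atlas v2 lineage request for one asset GUID."""
    # Assumption: the Purview account exposes Atlas under this host and prefix.
    base = f"https://{account_name}.purview.azure.com/catalog/api/atlas/v2"
    # Atlas accepts INPUT, OUTPUT or BOTH as the lineage direction.
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base}/lineage/{quote(guid, safe='')}?{query}"


# Placeholder account name and GUID, for illustration only.
url = lineage_url("my-purview-account", "00000000-0000-0000-0000-000000000000")
print(url)
```

The `direction` parameter controls whether upstream assets, downstream assets, or both are returned, and `depth` limits how many hops are followed.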
Let’s click on the data map on the left to view what it contains. As shown in the image below, it includes collection (assets), source, classification, etc. Here, let’s click on ‘asset’ to see the asset collection in the next image:
Now let’s click on the assets under the collection to see the data assets it contains:
The data assets include the data factory, the data factory pipeline, the Azure SQL Database data source, the Azure Data Lake data sink, etc., as shown below:
Now we will visualize the data lineage details of each asset by clicking on the data factory pipeline, the Azure SQL Database data source and the Azure Data Lake data sink.
First, let’s click on the data factory pipeline; it redirects to the data catalog as shown below:
Here we can visualize the end-to-end data lineage of the data factory pipeline, i.e. the source of the data, how it is ingested and where it is finally stored.
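The same source, process and sink chain can be read out of an Atlas-style lineage response. The sketch below walks such a response and prints each path from source to sink. The sample payload is hand-written for illustration: it follows the `guidEntityMap` and `relations` shape of an Atlas lineage result, but the GUIDs, type names and asset names are hypothetical and do not come from the demo above. The walk assumes the lineage graph has no cycles.

```python
from collections import defaultdict

# Hypothetical sample in the shape of an Atlas lineage response.
sample_lineage = {
    "baseEntityGuid": "proc-1",
    "guidEntityMap": {
        "src-1": {"typeName": "azure_sql_table", "attributes": {"name": "SalesTable"}},
        "proc-1": {"typeName": "adf_copy_activity", "attributes": {"name": "CopySqlToDataLake"}},
        "sink-1": {"typeName": "azure_datalake_gen2_path", "attributes": {"name": "sales.csv"}},
    },
    "relations": [
        {"fromEntityId": "src-1", "toEntityId": "proc-1"},
        {"fromEntityId": "proc-1", "toEntityId": "sink-1"},
    ],
}


def lineage_paths(lineage: dict) -> list:
    """Return every source-to-sink path as a readable 'a -> b -> c' string."""
    names = {guid: entity["attributes"]["name"]
             for guid, entity in lineage["guidEntityMap"].items()}
    downstream = defaultdict(list)
    has_upstream = set()
    for relation in lineage["relations"]:
        downstream[relation["fromEntityId"]].append(relation["toEntityId"])
        has_upstream.add(relation["toEntityId"])

    paths = []

    def walk(guid, trail):
        trail = trail + [names[guid]]
        next_guids = downstream.get(guid, [])
        if not next_guids:  # nothing downstream: this asset is a final sink
            paths.append(" -> ".join(trail))
        for next_guid in next_guids:
            walk(next_guid, trail)

    # Start from assets that nothing feeds into, i.e. the original sources.
    for guid in names:
        if guid not in has_upstream:
            walk(guid, [])
    return paths


for path in lineage_paths(sample_lineage):
    print(path)
```

Each relation is a directed edge, so starting from assets with no incoming edge and following edges downstream reproduces the left-to-right picture shown in the lineage view.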
Now let’s click on the Azure SQL Database data source to see its data lineage:
Now let’s click on the Azure Data Lake Gen2 data sink to see its data lineage:
So far we have visualized the data lineage of the Azure Data Factory copy activity, as well as the lineage of the pipeline's source and sink.
Thanks for reading the article. Please feel free to share your questions and thoughts in the comments.