Data Lineage
Tokern Lineage Engine is a fast, easy-to-use platform for collecting, visualizing and analyzing column-level data lineage in databases, data warehouses and data lakes on AWS and GCP.
Tokern Lineage Engine lets you explore column-level data lineage:
- visually, using kedro-viz
- programmatically, by analyzing lineage graphs with the networkx graph library
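To give a feel for the programmatic route, here is a minimal sketch that models column-level lineage as a networkx `DiGraph` and asks which columns feed into, or are derived from, a given column. The column names and the source-to-target edge direction are assumptions for illustration; this is not how Tokern itself builds or exports its graph.

```python
import networkx as nx

# Hypothetical column-level lineage: each node is "schema.table.column" and
# each edge points from a source column to the column derived from it.
# These names are illustrative only; they are not produced by Tokern.
lineage = nx.DiGraph()
lineage.add_edges_from([
    ("raw.orders.amount", "staging.orders.amount_usd"),
    ("raw.fx_rates.rate", "staging.orders.amount_usd"),
    ("staging.orders.amount_usd", "mart.revenue.daily_total"),
    ("raw.customers.email", "staging.customers.email"),
])


def upstream(graph: nx.DiGraph, column: str) -> set[str]:
    """All columns that feed into `column`, directly or transitively."""
    return nx.ancestors(graph, column)


def downstream(graph: nx.DiGraph, column: str) -> set[str]:
    """All columns derived from `column`, directly or transitively."""
    return nx.descendants(graph, column)


print(sorted(upstream(lineage, "mart.revenue.daily_total")))
# ['raw.fx_rates.rate', 'raw.orders.amount', 'staging.orders.amount_usd']
print(sorted(downstream(lineage, "raw.orders.amount")))
# ['mart.revenue.daily_total', 'staging.orders.amount_usd']
```

Upstream queries answer "where did this number come from?" (debugging), while downstream queries answer "what breaks if I change this column?" (impact analysis).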
Use column-level lineage to add context and automation to common data management tasks such as:
- Track and debug data quality.
- Track PII, PHI and other sensitive data and their access rights.
- Save costs by removing unused ETL pipelines and datasets.
- Enrich your data dictionary to help users find the right dataset for their analysis.
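Several of these tasks reduce to simple graph queries once lineage is available. The sketch below, on a hypothetical networkx graph, flags every column derived from a known PII column and lists leaf columns that no query reads. The `pii_columns` and `queried_columns` sets are hypothetical inputs (standing in for a sensitive-data scanner and query history); they are not Tokern features.

```python
import networkx as nx

# Hypothetical lineage graph: edges point from source column to derived column.
lineage = nx.DiGraph([
    ("raw.customers.email", "staging.customers.email"),
    ("staging.customers.email", "mart.marketing.contact"),
    ("raw.orders.amount", "mart.revenue.daily_total"),
    ("raw.orders.amount", "scratch.tmp_orders.amount"),
])

# Assumed inputs: columns known to hold PII, and columns seen in query history.
pii_columns = {"raw.customers.email"}
queried_columns = {"mart.marketing.contact", "mart.revenue.daily_total"}

# Any column derived from a PII column may also carry PII.
possibly_sensitive = set(pii_columns)
for column in pii_columns:
    possibly_sensitive |= nx.descendants(lineage, column)

# Leaf columns feed nothing downstream; if no query reads them either, the
# pipelines producing them are candidates for removal (to be reviewed by hand).
leaf_columns = {n for n in lineage.nodes if lineage.out_degree(n) == 0}
unused_candidates = leaf_columns - queried_columns

print(sorted(possibly_sensitive))
# ['mart.marketing.contact', 'raw.customers.email', 'staging.customers.email']
print(sorted(unused_candidates))
# ['scratch.tmp_orders.amount']
```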
Check out the post on using data lineage for cost control as an example.
Calling for Tokern Lineage Engine beta users
We are building a data lineage platform for Snowflake, Redshift and BigQuery. Interested? Take this survey to sign up.
Need help with your data governance and lineage strategy?
If you would like hands-on assistance setting up Tokern projects or open-source data catalogs like DataHub or Amundsen, or adding functionality to Tokern Lineage Engine, please get in touch through this form.
Resources
Demo of Tokern Lineage App
Download the docker-compose file from the GitHub repository.
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -O docker-compose.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o docker-compose.yml
Run docker-compose
docker-compose up -d
Check that the containers are running.
docker ps
CONTAINER ID IMAGE CREATED STATUS PORTS NAMES
3f4e77845b81 tokern/data-lineage-viz:latest ... 4 hours ago Up 4 hours 0.0.0.0:8000->80/tcp tokern-data-lineage-visualizer
1e1ce4efd792 tokern/data-lineage:latest ... 5 days ago Up 5 days tokern-data-lineage
38be15bedd39 tokern/demodb:latest ... 2 weeks ago Up 2 weeks tokern-demodb
Try out Tokern Lineage App
Head to http://localhost:8000/ to open the Tokern Lineage app.
Jupyter Notebooks and case studies
Check out an example data lineage notebook.
Read the post on using data lineage for cost control to see how data lineage can be used in production.
Getting Started
Check out Installation for multiple options to start the data_lineage engine and browse lineage graphs.
Check out the following example notebooks to analyze lineage graphs:
- Use the API to create the lineage graph.
- Use the query parser module to generate lineage from SQL ETL queries.
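For intuition about what generating lineage from an ETL query involves, the sketch below turns a simple `INSERT ... SELECT` statement into source-to-target column edges. It is a toy, regex-based illustration that assumes a single source table and bare column names in the SELECT list; it is not Tokern's query parser, and joins, expressions, aliases and CTEs are out of scope.

```python
import re

# Toy pattern for: INSERT INTO tgt (c1, c2) SELECT s1, s2 FROM src
# (an illustrative assumption, not a real SQL grammar).
PATTERN = re.compile(
    r"insert\s+into\s+(?P<target>[\w.]+)\s*\((?P<target_cols>[^)]*)\)\s*"
    r"select\s+(?P<source_cols>.+?)\s+from\s+(?P<source>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def column_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_column, target_column) edges for a simple INSERT ... SELECT."""
    match = PATTERN.search(sql)
    if match is None:
        raise ValueError("unsupported statement")
    target_cols = [c.strip() for c in match["target_cols"].split(",")]
    source_cols = [c.strip() for c in match["source_cols"].split(",")]
    if len(target_cols) != len(source_cols):
        raise ValueError("column count mismatch")
    # SELECT items map to INSERT columns by position.
    return [
        (f"{match['source']}.{src}", f"{match['target']}.{tgt}")
        for src, tgt in zip(source_cols, target_cols)
    ]


edges = column_lineage(
    "INSERT INTO mart.customers (id, email) SELECT cust_id, email FROM raw.customers"
)
print(edges)
# [('raw.customers.cust_id', 'mart.customers.id'), ('raw.customers.email', 'mart.customers.email')]
```

A production parser builds a full syntax tree instead, which is what lets it follow expressions and joins that a regex cannot.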
Features
- Lineage and catalog stored in a database.
- Integrates with open source data catalogs.
- API and SDK to integrate with any ETL framework.
- Use the query parser module to generate lineage from SQL query history.
- Supports ANSI SQL queries.
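Storing lineage in a database means graph questions can be answered in SQL. As a hedged illustration, the sketch below keeps edges in a hypothetical `lineage_edge` table (not Tokern's actual schema) and walks upstream from one column with a recursive CTE. It uses an in-memory SQLite database only so it runs without a server; PostgreSQL accepts the same `WITH RECURSIVE` query apart from the parameter placeholder style.

```python
import sqlite3

# Hypothetical schema for illustration; Tokern's real catalog tables differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lineage_edge (
        source_column TEXT NOT NULL,
        target_column TEXT NOT NULL,
        PRIMARY KEY (source_column, target_column)
    )
    """
)
conn.executemany(
    "INSERT INTO lineage_edge VALUES (?, ?)",
    [
        ("raw.orders.amount", "staging.orders.amount_usd"),
        ("raw.fx_rates.rate", "staging.orders.amount_usd"),
        ("staging.orders.amount_usd", "mart.revenue.daily_total"),
    ],
)

# Recursive CTE: start from the direct sources of one column, then keep
# following edges upstream. UNION (not UNION ALL) drops duplicates, which
# also stops the recursion if the graph contains a cycle.
UPSTREAM_SQL = """
WITH RECURSIVE upstream(col) AS (
    SELECT source_column FROM lineage_edge WHERE target_column = ?
    UNION
    SELECT e.source_column
    FROM lineage_edge AS e JOIN upstream AS u ON e.target_column = u.col
)
SELECT col FROM upstream ORDER BY col
"""
rows = conn.execute(UPSTREAM_SQL, ("mart.revenue.daily_total",)).fetchall()
upstream_cols = [row[0] for row in rows]
print(upstream_cols)
# ['raw.fx_rates.rate', 'raw.orders.amount', 'staging.orders.amount_usd']
```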
Supported Databases
- PostgreSQL
- AWS Redshift
- Snowflake
Coming Soon
- AWS Athena
- MySQL/MariaDB