
Query Parsing

Overview

This example demonstrates how to scan query history from a data warehouse and save it in the Tokern Lineage engine. The app automatically parses and extracts data lineage from the queries.

The example consists of the following sequence of operations:

  • Start the Docker containers for the demo. Refer to the docs for detailed instructions on installing demo-wikimedia.
  • Scan the query history and send the queries to the data lineage app.
  • Visualize the graph in the Tokern UI.
  • Analyze the graph.

Download the queries for the wikimedia demo from GitHub.

Install Wikimedia demo

This example requires the wikimedia demo to be running. Start the demo using the following instructions:

# In a new directory, download the compose file and save it as docker-compose.yml
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/wikimedia-demo.yml -O docker-compose.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/wikimedia-demo.yml -o docker-compose.yml

Run docker-compose

docker-compose up -d

Verify containers are running

docker container ls | grep tokern
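
A running container does not guarantee that the API inside it is already accepting requests. The sketch below polls the API address until it answers, using only the Python standard library. The helper names `api_is_up` and `wait_for_api` are hypothetical and are not part of data-lineage. The address is the one used in this example.

```python
import time
import urllib.error
import urllib.request


def api_is_up(url, timeout=2.0):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 404), so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_for_api(url, attempts=10, delay=1.0, probe=api_is_up):
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_api("http://127.0.0.1:8000")` before creating the catalog client returns `False` if the API never responds, which is usually a sign that the containers are still starting or have exited.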

# Required configuration for the API and the wikimedia database network address

docker_address = "http://127.0.0.1:8000"
wikimedia_db = {
    "username": "etldev",
    "password": "3tld3v",
    "uri": "tokern-demo-wikimedia",
    "port": "5432",
    "database": "wikimedia"
}

from data_lineage import Catalog

catalog = Catalog(docker_address)
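
The connection settings for the demo are hardcoded, which is fine for a local demo database. For anything else, one option is to read them from the environment, as in the minimal sketch below. The environment variable names and the helper `wikimedia_db_from_env` are assumptions made for illustration. The defaults are the demo values from this example.

```python
import os


def wikimedia_db_from_env(environ=os.environ):
    """Build the connection settings, falling back to the demo defaults."""
    # The variable names below are hypothetical; pick whatever suits your setup.
    return {
        "username": environ.get("WIKIMEDIA_DB_USER", "etldev"),
        "password": environ.get("WIKIMEDIA_DB_PASSWORD", "3tld3v"),
        "uri": environ.get("WIKIMEDIA_DB_HOST", "tokern-demo-wikimedia"),
        "port": environ.get("WIKIMEDIA_DB_PORT", "5432"),
        "database": environ.get("WIKIMEDIA_DB_NAME", "wikimedia"),
    }
```

The returned dictionary has the same keys as `wikimedia_db`, so it can be unpacked into `catalog.add_source` in the same way.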

# Register wikimedia datawarehouse with data-lineage app.

source = catalog.add_source(name="wikimedia", source_type="postgresql", **wikimedia_db)

# Scan the wikimedia data warehouse and register all schemata, tables and columns.

catalog.scan_source(source)

# Read queries from a JSON file. Note that this is for demo purposes only.

import json

with open("queries.json", "r") as file:
    queries = json.load(file)

from datetime import datetime
from data_lineage import Analyze

analyze = Analyze(docker_address)

# Parse each query and store its lineage. The current time stands in for
# the query's start and end times.
for query in queries:
    print(query)
    analyze.analyze(**query, source=source, start_time=datetime.now(), end_time=datetime.now())
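
Because each entry is unpacked with `**query`, every entry in the file must be a JSON object whose keys match keyword arguments that `analyze.analyze` accepts. The sketch below checks that shape before any query is sent. It assumes each entry carries a `name` and a `query` key. That key set is an assumption about the demo file, and `validate_queries` and `load_queries` are hypothetical helpers, not part of data-lineage.

```python
import json

# Assumption: every entry has these two keys. Adjust to match your own file.
REQUIRED_KEYS = ("name", "query")


def validate_queries(queries, required_keys=REQUIRED_KEYS):
    """Return the queries unchanged, or raise ValueError on a malformed entry."""
    if not isinstance(queries, list):
        raise ValueError("expected a JSON array of query objects")
    for index, entry in enumerate(queries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [key for key in required_keys if key not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing keys: {missing}")
    return queries


def load_queries(path):
    """Read a JSON file and validate its contents before use."""
    with open(path, "r") as file:
        return validate_queries(json.load(file))
```

With this in place, `queries = load_queries("queries.json")` fails early with a readable message instead of raising a `TypeError` partway through the loop.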

Visit Tokern UI

One Task Graph