Export to Datahub or Amundsen
Overview
Metadata stored in the Tokern Catalog, especially PII information and column-level lineage, can be exported to Datahub or Amundsen.
Datahub
dbcat provides a Datahub source plugin, CatalogSource (dbcat.datahub.CatalogSource). The plugin is configured in a Datahub ingestion recipe.
CatalogSource accepts the following configuration:
path: Path to the SQLite database
user: User name of the role in the Postgres Catalog
password: Password of the role in the Postgres Catalog
host: Host name of the Postgres Catalog
db: Database name of the Postgres Catalog
port: Port number of the Postgres Catalog
secret: Secret key used to encrypt passwords and tokens in the Catalog
source_names: List of sources to export
include_schema_regex: List of regular expressions that specify which schemata to include. Refer to include_exclude_lists.
exclude_schema_regex: List of regular expressions that specify which schemata to exclude. Refer to include_exclude_lists.
include_table_regex: List of regular expressions that specify which tables to include. Refer to include_exclude_lists.
exclude_table_regex: List of regular expressions that specify which tables to exclude. Refer to include_exclude_lists.
include_source_name: True/False. Specifies whether table names include the source, in the format source.schema.table. Useful when there are multiple databases.
env: Datahub environment of the exported datasets. Default is PROD.
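To make the include/exclude and include_source_name options more concrete, here is a minimal sketch of how such filters could behave. The matching rules are an assumption, not dbcat's code: it assumes an empty include list keeps every name, each pattern must match the whole name, exclusions win over inclusions, and names without the source take the form schema.table. The helpers is_selected and qualified_name are hypothetical; check include_exclude_lists for the actual rules.

```python
import re
from typing import List, Optional


def is_selected(name: str,
                include: Optional[List[str]] = None,
                exclude: Optional[List[str]] = None) -> bool:
    """Return True if a schema or table name passes the include/exclude lists.

    Assumed semantics (not taken from dbcat): a missing or empty include list
    keeps everything, and each pattern must match the whole name.
    """
    if include and not any(re.fullmatch(p, name) for p in include):
        return False
    if exclude and any(re.fullmatch(p, name) for p in exclude):
        return False
    return True


def qualified_name(source: str, schema: str, table: str,
                   include_source_name: bool) -> str:
    """Build a table name, assuming schema.table when the source is omitted."""
    parts = [source, schema, table] if include_source_name else [schema, table]
    return ".".join(parts)


schemata = ["events", "events_staging", "public"]
kept = [s for s in schemata
        if is_selected(s, include=[r"events.*"], exclude=[r".*_staging"])]
print(kept)  # ['events']
print(qualified_name("redshift_prod", "events", "clicks", True))  # redshift_prod.events.clicks
```

With include_source_name set, two databases that both contain events.clicks stay distinguishable, which is why the option helps when exporting several sources.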
Installation
# Install required libraries in a virtualenv
pip install dbcat[datahub]
# Create an ingestion recipe (see below)
# Run recipe
datahub ingest -c contrib/datahub/export.yml
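A recipe can also be run from Python instead of the datahub CLI. The sketch below builds the recipe as a dictionary and hands it to DataHub's Pipeline API (datahub.ingestion.run.pipeline.Pipeline). It assumes dbcat[datahub] is installed; the SQLite path catalog.db is a placeholder, and the helper names build_recipe and run_recipe are hypothetical.

```python
from typing import Any, Dict


def build_recipe(path: str) -> Dict[str, Any]:
    """Recipe dict: CatalogSource reading a SQLite catalog, printed to the console."""
    return {
        "source": {
            "type": "dbcat.datahub.CatalogSource",
            "config": {"path": path},
        },
        "sink": {"type": "console"},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe with DataHub's Pipeline API."""
    # Imported lazily so build_recipe() works where DataHub is not installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the source or sink reported failures
```

Calling run_recipe(build_recipe("catalog.db")) should behave like running datahub ingest on an equivalent YAML recipe.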
Example Recipes
Basic Recipe
The following recipe sets up CatalogSource with the default configuration and sends the output to the console sink:
source:
  type: dbcat.datahub.CatalogSource
sink:
  type: "console"
Postgres Catalog, specific sources and an included schema
source:
  type: dbcat.datahub.CatalogSource
  config:
    user: tokern
    password: passw0rd
    host: postgres
    database: tdb
    secret: my_secret_password
    source_names:
       - redshift_prod
       - bq_analysis
    include_schema:
       - events
sink:
  type: "console"
To configure other sinks, refer to the Datahub metadata ingestion documentation.
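As an illustration of switching sinks, the sketch below replaces the console sink with DataHub's datahub-rest sink, which pushes metadata to a DataHub GMS server. The server URL http://localhost:8080 is a placeholder for your own deployment, and the helper with_rest_sink is hypothetical; see the Datahub sink documentation for further options such as authentication tokens.

```python
import copy
from typing import Any, Dict


def with_rest_sink(recipe: Dict[str, Any], server: str) -> Dict[str, Any]:
    """Return a copy of the recipe whose sink pushes metadata to a DataHub GMS server."""
    updated = copy.deepcopy(recipe)  # leave the original recipe untouched
    updated["sink"] = {"type": "datahub-rest", "config": {"server": server}}
    return updated


console_recipe = {
    "source": {
        "type": "dbcat.datahub.CatalogSource",
        "config": {"source_names": ["redshift_prod", "bq_analysis"]},
    },
    "sink": {"type": "console"},
}
rest_recipe = with_rest_sink(console_recipe, "http://localhost:8080")
```

Only the sink section changes; the CatalogSource configuration is the same whichever sink is used.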
Amundsen
dbcat provides a CatalogExtractor that extracts metadata from the catalog. The extractor can be used in an Amundsen metadata ingestion (databuilder) pipeline.
CatalogExtractor accepts the following configuration:
catalog_config: Dictionary of connection parameters, as described in catalog configuration
source_names: List of sources to export
include_schema_regex: List of regular expressions that specify which schemata to include. Refer to include_exclude_lists.
exclude_schema_regex: List of regular expressions that specify which schemata to exclude. Refer to include_exclude_lists.
include_table_regex: List of regular expressions that specify which tables to include. Refer to include_exclude_lists.
exclude_table_regex: List of regular expressions that specify which tables to exclude. Refer to include_exclude_lists.
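Amundsen databuilder jobs read extractor options from flat keys prefixed with the extractor's scope. The sketch below builds such keys for the options listed above. The scope string extractor.catalog is hypothetical: in a real job, use whatever CatalogExtractor's get_scope() returns. The catalog path catalog.db is a placeholder, and the helper scoped_conf is hypothetical.

```python
from typing import Any, Dict

# Hypothetical scope; in a real job use the value returned by
# CatalogExtractor().get_scope(), which is not documented here.
EXTRACTOR_SCOPE = "extractor.catalog"


def scoped_conf(scope: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Prefix each option with the extractor scope, skipping unset (None) values."""
    return {f"{scope}.{key}": value
            for key, value in options.items() if value is not None}


conf = scoped_conf(EXTRACTOR_SCOPE, {
    "catalog_config": {"path": "catalog.db"},  # placeholder SQLite catalog path
    "source_names": ["redshift_prod", "bq_analysis"],
    "include_schema_regex": [r"events.*"],
    "exclude_table_regex": None,  # left unset, so it is dropped
})
```

In a databuilder job, a dictionary like this is typically passed to pyhocon's ConfigFactory.from_dict(...) together with the loader and publisher settings.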
Check out an example loader in the GitHub project.