
Plugin Development

PIICatcher supports two types of scanning techniques:

  • Metadata
  • Data

Plugins can be created for either of these two techniques. Plugins are then registered using an API or using Python Entry Points.

Create a new detector

To create a new detector, simply create a new class that inherits from MetadataDetector or DatumDetector.

In the new class, define a method detect that returns a PiiType, or None when nothing is detected. If you are detecting a new PII type, define a new class that inherits from PiiType.

The example below finds Indian PAN (Permanent Account Number) values in data and in column metadata.

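import re
from typing import Optional

# Import paths below are assumed from recent piicatcher and dbcat releases;
# adjust them to match your installed versions.
from dbcat.catalog.models import CatColumn
from dbcat.catalog.pii_types import PiiType
from piicatcher.detectors import DatumDetector, MetadataDetector, register_detector


# Define a new PII Type
class PAN(PiiType):
    pass


@register_detector
class ColumnNamePanDetector(MetadataDetector):
    regex = re.compile("pan", re.IGNORECASE)
    name = "ColumnNamePanDetector"

    def detect(self, column: CatColumn) -> Optional[PiiType]:
        # regex is a class attribute, so it must be accessed through self
        if self.regex.match(column.name) is not None:
            return PAN()

        return None


@register_detector
class DatumPanDetector(DatumDetector):
    regex = re.compile("[A-Z]{5}[0-9]{4}[A-Z]{1}")
    # Each detector needs its own name
    name = "DatumPanDetector"

    def detect(self, column: CatColumn, datum: str) -> Optional[PiiType]:
        if self.regex.match(datum) is not None:
            return PAN()

        return None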
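Both detectors call re.match, which anchors the pattern at the start of the string only. The snippet below is a standalone sketch that uses just the standard library (not PIICatcher) to show which column names and values the two patterns accept. The helper names and sample strings are made up for illustration.

```python
import re

# Same patterns as in the detectors above.
COLUMN_RE = re.compile("pan", re.IGNORECASE)
DATUM_RE = re.compile("[A-Z]{5}[0-9]{4}[A-Z]{1}")


def column_matches(name: str) -> bool:
    """True when the column name starts with 'pan' (case-insensitive)."""
    return COLUMN_RE.match(name) is not None


def datum_matches(value: str) -> bool:
    """True when the value starts with a PAN-shaped token."""
    return DATUM_RE.match(value) is not None


def datum_is_exact_pan(value: str) -> bool:
    """Stricter variant: the whole value must be PAN-shaped."""
    return DATUM_RE.fullmatch(value) is not None


print(column_matches("PAN_NUMBER"))          # True
print(column_matches("customer_pan"))        # False: match() only checks the start
print(datum_matches("ABCDE1234F"))           # True
print(datum_matches("ABCDE1234F-old"))       # True: trailing text is ignored
print(datum_is_exact_pan("ABCDE1234F-old"))  # False
```

If column names such as customer_pan should also be caught, re.search is the usual alternative to re.match. If values must be exactly PAN-shaped, fullmatch avoids false positives on longer strings.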

Registration

To ensure that PIICatcher knows about your detector, apply the @register_detector decorator to the detector class, or call register_detector on the class before invoking a scan. This function adds the detector to a catalogue. PIICatcher iterates through all detectors in the catalogue and uses the output of the first detector that returns a PII type.

Entry Points

If you are running PIICatcher through the command line, or you have created a separate package for your detector, the module that applies the register_detector decorator is never imported, so the decorator does not run. In this case, use entry points to register detectors. On start-up, PIICatcher finds all entry points registered under the piicatcher_detectors group and loads these detectors automatically. For example, in setup.cfg:

[options.entry_points]
piicatcher_detectors =
    pan_metadata = pan_plugin.detectors:ColumnNamePanDetector
    pan_datum = pan_plugin.detectors:DatumPanDetector

More information about entry points can be found in the setuptools documentation.
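To check that a package's entry points are visible after installation, the group can be inspected with the standard library. The snippet below is a minimal sketch of entry-point discovery using importlib.metadata. It is not PIICatcher's own loader, and discover_detectors is a hypothetical helper name.

```python
from importlib.metadata import entry_points
from typing import Any, Dict


def discover_detectors(group: str = "piicatcher_detectors") -> Dict[str, Any]:
    """Load every object advertised under the given entry point group.

    Returns a mapping of entry point name -> loaded object.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select()
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the referenced object
    return {ep.name: ep.load() for ep in selected}


if __name__ == "__main__":
    for name, obj in discover_detectors().items():
        print(name, obj)
```

If the package from the example above were installed, the mapping would be expected to contain the keys pan_metadata and pan_datum. An empty result usually means the package is not installed in the active environment or the group name is misspelled.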

Spacy Plugin

Tokern provides a spaCy detector as a plugin. Install it with the following command:

pip install piicatcher_spacy

The plugin is registered automatically because the entry point is set in pyproject.toml.

[tool.poetry.plugins."piicatcher_detectors"]
spacy = "piicatcher_spacy.detectors:SpacyDetector"