Plugin Development
PIICatcher supports two types of scanning techniques:
- Metadata
- Data
Plugins can be created for either of these two techniques. Plugins are then registered using an API or using Python Entry Points.
Create a new detector
To create a new detector, create a new class that inherits from MetadataDetector or DatumDetector.
In the new class, define a function detect that returns a PIIType, or None when nothing is detected.
If you are detecting a new PII type, you can define a new class that inherits from PIIType.
The example below finds India PAN numbers in data and column metadata.
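import re
from typing import Optional

# MetadataDetector, DatumDetector, register_detector, PIIType and CatColumn
# must also be imported; their module paths depend on the installed version.

# Define a new PII Type
class PAN(PIIType):
    pass

# Flags columns whose name starts with "pan" (case-insensitive)
@register_detector
class ColumnNamePanDetector(MetadataDetector):
    regex = re.compile("pan", re.IGNORECASE)
    name = "ColumnNamePanDetector"

    def detect(self, column: CatColumn) -> Optional[PIIType]:
        if self.regex.match(column.name) is not None:
            return PAN()
        return None

# Flags values that start with the PAN pattern: 5 letters, 4 digits, 1 letter
@register_detector
class DatumPanDetector(DatumDetector):
    regex = re.compile("[A-Z]{5}[0-9]{4}[A-Z]{1}")
    name = "DatumPanDetector"

    def detect(self, column: CatColumn, datum: str) -> Optional[PIIType]:
        if self.regex.match(datum) is not None:
            return PAN()
        return None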
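The two regular expressions in the example can be tried on their own, without PIICatcher installed. The sketch below uses made-up sample values and hypothetical variable names (column_regex, datum_regex). It shows that re.match only anchors at the start of the string, so the column pattern also accepts names such as "panel_id", and the datum pattern accepts a valid prefix followed by extra characters.

```python
import re

# Same patterns as in the detectors above; variable names are illustrative.
column_regex = re.compile("pan", re.IGNORECASE)
datum_regex = re.compile("[A-Z]{5}[0-9]{4}[A-Z]{1}")


def matches_start(pattern: re.Pattern, text: str) -> bool:
    """True if the pattern matches at the beginning of text (re.match semantics)."""
    return pattern.match(text) is not None


def matches_whole(pattern: re.Pattern, text: str) -> bool:
    """True only if the pattern matches the entire text (re.fullmatch semantics)."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Made-up column names and values, for illustration only.
    for name in ["PAN_NUMBER", "customer_pan", "panel_id"]:
        print(name, matches_start(column_regex, name))
    for value in ["ABCDE1234F", "abcde1234f", "ABCDE1234FGH"]:
        print(value, matches_start(datum_regex, value), matches_whole(datum_regex, value))
```

If stricter matching is wanted, fullmatch (or explicit anchors in the pattern) is one option; whether that suits a given dataset is a judgment call.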
Registration
To ensure that PIICatcher knows about your detector, apply the @register_detector decorator to the class, or call register_detector on the class as a function, before invoking a scan. This function adds the detector to a catalogue. PIICatcher iterates through all detectors in the catalogue and chooses the output of the first detector that returns a PII type.
Entry Points
If you are running PIICatcher through the command line, or you have created a new package for your detector, then there is no opportunity to run the register_detector decorator. In this case, you should use entry points to register detectors. On start-up, PIICatcher finds all entry points registered as piicatcher_detectors and automatically loads these detectors. For example, in setup.cfg:
[options.entry_points]
piicatcher_detectors =
    pan_metadata = pan_plugin.detectors:ColumnNamePanDetector
    pan_datum = pan_plugin.detectors:DatumPanDetector
More information about entry points can be found in the setuptools documentation.
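To see what entry point discovery involves, here is a minimal sketch built on the standard library's importlib.metadata. It is not PIICatcher's actual loading code. The helper name load_detectors is hypothetical, and the only detail taken from the text above is the group name piicatcher_detectors.

```python
from importlib.metadata import entry_points
from typing import List


def load_detectors(group: str = "piicatcher_detectors") -> List[object]:
    """Import and return every object advertised under the given entry point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the referenced object,
    # e.g. the class named after the colon in "module.path:ClassName".
    return [ep.load() for ep in selected]


if __name__ == "__main__":
    for detector in load_detectors():
        print(detector)
```

When no installed package declares the group, the function returns an empty list, so running it is a quick way to check whether a plugin's entry points are visible in the current environment.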
Spacy Plugin
Tokern provides spaCy as a plugin. Use the following command to install the spaCy plugin:
pip install piicatcher_spacy
The plugin is registered automatically because the entry point is set in pyproject.toml.
[tool.poetry.plugins."piicatcher_detectors"]
spacy = "piicatcher_spacy.detectors:SpacyDetector"