medkit.text.ner.nlstruct_entity_matcher

This module requires extra dependencies that are not installed with the core medkit package. To install them, run `pip install medkit-lib[nlstruct]`.

Classes:

NLStructEntityMatcher(model_name_or_dirpath)

Entity matcher based on an NLstruct InformationExtraction model.

class NLStructEntityMatcher(model_name_or_dirpath, attrs_to_copy=None, device=-1, hf_auth_token=None, cache_dir=None, name=None, uid=None)

Entity matcher based on an NLstruct InformationExtraction model. The matcher expects a directory containing a PyTorch checkpoint and, if the model was pretrained using word embeddings, a text file with those embeddings.
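Before pointing the matcher at a local directory, it can help to check that the expected files are present. The helper below is a minimal sketch, not part of medkit: the name `describe_model_dir` is hypothetical, and the checkpoint extensions (`.ckpt`, `.pt`) are an assumption based on the parameter description of this class.

```python
from pathlib import Path

# Assumed checkpoint extensions; this is not the library's own validation rule.
CHECKPOINT_SUFFIXES = {".ckpt", ".pt"}


def describe_model_dir(dirpath):
    """Return (checkpoint_files, embedding_files) found directly in dirpath."""
    dirpath = Path(dirpath)
    files = sorted(p for p in dirpath.iterdir() if p.is_file())
    # Candidate PyTorch checkpoints
    checkpoints = [p for p in files if p.suffix in CHECKPOINT_SUFFIXES]
    # Candidate word-embedding files (only needed by some models)
    embeddings = [p for p in files if p.suffix == ".txt"]
    return checkpoints, embeddings
```

An empty checkpoint list suggests the directory is not a usable model directory. An empty embeddings list is only a problem if the model was pretrained with word embeddings.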

The paper [1] presents a model trained with the NLstruct [2] library and the mimic learning approach. In this approach, a private teacher model annotated the unlabeled [CAS clinical French corpus](https://aclanthology.org/W18-5614/), and the resulting annotations were used to train a student model. The weights of the CAS student model are shared on the HuggingFace Hub. To load them, pass the model name NesrineBannour/CAS-privacy-preserving-model when creating an NLStructEntityMatcher.
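Loading the shared CAS model could look roughly like the sketch below. The wrapper function `load_cas_matcher` is hypothetical. The constructor arguments are those listed in the class signature. The first call is assumed to download the weights from the Hub, so it needs network access and the optional nlstruct extra.

```python
MODEL_NAME = "NesrineBannour/CAS-privacy-preserving-model"


def load_cas_matcher(device=-1, cache_dir=None):
    """Create a matcher from the CAS student model (hypothetical helper)."""
    # Imported lazily: the module is only importable once the optional
    # nlstruct extra has been installed.
    from medkit.text.ner.nlstruct_entity_matcher import NLStructEntityMatcher

    return NLStructEntityMatcher(
        model_name_or_dirpath=MODEL_NAME,
        device=device,        # -1 runs on CPU, 0 on the first GPU
        cache_dir=cache_dir,  # None falls back to the default HuggingFace cache
    )
```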

References

Parameters:
  • model_name_or_dirpath (str or Path) – Name (on the HuggingFace models hub) or directory path of the NLstruct model. The model directory must contain a PyTorch checkpoint file ('.ckpt' or '.pt') and, if required, a text file ('.txt') containing the FastText embeddings.

  • attrs_to_copy (list of str, optional) – Labels of the attributes that should be copied from the input segment to the created entity. Useful for propagating context attributes (negation, antecedent, etc.).

  • device (int, default=-1) – Device to use for the NLstruct model. Follows the HuggingFace convention (-1 for “cpu” and the device number for a GPU, for instance 0 for “cuda:0”).

  • hf_auth_token (str, optional) – HuggingFace authentication token (to access private models on the hub).

  • cache_dir (str or Path, optional) – Directory where to store downloaded models. If not set, the default HuggingFace cache dir is used.

  • name (str, optional) – Name describing the matcher (defaults to the class name).

  • uid (str, optional) – Identifier of the matcher.
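The integer `device` convention described above can be made explicit with a small mapping. This is an illustrative sketch only: `device_label` is a hypothetical helper, not a medkit function, and treating every negative value as CPU is an assumption (the documentation only mentions -1).

```python
def device_label(device):
    """Map an integer device index to a torch-style device string."""
    # Assumption: any negative index means CPU (only -1 is documented).
    if device < 0:
        return "cpu"
    # Non-negative indices select the corresponding CUDA device.
    return f"cuda:{device}"
```

For instance, `device_label(-1)` gives `"cpu"` and `device_label(0)` gives `"cuda:0"`.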

Methods:

run(segments)

Return entities for each match in segments.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Return entities for each match in segments.

Parameters:

segments (list of Segment) – List of segments in which to look for matches.

Return type:

list[Entity]

Returns:

list of Entity – Entities found in segments.
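A typical call to `run` might be sketched as follows. The helper names `summarize_entities` and `demo` are hypothetical, and the French sentence is made-up illustration data. The sketch assumes that entities expose `label` and `text`, and that a segment can be built with `Segment(label=..., text=..., spans=[Span(start, end)])` from `medkit.core.text`.

```python
def summarize_entities(matcher, segments):
    """Run the matcher and return (label, text) pairs for the entities found."""
    entities = matcher.run(segments)
    return [(entity.label, entity.text) for entity in entities]


def demo(matcher):
    """Build one segment from raw text and run the matcher on it."""
    # Imported lazily so that this snippet loads even without medkit installed.
    from medkit.core.text import Segment, Span

    text = "Le patient présente une fièvre depuis trois jours."
    # One segment covering the whole text
    segment = Segment(label="sentence", text=text, spans=[Span(0, len(text))])
    return summarize_entities(matcher, [segment])
```

Which labels come back depends entirely on the loaded model, so no output is shown here.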

property description: OperationDescription

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer that records the provenance of the entities created by this matcher.
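Provenance tracing could be wired up roughly as in the sketch below. The function `run_with_provenance` is hypothetical. It assumes the tracer offers a `get_prov(uid)` lookup, as medkit's `ProvTracer` is understood to, and that each entity has a `uid`.

```python
def run_with_provenance(matcher, segments, prov_tracer):
    """Run the matcher with tracing enabled; return provenance keyed by entity uid."""
    # The tracer must be attached before run() so that created entities are recorded.
    matcher.set_prov_tracer(prov_tracer)
    entities = matcher.run(segments)
    # Look up the provenance record of each entity by its identifier.
    return {entity.uid: prov_tracer.get_prov(entity.uid) for entity in entities}
```

With medkit installed, `prov_tracer` would typically be a `ProvTracer()` instance imported from `medkit.core`. Each returned record is then expected to describe the operation that created the entity and the segment it was derived from.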