:py:mod:`medkit.text.ner.nlstruct_entity_matcher`
=================================================

.. py:module:: medkit.text.ner.nlstruct_entity_matcher


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medkit.text.ner.nlstruct_entity_matcher.NLStructEntityMatcher




.. py:class:: NLStructEntityMatcher(model_name_or_dirpath: str | pathlib.Path, attrs_to_copy: list[str] | None = None, device: int = -1, hf_auth_token: str | None = None, cache_dir: str | pathlib.Path | None = None, name: str | None = None, uid: str | None = None)


   Bases: :py:obj:`medkit.core.text.NEROperation`

   
   Entity matcher based on an NLstruct ``InformationExtraction`` model.

   When loading from a local directory, the matcher expects it to contain a PyTorch
   checkpoint and, if the model was pretrained using word embeddings, a text file
   holding those embeddings.
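
   As a rough illustration of that directory layout, the sketch below looks for a
   checkpoint and an optional FastText text file in a local directory. The helper
   ``find_model_files`` and the suffixes it accepts are assumptions made for this
   example; it is not medkit's own loading code, and resolving a HuggingFace Hub name
   (downloading the model) is not shown.

   .. code-block:: python

      from pathlib import Path
      from typing import Optional, Tuple

      # Suffixes assumed to identify a PyTorch checkpoint (illustrative choice).
      CHECKPOINT_SUFFIXES = (".ckpt", ".pt")


      def find_model_files(dirpath: Path) -> Tuple[Path, Optional[Path]]:
          """Hypothetical helper returning (checkpoint, embeddings_txt_or_None)."""
          checkpoint: Optional[Path] = None
          embeddings: Optional[Path] = None
          # Sort entries so the result is deterministic when several files match.
          for path in sorted(dirpath.iterdir()):
              if path.suffix in CHECKPOINT_SUFFIXES and checkpoint is None:
                  checkpoint = path
              elif path.suffix == ".txt" and embeddings is None:
                  embeddings = path
          if checkpoint is None:
              raise FileNotFoundError(f"no checkpoint {CHECKPOINT_SUFFIXES} found in {dirpath}")
          # The text file is optional: only models pretrained with word embeddings need it.
          return checkpoint, embeddings
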

   The paper [R9e7c7744ea89-1]_ presents a model trained with the NLstruct [R9e7c7744ea89-2]_
   library using the mimic learning approach: a private teacher model was used to annotate the
   unlabeled `CAS clinical French corpus <https://aclanthology.org/W18-5614/>`_, and a student
   model was trained on these annotations. The weights of this CAS student model are shared on
   the HuggingFace Hub; to use it, create an ``NLStructEntityMatcher`` with the model name
   ``NesrineBannour/CAS-privacy-preserving-model``.

   :Parameters:

       **model_name_or_dirpath** : str or Path
           Name (on the HuggingFace models hub) or dirpath of the NLstruct model.
           A local model directory must contain a PyTorch checkpoint (``.ckpt`` or ``.pt``)
           and, if the model requires them, the FastText embeddings as a text file (``.txt``).

       **attrs_to_copy** : list of str, optional
           Labels of the attributes that should be copied from the input segment
           to the created entity. Useful for propagating context attributes
           (negation, antecedent, etc.).

       **device** : int, default=-1
           Device to use for the NLstruct model. Follows the HuggingFace convention
           (-1 for "cpu" and device number for gpu, for instance 0 for "cuda:0").

       **hf_auth_token** : str, optional
           HuggingFace authentication token, needed to access private models on the
           hub.

       **cache_dir** : str or Path, optional
           Directory where to store downloaded models. If not set, the default
           HuggingFace cache dir is used.

       **name** : str, optional
           Name describing the matcher (defaults to the class name).

       **uid** : str, optional
           Identifier of the matcher.
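
   The ``device`` convention can be read as the small mapping below. ``to_torch_device``
   is a hypothetical helper written for illustration, not part of medkit; it only builds
   the PyTorch device string and does not check that the requested GPU exists.

   .. code-block:: python

      def to_torch_device(device: int) -> str:
          """Map a HuggingFace-style device index to a PyTorch device string."""
          if device == -1:
              return "cpu"
          if device < -1:
              raise ValueError(f"invalid device index: {device}")
          # Non-negative integers select a CUDA device by its index.
          return f"cuda:{device}"


      print(to_torch_device(-1))  # cpu
      print(to_torch_device(0))   # cuda:0
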










   .. rubric:: References

   .. [R9e7c7744ea89-1] Nesrine Bannour, Perceval Wajsbürt, Bastien Rance, Xavier Tannier, and Aurélie Névéol. 2022.
           Privacy-preserving mimic models for clinical named entity recognition in French.
           Journal of Biomedical Informatics 130 (2022), 104073.
           DOI: https://doi.org/10.1016/j.jbi.2022.104073
   .. [R9e7c7744ea89-2] Perceval Wajsbürt. 2021. Extraction and normalization of simple and structured entities in medical documents.
           PhD thesis, Sorbonne Université. Retrieved from https://hal.archives-ouvertes.fr/tel-03624928

   .. only:: latex

      [R9e7c7744ea89-1]_, [R9e7c7744ea89-2]_




   ..
       !! processed by numpydoc !!
   .. py:method:: _load_from_checkpoint_dir(checkpoint_dir: pathlib.Path, device)
      :staticmethod:

      
      Locate the checkpoint in ``checkpoint_dir`` and fix the path of the FastText file in the model configuration.

      Return the NLstruct model created with the modified configuration.
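
      A minimal sketch of the path fix, assuming the configuration is a nested dictionary
      in which the FastText file is referenced by a string ending in ``.txt`` (for example
      an absolute path recorded at training time). The configuration layout and the helper
      ``relocate_txt_paths`` are hypothetical; NLstruct's actual configuration format may differ.

      .. code-block:: python

         from pathlib import Path
         from typing import Any


         def relocate_txt_paths(config: Any, checkpoint_dir: Path) -> Any:
             """Return a copy of `config` where every '.txt' path points into checkpoint_dir.

             Only the file name of the original path is kept. Note that any other
             string ending in '.txt' would be rewritten as well.
             """
             if isinstance(config, dict):
                 return {key: relocate_txt_paths(value, checkpoint_dir) for key, value in config.items()}
             if isinstance(config, list):
                 return [relocate_txt_paths(item, checkpoint_dir) for item in config]
             if isinstance(config, str) and config.endswith(".txt"):
                 return str(checkpoint_dir / Path(config).name)
             # Numbers, booleans and other strings are left unchanged.
             return config


         config = {"encoder": {"embeddings": {"path": "/train/server/fasttext.txt", "dim": 300}}}
         fixed = relocate_txt_paths(config, Path("/models/cas"))

      Building a new structure instead of mutating ``config`` in place keeps the original
      configuration intact, which makes the transformation easy to inspect.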















      ..
          !! processed by numpydoc !!

   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Entity]

      
      Return entities for each match in `segments`.


      :Parameters:

          **segments** : list of Segment
              List of segments into which to look for matches.

      :Returns:

          list of Entity
              Entities found in `segments`.
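
      The sketch below illustrates the behaviour described for ``run`` using simplified
      dataclasses as stand-ins for medkit's ``Segment`` and ``Entity``, a hypothetical match
      format (``label``, ``begin``, ``end`` relative to the segment text) and a hypothetical
      ``predict`` callable standing in for the NLstruct model. It also shows how the labels
      in ``attrs_to_copy`` could be propagated. None of these names are medkit's actual API.

      .. code-block:: python

         from dataclasses import dataclass, field
         from typing import Callable, Dict, List


         @dataclass
         class Segment:  # simplified stand-in for medkit.core.text.Segment
             text: str
             start: int  # offset of the segment in the full document
             attrs: Dict[str, object] = field(default_factory=dict)


         @dataclass
         class Entity:  # simplified stand-in for medkit.core.text.Entity
             label: str
             text: str
             start: int
             end: int
             attrs: Dict[str, object] = field(default_factory=dict)


         def matches_to_entities(matches: List[dict], segment: Segment, attrs_to_copy: List[str]) -> List[Entity]:
             """Turn segment-relative matches into entities with document offsets."""
             entities = []
             for match in matches:
                 begin, end = match["begin"], match["end"]
                 # Copy only the requested context attributes present on the segment.
                 copied = {label: segment.attrs[label] for label in attrs_to_copy if label in segment.attrs}
                 entities.append(
                     Entity(
                         label=match["label"],
                         text=segment.text[begin:end],
                         start=segment.start + begin,  # shift to document offsets
                         end=segment.start + end,
                         attrs=copied,
                     )
                 )
             return entities


         def run(segments: List[Segment], predict: Callable[[str], List[dict]], attrs_to_copy: List[str]) -> List[Entity]:
             # Run the (hypothetical) model on each segment and collect all entities.
             return [
                 entity
                 for segment in segments
                 for entity in matches_to_entities(predict(segment.text), segment, attrs_to_copy)
             ]


         segment = Segment(text="Pas de fièvre.", start=10, attrs={"negation": True, "family": False})
         entities = run([segment], lambda text: [{"label": "symptom", "begin": 7, "end": 13}], attrs_to_copy=["negation"])
         print(entities[0].text, entities[0].start, entities[0].end)  # fièvre 17 23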













      ..
          !! processed by numpydoc !!

   .. py:method:: _matches_to_entities(matches: list[dict], segment: medkit.core.text.Segment) -> Iterator[medkit.core.text.Entity]



