medkit.text.ner.quick_umls_matcher
==================================

.. py:module:: medkit.text.ner.quick_umls_matcher


Classes
-------

.. autoapisummary::

   medkit.text.ner.quick_umls_matcher.QuickUMLSMatcher


Module Contents
---------------

.. py:class:: QuickUMLSMatcher(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False, overlapping: typing_extensions.Literal[length, score] = 'length', threshold: float = 0.9, window: int = 5, similarity: typing_extensions.Literal[dice, jaccard, cosine, overlap] = 'jaccard', accepted_semtypes: list[str] = quickumls.constants.ACCEPTED_SEMTYPES, attrs_to_copy: list[str] | None = None, output_label: str | dict[str, str] | None = None, name: str | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.text.NEROperation`


   
   Entity annotator relying on QuickUMLS.

   This annotator requires a QuickUMLS installation created
   with `python -m quickumls.install`, using the UMLS release and the flags
   that correspond to the params `version`, `language`, `lowercase` and
   `normalize_unicode` passed at init. Each QuickUMLS installation must be
   registered with the `add_install` class method.

   For instance, to use `QuickUMLSMatcher` with a French
   lowercase QuickUMLS install based on UMLS version 2021AB,
   we must first create this installation with::

       python -m quickumls.install --language FRE --lowercase /path/to/umls/2021AB/data /path/to/quick/umls/install

   then register this install with:

   >>> QuickUMLSMatcher.add_install(
   ...     "/path/to/quick/umls/install",
   ...     version="2021AB",
   ...     language="FRE",
   ...     lowercase=True,
   ... )

   and finally instantiate the matcher with:

   >>> matcher = QuickUMLSMatcher(
   ...     version="2021AB",
   ...     language="FRE",
   ...     lowercase=True,
   ... )

   This mechanism makes it possible to record in the OperationDescription
   how the QuickUMLS install was created, and to re-instantiate the same matcher
   in a different environment if an equivalent install is available there.
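
   The registration mechanism can be pictured as a lookup table keyed by
   install settings. The sketch below is illustrative only: `InstallSettings`,
   `install_paths` and the two module-level functions are hypothetical
   stand-ins written for this example, not medkit's actual implementation.

   .. code-block:: python

       import os
       from typing import NamedTuple


       class InstallSettings(NamedTuple):
           """Hypothetical key describing how a QuickUMLS install was built."""

           version: str
           language: str
           lowercase: bool = False
           normalize_unicode: bool = False


       # Hypothetical registry: install settings -> path of the matching install.
       install_paths: dict[InstallSettings, str] = {}


       def add_install(path, version, language, lowercase=False, normalize_unicode=False):
           """Record where the install built with these settings lives."""
           key = InstallSettings(version, language, lowercase, normalize_unicode)
           install_paths[key] = os.fspath(path)


       def get_path_to_install(version, language, lowercase=False, normalize_unicode=False):
           """Return the registered path for these settings, or raise if none matches."""
           key = InstallSettings(version, language, lowercase, normalize_unicode)
           if key not in install_paths:
               raise LookupError(f"No QuickUMLS install registered for {key}")
           return install_paths[key]


       add_install("/path/to/quick/umls/install", version="2021AB", language="FRE", lowercase=True)
       path = get_path_to_install(version="2021AB", language="FRE", lowercase=True)

   In this sketch only the settings, not the path, identify an install, so
   the same settings can resolve to a different path in another environment.
   A lookup with settings that were never registered (for instance
   `lowercase=False` here) raises an error instead of silently falling back
   to another install.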


   .. py:attribute:: _install_paths
      :type:  ClassVar[dict[_QuickUMLSInstall, str]]


   .. py:method:: add_install(path: str | pathlib.Path, version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False)
      :classmethod:


      
      Register path and settings of a QuickUMLS installation.


      :Parameters:

          **path** : str or Path
              The path to the destination folder passed to the install command

          **version** : str
              The version of the UMLS database, for instance "2021AB"

          **language** : str
              The language flag passed to the install command, for instance "ENG"

          **lowercase** : bool, default=False
              Whether the --lowercase flag was passed to the install command
              (concepts are lowercased to increase recall)

          **normalize_unicode** : bool, default=False
              Whether the --normalize-unicode flag was passed to the install command
              (non-ASCII chars in concepts are converted to the closest ASCII chars)
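
      To make the correspondence between these parameters and the install
      command concrete, here is a minimal sketch. `build_install_command` is a
      hypothetical helper written for this example (it is not part of medkit
      or QuickUMLS), and it only uses the flags mentioned above. It assumes
      that `version` has no flag of its own and is determined by the UMLS data
      folder given to the install command.

      .. code-block:: python

          def build_install_command(umls_dir, destination, language,
                                    lowercase=False, normalize_unicode=False):
              """Hypothetical helper: assemble the install command for given settings."""
              command = ["python", "-m", "quickumls.install", "--language", language]
              if lowercase:
                  command.append("--lowercase")
              if normalize_unicode:
                  command.append("--normalize-unicode")
              # Positional arguments: UMLS data folder, then destination folder.
              command += [str(umls_dir), str(destination)]
              return command


          command = build_install_command(
              "/path/to/umls/2021AB/data",
              "/path/to/quick/umls/install",
              language="FRE",
              lowercase=True,
          )
          print(" ".join(command))

      The values of `language`, `lowercase` and `normalize_unicode` passed to
      `add_install` should be the ones used to build the install in the first
      place, and `path` should be the destination folder.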



   .. py:method:: clear_installs()
      :classmethod:


      
      Remove all QuickUMLS installations registered with `add_install`.



   .. py:method:: _get_path_to_install(version: str, language: str, lowercase: bool = False, normalize_unicode: bool = False) -> str
      :classmethod:


      
      Find a QuickUMLS install with corresponding settings.

      The QuickUMLS install must have been previously registered with `add_install`.



   .. py:attribute:: language


   .. py:attribute:: version


   .. py:attribute:: lowercase
      :value: False



   .. py:attribute:: normalize_unicode
      :value: False



   .. py:attribute:: overlapping
      :value: 'length'



   .. py:attribute:: threshold
      :value: 0.9



   .. py:attribute:: similarity
      :value: 'jaccard'



   .. py:attribute:: window
      :value: 5



   .. py:attribute:: accepted_semtypes


   .. py:attribute:: attrs_to_copy
      :value: None



   .. py:attribute:: _matcher


   .. py:attribute:: _semtype_to_semgroup
      :value: None



   .. py:attribute:: label_mapping


   .. py:method:: _get_label_mapping(output_label: None | str | dict[str, str]) -> dict[str, str]
      :staticmethod:


      
      Return label mapping according to `output_label`.



   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Entity]

      
      Return entities (with UMLS normalization attributes) for each match in `segments`.


      :Parameters:

          **segments** : list of Segment
              List of segments in which to look for matches



      :Returns:

          list of Entity
              Entities found in `segments`, with :class:`~UMLSNormAttribute` attributes.



   .. py:method:: _find_matches_in_segment(segment: medkit.core.text.Segment) -> Iterator[medkit.core.text.Entity]


