medkit.text.ner.umls_utils
==========================

.. py:module:: medkit.text.ner.umls_utils


Attributes
----------

.. autoapisummary::

   medkit.text.ner.umls_utils.SEMGROUP_LABELS
   medkit.text.ner.umls_utils.SEMGROUPS


Classes
-------

.. autoapisummary::

   medkit.text.ner.umls_utils.UMLSEntry


Functions
---------

.. autoapisummary::

   medkit.text.ner.umls_utils.load_umls_entries
   medkit.text.ner.umls_utils.preprocess_term_to_match
   medkit.text.ner.umls_utils.preprocess_acronym
   medkit.text.ner.umls_utils.guess_umls_version


Module Contents
---------------

.. py:data:: SEMGROUP_LABELS

   
   Labels corresponding to UMLS semantic groups (semgroups)
















   ..
       !! processed by numpydoc !!

.. py:data:: SEMGROUPS

   
   Valid UMLS semantic groups (semgroups)
















   ..
       !! processed by numpydoc !!

.. py:class:: UMLSEntry

   
   Entry in the MRCONSO.RRF file of a UMLS dictionary.














   :Attributes:

       **cui** : str
           Unique identifier of the concept designated by the term

       **term** : str
           Original version of the term

       **semtypes** : list of str, optional
           Semantic types of the concept (TUIs)

       **semgroups** : list of str, optional
           Semantic groups of the concept
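
   As a rough illustration of the fields listed above, the sketch below mirrors
   them in a standalone dataclass. ``EntrySketch`` is a hypothetical stand-in,
   not the library's class, and its ``to_dict()`` is only an assumption about
   what a dictionary form of an entry could look like. The identifier and term
   are made-up placeholders.

   .. code-block:: python

      from dataclasses import asdict, dataclass
      from typing import List, Optional


      @dataclass
      class EntrySketch:
          """Hypothetical stand-in with the same four fields as UMLSEntry."""

          cui: str
          term: str
          semtypes: Optional[List[str]] = None
          semgroups: Optional[List[str]] = None

          def to_dict(self):
              # Assumption: a plain dict with one key per field
              return asdict(self)


      # Placeholder values, not taken from a real UMLS release
      entry = EntrySketch(cui="C0000001", term="Example term")
      entry_dict = entry.to_dict()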


   ..
       !! processed by numpydoc !!

   .. py:attribute:: cui
      :type:  str


   .. py:attribute:: term
      :type:  str


   .. py:attribute:: semtypes
      :type:  list[str] | None
      :value: None



   .. py:attribute:: semgroups
      :type:  list[str] | None
      :value: None



   .. py:method:: to_dict()


.. py:function:: load_umls_entries(mrconso_file: str | pathlib.Path, mrsty_file: str | pathlib.Path | None = None, sources: list[str] | None = None, languages: list[str] | None = None, show_progress: bool = False) -> Iterator[UMLSEntry]

   
   Load all terms and associated CUIs found in a UMLS MRCONSO.RRF file.


   :Parameters:

       **mrconso_file** : str or Path
           Path to the UMLS MRCONSO.RRF file

       **mrsty_file** : str or Path, optional
           Path to the UMLS MRSTY.RRF file. If provided, semtypes info will be
           included in the entries returned.

       **sources** : list of str, optional
           Sources to consider (e.g. ICD10, CCS). If none are provided, CUIs and terms
           from all sources are taken into account.

       **languages** : list of str, optional
           Languages to consider. If none are provided, CUIs and terms in all languages
           are taken into account.

       **show_progress** : bool, default=False
           Whether to show a progress bar

   :Returns:

       iterator of UMLSEntry
           Iterator over all term entries found in the UMLS install
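
   To make the ``sources`` and ``languages`` filters concrete, here is a minimal
   standalone sketch that reads a file in the pipe-delimited MRCONSO.RRF layout.
   It is not this function's implementation: ``iter_mrconso_terms`` is a
   hypothetical helper, the column positions are assumed from the UMLS reference
   manual, and the two sample rows (identifiers, source names, terms) are made up.

   .. code-block:: python

      import tempfile
      from pathlib import Path

      # Assumed column positions in the pipe-delimited MRCONSO.RRF layout
      CUI_COL, LAT_COL, SAB_COL, STR_COL = 0, 1, 11, 14


      def iter_mrconso_terms(mrconso_file, sources=None, languages=None):
          """Hypothetical helper: lazily yield (cui, term) pairs, optionally filtered."""
          with open(mrconso_file, encoding="utf-8") as fp:
              for line in fp:
                  row = line.rstrip("\n").split("|")
                  # Skip rows whose source vocabulary was not requested
                  if sources is not None and row[SAB_COL] not in sources:
                      continue
                  # Skip rows whose language was not requested
                  if languages is not None and row[LAT_COL] not in languages:
                      continue
                  yield row[CUI_COL], row[STR_COL]


      # Two made-up rows with placeholder identifiers and source names
      sample_rows = (
          "C0000001|ENG|P|L1|PF|S1|Y|A1||||SRC_A|PT|X1|Example term|0|N||\n"
          "C0000002|FRE|P|L2|PF|S2|Y|A2||||SRC_B|PT|X2|Terme exemple|0|N||\n"
      )

      with tempfile.TemporaryDirectory() as tmp_dir:
          sample_file = Path(tmp_dir) / "MRCONSO.RRF"
          sample_file.write_text(sample_rows, encoding="utf-8")
          french_terms = list(iter_mrconso_terms(sample_file, languages=["FRE"]))
          source_a_terms = list(iter_mrconso_terms(sample_file, sources=["SRC_A"]))

   Yielding rows one at a time, as the iterator return type suggests, avoids
   holding a full MRCONSO.RRF file in memory.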













   ..
       !! processed by numpydoc !!

.. py:function:: preprocess_term_to_match(term: str, lowercase: bool, normalize_unicode: bool, clean_nos: bool = True, clean_brackets: bool = False, clean_dashes: bool = False)

   
   Preprocess a UMLS term for matching purposes.


   :Parameters:

       **term** : str
           Term to preprocess

       **lowercase** : bool
           Whether `term` should be lowercased

       **normalize_unicode** : bool
           Whether `term` should be made ASCII-only (non-ASCII characters replaced by their closest ASCII equivalents)

       **clean_nos** : bool, default=True
           Whether to remove "NOS" (not otherwise specified)

       **clean_brackets** : bool, default=False
           Whether to remove brackets

       **clean_dashes** : bool, default=False
           Whether to remove dashes
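
   The options above can be pictured with the following standalone sketch. It is
   not the library's implementation: ``preprocess_sketch`` is a hypothetical
   function, and several details are assumptions, namely that parenthesized
   content is dropped entirely, that dashes become spaces, and that ASCII
   conversion is done with ``unicodedata`` from the standard library.

   .. code-block:: python

      import re
      import unicodedata


      def preprocess_sketch(term, lowercase, normalize_unicode,
                            clean_nos=True, clean_brackets=False, clean_dashes=False):
          """Hypothetical re-creation of the documented options."""
          if clean_nos:
              # Assumption: drop a standalone "NOS" token and a preceding comma
              term = re.sub(r",?\s*\bNOS\b", "", term)
          if clean_brackets:
              # Assumption: drop parenthesized spans together with their content
              term = re.sub(r"\s*\([^)]*\)", "", term)
          if clean_dashes:
              # Assumption: dashes are replaced by spaces
              term = term.replace("-", " ")
          if lowercase:
              term = term.lower()
          if normalize_unicode:
              # Decompose accented characters, then discard anything non-ASCII
              decomposed = unicodedata.normalize("NFKD", term)
              term = decomposed.encode("ascii", "ignore").decode("ascii")
          # Collapse any whitespace left over by the removals
          return " ".join(term.split())


      cleaned_nos = preprocess_sketch(
          "Diabète sucré, NOS", lowercase=True, normalize_unicode=True
      )
      cleaned_brackets = preprocess_sketch(
          "Non-Hodgkin lymphoma (disorder)",
          lowercase=True,
          normalize_unicode=False,
          clean_brackets=True,
          clean_dashes=True,
      )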














   ..
       !! processed by numpydoc !!

.. py:function:: preprocess_acronym(term: str) -> str | None

   
   Detect whether a term contains an acronym followed by its expanded form in parentheses.

   Return the acronym if one is detected.

   This works for terms such as "ECG (ÉlectroCardioGramme)", where the
   acronym can be rebuilt by taking the ASCII version of each uppercase
   letter inside the parentheses.

   :Parameters:

       **term** : str
           Term that may contain an acronym. Ex: "ECG (ÉlectroCardioGramme)"

   :Returns:

       str, optional
           The acronym in the term if any, else `None`. Ex: "ECG"
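
   The rebuilding rule described above can be sketched as follows. This is a
   standalone illustration, not the library's code: ``acronym_sketch`` and its
   regular expression are hypothetical, and the sketch assumes the term has the
   exact shape ``ACRONYM (Expanded Form)``.

   .. code-block:: python

      import re
      import unicodedata

      # Assumed shape: one word, then the expanded form in parentheses
      _ACRONYM_PATTERN = re.compile(r"^\s*(?P<acronym>\w+)\s*\((?P<expanded>[^()]+)\)\s*$")


      def to_ascii(text):
          """Strip accents by decomposing characters and dropping non-ASCII parts."""
          decomposed = unicodedata.normalize("NFKD", text)
          return decomposed.encode("ascii", "ignore").decode("ascii")


      def acronym_sketch(term):
          """Hypothetical helper: return the acronym if it matches the expanded form."""
          match = _ACRONYM_PATTERN.match(term)
          if match is None:
              return None
          acronym = match.group("acronym")
          expanded = match.group("expanded")
          # Rebuild the acronym from the ASCII version of each uppercase letter
          rebuilt = "".join(to_ascii(char) for char in expanded if char.isupper())
          return acronym if rebuilt == acronym else None


      detected = acronym_sketch("ECG (ÉlectroCardioGramme)")
      not_detected = acronym_sketch("Electrocardiogramme")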













   ..
       !! processed by numpydoc !!

.. py:function:: guess_umls_version(path: str | pathlib.Path) -> str

   
   Try to infer the UMLS version (e.g. "2021AB") from any UMLS-related path.


   :Parameters:

       **path** : str or Path
           Path to the root directory of the UMLS install or any file inside that directory

   :Returns:

       str
           UMLS version, estimated by finding the leaf-most folder in `path` that is not
           "META", "NET" or "LEX", and is not a subfolder of any of these folders

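   The folder-walking heuristic described above can be sketched without touching
   the filesystem. This is a simplified illustration, not the library's code:
   ``guess_version_sketch`` is a hypothetical function, the example paths are
   made up, and a path containing none of the three folders is assumed to point
   at the install root directory rather than at a file.

   .. code-block:: python

      from pathlib import PurePosixPath

      _UMLS_SUBDIRS = {"META", "NET", "LEX"}


      def guess_version_sketch(path):
          """Hypothetical helper: name of the folder just above META, NET or LEX."""
          parts = PurePosixPath(path).parts
          for index, part in enumerate(parts):
              if part in _UMLS_SUBDIRS:
                  # Everything from this point down lies inside a UMLS subfolder
                  return parts[index - 1] if index > 0 else ""
          # Assumption: no UMLS subfolder in the path means it is the root directory
          return parts[-1] if parts else ""


      from_file = guess_version_sketch("/data/umls/2021AB/META/MRCONSO.RRF")
      from_root = guess_version_sketch("/data/umls/2021AB")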












   ..
       !! processed by numpydoc !!

