medkit.text.ner.umls_utils

Classes:

UMLSEntry(cui, term)

Entry in MRCONSO.RRF file of a UMLS dictionary

Functions:

guess_umls_version(path)

Try to infer UMLS version (ex: "2021AB") from any UMLS-related path

load_umls(mrconso_file[, sources, ...])

Load all terms and associated CUIs found in a UMLS MRCONSO.RRF file

preprocess_term_to_match(term, lowercase, ...)

Preprocess a UMLS term for matching purposes

class UMLSEntry(cui, term)

Entry in MRCONSO.RRF file of a UMLS dictionary

Variables
  • cui (str) – Unique identifier of the concept designated by the term

  • term (str) – Original version of the term

load_umls(mrconso_file, sources=None, languages=None, show_progress=False)

Load all terms and associated CUIs found in a UMLS MRCONSO.RRF file

Parameters
  • mrconso_file (Union[str, Path]) – Path to the UMLS MRCONSO.RRF file

  • sources (Optional[List[str]]) – Sources to consider (ex: ICD10, CCS). If none provided, CUIs and terms of all sources will be taken into account.

  • languages (Optional[List[str]]) – Languages to consider. If none provided, CUIs and terms of all languages will be taken into account.

  • show_progress (bool) – Whether to show a progress bar

Return type

Iterator[UMLSEntry]

Returns

Iterator[UMLSEntry] – Iterator over all term entries found in UMLS install
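
To illustrate the kind of filtering described above, here is a minimal standalone sketch. It is not medkit's implementation: the names Entry, iter_mrconso and _row are hypothetical, and the sample rows are made up. It assumes the standard pipe-delimited MRCONSO.RRF layout, in which the CUI, language (LAT), source (SAB) and term (STR) sit at column indices 0, 1, 11 and 14.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List, Optional, Union


@dataclass
class Entry:
    # Hypothetical stand-in for UMLSEntry: a CUI and its term
    cui: str
    term: str


def iter_mrconso(
    mrconso_file: Union[str, Path],
    sources: Optional[List[str]] = None,
    languages: Optional[List[str]] = None,
) -> Iterator[Entry]:
    # Assumed column positions in MRCONSO.RRF: CUI=0, LAT=1, SAB=11, STR=14
    with open(mrconso_file, encoding="utf-8") as fp:
        for line in fp:
            fields = line.rstrip("\n").split("|")
            if len(fields) < 15:
                continue  # skip empty or malformed lines
            cui, language, source, term = fields[0], fields[1], fields[11], fields[14]
            # None means "no filter", mirroring the documented defaults
            if sources is not None and source not in sources:
                continue
            if languages is not None and language not in languages:
                continue
            yield Entry(cui=cui, term=term)


def _row(cui: str, language: str, source: str, term: str) -> str:
    # Build a made-up 18-column row with a trailing pipe, other columns left empty
    fields = [""] * 18
    fields[0], fields[1], fields[11], fields[14] = cui, language, source, term
    return "|".join(fields) + "|\n"


# Usage on a temporary file holding two invented rows
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "MRCONSO.RRF"
    sample_file.write_text(
        _row("C0000001", "ENG", "ICD10", "Example disease")
        + _row("C0000002", "FRE", "CCS", "Maladie exemple"),
        encoding="utf-8",
    )
    all_entries = list(iter_mrconso(sample_file))
    english_entries = list(iter_mrconso(sample_file, languages=["ENG"]))
    ccs_entries = list(iter_mrconso(sample_file, sources=["CCS"]))
```

Because the function is a generator, entries are produced lazily, which matters for a real MRCONSO.RRF file containing millions of lines.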

preprocess_term_to_match(term, lowercase, normalize_unicode, clean_nos=True, clean_brackets=True, clean_dashes=True)

Preprocess a UMLS term for matching purposes

Parameters
  • term (str) – Term to preprocess

  • lowercase (bool) – Whether term should be lowercased

  • normalize_unicode (bool) – Whether the term should be converted to ASCII-only (non-ASCII chars replaced by closest ASCII chars)

  • clean_nos (bool) – Whether to remove “NOS”

  • clean_brackets (bool) – Whether to remove brackets

  • clean_dashes (bool) – Whether to remove dashes
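
The options above can be pictured with the following standalone sketch. It is an approximation written for illustration, not the library's code: the function name preprocess_sketch, the regular expressions, the order of the steps and the choice to replace removed characters with a space are all assumptions. In particular, the NFKD-based ASCII conversion shown here drops characters that have no decomposition, which is cruder than a true "closest ASCII character" mapping.

```python
import re
import unicodedata

# Hypothetical patterns; the real function may match differently
_NOS = re.compile(r"\bNOS\b")
_BRACKETS = re.compile(r"[()\[\]{}]")
_DASHES = re.compile(r"-")
_SPACES = re.compile(r"\s+")


def preprocess_sketch(
    term: str,
    lowercase: bool,
    normalize_unicode: bool,
    clean_nos: bool = True,
    clean_brackets: bool = True,
    clean_dashes: bool = True,
) -> str:
    # Cleaning runs before lowercasing so that "NOS" is matched case-sensitively
    if clean_nos:
        term = _NOS.sub(" ", term)
    if clean_brackets:
        term = _BRACKETS.sub(" ", term)
    if clean_dashes:
        term = _DASHES.sub(" ", term)
    if lowercase:
        term = term.lower()
    if normalize_unicode:
        # Decompose accented characters, then drop anything that is not ASCII
        term = unicodedata.normalize("NFKD", term)
        term = term.encode("ascii", "ignore").decode("ascii")
    # Collapse the whitespace left behind by the removals
    return _SPACES.sub(" ", term).strip()


cleaned_accented = preprocess_sketch("Diabète (NOS)", lowercase=True, normalize_unicode=True)
cleaned_dashed = preprocess_sketch("Non-Hodgkin lymphoma NOS", lowercase=False, normalize_unicode=False)
```

Applying the same preprocessing to both the dictionary terms and the text to annotate keeps the two sides comparable during matching.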

guess_umls_version(path)

Try to infer UMLS version (ex: “2021AB”) from any UMLS-related path

Parameters

path (Union[str, Path]) – Path to the root directory of the UMLS install or any file inside that directory

Return type

str

Returns

str – UMLS version, estimated by finding the leaf-most folder in path that is not “META”, “NET” nor “LEX”, nor a subfolder of these folders
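
The heuristic described above can be sketched as follows. This is one possible reading, not necessarily how medkit implements it: the function name guess_version_sketch and the example paths are hypothetical, and the sketch works on the path as given, without resolving it to an absolute path.

```python
from pathlib import Path
from typing import Union

# Standard UMLS subfolders that cannot themselves be the version folder
_UMLS_SUBDIRS = ("META", "NET", "LEX")


def guess_version_sketch(path: Union[str, Path]) -> str:
    path = Path(path)
    # If the path is an existing file, start from the folder containing it
    if path.is_file():
        path = path.parent
    # Climb until no component of the path is META, NET or LEX,
    # which also skips any subfolder of these folders
    while any(name in path.parts for name in _UMLS_SUBDIRS):
        path = path.parent
    return path.name


# Hypothetical paths, used only to show the expected behavior
version_from_file = guess_version_sketch("/data/umls/2021AB/META/MRCONSO.RRF")
version_from_root = guess_version_sketch("/data/umls/2021AB")
```

Note that a relative path such as META/MRCONSO.RRF would give an empty string with this sketch, since the version folder does not appear in the path at all.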