medkit.text.ner.umls_utils
Classes:
UMLSEntry | Entry in MRCONSO.RRF file of a UMLS dictionary
Functions:
guess_umls_version | Try to infer UMLS version (ex: "2021AB") from any UMLS-related path
load_umls | Load all terms and associated CUIs found in a UMLS MRCONSO.RRF file
preprocess_term_to_match | Preprocess a UMLS term for matching purposes
- class UMLSEntry(cui, term)
Entry in MRCONSO.RRF file of a UMLS dictionary
- Variables
cui (str) – Unique identifier of the concept designated by the term
term (str) – Original version of the term
- load_umls(mrconso_file, sources=None, languages=None, show_progress=False)
Load all terms and associated CUIs found in a UMLS MRCONSO.RRF file
- Parameters
mrconso_file (Union[str, Path]) – Path to the UMLS MRCONSO.RRF file
sources (Optional[List[str]]) – Sources to consider (ex: ICD10, CCS). If none provided, CUIs and terms of all sources will be taken into account.
languages (Optional[List[str]]) – Languages to consider. If none provided, CUIs and terms of all languages will be taken into account.
show_progress (bool) – Whether to show a progress bar
- Return type
Iterator[UMLSEntry]
- Returns
Iterator over all term entries found in UMLS install
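The source and language filtering described above can be illustrated without medkit. The following is a minimal sketch, not medkit's implementation: it assumes the standard pipe-delimited MRCONSO.RRF layout (CUI in column 0, language in column 1, source abbreviation in column 11, term string in column 14), and the names `Entry` and `iter_mrconso_entries` are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, NamedTuple, Optional, Union


class Entry(NamedTuple):
    """Hypothetical stand-in for UMLSEntry: a CUI and its term."""

    cui: str
    term: str


def iter_mrconso_entries(
    mrconso_file: Union[str, Path],
    sources: Optional[List[str]] = None,
    languages: Optional[List[str]] = None,
) -> Iterator[Entry]:
    """Lazily yield entries from a pipe-delimited MRCONSO.RRF file."""
    with open(mrconso_file, encoding="utf-8") as fp:
        for line in fp:
            row = line.rstrip("\n").split("|")
            if len(row) < 15:
                continue  # skip blank or malformed lines
            # Assumed column positions: CUI=0, LAT=1, SAB=11, STR=14
            cui, language, source, term = row[0], row[1], row[11], row[14]
            # None means "no filter", matching the documented defaults
            if sources is not None and source not in sources:
                continue
            if languages is not None and language not in languages:
                continue
            yield Entry(cui, term)
```

Because the function is a generator, a large MRCONSO.RRF file is read line by line rather than loaded into memory, which mirrors the `Iterator` return type documented for `load_umls`.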
- preprocess_term_to_match(term, lowercase, normalize_unicode, clean_nos=True, clean_brackets=True, clean_dashes=True)
Preprocess a UMLS term for matching purposes
- Parameters
term (str) – Term to preprocess
lowercase (bool) – Whether the term should be lowercased
normalize_unicode (bool) – Whether the term should be made ASCII-only (non-ASCII chars replaced by closest ASCII chars)
clean_nos (bool) – Whether to remove “NOS”
clean_brackets (bool) – Whether to remove brackets
clean_dashes (bool) – Whether to remove dashes
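To make the effect of each flag concrete, here is a rough standard-library approximation. It is a sketch under several assumptions and not medkit's implementation: `sketch_preprocess` is a hypothetical name, "remove brackets" is taken to mean dropping bracketed segments together with their contents, dashes are replaced by spaces, and ASCII folding is done with NFKD decomposition.

```python
import re
import unicodedata

_NOS = re.compile(r"\bNOS\b")  # "not otherwise specified" marker
_BRACKETED = re.compile(r"\([^)]*\)|\[[^\]]*\]")  # "(...)" or "[...]" segments
_SPACES = re.compile(r"\s+")


def sketch_preprocess(
    term,
    lowercase,
    normalize_unicode,
    clean_nos=True,
    clean_brackets=True,
    clean_dashes=True,
):
    """Hypothetical approximation of the documented preprocessing flags."""
    if clean_nos:
        term = _NOS.sub(" ", term)
    if clean_brackets:
        # Assumption: the bracketed text is dropped, not just the brackets
        term = _BRACKETED.sub(" ", term)
    if clean_dashes:
        # Assumption: dashes become spaces so the joined words stay separate
        term = term.replace("-", " ")
    if normalize_unicode:
        # NFKD splits accented letters into base letter + combining mark;
        # encoding to ASCII then drops the marks. Characters without a
        # decomposition are dropped rather than replaced.
        decomposed = unicodedata.normalize("NFKD", term)
        term = decomposed.encode("ascii", "ignore").decode("ascii")
    if lowercase:
        term = term.lower()
    # Collapse the whitespace left behind by the removals
    return _SPACES.sub(" ", term).strip()
```

Applying the same preprocessing to both the dictionary terms and the text being searched is what makes the two sides comparable during matching.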
- guess_umls_version(path)
Try to infer UMLS version (ex: “2021AB”) from any UMLS-related path
- Parameters
path (Union[str, Path]) – Path to the root directory of the UMLS install or any file inside that directory
- Return type
str
- Returns
UMLS version, estimated by finding the leaf-most folder in path that is neither “META”, “NET” nor “LEX”, nor a subfolder of these folders
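The rule described in the return value can be sketched as follows. This is an illustrative reading of that description rather than medkit's code; `sketch_guess_version` is a hypothetical name, and raising `ValueError` when no candidate folder exists is an assumption.

```python
from pathlib import Path
from typing import Union

_RESERVED = {"META", "NET", "LEX"}


def sketch_guess_version(path: Union[str, Path]) -> str:
    """Return the leaf-most folder that is not META/NET/LEX or below them."""
    path = Path(path)
    parts = path.parts
    for i, part in enumerate(parts):
        if part in _RESERVED:
            if i == 0:
                raise ValueError(f"Cannot infer UMLS version from {path}")
            # Everything from this reserved folder down is excluded, so the
            # folder right above it is the leaf-most remaining candidate.
            return parts[i - 1]
    # No reserved folder in the path: if it points to an existing file,
    # use the directory that contains it.
    if path.is_file():
        path = path.parent
    return path.name


# A file inside META and the install root resolve to the same folder name
print(sketch_guess_version("/data/umls/2021AB/META/MRCONSO.RRF"))
print(sketch_guess_version("/data/umls/2021AB"))
```

The result is only a guess: it is the name of a folder, so it matches the UMLS release only when the install directory is named after it.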