medkit.core.text

APIs#

For accessing these APIs, you may use import like this:

from medkit.core.text import <api_to_import>

Classes:

AnySpan()

ContextOperation([uid, name])

Abstract operation for context detection.

CustomTextOpType(value[, names, module, ...])

Supported function types for creating custom text operations.

Entity(label, text, spans[, attrs, ...])

Text entity referencing part of a TextDocument.

EntityAttributeContainer(owner_id)

Manage a list of attributes attached to a text entity.

EntityNormAttribute(kb_name, kb_id[, ...])

Normalization attribute linking an entity to an ID in a knowledge base

ModifiedSpan(length, replaced_spans)

Slice of text not present in the original text

NEROperation([uid, name])

Abstract operation for detecting entities.

Relation(label, source_id, target_id[, ...])

Relation between two text entities.

Segment(label, text, spans[, attrs, ...])

Text segment referencing part of a TextDocument.

SegmentationOperation([uid, name])

Abstract operation for segmenting text.

Span(start, end)

Slice of text extracted from the original text

TextAnnotation(label[, attrs, metadata, ...])

Base abstract class for all text annotations

TextAnnotationContainer(doc_id, raw_segment)

Manage a list of text annotations belonging to a text document.

TextDocument(text[, anns, attrs, metadata, uid])

Document holding text annotations

UMLSNormAttribute(cui, umls_version[, term, ...])

Normalization attribute linking an entity to a CUI in the UMLS knowledge base

Functions:

create_text_operation(function, function_type)

Function for instantiating a custom text operation from a user-defined function

class TextAnnotation(label, attrs=None, metadata=None, uid=None, attr_container_class=<class 'AttributeContainer'>)[source]#

Base abstract class for all text annotations

Variables:
  • uid (str) – Unique identifier of the annotation.

  • label (str) – The label for this annotation (e.g., SENTENCE)

  • attrs (AttributeContainer) – Attributes of the annotation. Stored in an AttributeContainer but can be passed as a list at init.

  • metadata (dict of str to Any) – The metadata of the annotation

  • keys (set of str) – Pipeline output keys to which the annotation belongs.

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.
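
The lookup described above can be pictured with a small standalone sketch. It is not medkit's implementation: `AnnotationSketch`, `SegmentSketch` and the `_class_name` key are hypothetical names chosen for illustration, and the registry built with `__init_subclass__` is only one plausible way to map a class name back to a class.

```python
from typing import Any, Optional

# Hypothetical key under which to_dict() records the class name (assumption).
CLASS_NAME_KEY = "_class_name"


class AnnotationSketch:
    """Stand-in base class that tracks its subclasses by name."""

    _subclasses: dict = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register every subclass under its class name.
        AnnotationSketch._subclasses[cls.__name__] = cls

    def to_dict(self) -> dict:
        return {CLASS_NAME_KEY: type(self).__name__}

    @classmethod
    def get_subclass_for_data_dict(cls, data_dict: dict) -> Optional[type]:
        class_name = data_dict[CLASS_NAME_KEY]
        if class_name == cls.__name__:
            # The dict was produced by the class the method is called on.
            return None
        # In this sketch an unknown class name raises KeyError.
        return cls._subclasses[class_name]


class SegmentSketch(AnnotationSketch):
    pass


found = AnnotationSketch.get_subclass_for_data_dict(SegmentSketch().to_dict())
```

A caller would typically use the returned class, when it is not None, to dispatch to that class's own from_dict().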

class Segment(label, text, spans, attrs=None, metadata=None, uid=None, store=None, attr_container_class=<class 'AttributeContainer'>)[source]#

Text segment referencing part of a TextDocument.

Variables:
  • uid (str) – The segment identifier.

  • label (str) – The label for this segment (e.g., SENTENCE)

  • text (str) – Text of the segment.

  • spans (list of AnySpan) – List of spans indicating which parts of the segment text correspond to which part of the document’s full text.

  • attrs (AttributeContainer) – Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.

  • metadata (dict of str to Any) – The metadata of the segment

  • keys (set of str) – Pipeline output keys to which the segment belongs.

Methods:

from_dict(segment_dict)

Creates a Segment from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod from_dict(segment_dict)[source]#

Creates a Segment from a dict

Parameters:

segment_dict (dict of str to Any) – A dictionary from a serialized segment as generated by to_dict()

Return type:

Self

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class Entity(label, text, spans, attrs=None, metadata=None, uid=None, store=None, attr_container_class=<class 'EntityAttributeContainer'>)[source]#

Text entity referencing part of a TextDocument.

Variables:
  • uid (str) – The entity identifier.

  • label (str) – The label for this entity (e.g., DISEASE)

  • text (str) – Text of the entity.

  • spans (list of AnySpan) – List of spans indicating which parts of the entity text correspond to which part of the document’s full text.

  • attrs (EntityAttributeContainer) – Attributes of the entity. Stored in an EntityAttributeContainer but can be passed as a list at init.

  • metadata (dict of str to Any) – The metadata of the entity

  • keys (set of str) – Pipeline output keys to which the entity belongs.

Methods:

from_dict(segment_dict)

Creates a Segment from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod from_dict(segment_dict)#

Creates a Segment from a dict

Parameters:

segment_dict (dict of str to Any) – A dictionary from a serialized segment as generated by to_dict()

Return type:

Self

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class Relation(label, source_id, target_id, attrs=None, metadata=None, uid=None, store=None, attr_container_class=<class 'AttributeContainer'>)[source]#

Relation between two text entities.

Variables:
  • uid (str) – The identifier of the relation

  • label (str) – The relation label

  • source_id (str) – The identifier of the entity from which the relation is defined

  • target_id (str) – The identifier of the entity to which the relation is defined

  • attrs (AttributeContainer) – The attributes of the relation

  • metadata (dict of str to Any) – The metadata of the relation

  • keys (set of str) – Pipeline output keys to which the relation belongs

Methods:

from_dict(relation_dict)

Creates a Relation from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

classmethod from_dict(relation_dict)[source]#

Creates a Relation from a dict

Parameters:

relation_dict (dict of str to Any) – A dictionary from a serialized relation as generated by to_dict()

Return type:

Self

class TextAnnotationContainer(doc_id, raw_segment)[source]#

Manage a list of text annotations belonging to a text document.

This behaves more or less like a list: calling len() and iterating are supported. Additional filtering is available through the get() method.

Also provides retrieval of entities, segments, relations, and handling of raw segment.

Instantiate the annotation container

Parameters:

doc_id (str) – The identifier of the document to which the annotations belong.

Attributes:

entities

Return the list of entities

relations

Return the list of relations

segments

Return the list of segments

Methods:

add(ann)

Attach an annotation to the document.

get(*[, label, key])

Return a list of the annotations of the document, optionally filtering by label or key.

get_by_id(uid)

Return the annotation corresponding to a specific identifier.

get_entities(*[, label, key])

Return a list of the entities of the document, optionally filtering by label or key.

get_ids(*[, label, key])

Return an iterator of the identifiers of the annotations of the document, optionally filtering by label or key.

get_relations(*[, label, key, source_id])

Return a list of the relations of the document, optionally filtering by label, key or source entity.

get_segments(*[, label, key])

Return a list of the segments of the document (not including entities), optionally filtering by label or key.

property segments: list[Segment]#

Return the list of segments

Return type:

list[Segment]

property entities: list[Entity]#

Return the list of entities

Return type:

list[Entity]

property relations: list[Relation]#

Return the list of relations

Return type:

list[Relation]

add(ann)[source]#

Attach an annotation to the document.

Parameters:

ann (AnnotationType) – Annotation to add.

Raises:

ValueError – If the annotation is already attached to the document (based on annotation.uid)

get(*, label=None, key=None)[source]#

Return a list of the annotations of the document, optionally filtering by label or key.

Parameters:
  • label (str, optional) – Label to use to filter annotations.

  • key (str, optional) – Key to use to filter annotations.

Return type:

list[TextAnnotation]
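
As a rough illustration of the list-like behavior and of the add() and get() contracts described above, here is a standalone sketch. `AnnSketch` and `ContainerSketch` are hypothetical stand-ins, not medkit classes, and combining the label and key filters with a logical AND is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AnnSketch:
    uid: str
    label: str
    keys: set = field(default_factory=set)


class ContainerSketch:
    def __init__(self):
        # Maps uid to annotation; dicts preserve insertion order.
        self._anns = {}

    def __len__(self):
        return len(self._anns)

    def __iter__(self):
        return iter(self._anns.values())

    def add(self, ann):
        # Reject an annotation whose uid is already known.
        if ann.uid in self._anns:
            raise ValueError(f"Annotation {ann.uid} is already attached")
        self._anns[ann.uid] = ann

    def get(self, *, label=None, key=None):
        # A filter set to None is ignored; both filters must match otherwise.
        return [
            ann
            for ann in self
            if (label is None or ann.label == label)
            and (key is None or key in ann.keys)
        ]


anns = ContainerSketch()
anns.add(AnnSketch("a1", "SENTENCE"))
anns.add(AnnSketch("a2", "DISEASE", keys={"ner_output"}))
sentences = anns.get(label="SENTENCE")
from_ner = anns.get(key="ner_output")
```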

get_by_id(uid)[source]#

Return the annotation corresponding to a specific identifier.

Parameters:

uid (str) – Identifier of the annotation to return.

Return type:

TextAnnotation

get_segments(*, label=None, key=None)[source]#

Return a list of the segments of the document (not including entities), optionally filtering by label or key.

Parameters:
  • label (str, optional) – Label to use to filter segments.

  • key (str, optional) – Key to use to filter segments.

Return type:

list[Segment]

get_entities(*, label=None, key=None)[source]#

Return a list of the entities of the document, optionally filtering by label or key.

Parameters:
  • label (str, optional) – Label to use to filter entities.

  • key (str, optional) – Key to use to filter entities.

Return type:

list[Entity]

get_relations(*, label=None, key=None, source_id=None)[source]#

Return a list of the relations of the document, optionally filtering by label, key or source entity.

Parameters:
  • label (str, optional) – Label to use to filter relations.

  • key (str, optional) – Key to use to filter relations.

  • source_id (str, optional) – Identifier of the source entity to use to filter relations.

Return type:

list[Relation]

get_ids(*, label=None, key=None)#

Return an iterator of the identifiers of the annotations of the document, optionally filtering by label or key.

This method is provided to make it easier to implement additional filtering in subclasses.

Parameters:
  • label (str, optional) – Label to use to filter annotations.

  • key (str, optional) – Key to use to filter annotations.

Return type:

Iterator[str]

class TextDocument(text, anns=None, attrs=None, metadata=None, uid=None)[source]#

Document holding text annotations

Annotations must be subclasses of TextAnnotation.

Variables:
  • uid (str) – Unique identifier of the document.

  • text (str) – Full document text.

  • anns (TextAnnotationContainer) – Annotations of the document. Stored in a TextAnnotationContainer but can be passed as a list at init.

  • attrs (AttributeContainer) – Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init

  • metadata (dict of str to Any) – Document metadata.

  • raw_segment (Segment) – Auto-generated segment containing the full unprocessed document text. To get the raw text as an annotation to pass to processing operations:

Examples

>>> doc = TextDocument(text="hello")
>>> raw_text = doc.anns.get(label=TextDocument.RAW_LABEL)[0]

Methods:

from_dict(doc_dict)

Creates a TextDocument from a dict

from_dir(path[, pattern, encoding])

Create documents from text files in a directory

from_file(path[, encoding])

Create a document from a text file

get_snippet(segment, max_extend_length)

Return a portion of the original text containing the annotation

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

classmethod from_dict(doc_dict)[source]#

Creates a TextDocument from a dict

Parameters:

doc_dict (dict of str to Any) – A dictionary from a serialized TextDocument as generated by to_dict()

Return type:

Self

classmethod from_file(path, encoding='utf-8')[source]#

Create a document from a text file

Parameters:
  • path (Path) – Path of the text file

  • encoding (str, default="utf-8") – Text encoding to use

Return type:

Self

Returns:

TextDocument – Text document with contents of path as text. The file path is included in the document metadata.

classmethod from_dir(path, pattern='*.txt', encoding='utf-8')[source]#

Create documents from text files in a directory

Parameters:
  • path (Path) – Path of the directory containing text files

  • pattern (str) – Glob pattern to match text files in path

  • encoding (str) – Text encoding to use

Return type:

list[Self]

Returns:

list of TextDocument – Text documents with contents of each file as text
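
The directory loading described above amounts to globbing a directory and reading each match with the given encoding. The sketch below shows that idea with the standard library only. `load_texts_sketch` and the `path_to_text` metadata key are hypothetical names, plain dicts stand in for documents, and sorting the matches is an assumption made to get a deterministic order.

```python
import tempfile
from pathlib import Path


def load_texts_sketch(path, pattern="*.txt", encoding="utf-8"):
    """Read every file in `path` matching `pattern`, one record per file."""
    records = []
    # Sorting is an assumption, made here so the result order is deterministic.
    for file in sorted(Path(path).glob(pattern)):
        records.append(
            {
                "text": file.read_text(encoding=encoding),
                # Hypothetical metadata key holding the source path.
                "metadata": {"path_to_text": str(file)},
            }
        )
    return records


# Usage: two .txt files and one file that the default pattern skips.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first note", encoding="utf-8")
    Path(tmp, "b.txt").write_text("second note", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    records = load_texts_sketch(tmp)
    texts = [record["text"] for record in records]
```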

get_snippet(segment, max_extend_length)[source]#

Return a portion of the original text containing the annotation

Parameters:
  • segment (Segment) – The annotation

  • max_extend_length (int) – Maximum number of characters to use around the annotation

Return type:

str

Returns:

str – A portion of the text around the annotation
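
One way to read max_extend_length is as a total character budget shared between the text before and after the annotation. The sketch below follows that reading, which is an assumption and not necessarily how get_snippet() distributes the characters. `get_snippet_sketch` is a hypothetical helper that takes character offsets instead of a Segment.

```python
def get_snippet_sketch(text: str, start: int, end: int, max_extend_length: int) -> str:
    """Return text[start:end] plus at most max_extend_length characters of context."""
    # Spend up to half of the budget before the annotation...
    before = min(max_extend_length // 2, start)
    # ...and whatever is left of it after the annotation.
    after = min(max_extend_length - before, len(text) - end)
    return text[start - before : end + after]


text = "The patient has asthma and diabetes."
start = text.index("asthma")
snippet = get_snippet_sketch(text, start, start + len("asthma"), 10)
```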

class EntityAttributeContainer(owner_id)[source]#

Manage a list of attributes attached to a text entity.

This behaves more or less like a list: calling len() and iterating are supported. Additional filtering is available through the get() method.

Also provides retrieval of normalization attributes.

Attributes:

norms

Return the list of normalization attributes

Methods:

add(attr)

Attach an attribute to the annotation.

get(*[, label])

Return a list of the attributes of the annotation, optionally filtering by label.

get_by_id(uid)

Return the attribute corresponding to a specific identifier.

get_norms()

Return a list of the normalization attributes of the annotation

property norms: list[EntityNormAttribute]#

Return the list of normalization attributes

Return type:

list[EntityNormAttribute]

add(attr)[source]#

Attach an attribute to the annotation.

Parameters:

attr (Attribute) – Attribute to add.

Raises:

ValueError – If the attribute is already attached to the annotation (based on attr.uid).

get_norms()[source]#

Return a list of the normalization attributes of the annotation

Return type:

list[EntityNormAttribute]

get(*, label=None)#

Return a list of the attributes of the annotation, optionally filtering by label.

Parameters:

label (str, optional) – Label to use to filter attributes.

Return type:

list[Attribute]

Returns:

list of Attribute – The list of all attributes of the annotation, filtered by label if specified.

get_by_id(uid)#

Return the attribute corresponding to a specific identifier.

Parameters:

uid (str) – Identifier of the attribute to return.

Return type:

Attribute

Returns:

Attribute – The attribute corresponding to the identifier

class EntityNormAttribute(kb_name, kb_id, kb_version=None, term=None, score=None, metadata=None, uid=None)[source]#

Normalization attribute linking an entity to an ID in a knowledge base

Variables:
  • uid (str) – Identifier of the attribute

  • label (str) – The attribute label, always set to EntityNormAttribute.LABEL

  • value (Any) – String representation of the normalization, containing kb_id, along with kb_name if available (ex: “umls:C0011849”). For special cases where only term is available, it is used as value.

  • kb_name (str, optional) – Name of the knowledge base (ex: “icd”). Should always be provided except in special cases when we just want to store a normalized term.

  • kb_id (Any, optional) – ID in the knowledge base to which the annotation should be linked. Should always be provided except in special cases when we just want to store a normalized term.

  • kb_version (str, optional) – Optional version of the knowledge base.

  • term (str, optional) – Optional normalized version of the entity text.

  • score (float, optional) – Optional score reflecting confidence of this link.

  • metadata (dict of str to Any) – Metadata of the attribute
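
The rule for value given above can be summarized in a few lines. This is a sketch of the documented behavior only: `norm_value_sketch` is a hypothetical helper, and raising ValueError when neither kb_id nor term is given is an assumption.

```python
from typing import Any, Optional


def norm_value_sketch(kb_name: Optional[str], kb_id: Any, term: Optional[str] = None) -> str:
    if kb_id is None:
        # Special case: only a normalized term is available.
        if term is None:
            raise ValueError("Provide at least kb_id or term")
        return term
    if kb_name is not None:
        # Usual case: prefix the ID with the knowledge base name.
        return f"{kb_name}:{kb_id}"
    return str(kb_id)


value = norm_value_sketch("umls", "C0011849")
```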

Attributes:

LABEL

Label used for all normalization attributes

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(data_dict)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

LABEL: typing.ClassVar[str] = 'NORMALIZATION'#

Label used for all normalization attributes

copy()#

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type:

Attribute

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

to_brat()[source]#

Return a value compatible with the brat format

Return type:

str

to_spacy()[source]#

Return a value compatible with spaCy

Return type:

str

classmethod from_dict(data_dict)[source]#

Creates an Attribute from a dict

Parameters:

data_dict (dict of str to Any) – A dictionary from a serialized Attribute as generated by to_dict()

Return type:

Self

class ContextOperation(uid=None, name=None, **kwargs)[source]#

Abstract operation for context detection. It uses a list of segments as input for running the operation and creates attributes that are directly appended to these segments.

Common initialization for all annotators:
  • assigning identifier to operation

  • storing class name, name and config in description

Parameters:
  • uid (str, optional) – Operation identifier

  • name (str, optional) – Operation name (defaults to class name)

  • kwargs – All other arguments of the child init useful to describe the operation

Examples

In the __init__ function of your annotator, use:

>>> init_args = locals()
>>> init_args.pop("self")
>>> super().__init__(**init_args)
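
To see the pattern above in context, here is a standalone sketch with stand-in classes. `OperationSketch` and `NegationDetectorSketch` are hypothetical and only mimic the documented behavior (assigning an identifier, storing class name, name and config). One detail to watch: inside a method that calls zero-argument super(), locals() also contains a hidden `__class__` entry, which the sketch removes before forwarding the arguments.

```python
import uuid


class OperationSketch:
    """Stand-in base class (not medkit's own base operation)."""

    def __init__(self, uid=None, name=None, **kwargs):
        self.uid = uid if uid is not None else str(uuid.uuid4())
        # Everything the child passed besides uid and name becomes the config.
        self.description = {
            "uid": self.uid,
            "class_name": type(self).__name__,
            "name": name if name is not None else type(self).__name__,
            "config": kwargs,
        }


class NegationDetectorSketch(OperationSketch):
    def __init__(self, output_label, case_sensitive=False, uid=None, name=None):
        # Capture the init arguments before defining any other local variable.
        init_args = locals()
        init_args.pop("self")
        # Zero-argument super() makes __class__ visible in locals(); drop it.
        init_args.pop("__class__", None)
        super().__init__(**init_args)

        self.output_label = output_label
        self.case_sensitive = case_sensitive


op = NegationDetectorSketch(output_label="is_negated")
config = op.description["config"]
```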

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: OperationDescription#

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)#

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class NEROperation(uid=None, name=None, **kwargs)[source]#

Abstract operation for detecting entities. It uses a list of segments as input and produces a list of detected entities.

Common initialization for all annotators:
  • assigning identifier to operation

  • storing class name, name and config in description

Parameters:
  • uid (str, optional) – Operation identifier

  • name (str, optional) – Operation name (defaults to class name)

  • kwargs – All other arguments of the child init useful to describe the operation

Examples

In the __init__ function of your annotator, use:

>>> init_args = locals()
>>> init_args.pop("self")
>>> super().__init__(**init_args)

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: OperationDescription#

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)#

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class SegmentationOperation(uid=None, name=None, **kwargs)[source]#

Abstract operation for segmenting text. It uses a list of segments as input and produces a list of new segments.

Common initialization for all annotators:
  • assigning identifier to operation

  • storing class name, name and config in description

Parameters:
  • uid (str, optional) – Operation identifier

  • name (str, optional) – Operation name (defaults to class name)

  • kwargs – All other arguments of the child init useful to describe the operation

Examples

In the __init__ function of your annotator, use:

>>> init_args = locals()
>>> init_args.pop("self")
>>> super().__init__(**init_args)

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: OperationDescription#

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)#

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class CustomTextOpType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Supported function types for creating custom text operations.

Attributes:

CREATE_ONE_TO_N

Take 1 data item, return N new data items.

EXTRACT_ONE_TO_N

Take 1 data item, return N existing data items

FILTER

Take 1 data item, return True or False.

CREATE_ONE_TO_N = 1#

Take 1 data item, return N new data items.

EXTRACT_ONE_TO_N = 2#

Take 1 data item, return N existing data items

FILTER = 3#

Take 1 data item, return True or False.

create_text_operation(function, function_type, name=None, args=None)[source]#

Function for instantiating a custom text operation from a user-defined function

Parameters:
  • function (Callable) – User-defined function

  • function_type (CustomTextOpType) – Type of function. Supported values are defined in CustomTextOpType

  • name (str, optional) – Name of the operation used for provenance info (default: function name)

  • args (dict, optional) – Dictionary containing the arguments of the function if any.

Return type:

_CustomTextOperation

Returns:

_CustomTextOperation – An instance of a custom text operation
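
The three function types can be pictured with a standalone sketch of how a runner might apply a user-defined function to a list of items. Nothing here is medkit code: `OpTypeSketch` mirrors the documented member values, `run_sketch` is a hypothetical helper, and the assumptions are that FILTER keeps the items for which the function returns True and that one-to-N results are concatenated in input order.

```python
from enum import IntEnum


class OpTypeSketch(IntEnum):
    CREATE_ONE_TO_N = 1
    EXTRACT_ONE_TO_N = 2
    FILTER = 3


def run_sketch(function, function_type, items, args=None):
    """Apply `function` to each item according to `function_type`."""
    kwargs = args or {}
    if function_type is OpTypeSketch.FILTER:
        # Keep the items for which the function returns True (assumption).
        return [item for item in items if function(item, **kwargs)]
    # Both one-to-N types return a list per item; concatenate the lists.
    results = []
    for item in items:
        results.extend(function(item, **kwargs))
    return results


def is_long_enough(text, min_len):
    return len(text) >= min_len


kept = run_sketch(is_long_enough, OpTypeSketch.FILTER, ["a", "abc"], {"min_len": 2})
tokens = run_sketch(str.split, OpTypeSketch.CREATE_ONE_TO_N, ["a b", "c"])
```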

class Span(start, end)[source]#

Slice of text extracted from the original text

Parameters:
  • start (int) – Index of the first character in the original text

  • end (int) – Index of the last character in the original text, plus one

Methods:

from_dict(span_dict)

Creates a Span from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

overlaps(other)

Test if 2 spans reference at least one character in common

overlaps(other)[source]#

Test if 2 spans reference at least one character in common
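
Because end is the index of the last character plus one, two spans share a character exactly when each one starts before the other ends. The sketch below shows that check on a stand-in class. `SpanSketch` is a hypothetical name, not the Span class documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanSketch:
    start: int  # index of the first character
    end: int  # index of the last character, plus one

    def overlaps(self, other: "SpanSketch") -> bool:
        # Half-open intervals intersect when each starts before the other ends.
        return self.start < other.end and other.start < self.end


shared = SpanSketch(0, 5).overlaps(SpanSketch(4, 8))
adjacent = SpanSketch(0, 5).overlaps(SpanSketch(5, 8))
```

With these definitions, spans that merely touch (one ends where the other starts) do not overlap.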

classmethod from_dict(span_dict)[source]#

Creates a Span from a dict

Parameters:

span_dict (dict) – A dictionary from a serialized span as generated by to_dict()

Return type:

Self

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class ModifiedSpan(length, replaced_spans)[source]#

Slice of text not present in the original text

Parameters:
  • length (int) – Number of characters

  • replaced_spans (list of Span) – Slices of the original text that this span is replacing

Methods:

from_dict(modified_span_dict)

Creates a ModifiedSpan from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod from_dict(modified_span_dict)[source]#

Creates a ModifiedSpan from a dict

Parameters:

modified_span_dict (dict of str to Any) – A dictionary from a serialized ModifiedSpan as generated by to_dict()

Return type:

Self

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class AnySpan[source]#

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class UMLSNormAttribute(cui, umls_version, term=None, score=None, sem_types=None, metadata=None, uid=None)[source]#

Normalization attribute linking an entity to a CUI in the UMLS knowledge base

Variables:
  • uid (str) – Identifier of the attribute

  • label (str) – The attribute label, always set to EntityNormAttribute.LABEL

  • value (Any) – CUI prefixed with “umls:” (ex: “umls:C0011849”)

  • kb_name (str, optional) – Name of the knowledge base. Always “umls”

  • kb_id (Any, optional) – CUI (Concept Unique Identifier) to which the annotation should be linked

  • cui (str) – Convenience alias of kb_id

  • kb_version (str, optional) – Version of the UMLS database (ex: “202AB”)

  • umls_version (str) – Convenience alias of kb_version

  • term (str, optional) – Optional normalized version of the entity text

  • score (float, optional) – Optional score reflecting confidence of this link

  • sem_types (list of str, optional) – Optional IDs of semantic types of the CUI (ex: [“T047”])

  • metadata (dict of str to Any) – Metadata of the attribute

Attributes:

LABEL

Label used for all normalization attributes

Methods:

copy()

Create a new attribute that is a copy of the current instance, but with a new identifier

from_dict(data)

Creates an Attribute from a dict

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

to_brat()

Return a value compatible with the brat format

to_spacy()

Return a value compatible with spaCy

LABEL: ClassVar[str] = 'NORMALIZATION'#

Label used for all normalization attributes

copy()#

Create a new attribute that is a copy of the current instance, but with a new identifier

This is used when we want to duplicate an existing attribute onto a different annotation.

Return type:

Attribute

classmethod from_dict(data)[source]#

Creates an Attribute from a dict

Parameters:

data (dict of str to Any) – A dictionary from a serialized Attribute as generated by to_dict()

Return type:

Self

classmethod get_subclass_for_data_dict(data_dict)#

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

to_brat()#

Return a value compatible with the brat format

Return type:

str

to_spacy()#

Return a value compatible with spaCy

Return type:

str

Subpackages / Submodules#

medkit.core.text.utils

medkit.core.text.span_utils