medkit.io._common

Attributes

logger

Classes

Entity

Text entity referencing part of a TextDocument.

Relation

Relation between two text entities.

Segment

Text segment referencing part of a TextDocument.

TextAnnotation

Base abstract class for all text annotations.

TextDocument

Document holding text annotations.

Functions

get_anns_by_type(→ dict[str, ...)

Filter annotations by labels and return a dictionary by type of annotation.

Module Contents

class medkit.io._common.Entity(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.text.entity_attribute_container.EntityAttributeContainer] = EntityAttributeContainer)

Bases: Segment

Text entity referencing part of a TextDocument.

Attributes:
uid : str

The entity identifier.

label : str

The label for this entity (e.g., DISEASE).

text : str

Text of the entity.

spans : list of AnySpan

List of spans indicating which parts of the entity text correspond to which parts of the document's full text.

attrs : EntityAttributeContainer

Attributes of the entity. Stored in an EntityAttributeContainer but can be passed as a list at init.

metadata : dict of str to Any

The metadata of the entity.

keys : set of str

Pipeline output keys to which the entity belongs.
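To illustrate how spans relate an annotation's text to the document's full text, here is a minimal sketch. It uses plain (start, end) character offsets as a hypothetical stand-in for medkit's span classes, and the helper name text_from_spans is not part of medkit. Joining discontinuous parts with a single space is also an assumption.

```python
# Hypothetical stand-in: each span is a (start, end) pair of character
# offsets into the document's full text, with an exclusive end.
# medkit itself uses dedicated span classes (AnySpan).
def text_from_spans(doc_text: str, spans: list[tuple[int, int]]) -> str:
    """Rebuild an annotation's text from the document text and its spans."""
    # Assumption: discontinuous parts are joined with a single space.
    return " ".join(doc_text[start:end] for start, end in spans)


doc_text = "The patient has severe asthma."

# A single contiguous span covering the word "asthma"
entity_text = text_from_spans(doc_text, [(23, 29)])

# Two spans describing a discontinuous annotation
discontinuous_text = text_from_spans(doc_text, [(4, 11), (23, 29)])
```

With several spans, each one points to a separate part of the document, which is why spans is a list rather than a single offset pair.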

attrs: medkit.core.text.entity_attribute_container.EntityAttributeContainer

class medkit.io._common.Relation(label: str, source_id: str, target_id: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

Bases: TextAnnotation

Relation between two text entities.

Attributes:
uid : str

The identifier of the relation.

label : str

The relation label.

source_id : str

The identifier of the entity from which the relation is defined.

target_id : str

The identifier of the entity to which the relation is defined.

attrs : AttributeContainer

The attributes of the relation.

metadata : dict of str to Any

The metadata of the relation.

keys : set of str

Pipeline output keys to which the relation belongs.

source_id: str
target_id: str
to_dict() → dict[str, Any]
classmethod from_dict(relation_dict: dict[str, Any]) → typing_extensions.Self

Create a Relation from a dict.

Parameters:
relation_dict : dict of str to Any

A dictionary from a serialized relation as generated by to_dict()
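The to_dict() and from_dict() pair suggests a serialization round trip. The sketch below shows that pattern with a hypothetical dataclass, SketchRelation, which only mimics a few of the fields listed above. It is not medkit's Relation class, and the exact keys medkit writes to the dictionary are not shown on this page.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SketchRelation:
    """Hypothetical stand-in for a relation between two entities."""

    label: str
    source_id: str
    target_id: str
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumption: an identifier is generated when none is given
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_dict(self) -> dict[str, Any]:
        # Serialize every field into a plain dictionary
        return asdict(self)

    @classmethod
    def from_dict(cls, relation_dict: dict[str, Any]) -> "SketchRelation":
        # Rebuild the object from a dictionary produced by to_dict()
        return cls(**relation_dict)


original = SketchRelation(label="treats", source_id="ent-1", target_id="ent-2")
restored = SketchRelation.from_dict(original.to_dict())
```

Because the identifier is part of the dictionary, the restored object keeps the same uid as the original.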

class medkit.io._common.Segment(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

Bases: TextAnnotation

Text segment referencing part of a TextDocument.

Attributes:
uid : str

The segment identifier.

label : str

The label for this segment (e.g., SENTENCE).

text : str

Text of the segment.

spans : list of AnySpan

List of spans indicating which parts of the segment text correspond to which parts of the document's full text.

attrs : AttributeContainer

Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.

metadata : dict of str to Any

The metadata of the segment.

keys : set of str

Pipeline output keys to which the segment belongs.

spans: list[medkit.core.text.span.AnySpan]
text: str
to_dict() → dict[str, Any]
classmethod from_dict(segment_dict: dict[str, Any]) → typing_extensions.Self

Create a Segment from a dict.

Parameters:
segment_dict : dict of str to Any

A dictionary from a serialized segment as generated by to_dict()

class medkit.io._common.TextAnnotation(label: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

Bases: abc.ABC, medkit.core.dict_conv.SubclassMapping

Base abstract class for all text annotations.

Attributes:
uid : str

Unique identifier of the annotation.

label : str

The label for this annotation (e.g., SENTENCE).

attrs : AttributeContainer

Attributes of the annotation. Stored in an AttributeContainer but can be passed as a list at init.

metadata : dict of str to Any

The metadata of the annotation.

keys : set of str

Pipeline output keys to which the annotation belongs.

uid: str
label: str
attrs: medkit.core.attribute_container.AttributeContainer
metadata: dict[str, Any]
keys: set[str]
classmethod __init_subclass__()
classmethod from_dict(ann_dict: dict[str, Any]) → typing_extensions.Self
abstract to_dict() → dict[str, Any]

class medkit.io._common.TextDocument(text: str, anns: Sequence[medkit.core.text.annotation.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)

Bases: medkit.core.dict_conv.SubclassMapping

Document holding text annotations.

Annotations must be subclasses of TextAnnotation.

Examples

>>> doc = TextDocument(text="hello")
>>> raw_text = doc.anns.get(label=TextDocument.RAW_LABEL)[0]

Attributes:
uid : str

Unique identifier of the document.

text : str

Full document text.

anns : TextAnnotationContainer

Annotations of the document. Stored in a TextAnnotationContainer but can be passed as a list at init.

attrs : AttributeContainer

Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init.

metadata : dict of str to Any

Document metadata.

raw_segment : Segment

Auto-generated segment containing the full unprocessed document text. To get the raw text as an annotation to pass to processing operations, use doc.anns.get(label=TextDocument.RAW_LABEL)[0].

RAW_LABEL: ClassVar[str] = 'RAW_TEXT'
uid: str
anns: medkit.core.text.annotation_container.TextAnnotationContainer
attrs: medkit.core.AttributeContainer
metadata: dict[str, Any]
raw_segment: medkit.core.text.annotation.Segment
classmethod _generate_raw_segment(text: str, doc_id: str) → medkit.core.text.annotation.Segment
property text: str
classmethod __init_subclass__()
to_dict(with_anns: bool = True) → dict[str, Any]
classmethod from_dict(doc_dict: dict[str, Any]) → typing_extensions.Self

Create a TextDocument from a dict.

Parameters:
doc_dict : dict of str to Any

A dictionary from a serialized TextDocument as generated by to_dict()

classmethod from_file(path: os.PathLike, encoding: str = 'utf-8') → typing_extensions.Self

Create a document from a text file.

Parameters:
path : Path

Path of the text file.

encoding : str, default="utf-8"

Text encoding to use.

Returns:
TextDocument

Text document with contents of path as text. The file path is included in the document metadata.
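The behaviour described here (read a file with a given encoding and record where it came from) can be sketched with the standard library alone. The function load_text_with_path and the "path" metadata key below are hypothetical; this page does not state which metadata key medkit actually uses.

```python
import tempfile
from pathlib import Path


def load_text_with_path(path, encoding="utf-8"):
    """Read a text file and keep its path next to the text."""
    path = Path(path)
    text = path.read_text(encoding=encoding)
    # Assumption: the "path" key is illustrative, not medkit's actual key
    return {"text": text, "metadata": {"path": str(path)}}


# Write a small file to a temporary directory, then load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "note.txt"
    sample.write_text("Patient présente une toux.", encoding="utf-8")
    doc = load_text_with_path(sample)
```

Passing the encoding explicitly matters for clinical text with accented characters, since the platform default is not always UTF-8.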

classmethod from_dir(path: os.PathLike, pattern: str = '*.txt', encoding: str = 'utf-8') → list[typing_extensions.Self]

Create documents from text files in a directory.

Parameters:
path : Path

Path of the directory containing text files.

pattern : str

Glob pattern to match text files in path.

encoding : str

Text encoding to use.

Returns:
list of TextDocument

Text documents with contents of each file as text
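A glob pattern such as '*.txt' selects files by name inside the directory. The sketch below shows that selection with pathlib; read_texts_from_dir is a hypothetical helper, and sorting the matches for a stable order is an assumption, since this page does not say in what order from_dir returns documents.

```python
import tempfile
from pathlib import Path


def read_texts_from_dir(path, pattern="*.txt", encoding="utf-8"):
    """Return the contents of every file in path that matches pattern."""
    # Assumption: matches are sorted by name to get a deterministic order
    matches = sorted(Path(path).glob(pattern))
    return [match.read_text(encoding=encoding) for match in matches]


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / "a.txt").write_text("first", encoding="utf-8")
    (root / "b.txt").write_text("second", encoding="utf-8")
    (root / "notes.md").write_text("ignored", encoding="utf-8")

    # Only the two .txt files match the default pattern
    texts = read_texts_from_dir(root)
    markdown_texts = read_texts_from_dir(root, pattern="*.md")
```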

get_snippet(segment: medkit.core.text.annotation.Segment, max_extend_length: int) → str

Return a portion of the original text containing the annotation.

Parameters:
segment : Segment

The annotation.

max_extend_length : int

Maximum number of characters to use around the annotation.

Returns:
str

A portion of the text around the annotation
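One plausible reading of max_extend_length is a total budget of context characters shared between both sides of the annotation. The sketch below follows that reading with plain character offsets. snippet_around is a hypothetical helper, and the way the budget is split (half before, the rest after) is an assumption rather than medkit's documented behaviour.

```python
def snippet_around(text: str, start: int, end: int, max_extend_length: int) -> str:
    """Return text[start:end] plus up to max_extend_length context characters."""
    # Assumption: spend up to half of the budget before the annotation
    snippet_start = max(start - max_extend_length // 2, 0)
    # Whatever was not used before the annotation is available after it
    remaining = max_extend_length - (start - snippet_start)
    snippet_end = min(end + remaining, len(text))
    return text[snippet_start:snippet_end]


full_text = "The patient has severe asthma and takes salbutamol daily."

# "asthma" occupies characters 23 to 29 (end exclusive)
snippet = snippet_around(full_text, start=23, end=29, max_extend_length=14)

# Near the start of the text, the unused left budget moves to the right
start_snippet = snippet_around(full_text, start=0, end=3, max_extend_length=10)
```

Clamping to 0 and to len(text) keeps the snippet valid when the annotation sits near either end of the document.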

medkit.io._common.logger

medkit.io._common.get_anns_by_type(medkit_doc: medkit.core.text.TextDocument, anns_labels: list[str] | None = None) → dict[str, medkit.core.text.TextAnnotation]

Filter annotations by labels and return a dictionary by type of annotation.

Parameters:
medkit_doc : TextDocument

Text document with annotations.

anns_labels : list of str, optional

Labels to filter annotations. If not provided, all annotations will be in the dictionary.

Returns:
Dict[str, TextAnnotation]

Annotations grouped by type, under the keys 'entities', 'relations', and 'segments'.
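The filtering and grouping described above can be sketched with small stand-in classes. DemoSegment, DemoEntity, DemoRelation and group_anns_by_type are hypothetical names that only mirror the class hierarchy on this page (Entity derives from Segment); this is not medkit's implementation, and storing a list under each key is an assumption.

```python
class DemoSegment:
    """Hypothetical stand-in for a text segment."""

    def __init__(self, label):
        self.label = label


class DemoEntity(DemoSegment):
    """Hypothetical stand-in for an entity, which is also a segment."""


class DemoRelation:
    """Hypothetical stand-in for a relation."""

    def __init__(self, label):
        self.label = label


def group_anns_by_type(anns, anns_labels=None):
    """Filter annotations by label, then group them by annotation type."""
    grouped = {"entities": [], "relations": [], "segments": []}
    for ann in anns:
        # With no labels given, every annotation is kept
        if anns_labels is not None and ann.label not in anns_labels:
            continue
        # Test the subclass first: an entity would also match DemoSegment
        if isinstance(ann, DemoEntity):
            grouped["entities"].append(ann)
        elif isinstance(ann, DemoSegment):
            grouped["segments"].append(ann)
        elif isinstance(ann, DemoRelation):
            grouped["relations"].append(ann)
    return grouped


anns = [DemoSegment("SENTENCE"), DemoEntity("DISEASE"), DemoRelation("treats")]
everything = group_anns_by_type(anns)
only_disease = group_anns_by_type(anns, anns_labels=["DISEASE"])
```

The order of the isinstance checks matters because of the inheritance: checking the segment type first would place entities under 'segments'.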