medkit.io._common
Attributes
- logger
Classes
- Entity: Text entity referencing part of a TextDocument.
- Relation: Relation between two text entities.
- Segment: Text segment referencing part of a TextDocument.
- TextAnnotation: Base abstract class for all text annotations.
- TextDocument: Document holding text annotations.
Functions
- get_anns_by_type: Filter annotations by labels and return a dictionary by type of annotation.
Module Contents
- class medkit.io._common.Entity(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.text.entity_attribute_container.EntityAttributeContainer] = EntityAttributeContainer)
Bases: Segment
Text entity referencing part of a TextDocument.
- Attributes:
- uid : str
The entity identifier.
- label : str
The label for this entity (e.g., DISEASE).
- text : str
Text of the entity.
- spans : list of AnySpan
List of spans indicating which parts of the entity text correspond to which parts of the document's full text.
- attrs : EntityAttributeContainer
Attributes of the entity. Stored in an EntityAttributeContainer but can be passed as a list at init.
- metadata : dict of str to Any
The metadata of the entity.
- keys : set of str
Pipeline output keys to which the entity belongs.
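To illustrate how an entity's `text` relates to its `spans`, here is a minimal stdlib-only sketch. It does not use medkit: `SimpleSpan`, `SimpleEntity` and `covered_text` are hypothetical stand-ins, and it assumes a single contiguous span whose half-open `[start, end)` offsets index directly into the document's full text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SimpleSpan:
    # Hypothetical stand-in for a span: half-open [start, end) character offsets
    start: int
    end: int


@dataclass
class SimpleEntity:
    # Hypothetical stand-in for an entity: a labeled piece of document text
    label: str
    text: str
    spans: list[SimpleSpan]
    metadata: dict = field(default_factory=dict)


def covered_text(doc_text: str, spans: list[SimpleSpan]) -> str:
    # Concatenate the document slices covered by each span (no separator added)
    return "".join(doc_text[s.start : s.end] for s in spans)


doc_text = "The patient has asthma and diabetes."
start = doc_text.index("asthma")
entity = SimpleEntity(
    label="DISEASE",
    text="asthma",
    spans=[SimpleSpan(start, start + len("asthma"))],
)

# For a contiguous entity, the spans point back to exactly the entity text
assert covered_text(doc_text, entity.spans) == entity.text
print(entity.label, entity.spans[0])
```

Discontinuous entities would carry several spans; how medkit joins their text is not covered by this sketch.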
- class medkit.io._common.Relation(label: str, source_id: str, target_id: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)
Bases: TextAnnotation
Relation between two text entities.
- Attributes:
- uid : str
The identifier of the relation.
- label : str
The relation label.
- source_id : str
The identifier of the source entity of the relation.
- target_id : str
The identifier of the target entity of the relation.
- attrs : AttributeContainer
The attributes of the relation.
- metadata : dict of str to Any
The metadata of the relation.
- keys : set of str
Pipeline output keys to which the relation belongs.
- source_id: str
- target_id: str
- to_dict() -> dict[str, Any]
- classmethod from_dict(relation_dict: dict[str, Any]) -> typing_extensions.Self
Create a Relation from a dict.
- Parameters:
- relation_dict : dict of str to Any
A dictionary from a serialized relation, as generated by to_dict().
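Because a relation stores only `source_id` and `target_id`, a consumer has to look its endpoints up by identifier. The stdlib-only sketch below shows that idea; `SimpleEntity`, `SimpleRelation` and `resolve` are hypothetical, and a plain dict keyed by uid stands in for whatever lookup the document's annotation container provides.

```python
from dataclasses import dataclass


@dataclass
class SimpleEntity:
    # Hypothetical stand-in for an entity; only uid, label and text matter here
    uid: str
    label: str
    text: str


@dataclass
class SimpleRelation:
    # Hypothetical stand-in for a relation: it stores entity identifiers, not entity objects
    uid: str
    label: str
    source_id: str
    target_id: str


def resolve(
    relation: SimpleRelation, entities_by_uid: dict[str, SimpleEntity]
) -> tuple[SimpleEntity, SimpleEntity]:
    # Look up both endpoints; a KeyError means the relation points to an unknown entity
    return entities_by_uid[relation.source_id], entities_by_uid[relation.target_id]


drug = SimpleEntity(uid="e1", label="DRUG", text="aspirin")
dose = SimpleEntity(uid="e2", label="DOSE", text="100 mg")
entities = {e.uid: e for e in (drug, dose)}

rel = SimpleRelation(uid="r1", label="has_dose", source_id="e1", target_id="e2")
source, target = resolve(rel, entities)
print(f"{source.text} --{rel.label}--> {target.text}")
```

Storing identifiers rather than object references keeps a serialized relation small and independent of the entities' own serialization.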
- class medkit.io._common.Segment(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)
Bases: TextAnnotation
Text segment referencing part of a TextDocument.
- Attributes:
- uid : str
The segment identifier.
- label : str
The label for this segment (e.g., SENTENCE).
- text : str
Text of the segment.
- spans : list of AnySpan
List of spans indicating which parts of the segment text correspond to which parts of the document's full text.
- attrs : AttributeContainer
Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.
- metadata : dict of str to Any
The metadata of the segment.
- keys : set of str
Pipeline output keys to which the segment belongs.
- spans: list[medkit.core.text.span.AnySpan]
- text: str
- to_dict() -> dict[str, Any]
- classmethod from_dict(segment_dict: dict[str, Any]) -> typing_extensions.Self
Create a Segment from a dict.
- Parameters:
- segment_dict : dict of str to Any
A dictionary from a serialized segment, as generated by to_dict().
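The to_dict()/from_dict() pair is meant to round-trip. The sketch below shows that contract on a hypothetical `SimpleSegment` reduced to the documented fields; the dictionary layout (plain keys, spans as `[start, end]` lists) is an assumption for illustration, not medkit's actual serialized format.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleSegment:
    # Hypothetical stand-in for a segment, reduced to the fields documented above
    uid: str
    label: str
    text: str
    spans: list[tuple[int, int]]
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        # Serialize to JSON-friendly types; spans become [start, end] lists
        return {
            "uid": self.uid,
            "label": self.label,
            "text": self.text,
            "spans": [[start, end] for start, end in self.spans],
            "metadata": dict(self.metadata),
        }

    @classmethod
    def from_dict(cls, segment_dict: dict[str, Any]) -> "SimpleSegment":
        # Rebuild an equivalent object from the output of to_dict()
        return cls(
            uid=segment_dict["uid"],
            label=segment_dict["label"],
            text=segment_dict["text"],
            spans=[(start, end) for start, end in segment_dict["spans"]],
            metadata=dict(segment_dict["metadata"]),
        )


seg = SimpleSegment(uid="s1", label="SENTENCE", text="Hello world.", spans=[(0, 12)])
# A JSON round trip turns tuples into lists; from_dict() turns them back into tuples
restored = SimpleSegment.from_dict(json.loads(json.dumps(seg.to_dict())))
assert restored == seg
```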
- class medkit.io._common.TextAnnotation(label: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)
Bases: abc.ABC, medkit.core.dict_conv.SubclassMapping
Base abstract class for all text annotations.
- Attributes:
- uid : str
Unique identifier of the annotation.
- label : str
The label for this annotation (e.g., SENTENCE).
- attrs : AttributeContainer
Attributes of the annotation. Stored in an AttributeContainer but can be passed as a list at init.
- metadata : dict of str to Any
The metadata of the annotation.
- keys : set of str
Pipeline output keys to which the annotation belongs.
- uid: str
- label: str
- metadata: dict[str, Any]
- keys: set[str]
- classmethod __init_subclass__()
- classmethod from_dict(ann_dict: dict[str, Any]) -> typing_extensions.Self
- abstract to_dict() -> dict[str, Any]
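The SubclassMapping base and the `__init_subclass__` hook suggest a registry pattern, in which calling from_dict() on the base class rebuilds the right concrete subclass. The sketch below shows that general pattern in plain Python; the registry layout and the `_class_name` key are hypothetical choices for illustration, not medkit's actual implementation or serialized format.

```python
import abc
from typing import Any, ClassVar


class SimpleAnnotation(abc.ABC):
    # Hypothetical registry of concrete subclasses, filled in by __init_subclass__
    _subclasses: ClassVar[dict[str, type["SimpleAnnotation"]]] = {}

    def __init__(self, label: str) -> None:
        self.label = label

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Runs once for every subclass definition, registering it by class name
        super().__init_subclass__(**kwargs)
        SimpleAnnotation._subclasses[cls.__name__] = cls

    @abc.abstractmethod
    def to_dict(self) -> dict[str, Any]:
        ...

    @classmethod
    def from_dict(cls, ann_dict: dict[str, Any]) -> "SimpleAnnotation":
        # Dispatch on the hypothetical "_class_name" key written by to_dict()
        subclass = SimpleAnnotation._subclasses[ann_dict["_class_name"]]
        return subclass(label=ann_dict["label"])


class SimpleSegment(SimpleAnnotation):
    def to_dict(self) -> dict[str, Any]:
        # Record the concrete class so the base-class from_dict() can dispatch
        return {"_class_name": type(self).__name__, "label": self.label}


class SimpleEntity(SimpleSegment):
    pass


ann = SimpleAnnotation.from_dict(SimpleEntity("DISEASE").to_dict())
print(type(ann).__name__, ann.label)
```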
- class medkit.io._common.TextDocument(text: str, anns: Sequence[medkit.core.text.annotation.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)
Bases: medkit.core.dict_conv.SubclassMapping
Document holding text annotations.
Annotations must be subclasses of TextAnnotation.
Examples
>>> doc = TextDocument(text="hello")
>>> raw_text = doc.anns.get(label=TextDocument.RAW_LABEL)[0]
- Attributes:
- uid : str
Unique identifier of the document.
- text : str
Full document text.
- anns : TextAnnotationContainer
Annotations of the document. Stored in a TextAnnotationContainer but can be passed as a list at init.
- attrs : AttributeContainer
Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init.
- metadata : dict of str to Any
Document metadata.
- raw_segment : Segment
Auto-generated segment containing the full unprocessed document text. To get the raw text as an annotation to pass to processing operations, retrieve it with doc.anns.get(label=TextDocument.RAW_LABEL)[0], as in the example above.
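The raw segment idea can be sketched without medkit: the document builds, at creation time, one segment labeled 'RAW_TEXT' whose single span covers the whole text, and a label lookup returns it. `SimpleDocument`, `SimpleSegment` and `get_anns` are hypothetical; in particular, keeping the raw segment outside the regular annotation list is an assumption of this sketch.

```python
from dataclasses import dataclass, field

RAW_LABEL = "RAW_TEXT"  # mirrors the documented RAW_LABEL value


@dataclass
class SimpleSegment:
    # Hypothetical stand-in for a segment
    label: str
    text: str
    spans: list[tuple[int, int]]


@dataclass
class SimpleDocument:
    # Hypothetical stand-in for a text document with an auto-generated raw segment
    text: str
    anns: list[SimpleSegment] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The raw segment covers the whole, unprocessed text with a single (0, len) span
        self.raw_segment = SimpleSegment(RAW_LABEL, self.text, [(0, len(self.text))])

    def get_anns(self, label: str) -> list[SimpleSegment]:
        # Assumption: RAW_LABEL lookups return the raw segment; other labels search anns
        if label == RAW_LABEL:
            return [self.raw_segment]
        return [ann for ann in self.anns if ann.label == label]


doc = SimpleDocument(text="hello")
raw_text = doc.get_anns(label=RAW_LABEL)[0]
```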
- RAW_LABEL: ClassVar[str] = 'RAW_TEXT'
- uid: str
- metadata: dict[str, Any]
- raw_segment: medkit.core.text.annotation.Segment
- classmethod _generate_raw_segment(text: str, doc_id: str) -> medkit.core.text.annotation.Segment
- property text: str
- classmethod __init_subclass__()
- to_dict(with_anns: bool = True) -> dict[str, Any]
- classmethod from_dict(doc_dict: dict[str, Any]) -> typing_extensions.Self
Create a TextDocument from a dict.
- Parameters:
- doc_dict : dict of str to Any
A dictionary from a serialized TextDocument, as generated by to_dict().
- classmethod from_file(path: os.PathLike, encoding: str = 'utf-8') -> typing_extensions.Self
Create a document from a text file.
- Parameters:
- path : Path
Path of the text file.
- encoding : str, default='utf-8'
Text encoding to use.
- Returns:
- TextDocument
Text document with contents of path as text. The file path is included in the document metadata.
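What from_file() does can be approximated with pathlib alone: read the whole file with the given encoding and keep the file path in the metadata. In this sketch `document_from_file` is a hypothetical helper returning a plain dict, and the metadata key `file_path` is an assumption; the key medkit actually uses is not stated here.

```python
import tempfile
from pathlib import Path
from typing import Any


def document_from_file(path: Path, encoding: str = "utf-8") -> dict[str, Any]:
    # Hypothetical helper: read the whole file and keep its path in the metadata.
    # The metadata key "file_path" is an assumption made for this sketch.
    path = Path(path)
    text = path.read_text(encoding=encoding)
    return {"text": text, "metadata": {"file_path": str(path.absolute())}}


with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"
    note.write_text("Patient reports mild headache.", encoding="utf-8")
    doc = document_from_file(note)
```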
- classmethod from_dir(path: os.PathLike, pattern: str = '*.txt', encoding: str = 'utf-8') -> list[typing_extensions.Self]
Create documents from text files in a directory.
- Parameters:
- path : Path
Path of the directory containing text files.
- pattern : str, default='*.txt'
Glob pattern to match text files in path.
- encoding : str, default='utf-8'
Text encoding to use.
- Returns:
- list of TextDocument
Text documents with the contents of each file as text.
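The directory variant amounts to one document per file matching the glob pattern. The sketch below uses `pathlib.Path.glob`, which is non-recursive for a pattern like '*.txt'; `documents_from_dir` is hypothetical, and sorting the matches is an assumption added here to get a deterministic order.

```python
import tempfile
from pathlib import Path


def documents_from_dir(
    path: Path, pattern: str = "*.txt", encoding: str = "utf-8"
) -> list[dict[str, str]]:
    # Hypothetical helper: one document per file matching the glob pattern.
    # Sorting is an assumption made here for a deterministic order.
    files = sorted(Path(path).glob(pattern))
    return [{"name": f.name, "text": f.read_text(encoding=encoding)} for f in files]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.txt").write_text("second note", encoding="utf-8")
    (root / "a.txt").write_text("first note", encoding="utf-8")
    (root / "readme.md").write_text("not a note", encoding="utf-8")  # skipped by "*.txt"
    docs = documents_from_dir(root)
```

A recursive search would need a pattern such as '**/*.txt'.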
- get_snippet(segment: medkit.core.text.annotation.Segment, max_extend_length: int) -> str
Return a portion of the original text containing the annotation.
- Parameters:
- segment : Segment
The annotation.
- max_extend_length : int
Maximum number of characters to use around the annotation.
- Returns:
- str
A portion of the text around the annotation.
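The documentation does not say how max_extend_length is split between the two sides of the annotation. The sketch below shows one plausible reading, with the annotation given as plain start/end offsets (first span start, last span end): up to half of the budget goes before the annotation, whatever remains goes after it, and both ends are clamped to the text. `get_snippet_sketch` is hypothetical and may differ from medkit's exact behavior.

```python
def get_snippet_sketch(text: str, start: int, end: int, max_extend_length: int) -> str:
    # Hypothetical sketch: spend up to half of the budget before the annotation,
    # give whatever is left to the text after it, and clamp to the document bounds.
    left = max(start - max_extend_length // 2, 0)
    remaining = max_extend_length - (start - left)
    right = min(end + remaining, len(text))
    return text[left:right]


text = "The patient was given aspirin for chest pain."
start = text.index("aspirin")
snippet = get_snippet_sketch(text, start, start + len("aspirin"), max_extend_length=10)
```

Near the start of the text the unused left budget shifts to the right side, so the snippet still uses up to max_extend_length extra characters.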
- medkit.io._common.logger
- medkit.io._common.get_anns_by_type(medkit_doc: medkit.core.text.TextDocument, anns_labels: list[str] | None = None) -> dict[str, medkit.core.text.TextAnnotation]
Filter annotations by labels and return a dictionary by type of annotation.
- Parameters:
- medkit_doc : TextDocument
Text document with annotations.
- anns_labels : list of str, optional
Labels to filter annotations. If not provided, all annotations are included in the dictionary.
- Returns:
- Dict[str, TextAnnotation]
Annotations grouped by type under the keys 'entities', 'relations', and 'segments'.
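The grouping logic can be sketched with plain classes. Because an entity is documented as a subclass of Segment, the isinstance checks must test the most specific class first, or every entity would land under 'segments'. `SimpleSegment`, `SimpleEntity`, `SimpleRelation` and `group_anns_by_type` are hypothetical, and collecting each group into a list is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleSegment:
    label: str


@dataclass
class SimpleEntity(SimpleSegment):
    # Entities are segments too, so they must be matched before segments below
    pass


@dataclass
class SimpleRelation:
    label: str


def group_anns_by_type(
    annotations: list, anns_labels: Optional[list[str]] = None
) -> dict[str, list]:
    if anns_labels is not None:
        # Keep only annotations whose label was requested
        annotations = [ann for ann in annotations if ann.label in anns_labels]
    by_type: dict[str, list] = {"entities": [], "relations": [], "segments": []}
    for ann in annotations:
        # Check the most specific class first: an entity is also a segment
        if isinstance(ann, SimpleEntity):
            by_type["entities"].append(ann)
        elif isinstance(ann, SimpleRelation):
            by_type["relations"].append(ann)
        elif isinstance(ann, SimpleSegment):
            by_type["segments"].append(ann)
    return by_type


anns = [SimpleSegment("SENTENCE"), SimpleEntity("DISEASE"), SimpleRelation("treats")]
all_by_type = group_anns_by_type(anns)
diseases_only = group_anns_by_type(anns, anns_labels=["DISEASE"])
```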