medkit.core.text.document
Classes:
TextDocument – Document holding text annotations
- class TextDocument(text, anns=None, metadata=None, uid=None)
Document holding text annotations
Annotations must be subclasses of TextAnnotation.
- Variables
uid (str) – Unique identifier of the document.
text – Full document text.
anns (medkit.core.text.annotation_container.TextAnnotationContainer) – Annotations of the document. Stored in a TextAnnotationContainer but can be passed as a list at init.
metadata (Dict[str, Any]) – Document metadata.
raw_segment (medkit.core.text.annotation.Segment) – Auto-generated segment containing the full unprocessed document text. To get the raw text as an annotation to pass to processing operations:
>>> doc = TextDocument(text="hello")
>>> raw_text = doc.anns.get(label=TextDocument.RAW_LABEL)[0]
Methods:
from_dict(doc_dict) – Creates a TextDocument from a dict
get_snippet(segment, max_extend_length) – Return a portion of the original text containing the annotation
- classmethod from_dict(doc_dict)
Creates a TextDocument from a dict
- Parameters
doc_dict (dict) – A dictionary from a serialized TextDocument as generated by to_dict()
- Return type
Self
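The round trip that from_dict() completes can be sketched without medkit installed. The block below is a minimal illustration only: MiniDocument is a hypothetical stand-in, not TextDocument, and the dict keys ("uid", "text", "metadata") are assumptions made for the example rather than the documented serialization format.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MiniDocument:
    """Hypothetical stand-in for illustration; not medkit's TextDocument."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    # A fresh identifier is generated when none is supplied.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_dict(self) -> Dict[str, Any]:
        # Key names are assumed for this sketch.
        return {"uid": self.uid, "text": self.text, "metadata": dict(self.metadata)}

    @classmethod
    def from_dict(cls, doc_dict: Dict[str, Any]) -> "MiniDocument":
        # Rebuild the instance from a dict produced by to_dict(),
        # reusing the stored uid instead of generating a new one.
        return cls(
            text=doc_dict["text"],
            metadata=dict(doc_dict["metadata"]),
            uid=doc_dict["uid"],
        )


doc = MiniDocument(text="hello", metadata={"source": "demo"})
restored = MiniDocument.from_dict(doc.to_dict())
```

In this sketch the restored object is a distinct instance that compares equal to the original and keeps the same uid.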
- get_snippet(segment, max_extend_length)
Return a portion of the original text containing the annotation
- Parameters
segment (Segment) – The annotation
max_extend_length (int) – Maximum number of characters to use around the annotation
- Return type
str
- Returns
str – A portion of the text around the annotation
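To make the idea of a snippet concrete, here is a standalone sketch of context-window extraction around a character range. It does not call medkit: snippet_around is a hypothetical helper, it takes plain start/end offsets instead of a Segment, and it assumes max_extend_length is a total budget with half spent before the annotation and the remainder after it. The library's actual splitting rule may differ.

```python
def snippet_around(text: str, start: int, end: int, max_extend_length: int) -> str:
    """Return text[start:end] plus at most max_extend_length extra characters.

    Hypothetical helper for illustration; the half-before / rest-after
    split is an assumption of this sketch.
    """
    # Spend up to half of the budget before the annotation, clamped at 0.
    start_extended = max(start - max_extend_length // 2, 0)
    # Whatever was not used on the left may be used on the right.
    remaining = max_extend_length - (start - start_extended)
    end_extended = min(end + remaining, len(text))
    return text[start_extended:end_extended]


text = "The patient has asthma and a mild fever."
start = text.index("asthma")
end = start + len("asthma")

snippet = snippet_around(text, start, end, max_extend_length=10)
# snippet == " has asthma and "
```

When the annotation sits at the very beginning of the text, the unused left-hand budget moves to the right-hand side, so the snippet still gains up to max_extend_length characters of context.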