medkit.text.postprocessing
Classes:
- AttributeDuplicator: Annotator to copy attributes from a source segment to its nested segments.
- DocumentSplitter: Split text documents using their segments as a reference.
Functions:
- compute_nested_segments: Return source segments aligned with their nested segments.
- Filter a list of entities and remove overlaps.
- class AttributeDuplicator(attr_labels, uid=None)
Annotator to copy attributes from a source segment to its nested segments. For each attribute to be duplicated, a new attribute is created in the nested segment.
Instantiate the attribute duplicator.
- Parameters:
attr_labels (list of str) – Labels of the attributes to copy
uid (str, optional) – Identifier of the annotator
Methods:
run(source_segments, target_segments): Add attributes from source segments to all nested segments.
set_prov_tracer(prov_tracer): Enable provenance tracing.
Attributes:
description: Contains all the operation init parameters.
- run(source_segments, target_segments)
Add attributes from source segments to all nested segments. The nested segments are chosen among the target_segments based on their spans: a target segment counts as nested when it is fully contained in a source segment.
- property description: OperationDescription
Contains all the operation init parameters.
- Return type:
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters:
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.
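The duplication rule described above can be sketched in plain Python. This is an illustration, not medkit's implementation: `Seg` and `duplicate_attributes` are hypothetical names, each segment is assumed to have a single contiguous span, and attributes are modelled as a plain dict rather than medkit attribute objects.

```python
from dataclasses import dataclass, field


@dataclass
class Seg:
    """Hypothetical stand-in for a medkit segment: one [start, end) span plus attributes."""
    label: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


def duplicate_attributes(source_segments, target_segments, attr_labels):
    """Copy the attributes listed in attr_labels from each source segment
    to every target segment whose span lies fully inside it."""
    for source in source_segments:
        for target in target_segments:
            # Nested means fully contained; partial overlaps are skipped
            if not (source.start <= target.start and target.end <= source.end):
                continue
            for label in attr_labels:
                if label in source.attrs:
                    # A new attribute entry is created on the nested segment
                    target.attrs[label] = source.attrs[label]


sentence = Seg("sentence", 0, 30, {"is_negated": True, "section": "history"})
inside = Seg("disease", 10, 18)
straddling = Seg("disease", 25, 35)
duplicate_attributes([sentence], [inside, straddling], ["is_negated"])
print(inside.attrs)      # {'is_negated': True}
print(straddling.attrs)  # {}
```

Restricting attr_labels to the labels you need keeps unrelated source attributes (here `section`) off the nested segments.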
- compute_nested_segments(source_segments, target_segments)
Return source segments aligned with their nested segments. Only nested segments fully contained in a source segment are returned.
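A minimal sketch of the containment test, assuming each segment has one contiguous span and that the result pairs every source segment with its list of nested targets. `Seg` and `nested_segments` are hypothetical names, and the exact return shape of the real function is not stated here, so treat the pairing as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seg:
    """Hypothetical segment with a single contiguous [start, end) span."""
    label: str
    start: int
    end: int


def nested_segments(source_segments, target_segments):
    """Pair each source segment with the target segments fully inside its span.

    Targets that only partly overlap a source segment are excluded; a source
    segment with no nested targets is paired with an empty list.
    """
    pairs = []
    for source in source_segments:
        children = [
            t for t in target_segments
            if source.start <= t.start and t.end <= source.end
        ]
        pairs.append((source, children))
    return pairs


sentences = [Seg("sentence", 0, 20), Seg("sentence", 21, 40)]
entities = [Seg("disease", 5, 12), Seg("drug", 25, 30), Seg("drug", 18, 24)]
for source, children in nested_segments(sentences, entities):
    print(source.start, source.end, [c.label for c in children])
# 0 20 ['disease']
# 21 40 ['drug']
```

The `drug` entity at 18-24 straddles both sentences, so it is nested in neither. The double loop is O(n·m); for long documents, an interval index over the target spans would avoid scanning every target for every source.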
- class DocumentSplitter(segment_label, entity_labels=None, attr_labels=None, relation_labels=None, name=None, uid=None)
Split text documents using their segments as a reference.
The resulting ‘mini-documents’ contain the entities belonging to each segment along with their attributes.
This operation can be used to create datasets from medkit text documents.
Instantiate the document splitter
- Parameters:
segment_label (str) – Label of the segments to use as references for the splitter
entity_labels (list of str, optional) – Labels of entities to be included in the mini documents. If None, all entities from the document will be included.
attr_labels (list of str, optional) – Labels of the attributes to be included in the new annotations. If None, all attributes will be included.
relation_labels (list of str, optional) – Labels of relations to be included in the mini documents. If None, all relations will be included.
name (str, optional) – Name describing the splitter (defaults to the class name).
uid (str, optional) – Identifier of the operation
Methods:
run(docs): Split docs into mini documents.
set_prov_tracer(prov_tracer): Enable provenance tracing.
Attributes:
description: Contains all the operation init parameters.
- run(docs)
Split docs into mini documents
- Parameters:
docs (list of TextDocument) – List of text documents to split
- Return type:
list[TextDocument]
- Returns:
list of TextDocument – List of documents created from the selected segments
- property description: OperationDescription
Contains all the operation init parameters.
- Return type:
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters:
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.
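The splitting described above can be sketched in plain Python for a single document. It is an illustration, not medkit's code: `Ann`, `Doc` and `split_document` are hypothetical names, each annotation is assumed to have one contiguous span, relations and provenance are left out, and re-anchoring entity spans on the segment start is an assumption about how a mini-document indexes its own text.

```python
from dataclasses import dataclass, field


@dataclass
class Ann:
    """Hypothetical annotation: a label, one [start, end) span, and attributes."""
    label: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)


@dataclass
class Doc:
    """Hypothetical minimal text document."""
    text: str
    anns: list = field(default_factory=list)


def split_document(doc, segment_label, entity_labels=None, attr_labels=None):
    """Return one mini-document per annotation labelled segment_label.

    Entities fully inside a segment are copied with spans shifted so they
    index into the mini-document text. None means "keep all".
    """
    segments = [a for a in doc.anns if a.label == segment_label]
    entities = [a for a in doc.anns if a.label != segment_label]
    mini_docs = []
    for seg in segments:
        mini = Doc(text=doc.text[seg.start:seg.end])
        for ent in entities:
            if entity_labels is not None and ent.label not in entity_labels:
                continue
            if not (seg.start <= ent.start and ent.end <= seg.end):
                continue
            attrs = {k: v for k, v in ent.attrs.items()
                     if attr_labels is None or k in attr_labels}
            # Re-anchor the span on the segment start
            mini.anns.append(Ann(ent.label, ent.start - seg.start,
                                 ent.end - seg.start, attrs))
        mini_docs.append(mini)
    return mini_docs


text = "No fever today. Patient takes aspirin."
doc = Doc(text, [
    Ann("sentence", 0, 15),
    Ann("sentence", 16, 38),
    Ann("problem", 3, 8, {"is_negated": True, "confidence": 0.9}),
    Ann("treatment", 30, 37, {"is_negated": False}),
])
for mini in split_document(doc, "sentence", attr_labels=["is_negated"]):
    for ann in mini.anns:
        print(repr(mini.text), mini.text[ann.start:ann.end], ann.attrs)
# 'No fever today.' fever {'is_negated': True}
# 'Patient takes aspirin.' aspirin {'is_negated': False}
```

Applying the function to each element of a list of documents and concatenating the results mirrors the list-in, list-out shape of run(docs).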