medkit.text.postprocessing

APIs

To access these APIs, import them as follows:

from medkit.text.postprocessing import <api_to_import>

Classes:

AttributeDuplicator(attr_labels[, uid])

Annotator to copy attributes from a source segment to its nested segments.

Functions:

compute_nested_segments(source_segments, ...)

Return source segments aligned with their nested segments.

filter_overlapping_entities(entities)

Filter a list of entities and remove overlaps.

class AttributeDuplicator(attr_labels, uid=None)

Annotator to copy attributes from a source segment to its nested segments. For each attribute to be duplicated, a new attribute is created in the nested segment.

Instantiate the attribute duplicator.

Parameters
  • attr_labels (List[str]) – Labels of the attributes to copy

  • uid (str) – Identifier of the annotator

Methods:

run(source_segments, target_segments)

Add attributes from source segments to all nested segments.

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(source_segments, target_segments)

Add attributes from source segments to all nested segments. The nested segments are chosen among the target_segments based on their spans.

Parameters
  • source_segments (List[Segment]) – List of segments with attributes to copy

  • target_segments (List[Segment]) – List of candidate target segments, among which the nested segments are chosen
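The behaviour described above can be illustrated with a minimal, library-free sketch. It does not use medkit itself: `ToySegment` and `duplicate_attrs` are hypothetical names introduced only for this example, and the containment test is an assumption about what "nested" means (the library may select nested segments differently, for instance by overlap).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySegment:
    """Hypothetical stand-in for a segment (not the medkit Segment class)."""

    label: str
    start: int
    end: int  # end offset, exclusive
    attrs: Dict[str, object] = field(default_factory=dict)


def duplicate_attrs(
    sources: List[ToySegment],
    targets: List[ToySegment],
    attr_labels: List[str],
) -> None:
    """Copy the selected attributes from each source to the targets it contains."""
    for source in sources:
        for target in targets:
            # Assumption: a target is "nested" if its span lies inside the source span.
            if source.start <= target.start and target.end <= source.end:
                for label in attr_labels:
                    if label in source.attrs:
                        target.attrs[label] = source.attrs[label]


# A sentence carrying two attributes, and two candidate targets.
sentence = ToySegment("sentence", 0, 27, {"is_negated": True, "section": "history"})
inside = ToySegment("disease", 19, 27)
outside = ToySegment("disease", 40, 48)

duplicate_attrs([sentence], [inside, outside], attr_labels=["is_negated"])

print(inside.attrs)   # {'is_negated': True}
print(outside.attrs)  # {}
```

Only the attribute named in `attr_labels` reaches the nested target; the `section` attribute is left on the source, and the target outside the sentence is untouched.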

property description: medkit.core.operation_desc.OperationDescription

Contains all the operation init parameters.

Return type

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

compute_nested_segments(source_segments, target_segments)

Return source segments aligned with their nested segments.

Parameters
  • source_segments (List[Segment]) – List of source segments

  • target_segments (List[Segment]) – List of segments to align

Return type

List[Tuple[Segment, List[Segment]]]

Returns

List[Tuple[Segment, List[Segment]]] – List of aligned segments
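To make the shape of the return value concrete, here is a small sketch that works on plain `(start, end)` tuples rather than medkit segments. The function name `align_nested` is hypothetical, and strict containment is assumed as the nesting criterion; the actual function may use a different rule.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def align_nested(
    sources: List[Span], targets: List[Span]
) -> List[Tuple[Span, List[Span]]]:
    """Pair each source span with the target spans that lie inside it."""
    aligned = []
    for src in sources:
        # Assumption: "nested" means fully contained in the source span.
        children = [t for t in targets if src[0] <= t[0] and t[1] <= src[1]]
        aligned.append((src, children))
    return aligned


sentences = [(0, 30), (31, 60)]
entities = [(4, 11), (19, 27), (35, 42)]

pairs = align_nested(sentences, entities)
for source, nested in pairs:
    print(source, nested)
# (0, 30) [(4, 11), (19, 27)]
# (31, 60) [(35, 42)]
```

Each tuple in the result holds one source and the list of targets aligned with it, mirroring the `List[Tuple[Segment, List[Segment]]]` return type.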

filter_overlapping_entities(entities)

Filter a list of entities and remove overlaps. This function may be useful when creating data for named entity recognition, where each ‘word’ of the text can belong to at most one entity. When an overlap is detected, the longest entity is preferred.

Parameters

entities (List[Entity]) – Entities to filter

Return type

List[Entity]

Returns

List[Entity] – Filtered entities
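The "longest entity is preferred" rule can be sketched on plain `(start, end)` tuples. This is one possible greedy strategy, not necessarily the one the library uses; `keep_longest` is a hypothetical name, and how ties between equally long spans are broken is left unspecified by the description above.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def keep_longest(spans: List[Span]) -> List[Span]:
    """Greedily keep the longest spans and drop any span overlapping a kept one."""
    kept: List[Span] = []
    # Visit the longest spans first so they win over shorter overlapping ones.
    for start, end in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        # Two spans are disjoint if one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


spans = [(0, 8), (0, 17), (20, 25), (23, 25)]
result = keep_longest(spans)
print(result)  # [(0, 17), (20, 25)]
```

Here `(0, 8)` is dropped in favour of the longer `(0, 17)`, and `(23, 25)` is dropped in favour of `(20, 25)`, so no character offset is covered by more than one remaining span.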

Subpackages / Submodules

medkit.text.postprocessing.alignment_utils

medkit.text.postprocessing.attribute_duplicator

medkit.text.postprocessing.overlapping