medkit.core.text.span_utils
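Functions:
- clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3) – Remove small gaps in normalized spans.
- concatenate(texts, all_spans) – Concatenate texts and their associated spans.
- extract(text, spans, ranges) – Extract parts of a text as well as its associated spans.
- insert(text, spans, positions, insertion_texts) – Insert strings into a text and update its associated spans accordingly.
- move(text, spans, range, destination) – Move part of a text to another position, also moving its associated spans.
- normalize_spans(spans) – Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.
- remove(text, spans, ranges) – Remove parts of a text and update its associated spans accordingly.
- replace(text, spans, ranges, replacement_texts) – Replace parts of a text and update its associated spans accordingly.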
- replace(text, spans, ranges, replacement_texts)
Replace parts of a text and update its associated spans accordingly.
- Parameters:
text (str) – The text in which some parts will be replaced
spans (list of AnySpan) – The spans associated with text
ranges (list of tuple of int) – The ranges of the parts that will be replaced (end excluded), sorted in ascending order
replacement_texts (tuple) – The strings to use as replacements (must have the same length as ranges)
- Return type:
tuple[str,list[AnySpan]]
- Returns:
text (str) – The updated text
spans (list of AnySpan) – The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.
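Because ranges must be sorted in ascending order and must not overlap, the text side of replace can be done in a single left-to-right pass. The following is a minimal sketch of that text-only step, not medkit's implementation: the name replace_text is hypothetical, and span bookkeeping is deliberately left out.

```python
from typing import List, Tuple


def replace_text(text: str, ranges: List[Tuple[int, int]], replacement_texts: List[str]) -> str:
    """Hypothetical text-only counterpart of replace(); spans are not tracked."""
    if len(ranges) != len(replacement_texts):
        raise ValueError("ranges and replacement_texts must have the same length")
    # Ranges are expected sorted in ascending order and non-overlapping (end excluded).
    for (_, prev_end), (start, _) in zip(ranges, ranges[1:]):
        if start < prev_end:
            raise ValueError("ranges must be sorted and must not overlap")

    parts = []
    prev_end = 0
    for (start, end), new_text in zip(ranges, replacement_texts):
        parts.append(text[prev_end:start])  # untouched text before this range
        parts.append(new_text)              # replacement for text[start:end]
        prev_end = end
    parts.append(text[prev_end:])  # untouched tail
    return "".join(parts)


print(replace_text("Hello, my name is John Doe.", [(0, 5), (18, 22)], ["Hi", "Jane"]))
# Hi, my name is Jane Doe.
```

Judging from the normalize_spans description, the real function additionally records each replaced range in the returned spans (as a ModifiedSpan pointing at the spans it replaced); this sketch does not model that part.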
- remove(text, spans, ranges)
Remove parts of a text and update its associated spans accordingly.
- Parameters:
text (str) – The text in which some parts will be removed
spans (list of AnySpan) – The spans associated with text
ranges (list of tuple of int) – The ranges of the parts that will be removed (end excluded), sorted in ascending order
- Return type:
tuple[str,list[AnySpan]]
- Returns:
text (str) – The updated text
spans (list of AnySpan) – The spans associated with the updated text
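One way to picture how spans follow the text is to treat them as laid end to end over it, so that their lengths add up to len(text). Under that assumption, and restricting to plain spans (no ModifiedSpan), removing ranges means keeping the span pieces that cover the characters outside those ranges. This is a minimal sketch, not medkit's code: Span is a stand-in NamedTuple, and slice_spans and remove_sketch are hypothetical helpers.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """Stand-in for a plain span: offsets in the original text."""
    start: int
    end: int


def slice_spans(spans: List[Span], start: int, end: int) -> List[Span]:
    """Return the pieces of `spans` covering text[start:end].

    Assumes the spans are laid end to end over the text.
    """
    result = []
    offset = 0  # text offset where the current span begins
    for span in spans:
        length = span.end - span.start
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:  # this span overlaps the requested slice
            result.append(Span(span.start + lo - offset, span.start + hi - offset))
        offset += length
    return result


def remove_sketch(text: str, spans: List[Span], ranges: List[Tuple[int, int]]) -> Tuple[str, List[Span]]:
    kept_text: List[str] = []
    kept_spans: List[Span] = []
    prev_end = 0
    # A sentinel empty range at the end makes the loop also keep the tail.
    for start, end in list(ranges) + [(len(text), len(text))]:
        kept_text.append(text[prev_end:start])
        kept_spans.extend(slice_spans(spans, prev_end, start))
        prev_end = end
    return "".join(kept_text), kept_spans


text, spans = remove_sketch("Hello, my name is John Doe.", [Span(0, 27)], [(5, 6), (17, 22)])
print(text)   # Hello my name is Doe.
print(spans)  # [Span(start=0, end=5), Span(start=6, end=17), Span(start=22, end=27)]
```

Adjacent kept pieces are not merged here; merging contiguous spans is what normalize_spans is documented to do.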
- extract(text, spans, ranges)
Extract parts of a text as well as its associated spans
- Parameters:
text (str) – The text to extract parts from
spans (list of AnySpan) – The spans associated with text
ranges (list of tuple of int) – The ranges of the parts to extract (end excluded), sorted in ascending order
- Return type:
tuple[str,list[AnySpan]]
- Returns:
text (str) – The extracted text
spans (list of AnySpan) – The spans associated with the extracted text
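Since extract and remove both take sorted, end-excluded ranges, extracting some ranges keeps exactly the characters that removing their complement would keep. The sketch below shows this relationship on the text side only. It assumes the extracted parts are concatenated with no separator; complement_ranges, extract_text and remove_text are hypothetical helpers, not part of medkit.

```python
from typing import List, Tuple

Range = Tuple[int, int]


def complement_ranges(ranges: List[Range], text_length: int) -> List[Range]:
    """Ranges of [0, text_length) not covered by `ranges` (sorted, non-overlapping)."""
    result = []
    prev_end = 0
    for start, end in ranges:
        if start > prev_end:
            result.append((prev_end, start))
        prev_end = end
    if prev_end < text_length:
        result.append((prev_end, text_length))
    return result


def extract_text(text: str, ranges: List[Range]) -> str:
    # Assumption: extracted parts are joined directly, without a separator.
    return "".join(text[start:end] for start, end in ranges)


def remove_text(text: str, ranges: List[Range]) -> str:
    # Removing `ranges` is the same as extracting everything else.
    return extract_text(text, complement_ranges(ranges, len(text)))


text = "Hello, my name is John Doe."
print(extract_text(text, [(0, 5), (18, 22)]))  # HelloJohn
print(remove_text(text, [(5, 6), (17, 22)]))   # Hello my name is Doe.
```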
- insert(text, spans, positions, insertion_texts)
Insert strings into a text and update its associated spans accordingly.
- Parameters:
text (str) – The text in which some strings will be inserted
spans (list of AnySpan) – The spans associated with text
positions (list of int) – The positions where the strings will be inserted, sorted in ascending order
insertion_texts (list of str) – The strings to insert (must have the same length as positions)
- Return type:
tuple[str,list[AnySpan]]
- Returns:
text (str) – The updated text
spans (list of AnySpan) – The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.
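Inserted characters have no counterpart in the original text. The sketch below assumes they are represented by a span carrying the inserted length and an empty list of replaced spans, in the spirit of the ModifiedSpan that appears in the normalize_spans example. The Span and ModifiedSpan dataclasses, and the split_spans_at and insert_sketch helpers, are hypothetical stand-ins, not medkit's classes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Span:
    """Stand-in for a plain span: a range of the original text."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


@dataclass(frozen=True)
class ModifiedSpan:
    """Stand-in for text that does not map one-to-one to the original text."""
    length: int
    replaced_spans: List[Span] = field(default_factory=list)


AnySpan = Union[Span, ModifiedSpan]


def split_spans_at(spans: List[AnySpan], position: int) -> Tuple[List[AnySpan], List[AnySpan]]:
    """Split spans laid end to end over the text at text offset `position`."""
    before: List[AnySpan] = []
    after: List[AnySpan] = []
    offset = 0
    for span in spans:
        span_end = offset + span.length
        if span_end <= position:
            before.append(span)
        elif offset >= position:
            after.append(span)
        else:
            cut = position - offset  # split point inside this span
            if isinstance(span, Span):
                before.append(Span(span.start, span.start + cut))
                after.append(Span(span.start + cut, span.end))
            else:
                # Assumption: both halves keep referring to the same replaced spans.
                before.append(ModifiedSpan(cut, span.replaced_spans))
                after.append(ModifiedSpan(span.length - cut, span.replaced_spans))
        offset = span_end
    return before, after


def insert_sketch(text: str, spans: List[AnySpan], positions: List[int],
                  insertion_texts: List[str]) -> Tuple[str, List[AnySpan]]:
    if len(positions) != len(insertion_texts):
        raise ValueError("positions and insertion_texts must have the same length")
    # Work from the last position backwards so earlier positions stay valid.
    for position, new_text in reversed(list(zip(positions, insertion_texts))):
        before, after = split_spans_at(spans, position)
        spans = before + [ModifiedSpan(len(new_text), [])] + after
        text = text[:position] + new_text + text[position:]
    return text, spans


text, spans = insert_sketch("Hello, my name is John Doe.", [Span(0, 27)], [5], [" everybody"])
print(text)  # Hello everybody, my name is John Doe.
```

The resulting spans are Span(0, 5), a 10-character ModifiedSpan with no replaced spans, and Span(5, 27), whose lengths again add up to len(text).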
- move(text, spans, range, destination)
Move part of a text to another position, also moving its associated spans
- Parameters:
text (str) – The text in which a part should be moved
spans (list of AnySpan) – The spans associated with the input text
range (tuple of int) – The range of the part to move (end excluded)
destination (int) – The position in text at which to insert the moved part
- Return type:
tuple[str,list[AnySpan]]
- Returns:
text (str) – The updated text
spans (list of AnySpan) – The spans associated with the updated text
Examples
>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
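On the text side, moving a part amounts to cutting text[start:end] and reinserting it at destination, an offset in the input text. This is a minimal sketch under that reading, using a hypothetical move_text helper that does not track spans. Rejecting a destination strictly inside the moved range is an assumption about how that case should be handled.

```python
from typing import Tuple


def move_text(text: str, part: Tuple[int, int], destination: int) -> str:
    """Hypothetical text-only counterpart of move(); spans are not tracked."""
    start, end = part
    if start < destination < end:
        raise ValueError("destination must not fall inside the moved range")
    moved = text[start:end]
    if destination >= end:
        # Moving forward: the text between the range and destination shifts left.
        return text[:start] + text[end:destination] + moved + text[destination:]
    # Moving backward: the text between destination and the range shifts right.
    return text[:destination] + moved + text[destination:start] + text[end:]


text = "Hello, my name is John Doe."
print(move_text(text, (17, 22), len(text) - 1))
# Hello, my name is Doe John.
```

The documented function also moves the associated spans; in the end-to-end picture of spans used above, they are reordered exactly like the text.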
- normalize_spans(spans)
Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.
- Parameters:
spans (list of AnySpan) – The spans associated with a text, including additional spans if insertions or replacements were performed
- Return type:
list[Span]
- Returns:
normalized_spans (list of Span) – Spans in spans normalized as described
Examples
>>> spans = [
...     Span(0, 10),
...     Span(20, 30),
...     ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
... ]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]
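The three steps in the description (substitute, sort, merge) can be sketched as follows. This is an illustration, not medkit's code: Span and ModifiedSpan are hypothetical stand-in dataclasses, and normalize_spans_sketch is a hypothetical name. Merging overlapping or duplicate spans, and not only touching ones, is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Span:
    start: int
    end: int


@dataclass(frozen=True)
class ModifiedSpan:
    length: int
    replaced_spans: List[Span] = field(default_factory=list)


def normalize_spans_sketch(spans: List[Union[Span, ModifiedSpan]]) -> List[Span]:
    # 1. Replace every ModifiedSpan by the original spans it refers to
    #    (an insertion with no replaced spans simply disappears).
    flat: List[Span] = []
    for span in spans:
        if isinstance(span, ModifiedSpan):
            flat.extend(span.replaced_spans)
        else:
            flat.append(span)
    # 2. Sort by position in the original text.
    flat.sort(key=lambda s: (s.start, s.end))
    # 3. Merge contiguous spans; overlapping or duplicate ones are merged too (assumption).
    merged: List[Span] = []
    for span in flat:
        if merged and span.start <= merged[-1].end:
            last = merged.pop()
            merged.append(Span(last.start, max(last.end, span.end)))
        else:
            merged.append(span)
    return merged


spans = [Span(0, 10), Span(20, 30), ModifiedSpan(8, replaced_spans=[Span(30, 36)])]
print(normalize_spans_sketch(spans))
# [Span(start=0, end=10), Span(start=20, end=36)]
```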
- concatenate(texts, all_spans)
Concatenate texts and their associated spans.
- Return type:
tuple[str,list[AnySpan]]
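The entry above does not document its parameters. Assuming texts is a list of strings and all_spans holds one list of spans per text, each laid end to end over its text, concatenation reduces to joining the strings and chaining the span lists: spans point into the original text, so their order and lengths already line up with the joined result. A minimal sketch with a hypothetical concatenate_sketch and a stand-in Span:

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # offsets in the original text
    end: int


def concatenate_sketch(texts: List[str], all_spans: List[List[Span]]) -> Tuple[str, List[Span]]:
    if len(texts) != len(all_spans):
        raise ValueError("expected one list of spans per text")
    text = "".join(texts)
    # No offset shifting is needed: spans refer to the original text,
    # and chaining them keeps them aligned with the joined text.
    spans = [span for span_list in all_spans for span in span_list]
    return text, spans


text, spans = concatenate_sketch(["heart", "failure"], [[Span(0, 5)], [Span(6, 13)]])
print(text)  # heartfailure
```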
- clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3)
Remove small gaps in normalized spans.
This is useful for converting non-contiguous entity spans, separated by small gaps that contain only whitespace or a few meaningless characters (for instance left by clean-up preprocessing or translation), into a single larger span. Gaps with fewer than max_gap_length characters are removed by merging the spans before and after them.
- Parameters:
spans (list of Span) – The normalized spans in which to remove gaps
text (str) – The text associated with spans
max_gap_length (int, default=3) – Maximum number of characters in a gap, counted after stripping leading and trailing whitespace.
Examples
>>> text = "heart failure"
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]
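The merging rule can be sketched as a single pass over sorted, normalized spans, where each gap is measured in the original text. This is a minimal illustration with a hypothetical clean_up_gaps_sketch and a stand-in Span. Treating the threshold as strict ("fewer than max_gap_length") follows the description above and is an assumption about the exact boundary.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int


def clean_up_gaps_sketch(spans: List[Span], text: str, max_gap_length: int = 3) -> List[Span]:
    """Hypothetical re-implementation; `spans` must be normalized (sorted plain spans)."""
    if not spans:
        return []
    cleaned = [spans[0]]
    for span in spans[1:]:
        prev = cleaned[-1]
        gap = text[prev.end:span.start].strip()  # ignore leading/trailing whitespace
        # Assumption: gaps with fewer than max_gap_length characters are merged.
        if len(gap) < max_gap_length:
            cleaned[-1] = Span(prev.start, span.end)
        else:
            cleaned.append(span)
    return cleaned


print(clean_up_gaps_sketch([Span(0, 5), Span(6, 13)], "heart failure"))
# [Span(start=0, end=13)]
print(clean_up_gaps_sketch([Span(0, 5), Span(14, 21)], "heart (acute) failure"))
# [Span(start=0, end=5), Span(start=14, end=21)]
```

In the second call the stripped gap "(acute)" has 7 characters, so the two spans stay separate.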