medkit.core.text.span_utils

Functions:

clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3)

Remove small gaps in normalized spans.

concatenate(texts, all_spans)

Concatenate text and span objects

extract(text, spans, ranges)

Extract parts of a text as well as its associated spans

insert(text, spans, positions, insertion_texts)

Insert strings in text and update its associated spans accordingly

move(text, spans, range, destination)

Move part of a text to another position, also moving its associated spans

normalize_spans(spans)

Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.

remove(text, spans, ranges)

Remove parts of a text and update its associated spans accordingly

replace(text, spans, ranges, replacement_texts)

Replace parts of a text and update its associated spans accordingly

replace(text, spans, ranges, replacement_texts)

Replace parts of a text and update its associated spans accordingly

Parameters:
  • text (str) – The text in which some parts will be replaced

  • spans (list of AnySpan) – The spans associated with text

  • ranges (list of tuple of int) – The ranges of the parts that will be replaced (end excluded), sorted in ascending order

  • replacement_texts (list of str) – The strings to use as replacements (must be the same length as ranges)

Return type:

tuple[str, list[AnySpan]]

Returns:

  • text (str) – The updated text

  • spans (list of AnySpan) – The spans associated with the updated text

Examples

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> ranges = [(0, 5), (18, 22)]
>>> replacements = ["Hi", "Jane"]
>>> text, spans = replace(text, spans, ranges, replacements)
>>> print(text)
Hi, my name is Jane Doe.

remove(text, spans, ranges)

Remove parts of a text and update its associated spans accordingly

Parameters:
  • text (str) – The text in which some parts will be removed

  • spans (list of AnySpan) – The spans associated with text

  • ranges (list of tuple of int) – The ranges of the parts that will be removed (end excluded), sorted in ascending order

Return type:

tuple[str, list[AnySpan]]

Returns:

  • text (str) – The updated text

  • spans (list of AnySpan) – The spans associated with the updated text
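The effect on the text alone can be pictured with plain string slicing. The sketch below is only an illustration: `remove_ranges` is a hypothetical helper, not part of medkit, and it ignores the span bookkeeping that the real function performs. It assumes the ranges are sorted, non-overlapping, and end-excluded, as described in the parameters above.

```python
def remove_ranges(text, ranges):
    """Hypothetical helper: drop each (start, end) range from text, end excluded."""
    kept = []
    cursor = 0
    for start, end in ranges:  # assumed sorted in ascending order, non-overlapping
        kept.append(text[cursor:start])  # keep what lies before the removed part
        cursor = end  # skip the removed part
    kept.append(text[cursor:])  # keep the tail after the last removed part
    return "".join(kept)


text = "Hello, my name is John Doe."
result = remove_ranges(text, [(0, 7), (22, 26)])
print(result)  # my name is John.
```

Walking the ranges in ascending order with a single cursor is what makes the sorting requirement matter: each range is interpreted against the original text, not against the partially edited one.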

extract(text, spans, ranges)

Extract parts of a text as well as its associated spans

Parameters:
  • text (str) – The text to extract parts from

  • spans (list of AnySpan) – The spans associated with text

  • ranges (list of tuple of int) – The ranges of the parts to extract (end excluded), sorted in ascending order

Return type:

tuple[str, list[AnySpan]]

Returns:

  • text (str) – The extracted text

  • spans (list of AnySpan) – The spans associated with the extracted text
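As a rough picture of the text side of this operation, the sketch below slices the requested ranges out of a plain string. `extract_ranges` is a hypothetical helper, not part of medkit, and the sketch assumes that the extracted parts are joined back to back with no separator; the span handling of the real function is not reproduced.

```python
def extract_ranges(text, ranges):
    """Hypothetical helper: keep only the (start, end) ranges of text, end excluded."""
    # Assumption of this sketch: extracted parts are joined without a separator.
    return "".join(text[start:end] for start, end in ranges)


text = "Hello, my name is John Doe."
extracted = extract_ranges(text, [(0, 5), (17, 22)])
print(extracted)  # Hello John
```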

insert(text, spans, positions, insertion_texts)

Insert strings in text and update its associated spans accordingly

Parameters:
  • text (str) – The text in which some strings will be inserted

  • spans (list of AnySpan) – The spans associated with text

  • positions (list of int) – The positions where the strings will be inserted, sorted in ascending order

  • insertion_texts (list of str) – The strings to insert (must be the same length as positions)

Return type:

tuple[str, list[AnySpan]]

Returns:

  • text (str) – The updated text

  • spans (list of AnySpan) – The spans associated with the updated text

Examples

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> positions = [5]
>>> inserts = [" everybody"]
>>> text, spans = insert(text, spans, positions, inserts)
>>> print(text)
Hello everybody, my name is John Doe.

move(text, spans, range, destination)

Move part of a text to another position, also moving its associated spans

Parameters:
  • text (str) – The text in which a part should be moved

  • spans (list of AnySpan) – The spans associated with the input text

  • range (tuple of int) – The range of the part to move (end excluded)

  • destination (int) – The position where to insert the displaced range

Return type:

tuple[str, list[AnySpan]]

Returns:

  • text (str) – The updated text

  • spans (list of AnySpan) – The spans associated with the updated text

Examples

>>> text = "Hello, my name is John Doe."
>>> spans = [Span(0, len(text))]
>>> range = (17, 22)
>>> dest = len(text) - 1
>>> text, spans = move(text, spans, range, dest)
>>> print(text)
Hello, my name is Doe John.
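One detail worth spelling out is that destination refers to a position in the original text, so it has to be shifted once the moved part has been taken out. The sketch below shows this on a plain string. `move_range` is a hypothetical helper, not part of medkit; it ignores spans, and rejecting a destination inside the moved part is an assumption of the sketch.

```python
def move_range(text, source, destination):
    """Hypothetical helper: move text[start:end] so that it lands at destination."""
    start, end = source
    if start < destination < end:
        # Assumption of this sketch: the destination may not fall inside the moved part.
        raise ValueError("destination lies inside the range being moved")
    moved = text[start:end]
    remaining = text[:start] + text[end:]
    if destination >= end:
        # The destination was expressed in original coordinates; account for the removal.
        destination -= end - start
    return remaining[:destination] + moved + remaining[destination:]


text = "Hello, my name is John Doe."
result = move_range(text, (17, 22), len(text) - 1)
print(result)  # Hello, my name is Doe John.
```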

normalize_spans(spans)

Return a transformed version of spans in which all instances of ModifiedSpan are replaced by the spans they refer to, spans are sorted, and contiguous spans are merged.

Parameters:

spans (list of AnySpan) – The spans associated with a text, including additional spans if insertions or replacements were performed

Return type:

list[Span]

Returns:

normalized_spans (list of Span) – Spans in spans normalized as described

Examples

>>> spans = [
...     Span(0, 10),
...     Span(20, 30),
...     ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
... ]
>>> spans = normalize_spans(spans)
>>> print(spans)
[Span(0, 10), Span(20, 36)]
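The three steps named in the description (expand, sort, merge) can be sketched without the library. The `Span` and `ModifiedSpan` classes below are minimal stand-ins defined only for this illustration, not medkit's own classes, and `normalize` is a hypothetical name. Treating "contiguous" as "the next span starts exactly where the previous one ends" is an assumption of the sketch.

```python
from typing import List, NamedTuple


class Span(NamedTuple):  # stand-in for illustration, not medkit's Span
    start: int
    end: int


class ModifiedSpan(NamedTuple):  # stand-in for illustration, not medkit's ModifiedSpan
    length: int
    replaced_spans: List[Span]


def normalize(spans):
    """Hypothetical sketch of the expand / sort / merge steps."""
    # 1. Replace each ModifiedSpan by the spans it refers to.
    flat = []
    for span in spans:
        if isinstance(span, ModifiedSpan):
            flat.extend(span.replaced_spans)
        else:
            flat.append(span)
    # 2. Sort by (start, end).
    flat.sort()
    # 3. Merge spans that touch (assumption: end of one equals start of the next).
    merged = []
    for span in flat:
        if merged and merged[-1].end == span.start:
            merged[-1] = Span(merged[-1].start, span.end)
        else:
            merged.append(span)
    return merged


spans = [
    Span(0, 10),
    Span(20, 30),
    ModifiedSpan(8, replaced_spans=[Span(30, 36)]),
]
normalized = normalize(spans)
```

With this input, the span at 30 to 36 recovered from the ModifiedSpan touches the span at 20 to 30, so the two are merged and the result holds two spans covering 0 to 10 and 20 to 36.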

concatenate(texts, all_spans)

Concatenate text and span objects

Return type:

tuple[str, list[AnySpan]]
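The description above does not say how the pieces are combined. The sketch below shows one plausible reading, under two loudly stated assumptions: the texts are joined with no separator, and the per-text span lists are flattened in order. `concatenate_sketch` is a hypothetical name, and plain (start, end) tuples stand in for span objects.

```python
def concatenate_sketch(texts, all_spans):
    """Hypothetical sketch: join texts and flatten their span lists in order."""
    if len(texts) != len(all_spans):
        raise ValueError("texts and all_spans must have the same length")
    # Assumption: no separator is inserted between the texts.
    text = "".join(texts)
    # Assumption: spans are kept as-is and simply chained together.
    spans = [span for spans in all_spans for span in spans]
    return text, spans


texts = ["heart", " failure"]
all_spans = [[(0, 5)], [(5, 13)]]
text, spans = concatenate_sketch(texts, all_spans)
print(text)  # heart failure
```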

clean_up_gaps_in_normalized_spans(spans, text, max_gap_length=3)

Remove small gaps in normalized spans.

This is useful for converting non-contiguous entity spans separated by small gaps, containing only whitespace or a few meaningless characters (due to clean-up preprocessing or translation), into a single larger span. Gaps shorter than max_gap_length characters are removed by merging the spans before and after the gap.

Parameters:
  • spans (list of Span) – The normalized spans in which to remove gaps

  • text (str) – The text associated with spans

  • max_gap_length (int, default=3) – Max number of characters in gaps, after stripping leading and trailing whitespace.

Examples

>>> text = "heart failure"
>>> spans = [Span(0, 5), Span(6, 13)]
>>> spans = clean_up_gaps_in_normalized_spans(spans, text)
>>> print(spans)
[Span(0, 13)]
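The merging rule described above can be sketched in a few lines. `Span` below is a minimal stand-in defined for this illustration, not medkit's class, and `clean_up_gaps` is a hypothetical name. The sketch assumes the input spans are already normalized (sorted, non-overlapping) and follows the description literally: a gap is dropped when its stripped length is strictly less than max_gap_length.

```python
from typing import NamedTuple


class Span(NamedTuple):  # stand-in for illustration, not medkit's Span
    start: int
    end: int


def clean_up_gaps(spans, text, max_gap_length=3):
    """Hypothetical sketch: merge neighbouring spans separated by a small gap."""
    if not spans:
        return []
    cleaned = [spans[0]]
    for span in spans[1:]:
        prev = cleaned[-1]
        # Text between the two spans, ignoring surrounding whitespace.
        gap = text[prev.end:span.start].strip()
        if len(gap) < max_gap_length:
            cleaned[-1] = Span(prev.start, span.end)  # small gap: merge
        else:
            cleaned.append(span)  # large gap: keep spans separate
    return cleaned


small_gap = clean_up_gaps([Span(0, 5), Span(6, 13)], "heart failure")
large_gap = clean_up_gaps([Span(0, 5), Span(16, 23)], "heart (chronic) failure")
```

In the first call the gap is a single space, which strips to an empty string, so the two spans are merged into one covering 0 to 13. In the second call the gap strips to "(chronic)", which is 9 characters long, so both spans are kept.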