medkit.text.segmentation.tokenizer_utils

Functions:

lstrip(text[, start, chars])

Returns a copy of the string with leading characters removed and its corresponding new start index.

rstrip(text[, end, chars])

Returns a copy of the string with trailing characters removed and its corresponding new end index.

strip(text[, start, chars])

Returns a copy of the string with leading and trailing characters removed and its corresponding new start and end indexes.

lstrip(text, start=0, chars=None)

Returns a copy of the string with leading characters removed and its corresponding new start index.

Parameters
  • text (str) – The text to strip.

  • start (int) – The start index of text within the original text, if any.

  • chars (Optional[str]) – The characters to strip, given as a string. If None, whitespace is stripped, as with str.lstrip([chars]).

Return type

Tuple[str, int]
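
The description above can be illustrated with a minimal sketch. It assumes the new start index is the original start plus the number of leading characters removed; `lstrip_sketch` is a hypothetical stand-in written for this example, not the library's implementation.

```python
from typing import Optional, Tuple


def lstrip_sketch(text: str, start: int = 0, chars: Optional[str] = None) -> Tuple[str, int]:
    """Hypothetical stand-in: strip leading characters and shift the start index."""
    stripped = text.lstrip(chars)
    # Assumption: each removed leading character moves the start one position right.
    new_start = start + (len(text) - len(stripped))
    return stripped, new_start


# "  hello" is assumed to begin at offset 10 of a larger document.
print(lstrip_sketch("  hello", start=10))   # ('hello', 12)
# With an explicit character set, only those characters are removed.
print(lstrip_sketch("--a-", chars="-"))     # ('a-', 2)
```

Returning the index alongside the text lets a caller keep character offsets aligned with the original document after stripping.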

rstrip(text, end=None, chars=None)

Returns a copy of the string with trailing characters removed and its corresponding new end index.

Parameters
  • text (str) – The text to strip.

  • end (Optional[int]) – The end index of text within the original text, if any.

  • chars (Optional[str]) – The characters to strip, given as a string. If None, whitespace is stripped, as with str.rstrip([chars]).

Return type

Tuple[str, int]
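
A minimal sketch of the behaviour described above, under two assumptions: when end is None it falls back to len(text), and the new end index is the original end minus the number of trailing characters removed. `rstrip_sketch` is a hypothetical stand-in, not the library's implementation.

```python
from typing import Optional, Tuple


def rstrip_sketch(text: str, end: Optional[int] = None, chars: Optional[str] = None) -> Tuple[str, int]:
    """Hypothetical stand-in: strip trailing characters and shift the end index."""
    if end is None:
        # Assumption: without an explicit end, the text is taken to end at len(text).
        end = len(text)
    stripped = text.rstrip(chars)
    # Assumption: each removed trailing character moves the end one position left.
    new_end = end - (len(text) - len(stripped))
    return stripped, new_end


# "hello \n" is assumed to end at offset 17 of a larger document.
print(rstrip_sketch("hello \n", end=17))  # ('hello', 15)
# Without an explicit end, the index is relative to the text itself.
print(rstrip_sketch("hello  "))           # ('hello', 5)
```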

strip(text, start=0, chars=None)

Returns a copy of the string with leading and trailing characters removed and its corresponding new start and end indexes.

Parameters
  • text (str) – The text to strip.

  • start (int) – The start index of text within the original text, if any.

  • chars (Optional[str]) – The characters to strip, given as a string. If None, whitespace is stripped, as with str.strip([chars]).

Return type

Tuple[str, int, int]
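
A minimal sketch of how the three return values could relate. It assumes the new start is shifted by the number of leading characters removed and that the new end is an exclusive index equal to the new start plus the length of the stripped text. `strip_sketch` is a hypothetical stand-in, not the library's implementation.

```python
from typing import Optional, Tuple


def strip_sketch(text: str, start: int = 0, chars: Optional[str] = None) -> Tuple[str, int, int]:
    """Hypothetical stand-in: strip both sides and return the new (start, end) span."""
    left_stripped = text.lstrip(chars)
    # Assumption: the start moves right by the number of leading characters removed.
    new_start = start + (len(text) - len(left_stripped))
    stripped = left_stripped.rstrip(chars)
    # Assumption: the end is exclusive, so it is the start plus the remaining length.
    new_end = new_start + len(stripped)
    return stripped, new_start, new_end


# "  hello  " is assumed to begin at offset 10 of a larger document.
print(strip_sketch("  hello  ", start=10))  # ('hello', 12, 17)
# Interior characters from the set are kept; only the edges are stripped.
print(strip_sketch("..a.b..", chars="."))   # ('a.b', 2, 5)
```

Under these assumptions, slicing the original document with the returned indexes, as in document[new_start:new_end], yields the stripped text.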