medkit.text.preprocessing.normalizer

Classes:

Normalizer(output_label[, rules, name, uid])

Generic normalizer to be used as a pre-processing module

NormalizerRule(pattern_to_replace, new_text)

Create a new instance of NormalizerRule(pattern_to_replace, new_text)

class Normalizer(output_label, rules=None, name=None, uid=None)

Generic normalizer to be used as a pre-processing module

This module non-destructively replaces selected characters with the desired characters. It keeps track of span modifications by creating a new text-bound annotation that carries the span modification information relative to the input text.

Parameters
  • output_label (str) – The output label of the created annotations

  • rules (Optional[List[Tuple[str, str]]]) – The list of replacement rules

  • name (Optional[str]) – Name describing the pre-processing module (defaults to the class name)

  • uid (str) – Identifier of the pre-processing module
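
As a rough illustration of what "non-destructive" replacement with span tracking can look like, here is a minimal, self-contained sketch. It does not use medkit and is not medkit's implementation: `Rule`, `Replacement` and `normalize` are hypothetical names, and treating `pattern_to_replace` as a regular expression is an assumption.

```python
import re
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    # Hypothetical stand-in mirroring the (pattern_to_replace, new_text) pair
    pattern_to_replace: str
    new_text: str


class Replacement(NamedTuple):
    # Character range in the ORIGINAL text, plus the text that replaced it
    start: int
    end: int
    new_text: str


def normalize(text: str, rules: List[Rule]) -> Tuple[str, List[Replacement]]:
    """Return the normalized text and the original ranges that were replaced."""
    if not rules:
        return text, []
    # One named group per rule, so the text is scanned in a single pass
    by_name = {f"rule{i}": rule for i, rule in enumerate(rules)}
    combined = re.compile(
        "|".join(f"(?P<{name}>{rule.pattern_to_replace})" for name, rule in by_name.items())
    )
    pieces, replacements, cursor = [], [], 0
    for match in combined.finditer(text):
        rule = by_name[match.lastgroup]
        pieces.append(text[cursor:match.start()])  # unchanged text before the match
        pieces.append(rule.new_text)
        replacements.append(Replacement(match.start(), match.end(), rule.new_text))
        cursor = match.end()
    pieces.append(text[cursor:])  # unchanged tail
    return "".join(pieces), replacements


original = "Patient n° 12 : œdème"
normalized, replacements = normalize(original, [Rule("n°", "numero"), Rule("œ", "oe")])
print(normalized)  # Patient numero 12 : oedème
for r in replacements:
    # The original string is untouched, so each range can still be looked up
    print(repr(original[r.start:r.end]), "->", repr(r.new_text))
```

Because the original text and the replaced ranges are both kept, offsets in the normalized text can in principle be mapped back to the input, which is the property the description above refers to.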

Methods:

run(segments)

Run the module on a list of segments provided as input and return a new list of segments

run(segments)

Run the module on a list of segments provided as input and return a new list of segments

Parameters

segments (List[Segment]) – List of segments to normalize

Return type

List[Segment]

Returns

List[Segment] – List of normalized segments
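
The list-in, list-out contract of run can be pictured with the toy sketch below. It is an assumption-laden illustration, not medkit code: `ToySegment` and `run_sketch` are hypothetical, the real Segment class carries more information (such as spans), and the two replacements are arbitrary examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToySegment:
    # Hypothetical stand-in for a segment: only a label and a text
    label: str
    text: str


def run_sketch(segments: List[ToySegment], output_label: str) -> List[ToySegment]:
    """Return one new segment per input segment; the inputs are left untouched."""
    # Example single-character replacements: no-break space and curly apostrophe
    table = str.maketrans({"\u00a0": " ", "\u2019": "'"})
    return [ToySegment(output_label, seg.text.translate(table)) for seg in segments]


raw = [ToySegment("RAW_TEXT", "l\u2019avis du m\u00e9decin")]
normalized = run_sketch(raw, output_label="NORMALIZED_TEXT")
print(normalized[0].label, "|", normalized[0].text)
print(raw[0].label, "|", raw[0].text)  # the input segment is unchanged
```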

class NormalizerRule(pattern_to_replace, new_text)

Create a new instance of NormalizerRule(pattern_to_replace, new_text)

Attributes:

new_text

Alias for field number 1

pattern_to_replace

Alias for field number 0

property pattern_to_replace

Alias for field number 0

property new_text

Alias for field number 1
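
The "Alias for field number N" wording is what Python generates for the fields of a named tuple, which suggests NormalizerRule is one. Assuming so, a rule can be read by name, by index, or by unpacking. The sketch below uses a local stand-in (`Rule`, a hypothetical name) rather than importing medkit.

```python
from collections import namedtuple

# Local stand-in with the same field names as the documented class
Rule = namedtuple("Rule", ["pattern_to_replace", "new_text"])

rule = Rule(pattern_to_replace="œ", new_text="oe")

# Field 0 and field 1 are reachable by name or by position
print(rule.pattern_to_replace == rule[0])
print(rule.new_text == rule[1])

# A named tuple also unpacks and compares like a plain (str, str) tuple
pattern, new_text = rule
print(rule == ("œ", "oe"))
```

This would also be consistent with the rules parameter being typed as a list of (str, str) tuples.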