medkit.text.preprocessing.normalizer

Classes:

`Normalizer`(output_label[, rules, name, uid])	Generic normalizer to be used as a pre-processing module
`NormalizerRule`(pattern_to_replace, new_text)	Named tuple describing one replacement rule: the pattern to replace and the text that replaces it

class Normalizer(output_label, rules=None, name=None, uid=None)

Generic normalizer to be used as a pre-processing module

This module non-destructively replaces selected characters with the desired characters. The input text is left unchanged: the module creates a new text-bound annotation whose spans record how the normalized text was modified relative to the input text.
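The idea of a non-destructive replacement can be illustrated with a minimal standard-library sketch. It does not use medkit and is not the library's implementation: the function `normalize_with_spans`, its return shape, and the choice to treat patterns as literal strings are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical helper, not part of medkit: returns the normalized text plus a
# record of every replacement, expressed as offsets into the ORIGINAL text.
def normalize_with_spans(
    text: str, rules: List[Tuple[str, str]]
) -> Tuple[str, List[Tuple[int, int, str]]]:
    if not rules:
        return text, []
    lookup = dict(rules)
    # Assumption: patterns are literal strings, so they are escaped here.
    regex = re.compile("|".join(re.escape(pattern) for pattern, _ in rules))
    pieces = []
    modifications = []
    cursor = 0
    for match in regex.finditer(text):
        pieces.append(text[cursor:match.start()])  # untouched text
        replacement = lookup[match.group()]
        pieces.append(replacement)
        modifications.append((match.start(), match.end(), replacement))
        cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces), modifications

original = "cœur"
normalized, modifications = normalize_with_spans(original, [("œ", "oe")])
```

Because the original string is never mutated and each modification keeps its original offsets, a position in the normalized text can still be traced back to the input.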

Parameters

output_label (str) – The output label of the created annotations
rules (Optional[List[Tuple[str, str]]]) – The list of replacement rules
name (Optional[str]) – Name describing the pre-processing module (defaults to the class name)
uid (Optional[str]) – Identifier of the pre-processing module

Methods:

run(segments)

Run the module on a list of input segments and return a new list of segments

run(segments)

Run the module on a list of input segments and return a new list of segments

Parameters: segments (List[Segment]) – List of segments to normalize
Return type: List[Segment]
Returns: List[Segment] – List of normalized segments
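The list-in, list-out contract of `run` can be sketched without medkit. Everything below is a stand-in: `SegmentSketch` and `run_sketch` are hypothetical names, the segment is reduced to a label and a text, and patterns are assumed to be literal strings.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for a text segment; the real Segment class carries
# more information (for example spans), which is omitted here.
@dataclass(frozen=True)
class SegmentSketch:
    label: str
    text: str

def run_sketch(
    segments: List[SegmentSketch],
    rules: List[Tuple[str, str]],
    output_label: str,
) -> List[SegmentSketch]:
    normalized = []
    for segment in segments:
        text = segment.text
        for pattern, new_text in rules:
            text = text.replace(pattern, new_text)
        # A new segment is created; the input segment is left as it was.
        normalized.append(SegmentSketch(label=output_label, text=text))
    return normalized

inputs = [SegmentSketch("raw", "cœur"), SegmentSketch("raw", "n°3")]
outputs = run_sketch(inputs, [("œ", "oe"), ("n°", "numero ")], "normalized")
```

Each output segment carries the output label, and the input list is returned to the caller unmodified.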

class NormalizerRule(pattern_to_replace, new_text)

Named tuple describing one replacement rule: the pattern to replace and the text that replaces it

Attributes:

`new_text`	Replacement text (tuple field number 1)
`pattern_to_replace`	Pattern to replace (tuple field number 0)

property pattern_to_replace: Pattern to replace (alias for field number 0)

property new_text: Replacement text (alias for field number 1)
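The "alias for field number" wording means each attribute is a named view onto a tuple position. A minimal sketch using a stand-in named tuple (the class `RuleSketch` is hypothetical and only mirrors the two documented field names):

```python
from typing import NamedTuple

# Hypothetical stand-in with the same field names and order as documented.
class RuleSketch(NamedTuple):
    pattern_to_replace: str
    new_text: str

rule = RuleSketch(pattern_to_replace="œ", new_text="oe")

# Fields are reachable by name or by position, and the rule unpacks like a tuple.
pattern, new_text = rule
first_field = rule[0]
second_field = rule[1]
```

This also suggests why the `rules` parameter is typed as a list of plain `(str, str)` tuples: a named tuple with these two fields compares equal to the corresponding plain tuple.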