medkit.text.segmentation.section_tokenizer
Classes:
SectionTokenizer – Section segmentation annotator based on keyword rules
- class SectionTokenizer(section_dict, output_label='SECTION', section_rules=(), strip_chars='.;,?! \n\r\t', uid=None)
Section segmentation annotator based on keyword rules
Initialize the Section Tokenizer
- Parameters
section_dict (Dict[str, List[str]]) – Dictionary containing the section name as key and the list of mappings as value (cf. content of default_section_dict.yml as example)
output_label (str) – Segment label to use for annotation output. Default: SECTION.
section_rules (Iterable[SectionModificationRule]) – List of rules for modifying a section name according to its order relative to the other sections.
strip_chars (str) – The list of characters to strip at the beginning of the returned segment. Default: '.;,?! \n\r\t' (cf. DefaultConfig)
uid (str, Optional) – Identifier of the tokenizer
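The idea behind keyword-based section segmentation can be illustrated with a minimal pure-Python sketch. This is not medkit's implementation: the function `split_into_sections` and the sample text are hypothetical, and only the `section_dict` shape and the default `strip_chars` value are taken from the parameters above.

```python
from typing import Dict, List, Tuple


def split_into_sections(
    text: str,
    section_dict: Dict[str, List[str]],
    strip_chars: str = ".;,?! \n\r\t",
) -> List[Tuple[str, str]]:
    """Return (section_name, section_text) pairs in document order.

    Hypothetical helper, for illustration only.
    """
    # Record every keyword occurrence together with the section it opens.
    hits = []
    for name, keywords in section_dict.items():
        for keyword in keywords:
            start = text.find(keyword)
            while start != -1:
                hits.append((start, name))
                start = text.find(keyword, start + len(keyword))
    hits.sort()

    sections = []
    for i, (start, name) in enumerate(hits):
        # A section runs until the next keyword, or until the end of the text.
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        # Strip unwanted characters at the beginning of the segment only.
        sections.append((name, text[start:end].lstrip(strip_chars)))
    return sections


# Section name -> list of strings that map to it (same shape as section_dict).
section_dict = {
    "antecedent": ["Antecedents"],
    "exam": ["Examen clinique"],
}
text = "Antecedents: asthma. Examen clinique: normal."
sections = split_into_sections(text, section_dict)
print(sections)
```

In this sketch each section starts at its keyword; text appearing before the first keyword is simply ignored, which may differ from what SectionTokenizer does.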
Methods:
load_section_definition(filepath[, encoding]) – Load the sections definition stored in a yml file
run(segments) – Return sections detected in segments.
save_section_definition(section_dict, ...[, ...]) – Save section yaml definition file
- static load_section_definition(filepath, encoding=None)
Load the sections definition stored in a yml file
- Parameters
filepath (Path) – Path to a yml file containing the sections (name + mappings) and rules
encoding (Optional[str]) – Encoding of the file to open
- Return type
Tuple[Dict[str, List[str]], Tuple[SectionModificationRule, …]]
- Returns
Tuple containing:
- the dictionary where the key is the section name and the value is the list of all equivalent strings;
- the list of section modification rules. These rules allow some sections to be renamed according to their order.
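To make "renaming sections according to their order" concrete, here is a small sketch. It assumes a rule says: rename a section when it appears before (or after) one of a given set of other sections. The `RenameRule` dataclass is a hypothetical stand-in for SectionModificationRule; its field names and the `apply_rules` helper are assumptions, not medkit's API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RenameRule:
    # Hypothetical stand-in for SectionModificationRule (field names assumed).
    section_name: str          # section the rule applies to
    new_section_name: str      # name to give it when the rule matches
    other_sections: List[str]  # sections whose relative position triggers the rule
    order: str                 # "BEFORE" or "AFTER"


def apply_rules(names: List[str], rules: Iterable[RenameRule]) -> List[str]:
    """Rename sections depending on which sections follow or precede them."""
    result = list(names)
    for rule in rules:
        for i, name in enumerate(names):
            if name != rule.section_name:
                continue
            # "BEFORE": look at the sections that come later in the document;
            # "AFTER": look at the sections that came earlier.
            neighbours = names[i + 1:] if rule.order == "BEFORE" else names[:i]
            if any(other in neighbours for other in rule.other_sections):
                result[i] = rule.new_section_name
    return result


# Section names in document order (illustrative data).
names = ["antecedents", "treatment", "history", "treatment"]
rules = [RenameRule("treatment", "entry_treatment", ["history"], "BEFORE")]
renamed = apply_rules(names, rules)
print(renamed)
```

Only the first "treatment" section is renamed here, because it is the only one that appears before a "history" section.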
- static save_section_definition(section_dict, section_rules, filepath, encoding=None)
Save section yaml definition file
- Parameters
section_dict (Dict[str, List[str]]) – Dictionary containing the section name as key and the list of mappings as value (cf. content of default_section_dict.yml as example)
section_rules (Iterable[SectionModificationRule]) – List of rules for modifying a section name according to its order relative to the other sections.
filepath (Path) – Path to the file to save
encoding (Optional[str]) – File encoding. Default: None