:py:mod:`medkit.text.segmentation.section_tokenizer`
====================================================

.. py:module:: medkit.text.segmentation.section_tokenizer


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medkit.text.segmentation.section_tokenizer.SectionModificationRule
   medkit.text.segmentation.section_tokenizer.SectionTokenizer




.. py:class:: SectionModificationRule


   .. py:attribute:: section_name
      :type: str

      

   .. py:attribute:: new_section_name
      :type: str

      

   .. py:attribute:: other_sections
      :type: list[str]

      

   .. py:attribute:: order
      :type: typing_extensions.Literal[BEFORE, AFTER]
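
   The four attributes are easiest to read together as a single rule. The sketch
   below is a stand-alone illustration, not medkit's implementation:
   ``RuleSketch`` and ``apply_rules`` are hypothetical names, the section names are
   made up, and it assumes that a section named ``section_name`` is renamed to
   ``new_section_name`` when it occurs before (``BEFORE``) or after (``AFTER``) any
   of ``other_sections``.

   .. code-block:: python

       from dataclasses import dataclass
       from typing import Literal


       @dataclass
       class RuleSketch:
           """Hypothetical stand-in with the same four attributes as the rule class."""

           section_name: str
           new_section_name: str
           other_sections: list[str]
           order: Literal["BEFORE", "AFTER"]


       def apply_rules(names: list[str], rules: list[RuleSketch]) -> list[str]:
           """Rename entries of `names` (section names in document order)."""
           renamed = list(names)
           for rule in rules:
               for i, name in enumerate(names):
                   if name != rule.section_name:
                       continue
                   # BEFORE: inspect the sections that follow this one;
                   # AFTER: inspect the sections that precede it.
                   context = names[i + 1:] if rule.order == "BEFORE" else names[:i]
                   if set(context) & set(rule.other_sections):
                       renamed[i] = rule.new_section_name
           return renamed


       # Assumed example: a "treatment" section found before "history" is renamed.
       rule = RuleSketch(
           section_name="treatment",
           new_section_name="admission_treatment",
           other_sections=["history"],
           order="BEFORE",
       )
       result = apply_rules(["treatment", "history", "treatment"], [rule])
       print(result)

   Only the first ``treatment`` entry is followed by ``history``, so under the
   stated assumption it is the only one renamed.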

      


.. py:class:: SectionTokenizer(section_dict: dict[str, list[str]] | None = None, output_label: str = _DEFAULT_LABEL, section_rules: Iterable[SectionModificationRule] = (), strip_chars: str = _DEFAULT_STRIP_CHARS, uid: str | None = None)


   Bases: :py:obj:`medkit.core.text.SegmentationOperation`

   
   Section segmentation annotator based on keyword rules.


   :Parameters:

       **section_dict** : dict of str to list of str, optional
           Dictionary mapping each section name to the list of equivalent strings
           (mappings) that identify it. If None, the content of
           default_section_definition.yml is used.

       **output_label** : str, optional
           Segment label to use for the output annotations.

       **section_rules** : iterable of SectionModificationRule, optional
           Rules for renaming a section according to its order relative to the other
           sections. If section_dict is None, the rules defined in
           default_section_definition.yml are used.

       **strip_chars** : str, optional
           Characters to strip from the beginning of each returned segment.

       **uid** : str, optional
           Identifier of the tokenizer.

   ..
       !! processed by numpydoc !!
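
   To make the role of ``section_dict`` and ``strip_chars`` concrete, here is a
   minimal, stand-alone sketch of keyword-based section splitting on a plain
   string. It is not medkit's implementation: ``split_sections`` is a hypothetical
   helper, the dictionary content and the default strip characters are assumptions,
   and it assumes that each section runs from its keyword to the next keyword.

   .. code-block:: python

       import re

       # Assumption: modelled on the documented default, not copied from medkit.
       DEFAULT_STRIP = ".;,?! \n\r\t"


       def split_sections(text, section_dict, strip_chars=DEFAULT_STRIP):
           """Return (section_name, section_text) pairs in document order."""
           # Invert the dictionary: each keyword points back to its section name.
           keyword_to_name = {
               keyword: name
               for name, keywords in section_dict.items()
               for keyword in keywords
           }
           if not keyword_to_name:
               return []
           # Try longer keywords first so the most specific alternative wins.
           alternatives = sorted(keyword_to_name, key=len, reverse=True)
           pattern = re.compile("|".join(re.escape(k) for k in alternatives))
           matches = list(pattern.finditer(text))
           sections = []
           for i, match in enumerate(matches):
               # A section runs from its keyword to the start of the next keyword.
               end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
               body = text[match.start():end].lstrip(strip_chars)
               sections.append((keyword_to_name[match.group()], body))
           return sections


       section_dict = {
           "history": ["History", "Past medical history"],
           "treatment": ["Treatment"],
       }
       text = "Past medical history: asthma.\nTreatment: salbutamol."
       sections = split_sections(text, section_dict)
       for name, body in sections:
           print(name, repr(body))

   Both keywords of the ``history`` entry map to the same section name, which is
   the point of storing a list of equivalent strings per section.
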
   .. py:attribute:: _DEFAULT_LABEL
      :type: str
      :value: 'section'

      

   .. py:attribute:: _DEFAULT_STRIP_CHARS
      :type: str
      :value: Multiline-String

       .. raw:: html

           <details><summary>Show Value</summary>

       .. code-block:: python

           """.;,?! 
           
           	"""

       .. raw:: html

           </details>

      

   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]

      
      Return sections detected in `segments`.

      Each section is a segment with an attached attribute
      (label: <same as self.output_label>, value: <the name of the section>).

      :Parameters:

          **segments** : list of Segment
              List of segments in which to look for sections.

      :Returns:

          list of Segment
              Section segments found in `segments`.

      ..
          !! processed by numpydoc !!
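
      The shape of the output can be pictured with plain dataclasses. This is a
      sketch only: ``SegmentSketch`` and ``AttrSketch`` are hypothetical stand-ins
      for medkit's segment and attribute classes, and the text and section name
      are made up. It shows the documented convention that the attribute label
      equals ``output_label`` and the attribute value is the section name.

      .. code-block:: python

          from dataclasses import dataclass, field


          @dataclass
          class AttrSketch:
              """Hypothetical stand-in for an attribute attached to a segment."""

              label: str
              value: str


          @dataclass
          class SegmentSketch:
              """Hypothetical stand-in for a text segment."""

              label: str
              text: str
              attrs: list = field(default_factory=list)


          output_label = "section"  # same value as the documented default label

          # One detected section: the attribute carries the section name.
          segment = SegmentSketch(label=output_label, text="History: asthma.")
          segment.attrs.append(AttrSketch(label=output_label, value="history"))

          # Reading the section name back means filtering attributes by label.
          section_names = [a.value for a in segment.attrs if a.label == output_label]
          print(section_names)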

   .. py:method:: _find_sections_in_segment(segment: medkit.core.text.Segment)


   .. py:method:: _get_sections_to_rename(match: list[tuple])


   .. py:method:: get_example()
      :classmethod:


   .. py:method:: load_section_definition(filepath: pathlib.Path, encoding: str | None = None) -> tuple[dict[str, list[str]], tuple[SectionModificationRule, Ellipsis]]
      :staticmethod:

      
      Load the section definitions stored in a YAML file.


      :Parameters:

          **filepath** : Path
              Path to a YAML file containing the sections (name + mappings) and rules.

          **encoding** : str, optional
              Encoding of the file to open.

      :Returns:

          tuple
              Tuple containing:

              - the dictionary where each key is a section name and each value is the
                list of all equivalent strings;
              - the tuple of section modification rules, which rename some sections
                according to their order.


      ..
          !! processed by numpydoc !!

   .. py:method:: save_section_definition(section_dict: dict[str, list[str]], section_rules: Iterable[SectionModificationRule], filepath: pathlib.Path, encoding: str | None = None)
      :staticmethod:

      
      Save the section definitions to a YAML file.


      :Parameters:

          **section_dict** : dict of str to list of str
              Dictionary mapping each section name to its list of mappings
              (see the content of default_section_definition.yml for an example).

          **section_rules** : iterable of SectionModificationRule
              Rules for renaming a section according to its order relative to the
              other sections.

          **filepath** : Path
              Path of the file to save.

          **encoding** : str, optional
              Encoding of the file.

      ..
          !! processed by numpydoc !!


