medkit.text.segmentation.section_tokenizer
==========================================

.. py:module:: medkit.text.segmentation.section_tokenizer


Classes
-------

.. autoapisummary::

   medkit.text.segmentation.section_tokenizer.SectionModificationRule
   medkit.text.segmentation.section_tokenizer.SectionTokenizer


Module Contents
---------------

.. py:class:: SectionModificationRule

   .. py:attribute:: section_name
      :type:  str


   .. py:attribute:: new_section_name
      :type:  str


   .. py:attribute:: other_sections
      :type:  list[str]


   .. py:attribute:: order
      :type:  typing_extensions.Literal['BEFORE', 'AFTER']
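
   The fields above do not spell out how a rule is applied. The following is a
   minimal sketch of one plausible reading, not medkit's implementation. It
   assumes that ``BEFORE`` means the section occurs before one of
   ``other_sections`` and that ``AFTER`` means it occurs after one of them.
   ``RuleSketch`` and ``rename_sections`` are hypothetical names introduced
   for illustration only.

   .. code-block:: python

      from __future__ import annotations

      from dataclasses import dataclass
      from typing import Literal


      @dataclass
      class RuleSketch:
          """Hypothetical stand-in with the same four fields as SectionModificationRule."""

          section_name: str
          new_section_name: str
          other_sections: list[str]
          order: Literal["BEFORE", "AFTER"]


      def rename_sections(names: list[str], rules: list[RuleSketch]) -> list[str]:
          """Rename a section when one of the rule's other sections lies on the expected side."""
          renamed = list(names)
          for i, name in enumerate(names):
              for rule in rules:
                  if rule.section_name != name:
                      continue
                  # Assumed semantics: BEFORE inspects what follows, AFTER what precedes.
                  neighbours = names[i + 1:] if rule.order == "BEFORE" else names[:i]
                  if any(other in neighbours for other in rule.other_sections):
                      renamed[i] = rule.new_section_name
                      break
          return renamed


      rule = RuleSketch(
          section_name="antecedent",
          new_section_name="family_history",
          other_sections=["family"],
          order="AFTER",
      )
      result = rename_sections(["family", "antecedent", "treatment"], [rule])
      print(result)

   In this sketch the rule fires only when ``family`` precedes ``antecedent``;
   with the order of the two sections swapped, the names are left unchanged.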


.. py:class:: SectionTokenizer(section_dict: dict[str, list[str]] | None = None, output_label: str = _DEFAULT_LABEL, section_rules: Iterable[SectionModificationRule] = (), strip_chars: str = _DEFAULT_STRIP_CHARS, uid: str | None = None)

   Bases: :py:obj:`medkit.core.text.SegmentationOperation`


   
   Section segmentation annotator based on keyword rules.


   :Parameters:

       **section_dict: dict of str to list of str, optional**
           Dictionary mapping each section name to the list of equivalent strings
           (mappings) that identify it. If None, the content of
           default_section_definition.yml will be used.

       **output_label: str, optional**
           Segment label to use for annotation output.

       **section_rules: iterable of SectionModificationRule, optional**
           Rules for renaming a section according to its order relative to the other
           sections. If section_dict is None, the content of
           default_section_definition.yml will be used.

       **strip_chars: str, optional**
           Characters to strip from the beginning of each returned segment.

       **uid: str, optional**
           Identifier of the tokenizer.
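
   To illustrate what keyword-based section segmentation means, here is a
   minimal standalone sketch. It is not medkit's implementation:
   ``split_sections`` is a hypothetical helper, regular expressions stand in
   for whatever keyword matcher the class actually uses, the default
   ``strip_chars`` value shown is an assumption, and dropping the keyword from
   the section text is a choice made for this sketch.

   .. code-block:: python

      from __future__ import annotations

      import re


      def split_sections(
          text: str,
          section_dict: dict[str, list[str]],
          strip_chars: str = ".;,?! \n\r\t",  # assumed default, for illustration
      ) -> list[tuple[str, str]]:
          """Return (section_name, section_text) pairs in order of appearance."""
          # Map every keyword back to the section name it announces.
          keyword_to_name = {kw: name for name, kws in section_dict.items() for kw in kws}
          if not keyword_to_name:
              return []
          # Try longer keywords first so overlapping keywords resolve predictably.
          alternatives = sorted(keyword_to_name, key=len, reverse=True)
          pattern = re.compile("|".join(re.escape(kw) for kw in alternatives))
          matches = list(pattern.finditer(text))
          sections = []
          for match, following in zip(matches, matches[1:] + [None]):
              # A section runs from the end of its keyword to the next keyword.
              end = following.start() if following is not None else len(text)
              body = text[match.end():end].lstrip(strip_chars).rstrip()
              sections.append((keyword_to_name[match.group()], body))
          return sections


      section_dict = {
          "antecedent": ["HISTORY", "PAST MEDICAL HISTORY"],
          "treatment": ["TREATMENT", "MEDICATION"],
      }
      text = "HISTORY\ndiabetes since 2010.\nTREATMENT\ninsulin daily."
      sections = split_sections(text, section_dict)
      for name, body in sections:
          print(name, "->", body)

   Several keywords can point to the same section name, which is why the
   dictionary maps a name to a list of strings rather than to a single one.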















   .. py:attribute:: _DEFAULT_LABEL
      :type:  str
      :value: 'section'



   .. py:attribute:: _DEFAULT_STRIP_CHARS
      :type:  str
      :value: Multiline-String

      .. raw:: html

         <details><summary>Show Value</summary>

      .. code-block:: python

         """.;,?! 
         
         	"""

      .. raw:: html

         </details>




   .. py:attribute:: output_label
      :value: 'section'



   .. py:attribute:: strip_chars
      :value: Multiline-String

      .. raw:: html

         <details><summary>Show Value</summary>

      .. code-block:: python

         """.;,?! 
         
         	"""

      .. raw:: html

         </details>




   .. py:attribute:: section_dict
      :value: None



   .. py:attribute:: section_rules
      :value: ()



   .. py:attribute:: keyword_processor


   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]

      
      Return sections detected in `segments`.

      Each section is a segment with an attached attribute whose label is
      ``self.output_label`` and whose value is the name of the section.

      :Parameters:

          **segments: list of Segment**
              List of segments in which to look for sections.



      :Returns:

          list of Segment
              Section segments found in `segments`.













   .. py:method:: _find_sections_in_segment(segment: medkit.core.text.Segment)


   .. py:method:: _get_sections_to_rename(match: list[tuple])


   .. py:method:: get_example()
      :classmethod:



   .. py:method:: load_section_definition(filepath: pathlib.Path, encoding: str | None = None) -> tuple[dict[str, list[str]], tuple[SectionModificationRule, ...]]
      :staticmethod:


      
      Load the section definitions stored in a YAML file.


      :Parameters:

          **filepath** : Path
              Path to a YAML file containing the sections (names and mappings) and the rules.

          **encoding** : str, optional
              Encoding of the file to open



      :Returns:

          tuple
              Tuple containing:

              - the dictionary in which each key is a section name and each value is
                the list of all equivalent strings;
              - the tuple of section modification rules, which rename some sections
                according to their order.
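
      The shape of the returned tuple can be illustrated on an already-parsed
      mapping. This is only a sketch: it assumes, without confirmation from the
      description above, that the file has top-level ``sections`` and ``rules``
      keys. ``RuleSketch`` and ``parse_definition`` are hypothetical stand-ins,
      and the YAML parsing step itself is left out.

      .. code-block:: python

         from __future__ import annotations

         from dataclasses import dataclass
         from typing import Any


         @dataclass
         class RuleSketch:
             """Hypothetical stand-in with the same four fields as SectionModificationRule."""

             section_name: str
             new_section_name: str
             other_sections: list[str]
             order: str  # "BEFORE" or "AFTER"


         def parse_definition(
             config: dict[str, Any],
         ) -> tuple[dict[str, list[str]], tuple[RuleSketch, ...]]:
             """Turn a parsed mapping into (section_dict, rules)."""
             # Assumed layout: {"sections": {...}, "rules": [...]}.
             section_dict = {name: list(kws) for name, kws in config["sections"].items()}
             rules = tuple(RuleSketch(**raw) for raw in config.get("rules", []))
             return section_dict, rules


         config = {
             "sections": {"antecedent": ["HISTORY"], "treatment": ["TREATMENT"]},
             "rules": [
                 {
                     "section_name": "antecedent",
                     "new_section_name": "family_history",
                     "other_sections": ["family"],
                     "order": "AFTER",
                 }
             ],
         }
         section_dict, rules = parse_definition(config)
         print(section_dict)
         print(rules)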













   .. py:method:: save_section_definition(section_dict: dict[str, list[str]], section_rules: Iterable[SectionModificationRule], filepath: pathlib.Path, encoding: str | None = None)
      :staticmethod:


      
      Save a section definition to a YAML file.


      :Parameters:

          **section_dict** : dict of str to list of str
              Dictionary containing the section name as key and the list of mappings
              as value (see the content of default_section_definition.yml for an example).

          **section_rules** : iterable of SectionModificationRule
              Rules for renaming a section according to its order relative to the other
              sections.

          **filepath** : Path
              Path to the file to save

          **encoding** : str, optional
              File encoding
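
      As a rough sketch of how sections and rules might be flattened into
      plain data and written with an explicit encoding, the example below uses
      JSON from the standard library as a stand-in for YAML. It is not
      medkit's implementation: the ``sections`` / ``rules`` layout is an
      assumption, and ``RuleSketch`` and ``save_definition_sketch`` are
      hypothetical names.

      .. code-block:: python

         from __future__ import annotations

         import json
         import tempfile
         from dataclasses import asdict, dataclass
         from pathlib import Path
         from typing import Iterable


         @dataclass
         class RuleSketch:
             """Hypothetical stand-in with the same four fields as SectionModificationRule."""

             section_name: str
             new_section_name: str
             other_sections: list[str]
             order: str  # "BEFORE" or "AFTER"


         def save_definition_sketch(
             section_dict: dict[str, list[str]],
             section_rules: Iterable[RuleSketch],
             filepath: Path,
             encoding: str | None = None,
         ) -> None:
             """Write sections and rules to one file (JSON here, as a YAML stand-in)."""
             # Dataclass instances are converted to plain dicts before serialization.
             payload = {
                 "sections": section_dict,
                 "rules": [asdict(rule) for rule in section_rules],
             }
             with open(filepath, "w", encoding=encoding) as f:
                 json.dump(payload, f, indent=2, ensure_ascii=False)


         rule = RuleSketch("antecedent", "family_history", ["family"], "AFTER")
         with tempfile.TemporaryDirectory() as tmp_dir:
             path = Path(tmp_dir) / "definition.json"
             save_definition_sketch(
                 {"antecedent": ["ANTÉCÉDENTS"]}, [rule], path, encoding="utf-8"
             )
             saved = json.loads(path.read_text(encoding="utf-8"))
         print(saved)

      Passing the encoding explicitly matters as soon as the mappings contain
      accented keywords, as in the example.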
















