medkit.text.preprocessing.eds_cleaner
=====================================

.. py:module:: medkit.text.preprocessing.eds_cleaner


Classes
-------

.. autoapisummary::

   medkit.text.preprocessing.eds_cleaner.EDSCleaner


Module Contents
---------------

.. py:class:: EDSCleaner(output_label: str = _DEFAULT_LABEL, keep_endlines: bool = False, handle_parentheses_eds: bool = True, handle_points_eds: bool = True, uid: str | None = None)

   Bases: :py:obj:`medkit.core.Operation`


   EDS pre-processing annotation module.

   This module non-destructively removes or cleans selected points (periods)
   and newline characters. Rather than altering the input text, it creates a new
   text-bound annotation whose spans record the modifications made to the input text.

   :Parameters:

       **output_label** : str, optional
           The output label of the created annotations.

       **keep_endlines** : bool, default=False
           If True, replace multiple consecutive endlines with `.\\n`.
           If False (default), replace multiple consecutive endlines with `.\\s`
           (a point followed by whitespace).

       **handle_parentheses_eds** : bool, default=True
           If True (default), modify the text near parentheses or keywords according to
           predefined rules for French documents.
           If False, leave the text near parentheses or keywords unmodified.

       **handle_points_eds** : bool, default=True
           Whether to modify points near predefined keywords for French documents.
           If True (default), modify the points near keywords.
           If False, leave the points near keywords unmodified.

       **uid** : str, optional
           Identifier of the pre-processing module.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: _DEFAULT_LABEL
      :value: 'clean_text'


   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]

      
      Run the module on a list of input segments and return a new list of segments.


      :Parameters:

          **segments** : list of Segment
              List of segments to normalize.

      :Returns:

          list of Segment
              List of cleaned segments.


      ..
          !! processed by numpydoc !!


   .. py:method:: _clean_segment_text(segment: medkit.core.text.Segment)

      
      Clean up a segment non-destructively.

      First, remove points between numbers and between upper case letters.
      Then remove multiple whitespaces or newline characters.
      Finally, modify parentheses or points after keywords if necessary.


      ..
          !! processed by numpydoc !!