medkit.core.text.utils
======================

.. py:module:: medkit.core.text.utils


Functions
---------

.. autoapisummary::

   medkit.core.text.utils.clean_newline_character
   medkit.core.text.utils.clean_parentheses_eds
   medkit.core.text.utils.clean_multiple_whitespaces_in_sentence
   medkit.core.text.utils.replace_point_after_keywords
   medkit.core.text.utils.replace_multiple_newline_after_sentence
   medkit.core.text.utils.replace_newline_inside_sentence
   medkit.core.text.utils.replace_point_in_uppercase
   medkit.core.text.utils.replace_point_in_numbers
   medkit.core.text.utils.replace_point_before_keywords
   medkit.core.text.utils.lstrip
   medkit.core.text.utils.rstrip
   medkit.core.text.utils.strip


Module Contents
---------------

.. py:function:: clean_newline_character(text: str, spans: list[medkit.core.text.span.AnySpan], keep_endlines: bool = False) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace the newline character depending on its position in the text.

   Newline characters that are not removed can either be kept as
   newlines or replaced by spaces. This function combines :func:`replace_multiple_newline_after_sentence`
   and :func:`replace_newline_inside_sentence`.

   :Parameters:

       **text** : str
           The text to be modified

       **spans** : list of AnySpan
           Spans associated to the `text`

       **keep_endlines** : bool, default=False
           Whether to keep the end-of-sentence newlines as ``'.\n'`` or replace them with ``'. '``


   :Returns:

       **text** : str
           The cleaned text

       **spans** : list of AnySpan
           The list of modified spans


   .. rubric:: Examples

   >>> text = "This is\n\n\ta sentence\nAnother\nsentence\n\nhere"
   >>> spans = [Span(0, len(text))]
   >>> cleaned, cleaned_spans = clean_newline_character(text, spans, keep_endlines=False)
   >>> print(cleaned)
   This is a sentence. Another sentence here

   >>> cleaned, cleaned_spans = clean_newline_character(text, spans, keep_endlines=True)
   >>> print(cleaned)
   This is a sentence.
   Another sentence here

   ..
       !! processed by numpydoc !!

.. py:function:: clean_parentheses_eds(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Modify the text near the parentheses depending on its content.

   The rules are adapted for French documents.


   .. rubric:: Examples

   >>> text = """
   ... Le test PCR est (-), pas de nouvelles.
   ... L'examen d'aujourd'hui est (+).
   ... Les bilans réalisés (biologique, métabolique en particulier à la recherche
   ... de GAMT et X fragile) sont revenus négatifs.
   ... Le patient a un traitement(debuté le 3/02).
   ... """
   >>> spans = [Span(0, len(text))]
   >>> text, spans = clean_parentheses_eds(text, spans)
   >>> print(text)
   Le test PCR est  negatif , pas de nouvelles.
   L'examen d'aujourd'hui est  positif .
   Les bilans réalisés sont revenus négatifs ; biologique, métabolique en particulier à la recherche
   de GAMT et X fragile.
   Le patient a un traitement,debuté le 3/02,.

   ..
       !! processed by numpydoc !!

.. py:function:: clean_multiple_whitespaces_in_sentence(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Normalize consecutive whitespaces in a sentence.

   Replace runs of whitespace between alphanumeric characters and
   lowercase characters with a single space.


   .. rubric:: Examples

   >>> text = "A   phrase    with  multiple   spaces     "
   >>> spans = [Span(0, len(text))]
   >>> text, spans = clean_multiple_whitespaces_in_sentence(text, spans)
   >>> print(text)
   A phrase with multiple spaces

   ..
       !! processed by numpydoc !!

.. py:function:: replace_point_after_keywords(text: str, spans: list[medkit.core.text.span.AnySpan], keywords: list[str], strict: bool = False, replace_by: str = ' ') -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace the dot character after a keyword and update its span.

   Can be used to remove dots that follow a person's title (e.g. M. or Mrs.)
   or dots that appear by mistake after `keywords`.

   :Parameters:

       **text** : str
           The text to be modified

       **spans** : list of AnySpan
           Spans associated to the `text`

       **keywords** : list of str
           Words or patterns to match before a point

       **strict** : bool, default=False
           If True, the keyword must be immediately followed by a point.
           If False, any amount of whitespace (including none) may separate the keyword from the point

       **replace_by** : str, default=" "
           Replacement string


   :Returns:

       **text** : str
           The text with the replaced matches

       **spans** : list of AnySpan
           The list of modified spans


   .. rubric:: Examples

   >>> text = "Le Dr. a un rdv. Mme. Bernand est venue à 14h"
   >>> spans = [Span(0, len(text))]
   >>> keywords = ["Dr", "Mme"]
   >>> text, spans = replace_point_after_keywords(text, spans, keywords, replace_by="")
   >>> print(text)
   Le Dr a un rdv. Mme Bernand est venue à 14h
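
   The effect of ``strict`` can be approximated with a regular expression. The sketch
   below is not medkit's implementation: ``drop_point_after_keywords`` is a hypothetical
   helper that treats keywords as literal strings and rewrites only the text, leaving
   spans aside.

   .. code-block:: python

      import re


      def drop_point_after_keywords(text, keywords, strict=False, replace_by=" "):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          if not keywords:
              return text
          # Assumption: keywords are literal words anchored on a word boundary
          alternatives = "|".join(rf"\b{re.escape(keyword)}" for keyword in keywords)
          # strict: the dot must directly follow the keyword;
          # otherwise whitespace before the dot is tolerated and replaced with it
          point = r"\." if strict else r"\s*\."
          pattern = re.compile(rf"(?P<keyword>{alternatives}){point}")
          return pattern.sub(lambda match: match.group("keyword") + replace_by, text)


      result = drop_point_after_keywords(
          "Le Dr. a un rdv. Mme. Bernand est venue à 14h", ["Dr", "Mme"], replace_by=""
      )
      print(result)  # Le Dr a un rdv. Mme Bernand est venue à 14h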

   ..
       !! processed by numpydoc !!

.. py:function:: replace_multiple_newline_after_sentence(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Normalize consecutive newlines between sentences.

   Replace the whitespace characters between a newline
   character (``\n``) and a capital letter or a number with a single newline character.

   :Parameters:

       **text** : str
           The text to be modified

       **spans** : list of AnySpan
           Spans associated to the `text`


   :Returns:

       **text** : str
           The cleaned text

       **spans** : list of AnySpan
           The list of modified spans
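
   As a rough illustration of the rule described above, the sketch below collapses the
   blanks that follow a newline when the next visible character is a capital letter or a
   digit. It is an approximation under stated assumptions, not medkit's code:
   ``collapse_newlines_before_sentence`` is a hypothetical name, only ASCII capitals are
   recognized, and spans are not updated.

   .. code-block:: python

      import re

      # A newline, any further blanks, then (lookahead) an ASCII capital or a digit
      BLANKS_BEFORE_SENTENCE = re.compile(r"\n[ \t\r\n]*(?=[A-Z0-9])")


      def collapse_newlines_before_sentence(text):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          return BLANKS_BEFORE_SENTENCE.sub("\n", text)


      collapsed = collapse_newlines_before_sentence(
          "First sentence\n\n \tSecond one\n\n3 items"
      )
      print(repr(collapsed))  # 'First sentence\nSecond one\n3 items'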


   ..
       !! processed by numpydoc !!

.. py:function:: replace_newline_inside_sentence(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace newlines inside a sentence.

   Replace the newline character (``\n``) between lowercase letters
   or punctuation marks with a space.

   :Parameters:

       **text** : str
           The text to be modified

       **spans** : list of AnySpan
           Spans associated to the `text`


   :Returns:

       **text** : str
           The cleaned text

       **spans** : list of AnySpan
           The list of modified spans
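
   One possible reading of this rule is sketched below. The set of characters allowed
   around the newline is an assumption made for illustration (lowercase ASCII letters and
   a few punctuation marks), ``join_lines_inside_sentence`` is a hypothetical name, and
   spans are not updated.

   .. code-block:: python

      import re

      # Assumed character set: lowercase ASCII letters and some punctuation marks
      INNER = r"a-z,;:()\-"
      NEWLINE_INSIDE_SENTENCE = re.compile(rf"(?<=[{INNER}])\n(?=[{INNER}])")


      def join_lines_inside_sentence(text):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          return NEWLINE_INSIDE_SENTENCE.sub(" ", text)


      joined = join_lines_inside_sentence("a sentence\ncontinues here\nNext one")
      print(repr(joined))  # 'a sentence continues here\nNext one'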


   ..
       !! processed by numpydoc !!

.. py:function:: replace_point_in_uppercase(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace the dot character between uppercase characters with a space and update its span.


   .. rubric:: Examples

   >>> text = "Abréviation ING.DRT or RTT.J"
   >>> spans = [Span(0, len(text))]
   >>> text, spans = replace_point_in_uppercase(text, spans)
   >>> print(text)
   Abréviation ING DRT or RTT J
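
   The text transformation can be approximated with a lookbehind/lookahead pair. This is
   a minimal sketch, not medkit's implementation: ``split_uppercase_on_point`` is a
   hypothetical name, only ASCII capitals are considered, and spans are not updated.

   .. code-block:: python

      import re

      # A dot with an ASCII capital letter on each side
      POINT_BETWEEN_UPPERCASE = re.compile(r"(?<=[A-Z])\.(?=[A-Z])")


      def split_uppercase_on_point(text):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          return POINT_BETWEEN_UPPERCASE.sub(" ", text)


      print(split_uppercase_on_point("Abréviation ING.DRT or RTT.J"))
      # Abréviation ING DRT or RTT J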

   ..
       !! processed by numpydoc !!

.. py:function:: replace_point_in_numbers(text: str, spans: list[medkit.core.text.span.AnySpan]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace the dot character between numbers with a comma and update its span.


   .. rubric:: Examples

   >>> text = "La valeur est de 3.456."
   >>> spans = [Span(0, len(text))]
   >>> text, spans = replace_point_in_numbers(text, spans)
   >>> print(text)
   La valeur est de 3,456.
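
   A minimal sketch of the same rule with a regular expression, assuming that "between
   numbers" means a digit on each side of the dot. ``point_to_comma_in_numbers`` is a
   hypothetical name, and the sketch rewrites only the text, not the spans.

   .. code-block:: python

      import re

      # A dot with a digit on each side; a sentence-final dot is left alone
      POINT_BETWEEN_DIGITS = re.compile(r"(?<=\d)\.(?=\d)")


      def point_to_comma_in_numbers(text):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          return POINT_BETWEEN_DIGITS.sub(",", text)


      print(point_to_comma_in_numbers("La valeur est de 3.456."))
      # La valeur est de 3,456.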

   ..
       !! processed by numpydoc !!

.. py:function:: replace_point_before_keywords(text: str, spans: list[medkit.core.text.span.AnySpan], keywords: list[str]) -> tuple[str, list[medkit.core.text.span.AnySpan]]

   
   Replace the dot character before a keyword with a space and update its span.
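
   No example is given for this function, so the sketch below only illustrates the idea
   under stated assumptions: keywords are literal strings, whitespace may separate the
   dot from the keyword, and spans are left aside. ``drop_point_before_keywords`` is a
   hypothetical name, not part of medkit.

   .. code-block:: python

      import re


      def drop_point_before_keywords(text, keywords):
          """Hypothetical stand-in: handles the text only, spans are ignored."""
          if not keywords:
              return text
          alternatives = "|".join(re.escape(keyword) for keyword in keywords)
          # A dot followed (after optional whitespace) by one of the keywords
          pattern = re.compile(rf"\.(?=\s*(?:{alternatives}))")
          return pattern.sub(" ", text)


      replaced = drop_point_before_keywords(
          "Le patient va bien. cf le compte rendu.", ["cf"]
      )
      print(replaced)

   Only the dot is replaced, so the space that already followed it is kept and the
   output contains two consecutive spaces before ``cf``.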


   ..
       !! processed by numpydoc !!

.. py:function:: lstrip(text: str, start: int = 0, chars: str | None = None) -> tuple[str, int]

   
   Return a copy of the string with leading characters removed and its corresponding new start index.


   :Parameters:

       **text** : str
           The text to strip.

       **start** : int, default=0
           The start index from the original text if any.

       **chars** : str, optional
           The list of characters to strip. Default behaviour is like `str.lstrip([chars])`.


   :Returns:

       **new_text** : str
           New text

       **new_start** : int
           New start index
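
   The new start index follows from the number of characters removed on the left. A
   minimal sketch of that arithmetic, using a hypothetical ``lstrip_with_offset`` helper
   rather than medkit's own code:

   .. code-block:: python

      def lstrip_with_offset(text, start=0, chars=None):
          """Hypothetical stand-in built on ``str.lstrip``."""
          new_text = text.lstrip(chars)
          # Every removed leading character shifts the start to the right
          new_start = start + (len(text) - len(new_text))
          return new_text, new_start


      print(lstrip_with_offset("  hello ", start=10))  # ('hello ', 12)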


   ..
       !! processed by numpydoc !!

.. py:function:: rstrip(text: str, end: int | None = None, chars: str | None = None) -> tuple[str, int]

   
   Return a copy of the string with trailing characters removed and its corresponding new end index.


   :Parameters:

       **text** : str
           The text to strip.

       **end** : int, optional
           The end index from the original text if any.

       **chars** : str, optional
           The list of characters to strip. Default behaviour is like `str.rstrip([chars])`.


   :Returns:

       **new_text** : str
           New text

       **new_end** : int
           New end index
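
   The new end index follows from the number of characters removed on the right. The
   sketch below uses a hypothetical ``rstrip_with_offset`` helper and assumes that an
   omitted ``end`` means the length of ``text``; medkit's actual handling may differ.

   .. code-block:: python

      def rstrip_with_offset(text, end=None, chars=None):
          """Hypothetical stand-in built on ``str.rstrip``."""
          if end is None:
              end = len(text)  # assumption: no end given means the end of text
          new_text = text.rstrip(chars)
          # Every removed trailing character shifts the end to the left
          new_end = end - (len(text) - len(new_text))
          return new_text, new_end


      print(rstrip_with_offset(" hello  ", end=18))  # (' hello', 16)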


   ..
       !! processed by numpydoc !!

.. py:function:: strip(text: str, start: int = 0, chars: str | None = None) -> tuple[str, int, int]

   
   Return a copy of the string with leading and trailing characters removed and its corresponding new start and end indexes.


   :Parameters:

       **text** : str
           The text to strip.

       **start** : int, default=0
           The start index from the original text if any.

       **chars** : str, optional
           The list of characters to strip. Default behaviour is like `str.strip([chars])`.


   :Returns:

       **new_text** : str
           New text

       **new_start** : int
           New start index

       **new_end** : int
           New end index
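
   Both indexes can be derived by stripping the left side first and then the right side.
   A minimal sketch with a hypothetical ``strip_with_offsets`` helper (not medkit's own
   code), assuming ``text`` begins at ``start`` in the original document:

   .. code-block:: python

      def strip_with_offsets(text, start=0, chars=None):
          """Hypothetical stand-in built on ``str.lstrip`` and ``str.rstrip``."""
          left_stripped = text.lstrip(chars)
          # Removed leading characters shift the start to the right
          new_start = start + (len(text) - len(left_stripped))
          new_text = left_stripped.rstrip(chars)
          # The end is the new start plus the length of what remains
          new_end = new_start + len(new_text)
          return new_text, new_start, new_end


      print(strip_with_offsets("  hello ", start=10))  # ('hello', 12, 17)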


   ..
       !! processed by numpydoc !!