medkit.text.segmentation.rush_sentence_tokenizer
================================================

.. py:module:: medkit.text.segmentation.rush_sentence_tokenizer


Classes
-------

.. autoapisummary::

   medkit.text.segmentation.rush_sentence_tokenizer.RushSentenceTokenizer


Module Contents
---------------

.. py:class:: RushSentenceTokenizer(output_label: str = _DEFAULT_LABEL, path_to_rules: str | pathlib.Path | None = None, keep_newlines: bool = True, attrs_to_copy: list[str] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.text.SegmentationOperation`


   Sentence segmentation annotator based on PyRuSH.


   :Parameters:

       **output_label: str, optional**
           The output label of the created annotations.

       **path_to_rules: str or Path, optional**
           Path to the CSV or TSV rules file to provide to PyRuSH. If not provided,
           "rush_tokenizer_default_rules.tsv" is used
           (it corresponds to "conf/rush_rules.tsv" in the PyRuSH repo).

       **keep_newlines: bool, default=True**
           With the default rules, newline chars are not used to split
           sentences, so a sentence may contain one or more newline chars.
           If `keep_newlines` is False, newlines are replaced by spaces.

       **attrs_to_copy: list of str, optional**
           Labels of the attributes that should be copied from the input segment
           to the derived segment. Useful, for example, for propagating section names.

       **uid: str, optional**
           Identifier of the tokenizer.



   .. py:attribute:: _DEFAULT_LABEL
      :value: 'sentence'


   .. py:attribute:: output_label
      :value: 'sentence'


   .. py:attribute:: path_to_rules
      :value: None


   .. py:attribute:: keep_newlines
      :value: True


   .. py:attribute:: attrs_to_copy
      :value: None


   .. py:attribute:: _rush


   .. py:method:: run(segments: list[medkit.core.text.Segment]) -> list[medkit.core.text.Segment]

      
      Return sentences detected in `segments`.


      :Parameters:

          **segments: list of Segment**
              List of segments in which to look for sentences.


      :Returns:

          list of Segment:
              Sentence segments found in `segments`.




   .. py:method:: _find_sentences_in_segment(segment: medkit.core.text.Segment) -> Iterator[medkit.core.text.Segment]
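
As a minimal sketch of how the class documented above might be used, the helper below wraps raw text in a document, runs the tokenizer on its raw segment, and returns the text of each sentence. ``split_into_sentences`` and ``flatten_newlines`` are hypothetical names introduced for illustration. ``TextDocument`` and its ``raw_segment`` attribute are assumed to be available from ``medkit.core.text``; they are not described on this page. ``flatten_newlines`` is only a plain-Python stand-in for what ``keep_newlines=False`` is described as doing, assuming each newline is replaced by exactly one space; it is not medkit's implementation.

```python
from __future__ import annotations


def split_into_sentences(text: str, keep_newlines: bool = True) -> list[str]:
    """Return the text of each sentence found in `text` (illustrative helper)."""
    # Imported inside the function so that medkit and PyRuSH are only
    # required when the helper is actually called.
    from medkit.core.text import TextDocument  # assumption: not documented on this page
    from medkit.text.segmentation.rush_sentence_tokenizer import (
        RushSentenceTokenizer,
    )

    doc = TextDocument(text=text)
    # With path_to_rules left unset, the default rules file is used.
    tokenizer = RushSentenceTokenizer(keep_newlines=keep_newlines)
    # run() takes a list of segments and returns a list of sentence segments.
    sentences = tokenizer.run([doc.raw_segment])
    return [sentence.text for sentence in sentences]


def flatten_newlines(sentence_text: str) -> str:
    """Plain-Python stand-in for the described effect of keep_newlines=False."""
    # Assumption: each newline becomes exactly one space, so the text
    # keeps the same length.
    return sentence_text.replace("\n", " ")
```

Calling ``split_into_sentences(report_text, keep_newlines=False)`` requires medkit and PyRuSH to be installed, and the sentences returned depend on the rules file in use. ``flatten_newlines`` runs without either package and can be used to check what a multi-line sentence would look like once its newlines are replaced.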