medkit.text.postprocessing.document_splitter
============================================

.. py:module:: medkit.text.postprocessing.document_splitter


Classes
-------

.. autoapisummary::

   medkit.text.postprocessing.document_splitter.DocumentSplitter


Module Contents
---------------

.. py:class:: DocumentSplitter(segment_label: str, entity_labels: list[str] | None = None, attr_labels: list[str] | None = None, relation_labels: list[str] | None = None, name: str | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.Operation`


   Split text documents using their segments as a reference.

   The resulting 'mini-documents' contain the entities belonging to each
   segment along with their attributes.

   This operation can be used to create datasets from medkit text documents.

   :Parameters:

       **segment_label** : str
           Label of the segments to use as references for the splitter

       **entity_labels** : list of str, optional
           Labels of entities to be included in the mini documents.
           If None, all entities from the document will be included.

       **attr_labels** : list of str, optional
           Labels of the attributes to be included in the new annotations.
           If None, all attributes will be included.

       **relation_labels** : list of str, optional
           Labels of relations to be included in the mini documents.
           If None, all relations will be included.

       **name** : str, optional
           Name describing the splitter (defaults to the class name).

       **uid** : str, optional
           Identifier of the operation.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: segment_label


   .. py:attribute:: entity_labels
      :value: None


   .. py:attribute:: attr_labels
      :value: None


   .. py:attribute:: relation_labels
      :value: None


   .. py:method:: run(docs: list[medkit.core.text.TextDocument]) -> list[medkit.core.text.TextDocument]

      
      Split docs into mini documents.


      :Parameters:

          **docs** : list of TextDocument
              List of text documents to split


      :Returns:

          list of TextDocument
              List of documents created from the selected segments
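
      The splitting idea can be sketched in plain Python. This is only an
      illustration, not the medkit implementation: ``Ann`` and
      ``split_by_segments`` are hypothetical names, segments and entities are
      told apart by label only, and an entity is assumed to belong to a
      segment when it lies entirely inside it.

      ```python
      from dataclasses import dataclass


      @dataclass
      class Ann:
          """Hypothetical stand-in for an annotation (not a medkit class)."""

          label: str
          start: int
          end: int


      def split_by_segments(text, anns, segment_label, entity_labels=None):
          """Return one (text, entities) pair per segment with `segment_label`."""
          segments = [a for a in anns if a.label == segment_label]
          # None means "keep every entity", mirroring the documented default
          candidates = [
              a
              for a in anns
              if a.label != segment_label
              and (entity_labels is None or a.label in entity_labels)
          ]
          mini_docs = []
          for seg in segments:
              # Assumption: keep only entities fully contained in the segment
              inside = [
                  e for e in candidates if seg.start <= e.start and e.end <= seg.end
              ]
              mini_docs.append((text[seg.start : seg.end], inside))
          return mini_docs


      text = "Patient has asthma. No fever today."
      anns = [
          Ann("sentence", 0, 19),
          Ann("sentence", 20, 35),
          Ann("disease", 12, 18),
          Ann("symptom", 23, 28),
      ]
      mini_docs = split_by_segments(text, anns, "sentence", entity_labels=["disease"])
      ```

      Here each sentence yields one mini-document; the ``symptom`` entity is
      dropped because its label is not listed in ``entity_labels``.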


      ..
          !! processed by numpydoc !!


   .. py:method:: _create_segment_doc(segment: medkit.core.text.Segment, entities: list[medkit.core.text.Entity], relations: list[medkit.core.text.Relation], doc_source: medkit.core.text.TextDocument) -> medkit.core.text.TextDocument

      
      Create a TextDocument from a segment and its entities.

      The original zone of the segment becomes the text of the document.

      :Parameters:

          **segment** : Segment
              Segment to use as reference for the new document

          **entities** : list of Entity
              Entities inside the segment

          **relations** : list of Relation
              Relations inside the segment

          **doc_source** : TextDocument
              Initial document from which annotations were extracted


      :Returns:

          TextDocument
              A new document with entities. Its metadata includes the original span and the original metadata.
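
      Because the zone covered by the segment becomes the text of the new
      document, character offsets taken from the source document no longer
      line up with the new text. A minimal sketch of the re-offsetting this
      implies, assuming simple contiguous ``(start, end)`` spans;
      ``shift_span`` is a hypothetical helper, not a medkit function:

      ```python
      def shift_span(span, segment_start):
          """Express a (start, end) span relative to the start of a segment."""
          start, end = span
          return (start - segment_start, end - segment_start)


      source_text = "Patient has asthma. No fever today."
      segment = (20, 35)  # span of the segment in the source document
      entity = (23, 28)  # span of an entity in the source document

      # The text of the mini-document is the zone covered by the segment
      mini_text = source_text[segment[0] : segment[1]]
      new_start, new_end = shift_span(entity, segment[0])
      entity_text = mini_text[new_start:new_end]
      ```

      After shifting, the entity span indexes into ``mini_text`` rather than
      into the source text.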


      ..
          !! processed by numpydoc !!


   .. py:method:: _filter_attrs_from_ann(ann: medkit.core.text.TextAnnotation) -> list[medkit.core.Attribute]

      
      Filter attributes from an annotation using 'attr_labels'.
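
      The filtering rule described for ``attr_labels`` (``None`` keeps every
      attribute) can be sketched as follows. Attributes are modelled as plain
      dicts for illustration, and ``filter_attrs`` is a hypothetical helper
      rather than the method itself:

      ```python
      def filter_attrs(attrs, attr_labels=None):
          """Keep the attributes whose label is listed in `attr_labels`."""
          if attr_labels is None:
              # None means "no filtering": every attribute is kept
              return list(attrs)
          # In this sketch, an empty list keeps nothing
          return [a for a in attrs if a["label"] in attr_labels]


      attrs = [
          {"label": "negation", "value": True},
          {"label": "family", "value": False},
      ]
      kept_all = filter_attrs(attrs)
      kept_negation = filter_attrs(attrs, attr_labels=["negation"])
      ```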


      ..
          !! processed by numpydoc !!