medkit.io._common
=================

.. py:module:: medkit.io._common


Attributes
----------

.. autoapisummary::

   medkit.io._common.logger


Classes
-------

.. autoapisummary::

   medkit.io._common.Entity
   medkit.io._common.Relation
   medkit.io._common.Segment
   medkit.io._common.TextAnnotation
   medkit.io._common.TextDocument


Functions
---------

.. autoapisummary::

   medkit.io._common.get_anns_by_type


Module Contents
---------------

.. py:class:: Entity(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.text.entity_attribute_container.EntityAttributeContainer] = EntityAttributeContainer)

   Bases: :py:obj:`Segment`


   Text entity referencing part of an :class:`~medkit.core.text.TextDocument`.


   :Attributes:

       **uid** : str
           The entity identifier.

       **label** : str
           The label for this entity (e.g., DISEASE)

       **text** : str
           Text of the entity.

       **spans** : list of AnySpan
           List of spans indicating which parts of the entity text correspond to
           which part of the document's full text.

       **attrs** : EntityAttributeContainer
           Attributes of the entity. Stored in a
           :class:`~medkit.core.text.entity_attribute_container.EntityAttributeContainer` but can be passed as a list at
           init.

       **metadata** : dict of str to Any
           The metadata of the entity

       **keys** : set of str
           Pipeline output keys to which the entity belongs.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: attrs
      :type:  medkit.core.text.entity_attribute_container.EntityAttributeContainer


.. py:class:: Relation(label: str, source_id: str, target_id: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

   Bases: :py:obj:`TextAnnotation`


   Relation between two text entities.


   :Attributes:

       **uid** : str
           The identifier of the relation

       **label** : str
           The relation label

       **source_id** : str
           The identifier of the entity at which the relation starts (its source)

       **target_id** : str
           The identifier of the entity at which the relation ends (its target)

       **attrs** : AttributeContainer
           The attributes of the relation

       **metadata** : dict of str to Any
           The metadata of the relation

       **keys** : set of str
           Pipeline output keys to which the relation belongs


   ..
       !! processed by numpydoc !!

   .. py:attribute:: source_id
      :type:  str


   .. py:attribute:: target_id
      :type:  str


   .. py:method:: to_dict() -> dict[str, Any]


   .. py:method:: from_dict(relation_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


      Create a Relation from a dict.


      :Parameters:

          **relation_dict** : dict of str to Any
              Dictionary representation of a serialized relation, as generated by ``to_dict()``


      ..
          !! processed by numpydoc !!
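
As a minimal sketch of how ``source_id`` and ``target_id`` tie a relation to its
entities, the example below resolves both identifiers against a lookup keyed by
``uid``. ``FakeEntity``, ``FakeRelation`` and ``resolve_relation`` are hypothetical
stand-ins defined here for illustration only; they are not part of medkit.

```python
from dataclasses import dataclass


@dataclass
class FakeEntity:
    # Hypothetical stand-in: only the fields needed for the lookup.
    uid: str
    label: str
    text: str


@dataclass
class FakeRelation:
    # Hypothetical stand-in mirroring label, source_id and target_id.
    label: str
    source_id: str
    target_id: str


def resolve_relation(relation, entities):
    """Return the (source, target) entities referenced by a relation."""
    by_uid = {entity.uid: entity for entity in entities}
    return by_uid[relation.source_id], by_uid[relation.target_id]


entities = [
    FakeEntity(uid="e1", label="DRUG", text="salbutamol"),
    FakeEntity(uid="e2", label="DISEASE", text="asthma"),
]
relation = FakeRelation(label="TREATS", source_id="e1", target_id="e2")
source, target = resolve_relation(relation, entities)
print(f"{source.text} -[{relation.label}]-> {target.text}")
```

The relation stores only identifiers, so the entities themselves have to be
looked up wherever they are kept (here, a plain list).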


.. py:class:: Segment(label: str, text: str, spans: list[medkit.core.text.span.AnySpan], attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, store: medkit.core.store.Store | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

   Bases: :py:obj:`TextAnnotation`


   Text segment referencing part of an :class:`~medkit.core.text.TextDocument`.


   :Attributes:

       **uid** : str
           The segment identifier.

       **label** : str
           The label for this segment (e.g., SENTENCE)

       **text** : str
           Text of the segment.

       **spans** : list of AnySpan
           List of spans indicating which parts of the segment text correspond to
           which part of the document's full text.

       **attrs** : AttributeContainer
           Attributes of the segment. Stored in a
           :class:`~medkit.core.AttributeContainer` but can be passed as a list at
           init.

       **metadata** : dict of str to Any
           The metadata of the segment

       **keys** : set of str
           Pipeline output keys to which the segment belongs.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: spans
      :type:  list[medkit.core.text.span.AnySpan]


   .. py:attribute:: text
      :type:  str


   .. py:method:: to_dict() -> dict[str, Any]


   .. py:method:: from_dict(segment_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


      Create a Segment from a dict.


      :Parameters:

          **segment_dict** : dict of str to Any
              Dictionary representation of a serialized segment, as generated by ``to_dict()``


      ..
          !! processed by numpydoc !!
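
The relationship between ``spans`` and ``text`` can be illustrated with a small
sketch. It assumes that each span is a ``(start, end)`` character range into the
document text and that the segment text is the concatenation of those ranges.
``SimpleSpan`` and ``extract_text`` are hypothetical helpers written for this
example, not medkit APIs.

```python
from typing import NamedTuple


class SimpleSpan(NamedTuple):
    # Hypothetical stand-in for a span: a half-open character range.
    start: int
    end: int


def extract_text(doc_text, spans):
    """Concatenate the parts of doc_text covered by each span, in order."""
    return "".join(doc_text[span.start:span.end] for span in spans)


doc_text = "The patient has severe asthma."
start = doc_text.index("asthma")
spans = [SimpleSpan(start, start + len("asthma"))]
segment_text = extract_text(doc_text, spans)
print(segment_text)
```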


.. py:class:: TextAnnotation(label: str, attrs: list[medkit.core.attribute.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None, attr_container_class: type[medkit.core.attribute_container.AttributeContainer] = AttributeContainer)

   Bases: :py:obj:`abc.ABC`, :py:obj:`medkit.core.dict_conv.SubclassMapping`


   Base abstract class for all text annotations.


   :Attributes:

       **uid** : str
           Unique identifier of the annotation.

       **label** : str
           The label for this annotation (e.g., SENTENCE)

       **attrs** : AttributeContainer
           Attributes of the annotation. Stored in a
           :class:`~medkit.core.AttributeContainer` but can be passed as a list at
           init.

       **metadata** : dict of str to Any
           The metadata of the annotation

       **keys** : set of str
           Pipeline output keys to which the annotation belongs.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: uid
      :type:  str


   .. py:attribute:: label
      :type:  str


   .. py:attribute:: attrs
      :type:  medkit.core.attribute_container.AttributeContainer


   .. py:attribute:: metadata
      :type:  dict[str, Any]


   .. py:attribute:: keys
      :type:  set[str]


   .. py:method:: __init_subclass__()
      :classmethod:


   .. py:method:: from_dict(ann_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


   .. py:method:: to_dict() -> dict[str, Any]
      :abstractmethod:


.. py:class:: TextDocument(text: str, anns: Sequence[medkit.core.text.annotation.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.dict_conv.SubclassMapping`


   Document holding text annotations.

   Annotations must be subclasses of `TextAnnotation`.


   .. rubric:: Examples

   >>> doc = TextDocument(text="hello")
   >>> raw_text = doc.anns.get(label=TextDocument.RAW_LABEL)[0]

   :Attributes:

       **uid** : str
           Unique identifier of the document.

       **text** : str
           Full document text.

       **anns** : TextAnnotationContainer
           Annotations of the document. Stored in an
           :class:`~medkit.core.text.annotation_container.TextAnnotationContainer` but can be passed as a list at init.

       **attrs** : AttributeContainer
           Attributes of the document. Stored in an
           :class:`~medkit.core.AttributeContainer` but can be passed as a list at init.

       **metadata** : dict of str to Any
           Document metadata.

       **raw_segment** : Segment
           Auto-generated segment containing the full unprocessed document text. It is
           also available as an annotation labeled ``RAW_LABEL`` that can be passed to
           processing operations.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: RAW_LABEL
      :type:  ClassVar[str]
      :value: 'RAW_TEXT'


   .. py:attribute:: uid
      :type:  str


   .. py:attribute:: anns
      :type:  medkit.core.text.annotation_container.TextAnnotationContainer


   .. py:attribute:: attrs
      :type:  medkit.core.AttributeContainer


   .. py:attribute:: metadata
      :type:  dict[str, Any]


   .. py:attribute:: raw_segment
      :type:  medkit.core.text.annotation.Segment


   .. py:method:: _generate_raw_segment(text: str, doc_id: str) -> medkit.core.text.annotation.Segment
      :classmethod:


   .. py:property:: text
      :type: str


   .. py:method:: __init_subclass__()
      :classmethod:


   .. py:method:: to_dict(with_anns: bool = True) -> dict[str, Any]


   .. py:method:: from_dict(doc_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


      Create a TextDocument from a dict.


      :Parameters:

          **doc_dict** : dict of str to Any
              Dictionary representation of a serialized TextDocument, as generated by ``to_dict()``


      ..
          !! processed by numpydoc !!


   .. py:method:: from_file(path: os.PathLike, encoding: str = 'utf-8') -> typing_extensions.Self
      :classmethod:


      Create a document from a text file.


      :Parameters:

          **path** : os.PathLike
              Path of the text file

          **encoding** : str, default="utf-8"
              Text encoding to use

      :Returns:

          TextDocument
              Text document with contents of `path` as text. The file path is
              included in the document metadata.


      ..
          !! processed by numpydoc !!


   .. py:method:: from_dir(path: os.PathLike, pattern: str = '*.txt', encoding: str = 'utf-8') -> list[typing_extensions.Self]
      :classmethod:


      Create documents from text files in a directory.


      :Parameters:

          **path** : os.PathLike
              Path of the directory containing text files

          **pattern** : str, default="*.txt"
              Glob pattern to match text files in `path`

          **encoding** : str, default="utf-8"
              Text encoding to use

      :Returns:

          list of TextDocument
              Text documents with contents of each file as text


      ..
          !! processed by numpydoc !!


   .. py:method:: get_snippet(segment: medkit.core.text.annotation.Segment, max_extend_length: int) -> str

      
      Return a portion of the original text containing the annotation.


      :Parameters:

          **segment** : Segment
              The annotation

          **max_extend_length** : int
              Maximum number of characters to use around the annotation

      :Returns:

          str
              A portion of the text around the annotation


      ..
          !! processed by numpydoc !!
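
To make the role of ``max_extend_length`` in ``get_snippet`` more concrete, here is
a rough sketch of one way such a snippet could be computed. It is not medkit's
implementation: ``snippet_around`` is a hypothetical function, and the policy of
spending up to half of the budget before the annotation and the remainder after
it is an assumption made for this example.

```python
def snippet_around(text, start, end, max_extend_length):
    """Return text[start:end] plus at most max_extend_length context characters.

    Assumption: up to half of the budget is used before the annotation,
    and whatever is left is used after it.
    """
    before = min(start, max_extend_length // 2)
    after = min(len(text) - end, max_extend_length - before)
    return text[start - before:end + after]


text = "The patient has severe asthma since childhood."
start = text.index("asthma")
end = start + len("asthma")
snippet = snippet_around(text, start, end, max_extend_length=14)
print(repr(snippet))
```

When the annotation sits near the beginning of the text, the unused part of the
budget is shifted to the right-hand side, so the snippet never reaches outside
the document.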


.. py:data:: logger

.. py:function:: get_anns_by_type(medkit_doc: medkit.core.text.TextDocument, anns_labels: list[str] | None = None) -> dict[str, medkit.core.text.TextAnnotation]

   
   Filter annotations by label and return them in a dictionary keyed by annotation type.


   :Parameters:

       **medkit_doc** : TextDocument
           Text document with annotations

       **anns_labels** : list of str, optional
           Labels of the annotations to keep.
           If not provided, all annotations are included in the dictionary.

   :Returns:

       dict of str to TextAnnotation
           Annotations by type: 'entities', 'relations', and 'segments'.


   ..
       !! processed by numpydoc !!
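
The filtering and grouping described for ``get_anns_by_type`` can be sketched as
follows. This is an illustration under stated assumptions rather than the actual
implementation: the ``Stub*`` classes are hypothetical stand-ins that only mirror
the class hierarchy documented above, ``group_anns_by_type`` is a made-up name,
and the sketch assumes that each key maps to a list of annotations.

```python
class StubAnnotation:
    # Hypothetical stand-in for TextAnnotation: only carries a label.
    def __init__(self, label):
        self.label = label


class StubSegment(StubAnnotation):
    pass


class StubEntity(StubSegment):
    # Entities derive from segments, as in the documented hierarchy.
    pass


class StubRelation(StubAnnotation):
    pass


def group_anns_by_type(anns, anns_labels=None):
    """Optionally filter annotations by label, then group them by type."""
    if anns_labels is not None:
        anns = [ann for ann in anns if ann.label in anns_labels]
    grouped = {"entities": [], "relations": [], "segments": []}
    for ann in anns:
        # Entities are tested first because they are also segments.
        if isinstance(ann, StubEntity):
            grouped["entities"].append(ann)
        elif isinstance(ann, StubRelation):
            grouped["relations"].append(ann)
        elif isinstance(ann, StubSegment):
            grouped["segments"].append(ann)
    return grouped


anns = [StubEntity("DISEASE"), StubSegment("SENTENCE"), StubRelation("TREATS")]
grouped = group_anns_by_type(anns)
filtered = group_anns_by_type(anns, anns_labels=["DISEASE"])
print({key: [ann.label for ann in value] for key, value in grouped.items()})
```

The order of the ``isinstance`` checks matters: because an entity is also a
segment, checking for segments first would place every entity under
``'segments'``.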