medkit.audio.transcription.transcribed_text_document
====================================================

.. py:module:: medkit.audio.transcription.transcribed_text_document


Classes
-------

.. autoapisummary::

   medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument


Module Contents
---------------

.. py:class:: TranscribedTextDocument(text: str, text_spans_to_audio_spans: dict[medkit.core.text.Span, medkit.core.audio.Span], audio_doc_id: str | None, anns: Sequence[medkit.core.text.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.text.TextDocument`


   Text document generated by audio transcription.


   :Parameters:

       **text: str**
           The full transcribed text.

       **text_spans_to_audio_spans: dict of TextSpan to AudioSpan**
           Mapping between text character spans in this document and the
           corresponding audio spans in the original audio.

       **audio_doc_id: str, optional**
           Id of the original
           :class:`~medkit.core.audio.document.AudioDocument` that was
           transcribed, if known.

       **anns: sequence of TextAnnotation, optional**
           Annotations of the document.

       **attrs: sequence of Attribute, optional**
           Attributes of the document.

       **metadata: dict of str to Any, optional**
           Document metadata.

       **uid: str, optional**
           Document identifier.


   :Attributes:

       **raw_segment: TextSegment**
           Auto-generated segment containing the raw full transcribed text.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: text_spans_to_audio_spans
      :type:  dict[medkit.core.text.Span, medkit.core.audio.Span]


   .. py:attribute:: audio_doc_id
      :type:  str | None


   .. py:method:: get_containing_audio_spans(text_ann_spans: list[medkit.core.text.AnySpan]) -> list[medkit.core.audio.Span]

      
      Return the audio spans used to transcribe the text referenced by a text annotation.

      For instance, if the audio ranging from 1.0 to 20.0 seconds is
      transcribed to some text ranging from character 10 to 56 in the
      transcribed document, and then a text annotation is created referencing
      the span 15 to 25, then the containing audio span will be the one ranging
      from 1.0 to 20.0 seconds.

      Note that some text annotations may be contained in more than one
      audio span.

      :Parameters:

          **text_ann_spans: list of AnySpan**
              Text spans of a text annotation referencing some characters in the
              transcribed document.

      :Returns:

          list of AudioSpan
              Audio spans used to transcribe the text referenced by `text_ann_spans`.


      ..
          !! processed by numpydoc !!


   .. py:method:: to_dict(with_anns: bool = True) -> dict[str, Any]


   .. py:method:: from_dict(doc_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


      Create a `TranscribedTextDocument` from a dict.


      :Parameters:

          **doc_dict: dict of str to Any**
              A dictionary from a serialized `TranscribedTextDocument`, as generated by `to_dict()`.


      ..
          !! processed by numpydoc !!
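
The lookup described for ``get_containing_audio_spans`` can be illustrated with a minimal, library-independent sketch. It does not use medkit and is not the library's implementation: ``TextSpan``, ``AudioSpan`` and ``containing_audio_spans`` are hypothetical stand-ins, and the overlap test and de-duplication of results are assumptions made for illustration only.

```python
from typing import NamedTuple


class TextSpan(NamedTuple):
    """Hypothetical stand-in for a text span (character offsets, end exclusive)."""

    start: int
    end: int


class AudioSpan(NamedTuple):
    """Hypothetical stand-in for an audio span (times in seconds)."""

    start: float
    end: float


def containing_audio_spans(text_spans_to_audio_spans, ann_spans):
    """Return the audio spans whose transcribed text overlaps any annotation span.

    Assumption: two text spans overlap when they share at least one character.
    Assumption: each audio span is reported at most once.
    """
    found = []
    for ann_span in ann_spans:
        for text_span, audio_span in text_spans_to_audio_spans.items():
            overlaps = (
                text_span.start < ann_span.end and ann_span.start < text_span.end
            )
            if overlaps and audio_span not in found:
                found.append(audio_span)
    return found


# Audio from 1.0 s to 20.0 s was transcribed to characters 10 to 56,
# and the following audio segment to characters 57 to 90.
mapping = {
    TextSpan(10, 56): AudioSpan(1.0, 20.0),
    TextSpan(57, 90): AudioSpan(20.0, 31.5),
}

# An annotation on characters 15 to 25 lies inside a single transcribed segment.
single = containing_audio_spans(mapping, [TextSpan(15, 25)])

# An annotation on characters 50 to 60 straddles both transcribed segments.
straddling = containing_audio_spans(mapping, [TextSpan(50, 60)])
```

In this sketch, ``single`` holds only the audio span from 1.0 to 20.0 seconds, while ``straddling`` holds both audio spans, which mirrors the note that an annotation may be contained in more than one audio span.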