medkit.audio.transcription
==========================

.. py:module:: medkit.audio.transcription


Submodules
----------

.. toctree::
   :maxdepth: 1

   /reference/api/medkit/audio/transcription/doc_transcriber/index
   /reference/api/medkit/audio/transcription/hf_transcriber/index
   /reference/api/medkit/audio/transcription/sb_transcriber/index
   /reference/api/medkit/audio/transcription/transcribed_text_document/index


Classes
-------

.. autoapisummary::

   medkit.audio.transcription.DocTranscriber
   medkit.audio.transcription.TranscriptionOperation
   medkit.audio.transcription.TranscribedTextDocument


Package Contents
----------------

.. py:class:: DocTranscriber(input_label: str, output_label: str, transcription_operation: TranscriptionOperation, attrs_to_copy: list[str] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.Operation`


   Speech-to-text transcriber generating text documents from audio documents.

   For each audio document, all audio segments with a specific label are
   converted into text segments and regrouped in a corresponding new text
   document. The text of each segment is concatenated to form the full raw text
   of the new document.

   Generated text documents are instances of
   :class:`~medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument`
   (subclass of :class:`~medkit.core.text.document.TextDocument`) with
   additional info such as the identifier of the original audio document and a mapping
   between audio spans and text spans.
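
   The concatenation and span mapping described above can be sketched in plain
   Python. This is a minimal sketch, not medkit's implementation: spans are
   modeled as hypothetical ``(start, end)`` tuples (character offsets for text,
   seconds for audio), ``concatenate_transcriptions`` is a hypothetical helper,
   and joining segments with a newline is an assumption.

   .. code-block:: python

      # Hypothetical stand-ins: spans are (start, end) tuples, not medkit Span objects
      TextSpan = tuple[int, int]
      AudioSpan = tuple[float, float]


      def concatenate_transcriptions(
          segments: list[tuple[str, AudioSpan]], separator: str = "\n"
      ) -> tuple[str, dict[TextSpan, AudioSpan]]:
          """Join segment texts and map each resulting text span to its audio span."""
          full_text = ""
          text_spans_to_audio_spans: dict[TextSpan, AudioSpan] = {}
          for segment_text, audio_span in segments:
              if full_text:
                  # Assumed joining text, inserted before every segment but the first
                  full_text += separator
              start = len(full_text)
              full_text += segment_text
              text_spans_to_audio_spans[(start, len(full_text))] = audio_span
          return full_text, text_spans_to_audio_spans


      text, mapping = concatenate_transcriptions(
          [("hello there", (0.0, 1.2)), ("how are you", (1.5, 2.8))]
      )
      print(repr(text))  # 'hello there\nhow are you'
      print(mapping)  # {(0, 11): (0.0, 1.2), (12, 23): (1.5, 2.8)}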

   Methods :meth:`create_text_segment()` and
   :meth:`augment_full_text_for_next_segment()` can be overridden to customize
   how the text segments are created and how they are concatenated to form the
   full text.
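
   The override pattern can be illustrated with a self-contained sketch. It uses
   a hypothetical stand-in base class, ``JoiningSketch``, whose
   ``augment_full_text_for_next_segment()`` mirrors the documented signature;
   ``audio_segment`` is left untyped, and the default newline joining is an
   assumption. With the real class, the override would be defined on a
   :class:`DocTranscriber` subclass instead.

   .. code-block:: python

      from typing import Any


      class JoiningSketch:
          """Hypothetical stand-in for a transcriber's concatenation loop."""

          def augment_full_text_for_next_segment(
              self, full_text: str, segment_text: str, audio_segment: Any
          ) -> str:
              # Assumed default: separate consecutive segments with a newline
              return full_text + "\n" if full_text else full_text

          def build_full_text(self, segments: list[tuple[str, Any]]) -> str:
              full_text = ""
              for segment_text, audio_segment in segments:
                  # The hook runs before each segment's text is appended
                  full_text = self.augment_full_text_for_next_segment(
                      full_text, segment_text, audio_segment
                  )
                  full_text += segment_text
              return full_text


      class SpaceJoiningSketch(JoiningSketch):
          """Override that keeps all segments on one line, separated by spaces."""

          def augment_full_text_for_next_segment(
              self, full_text: str, segment_text: str, audio_segment: Any
          ) -> str:
              return full_text + " " if full_text else full_text


      segments = [("first sentence.", None), ("second sentence.", None)]
      print(repr(JoiningSketch().build_full_text(segments)))  # 'first sentence.\nsecond sentence.'
      print(repr(SpaceJoiningSketch().build_full_text(segments)))  # 'first sentence. second sentence.'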

   The actual transcription task is delegated to a
   :class:`~.TranscriptionOperation` that must be provided, for instance
   :class:`~medkit.audio.transcription.hf_transcriber.HFTranscriber` or
   :class:`~medkit.audio.transcription.sb_transcriber.SBTranscriber`.

   :Parameters:

       **input_label: str**
           Label of audio segments that should be transcribed.

       **output_label: str**
           Label of generated text segments.

       **transcription_operation: TranscriptionOperation**
           Transcription operation in charge of actually transcribing each
           audio segment.

       **attrs_to_copy: list of str, optional**
           Labels of attributes that should be copied from the original audio segments
           to the transcribed text segments.

       **uid: str, optional**
           Identifier of the transcriber.



   .. py:method:: run(audio_docs: list[medkit.core.audio.AudioDocument]) -> list[medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument]

      
      Return a transcribed text document for each document in `audio_docs`.


      :Parameters:

          **audio_docs: list of AudioDocument**
              Audio documents to transcribe.

      :Returns:

          list of TranscribedTextDocument
              Transcribed text documents (one per document in `audio_docs`).




   .. py:method:: _transcribe_doc(audio_doc: medkit.core.audio.AudioDocument) -> medkit.audio.transcription.transcribed_text_document.TranscribedTextDocument


   .. py:method:: augment_full_text_for_next_segment(full_text: str, segment_text: str, audio_segment: medkit.core.audio.Segment) -> str

      
      Append intermediate joining text to full text before the next segment is concatenated to it.

      Override for custom behavior.




.. py:class:: TranscriptionOperation

   Bases: :py:obj:`typing_extensions.Protocol`


   Protocol for speech-to-text transcription operations.


   :Attributes:

       **output_label** : str
           Label to use for generated transcription attributes.



   .. py:attribute:: output_label
      :type:  str


   .. py:method:: run(segments: list[medkit.core.audio.Segment])

      
      Run the transcription operation.

      Add to each segment a transcription attribute, labeled with
      :attr:`output_label`, whose value is the transcribed text.

      :Parameters:

          **segments: list of AudioSegment**
              List of segments to transcribe.
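
      Because this is a protocol, any object exposing an ``output_label``
      attribute and a ``run(segments)`` method conforms to it structurally,
      without inheriting from it. The self-contained sketch below illustrates
      this with hypothetical ``FakeSegment`` and ``FakeAttribute`` dataclasses
      standing in for medkit's audio segments and attributes, and a hypothetical
      ``DummyTranscriber`` that writes placeholder text; a real implementation
      would call a speech-to-text model.

      .. code-block:: python

         from dataclasses import dataclass, field
         from typing import Protocol, runtime_checkable


         @dataclass
         class FakeAttribute:
             """Hypothetical stand-in for a medkit attribute."""

             label: str
             value: str


         @dataclass
         class FakeSegment:
             """Hypothetical stand-in for an audio segment."""

             label: str
             attrs: list[FakeAttribute] = field(default_factory=list)


         @runtime_checkable
         class TranscriptionOperationSketch(Protocol):
             """Mirror of the documented protocol: an output label and run()."""

             output_label: str

             def run(self, segments: list[FakeSegment]) -> None: ...


         class DummyTranscriber:
             """Conforms structurally; does not inherit from the protocol."""

             def __init__(self, output_label: str = "transcribed_text") -> None:
                 self.output_label = output_label

             def run(self, segments: list[FakeSegment]) -> None:
                 for segment in segments:
                     # A real operation would transcribe the segment's audio here
                     text = f"<transcription of {segment.label}>"
                     segment.attrs.append(FakeAttribute(self.output_label, text))


         transcriber = DummyTranscriber()
         segments = [FakeSegment("turn"), FakeSegment("turn")]
         transcriber.run(segments)
         print(isinstance(transcriber, TranscriptionOperationSketch))  # True
         print(segments[0].attrs[0].value)  # <transcription of turn>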




.. py:class:: TranscribedTextDocument(text: str, text_spans_to_audio_spans: dict[medkit.core.text.Span, medkit.core.audio.Span], audio_doc_id: str | None, anns: Sequence[medkit.core.text.TextAnnotation] | None = None, attrs: Sequence[medkit.core.Attribute] | None = None, metadata: dict[str, Any] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.text.TextDocument`


   Text document generated by audio transcription.


   :Parameters:

       **text: str**
           The full transcribed text.

       **text_spans_to_audio_spans: dict of TextSpan to AudioSpan**
           Mapping between text character spans in this document and
           corresponding audio spans in the original audio.

       **audio_doc_id: str, optional**
           Id of the original
           :class:`~medkit.core.audio.document.AudioDocument` that was
           transcribed, if known.

       **anns: sequence of TextAnnotation, optional**
           Annotations of the document.

       **attrs: sequence of Attribute, optional**
           Attributes of the document.

       **metadata: dict of str to Any, optional**
           Document metadata.

       **uid: str, optional**
           Document identifier.


   :Attributes:

       **raw_segment: TextSegment**
           Auto-generated segment containing the raw full transcribed text.



   .. py:attribute:: text_spans_to_audio_spans
      :type:  dict[medkit.core.text.Span, medkit.core.audio.Span]


   .. py:attribute:: audio_doc_id
      :type:  str | None


   .. py:method:: get_containing_audio_spans(text_ann_spans: list[medkit.core.text.AnySpan]) -> list[medkit.core.audio.Span]

      
      Return the audio spans used to transcribe the text referenced by a text annotation.

      For instance, if the audio ranging from 1.0 to 20.0 seconds is
      transcribed to some text ranging from character 10 to 56 in the
      transcribed document, and then a text annotation is created referencing
      the span 15 to 25, then the containing audio span will be the one ranging
      from 1.0 to 20.0 seconds.

      Note that some text annotations may be contained in more than one
      audio span.

      :Parameters:

          **text_ann_spans: list of AnyTextSpan**
              Text spans of a text annotation referencing some characters in the
              transcribed document.

      :Returns:

          list of AudioSpan
              Audio spans used to transcribe the text referenced by `text_ann_spans`.
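
      The lookup can be sketched as an overlap test between the annotation's
      text spans and the keys of ``text_spans_to_audio_spans``. This is a
      minimal sketch, not medkit's implementation: it assumes half-open
      ``(start, end)`` tuples as spans, returns an audio span whenever its text
      span overlaps any annotation span, and handles only plain spans, not the
      other span types ``AnySpan`` may include.

      .. code-block:: python

         TextSpan = tuple[int, int]
         AudioSpan = tuple[float, float]


         def overlaps(a: TextSpan, b: TextSpan) -> bool:
             # Half-open intervals overlap when each starts before the other ends
             return a[0] < b[1] and b[0] < a[1]


         def get_containing_audio_spans(
             text_spans_to_audio_spans: dict[TextSpan, AudioSpan],
             text_ann_spans: list[TextSpan],
         ) -> list[AudioSpan]:
             return [
                 audio_span
                 for text_span, audio_span in text_spans_to_audio_spans.items()
                 if any(overlaps(text_span, ann_span) for ann_span in text_ann_spans)
             ]


         # The first entry matches the example above; the second is hypothetical
         mapping = {(10, 56): (1.0, 20.0), (57, 90): (20.0, 31.5)}
         print(get_containing_audio_spans(mapping, [(15, 25)]))  # [(1.0, 20.0)]
         print(get_containing_audio_spans(mapping, [(50, 60)]))  # [(1.0, 20.0), (20.0, 31.5)]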




   .. py:method:: to_dict(with_anns: bool = True) -> dict[str, Any]


   .. py:method:: from_dict(doc_dict: dict[str, Any]) -> typing_extensions.Self
      :classmethod:


      Create a `TranscribedTextDocument` from a dict.


      :Parameters:

          **doc_dict: dict of str to Any**
              A dictionary representing a serialized `TranscribedTextDocument`, as generated by :meth:`to_dict`.
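
      Because the span mapping is a dict keyed by spans, it cannot be written to
      JSON as-is; one way to serialize it is as a list of
      ``[text_span, audio_span]`` pairs. The sketch below shows such a round trip
      with hypothetical helpers (``mapping_to_dict``, ``mapping_from_dict``),
      hypothetical keys (``"text"``, ``"audio_doc_id"``,
      ``"text_spans_to_audio_spans"``), and tuple spans. It is not the layout
      actually produced by :meth:`to_dict`, which is not documented here.

      .. code-block:: python

         import json
         from typing import Any, Optional

         TextSpan = tuple[int, int]
         AudioSpan = tuple[float, float]


         def mapping_to_dict(
             text: str, audio_doc_id: Optional[str], mapping: dict[TextSpan, AudioSpan]
         ) -> dict[str, Any]:
             # JSON objects need string keys, so store the mapping as a list of pairs
             return {
                 "text": text,
                 "audio_doc_id": audio_doc_id,
                 "text_spans_to_audio_spans": [[list(t), list(a)] for t, a in mapping.items()],
             }


         def mapping_from_dict(
             doc_dict: dict[str, Any],
         ) -> tuple[str, Optional[str], dict[TextSpan, AudioSpan]]:
             # Rebuild hashable tuple spans from the JSON lists
             mapping = {
                 tuple(text_span): tuple(audio_span)
                 for text_span, audio_span in doc_dict["text_spans_to_audio_spans"]
             }
             return doc_dict["text"], doc_dict["audio_doc_id"], mapping


         original = {(0, 11): (0.0, 1.2), (12, 23): (1.5, 2.8)}
         serialized = json.dumps(mapping_to_dict("hello there\nhow are you", "audio-1", original))
         text, audio_doc_id, restored = mapping_from_dict(json.loads(serialized))
         print(restored == original)  # True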

