medkit.audio.transcription.hf_transcriber
=========================================

.. py:module:: medkit.audio.transcription.hf_transcriber


Classes
-------

.. autoapisummary::

   medkit.audio.transcription.hf_transcriber.HFTranscriber


Module Contents
---------------

.. py:class:: HFTranscriber(model: str = 'facebook/s2t-large-librispeech-asr', output_label: str = 'transcribed_text', language: str | None = None, add_trailing_dot: bool = True, capitalize: bool = True, device: int = -1, batch_size: int = 1, hf_auth_token: str | None = None, cache_dir: str | pathlib.Path | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.Operation`


   Transcriber operation based on a Hugging Face transformers model.

   For each segment given as input, a transcription attribute will be created
   with the transcribed text as value. If needed, a text document can later be
   created from all the transcriptions of an audio document using
   :func:`TranscribedTextDocument.from_audio_doc
   <medkit.audio.transcription.TranscribedTextDocument.from_audio_doc>`.

   :Parameters:

       **model** : str, default="facebook/s2t-large-librispeech-asr"
           Name of the ASR model on the Hugging Face models hub. Must be a
           model compatible with the `AutomaticSpeechRecognitionPipeline`
           transformers class.

       **output_label** : str, default="transcribed_text"
           Label of the attribute containing the transcribed text that will be
           attached to the input segments.

       **language** : str, optional
           Output language to force on the model (useful for some
           multilingual models such as Whisper).

       **add_trailing_dot** : bool, default=True
           If `True`, a dot will be added at the end of each transcription text.

       **capitalize** : bool, default=True
           If `True`, the first letter of each transcription text will be
           uppercased and the rest lowercased.

       **device** : int, default=-1
           Device to use for PyTorch models. Follows the Hugging Face convention
           (`-1` for CPU, otherwise the GPU index, for instance `0` for "cuda:0").

       **batch_size** : int, default=1
           Size of the batches processed by the ASR pipeline.

       **hf_auth_token** : str, optional
           Hugging Face authentication token (needed to access private models
           on the hub).

       **cache_dir** : str or Path, optional
           Directory where downloaded models are stored. If not set, the
           default Hugging Face cache directory is used.

       **uid** : str, optional
           Identifier of the transcriber.
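
   The sketch below restates the documented behavior of the ``capitalize``,
   ``add_trailing_dot`` and ``device`` parameters as plain Python. The helpers
   ``postprocess_transcription`` and ``device_to_torch_name`` are hypothetical
   and not part of medkit; the sketch assumes that capitalization is applied
   before the dot and that the dot is appended unconditionally, which may
   differ from the actual implementation.

   .. code-block:: python

      # Hypothetical helpers, NOT part of medkit: they only mirror the
      # parameter descriptions of HFTranscriber.


      def postprocess_transcription(
          text: str, capitalize: bool = True, add_trailing_dot: bool = True
      ) -> str:
          """Normalize a transcription as described for HFTranscriber."""
          if capitalize:
              # str.capitalize() uppercases the first character and
              # lowercases the rest
              text = text.capitalize()
          if add_trailing_dot:
              # Assumption: the dot is appended even if one is already present
              text = text + "."
          return text


      def device_to_torch_name(device: int) -> str:
          """Map a Hugging Face device index to a PyTorch device string."""
          if device == -1:
              return "cpu"
          if device < -1:
              raise ValueError(f"invalid device index: {device}")
          return f"cuda:{device}"


      print(postprocess_transcription("HELLO WORLD"))  # Hello world.
      print(device_to_torch_name(0))  # cuda:0

   Keeping the two transformations independent mirrors the two separate
   boolean flags: either can be disabled without affecting the other.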


   ..
       !! processed by numpydoc !!

   .. py:method:: run(segments: list[medkit.core.audio.Segment])

      
      Run the transcription operation.

      Add a transcription attribute to each segment with a text value containing
      the transcribed text.

      :Parameters:

          **segments** : list of Segment
               List of segments to transcribe.
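
      The sketch below gives a rough, self-contained picture of how ``run``
      relates to ``batch_size`` and ``output_label``: segments are grouped
      into batches, each batch is transcribed, and one value is stored per
      segment under the output label. ``FakeSegment``, ``run_in_batches`` and
      ``fake_transcribe`` are hypothetical stand-ins: actual medkit segments
      hold an ``AudioBuffer``, their attributes are attribute objects rather
      than dictionary entries, and, per the parameter description, batching
      is performed by the ASR pipeline itself.

      .. code-block:: python

         # Hypothetical stand-ins, NOT medkit classes or the transformers pipeline.
         from dataclasses import dataclass, field
         from typing import Callable, Dict, List


         @dataclass
         class FakeSegment:
             audio: str  # placeholder for an AudioBuffer
             attrs: Dict[str, str] = field(default_factory=dict)


         def run_in_batches(
             segments: List[FakeSegment],
             transcribe_batch: Callable[[List[str]], List[str]],
             batch_size: int = 1,
             output_label: str = "transcribed_text",
         ) -> None:
             """Transcribe segments batch by batch, attaching one value per segment."""
             if batch_size < 1:
                 raise ValueError("batch_size must be >= 1")
             for start in range(0, len(segments), batch_size):
                 batch = segments[start : start + batch_size]
                 texts = transcribe_batch([seg.audio for seg in batch])
                 # Expect exactly one transcription per input audio, in order
                 if len(texts) != len(batch):
                     raise RuntimeError("expected one transcription per segment")
                 for seg, text in zip(batch, texts):
                     seg.attrs[output_label] = text


         calls = []  # records the size of each batch sent to the fake model


         def fake_transcribe(audios: List[str]) -> List[str]:
             calls.append(len(audios))
             return [audio.upper() for audio in audios]


         segs = [FakeSegment(f"clip{i}") for i in range(5)]
         run_in_batches(segs, fake_transcribe, batch_size=2)
         print([seg.attrs["transcribed_text"] for seg in segs])
         print(calls)  # [2, 2, 1]

      With ``batch_size=2``, the five segments are sent in batches of 2, 2
      and 1, and each segment still receives exactly one transcription.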


      ..
          !! processed by numpydoc !!


   .. py:method:: _transcribe_audios(audios: list[medkit.core.audio.AudioBuffer]) -> list[str]