medkit.audio.transcription.hf_transcriber

This module requires extra dependencies that are not installed with the core medkit package. To install them, run `pip install medkit-lib[hf-transcriber]`.

Classes:

HFTranscriber([model, output_label, ...])

Transcriber operation based on a Hugging Face transformers model.

class HFTranscriber(model='facebook/s2t-large-librispeech-asr', output_label='transcribed_text', language=None, add_trailing_dot=True, capitalize=True, device=-1, batch_size=1, hf_auth_token=None, cache_dir=None, uid=None)

Transcriber operation based on a Hugging Face transformers model.

For each segment given as input, a transcription attribute is created with the transcribed text as its value. If needed, a text document can later be created from all the transcriptions of an audio document using medkit.audio.transcription.TranscribedTextDocument.from_audio_doc.
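A minimal usage sketch of the flow described above, not a definitive recipe. The constructor arguments and `run` come from this page; `AudioDocument.from_file`, `raw_segment` and `attrs.get(label=...)` are assumptions about the rest of the medkit API, and `transcribe_file` is a hypothetical helper name. The medkit imports are done inside the function so the sketch can be loaded even when the optional dependencies are missing.

```python
from pathlib import Path


def transcribe_file(audio_path, label="transcribed_text"):
    """Transcribe one audio file and return the list of transcribed texts."""
    # Lazy imports: these need `pip install medkit-lib[hf-transcriber]`.
    from medkit.core.audio import AudioDocument  # assumed import path
    from medkit.audio.transcription.hf_transcriber import HFTranscriber

    # Assumption: an audio document exposes its full signal as `raw_segment`.
    doc = AudioDocument.from_file(Path(audio_path))
    segment = doc.raw_segment

    transcriber = HFTranscriber(
        model="facebook/s2t-large-librispeech-asr",
        output_label=label,
        device=-1,  # CPU, per the Hugging Face convention
    )
    # run() attaches one transcription attribute to each input segment.
    transcriber.run([segment])

    # Assumption: attributes can be looked up by label on the segment.
    return [attr.value for attr in segment.attrs.get(label=label)]
```

Calling `transcribe_file("speech.wav")` would download the model on first use, so expect a delay and a populated cache directory.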

Parameters:

model (str, default="facebook/s2t-large-librispeech-asr") – Name of the ASR model on the Hugging Face models hub. Must be a model compatible with the AutomaticSpeechRecognitionPipeline transformers class.
output_label (str, default="transcribed_text") – Label of the attribute containing the transcribed text that will be attached to the input segments.
language (str, optional) – Optional output language to be forced on the model (useful for some multilingual models such as Whisper)
add_trailing_dot (bool, default=True) – If True, a dot will be added at the end of each transcription text.
capitalize (bool, default=True) – If True, the first letter of each transcription text will be uppercased and the rest lowercased.
device (int, default=-1) – Device to use for PyTorch models. Follows the Hugging Face convention (-1 for CPU and the device number for GPU, for instance 0 for “cuda:0”).
batch_size (int, default=1) – Size of batches processed by ASR pipeline.
hf_auth_token (str, optional) – Hugging Face authentication token (to access private models on the hub).
cache_dir (str or Path, optional) – Directory in which to store downloaded models. If not set, the default Hugging Face cache directory is used.
uid (str, optional) – Identifier of the transcriber.
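The `device`, `capitalize` and `add_trailing_dot` parameters can be illustrated with plain Python. This is only a sketch of the behaviour as documented above, not the library's actual implementation; `device_name` and `postprocess` are hypothetical helper names.

```python
def device_name(device: int) -> str:
    """Map a Hugging Face style device index to a PyTorch device string."""
    # -1 means CPU; any other index is read as a CUDA device number.
    if device == -1:
        return "cpu"
    return f"cuda:{device}"


def postprocess(text: str, add_trailing_dot: bool = True, capitalize: bool = True) -> str:
    """Apply the documented formatting options to one transcription."""
    if capitalize:
        # str.capitalize() uppercases the first character and lowercases the rest.
        text = text.capitalize()
    if add_trailing_dot:
        text += "."
    return text


print(device_name(0))              # cuda:0
print(postprocess("hello WORLD"))  # Hello world.
```

Note that `capitalize=True` lowercases everything after the first letter, so acronyms and proper nouns in the raw model output lose their casing; setting it to False keeps the text as the model produced it.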

Methods:

`run`(segments)	Add a transcription attribute to each segment with a text value containing the transcribed text.
`set_prov_tracer`(prov_tracer)	Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(segments)

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters: segments (list of Segment) – List of segments to transcribe.

property description: OperationDescription

Contains all the operation init parameters.

Return type: OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters: prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.
