medkit.audio.transcription.doc_transcriber

Classes:

`DocTranscriber`(input_label, output_label, ...)	Speech-to-text transcriber generating text documents from audio documents.
`TranscriberFunction`(*args, **kwds)	Protocol for components in charge of the actual speech-to-text transcription, to be used with `DocTranscriber`.
`TranscriberFunctionDescription`(name[, config])	Description of a specific instance of a transcriber function (analogous to `OperationDescription`).

class DocTranscriber(input_label, output_label, transcriber_func, attrs_to_copy=None, uid=None)

Speech-to-text transcriber generating text documents from audio documents.

For each audio document, all audio segments with a given label are converted into text segments and grouped into a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.

Generated text documents are instances of TranscribedDocument (a subclass of TextDocument) that carry additional information, such as the identifier of the original audio document and a mapping between audio spans and text spans.
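The concatenation and the audio-to-text span mapping can be pictured with the plain-Python sketch below. It is not medkit code: segments are hypothetical `(start, end, text)` tuples with times in seconds, and the newline placed between consecutive segments is an assumption standing in for the default joining behavior.

```python
from typing import List, Tuple

# Hypothetical stand-in for a transcribed segment: (audio start s, audio end s, text)
Segment = Tuple[float, float, str]
Span = Tuple[int, int]


def concatenate_with_mapping(
    segments: List[Segment], separator: str = "\n"
) -> Tuple[str, List[Tuple[Tuple[float, float], Span]]]:
    """Join segment texts into one full text and record, for each segment,
    the character range of the full text that its transcription occupies."""
    full_text = ""
    mapping = []
    for start, end, text in segments:
        # Assumed joining rule: separate consecutive segments with `separator`
        if full_text:
            full_text += separator
        text_start = len(full_text)
        full_text += text
        mapping.append(((start, end), (text_start, len(full_text))))
    return full_text, mapping


segments = [(0.0, 2.5, "Patient reports chest pain."), (3.1, 5.0, "No fever.")]
full_text, mapping = concatenate_with_mapping(segments)
print(repr(full_text))
for audio_span, (s, e) in mapping:
    print(audio_span, "->", repr(full_text[s:e]))
```

Recording the text offsets at the moment each segment is appended is what makes it possible to map any text span back to the audio interval it came from.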

The methods `create_text_segment()` and `augment_full_text_for_next_segment()` can be overridden to customize how the text segments are created and how they are concatenated to form the full text.

The actual transcription task is delegated to a TranscriberFunction that must be provided.

Parameters

input_label (str) – Label of audio segments that should be transcribed.
output_label (str) – Label of generated text segments.
transcriber_func (TranscriberFunction) – Transcription component in charge of actually transforming each audio signal into text.
attrs_to_copy (Optional[List[str]]) – Labels of attributes that should be copied from the original audio segments to the transcribed text segments.
uid (str) – Identifier of the transcriber.
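`attrs_to_copy` acts as a label filter over the attributes of each source audio segment. The sketch below illustrates that filtering with attributes modeled as hypothetical `(label, value)` pairs (medkit attributes are richer objects), and it assumes that the default `None` means no attribute is copied.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical stand-in for an attribute: (label, value)
Attr = Tuple[str, Any]


def attrs_for_text_segment(
    audio_attrs: List[Attr], attrs_to_copy: Optional[List[str]] = None
) -> List[Attr]:
    """Keep only the audio-segment attributes whose label is in attrs_to_copy."""
    if not attrs_to_copy:
        # Assumed behavior: None (the default) or [] means nothing is copied
        return []
    wanted = set(attrs_to_copy)
    return [(label, value) for label, value in audio_attrs if label in wanted]


audio_attrs = [("speaker", "doctor"), ("snr_db", 18.0), ("channel", 1)]
print(attrs_for_text_segment(audio_attrs, attrs_to_copy=["speaker"]))
```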

Methods:

`augment_full_text_for_next_segment`(...)	Append intermediate joining text to full text before the next segment is concatenated to it.
`run`(audio_docs)	Return a transcribed text document for each document in audio_docs

run(audio_docs)

Return a transcribed text document for each document in audio_docs

Parameters: audio_docs (List[AudioDocument]) – Audio documents to transcribe
Return type: List[TranscribedDocument]
Returns: List[TranscribedDocument] – Transcribed text documents (one per document in audio_docs)
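The contract of `run()` (one output document per input document, only segments carrying `input_label` are transcribed, the transcription itself delegated to `transcriber_func`) can be sketched without medkit as follows. Every class here is a hypothetical stand-in, audio is faked as strings, and calling `transcribe()` once per document is an assumption about batching granularity, not confirmed behavior.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class Transcriber(Protocol):
    # Hypothetical local mirror of the TranscriberFunction protocol
    def transcribe(self, audios: List[str]) -> List[str]: ...


@dataclass
class AudioDocStub:
    # Hypothetical stand-in for AudioDocument: (label, audio) pairs
    uid: str
    segments: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class TranscribedDocStub:
    # Hypothetical stand-in for TranscribedDocument
    audio_doc_id: str
    text: str
    segments: List[Tuple[str, str]]  # (output_label, text)


def run_sketch(
    audio_docs: List[AudioDocStub], transcriber: Transcriber,
    input_label: str, output_label: str,
) -> List[TranscribedDocStub]:
    results = []
    for doc in audio_docs:
        # Only segments carrying input_label are transcribed
        audios = [audio for label, audio in doc.segments if label == input_label]
        # Assumed batching: one transcribe() call per document
        texts = transcriber.transcribe(audios)
        if len(texts) != len(audios):
            raise ValueError("transcriber must return one string per audio buffer")
        text_segments = [(output_label, t) for t in texts]
        # Simplified concatenation: newline between segments
        results.append(TranscribedDocStub(doc.uid, "\n".join(texts), text_segments))
    return results


class UpperTranscriber:
    """Toy transcriber: 'transcribes' a fake audio string by upper-casing it."""

    def transcribe(self, audios: List[str]) -> List[str]:
        return [a.upper() for a in audios]


docs = [
    AudioDocStub("a1", [("speech", "hello"), ("noise", "hum"), ("speech", "bye")]),
    AudioDocStub("a2", []),
]
out = run_sketch(docs, UpperTranscriber(), input_label="speech", output_label="transcription")
print(out[0].text)
```

Note that a document with no matching segments still yields an (empty) output document, which keeps the output list aligned with `audio_docs`.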

augment_full_text_for_next_segment(full_text, segment_text, audio_segment)

Append intermediate joining text to full text before the next segment is concatenated to it. Override for custom behavior.

Return type: str
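With medkit, customizing the joining text means subclassing `DocTranscriber` and overriding `augment_full_text_for_next_segment(full_text, segment_text, audio_segment)`. The sketch below reproduces that pattern on a hypothetical stand-in base class so it runs without medkit; the newline default in the base class is an assumption, and `audio_segment` is simply ignored.

```python
class JoinerBase:
    """Hypothetical stand-in for the concatenation part of DocTranscriber."""

    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        # Assumed default: start each new segment on its own line
        return full_text + "\n" if full_text else full_text

    def build_full_text(self, segment_texts):
        full_text = ""
        for text in segment_texts:
            # The hook returns the augmented full text; the segment is appended after
            full_text = self.augment_full_text_for_next_segment(full_text, text, None)
            full_text += text
        return full_text


class SpaceJoiner(JoinerBase):
    """Override: join segments with a single space instead of a newline."""

    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        return full_text + " " if full_text else full_text


print(repr(JoinerBase().build_full_text(["Hello.", "How are you?"])))
print(repr(SpaceJoiner().build_full_text(["Hello.", "How are you?"])))
```

Because the hook also receives `segment_text` and `audio_segment`, an override could choose the separator per segment, for instance based on the pause before it.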

class TranscriberFunction(*args, **kwds)

Protocol for components in charge of the actual speech-to-text transcription, to be used with DocTranscriber.

Methods:

`transcribe`(audios)	Convert audio buffers into strings by performing speech-to-text.

transcribe(audios)

Convert audio buffers into strings by performing speech-to-text.

Parameters: audios (List[AudioBuffer]) – Audio buffers to convert
Return type: List[str]
Returns: List[str] – Text transcription for each buffer in audios
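Since TranscriberFunction is a Protocol, any object with a compatible `transcribe` method satisfies it structurally, without inheriting from it. The sketch below uses a hypothetical local mirror of the protocol and a hypothetical `FakeAudioBuffer` in place of medkit's AudioBuffer; the "transcription" is a placeholder where a real implementation would call a speech recognition model.

```python
from dataclasses import dataclass
from typing import List, Protocol, runtime_checkable


@dataclass
class FakeAudioBuffer:
    # Hypothetical stand-in for medkit's AudioBuffer
    samples: List[float]
    sample_rate: int


@runtime_checkable
class TranscriberLike(Protocol):
    # Hypothetical local mirror of TranscriberFunction
    def transcribe(self, audios: List[FakeAudioBuffer]) -> List[str]: ...


class DurationTranscriber:
    """Placeholder 'speech-to-text': describes each buffer by its duration."""

    def transcribe(self, audios: List[FakeAudioBuffer]) -> List[str]:
        # One output string per input buffer, in the same order
        return [f"<{len(a.samples) / a.sample_rate:.1f}s of speech>" for a in audios]


transcriber = DurationTranscriber()
# Structural check: no inheritance from TranscriberLike is needed
print(isinstance(transcriber, TranscriberLike))
buffers = [FakeAudioBuffer([0.0] * 16000, 16000), FakeAudioBuffer([0.0] * 8000, 16000)]
print(transcriber.transcribe(buffers))
```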

class TranscriberFunctionDescription(name, config=<factory>)

Description of a specific instance of a transcriber function (analogous to OperationDescription).

Parameters

name (str) – The name of the transcriber function (typically the class name).
config (Dict[str, Any]) – The specific configuration of the instance.
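The `config=<factory>` default in the signature is how Sphinx renders a dataclass field declared with `default_factory`, so each description presumably gets its own fresh dict. The stand-in below has the same two fields; that the real class is a dataclass is inferred from that rendering, and the transcriber name and config values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TranscriberFunctionDescriptionSketch:
    """Hypothetical stand-in with the same fields as TranscriberFunctionDescription."""

    name: str
    # default_factory gives each instance its own dict (rendered as "<factory>")
    config: Dict[str, Any] = field(default_factory=dict)


desc = TranscriberFunctionDescriptionSketch(
    name="MyTranscriberFunction",  # hypothetical class name
    config={"model": "some-asr-model", "language": "fr"},  # hypothetical settings
)

a = TranscriberFunctionDescriptionSketch(name="A")
b = TranscriberFunctionDescriptionSketch(name="B")
a.config["beam_size"] = 5
print(b.config)  # {}
```

Using a factory rather than a shared `{}` default is what keeps `b.config` empty after `a.config` is modified.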