medkit.audio.transcription

APIs

To access these APIs, import them like this:

from medkit.audio.transcription import <api_to_import>

Classes:

DocTranscriber(input_label, output_label, ...)

Speech-to-text transcriber generating text documents from audio documents.

TranscribedTextDocument(text, ...[, anns, ...])

Subclass for TextDocument instances generated by audio transcription.

TranscriptionOperation(*args, **kwargs)

Protocol for operations in charge of the actual speech-to-text transcription to use with DocTranscriber

class DocTranscriber(input_label, output_label, transcription_operation, attrs_to_copy=None, uid=None)

Speech-to-text transcriber generating text documents from audio documents.

For each audio document, all audio segments with a specific label are transcribed into text segments and grouped into a corresponding new text document. The texts of the segments are concatenated to form the full raw text of the new document.

Generated text documents are instances of TranscribedTextDocument (subclass of TextDocument) with additional info such as the identifier of the original audio document and a mapping between audio spans and text spans.

The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.

The actual transcription task is delegated to a TranscriptionOperation that must be provided, for instance medkit.audio.transcription.hf_transcriber.HFTranscriber or medkit.audio.transcription.sb_transcriber.SBTranscriber.

Parameters:
  • input_label (str) – Label of audio segments that should be transcribed.

  • output_label (str) – Label of generated text segments.

  • transcription_operation (TranscriptionOperation) – Transcription operation in charge of actually transcribing each audio segment.

  • attrs_to_copy (list of str, optional) – Labels of attributes that should be copied from the original audio segments to the transcribed text segments.

  • uid (str, optional) – Identifier of the transcriber.
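The concatenation and span mapping described above can be sketched in plain Python. This is only an illustration and not medkit's implementation: the AudioSpan and TextSpan dataclasses, the build_transcript helper and the newline joiner are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpan:
    # Hypothetical stand-in: start and end times in seconds
    start: float
    end: float


@dataclass(frozen=True)
class TextSpan:
    # Hypothetical stand-in: start and end character offsets (end exclusive)
    start: int
    end: int


def build_transcript(transcribed_segments, joiner="\n"):
    """Concatenate (AudioSpan, text) pairs and record where each text lands."""
    full_text = ""
    text_spans_to_audio_spans = {}
    for audio_span, segment_text in transcribed_segments:
        # Add joining text only between segments, never before the first one
        if full_text:
            full_text += joiner
        start = len(full_text)
        full_text += segment_text
        text_spans_to_audio_spans[TextSpan(start, len(full_text))] = audio_span
    return full_text, text_spans_to_audio_spans


text, mapping = build_transcript(
    [(AudioSpan(0.0, 2.5), "Hello."), (AudioSpan(3.0, 6.0), "How are you?")]
)
```

Here the second segment starts at character 7 because one joining character was added after the six characters of the first segment.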

Methods:

augment_full_text_for_next_segment(...)

Append intermediate joining text to full text before the next segment is concatenated to it.

run(audio_docs)

Return a transcribed text document for each document in audio_docs

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Attributes:

description

Contains all the operation init parameters.

run(audio_docs)

Return a transcribed text document for each document in audio_docs

Parameters:

audio_docs (list of AudioDocument) – Audio documents to transcribe

Return type:

list[TranscribedTextDocument]

Returns:

list of TranscribedTextDocument – Transcribed text documents (one per document in audio_docs)

augment_full_text_for_next_segment(full_text, segment_text, audio_segment)

Append intermediate joining text to full text before the next segment is concatenated to it. Override for custom behavior.

Return type:

str
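As a rough sketch of what an override might compute, the function below takes the same three arguments and returns the extended full text. It is written as a standalone function so it runs without medkit; in practice it would be a method on a DocTranscriber subclass. The joining rule (newline after sentence-ending punctuation, space otherwise) is an assumption made for illustration, not the library's default behavior.

```python
def augment_full_text_for_next_segment(full_text, segment_text, audio_segment):
    """Return full_text extended with the joining text that precedes segment_text."""
    # Nothing to join before the very first segment
    if not full_text:
        return full_text
    # Assumed rule: start a new line after a finished sentence
    if full_text.endswith((".", "?", "!")):
        return full_text + "\n"
    # Otherwise keep the next segment on the same line
    return full_text + " "
```

The segment_text and audio_segment arguments are unused here, but an override could inspect them, for example to insert a speaker name taken from an attribute of the audio segment.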

property description: OperationDescription

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class TranscriptionOperation(*args, **kwargs)

Protocol for operations in charge of the actual speech-to-text transcription to use with DocTranscriber

Attributes:

output_label

Label to use for generated transcription attributes

Methods:

run(segments)

Add a transcription attribute to each segment with a text value containing the transcribed text.

output_label: str

Label to use for generated transcription attributes

run(segments)

Add a transcription attribute to each segment with a text value containing the transcribed text.

Parameters:

segments (list of AudioSegment) – List of segments to transcribe
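The shape of this protocol can be sketched with typing.Protocol and stand-in classes. Everything below is hypothetical and only mirrors the description above: FakeAttribute and FakeAudioSegment replace medkit's attribute and audio segment classes, and ConstantTranscriber is a dummy operation that "transcribes" every segment to the same text.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class FakeAttribute:
    # Hypothetical stand-in for a medkit attribute
    label: str
    value: str


@dataclass
class FakeAudioSegment:
    # Hypothetical stand-in for a medkit audio segment
    label: str
    attrs: list = field(default_factory=list)


class TranscriptionOperationLike(Protocol):
    # Structural type: any object with these members conforms
    output_label: str

    def run(self, segments: list) -> None: ...


class ConstantTranscriber:
    """Dummy operation attaching the same text to every segment."""

    def __init__(self, output_label, text):
        self.output_label = output_label
        self.text = text

    def run(self, segments):
        # Results are attached to the segments in place; nothing is returned
        for segment in segments:
            segment.attrs.append(FakeAttribute(self.output_label, self.text))


def transcribe_all(operation: TranscriptionOperationLike, segments):
    operation.run(segments)
    # Read back the attribute that the operation just added
    return [seg.attrs[-1].value for seg in segments]


segments = [FakeAudioSegment("turn"), FakeAudioSegment("turn")]
texts = transcribe_all(ConstantTranscriber("transcribed_text", "hello"), segments)
```

Because the protocol is structural, ConstantTranscriber does not need to inherit from anything; having an output_label and a compatible run() is enough.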

class TranscribedTextDocument(text, text_spans_to_audio_spans, audio_doc_id, anns=None, attrs=None, metadata=None, uid=None)

Subclass for TextDocument instances generated by audio transcription.

Variables:
  • uid (str, optional) – Document identifier.

  • text (str) – The full transcribed text.

  • text_spans_to_audio_spans (dict of TextSpan to AudioSpan) – Mapping between text character spans in this document and the corresponding audio spans in the original audio.

  • audio_doc_id (str, optional) – Id of the original AudioDocument that was transcribed, if known.

  • anns (sequence of TextAnnotation, optional) – Annotations of the document.

  • attrs (sequence of Attribute, optional) – Attributes of the document.

  • metadata (dict of str to Any) – Document metadata.

  • raw_segment (TextSegment) – Auto-generated segment containing the raw full transcribed text.

Methods:

from_dict(doc_dict)

Create a TranscribedTextDocument from a dict

from_dir(path[, pattern, encoding])

Create documents from text files in a directory

from_file(path[, encoding])

Create a document from a text file

get_containing_audio_spans(text_ann_spans)

Return the audio spans used to transcribe the text referenced by a text annotation.

get_snippet(segment, max_extend_length)

Return a portion of the original text containing the annotation

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

get_containing_audio_spans(text_ann_spans)

Return the audio spans used to transcribe the text referenced by a text annotation.

For instance, if the audio ranging from 1.0 to 20.0 seconds is transcribed to some text ranging from character 10 to 56 in the transcribed document, and then a text annotation is created referencing the span 15 to 25, then the containing audio span will be the one ranging from 1.0 to 20.0 seconds.

Note that some text annotations may be contained in more than one audio span.

Parameters:

text_ann_spans (list of AnyTextSpan) – Text spans of a text annotation referencing some characters in the transcribed document.

Return type:

list[Span]

Returns:

list of AudioSpan – Audio spans used to transcribe the text referenced by text_ann_spans.
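The lookup described above amounts to an overlap test between the annotation spans and the keys of the text-to-audio mapping. The sketch below reproduces the worked example with plain tuples; the containing_audio_spans function, the tuple representation and the second mapping entry are assumptions for illustration, not medkit's code.

```python
def containing_audio_spans(text_spans_to_audio_spans, text_ann_spans):
    """Return audio spans whose text span overlaps any annotation span."""
    found = []
    # Iterate in text order so the result follows the document order
    for (text_start, text_end), audio_span in sorted(text_spans_to_audio_spans.items()):
        for ann_start, ann_end in text_ann_spans:
            # Half-open intervals overlap when each starts before the other ends
            if ann_start < text_end and text_start < ann_end:
                found.append(audio_span)
                break
    return found


# Characters 10 to 56 were transcribed from audio 1.0 s to 20.0 s;
# the second entry is a made-up neighbouring segment
mapping = {(10, 56): (1.0, 20.0), (57, 90): (21.0, 35.0)}

inside = containing_audio_spans(mapping, [(15, 25)])
across = containing_audio_spans(mapping, [(50, 60)])
```

The annotation spanning characters 15 to 25 falls within a single transcribed segment, while the one spanning 50 to 60 straddles two segments and therefore maps to two audio spans.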

classmethod from_dir(path, pattern='*.txt', encoding='utf-8')

Create documents from text files in a directory

Parameters:
  • path (Path) – Path of the directory containing text files

  • pattern (str) – Glob pattern to match text files in path

  • encoding (str) – Text encoding to use

Return type:

list[Self]

Returns:

list of TextDocument – Text documents with contents of each file as text

classmethod from_file(path, encoding='utf-8')

Create a document from a text file

Parameters:
  • path (Path) – Path of the text file

  • encoding (str, default="utf-8") – Text encoding to use

Return type:

Self

Returns:

TextDocument – Text document with contents of path as text. The file path is included in the document metadata.

get_snippet(segment, max_extend_length)

Return a portion of the original text containing the annotation

Parameters:
  • segment (Segment) – The annotation

  • max_extend_length (int) – Maximum number of characters to use around the annotation

Return type:

str

Returns:

str – A portion of the text around the annotation
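One plausible reading of this behavior is a character window that extends the annotation by up to max_extend_length characters on each side, clamped to the text boundaries. The sketch below assumes exactly that; the snippet_around function and its start/end arguments are hypothetical, and the actual method takes a segment and may differ in details.

```python
def snippet_around(text, start, end, max_extend_length):
    """Return text[start:end] widened by max_extend_length on both sides."""
    # Clamp the window so it never runs past either end of the text
    snippet_start = max(0, start - max_extend_length)
    snippet_end = min(len(text), end + max_extend_length)
    return text[snippet_start:snippet_end]


text = "The patient reports mild chest pain since Monday."
start = text.index("chest pain")
end = start + len("chest pain")

snippet = snippet_around(text, start, end, max_extend_length=5)
```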

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

classmethod from_dict(doc_dict)

Create a TranscribedTextDocument from a dict

Parameters:

doc_dict (dict of str to Any) – A dictionary from a serialized TranscribedTextDocument as generated by to_dict()

Return type:

Self

Subpackages / Submodules

medkit.audio.transcription.hf_transcriber

This module requires extra dependencies that are not installed with medkit's core dependencies.

medkit.audio.transcription.sb_transcriber

This module requires extra dependencies that are not installed with medkit's core dependencies.