medkit.audio.transcription
APIs
To access these APIs, use an import like this:
from medkit.audio.transcription import <api_to_import>
Classes:
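| DocTranscriber | Speech-to-text transcriber generating text documents from audio documents. |
| TranscribedDocument | Subclass for TextDocument instances generated by audio transcription. |
| TranscriberFunction | Protocol for components in charge of the actual speech-to-text transcription to use with DocTranscriber. |
| TranscriberFunctionDescription | Description of a specific instance of a transcriber function (similarly to OperationDescription). |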
- class DocTranscriber(input_label, output_label, transcriber_func, attrs_to_copy=None, uid=None)
Speech-to-text transcriber generating text documents from audio documents.
For each audio document, all audio segments with a specific label are converted into text segments and grouped in a corresponding new text document. The texts of these segments are concatenated to form the full raw text of the new document.
Generated text documents are instances of TranscribedDocument (a subclass of TextDocument) with additional information, such as the identifier of the original audio document and a mapping between audio spans and text spans.
The methods create_text_segment() and augment_full_text_for_next_segment() can be overridden to customize how the text segments are created and how they are concatenated to form the full text.
The actual transcription task is delegated to a TranscriberFunction that must be provided.
- Parameters
input_label (str) – Label of audio segments that should be transcribed.
output_label (str) – Label of generated text segments.
transcriber_func (TranscriberFunction) – Transcription component in charge of actually transforming each audio signal into text.
attrs_to_copy (Optional[List[str]]) – Labels of attributes that should be copied from the original audio segments to the transcribed text segments.
uid (str) – Identifier of the transcriber.
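The concatenation and span bookkeeping described above can be pictured with the following minimal sketch. It is not medkit's implementation: spans are plain (start, end) tuples rather than medkit's Span classes, the segments are assumed to be already transcribed, and the newline separator is an assumption standing in for whatever augment_full_text_for_next_segment() produces. The helper name build_transcribed_text is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for medkit's span classes:
# audio spans are (start, end) in seconds, text spans are (start, end) character offsets.
AudioSpan = Tuple[float, float]
TextSpan = Tuple[int, int]

# Assumed separator inserted between consecutive segments
SEPARATOR = "\n"


def build_transcribed_text(
    segments: List[Tuple[AudioSpan, str]],
) -> Tuple[str, Dict[TextSpan, AudioSpan]]:
    """Concatenate already-transcribed segments and map each text span to its audio span."""
    full_text = ""
    text_to_audio: Dict[TextSpan, AudioSpan] = {}
    for audio_span, segment_text in segments:
        if full_text:
            # Joining text belongs to the full text but to no segment's text span
            full_text += SEPARATOR
        start = len(full_text)
        full_text += segment_text
        text_to_audio[(start, len(full_text))] = audio_span
    return full_text, text_to_audio


text, mapping = build_transcribed_text([((0.0, 2.5), "hello"), ((3.0, 5.0), "world")])
print(repr(text))  # 'hello\nworld'
print(mapping)     # {(0, 5): (0.0, 2.5), (6, 11): (3.0, 5.0)}
```

Keeping the mapping keyed by text span makes it straightforward to go from a text annotation back to the audio it came from.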
Methods:
augment_full_text_for_next_segment(full_text, segment_text, audio_segment) – Append intermediate joining text to full text before the next segment is concatenated to it.
run(audio_docs) – Return a transcribed text document for each document in audio_docs.
set_prov_tracer(prov_tracer) – Enable provenance tracing.
Attributes:
description – Contains all the operation init parameters.
- run(audio_docs)
Return a transcribed text document for each document in audio_docs.
- Parameters
audio_docs (List[AudioDocument]) – Audio documents to transcribe.
- Return type
List[TranscribedDocument]
- Returns
List[TranscribedDocument] – Transcribed text documents (one per document in audio_docs).
- augment_full_text_for_next_segment(full_text, segment_text, audio_segment)
Append intermediate joining text to full text before the next segment is concatenated to it. Override for custom behavior.
- Return type
str
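As a rough sketch of the override mechanism, the block below uses a heavily simplified stand-in for DocTranscriber's concatenation loop; MiniTranscriber, TimestampedTranscriber and FakeAudioSegment are hypothetical names, not medkit classes, and the default newline joining is an assumption. The subclass redefines augment_full_text_for_next_segment() so that each segment starts on a new line prefixed with its start time. With medkit itself, the override would go on a DocTranscriber subclass and audio_segment would be a medkit audio segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeAudioSegment:
    """Hypothetical stand-in for an audio segment: a time span plus its transcription."""
    start: float  # seconds
    end: float    # seconds
    text: str     # pretend this came from a transcriber function


class MiniTranscriber:
    """Hypothetical, heavily simplified stand-in for DocTranscriber's concatenation step."""

    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        # Joining behaviour assumed for this sketch: a newline between segments
        return full_text + "\n" if full_text else full_text

    def concatenate(self, segments: List[FakeAudioSegment]) -> str:
        full_text = ""
        for seg in segments:
            # Hook called before each segment's text is appended
            full_text = self.augment_full_text_for_next_segment(full_text, seg.text, seg)
            full_text += seg.text
        return full_text


class TimestampedTranscriber(MiniTranscriber):
    """Custom override: start each segment on a new line prefixed with its start time."""

    def augment_full_text_for_next_segment(self, full_text, segment_text, audio_segment):
        if full_text:
            full_text += "\n"
        return full_text + f"[{audio_segment.start:.1f}s] "


segments = [FakeAudioSegment(0.0, 2.5, "hello"), FakeAudioSegment(3.0, 5.0, "world")]
print(MiniTranscriber().concatenate(segments))
print(TimestampedTranscriber().concatenate(segments))
```

The hook only returns the text to which the next segment is appended, so the segment texts themselves stay untouched.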
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.
- class TranscriberFunction(*args, **kwargs)
Protocol for components in charge of the actual speech-to-text transcription to use with DocTranscriber.
Methods:
transcribe(audios) – Convert audio buffers into strings by performing speech-to-text.
- transcribe(audios)
Convert audio buffers into strings by performing speech-to-text.
- Parameters
audios (List[AudioBuffer]) – Audio buffers to convert.
- Return type
List[str]
- Returns
List[str] – Text transcription for each buffer in audios.
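The protocol only asks for a transcribe() method that maps a list of audio buffers to one string per buffer. The sketch below shows a dummy component with that shape. FakeAudioBuffer, TranscriberFunctionLike, DurationTranscriber and run_transcription are hypothetical names, and the stand-in buffer stores raw samples instead of being medkit's AudioBuffer. Because typing.Protocol uses structural typing, the dummy class does not need to inherit from the protocol.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class FakeAudioBuffer:
    """Hypothetical stand-in for medkit's AudioBuffer: raw samples and a sample rate."""
    samples: List[float]
    sample_rate: int  # samples per second


class TranscriberFunctionLike(Protocol):
    """Mirrors the documented shape of TranscriberFunction."""

    def transcribe(self, audios: List[FakeAudioBuffer]) -> List[str]:
        ...


class DurationTranscriber:
    """Dummy component: performs no speech recognition, only reports each buffer's duration."""

    def transcribe(self, audios: List[FakeAudioBuffer]) -> List[str]:
        return [f"<{len(a.samples) / a.sample_rate:.1f}s of audio>" for a in audios]


def run_transcription(func: TranscriberFunctionLike, audios: List[FakeAudioBuffer]) -> List[str]:
    """Call a protocol-conforming component and check the one-text-per-buffer contract."""
    texts = func.transcribe(audios)
    if len(texts) != len(audios):
        raise ValueError("expected exactly one transcription per audio buffer")
    return texts


buffers = [FakeAudioBuffer([0.0] * 16000, 16000), FakeAudioBuffer([0.0] * 8000, 16000)]
print(run_transcription(DurationTranscriber(), buffers))  # ['<1.0s of audio>', '<0.5s of audio>']
```

A real component would replace the duration placeholder with an actual speech-to-text model while keeping the same input/output contract.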
- class TranscriberFunctionDescription(name, config=<factory>)
Description of a specific instance of a transcriber function (similarly to OperationDescription).
- Parameters
name (str) – The name of the transcriber function (typically the class name).
config (Dict[str, Any]) – The specific configuration of the instance.
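The `config=<factory>` default in the signature is how Sphinx renders a dataclass field built with a default factory, so each description gets its own dictionary. Below is a hypothetical re-creation of the documented fields, together with an example transcriber function exposing such a description; TranscriberFunctionDescriptionSketch, ExampleTranscriber and its description property are illustrative assumptions, not medkit API.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class TranscriberFunctionDescriptionSketch:
    """Hypothetical re-creation of the documented fields."""
    name: str
    # default_factory gives every instance its own empty dict (shown as <factory>)
    config: Dict[str, Any] = field(default_factory=dict)


class ExampleTranscriber:
    """Hypothetical transcriber function exposing a description of its configuration."""

    def __init__(self, model: str, language: str = "fr"):
        self.model = model
        self.language = language

    @property
    def description(self) -> TranscriberFunctionDescriptionSketch:
        return TranscriberFunctionDescriptionSketch(
            name=type(self).__name__,  # typically the class name
            config={"model": self.model, "language": self.language},
        )


print(asdict(ExampleTranscriber("small").description))
# {'name': 'ExampleTranscriber', 'config': {'model': 'small', 'language': 'fr'}}
```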
- class TranscribedDocument(text, text_spans_to_audio_spans, audio_doc_id, anns=None, metadata=None, uid=None)
Subclass for TextDocument instances generated by audio transcription.
- Variables
uid (str) – Document identifier.
text – The full transcribed text.
text_spans_to_audio_spans (Dict[medkit.core.text.span.Span, medkit.core.audio.span.Span]) – Mapping between text character spans in this document and the corresponding audio spans in the original audio.
audio_doc_id (Optional[str]) – Identifier of the original AudioDocument that was transcribed, if known.
anns (medkit.core.text.annotation_container.TextAnnotationContainer) – Annotations of the document.
metadata (Dict[str, Any]) – Document metadata.
raw_segment (medkit.core.text.annotation.Segment) – Auto-generated segment containing the raw full transcribed text.
Methods:
from_dict(doc_dict) – Create a TranscribedDocument from a dict.
get_containing_audio_spans(text_ann_spans) – Return the audio spans used to transcribe the text referenced by a text annotation.
get_snippet(segment, max_extend_length) – Return a portion of the original text containing the annotation.
get_subclass_for_data_dict(data_dict) – Return the subclass that corresponds to the class name found in a data dict.
- get_containing_audio_spans(text_ann_spans)
Return the audio spans used to transcribe the text referenced by a text annotation.
For instance, if the audio ranging from 1.0 to 20.0 seconds is transcribed to some text ranging from character 10 to 56 in the transcribed document, and then a text annotation is created referencing the span 15 to 25, then the containing audio span will be the one ranging from 1.0 to 20.0 seconds.
Note that some text annotations may be contained in more than one audio span.
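A simplified sketch of this lookup, assuming text and audio spans are plain (start, end) tuples (medkit's own Span objects are not modeled here) and that an audio span is returned when its text span overlaps an annotation span. The function name containing_audio_spans is hypothetical. The usage lines reproduce the example from the paragraph above, then a case where one annotation straddles two transcribed segments.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: character offsets and seconds as (start, end) tuples
TextSpan = Tuple[int, int]
AudioSpan = Tuple[float, float]


def containing_audio_spans(
    text_ann_spans: List[TextSpan],
    text_to_audio: Dict[TextSpan, AudioSpan],
) -> List[AudioSpan]:
    """Return the audio spans whose transcribed text overlaps any of the annotation spans."""
    result: List[AudioSpan] = []
    for (text_start, text_end), audio_span in text_to_audio.items():
        # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b
        overlaps = any(
            ann_start < text_end and text_start < ann_end
            for ann_start, ann_end in text_ann_spans
        )
        if overlaps and audio_span not in result:
            result.append(audio_span)
    return result


# Example from the text: audio 1.0-20.0 s transcribed to characters 10-56
print(containing_audio_spans([(15, 25)], {(10, 56): (1.0, 20.0)}))  # [(1.0, 20.0)]

# An annotation straddling two transcribed segments maps to both audio spans
two_segments = {(0, 5): (0.0, 2.5), (6, 11): (3.0, 5.0)}
print(containing_audio_spans([(3, 8)], two_segments))  # [(0.0, 2.5), (3.0, 5.0)]
```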
- get_snippet(segment, max_extend_length)
Return a portion of the original text containing the annotation.
- Parameters
segment (Segment) – The annotation.
max_extend_length (int) – Maximum number of characters to use around the annotation.
- Return type
str
- Returns
str – A portion of the text around the annotation.
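One plausible way to compute such a snippet, assuming the annotation is reduced to a single (start, end) character span; snippet_around is a hypothetical helper, and medkit's get_snippet() may handle multi-span annotations differently.

```python
from typing import Tuple


def snippet_around(text: str, span: Tuple[int, int], max_extend_length: int) -> str:
    """Return the annotated text plus up to max_extend_length characters on each side."""
    start, end = span
    # Clamp the extended window to the bounds of the document text
    snippet_start = max(0, start - max_extend_length)
    snippet_end = min(len(text), end + max_extend_length)
    return text[snippet_start:snippet_end]


text = "the patient reports mild chest pain since monday"
start = text.index("chest pain")
print(snippet_around(text, (start, start + len("chest pain")), 5))  # mild chest pain sinc
```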
- classmethod get_subclass_for_data_dict(data_dict)
Return the subclass that corresponds to the class name found in a data dict.
- Parameters
data_dict (Dict[str, Any]) – Data dict returned by the to_dict() method of a subclass (or of the base class itself).
- Return type
Optional[Type[Self]]
- Returns
subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.
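This lookup can be sketched with a class-name registry. In the block below, Document, TranscribedDoc, the "class_name" key and the __init_subclass__ registration are all assumptions made for illustration; medkit's actual mechanism may differ.

```python
from typing import Any, Dict, Optional, Type


class Document:
    """Hypothetical base class illustrating class-name-based subclass lookup."""

    _registry: Dict[str, Type["Document"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register every subclass under its class name when it is defined
        Document._registry[cls.__name__] = cls

    def to_dict(self) -> Dict[str, Any]:
        # "class_name" is a key chosen for this sketch only
        return {"class_name": type(self).__name__}

    @classmethod
    def get_subclass_for_data_dict(cls, data_dict: Dict[str, Any]) -> Optional[Type["Document"]]:
        name = data_dict.get("class_name")
        if name == Document.__name__:
            return None  # the dict was produced by the base class itself
        return Document._registry.get(name)


class TranscribedDoc(Document):
    """Hypothetical subclass standing in for TranscribedDocument."""


subclass = Document.get_subclass_for_data_dict(TranscribedDoc().to_dict())
print(subclass is TranscribedDoc)  # True
print(Document.get_subclass_for_data_dict(Document().to_dict()))  # None
```

A caller deserializing a dict can then dispatch to the returned subclass's from_dict(), falling back to the base class when None is returned.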
Subpackages / Submodules
| This module needs extra dependencies that are not installed as core dependencies of medkit. |
| This module needs extra dependencies that are not installed as core dependencies of medkit. |