Audio operations#

This page lists all components related to audio processing.

Note

For more details about all sub-packages, refer to medkit.audio.

Pre-processing operations#

This section describes the pre-processing operations available for audio.

Note

For more details about public APIs, refer to medkit.audio.preprocessing.

Downmixer#

For more details, refer to medkit.audio.preprocessing.downmixer.
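
As a rough illustration of what downmixing means, the sketch below averages several channels into a single mono channel. It is plain Python and does not use medkit: the function name `downmix_to_mono` and the list-of-lists representation are assumptions made for this example, and the actual Downmixer may combine channels differently.

```python
def downmix_to_mono(channels):
    """Average equal-length channels sample by sample (illustrative only)."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per time step, holding one sample per channel
    return [sum(samples) / len(channels) for samples in zip(*channels)]


# Hypothetical stereo signal: one list of samples per channel
left = [0.5, -0.5, 1.0]
right = [0.5, 0.5, 0.0]
mono = downmix_to_mono([left, right])
```

Averaging rather than summing keeps the result within the range of the input samples, which is one simple way to avoid clipping.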

Power normalizer#

For more details, refer to medkit.audio.preprocessing.power_normalizer.
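
The idea behind power normalization can be sketched as scaling a signal so that its root mean square (RMS) level reaches a chosen target. The code below is plain Python and does not use medkit: the names `rms`, `normalize_power` and `target_rms` are assumptions made for this example, not the parameters of the actual PowerNormalizer.

```python
import math


def rms(signal):
    """Root mean square level of a non-empty list of samples."""
    if not signal:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(sample * sample for sample in signal) / len(signal))


def normalize_power(signal, target_rms=1.0):
    """Scale the signal so that its RMS level equals target_rms (illustrative only)."""
    current = rms(signal)
    if current == 0.0:
        # A silent signal cannot be scaled to a non-zero level; return a copy unchanged
        return list(signal)
    gain = target_rms / current
    return [sample * gain for sample in signal]


# Hypothetical quiet signal brought up to an RMS level of 0.5
quiet = [0.1, -0.1, 0.1, -0.1]
normalized = normalize_power(quiet, target_rms=0.5)
```

A single gain applied to the whole signal preserves its shape; only the overall level changes.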

Resampler#

Important

Resampler needs additional dependencies that can be installed with pip install medkit-lib[resampler]

For more details, refer to medkit.audio.preprocessing.resampler.
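
To give an intuition of what resampling does, the sketch below changes the sample rate of a signal with naive linear interpolation. It is plain Python and is not how the Resampler is implemented: the function name `resample_linear` and its parameters are assumptions made for this example, and a real resampler would also apply low-pass filtering to limit aliasing when downsampling.

```python
def resample_linear(signal, orig_rate, target_rate):
    """Resample a list of samples by linear interpolation (illustrative only)."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not signal:
        return []
    n_out = int(len(signal) * target_rate / orig_rate)
    resampled = []
    for i in range(n_out):
        # Position of output sample i, expressed in input sample units
        position = i * orig_rate / target_rate
        left = int(position)
        right = min(left + 1, len(signal) - 1)  # hold the last sample at the end
        fraction = position - left
        resampled.append(signal[left] * (1 - fraction) + signal[right] * fraction)
    return resampled


# Hypothetical ramp sampled at 4 Hz, upsampled to 8 Hz and downsampled to 2 Hz
ramp = [0.0, 1.0, 2.0, 3.0]
upsampled = resample_linear(ramp, orig_rate=4, target_rate=8)
downsampled = resample_linear(ramp, orig_rate=4, target_rate=2)
```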

Segmentation operations#

This section lists audio segmentation operations. They are part of the medkit.audio.segmentation module.

WebRTC voice detector#

For more details, refer to medkit.audio.segmentation.webrtc_voice_detector.

Pyannote speaker detector#

Important

PASpeakerDetector is an experimental feature. It depends on a version of pyannote-audio that is not yet released on PyPI.

To install it, you can use the JSALT2023 tag:

pip install https://github.com/pyannote/pyannote-audio/archive/refs/tags/JSALT2023.tar.gz

For more details, refer to medkit.audio.segmentation.pa_speaker_detector.

Audio Transcription#

This section lists the operations and other components used to perform audio transcription. They are part of the medkit.audio.transcription module.

DocTranscriber is the operation that transforms AudioDocument instances into TranscribedDocument instances (TranscribedDocument is a subclass of TextDocument).

The actual conversion from audio to text is delegated to components complying with the TranscriberFunction protocol. HFTranscriberFunction and SBTranscriberFunction are implementations of TranscriberFunction that make it possible to use HuggingFace transformer models and speechbrain models respectively.
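
The delegation pattern described above can be sketched with a structural protocol. The code below is plain Python and does not import medkit: `FakeAudio`, `TranscriberLike`, `DurationTranscriber` and `transcribe_all` are hypothetical stand-ins, and the `transcribe` method name and signature are assumptions made for this example. Refer to medkit.audio.transcription for the actual TranscriberFunction protocol.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class FakeAudio:
    """Hypothetical stand-in for an audio buffer."""

    samples: List[float]
    sample_rate: int


class TranscriberLike(Protocol):
    """Hypothetical protocol: any object with a matching transcribe() complies."""

    def transcribe(self, audios: List[FakeAudio]) -> List[str]: ...


class DurationTranscriber:
    """Toy component that describes each audio instead of doing real speech-to-text."""

    def transcribe(self, audios: List[FakeAudio]) -> List[str]:
        return [f"{len(a.samples) / a.sample_rate:.1f} s of audio" for a in audios]


def transcribe_all(audios: List[FakeAudio], transcriber: TranscriberLike) -> List[str]:
    """Delegate the audio-to-text step to whichever component was provided."""
    texts = transcriber.transcribe(audios)
    if len(texts) != len(audios):
        raise ValueError("expected exactly one text per audio")
    return texts


audios = [FakeAudio([0.0] * 16000, 16000), FakeAudio([0.0] * 8000, 16000)]
texts = transcribe_all(audios, DurationTranscriber())
```

Because the protocol is structural, a component does not need to inherit from it. Any object exposing a compatible method can be swapped in, which is what allows several model backends to sit behind a single operation.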

DocTranscriber#

For more details, refer to medkit.audio.transcription.doc_transcriber.

TranscribedDocument#

For more details, refer to medkit.audio.transcription.transcribed_document.

HFTranscriberFunction#

Important

HFTranscriberFunction needs additional dependencies that can be installed with pip install medkit-lib[hf-transcriber-function]

For more details, refer to medkit.audio.transcription.hf_transcriber_function.

SBTranscriberFunction#

Important

SBTranscriberFunction needs additional dependencies that can be installed with pip install medkit-lib[sb-transcriber-function]

For more details, refer to medkit.audio.transcription.sb_transcriber_function.
