medkit.audio.segmentation.pa_speaker_detector
This module requires extra dependencies that are not installed with medkit. To install them, run: pip install torch https://github.com/pyannote/pyannote-audio/archive/refs/tags/JSALT2023.tar.gz
Classes:
PASpeakerDetector | Speaker diarization operation relying on pyannote.audio
- class PASpeakerDetector(segmentation_model, embedding_model, clustering, output_label, pipeline_params=None, min_nb_speakers=None, max_nb_speakers=None, segmentation_batch_size=1, embedding_batch_size=1, uid=None)
Speaker diarization operation relying on pyannote.audio
Each input segment is split into sub-segments, one per speech turn, and an attribute identifying the speaker of the turn is attached to each sub-segment.
PASpeakerDetector uses the SpeakerDiarization pipeline from pyannote.audio, which performs the following steps:
perform multi-speaker voice activity detection (VAD) with a PyanNet segmentation model and extract voiced segments;
compute an embedding for each voiced segment with an embedding model (typically speechbrain ECAPA-TDNN);
group voiced segments by speaker using a clustering algorithm such as agglomerative clustering or HMM-based clustering.
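The three steps above can be illustrated with a toy, dependency-free sketch. This is not how pyannote.audio implements the pipeline: the voiced segments and embeddings are hard-coded stand-ins for the outputs of steps 1 and 2, and `group_by_speaker` is a hypothetical greedy, centroid-based grouping used here in place of the real clustering algorithms.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_speaker(embeddings, threshold=0.8):
    """Toy stand-in for step 3: assign each embedding to the most similar
    existing speaker centroid, or open a new speaker if none is similar enough."""
    centroids = []  # running mean embedding of each speaker found so far
    counts = []     # number of embeddings merged into each centroid
    labels = []     # speaker index assigned to each input embedding
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels


# Step 1 (stand-in): voiced segments as (start, end) times in seconds.
voiced_segments = [(0.0, 1.5), (1.5, 3.0), (3.2, 4.0)]
# Step 2 (stand-in): one made-up 2-dimensional embedding per voiced segment.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
# Step 3: group segments by speaker and build labelled turns.
labels = group_by_speaker(embeddings)
turns = [
    (start, end, f"speaker_{label}")
    for (start, end), label in zip(voiced_segments, labels)
]
```

In this made-up input, the first and third segments have nearly parallel embeddings and end up attributed to the same speaker, while the second segment opens a new one.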
- Parameters
segmentation_model (Union[str, Path]) – Name (on the HuggingFace models hub) or path of the PyanNet segmentation model. When a path, should point to the .bin file containing the model.
embedding_model (Union[str, Path]) – Name (on the HuggingFace models hub) or path of the embedding model. When a path to a speechbrain model, should point to the directory containing the model weights and hyperparameters.
clustering (Literal['AgglomerativeClustering', 'FINCHClustering', 'HiddenMarkovModelClustering', 'OracleClustering']) – Clustering method to use.
output_label (str) – Label of generated turn segments.
pipeline_params (Optional[Dict]) – Dictionary of segmentation and clustering parameters. The dictionary can hold a "segmentation" key and a "clustering" key, each pointing to a sub-dictionary. Refer to the pyannote documentation for the supported segmentation and clustering parameters (clustering parameters depend on the clustering method used).
min_nb_speakers (Optional[int]) – Minimum number of speakers expected to be found.
max_nb_speakers (Optional[int]) – Maximum number of speakers expected to be found.
segmentation_batch_size (int) – Number of input segments per batch processed by the segmentation model.
embedding_batch_size (int) – Number of pre-segmented audios per batch processed by the embedding model.
uid (str) – Identifier of the detector.
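As a rough sketch of how these parameters fit together, the snippet below gathers a set of init arguments and defers the actual instantiation to a helper. The model names, the output label, and every key and value inside `pipeline_params` are assumptions chosen for illustration; the supported segmentation and clustering parameters should be checked against the pyannote documentation for the installed version. `build_detector_kwargs` and `make_detector` are hypothetical helper names, not part of medkit.

```python
from typing import Any, Dict


def build_detector_kwargs() -> Dict[str, Any]:
    """Collect init arguments for PASpeakerDetector (illustrative values only)."""
    return {
        # Assumed HuggingFace hub names; local paths are also accepted.
        "segmentation_model": "pyannote/segmentation",
        "embedding_model": "speechbrain/spkrec-ecapa-voxceleb",
        "clustering": "AgglomerativeClustering",
        "output_label": "turn",
        # Assumed parameter names and values; they depend on the pyannote
        # version and on the clustering method selected above.
        "pipeline_params": {
            "segmentation": {"threshold": 0.44, "min_duration_off": 0.58},
            "clustering": {
                "method": "centroid",
                "min_cluster_size": 15,
                "threshold": 0.72,
            },
        },
        "min_nb_speakers": 1,
        "max_nb_speakers": 2,
        "segmentation_batch_size": 1,
        "embedding_batch_size": 1,
    }


def make_detector(kwargs: Dict[str, Any]):
    """Instantiate the detector; requires the extra dependencies to be installed."""
    # Imported lazily so the configuration above can be built without torch/pyannote.
    from medkit.audio.segmentation.pa_speaker_detector import PASpeakerDetector

    return PASpeakerDetector(**kwargs)


kwargs = build_detector_kwargs()
```

With the extra dependencies installed, `make_detector(kwargs).run(segments)` would then be expected to return the turn segments for a list of input audio segments.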
Methods:
run(segments) | Return all turn segments detected for all input segments.
set_prov_tracer(prov_tracer) | Enable provenance tracing.
Attributes:
description | Contains all the operation init parameters.
- property description: medkit.core.operation_desc.OperationDescription
Contains all the operation init parameters.
- Return type
OperationDescription
- set_prov_tracer(prov_tracer)
Enable provenance tracing.
- Parameters
prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.