medkit.audio.segmentation.pa_speaker_detector
=============================================

.. py:module:: medkit.audio.segmentation.pa_speaker_detector


Classes
-------

.. autoapisummary::

   medkit.audio.segmentation.pa_speaker_detector.PASpeakerDetector


Module Contents
---------------

.. py:class:: PASpeakerDetector(model: str | pathlib.Path, output_label: str, min_nb_speakers: int | None = None, max_nb_speakers: int | None = None, min_duration: float = 0.1, device: int = -1, segmentation_batch_size: int = 1, embedding_batch_size: int = 1, hf_auth_token: str | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.audio.SegmentationOperation`


   Speaker diarization operation relying on `pyannote.audio`.

   Each input segment is split into several sub-segments corresponding to
   speech turns, and an attribute indicating the speaker of the turn is
   attached to each of these sub-segments.
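
   The splitting described above can be pictured with the following sketch.
   It uses neither medkit nor pyannote: `Turn` and `split_into_turns` are
   hypothetical names, and the diarization output is assumed to be a list of
   `(start, end, speaker)` tuples whose times are relative to the start of the
   input segment. The `"speaker"` attribute key is also an assumption.

   .. code-block:: python

      from dataclasses import dataclass, field


      @dataclass
      class Turn:
          """Hypothetical stand-in for a medkit audio Segment produced by diarization."""
          label: str
          start: float  # absolute start time, in seconds
          end: float  # absolute end time, in seconds
          attrs: dict = field(default_factory=dict)


      def split_into_turns(segment_start, diarization, output_label):
          """Convert (start, end, speaker) tuples relative to the input segment
          into absolute-time Turn objects carrying a speaker attribute."""
          turns = []
          for rel_start, rel_end, speaker in diarization:
              turns.append(
                  Turn(
                      label=output_label,
                      start=segment_start + rel_start,
                      end=segment_start + rel_end,
                      attrs={"speaker": speaker},  # hypothetical attribute key
                  )
              )
          return turns


      # Input segment starting at 10.0 s; the toy diarization output is relative to it
      turns = split_into_turns(
          10.0, [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01")], "turn"
      )
      for t in turns:
          print(t.label, t.start, t.end, t.attrs["speaker"])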

   `PASpeakerDetector` uses the `SpeakerDiarization` pipeline from
   `pyannote.audio`, which performs the following steps:

   - perform multi-speaker voice activity detection (VAD) with a `PyanNet`
     segmentation model and extract voiced segments;

   - compute embeddings for each voiced segment with an embedding model
     (typically SpeechBrain ECAPA-TDNN);

   - group voiced segments by speaker using a clustering algorithm such as
     agglomerative clustering or HMM-based clustering.
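
   The clustering step can be illustrated with a toy average-linkage
   agglomerative clustering over cosine distances. This is a simplified
   sketch, not pyannote's implementation: the function names, the 2-D toy
   embeddings and the distance threshold are all hypothetical (real speaker
   embeddings are much higher-dimensional).

   .. code-block:: python

      import math


      def cosine_distance(a, b):
          """1 - cosine similarity (assumes non-zero vectors)."""
          dot = sum(x * y for x, y in zip(a, b))
          norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
          return 1.0 - dot / norm


      def agglomerative_cluster(embeddings, threshold):
          """Average-linkage agglomerative clustering: repeatedly merge the two
          closest clusters until the closest pair is farther than `threshold`.
          Returns one cluster index (speaker id) per embedding."""
          clusters = [[i] for i in range(len(embeddings))]

          def linkage(c1, c2):
              # Average pairwise cosine distance between members of two clusters
              dists = [cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2]
              return sum(dists) / len(dists)

          while len(clusters) > 1:
              best = None
              for a in range(len(clusters)):
                  for b in range(a + 1, len(clusters)):
                      d = linkage(clusters[a], clusters[b])
                      if best is None or d < best[0]:
                          best = (d, a, b)
              d, a, b = best
              if d > threshold:
                  break  # remaining clusters are considered distinct speakers
              clusters[a].extend(clusters[b])
              del clusters[b]

          labels = [0] * len(embeddings)
          for k, members in enumerate(clusters):
              for i in members:
                  labels[i] = k
          return labels


      # Toy embeddings: two voiced segments near (1, 0), two near (0, 1)
      embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.95]]
      print(agglomerative_cluster(embeddings, threshold=0.5))  # [0, 0, 1, 1]

   A lower threshold yields more speakers and a higher one fewer; bounds such
   as `min_nb_speakers` and `max_nb_speakers` constrain this number from the
   outside.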

   :Parameters:

       **model** : str or Path
           Name (on the Hugging Face model hub) or path of a pretrained
           pipeline. If a path, it should point to the .yaml file containing
           the pipeline configuration.

       **output_label** : str
           Label of generated turn segments.

       **min_nb_speakers** : int, optional
           Minimum number of speakers expected to be found.

       **max_nb_speakers** : int, optional
           Maximum number of speakers expected to be found.

       **min_duration** : float, default=0.1
           Minimum duration of speech segments, in seconds (short segments will
           be discarded).

       **device** : int, default=-1
           Device to use for pytorch models. Follows the Hugging Face
           convention (`-1` for cpu and device number for gpu, for instance `0`
           for "cuda:0").

       **segmentation_batch_size** : int, default=1
           Number of input segments per batch processed by the segmentation
           model.

       **embedding_batch_size** : int, default=1
           Number of pre-segmented audio chunks per batch processed by the
           embedding model.

       **hf_auth_token** : str, optional
           Hugging Face authentication token (to access private models on the
           hub).

       **uid** : str, optional
           Identifier of the detector.
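
   Two of the parameter conventions above can be sketched as plain functions.
   `hf_device_to_torch` and `drop_short_turns` are hypothetical helpers, not
   part of medkit, and whether a turn lasting exactly `min_duration` seconds
   is kept is an assumption.

   .. code-block:: python

      def hf_device_to_torch(device: int) -> str:
          """Map the Hugging Face device convention to a PyTorch device string."""
          if device < 0:
              return "cpu"
          return f"cuda:{device}"


      def drop_short_turns(turns, min_duration=0.1):
          """Discard (start, end, speaker) turns shorter than min_duration seconds.
          Keeping turns of exactly min_duration is an assumption of this sketch."""
          return [t for t in turns if t[1] - t[0] >= min_duration]


      print(hf_device_to_torch(-1), hf_device_to_torch(0))  # cpu cuda:0
      # The 0.05 s turn from speaker "B" is discarded
      print(drop_short_turns([(0.0, 1.2, "A"), (1.2, 1.25, "B"), (1.25, 3.0, "A")]))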



   .. py:method:: run(segments: list[medkit.core.audio.Segment]) -> list[medkit.core.audio.Segment]

      
      Return all turn segments detected for all input `segments`.


      :Parameters:

          **segments** : list of Segment
              Audio segments on which to perform diarization.

      :Returns:

           list of Segment
               Segments detected as containing speech activity (with speaker
               attributes).
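
      Given the signature of `_detect_turns_in_segment`, `run` presumably
      chains the turns found in each input segment into a single list. The
      sketch below shows that pattern only; both functions are hypothetical
      stand-ins, and the toy segments are assumed to be dicts that already
      carry their `(start, end, speaker)` turns.

      .. code-block:: python

         from typing import Iterator


         def detect_turns_in_segment(segment: dict) -> Iterator[tuple]:
             # Hypothetical per-segment diarization: each toy segment already
             # carries its (start, end, speaker) turns
             for start, end, speaker in segment["turns"]:
                 yield (segment["id"], start, end, speaker)


         def run(segments: list) -> list:
             # Concatenate the turns of every input segment, preserving input order
             return [turn for segment in segments for turn in detect_turns_in_segment(segment)]


         segments = [
             {"id": "seg1", "turns": [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]},
             {"id": "seg2", "turns": [(0.0, 3.0, "SPEAKER_00")]},
         ]
         print(run(segments))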




   .. py:method:: _detect_turns_in_segment(segment: medkit.core.audio.Segment) -> Iterator[medkit.core.audio.Segment]