:py:mod:`medkit.audio.segmentation.webrtc_voice_detector`
=========================================================

.. py:module:: medkit.audio.segmentation.webrtc_voice_detector


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medkit.audio.segmentation.webrtc_voice_detector.WebRTCVoiceDetector


.. py:class:: WebRTCVoiceDetector(output_label: str, aggressiveness: typing_extensions.Literal[0, 1, 2, 3] = 2, frame_duration: typing_extensions.Literal[10, 20, 30] = 30, nb_frames_in_window: int = 10, switch_ratio: float = 0.9, uid: str | None = None)


   Bases: :py:obj:`medkit.core.audio.SegmentationOperation`

   
   Voice Activity Detection operation relying on the `webrtcvad` package.

   Per-frame VAD results of `webrtcvad` are aggregated with a switch algorithm
   considering the percentage of speech/non-speech frames in a wider sliding window.
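
   The aggregation described above can be sketched as follows. This is only an
   illustration under stated assumptions, not the code used by this class: the
   helper name `aggregate_vad`, the `>=` comparison against the threshold, and
   the exclusive end index are all hypothetical choices.

   .. code-block:: python

      from collections import deque


      def aggregate_vad(frame_flags, nb_frames_in_window=10, switch_ratio=0.9):
          """Turn per-frame speech flags into (start, end) frame index ranges.

          Hypothetical helper for illustration; `end` is exclusive.
          """
          window = deque(maxlen=nb_frames_in_window)
          # Number of agreeing frames needed in the window to switch state
          threshold = switch_ratio * nb_frames_in_window
          in_speech = False
          start = None
          ranges = []
          for index, is_speech in enumerate(frame_flags):
              window.append((index, bool(is_speech)))
              if not in_speech:
                  nb_speech = sum(1 for _, flag in window if flag)
                  if nb_speech >= threshold:
                      # Enough speech frames: switch to the speech state,
                      # starting at the oldest frame still in the window
                      in_speech = True
                      start = window[0][0]
                      window.clear()
              else:
                  nb_silence = sum(1 for _, flag in window if not flag)
                  if nb_silence >= threshold:
                      # Enough non-speech frames: close the current range
                      in_speech = False
                      ranges.append((start, index + 1))
                      window.clear()
          # Close a range still open at the end of the input
          if in_speech:
              ranges.append((start, len(frame_flags)))
          return ranges

   In this sketch, a short burst of misclassified frames does not flip the
   state, because a single frame cannot reach the threshold on its own. A
   higher `switch_ratio` makes the state harder to switch in both directions.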

   Input segments must be mono, with a sample rate of 8 kHz, 16 kHz, 32 kHz or 48 kHz.

   :Parameters:

       **output_label** : str
           Label of output speech segments.

       **aggressiveness** : {0, 1, 2, 3}, default=2
           Aggressiveness mode passed to `webrtcvad` (the higher the value, the more
           aggressively non-speech is filtered out).

       **frame_duration** : {10, 20, 30}, default=30
           Duration in milliseconds of frames passed to `webrtcvad`.

       **nb_frames_in_window** : int, default=10
           Number of frames in the sliding window used when aggregating per-frame VAD
           results.

       **switch_ratio** : float, default=0.9
           Fraction (between 0 and 1) of speech or non-speech frames in the sliding
           window required to switch the speech state when aggregating per-frame VAD
           results.

       **uid** : str, optional
           Identifier of the detector.
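
   To make the relation between sample rate and `frame_duration` concrete, the
   sketch below computes frame sizes and splits raw audio into frames. It assumes
   16-bit mono PCM bytes, which is what `webrtcvad` expects; the helper names
   `frame_size_in_bytes` and `split_into_frames` are hypothetical and do not
   reflect how this class prepares its input.

   .. code-block:: python

      SUPPORTED_SAMPLE_RATES = (8000, 16000, 32000, 48000)
      SUPPORTED_FRAME_DURATIONS = (10, 20, 30)  # milliseconds


      def frame_size_in_bytes(sample_rate, frame_duration, sample_width=2):
          """Return the size in bytes of one frame of 16-bit mono PCM audio."""
          if sample_rate not in SUPPORTED_SAMPLE_RATES:
              raise ValueError(f"Unsupported sample rate: {sample_rate}")
          if frame_duration not in SUPPORTED_FRAME_DURATIONS:
              raise ValueError(f"Unsupported frame duration: {frame_duration}")
          nb_samples = sample_rate * frame_duration // 1000
          return nb_samples * sample_width


      def split_into_frames(pcm, sample_rate, frame_duration):
          """Split PCM bytes into full frames, dropping a trailing partial frame."""
          size = frame_size_in_bytes(sample_rate, frame_duration)
          return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, size)]

   For instance, at 16 kHz with the default 30 ms frames, each frame holds 480
   samples (960 bytes), so the default window of 10 frames covers 300 ms of audio.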


   .. py:method:: run(segments: list[medkit.core.audio.Segment]) -> list[medkit.core.audio.Segment]

      
      Return all speech segments detected for all input `segments`.


      :Parameters:

          **segments** : list of Segment
              Audio segments on which to perform VAD.

      :Returns:

          list of Segment
              Segments detected as containing speech activity.



   .. py:method:: _detect_activity_in_segment(segment: medkit.core.audio.Segment) -> Iterator[medkit.core.audio.Segment]


   .. py:method:: _get_aggregated_vad(frames, sample_rate)

      
      Return index ranges of voiced frames using webrtcvad.
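
      As an illustration of how frame index ranges relate to time, the sketch
      below converts them to spans in seconds. It assumes ranges are
      `(start, end)` pairs with an exclusive `end` and a frame duration in
      milliseconds; the helper name `ranges_to_seconds` is hypothetical and the
      actual return format of this method may differ.

      .. code-block:: python

         def ranges_to_seconds(frame_ranges, frame_duration):
             """Convert (start, end) frame index ranges to (start, end) in seconds."""
             return [
                 (start * frame_duration / 1000, end * frame_duration / 1000)
                 for start, end in frame_ranges
             ]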

