medkit.core.audio

Classes:

AudioAnnotationContainer(doc_id, raw_segment)

Manage a list of audio annotations belonging to an audio document.

AudioBuffer(sample_rate, nb_samples, nb_channels)

Audio buffer base class.

AudioDocument(audio[, anns, attrs, ...])

Document holding audio annotations.

FileAudioBuffer(path[, trim_start, ...])

Audio buffer giving access to audio files stored on the filesystem (to use when manipulating unmodified raw audio).

MemoryAudioBuffer(signal, sample_rate)

Audio buffer giving access to signals stored in memory (to use when reading/writing a modified audio signal).

PreprocessingOperation([uid, name])

Abstract operation for pre-processing segments.

Segment(label, audio, span[, attrs, ...])

Audio segment referencing part of an AudioDocument.

SegmentationOperation([uid, name])

Abstract operation for segmenting audio.

Span(start, end)

Boundaries of a slice of audio.

class Segment(label, audio, span, attrs=None, metadata=None, uid=None)

Audio segment referencing part of an AudioDocument.

Variables:
  • uid (str) – Unique identifier of the segment.

  • label (str) – Label of the segment.

  • audio (AudioBuffer) – The audio signal of the segment. It must be consistent with the span, in the sense that it must correspond to the audio signal of the document at the span boundaries. But it can be a modified, processed version of this audio signal.

  • span (Span) – Span (in seconds) indicating the part of the document’s full signal that this segment references.

  • attrs (AttributeContainer) – Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.

  • metadata (dict of str to Any) – Metadata of the segment.

  • keys (set of str) – Pipeline output keys to which the annotation belongs.

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.
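The lookup described above can be illustrated with a minimal sketch. It does not use medkit: `Base`, `Child` and the `"class_name"` dict key are hypothetical names chosen for the illustration, and the layout of medkit's real data dicts is not specified here.

```python
from typing import Any, Optional


class Base:
    """Stand-in base class for this sketch; not part of medkit."""

    def to_dict(self) -> dict[str, Any]:
        # "class_name" is a hypothetical key chosen for this sketch.
        return {"class_name": type(self).__name__}

    @classmethod
    def get_subclass_for_data_dict(cls, data_dict: dict[str, Any]) -> Optional[type]:
        class_name = data_dict["class_name"]
        if class_name == cls.__name__:
            # The dict was produced by the base class itself.
            return None
        # Search direct and indirect subclasses for a matching name.
        pending = list(cls.__subclasses__())
        while pending:
            subclass = pending.pop()
            if subclass.__name__ == class_name:
                return subclass
            pending.extend(subclass.__subclasses__())
        # Unknown name: this sketch returns None (an assumption).
        return None


class Child(Base):
    """Hypothetical subclass used only to exercise the lookup."""


found = Base.get_subclass_for_data_dict(Child().to_dict())
```

A caller would typically use the returned class, when it is not None, to rebuild the object from the dict.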

class AudioAnnotationContainer(doc_id, raw_segment)

Manage a list of audio annotations belonging to an audio document.

This behaves more or less like a list: calling len() and iterating are supported. Additional filtering is available through the get() method.

Also provides handling of the raw segment.

Instantiate the annotation container

Parameters:

doc_id (str) – The identifier of the document to which the annotations belong.

Methods:

add(ann)

Attach an annotation to the document.

get(*[, label, key])

Return a list of the annotations of the document, optionally filtering by label or key.

get_by_id(uid)

Return the annotation corresponding to a specific identifier.

get_ids(*[, label, key])

Return an iterator of the identifiers of the annotations of the document, optionally filtering by label or key.

add(ann)

Attach an annotation to the document.

Parameters:

ann (AnnotationType) – Annotation to add.

Raises:

ValueError – If the annotation is already attached to the document (based on annotation.uid)

get(*, label=None, key=None)

Return a list of the annotations of the document, optionally filtering by label or key.

Parameters:
  • label (str, optional) – Label to use to filter annotations.

  • key (str, optional) – Key to use to filter annotations.

Return type:

list[Segment]

get_by_id(uid)

Return the annotation corresponding to a specific identifier.

Parameters:

uid (str) – Identifier of the annotation to return.

Return type:

Segment

get_ids(*, label=None, key=None)

Return an iterator of the identifiers of the annotations of the document, optionally filtering by label or key.

This method is provided to make it easier to implement additional filtering in subclasses.

Parameters:
  • label (str, optional) – Label to use to filter annotations.

  • key (str, optional) – Key to use to filter annotations.

Return type:

Iterator[str]
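The container behaviour described above (list-like access, rejection of duplicate identifiers, filtering by label or key) can be sketched with a toy stand-in. `Ann` and `Container` are hypothetical classes written for this illustration and are not medkit's implementation. Building `get()` on top of `get_ids()` is an assumption suggested by the note about subclass filtering.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Ann:
    """Hypothetical stand-in for an annotation (uid, label, keys)."""

    uid: str
    label: str
    keys: set = field(default_factory=set)


class Container:
    """Toy container mimicking the documented add/get/get_ids behaviour."""

    def __init__(self) -> None:
        self._anns: dict[str, Ann] = {}

    def __len__(self) -> int:
        return len(self._anns)

    def __iter__(self) -> Iterator[Ann]:
        return iter(self.get())

    def add(self, ann: Ann) -> None:
        # Duplicates are detected by identifier, as documented for add().
        if ann.uid in self._anns:
            raise ValueError(f"Annotation {ann.uid} is already attached")
        self._anns[ann.uid] = ann

    def get_ids(
        self, *, label: Optional[str] = None, key: Optional[str] = None
    ) -> Iterator[str]:
        for uid, ann in self._anns.items():
            if label is not None and ann.label != label:
                continue
            if key is not None and key not in ann.keys:
                continue
            yield uid

    def get_by_id(self, uid: str) -> Ann:
        return self._anns[uid]

    def get(self, *, label: Optional[str] = None, key: Optional[str] = None) -> list:
        # Filtering lives in get_ids(), so a subclass only has to override that.
        return [self.get_by_id(uid) for uid in self.get_ids(label=label, key=key)]


anns = Container()
anns.add(Ann("a1", "speech", {"asr"}))
anns.add(Ann("a2", "noise"))
speech_uids = [ann.uid for ann in anns.get(label="speech")]
```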

class AudioBuffer(sample_rate, nb_samples, nb_channels)

Audio buffer base class. Gives access to raw audio samples.

Parameters:
  • sample_rate (int) – Sample rate of the signal, in samples per second.

  • nb_samples (int) – Duration of the signal in samples.

  • nb_channels (int) – Number of channels in the signal.

Attributes:

duration

Duration of the signal in seconds.

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

read([copy])

Return the signal in the audio buffer.

trim(start, end)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

trim_duration([start_time, end_time])

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds.

property duration: float

Duration of the signal in seconds.

Return type:

float

abstract read(copy=False)

Return the signal in the audio buffer.

Parameters:

copy (bool) – If True, the returned array will be a copy that can be safely mutated.

Return type:

ndarray

Returns:

np.ndarray – Raw audio samples

abstract trim(start, end)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

Parameters:
  • start (int, optional) – Start sample of the new buffer (defaults to 0).

  • end (int, optional) – End sample of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

trim_duration(start_time=None, end_time=None)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.

Parameters:
  • start_time (float, optional) – Start time of the new buffer (defaults to 0.0).

  • end_time (float, optional) – End time of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.
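The rounding of second-based boundaries to samples can be sketched as follows. This is an illustration of the behaviour described above, not medkit's code: `seconds_to_sample_bounds` is a hypothetical helper, the clamping to the buffer length is an assumption, and Python's `round()` resolves exact ties to the even integer, which may differ from the library's choice.

```python
from typing import Optional


def seconds_to_sample_bounds(
    sample_rate: int,
    nb_samples: int,
    start_time: Optional[float] = None,
    end_time: Optional[float] = None,
) -> tuple[int, int]:
    """Convert boundaries in seconds to (start, end) in samples, end excluded."""
    # Missing boundaries fall back to the documented defaults.
    start = 0 if start_time is None else round(start_time * sample_rate)
    end = nb_samples if end_time is None else round(end_time * sample_rate)
    # Clamp to the buffer length (an assumption of this sketch).
    start = max(0, min(start, nb_samples))
    end = max(start, min(end, nb_samples))
    return start, end


# The same times map to different sample indices at different sample rates.
bounds_16k = seconds_to_sample_bounds(16000, 32000, 0.5, 1.25)
bounds_8k = seconds_to_sample_bounds(8000, 16000, 0.5, 1.25)
```

The resulting pair could then be passed to a sample-based `trim(start, end)`.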

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

class FileAudioBuffer(path, trim_start=None, trim_end=None, sf_info=None)

Audio buffer giving access to audio files stored on the filesystem (to use when manipulating unmodified raw audio).

Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)

Parameters:
  • path (str or Path) – Path to the audio file.

  • trim_start (int, optional) – First sample of audio file to consider.

  • trim_end (int, optional) – First sample of audio file to exclude.

  • sf_info (Any, optional) – Optional metadata dict returned by soundfile.

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

read([copy])

Return the signal in the audio buffer.

trim([start, end])

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

trim_duration([start_time, end_time])

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds.

Attributes:

duration

Duration of the signal in seconds.

read(copy=False)

Return the signal in the audio buffer.

Parameters:

copy (bool) – If True, the returned array will be a copy that can be safely mutated.

Return type:

ndarray

Returns:

np.ndarray – Raw audio samples

trim(start=None, end=None)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

Parameters:
  • start (int, optional) – Start sample of the new buffer (defaults to 0).

  • end (int, optional) – End sample of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

property duration: float

Duration of the signal in seconds.

Return type:

float

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

trim_duration(start_time=None, end_time=None)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.

Parameters:
  • start_time (float, optional) – Start time of the new buffer (defaults to 0.0).

  • end_time (float, optional) – End time of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

class MemoryAudioBuffer(signal, sample_rate)

Audio buffer giving access to signals stored in memory (to use when reading/writing a modified audio signal).

Parameters:
  • signal (ndarray) – Samples constituting the audio signal, with shape (nb_channels, nb_samples).

  • sample_rate (int) – Sample rate of the signal, in samples per second.

Methods:

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

read([copy])

Return the signal in the audio buffer.

trim([start, end])

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

trim_duration([start_time, end_time])

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds.

Attributes:

duration

Duration of the signal in seconds.

read(copy=False)

Return the signal in the audio buffer.

Parameters:

copy (bool) – If True, the returned array will be a copy that can be safely mutated.

Return type:

ndarray

Returns:

np.ndarray – Raw audio samples

trim(start=None, end=None)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.

Parameters:
  • start (int, optional) – Start sample of the new buffer (defaults to 0).

  • end (int, optional) – End sample of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

property duration: float

Duration of the signal in seconds.

Return type:

float

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

trim_duration(start_time=None, end_time=None)

Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.

Parameters:
  • start_time (float, optional) – Start time of the new buffer (defaults to 0.0).

  • end_time (float, optional) – End time of the new buffer, excluded (defaults to full duration).

Return type:

AudioBuffer

Returns:

AudioBuffer – Trimmed audio buffer with new start and end samples, of same type as original audio buffer.

class AudioDocument(audio, anns=None, attrs=None, metadata=None, uid=None)

Document holding audio annotations.

Variables:
  • uid (str) – Unique identifier of the document.

  • audio (AudioBuffer) – Audio buffer containing the entire signal of the document.

  • anns (AudioAnnotationContainer) – Annotations of the document. Stored in an AudioAnnotationContainer but can be passed as a list at init.

  • attrs (AttributeContainer) – Attributes of the document. Stored in an AttributeContainer but can be passed as a list at init.

  • metadata (dict of str to Any) – Document metadata.

  • raw_segment (Segment) – Auto-generated segment containing the full unprocessed document audio.

Attributes:

RAW_LABEL

Label to be used for raw segment

Methods:

from_dir(path[, pattern])

Create documents from audio files in a directory

from_file(path)

Create document from an audio file

get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

RAW_LABEL: typing.ClassVar[str] = 'RAW_AUDIO'

Label to be used for raw segment

classmethod get_subclass_for_data_dict(data_dict)

Return the subclass that corresponds to the class name found in a data dict

Parameters:

data_dict (dict of str to Any) – Data dict returned by the to_dict() method of a subclass (or of the base class itself)

Return type:

Optional[type[Self]]

Returns:

subclass – Subclass that generated data_dict, or None if data_dict corresponds to the base class itself.

classmethod from_file(path)

Create document from an audio file

Parameters:

path (path-like) – Path to the audio file. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)

Return type:

Self

Returns:

AudioDocument – Audio document with signal of path as audio. The file path is included in the document metadata.

classmethod from_dir(path, pattern='*.wav')

Create documents from audio files in a directory

Parameters:
  • path (path-like) – Path of the directory containing audio files

  • pattern (str, default="*.wav") – Glob pattern to match audio files in path. Supports all file formats handled by libsndfile (http://www.mega-nerd.com/libsndfile/#Features)

Return type:

list[Self]

Returns:

List[AudioDocument] – Audio documents with signal of each file as audio

class PreprocessingOperation(uid=None, name=None, **kwargs)

Abstract operation for pre-processing segments.

It uses a list of segments as input and produces a list of pre-processed segments. Each input segment will have a corresponding output segment.

Common initialization for all annotators:
  • assigning identifier to operation

  • storing class name, name and config in description

Parameters:
  • uid (str, optional) – Operation identifier

  • name (str, optional) – Operation name (defaults to class name)

  • kwargs – All other arguments of the child init useful to describe the operation

Examples

In the __init__ function of your annotator, use:

>>> init_args = locals()
>>> init_args.pop("self")
>>> super().__init__(**init_args)
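The pattern above can be tried without medkit using stand-in classes. In this sketch, `Operation` and `Downmixer` are hypothetical names, and the way the base class stores its arguments is an assumption. One certain Python detail is worth noting: inside a method that calls zero-argument `super()`, `locals()` also contains the implicit `__class__` cell, so the stand-in base class drops it. Whether medkit filters that entry is not stated here.

```python
class Operation:
    """Hypothetical stand-in for the base operation class."""

    def __init__(self, uid=None, name=None, **kwargs):
        # locals() in a method using super() includes "__class__"; drop it.
        kwargs.pop("__class__", None)
        self.uid = uid
        # The name defaults to the class name, as documented.
        self.name = name if name is not None else type(self).__name__
        # Remaining init arguments describe the operation's configuration.
        self.config = kwargs


class Downmixer(Operation):
    """Hypothetical child operation with one extra init argument."""

    def __init__(self, nb_channels=1, uid=None, name=None):
        # Capture the init arguments before defining any other local variable.
        init_args = locals()
        init_args.pop("self")
        super().__init__(**init_args)


op = Downmixer(nb_channels=2)
```

Calling `locals()` first matters: any variable assigned earlier in `__init__` would also be forwarded to the parent.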

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: OperationDescription

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class SegmentationOperation(uid=None, name=None, **kwargs)

Abstract operation for segmenting audio.

It uses a list of segments as input and produces a list of new segments. Each input segment will have zero, one or more corresponding output segments.

Common initialization for all annotators:
  • assigning identifier to operation

  • storing class name, name and config in description

Parameters:
  • uid (str, optional) – Operation identifier

  • name (str, optional) – Operation name (defaults to class name)

  • kwargs – All other arguments of the child init useful to describe the operation

Examples

In the __init__ function of your annotator, use:

>>> init_args = locals()
>>> init_args.pop("self")
>>> super().__init__(**init_args)

Attributes:

description

Contains all the operation init parameters.

Methods:

set_prov_tracer(prov_tracer)

Enable provenance tracing.

property description: OperationDescription

Contains all the operation init parameters.

Return type:

OperationDescription

set_prov_tracer(prov_tracer)

Enable provenance tracing.

Parameters:

prov_tracer (ProvTracer) – The provenance tracer used to trace the provenance.

class Span(start, end)

Boundaries of a slice of audio.

Variables:
  • start (float) – Starting point in the original audio, in seconds.

  • end (float) – Ending point in the original audio, in seconds.

Create new instance of Span(start, end)

Attributes:

end

Alias for field number 1

length

Length of the span, in seconds

start

Alias for field number 0

Methods:

count(value, /)

Return number of occurrences of value.

index(value[, start, stop])

Return first index of value.

start: float

Alias for field number 0

end: float

Alias for field number 1

property length

Length of the span, in seconds

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.
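The "Alias for field number" entries and the `count()` and `index()` methods indicate that Span behaves like a named tuple. The stand-in below mirrors the documented fields and the `length` property. It is a sketch for illustration, not medkit's source.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in mirroring the documented fields; not medkit's source."""

    start: float
    end: float

    @property
    def length(self) -> float:
        # Length of the span, in seconds.
        return self.end - self.start


span = Span(start=1.5, end=4.0)
# Tuple behaviour: fields can be unpacked or accessed by position.
start, end = span
```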