medkit.core.audio
APIs
To access these APIs, import them like this:
from medkit.core.audio import <api_to_import>
Classes:
AudioAnnotationContainer | Manage a list of audio annotations belonging to an audio document. |
AudioBuffer | Audio buffer base class. |
AudioDocument | Document holding audio annotations. |
FileAudioBuffer | Audio buffer giving access to audio files stored on the filesystem (to use when manipulating unmodified raw audio). |
MemoryAudioBuffer | Audio buffer giving access to signals stored in memory (to use when reading/writing a modified audio signal). |
PreprocessingOperation | Abstract operation for pre-processing segments. |
Segment | Audio segment referencing part of an AudioDocument. |
SegmentationOperation | Abstract operation for segmenting audio. |
Span | Boundaries of a slice of audio. |
- class Segment(label, audio, span, attrs=None, metadata=None, uid=None)
Audio segment referencing part of an AudioDocument.
- Variables
uid (str) – Unique identifier of the segment.
label (str) – Label of the segment.
audio (medkit.core.audio.audio_buffer.AudioBuffer) – The audio signal of the segment. It must be consistent with the span, in the sense that it must correspond to the audio signal of the document at the span boundaries. But it can be a modified, processed version of this audio signal.
span (medkit.core.audio.span.Span) – Span (in seconds) indicating the part of the document’s full signal that this segment references.
attrs (medkit.core.attribute_container.AttributeContainer) – Attributes of the segment. Stored in an AttributeContainer but can be passed as a list at init.
metadata (Dict[str, Any]) – Metadata of the segment.
keys (Set[str]) – Pipeline output keys to which the annotation belongs.
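The consistency requirement between audio and span can be pictured as a duration check. The following is a minimal sketch using plain numbers rather than medkit objects; the helper `is_consistent` and its one-sample tolerance are hypothetical and not part of medkit.

```python
def is_consistent(span_start, span_end, nb_samples, sample_rate):
    """Return True if the buffer duration matches the span length.

    Hypothetical helper (not a medkit API). The tolerance of one sample
    leaves room for rounding when seconds are converted to samples.
    """
    span_length = span_end - span_start        # seconds
    audio_duration = nb_samples / sample_rate  # seconds
    return abs(span_length - audio_duration) <= 1 / sample_rate


# A 2.5 s span over a 16 kHz signal should hold 40000 samples.
print(is_consistent(1.0, 3.5, 40000, 16000))
# A 1 s buffer cannot correspond to a 2.5 s span.
print(is_consistent(1.0, 3.5, 16000, 16000))
```

A processed version of the signal (filtered, resampled) could still pass such a check, which is in line with the note that the audio may be a modified version of the document signal.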
- class AudioAnnotationContainer(doc_id, raw_segment)
Manage a list of audio annotations belonging to an audio document.
This behaves much like a list: len() and iteration are supported. Additional filtering is available through the get() method.
It also handles the document's raw segment.
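The list-like behaviour described above can be sketched with a small stand-in class. This is not medkit's implementation: `AnnotationListSketch` is hypothetical, segments are reduced to plain tuples, and the `label` keyword of `get()` is an assumption about what the filtering might look like.

```python
class AnnotationListSketch:
    """Stand-in illustrating len(), iteration and label filtering."""

    def __init__(self, segments):
        # Each segment is represented here by a (label, start, end) tuple.
        self._segments = list(segments)

    def __len__(self):
        return len(self._segments)

    def __iter__(self):
        return iter(self._segments)

    def get(self, label=None):
        # Assumed filter: keep only the segments whose label matches.
        if label is None:
            return list(self._segments)
        return [seg for seg in self._segments if seg[0] == label]


anns = AnnotationListSketch(
    [("speech", 0.0, 1.2), ("noise", 1.2, 2.0), ("speech", 2.0, 3.1)]
)
print(len(anns))                  # number of annotations
print([seg[0] for seg in anns])   # labels, in insertion order
print(anns.get(label="speech"))   # only the "speech" segments
```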
- class AudioBuffer(sample_rate, nb_samples, nb_channels)
Audio buffer base class. Gives access to raw audio samples.
- Parameters
sample_rate (int) – Sample rate of the signal, in samples per second.
nb_samples (int) – Duration of the signal in samples.
nb_channels (int) – Number of channels in the signal.
Attributes:
duration – Duration of the signal in seconds.
Methods:
read([copy]) – Return the signal in the audio buffer.
trim(start, end) – Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.
trim_duration([start_time, end_time]) – Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds.
- property duration: float
Duration of the signal in seconds.
- Return type
float
- abstract read(copy=False)
Return the signal in the audio buffer.
- Parameters
copy (bool) – If True, the returned array will be a copy that can be safely mutated.
- Return type
ndarray
- Returns
np.ndarray – Raw audio samples.
- abstract trim(start, end)
Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in samples.
- Parameters
start (Optional[int]) – Start sample of the new buffer (defaults to 0).
end (Optional[int]) – End sample of the new buffer, excluded (defaults to the full duration).
- Return type
AudioBuffer
- Returns
AudioBuffer – Trimmed audio buffer with new start and end samples, of the same type as the original audio buffer.
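The boundary convention (start included, end excluded, both optional) mirrors Python slicing. A minimal sketch on a nested list with shape (nb_channels, nb_samples) follows; `trim_signal` is a hypothetical helper that copies the data, whereas the documented method returns a new buffer pointing to the original signal.

```python
def trim_signal(signal, start=None, end=None):
    """Slice a (nb_channels, nb_samples) nested list between two sample indices."""
    nb_samples = len(signal[0])
    start = 0 if start is None else start
    end = nb_samples if end is None else end
    # `end` is excluded, exactly as in Python slicing.
    return [channel[start:end] for channel in signal]


stereo = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
print(trim_signal(stereo, 2, 5))   # samples 2, 3 and 4 of each channel
print(trim_signal(stereo, end=2))  # start defaults to 0
```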
- trim_duration(start_time=None, end_time=None)
Return a new audio buffer pointing to a portion of the signal in the original buffer, using boundaries in seconds. Since start_time and end_time are in seconds, the exact trim boundaries will be rounded to the nearest sample and will therefore depend on the sampling rate.
- Parameters
start_time (Optional[float]) – Start time of the new buffer (defaults to 0.0).
end_time (Optional[float]) – End time of the new buffer, excluded (defaults to the full duration).
- Return type
AudioBuffer
- Returns
AudioBuffer – Trimmed audio buffer with new start and end samples, of the same type as the original audio buffer.
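To see why the boundaries depend on the sampling rate, the conversion from seconds to samples can be sketched as below. `seconds_to_sample` is a hypothetical helper, and the use of Python's built-in `round` is an assumption; medkit's exact rounding may differ.

```python
def seconds_to_sample(time_s, sample_rate):
    """Round a time in seconds to the nearest sample index (assumed rounding)."""
    return round(time_s * sample_rate)


# The same boundaries in seconds land on different samples at different rates,
# so the effective duration of the trimmed buffer differs slightly.
for sample_rate in (8000, 44100):
    start = seconds_to_sample(0.12345, sample_rate)
    end = seconds_to_sample(1.0, sample_rate)
    print(sample_rate, start, end, (end - start) / sample_rate)
```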
- class FileAudioBuffer(path, trim_start=None, trim_end=None, sf_info=None)
Audio buffer giving access to audio files stored on the filesystem (to use when manipulating unmodified raw audio).
- Parameters
path (Union[str, Path]) – Path to the audio file.
trim_start (Optional[int]) – First sample of the audio file to consider.
trim_end (Optional[int]) – First sample of the audio file to exclude.
sf_info (Optional[Any]) – Optional metadata dict returned by soundfile.
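The idea of reading only the samples between trim_start and trim_end from a file can be sketched with the standard-library `wave` module. This is only a stand-in: the sf_info parameter indicates that medkit relies on soundfile, and `read_samples` is a hypothetical helper limited to mono 16-bit WAV files.

```python
import os
import struct
import tempfile
import wave


def read_samples(path, trim_start=None, trim_end=None):
    """Read mono 16-bit samples in [trim_start, trim_end) from a WAV file."""
    with wave.open(path, "rb") as wav:
        nb_samples = wav.getnframes()
        start = 0 if trim_start is None else trim_start
        end = nb_samples if trim_end is None else trim_end
        wav.setpos(start)                     # seek to the first sample to keep
        frames = wav.readframes(end - start)  # stop before the first excluded one
    return list(struct.unpack(f"<{end - start}h", frames))


# Write a short mono file holding the values 0..9, then read samples 3 to 6.
path = os.path.join(tempfile.mkdtemp(), "sketch.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(struct.pack("<10h", *range(10)))

print(read_samples(path, trim_start=3, trim_end=7))
```

Because only the requested frames are read, the file itself is never modified, which fits the "unmodified raw audio" use case.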
- class MemoryAudioBuffer(signal, sample_rate)
Audio buffer giving access to signals stored in memory (to use when reading/writing a modified audio signal).
- Parameters
signal (ndarray) – Samples constituting the audio signal, with shape (nb_channels, nb_samples).
sample_rate (int) – Sample rate of the signal, in samples per second.
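The expected layout, channels first and samples second, can be illustrated with a nested list. This sketch uses only the standard library; an actual MemoryAudioBuffer expects an ndarray, so the list would presumably need converting (for instance with `numpy.asarray`) before being passed in.

```python
import math

sample_rate = 8000            # samples per second
nb_samples = sample_rate // 2  # half a second of audio

# One channel holding a 440 Hz sine tone.
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(nb_samples)]

# Shape (nb_channels, nb_samples): the outer list indexes channels.
signal = [tone]
nb_channels, nb_samples_read = len(signal), len(signal[0])
duration = nb_samples_read / sample_rate

print(nb_channels, nb_samples_read, duration)
```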
- class AudioDocument(audio, anns=None, metadata=None, uid=None)
Document holding audio annotations.
- Variables
uid (str) – Unique identifier of the document.
audio – Audio buffer containing the entire signal of the document.
anns (AudioAnnotationContainer) – Annotations of the document. Stored in an AudioAnnotationContainer but can be passed as a list at init.
metadata (Dict[str, Any]) – Document metadata.
raw_segment (Segment) – Auto-generated segment containing the full unprocessed document audio.
Attributes:
RAW_LABEL – Label to be used for the raw segment.
- RAW_LABEL: ClassVar[str] = 'RAW_AUDIO'
Label to be used for the raw segment.
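As a rough picture of the auto-generated raw segment, the sketch below builds a segment labelled with RAW_LABEL that covers the whole signal. It assumes the raw segment spans from 0 seconds to the full duration, which is what "full unprocessed document audio" suggests; `SegmentSketch` and `make_raw_segment` are hypothetical names, not medkit APIs.

```python
from typing import NamedTuple

RAW_LABEL = "RAW_AUDIO"  # value documented for AudioDocument.RAW_LABEL


class SegmentSketch(NamedTuple):
    """Simplified segment: a label plus boundaries in seconds."""

    label: str
    start: float
    end: float


def make_raw_segment(nb_samples, sample_rate):
    """Build a segment covering the whole signal (assumed behaviour)."""
    return SegmentSketch(RAW_LABEL, 0.0, nb_samples / sample_rate)


raw = make_raw_segment(nb_samples=48000, sample_rate=16000)
print(raw)
```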
- class PreprocessingOperation(uid=None, name=None, **kwargs)
Abstract operation for pre-processing segments.
It takes a list of segments as input and produces a list of pre-processed segments. Each input segment has a corresponding output segment.
- Common initialization for all annotators:
assigning an identifier to the operation
storing the class name, name and config in the description
- Parameters
uid (str) – Operation identifier
name – Operation name (defaults to the class name)
kwargs – All other arguments of the child init that are useful to describe the operation
Examples
In the __init__ function of your annotator, use:
>>> init_args = locals()
>>> init_args.pop('self')
>>> super().__init__(**init_args)
- class SegmentationOperation(uid=None, name=None, **kwargs)
Abstract operation for segmenting audio.
It takes a list of segments as input and produces a list of new segments. Each input segment has zero, one or more corresponding output segments.
- Common initialization for all annotators:
assigning an identifier to the operation
storing the class name, name and config in the description
- Parameters
uid (str) – Operation identifier
name – Operation name (defaults to the class name)
kwargs – All other arguments of the child init that are useful to describe the operation
Examples
In the __init__ function of your annotator, use:
>>> init_args = locals()
>>> init_args.pop('self')
>>> super().__init__(**init_args)
- class Span(start, end)
Boundaries of a slice of audio.
- Variables
start (float) – Starting point in the original audio, in seconds.
end (float) – Ending point in the original audio, in seconds.
Create new instance of Span(start, end)
Attributes:
end – Alias for field number 1
length – Length of the span, in seconds
start – Alias for field number 0
- property start
Alias for field number 0
- property end
Alias for field number 1
- property length
Length of the span, in seconds
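The "Alias for field number" entries suggest that Span is a named tuple. Under that assumption, its behaviour can be sketched as follows; `SpanSketch` is a hypothetical stand-in, and computing length as end minus start is inferred from the description rather than taken from medkit's source.

```python
from typing import NamedTuple


class SpanSketch(NamedTuple):
    """Stand-in for a span with boundaries in seconds."""

    start: float
    end: float

    @property
    def length(self):
        # Length of the span, in seconds.
        return self.end - self.start


span = SpanSketch(1.5, 4.0)
start, end = span  # a named tuple unpacks like a regular tuple
print(span[0], span[1], span.length)
```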