medkit.io
=========

.. py:module:: medkit.io


Submodules
----------

.. toctree::
   :maxdepth: 1

   /reference/api/medkit/io/_brat_utils/index
   /reference/api/medkit/io/_common/index
   /reference/api/medkit/io/brat/index
   /reference/api/medkit/io/doccano/index
   /reference/api/medkit/io/medkit_json/index
   /reference/api/medkit/io/rttm/index
   /reference/api/medkit/io/spacy/index
   /reference/api/medkit/io/srt/index


Classes
-------

.. autoapisummary::

   medkit.io.BratInputConverter
   medkit.io.BratOutputConverter
   medkit.io.DoccanoClientConfig
   medkit.io.DoccanoInputConverter
   medkit.io.DoccanoOutputConverter
   medkit.io.DoccanoTask
   medkit.io.RTTMInputConverter
   medkit.io.RTTMOutputConverter


Package Contents
----------------

.. py:class:: BratInputConverter(detect_cuis_in_notes: bool = True, notes_label: str = 'brat_note', uid: str | None = None)

   Bases: :py:obj:`medkit.core.InputConverter`


   
   Class in charge of converting brat files into medkit text documents and annotations.


   :Parameters:

       **detect_cuis_in_notes** : bool, default=True
           If `True`, strings looking like CUIs in annotator notes of entities
           will be converted to UMLS normalization attributes rather than creating
           an :class:`~.core.Attribute` with the whole note text as value.

       **notes_label** : str, default="brat_note"
           Label to use for attributes created from annotator notes.

       **uid** : str, optional
           Identifier of the converter.

   :Attributes:

       **description** : OperationDescription
           Description of the operation.
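
   As a rough illustration of what ``detect_cuis_in_notes`` looks for, the sketch
   below extracts CUI-like strings from a brat annotator note with a regular
   expression. ``find_cuis`` is a hypothetical helper, not part of medkit, and it
   assumes a CUI is the letter C followed by seven digits; medkit's own detection
   may differ.

   .. code-block:: python

      import re

      # Assumption: a UMLS CUI is "C" followed by exactly seven digits
      CUI_PATTERN = re.compile(r"\bC\d{7}\b")


      def find_cuis(note_text: str) -> list[str]:
          """Return the CUI-like strings found in an annotator note (hypothetical helper)."""
          return CUI_PATTERN.findall(note_text)


      # Annotator note line from a .ann file: note ID, "AnnotatorNotes <target>", note text
      note_line = "#1\tAnnotatorNotes T1\tC0011849 C0004096"
      _note_id, _target, note_text = note_line.split("\t")
      print(find_cuis(note_text))  # ['C0011849', 'C0004096']
      print(find_cuis("to be reviewed"))  # []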


   .. py:attribute:: notes_label
      :value: 'brat_note'



   .. py:attribute:: detect_cuis_in_notes
      :value: True



   .. py:attribute:: uid
      :value: None



   .. py:attribute:: _prov_tracer
      :type:  medkit.core.ProvTracer | None
      :value: None



   .. py:property:: description
      :type: medkit.core.OperationDescription



   .. py:method:: set_prov_tracer(prov_tracer: medkit.core.ProvTracer)


   .. py:method:: load(dir_path: str | pathlib.Path, ann_ext: str = ANN_EXT, text_ext: str = TEXT_EXT) -> list[medkit.core.text.TextDocument]

      
      Load brat annotations as text documents.

      Create a list of TextDocuments from a folder containing text files and
      their associated brat annotation files.

      :Parameters:

          **dir_path** : str or Path
              The path to the directory containing the text files and the annotation
              files (.ann).

          **ann_ext** : str, optional
              The extension of the brat annotation files (e.g. ".ann").

          **text_ext** : str, optional
              The extension of the text files (e.g. ".txt").



      :Returns:

          list of TextDocument
              The list of TextDocuments
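
      A minimal sketch of how such a directory can be scanned, assuming each
      annotation file has a text file with the same basename and that the
      extensions are ``.ann`` and ``.txt`` (the actual values of ``ANN_EXT`` and
      ``TEXT_EXT`` are not shown here). ``pair_brat_files`` is a hypothetical
      helper, not part of medkit.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def pair_brat_files(dir_path, ann_ext=".ann", text_ext=".txt"):
             """Return (ann_path, text_path) pairs sharing a basename (hypothetical helper)."""
             pairs = []
             for ann_path in sorted(Path(dir_path).glob(f"*{ann_ext}")):
                 text_path = ann_path.with_suffix(text_ext)
                 # Assumption of this sketch: every annotation file needs its text file
                 if not text_path.exists():
                     raise FileNotFoundError(f"No text file found for {ann_path}")
                 pairs.append((ann_path, text_path))
             return pairs


         with tempfile.TemporaryDirectory() as tmp:
             for name in ("doc1", "doc2"):
                 (Path(tmp) / f"{name}.txt").write_text("Patient has diabetes.", encoding="utf-8")
                 (Path(tmp) / f"{name}.ann").write_text("T1\tdisease 12 20\tdiabetes\n", encoding="utf-8")
             for ann_path, text_path in pair_brat_files(tmp):
                 print(ann_path.name, text_path.name)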



   .. py:method:: load_doc(ann_path: str | pathlib.Path, text_path: str | pathlib.Path) -> medkit.core.text.TextDocument

      
      Load a brat annotation file and its associated text file as a text document.

      Create a TextDocument from a .ann file and its associated .txt file.

      :Parameters:

          **ann_path** : str or Path
              The path to the brat annotation file.

          **text_path** : str or Path
              The path to the text document file.



      :Returns:

          TextDocument
              The document containing the text and the annotations



   .. py:method:: load_annotations(ann_file: str | pathlib.Path) -> list[medkit.core.text.TextAnnotation]

      
      Load a brat annotation file as a list of annotations.

      Load a .ann file and return a list of
      :class:`~medkit.core.text.TextAnnotation` objects.

      :Parameters:

          **ann_file** : str or Path
              Path to the .ann file.



      :Returns:

          list of TextAnnotation
              The list of text annotations
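
      The brat standoff format stores one annotation per line, with tab-separated
      fields. The sketch below is a simplified, hypothetical parser (``parse_ann``
      is not medkit's API) covering entities (``T``), relations (``R``), attributes
      (``A``/``M``) and annotator notes (``#``); it ignores events and other
      annotation kinds that a complete implementation would handle.

      .. code-block:: python

         def parse_ann(ann_text: str) -> dict[str, list[dict]]:
             """Parse brat standoff lines into plain dicts (simplified, hypothetical sketch)."""
             parsed = {"entities": [], "relations": [], "attributes": [], "notes": []}
             for line in ann_text.splitlines():
                 if not line.strip():
                     continue
                 brat_id, rest = line.split("\t", 1)
                 if brat_id.startswith("T"):
                     # "T1<TAB>label start end[;start end...]<TAB>text"
                     type_and_spans, text = rest.split("\t", 1)
                     label, spans_str = type_and_spans.split(" ", 1)
                     spans = [
                         tuple(int(offset) for offset in fragment.split())
                         for fragment in spans_str.split(";")
                     ]
                     parsed["entities"].append({"id": brat_id, "label": label, "spans": spans, "text": text})
                 elif brat_id.startswith("R"):
                     # "R1<TAB>label Arg1:T1 Arg2:T2"
                     label, arg1, arg2 = rest.split()
                     parsed["relations"].append(
                         {"id": brat_id, "label": label, "source": arg1.split(":", 1)[1], "target": arg2.split(":", 1)[1]}
                     )
                 elif brat_id.startswith(("A", "M")):
                     # "A1<TAB>label target [value]": binary attributes have no value
                     fields = rest.split()
                     value = fields[2] if len(fields) > 2 else None
                     parsed["attributes"].append({"id": brat_id, "label": fields[0], "target": fields[1], "value": value})
                 elif brat_id.startswith("#"):
                     # "#1<TAB>AnnotatorNotes target<TAB>note text"
                     header, note = rest.split("\t", 1)
                     parsed["notes"].append({"id": brat_id, "target": header.split()[1], "text": note})
             return parsed


         ann_text = (
             "T1\tdisease 12 20\tdiabetes\n"
             "T2\tdrug 30 39\tmetformin\n"
             "R1\ttreats Arg1:T2 Arg2:T1\n"
             "A1\tnegation T1\n"
             "#1\tAnnotatorNotes T1\tC0011849\n"
         )
         parsed = parse_ann(ann_text)
         print(parsed["entities"][0])
         # {'id': 'T1', 'label': 'disease', 'spans': [(12, 20)], 'text': 'diabetes'}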



.. py:class:: BratOutputConverter(anns_labels: list[str] | None = None, attrs: list[str] | None = None, notes_label: str = 'brat_note', ignore_segments: bool = True, convert_cuis_to_notes: bool = True, create_config: bool = True, top_values_by_attr: int = 50, uid: str | None = None)

   Bases: :py:obj:`medkit.core.OutputConverter`


   
   Class for converting text documents into a brat collection.

   .. hint::
       BRAT checks that the text of each annotation is consistent with its spans.
       This converter adjusts the text and spans of the annotations so that they are
       displayed correctly and remain compatible with BRAT.

   :Parameters:

       **anns_labels** : list of str, optional
           Labels of medkit annotations to convert into Brat annotations.
           If `None` (default), all the annotations will be converted.

       **attrs** : list of str, optional
           Labels of medkit attributes to include in the converted annotations.
           If `None` (default), all medkit attributes found in the segments or relations
           will be converted to Brat attributes.

       **notes_label** : str, default="brat_note"
           Label of attributes that will be converted to annotator notes.

       **ignore_segments** : bool, default=True
           If `True`, medkit segments will be ignored: only entities, attributes and relations
           will be converted to Brat annotations. If `False`, the medkit segments will be
           converted to Brat annotations as well.

       **convert_cuis_to_notes** : bool, default=True
           If `True`, UMLS normalization attributes will be converted to
           annotator notes rather than attributes. For entities with multiple
           UMLS attributes, CUIs will be separated by spaces (e.g. "C0011849 C0004096").

       **create_config** : bool, default=True
           Whether to create a configuration file for the generated collection.
           This file defines the types of the generated annotations and is required
           for Brat to display them correctly.

       **top_values_by_attr** : int, default=50
           Number of most common values per attribute to include in the configuration.
           This is useful when an attribute takes many different values: only the most
           frequent ones are listed. By default, the 50 most common values of each
           attribute are included.

       **uid** : str, optional
           Identifier of the converter.

   :Attributes:

       **description** : OperationDescription
           Description for the operation.
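
   The effect of ``top_values_by_attr`` can be sketched with ``collections.Counter``:
   only the most frequent values of each attribute are kept when listing them in the
   configuration. ``top_attribute_values`` and ``format_attribute_conf`` are
   hypothetical helpers; the ``Arg:<ENTITY>, Value:...`` line follows brat's
   ``[attributes]`` syntax and is an assumption about how such values could appear in
   ``annotation.conf``, not a copy of medkit's output.

   .. code-block:: python

      from collections import Counter


      def top_attribute_values(values_by_attr: dict[str, list[str]], top_values_by_attr: int = 50) -> dict[str, list[str]]:
          """Keep the most frequent values of each attribute (hypothetical helper)."""
          return {
              label: [value for value, _count in Counter(values).most_common(top_values_by_attr)]
              for label, values in values_by_attr.items()
          }


      def format_attribute_conf(label: str, values: list[str]) -> str:
          """Format an entity attribute definition for the [attributes] section (hypothetical)."""
          return f"{label}\tArg:<ENTITY>, Value:{'|'.join(values)}"


      observed = {"severity": ["high", "low", "high", "medium", "high", "low"]}
      top_values = top_attribute_values(observed, top_values_by_attr=2)
      print(top_values)  # {'severity': ['high', 'low']}
      print(format_attribute_conf("severity", top_values["severity"]))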


   .. py:attribute:: uid
      :value: ''



   .. py:attribute:: anns_labels
      :value: None



   .. py:attribute:: attrs
      :value: None



   .. py:attribute:: notes_label
      :value: 'brat_note'



   .. py:attribute:: ignore_segments
      :value: True



   .. py:attribute:: convert_cuis_to_notes
      :value: True



   .. py:attribute:: create_config
      :value: True



   .. py:attribute:: top_values_by_attr
      :value: 50



   .. py:property:: description
      :type: medkit.core.OperationDescription



   .. py:method:: save(docs: list[medkit.core.text.TextDocument], dir_path: str | pathlib.Path, doc_names: list[str] | None = None)

      
      Save text documents as brat files.

      Convert a list of TextDocuments and save them as a Brat collection in `dir_path`.
      For each document, a '.txt' file and a '.ann' file are created. An
      'annotation.conf' file is also saved when `create_config` is `True`.

      :Parameters:

          **docs** : list of TextDocument
              List of medkit doc objects to convert.

          **dir_path** : str or Path
              Directory in which the generated files will be saved.

          **doc_names** : list of str, optional
              Names to use for the generated files. If `None`, the document uid is
              used as the name: 'uid.txt' contains the raw text of the document and
              'uid.ann' its Brat annotations.
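
      The naming rule for ``doc_names`` can be sketched as follows: each document
      produces a ``<name>.txt`` / ``<name>.ann`` pair, with the document uid as the
      fallback name. ``write_brat_collection`` is a hypothetical helper that writes
      already-formatted annotation lines; it does not reproduce medkit's conversion
      or the optional ``annotation.conf`` file.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def write_brat_collection(docs, dir_path, doc_names=None):
             """Write (uid, raw_text, ann_lines) tuples as brat file pairs (hypothetical helper)."""
             dir_path = Path(dir_path)
             dir_path.mkdir(parents=True, exist_ok=True)
             for i, (uid, raw_text, ann_lines) in enumerate(docs):
                 # Fall back to the document uid when no explicit name is given
                 name = doc_names[i] if doc_names is not None else uid
                 (dir_path / f"{name}.txt").write_text(raw_text, encoding="utf-8")
                 ann_content = "".join(f"{line}\n" for line in ann_lines)
                 (dir_path / f"{name}.ann").write_text(ann_content, encoding="utf-8")


         with tempfile.TemporaryDirectory() as tmp:
             docs = [("doc-uid-1", "Patient has diabetes.", ["T1\tdisease 12 20\tdiabetes"])]
             write_brat_collection(docs, tmp)
             print(sorted(p.name for p in Path(tmp).iterdir()))  # ['doc-uid-1.ann', 'doc-uid-1.txt']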



   .. py:method:: _convert_medkit_anns_to_brat(segments: list[medkit.core.text.Segment], relations: list[medkit.core.text.Relation], config: medkit.io._brat_utils.BratAnnConfiguration, raw_text: str) -> list[medkit.io._brat_utils.BratEntity | medkit.io._brat_utils.BratAttribute | medkit.io._brat_utils.BratRelation | medkit.io._brat_utils.BratNote]

      
      Convert Segments, Relations and Attributes into brat data structures.


      :Parameters:

          **segments** : list of Segment
              Medkit segments to convert

          **relations** : list of Relation
              Medkit relations to convert

          **config** : BratAnnConfiguration
              `BratAnnConfiguration` structure, updated in place with the types of
              the generated Brat annotations.

          **raw_text** : str
              Reference text used to retrieve the original text of the annotations.



      :Returns:

          list of BratEntity or BratAttribute or BratRelation or BratNote
              A list of brat annotations



   .. py:method:: _ensure_text_and_spans(segment: medkit.core.text.Segment, raw_text: str) -> tuple[str, list[tuple[int, int]]]
      :staticmethod:


      
      Ensure consistency between the segment and the raw text.

      The text of a BRAT annotation cannot contain multiple consecutive whitespace
      characters (including newline characters). This method cleans the text of the
      segment fragments and adjusts their spans so that they point to the same
      locations in the raw text.

      :Parameters:

          **segment** : Segment
              Segment whose text and spans must be made consistent

          **raw_text** : str
              Reference text



      :Returns:

          **text** : str
              The cleaned text

          **spans** : list of tuple
              The adjusted spans
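
      A simplified sketch of the idea, assuming the cleaning only needs to handle
      line breaks and leading or trailing whitespace: each fragment is split around
      newlines, every piece is trimmed, its span is shifted accordingly, and the
      pieces are joined with single spaces. ``clean_fragment`` is hypothetical and
      is not medkit's exact algorithm.

      .. code-block:: python

         import re


         def clean_fragment(raw_text: str, start: int, end: int) -> tuple[str, list[tuple[int, int]]]:
             """Split a fragment around line breaks and trim each piece (hypothetical sketch)."""
             texts: list[str] = []
             spans: list[tuple[int, int]] = []
             # Each run of characters without a newline becomes a candidate sub-span
             for match in re.finditer(r"[^\n]+", raw_text[start:end]):
                 piece = match.group()
                 stripped = piece.strip()
                 if not stripped:
                     continue  # skip pieces made only of whitespace
                 # Shift the span start past the leading whitespace that was removed
                 piece_start = start + match.start() + (len(piece) - len(piece.lstrip()))
                 texts.append(stripped)
                 spans.append((piece_start, piece_start + len(stripped)))
             return " ".join(texts), spans


         raw_text = "Pain in\n  left arm."
         text, spans = clean_fragment(raw_text, 0, len(raw_text))
         print(text)   # Pain in left arm.
         print(spans)  # [(0, 7), (10, 19)]

      Each resulting span still points to the original characters, so the brat
      text and the raw text stay aligned.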



   .. py:method:: _convert_segment_to_brat(segment: medkit.core.text.Segment, nb_segment: int, raw_text: str) -> medkit.io._brat_utils.BratEntity

      
      Get a brat entity from a medkit segment.


      :Parameters:

          **segment** : Segment
              A medkit segment to convert into brat format

          **nb_segment** : int
              The current counter of brat segments

          **raw_text** : str
              Reference text used to retrieve the original text of the segment



      :Returns:

          BratEntity
              The equivalent brat entity of the medkit segment
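
      For reference, a brat entity line has the form
      ``T<n><TAB><type> <start> <end>[;<start> <end>...]<TAB><text>``, with
      discontinuous spans separated by semicolons. The hypothetical
      ``format_brat_entity`` below assumes the entity ID is derived from
      ``nb_segment`` and replaces spaces in the label, since brat type names cannot
      contain whitespace; it is a sketch, not medkit's implementation.

      .. code-block:: python

         def format_brat_entity(nb_segment: int, label: str, spans: list[tuple[int, int]], text: str) -> str:
             """Build a brat entity line (hypothetical sketch)."""
             # Assumption: brat type names must not contain whitespace
             brat_type = label.replace(" ", "_")
             spans_str = ";".join(f"{start} {end}" for start, end in spans)
             return f"T{nb_segment}\t{brat_type} {spans_str}\t{text}"


         print(format_brat_entity(1, "disease", [(12, 20)], "diabetes"))
         print(format_brat_entity(2, "clinical finding", [(0, 7), (10, 19)], "Pain in left arm."))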



   .. py:method:: _convert_relation_to_brat(relation: medkit.core.text.Relation, nb_relation: int, brat_entities_by_segment_id: dict[str, medkit.io._brat_utils.BratEntity]) -> tuple[medkit.io._brat_utils.BratRelation, medkit.io._brat_utils.RelationConf]
      :staticmethod:


      
      Get a brat relation from a medkit relation.


      :Parameters:

          **relation** : Relation
              A medkit relation to convert into brat format

          **nb_relation** : int
              The current counter of brat relations

          **brat_entities_by_segment_id** : dict of str to BratEntity
              Mapping from medkit segment IDs to the corresponding brat entities



      :Returns:

          **relation** : BratRelation
              The equivalent brat relation of the medkit relation

          **config** : RelationConf
              Configuration of the brat relation




      :Raises:

          ValueError
              When the source or target was not found in the mapping object
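
      A brat relation line references entities by their brat IDs
      (``R<n><TAB><type> Arg1:<source> Arg2:<target>``). The hypothetical
      ``format_brat_relation`` sketch below looks up both ends in a mapping from
      medkit segment IDs to brat IDs and raises ``ValueError`` when one of them is
      missing, as documented above; mapping to plain ID strings instead of
      ``BratEntity`` objects is a simplification.

      .. code-block:: python

         def format_brat_relation(nb_relation: int, label: str, source_id: str, target_id: str,
                                  brat_ids_by_segment_id: dict[str, str]) -> str:
             """Build a brat relation line from medkit segment IDs (hypothetical sketch)."""
             source = brat_ids_by_segment_id.get(source_id)
             target = brat_ids_by_segment_id.get(target_id)
             if source is None or target is None:
                 raise ValueError(f"Source or target of relation {label!r} not found among converted entities")
             return f"R{nb_relation}\t{label} Arg1:{source} Arg2:{target}"


         mapping = {"medkit-seg-1": "T1", "medkit-seg-2": "T2"}
         print(format_brat_relation(1, "treats", "medkit-seg-2", "medkit-seg-1", mapping))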



   .. py:method:: _convert_attribute_to_brat(label: str, value: str | None, nb_attribute: int, target_brat_id: str, is_from_entity: bool) -> tuple[medkit.io._brat_utils.BratAttribute, medkit.io._brat_utils.AttributeConf]
      :staticmethod:


      
      Get a brat attribute from a medkit attribute.


      :Parameters:

          **label** : str
              Attribute label to convert into brat format

          **value** : str, optional
              Attribute value

          **nb_attribute** : int
              The current counter of brat attributes

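          **target_brat_id** : str
              Corresponding target brat ID

          **is_from_entity** : bool
              Whether the attribute is attached to an entity (`True`) or to a relation (`False`)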



      :Returns:

          **attribute** : BratAttribute
              The equivalent brat attribute of the medkit attribute

          **config** : AttributeConf
              Configuration of the brat attribute
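
      In the brat format, a binary attribute only names its target
      (``A<n><TAB><label> <target>``), while a multi-valued attribute adds its value
      at the end of the line. The hypothetical ``format_brat_attribute`` sketch below
      shows both cases; writing a ``None`` value as a binary attribute is an
      assumption of the sketch.

      .. code-block:: python

         from typing import Optional


         def format_brat_attribute(nb_attribute: int, label: str, target_brat_id: str,
                                   value: Optional[str] = None) -> str:
             """Build a brat attribute line (hypothetical sketch)."""
             line = f"A{nb_attribute}\t{label} {target_brat_id}"
             # Assumption: a missing value is written as a binary (flag) attribute
             return line if value is None else f"{line} {value}"


         print(format_brat_attribute(1, "negation", "T1"))
         print(format_brat_attribute(2, "severity", "T1", "high"))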



   .. py:method:: _convert_umls_attributes_to_brat_note(cuis: list[str], nb_note: int, target_brat_id: str) -> medkit.io._brat_utils.BratNote
      :staticmethod:


      
      Get a brat note from medkit UMLS normalization attributes.


      :Parameters:

          **cuis** : list of str
              CUIs to convert into a brat note

          **nb_note** : int
              The current counter of brat notes

          **target_brat_id** : str
              Corresponding target brat ID



      :Returns:

          BratNote
              The equivalent brat note of the medkit UMLS attributes
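
      A brat annotator note is written as
      ``#<n><TAB>AnnotatorNotes <target><TAB><text>``. Consistent with the
      ``convert_cuis_to_notes`` description, the hypothetical sketch below joins
      several CUIs with spaces; it illustrates the format rather than medkit's code.

      .. code-block:: python

         def format_cui_note(cuis: list[str], nb_note: int, target_brat_id: str) -> str:
             """Build a brat annotator note holding space-separated CUIs (hypothetical sketch)."""
             return f"#{nb_note}\tAnnotatorNotes {target_brat_id}\t{' '.join(cuis)}"


         print(format_cui_note(["C0011849", "C0004096"], 1, "T1"))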



   .. py:method:: _convert_attributes_to_brat_note(values: list[Any], nb_note: int, target_brat_id: str) -> medkit.io._brat_utils.BratNote
      :staticmethod:


      
      Get a brat note from medkit attribute values.


      :Parameters:

          **values** : list of Any
              Attribute values

          **nb_note** : int
              The current counter of brat notes

          **target_brat_id** : str
              Corresponding target brat ID



      :Returns:

          BratNote
              The equivalent brat note of the medkit attribute values



.. py:class:: DoccanoClientConfig

   
   Doccano client configuration.

   The default values are those used by doccano.


   :Attributes:

       **column_text** : str, default="text"
           Name or key representing the text

       **column_label** : str, default="label"
           Name or key representing the label
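
   To illustrate what these attributes control, the sketch below reads a doccano
   JSON line through configurable keys. ``ClientConfig`` is a hypothetical
   stand-in that mirrors the two attributes of ``DoccanoClientConfig`` as a
   dataclass; it is not the medkit class itself, and the records are only examples.

   .. code-block:: python

      import json
      from dataclasses import dataclass


      @dataclass
      class ClientConfig:
          """Hypothetical stand-in mirroring the attributes of DoccanoClientConfig."""

          column_text: str = "text"
          column_label: str = "label"


      def read_text_and_label(json_line: str, config: ClientConfig) -> tuple:
          """Read the text and label fields of one doccano JSON line using the configured keys."""
          record = json.loads(json_line)
          return record[config.column_text], record[config.column_label]


      default_line = '{"text": "Patient has diabetes.", "label": [[12, 20, "disease"]]}'
      custom_line = '{"content": "Patient has diabetes.", "tags": [[12, 20, "disease"]]}'
      print(read_text_and_label(default_line, ClientConfig()))
      print(read_text_and_label(custom_line, ClientConfig(column_text="content", column_label="tags")))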


   .. py:attribute:: column_text
      :type:  str
      :value: 'text'



   .. py:attribute:: column_label
      :type:  str
      :value: 'label'



.. py:class:: DoccanoInputConverter(task: DoccanoTask, client_config: DoccanoClientConfig | None = None, attr_label: str = 'doccano_category', uid: str | None = None)

   
   Convert doccano files (.JSONL) containing annotations for a given task.

   For each line, a :class:`~.core.text.TextDocument` will be created.
   The doccano files can be loaded from a directory of zip files, from a single
   zip file or from a JSONL file.

   The converter supports a custom configuration defining the parameters used by doccano
   when importing the data (see :class:`~.io.doccano.DoccanoClientConfig`).

   .. warning::
       If the option *Count grapheme clusters as one character* was selected
       when creating the doccano project, the converted documents are
       likely to have alignment problems; the converter does not support this option.

   :Parameters:

       **task** : DoccanoTask
           The doccano task for the input converter.

       **client_config** : DoccanoClientConfig, optional
           Client configuration defining the values used in the doccano interface,
           if they differ from the doccano defaults. For example, it can change the
           name of the text or label fields.

       **attr_label** : str, default="doccano_category"
           The label to use for the medkit attribute that represents the doccano category.
           Only relevant for :class:`~.io.DoccanoTask.TEXT_CLASSIFICATION` projects.

       **uid** : str, optional
           Identifier of the converter.

   :Attributes:

       **description** : OperationDescription
           Description for the operation.


   .. py:attribute:: uid
      :value: None



   .. py:attribute:: client_config
      :value: None



   .. py:attribute:: task


   .. py:attribute:: attr_label
      :value: 'doccano_category'



   .. py:attribute:: _prov_tracer
      :type:  medkit.core.ProvTracer | None
      :value: None



   .. py:method:: set_prov_tracer(prov_tracer: medkit.core.ProvTracer)

      
      Enable provenance tracing.


      :Parameters:

          **prov_tracer** : ProvTracer
              The provenance tracer used to trace the provenance.



   .. py:property:: description
      :type: medkit.core.OperationDescription


      
      Contains all the input converter init parameters.



   .. py:method:: load_from_directory_zip(dir_path: str | pathlib.Path) -> list[medkit.core.text.TextDocument]

      
      Load text documents from a directory of zip files.

      The zip files should contain JSONL files coming from doccano.

      :Parameters:

          **dir_path** : str or Path
              The path to the directory containing zip files.



      :Returns:

          list of TextDocument
              A list of TextDocuments



   .. py:method:: load_from_zip(input_file: str | pathlib.Path) -> list[medkit.core.text.TextDocument]

      
      Load text documents from a zip file.


      :Parameters:

          **input_file** : str or Path
              The path to the zip file containing a doccano JSONL file.



      :Returns:

          list of TextDocument
              A list of TextDocuments
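
      A minimal sketch of reading doccano JSON lines out of a zip archive with the
      standard ``zipfile`` module. ``read_jsonl_from_zip`` is hypothetical; it
      assumes the archive holds one or more ``.jsonl`` members with one JSON object
      per line, which may not match every doccano export layout.

      .. code-block:: python

         import io
         import json
         import zipfile


         def read_jsonl_from_zip(zip_source) -> list[dict]:
             """Return the JSON records of every .jsonl member of a zip archive (hypothetical helper)."""
             records = []
             with zipfile.ZipFile(zip_source) as archive:
                 for member in archive.namelist():
                     if not member.endswith(".jsonl"):
                         continue
                     with archive.open(member) as raw:
                         # Decode the binary member stream line by line
                         for line in io.TextIOWrapper(raw, encoding="utf-8"):
                             if line.strip():
                                 records.append(json.loads(line))
             return records


         # Build an in-memory archive to demonstrate the helper
         buffer = io.BytesIO()
         with zipfile.ZipFile(buffer, "w") as archive:
             archive.writestr("admin.jsonl", '{"text": "first"}\n{"text": "second"}\n')
         print(read_jsonl_from_zip(buffer))  # [{'text': 'first'}, {'text': 'second'}]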



   .. py:method:: load_from_file(input_file: str | pathlib.Path) -> list[medkit.core.text.TextDocument]

      
      Load text documents from a JSONL file.


      :Parameters:

          **input_file** : str or Path
              The path to the JSONL file containing doccano annotations



      :Returns:

          list of TextDocument
              A list of TextDocuments



   .. py:method:: _check_crlf_character(documents: list[medkit.core.text.TextDocument])

      
      Check whether any of the converted documents contains the CRLF sequence (``\r\n``).

      This sequence is the only available indicator of potential alignment
      problems in the documents.
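
      The check can be sketched as a scan of the document texts for ``\r\n``.
      ``find_crlf_documents`` is a hypothetical helper working on plain strings
      rather than medkit documents, and emitting a ``warnings`` warning is an
      assumption of the sketch; medkit may report the problem differently.

      .. code-block:: python

         import warnings


         def find_crlf_documents(texts: list[str]) -> list[int]:
             """Return the indices of texts containing CRLF and warn about them (hypothetical sketch)."""
             indices = [i for i, text in enumerate(texts) if "\r\n" in text]
             if indices:
                 warnings.warn(
                     f"{len(indices)} document(s) contain CRLF line endings; annotation offsets may be misaligned"
                 )
             return indices


         print(find_crlf_documents(["line one\nline two", "line one\r\nline two"]))  # [1]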



   .. py:method:: _parse_doc_line(doc_line: dict[str, Any]) -> medkit.core.text.TextDocument

      
      Parse a doc_line into a TextDocument depending on the task.


      :Parameters:

          **doc_line** : dict of str to Any
              A dictionary representing an annotation from doccano



      :Returns:

          TextDocument
              A document with parsed annotations.
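
      Dispatching on the task can be sketched with an enum mirroring the values of
      ``DoccanoTask`` and a dictionary of parsers. Everything in this block
      (``Task``, the parser functions and the dicts they return) is hypothetical and
      only stands in for medkit's ``TextDocument``-building methods.

      .. code-block:: python

         from enum import Enum


         class Task(Enum):
             """Hypothetical mirror of the DoccanoTask values."""

             TEXT_CLASSIFICATION = "text_classification"
             RELATION_EXTRACTION = "relation_extraction"
             SEQUENCE_LABELING = "sequence_labeling"


         # Placeholder parsers: each returns a plain dict instead of a TextDocument
         def parse_text_classification(doc_line: dict) -> dict:
             return {"text": doc_line["text"], "category": doc_line["label"]}


         def parse_seq_labeling(doc_line: dict) -> dict:
             return {"text": doc_line["text"], "entities": doc_line["label"]}


         def parse_relation_extraction(doc_line: dict) -> dict:
             return {"text": doc_line["text"], "entities": doc_line["entities"], "relations": doc_line["relations"]}


         PARSERS = {
             Task.TEXT_CLASSIFICATION: parse_text_classification,
             Task.SEQUENCE_LABELING: parse_seq_labeling,
             Task.RELATION_EXTRACTION: parse_relation_extraction,
         }


         def parse_doc_line(doc_line: dict, task: Task) -> dict:
             """Select the parser matching the task (hypothetical sketch)."""
             return PARSERS[task](doc_line)


         task = Task("sequence_labeling")  # look up a task from its string value
         print(parse_doc_line({"text": "Patient has diabetes.", "label": [[12, 20, "disease"]]}, task))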



   .. py:method:: _parse_doc_line_relation_extraction(doc_line: dict[str, Any]) -> medkit.core.text.TextDocument

      
      Parse a dictionary and return a TextDocument with entities and relations.


      :Parameters:

          **doc_line** : dict of str to Any
              Dictionary with doccano annotation



      :Returns:

          TextDocument
              The document with annotations
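
      A doccano relation extraction record is assumed here to hold entities
      identified by ``id`` (with ``start_offset``, ``end_offset`` and ``label``) and
      relations referencing them through ``from_id`` and ``to_id``. The hypothetical
      ``parse_relation_line`` sketch returns plain tuples instead of medkit entities
      and relations.

      .. code-block:: python

         import json


         def parse_relation_line(json_line: str):
             """Extract entities and relations from a doccano relation extraction record (hypothetical sketch)."""
             record = json.loads(json_line)
             text = record["text"]
             # Map each doccano entity ID to (label, start, end, covered text)
             entities = {
                 ent["id"]: (ent["label"], ent["start_offset"], ent["end_offset"],
                             text[ent["start_offset"]:ent["end_offset"]])
                 for ent in record["entities"]
             }
             relations = [
                 (rel["type"], entities[rel["from_id"]][3], entities[rel["to_id"]][3])
                 for rel in record["relations"]
             ]
             return entities, relations


         line = json.dumps({
             "text": "Metformin treats diabetes.",
             "entities": [
                 {"id": 1, "label": "drug", "start_offset": 0, "end_offset": 9},
                 {"id": 2, "label": "disease", "start_offset": 17, "end_offset": 25},
             ],
             "relations": [{"id": 1, "from_id": 1, "to_id": 2, "type": "treats"}],
         })
         entities, relations = parse_relation_line(line)
         print(relations)  # [('treats', 'Metformin', 'diabetes')]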



   .. py:method:: _parse_doc_line_seq_labeling(doc_line: dict[str, Any]) -> medkit.core.text.TextDocument

      
      Parse a dictionary and return a TextDocument with entities.


      :Parameters:

          **doc_line** : dict of str to Any
              Dictionary with doccano annotation.



      :Returns:

          TextDocument
              The document with annotations
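
      For sequence labeling, doccano records are assumed here to store entities as
      ``[start, end, label]`` triples under the label column (``"label"`` by
      default, as in ``DoccanoClientConfig``). The hypothetical
      ``parse_seq_labeling_line`` sketch turns them into
      ``(label, start, end, text)`` tuples.

      .. code-block:: python

         import json


         def parse_seq_labeling_line(json_line: str, column_text: str = "text", column_label: str = "label"):
             """Extract (label, start, end, text) tuples from a sequence labeling record (hypothetical sketch)."""
             record = json.loads(json_line)
             text = record[column_text]
             return [(label, start, end, text[start:end]) for start, end, label in record[column_label]]


         line = '{"text": "Patient has diabetes.", "label": [[12, 20, "disease"]]}'
         print(parse_seq_labeling_line(line))  # [('disease', 12, 20, 'diabetes')]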



   .. py:method:: _parse_doc_line_text_classification(doc_line: dict[str, Any]) -> medkit.core.text.TextDocument

      
      Parse a dictionary and return a TextDocument with an attribute.


      :Parameters:

          **doc_line** : dict of str to Any
              Dictionary with doccano annotation.



      :Returns:

          TextDocument
              The document with its category



.. py:class:: DoccanoOutputConverter(task: DoccanoTask, anns_labels: list[str] | None = None, attr_label: str | None = None, ignore_segments: bool = True, include_metadata: bool | None = True, uid: str | None = None)

   
   Convert medkit text documents into a doccano file (.JSONL) for a given task.

   For each :class:`~medkit.core.text.TextDocument`, a JSON line will be created.

   :Parameters:

       **task** : DoccanoTask
           The doccano task for the output converter.

       **anns_labels** : list of str, optional
           Labels of medkit annotations to convert into doccano annotations.
           If `None` (default) all the entities or relations will be converted.
           Useful for :class:`~.io.DoccanoTask.SEQUENCE_LABELING` or
           :class:`~.io.DoccanoTask.RELATION_EXTRACTION` converters.

       **attr_label** : str, optional
           The label of the medkit attribute that represents the text category.
           Useful for :class:`~.io.DoccanoTask.TEXT_CLASSIFICATION` converters.

       **ignore_segments** : bool, default=True
           If `True`, medkit segments will be ignored: only entities will be
           converted to Doccano entities. If `False`, the medkit segments will
           be converted to Doccano entities as well.
           Useful for :class:`~.io.DoccanoTask.SEQUENCE_LABELING` or
           :class:`~.io.DoccanoTask.RELATION_EXTRACTION` converters.

       **include_metadata** : bool, default=True
           Whether to include medkit metadata in the converted documents.

       **uid** : str, optional
           Identifier of the converter.

   :Attributes:

       **description** : OperationDescription
           Description for the operation.


   .. py:attribute:: uid
      :value: None



   .. py:attribute:: task


   .. py:attribute:: anns_labels
      :value: None



   .. py:attribute:: attr_label
      :value: None



   .. py:attribute:: ignore_segments
      :value: True



   .. py:attribute:: include_metadata
      :value: True



   .. py:property:: description
      :type: medkit.core.OperationDescription



   .. py:method:: save(docs: list[medkit.core.text.TextDocument], output_file: str | pathlib.Path)

      
      Convert and save a list of TextDocuments into a doccano file (.JSONL).


      :Parameters:

          **docs** : list of TextDocument
              List of medkit doc objects to convert

          **output_file** : str or Path
              Path of the JSONL file in which the converted documents will be saved.
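
      The JSONL layout can be sketched as one ``json.dumps`` call per document,
      each followed by a newline. ``write_jsonl`` is a hypothetical helper operating
      on dictionaries that have already been converted; writing with
      ``ensure_ascii=False`` is an assumption of this sketch, chosen so that
      non-ASCII clinical text stays readable.

      .. code-block:: python

         import json
         import tempfile
         from pathlib import Path


         def write_jsonl(doc_lines: list[dict], output_file) -> None:
             """Write one JSON object per line (hypothetical helper)."""
             with open(output_file, "w", encoding="utf-8") as fp:
                 for doc_line in doc_lines:
                     fp.write(json.dumps(doc_line, ensure_ascii=False) + "\n")


         with tempfile.TemporaryDirectory() as tmp:
             output_file = Path(tmp) / "annotations.jsonl"
             write_jsonl([{"text": "Fièvre élevée.", "label": [[0, 6, "symptom"]]}], output_file)
             print(output_file.read_text(encoding="utf-8"))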



   .. py:method:: _convert_doc_by_task(medkit_doc: medkit.core.text.TextDocument) -> dict[str, Any]

      
      Convert a TextDocument into a dictionary depending on the task.


      :Parameters:

          **medkit_doc** : TextDocument
              Document to convert



      :Returns:

          dict of str to Any
              Dictionary with doccano annotation



   .. py:method:: _convert_doc_relation_extraction(medkit_doc: medkit.core.text.TextDocument) -> dict[str, Any]

      
      Convert a TextDocument to a doc_line compatible with the doccano relation extraction task.


      :Parameters:

          **medkit_doc** : TextDocument
              Document to convert. It may contain entities and relations.



      :Returns:

          dict of str to Any
              Dictionary with doccano annotation. It may contain text, entities and relations.



   .. py:method:: _convert_doc_seq_labeling(medkit_doc: medkit.core.text.TextDocument) -> dict[str, Any]

      
      Convert a TextDocument to a doc_line compatible with the doccano sequence labeling task.


      :Parameters:

          **medkit_doc** : TextDocument
              Document to convert. It may contain entities.



      :Returns:

          dict of str to Any
              Dictionary with doccano annotation. It may contain the
              text and its label (a list of tuples representing entities).



   .. py:method:: _convert_doc_text_classification(medkit_doc: medkit.core.text.TextDocument) -> dict[str, Any]

      
      Convert a TextDocument to a doc_line compatible with the doccano text classification task.


      :Parameters:

          **medkit_doc** : TextDocument
              Document to convert. It should contain at least one attribute to convert.



      :Returns:

          dict of str to Any
              Dictionary with doccano annotation. It may contain the
              text and its label (a category, as a string).



.. py:class:: DoccanoTask(*args, **kwds)

   Bases: :py:obj:`enum.Enum`


   
   Supported doccano tasks.



   :Attributes:

       **TEXT_CLASSIFICATION**
           Documents with a category

       **RELATION_EXTRACTION**
           Documents with entities and relations (including IDs)

       **SEQUENCE_LABELING**
           Documents with entities in tuples


   .. py:attribute:: TEXT_CLASSIFICATION
      :value: 'text_classification'



   .. py:attribute:: RELATION_EXTRACTION
      :value: 'relation_extraction'



   .. py:attribute:: SEQUENCE_LABELING
      :value: 'sequence_labeling'



.. py:class:: RTTMInputConverter(turn_label: str = 'turn', speaker_label: str = 'speaker', converter_id: str | None = None)

   Bases: :py:obj:`medkit.core.InputConverter`


   
   Class for converting Rich Transcription Time Marked (.rttm) files containing
   diarization information into turn segments.

   For each turn in a .rttm file containing diarization information, a
   :class:`~medkit.core.audio.annotation.Segment` will be created, with an
   associated :class:`~medkit.core.Attribute` holding the name of the turn
   speaker as value. The segments can be retrieved directly or as part of an
   :class:`~medkit.core.audio.document.AudioDocument` instance.

   If a :class:`~medkit.core.ProvTracer` is set, provenance information will be
   added for each segment and each attribute (referencing the input converter
   as the operation).

   :Parameters:

       **turn_label** : str, default="turn"
           Label of segments representing turns in the .rttm file.

       **speaker_label** : str, default="speaker"
           Label of speaker attributes to add to each segment.

       **converter_id** : str, optional
           Identifier of the converter.

   :Attributes:

       **description** : OperationDescription
           Description for the operation.
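
   Each line of an RTTM file describing a speaker turn has ten space-separated
   fields: ``SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>``,
   with onset and duration in seconds. The hypothetical ``parse_rttm_turns`` sketch
   below turns such lines into ``(start, end, speaker)`` tuples, roughly the
   information carried by each turn segment and its speaker attribute; it is not
   medkit's parser.

   .. code-block:: python

      def parse_rttm_turns(rttm_text: str) -> list[tuple[float, float, str]]:
          """Extract (start, end, speaker) turns from RTTM SPEAKER lines (hypothetical sketch)."""
          turns = []
          for line in rttm_text.splitlines():
              fields = line.split()
              # Skip blank lines and record types other than speaker turns
              if not fields or fields[0] != "SPEAKER":
                  continue
              onset, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
              turns.append((onset, onset + duration, speaker))
          return turns


      rttm_text = (
          "SPEAKER consult_01 1 0.00 2.50 <NA> <NA> doctor <NA> <NA>\n"
          "SPEAKER consult_01 1 2.50 1.25 <NA> <NA> patient <NA> <NA>\n"
      )
      print(parse_rttm_turns(rttm_text))  # [(0.0, 2.5, 'doctor'), (2.5, 3.75, 'patient')]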


   .. py:attribute:: uid
      :value: None



   .. py:attribute:: turn_label
      :value: 'turn'



   .. py:attribute:: speaker_label
      :value: 'speaker'



   .. py:attribute:: _prov_tracer
      :type:  medkit.core.ProvTracer | None
      :value: None



   .. py:property:: description
      :type: medkit.core.OperationDescription


      
      Contains all the input converter init parameters.



   .. py:method:: set_prov_tracer(prov_tracer: medkit.core.ProvTracer)

      
      Enable provenance tracing.


      :Parameters:

          **prov_tracer** : ProvTracer
              The provenance tracer used to trace the provenance.



   .. py:method:: load(rttm_dir: str | pathlib.Path, audio_dir: str | pathlib.Path | None = None, audio_ext: str = '.wav') -> list[medkit.core.audio.AudioDocument]

      
      Load all .rttm files in a directory into a list of audio documents.

      For each .rttm file, there must be a corresponding audio file with the
      same basename, either in the same directory or in a separate audio
      directory.

      :Parameters:

          **rttm_dir** : str or Path
              Directory containing the .rttm files.

          **audio_dir** : str or Path, optional
              Directory containing the audio files corresponding to the .rttm files,
              if they are not in `rttm_dir`.

          **audio_ext** : str, default=".wav"
              File extension to use for audio files.



      :Returns:

          list of AudioDocument
              List of generated documents.
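
      The file lookup described above can be sketched as follows: for each
      ``.rttm`` file, the audio file with the same basename is searched in
      ``audio_dir`` when it is given, otherwise next to the ``.rttm`` file.
      ``find_audio_files`` is a hypothetical helper, and raising
      ``FileNotFoundError`` for a missing audio file is an assumption of the sketch.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def find_audio_files(rttm_dir, audio_dir=None, audio_ext: str = ".wav") -> list[tuple[Path, Path]]:
             """Pair each .rttm file with the audio file sharing its basename (hypothetical helper)."""
             rttm_dir = Path(rttm_dir)
             search_dir = Path(audio_dir) if audio_dir is not None else rttm_dir
             pairs = []
             for rttm_file in sorted(rttm_dir.glob("*.rttm")):
                 audio_file = search_dir / (rttm_file.stem + audio_ext)
                 if not audio_file.exists():
                     raise FileNotFoundError(f"No audio file found for {rttm_file}")
                 pairs.append((rttm_file, audio_file))
             return pairs


         with tempfile.TemporaryDirectory() as tmp:
             (Path(tmp) / "consult_01.rttm").write_text("", encoding="utf-8")
             (Path(tmp) / "consult_01.wav").write_bytes(b"")
             print([(r.name, a.name) for r, a in find_audio_files(tmp)])  # [('consult_01.rttm', 'consult_01.wav')]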



   .. py:method:: load_doc(rttm_file: str | pathlib.Path, audio_file: str | pathlib.Path) -> medkit.core.audio.AudioDocument

      
      Load a single .rttm file into an audio document.


      :Parameters:

          **rttm_file** : str or Path
              Path to the .rttm file.

          **audio_file** : str or Path
              Path to the corresponding audio file.



      :Returns:

          AudioDocument
              Generated document.



   .. py:method:: load_turns(rttm_file: str | pathlib.Path, audio_file: str | pathlib.Path) -> list[medkit.core.audio.Segment]

      
      Load a .rttm file as a list of segments.


      :Parameters:

          **rttm_file** : str or Path
              Path to the .rttm file.

          **audio_file** : str or Path
              Path to the corresponding audio file.



      :Returns:

          list of Segment
              Turn segments as found in the .rttm file.



   .. py:method:: _load_rows(rttm_file: pathlib.Path)
      :staticmethod:



   .. py:method:: _build_turn_segment(row: dict[str, Any], full_audio: medkit.core.audio.FileAudioBuffer) -> medkit.core.audio.Segment


.. py:class:: RTTMOutputConverter(turn_label: str = 'turn', speaker_label: str = 'speaker')

   Bases: :py:obj:`medkit.core.OutputConverter`


   
   Class for converting turn segments to Rich Transcription Time Marked (.rttm) files.

   Build Rich Transcription Time Marked (.rttm) files containing diarization
   information from :class:`~medkit.core.audio.annotation.Segment` objects.

   There must be a segment for each turn, with an associated
   :class:`~medkit.core.Attribute` holding the name of the turn speaker as
   value. The segments can be passed directly or as part of
   :class:`~medkit.core.audio.document.AudioDocument` instances.

   :Parameters:

       **turn_label** : str, default="turn"
           Label of segments representing turns in the audio documents.

       **speaker_label** : str, default="speaker"
           Label of speaker attributes attached to each turn segment.


   .. py:attribute:: turn_label
      :value: 'turn'



   .. py:attribute:: speaker_label
      :value: 'speaker'



   .. py:method:: save(docs: list[medkit.core.audio.AudioDocument], rttm_dir: str | pathlib.Path, doc_names: list[str] | None = None)

      
      Save a collection of audio documents to RTTM files in a directory.


      :Parameters:

          **docs** : list of AudioDocument
              List of audio documents to save.

          **rttm_dir** : str or Path
              Directory into which the generated .rttm files will be stored.

          **doc_names** : list of str, optional
              Optional list of names to use as basenames and file IDs for the
              generated .rttm files (2nd column). If not provided, the document
              uids will be used.



   .. py:method:: save_doc(doc: medkit.core.audio.AudioDocument, rttm_file: str | pathlib.Path, rttm_doc_id: str | None = None)

      
      Save a single audio document to an RTTM file.


      :Parameters:

          **doc** : AudioDocument
              Audio document to save.

          **rttm_file** : str or Path
              Path of the generated .rttm file.

          **rttm_doc_id** : str, optional
              File ID to use in the generated .rttm file (2nd column). If not
              provided, the document uid will be used.



   .. py:method:: save_turn_segments(turn_segments: list[medkit.core.audio.Segment], rttm_file: str | pathlib.Path, rttm_doc_id: str | None)

      
      Save :class:`~medkit.core.audio.annotation.Segment` objects into a .rttm file.


      :Parameters:

          **turn_segments** : list of Segment
              Turn segments to save.

          **rttm_file** : str or Path
              Path of the generated .rttm file.

          **rttm_doc_id** : str, optional
              File ID to use in the generated .rttm file (2nd column).
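
      Conversely, a turn can be written back as an RTTM line whose 2nd column is
      the file ID. ``format_rttm_row`` is a hypothetical sketch: the channel value
      ``1``, the ``<NA>`` placeholders and the three-decimal formatting of times are
      assumptions, not necessarily what medkit writes.

      .. code-block:: python

         def format_rttm_row(rttm_doc_id: str, start: float, end: float, speaker: str, channel: int = 1) -> str:
             """Format one speaker turn as an RTTM SPEAKER line (hypothetical sketch)."""
             duration = end - start
             return f"SPEAKER {rttm_doc_id} {channel} {start:.3f} {duration:.3f} <NA> <NA> {speaker} <NA> <NA>"


         print(format_rttm_row("consult_01", 2.5, 3.75, "patient"))
         # SPEAKER consult_01 1 2.500 1.250 <NA> <NA> patient <NA> <NA>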



   .. py:method:: _build_rttm_row(turn_segment: medkit.core.audio.Segment, rttm_doc_id: str | None) -> dict[str, Any]


