medkit.core.doc_pipeline
========================

.. py:module:: medkit.core.doc_pipeline


Classes
-------

.. autoapisummary::

   medkit.core.doc_pipeline.DocPipeline


Module Contents
---------------

.. py:class:: DocPipeline(pipeline: medkit.core.pipeline.Pipeline, labels_by_input_key: dict[str, list[str]] | None = None, uid: str | None = None)

   Bases: :py:obj:`medkit.core.operation.DocOperation`, :py:obj:`Generic`\ [\ :py:obj:`medkit.core.annotation.AnnotationType`\ ]


   Convenience wrapper to facilitate running pipelines on a collection of documents.

   Wrapper around the `Pipeline` class that runs a pipeline on a list
   (or collection) of documents, retrieving input annotations from each document
   and attaching output annotations back to documents.

   :Parameters:

       **pipeline** : Pipeline
           Pipeline to execute on documents.
           Annotations given to `pipeline` (corresponding to its `input_keys`) will
           be retrieved from documents, according to `labels_by_input_key`.
           Annotations returned by `pipeline` (corresponding to its `output_keys`)
           will be added to documents.

       **labels_by_input_key** : dict of str to list of str, optional
           Optional labels of existing annotations that should be retrieved from
           documents and passed to the pipeline as input. One list of labels
           per input key.
           
           When `labels_by_input_key` is not provided, it is assumed that the
           `pipeline` just expects the document raw segments as input.
           
           For the use case where the documents contain pre-existing sentence segments
           labelled as "SENTENCE" that we want to pass to the "sentences" input
           key of the pipeline:


   .. rubric:: Examples

   >>> doc_pipeline = DocPipeline(
   ...     pipeline,
   ...     labels_by_input_key={"sentences": ["SENTENCE"]},
   ... )

   Because the values of `labels_by_input_key` are lists (one per input key),
   it is possible to use annotations with different labels for the same input key.

   ..
       !! processed by numpydoc !!

   .. py:attribute:: pipeline


   .. py:attribute:: labels_by_input_key
      :type:  dict[str, list[str]] | None
      :value: None


   .. py:method:: set_prov_tracer(prov_tracer: medkit.core.prov_tracer.ProvTracer)

      
      Enable provenance tracing.


      :Parameters:

          **prov_tracer** : ProvTracer
              The provenance tracer that will record provenance information.


      ..
          !! processed by numpydoc !!


   .. py:method:: run(docs: list[medkit.core.document.Document[medkit.core.annotation.AnnotationType]]) -> None

      
      Run the pipeline on a list of documents, adding the output annotations to each document.


      :Parameters:

          **docs** : list of Document
              The documents on which to run the pipeline.
              The association between labels and input keys (`labels_by_input_key`)
              is used to retrieve existing annotations from each document, and all
              output annotations are added to the corresponding document.


      ..
          !! processed by numpydoc !!


   .. py:method:: _process_doc(doc: medkit.core.document.Document[medkit.core.annotation.AnnotationType])
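
The label lookup described for `labels_by_input_key` can be illustrated with a small plain-Python stand-in. This is only a sketch of the idea and not medkit's implementation: the helper `collect_inputs` and the representation of annotations as `(label, text)` tuples are hypothetical, and the ordering of the collected annotations is an assumption of this sketch.

```python
from __future__ import annotations


def collect_inputs(
    annotations: list[tuple[str, str]],
    labels_by_input_key: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Group annotation texts by pipeline input key.

    Hypothetical helper: annotations are modelled as (label, text) tuples
    rather than real medkit annotation objects.
    """
    inputs: dict[str, list[str]] = {}
    for input_key, labels in labels_by_input_key.items():
        # Keep every annotation whose label is listed for this input key
        inputs[input_key] = [text for label, text in annotations if label in labels]
    return inputs


# Stand-in for the annotations already attached to one document
doc_annotations = [
    ("SENTENCE", "The patient has asthma."),
    ("TITLE", "Medical history"),
    ("ENTITY", "asthma"),
]

# Two different labels feed the same "sentences" input key
inputs = collect_inputs(doc_annotations, {"sentences": ["SENTENCE", "TITLE"]})
```

Here both the "SENTENCE" and the "TITLE" annotations end up under the "sentences" key, while the "ENTITY" annotation is ignored because no input key lists its label.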