medkit.core.doc_pipeline

Classes:

DocPipeline(pipeline, labels_by_input_key[, uid])

Wrapper around the Pipeline class that runs a pipeline on a list (or collection) of documents, retrieving input annotations from each document and attaching output annotations back to documents.

class DocPipeline(pipeline, labels_by_input_key, uid=None)

Wrapper around the Pipeline class that runs a pipeline on a list (or collection) of documents, retrieving input annotations from each document and attaching output annotations back to documents.

Initialize the pipeline

Parameters
  • pipeline (Pipeline) – Pipeline to execute on documents. Annotations given to pipeline (corresponding to its input_keys) will be retrieved from documents, according to labels_by_input_key. Annotations returned by pipeline (corresponding to its output_keys) will be added to documents.

  • labels_by_input_key (Dict[str, List[str]]) –

    Labels of existing annotations that should be retrieved from documents and passed to the pipeline as input. One list of labels per input key.

    For the typical use case where the pipeline takes a text document raw segment as input with key “full_text”:

    >>> doc_pipeline = DocPipeline(
    ...     pipeline,
    ...     labels_by_input_key={"full_text": [TextDocument.RAW_SEGMENT]},
    ... )
    

    Because the values of labels_by_input_key are lists (one per input key), it is possible to use annotations with different labels for the same input key.
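
The effect of listing several labels under one input key can be sketched in plain Python. The snippet below does not use medkit: `gather_inputs`, the `"segments"` key and the dict-based annotations are hypothetical stand-ins, and it only illustrates the lookup described above under the assumption that annotations are collected label by label.

```python
from typing import Dict, List

# Hypothetical stand-in for a document: a plain list of annotations,
# each represented as a dict with a "label" and some "text".
doc_annotations = [
    {"label": "sentence", "text": "Patient has asthma."},
    {"label": "title", "text": "Medical history"},
    {"label": "footer", "text": "Page 1"},
]


def gather_inputs(
    annotations: List[dict], labels_by_input_key: Dict[str, List[str]]
) -> Dict[str, List[dict]]:
    """For each input key, collect the annotations carrying any of its labels."""
    inputs = {}
    for input_key, labels in labels_by_input_key.items():
        inputs[input_key] = [
            ann for label in labels for ann in annotations if ann["label"] == label
        ]
    return inputs


# Two different labels feed the same (hypothetical) input key "segments".
inputs = gather_inputs(doc_annotations, {"segments": ["sentence", "title"]})
print([ann["text"] for ann in inputs["segments"]])
```

Annotations labelled "footer" are left out because that label is not listed for any input key.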

Methods:

run(docs)

Run the pipeline on a list of documents, adding the output annotations to each document

run(docs)

Run the pipeline on a list of documents, adding the output annotations to each document

Parameters

docs (List[Document[~AnnotationType]]) – The documents on which to run the pipeline. For each document, existing annotations are retrieved according to labels_by_input_key, and all output annotations are added back to that same document.

Return type

None
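
Since run() returns None, results have to be read back from the documents themselves. The following is a minimal sketch of that contract, not medkit's implementation: `SketchDoc`, `SketchDocPipeline` and `uppercase_pipeline` are hypothetical names, annotations are reduced to `(label, text)` tuples, and the pipeline is assumed to be a plain callable receiving one keyword argument per input key.

```python
from typing import Callable, Dict, List


class SketchDoc:
    """Hypothetical stand-in for a document holding (label, text) annotations."""

    def __init__(self, annotations):
        self.annotations = list(annotations)


class SketchDocPipeline:
    """Illustrative only: mimics the documented behaviour, not medkit's code."""

    def __init__(self, pipeline: Callable, labels_by_input_key: Dict[str, List[str]]):
        self.pipeline = pipeline
        self.labels_by_input_key = labels_by_input_key

    def run(self, docs: List[SketchDoc]) -> None:
        for doc in docs:
            # Retrieve the input annotations of this document, one list per key.
            inputs = {
                key: [a for a in doc.annotations if a[0] in labels]
                for key, labels in self.labels_by_input_key.items()
            }
            # Attach every output annotation back to the same document.
            doc.annotations.extend(self.pipeline(**inputs))


def uppercase_pipeline(full_text):
    # Toy pipeline: one "upper" annotation per input annotation.
    return [("upper", text.upper()) for _, text in full_text]


docs = [SketchDoc([("raw", "hello")]), SketchDoc([("raw", "world")])]
result = SketchDocPipeline(uppercase_pipeline, {"full_text": ["raw"]}).run(docs)
print(result)               # None: results live on the documents
print(docs[0].annotations)  # [('raw', 'hello'), ('upper', 'HELLO')]
```

The design point illustrated here is that the documents are modified in place, so the caller keeps its own list of documents and inspects their annotations after the call.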