medkit.text.spacy.doc_pipeline

Module Contents

Classes

SpacyDocPipeline

DocPipeline to obtain annotations created using spaCy.

class medkit.text.spacy.doc_pipeline.SpacyDocPipeline(nlp: spacy.Language, medkit_labels_anns: list[str] | None = None, medkit_attrs: list[str] | None = None, spacy_entities: list[str] | None = None, spacy_span_groups: list[str] | None = None, spacy_attrs: list[str] | None = None, medkit_attribute_factories: dict[str, Callable[[spacy.tokens.Span, str], medkit.core.Attribute]] | None = None, name: str | None = None, uid: str | None = None)

Bases: medkit.core.DocOperation

DocPipeline to obtain annotations created using spaCy.

run(medkit_docs: list[medkit.core.text.TextDocument]) → None

Run a spaCy pipeline on a list of medkit documents.

Each medkit document is converted to a spaCy document (a Doc object) carrying the selected annotations and attributes. The spaCy pipeline is then executed on it, and finally the new annotations and attributes are converted back into medkit annotations.

Parameters:
medkit_docs : list of TextDocument
    List of TextDocuments on which to run the pipeline.
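
The convert, execute, convert-back round trip described for run can be pictured with a toy sketch. It uses only the Python standard library: ToyDoc, ToyEntity, toy_run and find_aspirin are hypothetical stand-ins invented for illustration, not medkit or spaCy objects, and the sketch makes no claim about how SpacyDocPipeline is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyEntity:
    """Stand-in for an annotation: a label plus character offsets."""
    label: str
    start: int
    end: int


@dataclass
class ToyDoc:
    """Stand-in for a document on either side of the conversion."""
    text: str
    entities: list[ToyEntity] = field(default_factory=list)


def toy_run(docs: list[ToyDoc], toy_nlp: Callable[[ToyDoc], ToyDoc]) -> None:
    """Mimic the three steps: convert, execute the pipeline, convert back."""
    for doc in docs:
        # 1. Convert the input document, carrying over existing annotations.
        converted = ToyDoc(text=doc.text, entities=list(doc.entities))
        known = len(converted.entities)
        # 2. Execute the pipeline on the converted document.
        processed = toy_nlp(converted)
        # 3. Copy back only the annotations the pipeline added.
        doc.entities.extend(processed.entities[known:])


def find_aspirin(doc: ToyDoc) -> ToyDoc:
    """Hypothetical pipeline component that tags the word 'aspirin'."""
    start = doc.text.find("aspirin")
    if start != -1:
        doc.entities.append(ToyEntity("DRUG", start, start + len("aspirin")))
    return doc


docs = [ToyDoc("The patient takes aspirin daily.")]
toy_run(docs, find_aspirin)
print(docs[0].entities)
```

Like run, toy_run returns None; in this sketch the new annotations are attached to the input documents in place, which is an assumption about the real method rather than something stated above.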