medkit.text.spacy.spacy_utils
Functions:
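build_spacy_doc_from_medkit_doc | Create a Spacy Doc from a TextDocument.
build_spacy_doc_from_medkit_segment | Create a Spacy Doc from a Segment.
extract_anns_and_attrs_from_spacy_doc | Given a spacy document, convert selected entities or spans into Segments.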
- extract_anns_and_attrs_from_spacy_doc(spacy_doc, medkit_source_ann=None, entities=None, span_groups=None, attrs=None, attribute_factories=None, rebuild_medkit_anns_and_attrs=False)
Given a spacy document, convert selected entities or spans into Segments. Extract attributes for each annotation in the document.
- Parameters:
spacy_doc (Doc) – A Spacy Doc with spans to be converted
medkit_source_ann (Segment, optional) – Segment used to rebuild spans referencing the original text
entities (list of str, optional) – Labels of entities to be extracted. If None (default), all new entities will be extracted as annotations.
span_groups (list of str, optional) – Names of span groups to be extracted. If None (default), all new spans will be extracted as annotations.
attrs (list of str, optional) – Name of custom attributes to extract from the annotations that will be included. If None (default) all the custom attributes will be extracted
attribute_factories (dict of str to Callable, optional) – Mapping of factories in charge of converting spacy attributes to medkit attributes. Factories will receive a spacy span and an attribute label when called. The key in the mapping is the attribute label.
rebuild_medkit_anns_and_attrs (bool, default=False) – If True, the annotations and attributes with medkit ids will become new annotations/attributes with new ids. If False (default), the annotations and attributes with medkit ids are not rebuilt; only new annotations and attributes are returned.
- Return type:
tuple of (list of Segment, dict of str to list of Attribute)
- Returns:
annotations (list of Segment) – Segments extracted from the spacy Doc object
attributes_by_ann (dict of str to list of Attribute) – Attributes extracted for each annotation, the key is a medkit uid
- Raises:
ValueError – Raised when the given medkit source and the spacy doc do not have the same medkit uid.
- build_spacy_doc_from_medkit_doc(nlp, medkit_doc, labels_anns=None, attrs=None, include_medkit_info=True)
Create a Spacy Doc from a TextDocument.
- Parameters:
nlp (Language) – Language object with the loaded pipeline from Spacy
medkit_doc (TextDocument) – TextDocument to convert
labels_anns (list[str] | None) – Labels of annotations to include in the spacy document. If None (default) all the annotations will be included.
attrs (list[str] | None) – Labels of attributes to add in the annotations that will be included. If None (default) all the attributes will be added as custom attributes in each annotation included.
include_medkit_info (bool) – If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Return type:
Doc
- Returns:
Doc – A Spacy Doc with the selected annotations included.
- build_spacy_doc_from_medkit_segment(nlp, segment, annotations=None, attrs=None, include_medkit_info=True)
Create a Spacy Doc from a Segment.
- Parameters:
nlp (Language) – Language object with the loaded pipeline from Spacy
segment (Segment) – Segment to convert, this annotation contains the text to create the spacy doc
annotations (list[Segment] | None) – List of annotations in segment to include
attrs (list[str] | None) – Labels of attributes to add in the annotations that will be included. If None (default) all the attributes will be added as custom attributes in each annotation included.
include_medkit_info (bool) – If True, medkitID is included as an extension in the Doc object to identify the medkit source annotation. If False, no information about IDs is included.
- Return type:
Doc
- Returns:
Doc – A Spacy Doc with the selected annotations included.