medkit.text.spacy.spacy_utils
=============================

.. py:module:: medkit.text.spacy.spacy_utils


Functions
---------

.. autoapisummary::

   medkit.text.spacy.spacy_utils.extract_anns_and_attrs_from_spacy_doc
   medkit.text.spacy.spacy_utils.build_spacy_doc_from_medkit_doc
   medkit.text.spacy.spacy_utils.build_spacy_doc_from_medkit_segment


Module Contents
---------------

.. py:function:: extract_anns_and_attrs_from_spacy_doc(spacy_doc: spacy.tokens.Doc, medkit_source_ann: medkit.core.text.Segment | None = None, entities: list[str] | None = None, span_groups: list[str] | None = None, attrs: list[str] | None = None, attribute_factories: dict[str, Callable[[spacy.tokens.Span, str], medkit.core.Attribute]] | None = None, rebuild_medkit_anns_and_attrs: bool = False) -> tuple[list[medkit.core.text.Segment], dict[str, list[medkit.core.Attribute]]]

   
   Convert the selected entities and span groups of a spaCy document into medkit segments.

   The attributes of each extracted annotation are extracted as well.

   :Parameters:

       **spacy_doc** : Doc
           A spaCy Doc whose entities and spans will be converted.

       **medkit_source_ann** : Segment, optional
           Segment from which `spacy_doc` was created, used to rebuild spans
           so that they reference the original text.

       **entities** : list of str, optional
           Labels of the entities (``spacy_doc.ents``) to extract.
           If `None` (default), all new entities, i.e. those without a medkit ID,
           are extracted as annotations.

       **span_groups** : list of str, optional
           Names of the span groups (keys of ``spacy_doc.spans``) to extract.
           If `None` (default), all new spans are extracted as annotations.

       **attrs** : list of str, optional
           Names of the custom attributes to extract from the included annotations.
           If `None` (default), all custom attributes are extracted.

       **attribute_factories** : dict of str to Callable, optional
           Mapping from attribute label to the factory in charge of converting
           that spaCy attribute into a medkit attribute. Each factory is called
           with a spaCy span and the attribute label.

       **rebuild_medkit_anns_and_attrs** : bool, default=False
           If True, annotations and attributes that already have medkit IDs are
           rebuilt as new annotations and attributes with new IDs.
           If False (default), they are not rebuilt and only new annotations
           and attributes are returned.

   :Returns:

       annotations: list of Segment
           Segments extracted from the spaCy Doc.

       attributes_by_ann: dict of str to list of Attribute
           Attributes extracted for each annotation, keyed by the annotation's
           medkit uid.

   :Raises:

       ValueError
           If `medkit_source_ann` and `spacy_doc` do not have the same medkit uid.

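   A minimal usage sketch, assuming spaCy's blank English pipeline
   (``spacy.blank("en")``, which needs no model download); the sentence and
   the ``PERSON`` label are illustrative only. The entity is set by hand
   instead of by a trained NER component, and since it carries no medkit ID
   it should be treated as a new entity.

   .. code-block:: python

      import spacy
      from spacy.tokens import Span

      from medkit.text.spacy.spacy_utils import extract_anns_and_attrs_from_spacy_doc

      # Blank pipeline: tokenizer only, nothing to download
      nlp = spacy.blank("en")
      spacy_doc = nlp("Marie Curie won the Nobel Prize.")

      # Mark tokens 0-1 ("Marie Curie") as an entity by hand
      spacy_doc.ents = [Span(spacy_doc, 0, 2, label="PERSON")]

      anns, attrs_by_ann = extract_anns_and_attrs_from_spacy_doc(
          spacy_doc, entities=["PERSON"]
      )
      for ann in anns:
          print(ann.label, ann.text, attrs_by_ann.get(ann.uid, []))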

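   The factory signature expected by ``attribute_factories`` can be sketched
   as follows. The ``is_negated`` span extension and ``negation_factory`` are
   hypothetical: they stand in for a custom attribute set by the user's own
   spaCy components. The factory receives the span and the attribute label and
   wraps the extension value in a medkit ``Attribute``.

   .. code-block:: python

      import spacy
      from spacy.tokens import Span

      from medkit.core import Attribute
      from medkit.text.spacy.spacy_utils import extract_anns_and_attrs_from_spacy_doc

      # Hypothetical custom attribute, normally set by an upstream spaCy component
      if not Span.has_extension("is_negated"):
          Span.set_extension("is_negated", default=False)


      def negation_factory(span: Span, label: str) -> Attribute:
          """Convert the spaCy extension `label` of `span` into a medkit Attribute."""
          return Attribute(label=label, value=bool(span._.get(label)))


      nlp = spacy.blank("en")
      spacy_doc = nlp("No sign of pneumonia.")
      ent = Span(spacy_doc, 3, 4, label="disease")  # token 3 is "pneumonia"
      ent._.set("is_negated", True)
      spacy_doc.ents = [ent]

      anns, attrs_by_ann = extract_anns_and_attrs_from_spacy_doc(
          spacy_doc,
          entities=["disease"],
          attrs=["is_negated"],
          attribute_factories={"is_negated": negation_factory},
      )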

.. py:function:: build_spacy_doc_from_medkit_doc(nlp: spacy.Language, medkit_doc: medkit.core.text.TextDocument, labels_anns: list[str] | None = None, attrs: list[str] | None = None, include_medkit_info: bool = True) -> spacy.tokens.Doc

   
   Create a spaCy Doc from a medkit TextDocument.


   :Parameters:

       **nlp** : Language
           spaCy Language object with the pipeline loaded.

       **medkit_doc** : TextDocument
           TextDocument to convert.

       **labels_anns** : list of str, optional
           Labels of the annotations to include in the spaCy Doc.
           If `None` (default), all annotations are included.

       **attrs** : list of str, optional
           Labels of the attributes to add to the included annotations.
           If `None` (default), all attributes are added as spaCy custom
           attributes to each included annotation.

       **include_medkit_info** : bool, default=True
           If True, the medkit ID is stored as an extension on the Doc so that
           the medkit source annotation can be identified.
           If False, no ID information is included.

   :Returns:

       Doc
           A spaCy Doc containing the selected annotations.

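   A minimal usage sketch, assuming spaCy's blank English pipeline and an
   illustrative ``disease`` entity with an ``is_negated`` attribute built by
   hand. Characters 16 to 22 of the sample text cover ``asthma``.

   .. code-block:: python

      import spacy

      from medkit.core import Attribute
      from medkit.core.text import Entity, Span, TextDocument
      from medkit.text.spacy.spacy_utils import build_spacy_doc_from_medkit_doc

      medkit_doc = TextDocument(text="The patient has asthma.")

      # "asthma" spans characters 16 to 22 of the raw text
      entity = Entity(label="disease", spans=[Span(16, 22)], text="asthma")
      entity.attrs.add(Attribute(label="is_negated", value=False))
      medkit_doc.anns.add(entity)

      nlp = spacy.blank("en")
      spacy_doc = build_spacy_doc_from_medkit_doc(
          nlp,
          medkit_doc,
          labels_anns=["disease"],
          attrs=["is_negated"],
      )
      print(spacy_doc.text)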


.. py:function:: build_spacy_doc_from_medkit_segment(nlp: spacy.Language, segment: medkit.core.text.Segment, annotations: list[medkit.core.text.Segment] | None = None, attrs: list[str] | None = None, include_medkit_info: bool = True) -> spacy.tokens.Doc

   
   Create a spaCy Doc from a medkit Segment.


   :Parameters:

       **nlp** : Language
           spaCy Language object with the pipeline loaded.

       **segment** : Segment
           Segment to convert; its text is used to create the spaCy Doc.

       **annotations** : list of Segment, optional
           Annotations within `segment` to include in the spaCy Doc.

       **attrs** : list of str, optional
           Labels of the attributes to add to the included annotations.
           If `None` (default), all attributes are added as spaCy custom
           attributes to each included annotation.

       **include_medkit_info** : bool, default=True
           If True, the medkit ID is stored as an extension on the Doc so that
           the medkit source annotation can be identified.
           If False, no ID information is included.

   :Returns:

       Doc
           A spaCy Doc containing the selected annotations.

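   A minimal usage sketch with a hand-built sentence segment that starts at
   offset 0, so the segment and the entity inside it share the same character
   offsets. Labels and text are illustrative; ``include_medkit_info=False``
   leaves the medkit ID information out of the Doc.

   .. code-block:: python

      import spacy

      from medkit.core.text import Entity, Segment, Span
      from medkit.text.spacy.spacy_utils import build_spacy_doc_from_medkit_segment

      # Sentence covering the whole 23-character text
      sentence = Segment(label="sentence", spans=[Span(0, 23)], text="The patient has asthma.")
      # Entity inside the sentence: characters 16 to 22 are "asthma"
      entity = Entity(label="disease", spans=[Span(16, 22)], text="asthma")

      nlp = spacy.blank("en")
      spacy_doc = build_spacy_doc_from_medkit_segment(
          nlp,
          segment=sentence,
          annotations=[entity],
          include_medkit_info=False,
      )
      print(spacy_doc.text)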

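   The helpers in this module can be combined into a round trip: build a Doc
   from a segment, let spaCy add annotations, then convert the new annotations
   back into medkit. In this sketch the hand-set entity stands in for a trained
   spaCy component, and passing the source segment as ``medkit_source_ann``
   lets the extracted spans reference the original text. ``include_medkit_info``
   is left at its default (True) so that the Doc carries the segment's medkit
   uid, which ``extract_anns_and_attrs_from_spacy_doc`` compares with
   ``medkit_source_ann``.

   .. code-block:: python

      import spacy
      from spacy.tokens import Span as SpacySpan

      from medkit.core.text import Segment, Span
      from medkit.text.spacy.spacy_utils import (
          build_spacy_doc_from_medkit_segment,
          extract_anns_and_attrs_from_spacy_doc,
      )

      sentence = Segment(label="sentence", spans=[Span(0, 23)], text="The patient has asthma.")

      nlp = spacy.blank("en")
      # include_medkit_info defaults to True: the Doc records the segment's uid
      spacy_doc = build_spacy_doc_from_medkit_segment(nlp, segment=sentence)

      # Stand-in for a trained NER component: token 3 is "asthma"
      spacy_doc.ents = [SpacySpan(spacy_doc, 3, 4, label="disease")]

      anns, attrs_by_ann = extract_anns_and_attrs_from_spacy_doc(
          spacy_doc, medkit_source_ann=sentence, entities=["disease"]
      )
      for ann in anns:
          print(ann.label, ann.text, ann.spans)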

