:py:mod:`medkit.text.context.family_detector`
=============================================

.. py:module:: medkit.text.context.family_detector


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   medkit.text.context.family_detector.FamilyDetectorRule
   medkit.text.context.family_detector.FamilyMetadata
   medkit.text.context.family_detector.FamilyDetector




.. py:class:: FamilyDetectorRule


   
   Regexp-based rule to use with `FamilyDetector`.

   The input text may be converted to ASCII before the rule is applied (see `unicode_sensitive`).

   :Parameters:

       **regexp** : str
           The regexp pattern used to match a family reference

       **exclusion_regexps** : list of str, optional
           Optional exclusion patterns

       **id** : str, optional
           Unique identifier of the rule to store in the metadata of the entities

       **case_sensitive** : bool, default=False
           Whether to consider case when running `regexp` and `exclusion_regexps`

       **unicode_sensitive** : bool, default=False
           If True, rule matches are searched on the unicode text.
           If False, rule matches are searched on the closest ASCII text when possible
           (cf. `FamilyDetector`), so `regexp` and `exclusion_regexps` should not contain
           non-ASCII characters because they would never be matched.
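
   As an illustration of how `regexp`, `exclusion_regexps` and `case_sensitive` can interact,
   the sketch below uses only the standard `re` module. The helper `rule_matches` and the
   patterns are hypothetical and not part of medkit; the sketch assumes that an exclusion
   pattern found anywhere in the text cancels the match.

   .. code-block:: python

      import re


      def rule_matches(text, regexp, exclusion_regexps=(), case_sensitive=False):
          """Return True if `regexp` is found in `text` and no exclusion pattern is."""
          flags = 0 if case_sensitive else re.IGNORECASE
          if re.search(regexp, text, flags) is None:
              return False
          # Assumption: any exclusion pattern found in the text cancels the match.
          return not any(re.search(excl, text, flags) for excl in exclusion_regexps)


      # Hypothetical patterns, chosen only for the example
      regexp = r"\bmother\b"
      exclusions = [r"\bmother tongue\b"]

      found = rule_matches("Her Mother has diabetes.", regexp, exclusions)
      excluded = rule_matches("English is her mother tongue.", regexp, exclusions)
      case_miss = rule_matches(
          "Her Mother has diabetes.", regexp, exclusions, case_sensitive=True
      )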














   .. py:attribute:: regexp
      :type: str

      

   .. py:attribute:: exclusion_regexps
      :type: list[str]

      

   .. py:attribute:: id
      :type: str | None

      

   .. py:attribute:: case_sensitive
      :type: bool
      :value: False

      

   .. py:attribute:: unicode_sensitive
      :type: bool
      :value: False

      

   .. py:method:: __post_init__()



.. py:class:: FamilyMetadata


   Bases: :py:obj:`typing_extensions.TypedDict`

   
   Metadata dict added to family attributes with `True` value.


   :Parameters:

       **rule_id** : str or int
           Identifier of the rule used to detect a family reference.
           If the rule has no id, then the index of the rule in
           the list of rules is used instead.
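
   The fallback from rule id to rule index described above can be sketched as follows.
   `RuleSketch` and `metadata_for` are hypothetical names used only for this example
   and are not part of medkit.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import Optional, Union


      @dataclass
      class RuleSketch:
          """Hypothetical stand-in carrying only the fields needed here."""

          regexp: str
          id: Optional[str] = None


      def metadata_for(rules, index):
          """Build a metadata dict for the rule at `index` in `rules`."""
          rule = rules[index]
          # Use the rule's own id when set, otherwise its position in the list.
          rule_id: Union[str, int] = rule.id if rule.id is not None else index
          return {"rule_id": rule_id}


      rules = [RuleSketch(r"\bfather\b", id="father_rule"), RuleSketch(r"\bsister\b")]
      with_id = metadata_for(rules, 0)
      without_id = metadata_for(rules, 1)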














   .. py:attribute:: rule_id
      :type: str | int

      


.. py:class:: FamilyDetector(output_label: str, rules: list[FamilyDetectorRule] | None = None, uid: str | None = None)


   Bases: :py:obj:`medkit.core.text.ContextOperation`

   
   Annotator for creating family attributes.

   Annotator creating family attributes with boolean values
   indicating if a family reference has been detected.

   Because family attributes are attached to whole annotations,
   each input annotation should be "local" enough (i.e. a sentence or a syntagma)
   rather than a big chunk of text.

   To detect family references, the module uses rules that may or may not be sensitive
   to unicode. When a rule is not sensitive to unicode, an attempt is made to convert unicode
   characters to the closest ASCII characters. However, some characters need to be
   pre-processed beforehand (e.g., `n°` -> `number`). If the converted text and the original
   text differ in length, detection falls back on the original unicode text, even if the rule
   is not unicode-sensitive. In that case, a warning is logged recommending that the data
   be pre-processed.
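
   The length check and fallback described above can be sketched with the standard library.
   This is only an approximation: it swaps in `unicodedata` normalization for whatever
   conversion medkit actually performs, and `text_for_detection` is a hypothetical helper.
   With this technique, characters that have no ASCII decomposition are dropped, which is
   what changes the length.

   .. code-block:: python

      import logging
      import unicodedata

      logger = logging.getLogger(__name__)


      def text_for_detection(text, unicode_sensitive=False):
          """Pick the text a rule would be matched against."""
          if unicode_sensitive:
              return text
          # Decompose accented characters, then drop whatever has no ASCII form.
          decomposed = unicodedata.normalize("NFKD", text)
          ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
          if len(ascii_text) != len(text):
              # Lengths differ: fall back on the original unicode text.
              logger.warning("Lossy ASCII conversion, consider pre-processing the text")
              return text
          return ascii_text

   Keeping the two texts the same length means that a match position found in the ASCII
   text still points at the same characters in the original text.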

   Note that for better results, family detection should be run at the sentence
   level (i.e. on sentence segments) rather than at the syntagma level [1].

   :Parameters:

       **output_label** : str
           The label of the created attributes

       **rules** : list of FamilyDetectorRule, optional
           The set of rules to use when detecting family references. If none are provided,
           the rules in "family_detector_default_rules.yml" are used

       **uid** : str, optional
           Identifier of the detector










   .. rubric:: References

   [1] Garcelon, N., Neuraz, A., Benoit, V., Salomon, R., & Burgun, A. (2017).
       Improving a full-text search engine: the importance of negation detection and family history context
       to identify cases in a biomedical data warehouse.
       Journal of the American Medical Informatics Association : JAMIA, 24(3), 607-613.
       https://doi.org/10.1093/jamia/ocw144

   .. py:method:: run(segments: list[medkit.core.text.Segment])

      
      Run the operation.

      Add a family attribute to each segment with a boolean value
      indicating if a family reference has been detected.

      Family attributes with a `True` value have a metadata dict with
      fields described in :class:`.FamilyMetadata`.

      :Parameters:

          **segments** : list of Segment
              List of segments in which to detect family references















   .. py:method:: _detect_family_ref_in_segment(segment: medkit.core.text.Segment) -> medkit.core.Attribute | None


   .. py:method:: _find_matching_rule(text: str) -> str | int | None


   .. py:method:: load_rules(path_to_rules: pathlib.Path, encoding: str | None = None) -> list[FamilyDetectorRule]
      :staticmethod:

      
      Load all rules stored in a yml file.


      :Parameters:

          **path_to_rules** : Path
              Path to a yml file containing a list of mappings
              with the same structure as `FamilyDetectorRule`

          **encoding** : str, optional
              Encoding of the file to open

      :Returns:

          list of FamilyDetectorRule
              List of all the rules in `path_to_rules`,
              which can be used to initialize a `FamilyDetector`
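
      To make "a list of mappings with the same structure as `FamilyDetectorRule`" concrete,
      the sketch below turns such mappings into rule objects. `RuleSketch` is a hypothetical
      mirror of the fields documented on this page, and the mappings stand for what a YAML
      parser would typically return; the patterns and ids are made up.

      .. code-block:: python

         from dataclasses import dataclass, field
         from typing import List, Optional


         @dataclass
         class RuleSketch:
             """Hypothetical mirror of the `FamilyDetectorRule` fields."""

             regexp: str
             exclusion_regexps: List[str] = field(default_factory=list)
             id: Optional[str] = None
             case_sensitive: bool = False
             unicode_sensitive: bool = False


         # One mapping per rule; omitted keys take the documented defaults.
         parsed_mappings = [
             {
                 "id": "mother",
                 "regexp": r"\bmother\b",
                 "exclusion_regexps": [r"\bmother tongue\b"],
             },
             {"id": "father", "regexp": r"\bfather\b", "case_sensitive": True},
         ]

         rules = [RuleSketch(**mapping) for mapping in parsed_mappings]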














   .. py:method:: check_rules_sanity(rules: list[FamilyDetectorRule])
      :staticmethod:

      
      Check consistency of a set of rules.
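
      This page does not say which checks are performed. Since rule ids are described as
      unique and as optional, one plausible check is sketched below; both conditions are
      assumptions, and `find_id_problems` is a hypothetical helper, not the medkit
      implementation.

      .. code-block:: python

         def find_id_problems(rule_ids):
             """Return a list of problems found in a list of rule ids (None = no id)."""
             problems = []
             with_id = [rid for rid in rule_ids if rid is not None]
             # Assumption: either every rule has an id or none does.
             if with_id and len(with_id) != len(rule_ids):
                 problems.append("some rules have an id and others do not")
             # Assumption: ids must be unique across the rule set.
             if len(set(with_id)) != len(with_id):
                 problems.append("duplicate ids")
             return problems


         all_ids = find_id_problems(["mother", "father"])
         no_ids = find_id_problems([None, None])
         mixed = find_id_problems(["mother", None])
         duplicated = find_id_problems(["mother", "mother"])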

















   .. py:method:: save_rules(rules: list[FamilyDetectorRule], path_to_rules: pathlib.Path, encoding: str | None = None)
      :staticmethod:

      
      Store rules in a YAML file.


      :Parameters:

          **rules** : list of FamilyDetectorRule
              The rules to save

          **path_to_rules** : Path
              Path to a .yml file that will contain the rules

          **encoding** : str, optional
              Encoding of the .yml file
















