Misc

Taxonomy

See also app.taxonomy_es

Fetching taxonomy files and representing them

class app.taxonomy.Taxonomy(name: str)

A class representing a taxonomy.

For more information about taxonomy, see https://wiki.openfoodfacts.org/Global_taxonomies.

A Taxonomy instance has a single attribute, nodes, which maps each node identifier to a TaxonomyNode.

add(key: str, node: TaxonomyNode) -> None

Add a node to the taxonomy under the id key.

Parameters:
  • key – The node id

  • node – the TaxonomyNode

find_deepest_nodes(nodes: List[TaxonomyNode]) -> List[TaxonomyNode]

Given a list of nodes, returns the list of nodes where all the parents within the list have been removed.

For example, for a taxonomy, ‘fish’ -> ‘salmon’ -> ‘smoked-salmon’:

  [‘fish’, ‘salmon’] -> [‘salmon’]
  [‘fish’, ‘smoked-salmon’] -> [‘smoked-salmon’]
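
The filtering rule can be illustrated with a minimal sketch. The Node class below is a hypothetical stand-in for TaxonomyNode that keeps only id and parents, and the function is one possible way to obtain the documented result, not necessarily how the library implements it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for TaxonomyNode (id and parents only)."""

    id: str
    parents: List["Node"] = field(default_factory=list)

    def is_parent_of(self, candidate: "Node") -> bool:
        # True if self is a direct or indirect parent of candidate.
        return any(p is self or self.is_parent_of(p) for p in candidate.parents)


def find_deepest_nodes(nodes: List[Node]) -> List[Node]:
    # Keep a node only if it is not a parent of another node in the list.
    return [
        node
        for node in nodes
        if not any(node.is_parent_of(other) for other in nodes if other is not node)
    ]


# The example taxonomy: fish -> salmon -> smoked-salmon
fish = Node("fish")
salmon = Node("salmon", [fish])
smoked = Node("smoked-salmon", [salmon])

print([n.id for n in find_deepest_nodes([fish, salmon])])  # ['salmon']
print([n.id for n in find_deepest_nodes([fish, smoked])])  # ['smoked-salmon']
```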

classmethod from_dict(name: str, data: dict[str, Any]) -> Taxonomy

Create a Taxonomy from data.

Parameters:

data – the taxonomy as a dict

Returns:

a Taxonomy

classmethod from_path(name: str, file_path: str | Path) -> Taxonomy

Create a Taxonomy from a JSON file.

Parameters:

file_path – a JSON file, gzipped (.json.gz) files are supported

Returns:

a Taxonomy
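
As a rough illustration of what reading a plain or gzipped JSON file involves, here is a minimal sketch. The helper name load_json is hypothetical and not part of app.taxonomy, and the suffix check is an assumption about how gzipped files are detected.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_json(file_path: Union[str, Path]) -> Dict[str, Any]:
    """Hypothetical helper: read a JSON file, handling .json.gz transparently."""
    path = Path(file_path)
    # Assumption: gzipped files are recognised by their .gz suffix.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

The resulting dict could then be handed to something like Taxonomy.from_dict(name, data).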

classmethod from_url(name: str, url: str, session: Session | None = None, timeout: int = 120) -> Taxonomy

Create a Taxonomy from a taxonomy file hosted at url.

Parameters:
  • url – the URL of the taxonomy

  • session – the requests session, use a default session if None

  • timeout – the request timeout, defaults to 120

Returns:

a Taxonomy

get_localized_name(key: str, lang: str) -> str | None

Return the name of a taxonomy element in a given language.

If key is not in the taxonomy or if no name is available for the requested language, return None.

Parameters:
  • key – the taxonomy element id

  • lang – the 2-letter language code

Returns:

the localized name or None

is_parent_of_any(item: str, candidates: Iterable[str], raises: bool = True) -> bool

Return True if item is a parent of any candidate, False otherwise.

If the item is not in the taxonomy and raises is False, return False.

Parameters:
  • item – The item to compare

  • candidates – An iterable of candidate node ids

  • raises – if True, raises a ValueError if item is not in the taxonomy, defaults to True.
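
The interaction between item, candidates, and raises can be sketched as follows. Node is a hypothetical stand-in for TaxonomyNode and nodes plays the role of the taxonomy's id-to-node mapping. Ignoring candidates that are missing from the taxonomy is an assumption of this sketch, since the behavior is not documented here.

```python
from typing import Dict, Iterable, List, Optional


class Node:
    """Hypothetical stand-in for TaxonomyNode (id and parents only)."""

    def __init__(self, id: str, parents: Optional[List["Node"]] = None) -> None:
        self.id = id
        self.parents = parents or []

    def is_parent_of(self, candidate: "Node") -> bool:
        # True if self is a direct or indirect parent of candidate.
        return any(p is self or self.is_parent_of(p) for p in candidate.parents)


def is_parent_of_any(
    nodes: Dict[str, Node],
    item: str,
    candidates: Iterable[str],
    raises: bool = True,
) -> bool:
    node = nodes.get(item)
    if node is None:
        if raises:
            raise ValueError(f"item not found in taxonomy: {item}")
        return False
    # Assumption: candidates that are not in the taxonomy are ignored.
    known = (nodes[c] for c in candidates if c in nodes)
    return any(node.is_parent_of(c) for c in known)


fish = Node("en:fish")
salmon = Node("en:salmon", [fish])
nodes = {n.id: n for n in (fish, salmon)}

print(is_parent_of_any(nodes, "en:fish", ["en:salmon"]))  # True
print(is_parent_of_any(nodes, "en:unknown", ["en:fish"], raises=False))  # False
```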

iter_nodes() -> Iterable[TaxonomyNode]

Iterate over the nodes of the taxonomy.

keys() -> Iterable[str]

Return all node IDs from the taxonomy.

to_dict() -> dict[str, Any]

Generate a dict from the Taxonomy.

pydantic model app.taxonomy.TaxonomyNode

A taxonomy element.

Each node has zero or more parents and zero or more children. Each node has the following attributes:

  • id: the node identifier, which starts with a language prefix (e.g. en:)

  • names: a dict mapping a 2-letter language code to the node name in that language

  • parents: the list of the node's parents

  • children: the list of the node's children

  • properties: additional properties of the node (taxonomy-dependent)

  • synonyms: a dict mapping a 2-letter language code to a list of synonyms in that language
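
To make the shape of these attributes concrete, the following sketch uses a plain dataclass named NodeSketch. It is a hypothetical stand-in and not the pydantic model itself, and the example ids, names, and synonyms are illustrative values only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class NodeSketch:
    """Hypothetical stand-in mirroring the documented TaxonomyNode fields."""

    id: str
    names: Dict[str, str]
    parents: List["NodeSketch"] = field(default_factory=list)
    children: List["NodeSketch"] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)
    synonyms: Dict[str, List[str]] = field(default_factory=dict)


# Illustrative data only.
fish = NodeSketch(id="en:fish", names={"en": "Fish", "fr": "Poisson"})
salmon = NodeSketch(
    id="en:salmon",
    names={"en": "Salmon"},
    synonyms={"en": ["Salmon", "Salmons"]},
)

# Parent and child links point at node objects, not at ids.
salmon.parents.append(fish)
fish.children.append(salmon)
```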

JSON schema:
{
   "$defs": {
      "TaxonomyNode": {
         "description": "A taxonomy element.\n\nEach node has 0+ parents and 0+ children. Each node has the following\nattributes:\n\n- `id`: the node identifier, it starts with a language prefix (ex: `en:`)\n- `names`: a dict mapping language 2-letter code to the node name for this\n  language\n- `parents`: the list of the node parents\n- `children`: the list of the node children\n- `properties`: additional properties of the node (taxonomy-dependent)\n- `synonyms`: a dict mapping language 2-letter code to a list of synonyms\n  for this language",
         "properties": {
            "id": {
               "title": "Id",
               "type": "string"
            },
            "names": {
               "additionalProperties": {
                  "type": "string"
               },
               "title": "Names",
               "type": "object"
            },
            "parents": {
               "default": [],
               "items": {
                  "$ref": "#/$defs/TaxonomyNode"
               },
               "title": "Parents",
               "type": "array"
            },
            "children": {
               "default": [],
               "items": {
                  "$ref": "#/$defs/TaxonomyNode"
               },
               "title": "Children",
               "type": "array"
            },
            "synonyms": {
               "additionalProperties": {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               },
               "default": {},
               "title": "Synonyms",
               "type": "object"
            },
            "properties": {
               "type": "object",
               "default": {},
               "title": "Properties"
            }
         },
         "required": [
            "id",
            "names"
         ],
         "title": "TaxonomyNode",
         "type": "object"
      }
   },
   "allOf": [
      {
         "$ref": "#/$defs/TaxonomyNode"
      }
   ]
}

Fields:
field children: List[TaxonomyNode] = []
field id: str [Required]
field names: Dict[str, str] [Required]
field parents: List[TaxonomyNode] = []
field properties: Dict[str, Any] = {}
field synonyms: Dict[str, List[str]] = {}

add_parents(parents: Iterable[TaxonomyNode])

get_localized_name(lang: str, add_xx: bool = False) -> str | None

Return the localized name of the node.

The lookup first checks names for an entry under the provided lang. If there is none and add_xx is True, it falls back to the international name (xx). It returns None if both checks fail.

Parameters:

lang – the language code
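
The lookup order described above can be sketched as a standalone function operating on a names dict. This is a minimal illustration of the documented behavior, not the library's own code, and the sample names are made up.

```python
from typing import Dict, Optional


def get_localized_name(
    names: Dict[str, str], lang: str, add_xx: bool = False
) -> Optional[str]:
    # 1. Prefer the name in the requested language.
    if lang in names:
        return names[lang]
    # 2. Optionally fall back to the international name stored under "xx".
    if add_xx and "xx" in names:
        return names["xx"]
    # 3. Neither check succeeded.
    return None


names = {"en": "Salmon", "xx": "Salmo salar"}
print(get_localized_name(names, "en"))  # Salmon
print(get_localized_name(names, "fr"))  # None
print(get_localized_name(names, "fr", add_xx=True))  # Salmo salar
```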

get_parents_hierarchy() -> List[TaxonomyNode]

Return the list of all parent nodes (direct and indirect).
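
Collecting direct and indirect parents amounts to a graph traversal over the parents links. The sketch below uses a hypothetical Node stand-in and a breadth-first walk. The order of the returned list and the de-duplication of shared ancestors are assumptions of this sketch, not documented guarantees.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for TaxonomyNode (id and parents only)."""

    id: str
    parents: List["Node"] = field(default_factory=list)


def get_parents_hierarchy(node: Node) -> List[Node]:
    found: List[Node] = []
    queue = deque(node.parents)
    while queue:
        parent = queue.popleft()
        # Skip ancestors already collected (a node may be reachable twice).
        if any(parent is seen for seen in found):
            continue
        found.append(parent)
        queue.extend(parent.parents)
    return found


fish = Node("fish")
salmon = Node("salmon", [fish])
smoked = Node("smoked-salmon", [salmon])

print([n.id for n in get_parents_hierarchy(smoked)])  # ['salmon', 'fish']
```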

get_synonyms(lang: str) -> List[str]

is_child_of(item: TaxonomyNode) -> bool

Return True if self is a child of item in the taxonomy.

is_parent_of(candidate: TaxonomyNode) -> bool

Return True if self is a parent of candidate, False otherwise.

Parameters:

candidate – a TaxonomyNode of the same Taxonomy

is_parent_of_any(candidates: Iterable[TaxonomyNode]) -> bool

Return True if self is a parent of any of candidates, False otherwise.

Parameters:

candidates – an iterable of TaxonomyNodes of the same Taxonomy

to_dict() -> dict[str, Any]

pydantic model app.taxonomy.TaxonomyNodeResult

Result for a taxonomy node transformation.

This is used to optionally skip an entry after preprocessing.

JSON schema:
{
   "title": "TaxonomyNodeResult",
   "description": "Result for a taxonomy node transformation.\n\nThis is used to eventually skip entry after preprocessing",
   "type": "object",
   "properties": {
      "status": {
         "$ref": "#/$defs/FetcherStatus"
      },
      "node": {
         "anyOf": [
            {
               "$ref": "#/$defs/TaxonomyNode"
            },
            {
               "type": "null"
            }
         ]
      }
   },
   "$defs": {
      "FetcherStatus": {
         "description": "Status of a fetcher\n\n* FOUND - document was found, index it\n* REMOVED - document was removed, remove it\n* SKIP - skip this document / update\n* RETRY - retry this document / update later\n* OTHER - unknown error",
         "enum": [
            1,
            -1,
            0,
            2,
            3
         ],
         "title": "FetcherStatus",
         "type": "integer"
      },
      "TaxonomyNode": {
         "description": "A taxonomy element.\n\nEach node has 0+ parents and 0+ children. Each node has the following\nattributes:\n\n- `id`: the node identifier, it starts with a language prefix (ex: `en:`)\n- `names`: a dict mapping language 2-letter code to the node name for this\n  language\n- `parents`: the list of the node parents\n- `children`: the list of the node children\n- `properties`: additional properties of the node (taxonomy-dependent)\n- `synonyms`: a dict mapping language 2-letter code to a list of synonyms\n  for this language",
         "properties": {
            "id": {
               "title": "Id",
               "type": "string"
            },
            "names": {
               "additionalProperties": {
                  "type": "string"
               },
               "title": "Names",
               "type": "object"
            },
            "parents": {
               "default": [],
               "items": {
                  "$ref": "#/$defs/TaxonomyNode"
               },
               "title": "Parents",
               "type": "array"
            },
            "children": {
               "default": [],
               "items": {
                  "$ref": "#/$defs/TaxonomyNode"
               },
               "title": "Children",
               "type": "array"
            },
            "synonyms": {
               "additionalProperties": {
                  "items": {
                     "type": "string"
                  },
                  "type": "array"
               },
               "default": {},
               "title": "Synonyms",
               "type": "object"
            },
            "properties": {
               "type": "object",
               "default": {},
               "title": "Properties"
            }
         },
         "required": [
            "id",
            "names"
         ],
         "title": "TaxonomyNode",
         "type": "object"
      }
   },
   "required": [
      "status",
      "node"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

Fields:
field node: TaxonomyNode | None [Required]
field status: FetcherStatus [Required]

app.taxonomy.get_taxonomy(taxonomy_name: str, taxonomy_url: str, force_download: bool = False, download_newer: bool = False, cache_dir: Path | None = None) -> Taxonomy

Return the taxonomy of the provided name.

The taxonomy file is downloaded and cached locally.

Parameters:
  • taxonomy_name – the requested taxonomy name

  • taxonomy_url – the URL of the taxonomy

  • force_download – if True, (re)download the taxonomy even if it was cached, defaults to False

  • download_newer – if True, download the taxonomy if a more recent version is available (based on the file ETag), defaults to False

  • cache_dir – the cache directory to use, defaults to ~/.cache/openfoodfacts/taxonomy

Returns:

a Taxonomy

app.taxonomy.purge_none_values(d: Dict[str, str | None]) -> Dict[str, str]

Remove None values from a dict.
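
A function with this signature and description can be sketched as a dict comprehension. This is a minimal equivalent under the stated behavior, not necessarily the actual implementation.

```python
from typing import Dict, Optional


def purge_none_values(d: Dict[str, Optional[str]]) -> Dict[str, str]:
    # Keep only the entries whose value is not None; the input is not modified.
    return {key: value for key, value in d.items() if value is not None}


print(purge_none_values({"en": "Fish", "fr": None}))  # {'en': 'Fish'}
```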

Health check

This module contains the health check functions for the application.

It is based upon the py-healthcheck library.

app.health.test_connect_es()

Test the connection to Elasticsearch.

app.health.test_connect_redis()

Test the connection to Redis.