Misc¶
Taxonomy¶
See also app.taxonomy_es
Fetching taxonomy files and representing them.
- class app.taxonomy.Taxonomy(name: str)[source]¶
A class representing a taxonomy.
For more information about taxonomy, see https://wiki.openfoodfacts.org/Global_taxonomies.
A Taxonomy instance has a single attribute, nodes, which maps each node identifier to its TaxonomyNode.
- add(key: str, node: TaxonomyNode) None [source]¶
Add a node to the taxonomy under the id key.
- Parameters:
key – The node id
node – the TaxonomyNode
- find_deepest_nodes(nodes: List[TaxonomyNode]) List[TaxonomyNode] [source]¶
Given a list of nodes, returns the list of nodes where all the parents within the list have been removed.
For example, for a taxonomy ‘fish’ -> ‘salmon’ -> ‘smoked-salmon’:
[‘fish’, ‘salmon’] -> [‘salmon’]
[‘fish’, ‘smoked-salmon’] -> [‘smoked-salmon’]
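The filtering rule can be sketched without the real classes. The following is a minimal illustration, not the library's implementation: Node is a hypothetical stand-in for TaxonomyNode that only carries id and parents, and the module-level find_deepest_nodes function is an assumption about how the rule could be expressed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for TaxonomyNode (id and parents only)."""

    id: str
    parents: List["Node"] = field(default_factory=list)

    def is_parent_of(self, candidate: "Node") -> bool:
        # True if self is a direct or indirect parent of candidate.
        return any(p is self or self.is_parent_of(p) for p in candidate.parents)


def find_deepest_nodes(nodes: List[Node]) -> List[Node]:
    # Keep a node only if it is not an ancestor of another node in the list.
    return [
        node
        for node in nodes
        if not any(node.is_parent_of(other) for other in nodes if other is not node)
    ]


# The 'fish' -> 'salmon' -> 'smoked-salmon' chain from the example above.
fish = Node("en:fish")
salmon = Node("en:salmon", parents=[fish])
smoked_salmon = Node("en:smoked-salmon", parents=[salmon])

result_direct = [n.id for n in find_deepest_nodes([fish, salmon])]
result_indirect = [n.id for n in find_deepest_nodes([fish, smoked_salmon])]
```

In this sketch, result_direct contains only the salmon node and result_indirect only the smoked-salmon node, matching the two cases listed above.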
- classmethod from_dict(name: str, data: dict[str, Any]) Taxonomy [source]¶
Create a Taxonomy from data.
- Parameters:
data – the taxonomy as a dict
- Returns:
a Taxonomy
- classmethod from_path(name: str, file_path: str | Path) Taxonomy [source]¶
Create a Taxonomy from a JSON file.
- Parameters:
file_path – a JSON file, gzipped (.json.gz) files are supported
- Returns:
a Taxonomy
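Supporting both plain and gzipped JSON usually comes down to choosing the file opener from the file suffix. The sketch below shows one way to do that with the standard library only; load_json is a hypothetical helper, and the sample payload shape is illustrative rather than the actual taxonomy file format.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_json(file_path: Union[str, Path]) -> Dict[str, Any]:
    """Load a JSON file, transparently handling a .gz suffix."""
    path = Path(file_path)
    if path.suffix == ".gz":
        # "rt" opens the gzip stream in text mode so json can read it.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return json.load(f)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Illustrative payload only; the real taxonomy schema may differ.
sample = {"en:fish": {"name": {"en": "Fish"}}}

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "categories.json"
    plain.write_text(json.dumps(sample), encoding="utf-8")

    zipped = Path(tmp) / "categories.json.gz"
    with gzip.open(zipped, "wt", encoding="utf-8") as f:
        json.dump(sample, f)

    loaded_plain = load_json(plain)
    loaded_gzip = load_json(zipped)
```

Both calls return the same dict, so a caller does not need to know whether the file on disk is compressed.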
- classmethod from_url(name: str, url: str, session: Session | None = None, timeout: int = 120) Taxonomy [source]¶
Create a Taxonomy from a taxonomy file hosted at url.
- Parameters:
url – the URL of the taxonomy
session – the requests session, use a default session if None
timeout – the request timeout, defaults to 120
- Returns:
a Taxonomy
- get_localized_name(key: str, lang: str) str | None [source]¶
Return the name of a taxonomy element in a given language.
If key is not in the taxonomy or if no name is available for the requested language, return None.
- Parameters:
key – the taxonomy element id
lang – the 2-letter language code
- Returns:
the localized name or None
- is_parent_of_any(item: str, candidates: Iterable[str], raises: bool = True) bool [source]¶
Return True if item is parent of any candidate, False otherwise.
If the item is not in the taxonomy and raises is False, return False.
- Parameters:
item – The item to compare
candidates – A list of candidates
raises – if True, raises a ValueError if item is not in the taxonomy, defaults to True.
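The interaction between an unknown item and the raises flag is easy to get wrong, so here is a minimal sketch of the documented behaviour. It does not use the real Taxonomy class: PARENTS is a hypothetical mapping from node id to direct parent ids, and ancestors is a helper introduced only for this illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for a taxonomy: node id -> ids of its direct parents.
PARENTS: Dict[str, List[str]] = {
    "en:fish": [],
    "en:salmon": ["en:fish"],
    "en:smoked-salmon": ["en:salmon"],
}


def ancestors(key: str) -> Set[str]:
    """Collect all direct and indirect parents of key."""
    found: Set[str] = set()
    stack = list(PARENTS.get(key, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return found


def is_parent_of_any(item: str, candidates: Iterable[str], raises: bool = True) -> bool:
    if item not in PARENTS:
        if raises:
            raise ValueError(f"item {item!r} is not in the taxonomy")
        # Unknown item and raises=False: report "not a parent".
        return False
    return any(item in ancestors(candidate) for candidate in candidates)


known_parent = is_parent_of_any("en:fish", ["en:smoked-salmon"])
not_a_parent = is_parent_of_any("en:salmon", ["en:fish"])
unknown_quiet = is_parent_of_any("en:unknown", ["en:salmon"], raises=False)
```

With the default raises=True, the same call for "en:unknown" raises a ValueError instead of returning False.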
- iter_nodes() Iterable[TaxonomyNode] [source]¶
Iterate over the nodes of the taxonomy.
- pydantic model app.taxonomy.TaxonomyNode[source]¶
A taxonomy element.
Each node has 0+ parents and 0+ children. Each node has the following attributes:
id: the node identifier, it starts with a language prefix (ex: en:)
names: a dict mapping language 2-letter code to the node name for this language
parents: the list of the node parents
children: the list of the node children
properties: additional properties of the node (taxonomy-dependent)
synonyms: a dict mapping language 2-letter code to a list of synonyms for this language
JSON schema:
{ "$defs": { "TaxonomyNode": { "description": "A taxonomy element.\n\nEach node has 0+ parents and 0+ children. Each node has the following\nattributes:\n\n- `id`: the node identifier, it starts with a language prefix (ex: `en:`)\n- `names`: a dict mapping language 2-letter code to the node name for this\n language\n- `parents`: the list of the node parents\n- `children`: the list of the node children\n- `properties`: additional properties of the node (taxonomy-dependent)\n- `synonyms`: a dict mapping language 2-letter code to a list of synonyms\n for this language", "properties": { "id": { "title": "Id", "type": "string" }, "names": { "additionalProperties": { "type": "string" }, "title": "Names", "type": "object" }, "parents": { "default": [], "items": { "$ref": "#/$defs/TaxonomyNode" }, "title": "Parents", "type": "array" }, "children": { "default": [], "items": { "$ref": "#/$defs/TaxonomyNode" }, "title": "Children", "type": "array" }, "synonyms": { "additionalProperties": { "items": { "type": "string" }, "type": "array" }, "default": {}, "title": "Synonyms", "type": "object" }, "properties": { "type": "object", "default": {}, "title": "Properties" } }, "required": [ "id", "names" ], "title": "TaxonomyNode", "type": "object" } }, "allOf": [ { "$ref": "#/$defs/TaxonomyNode" } ] }
- Fields:
- field children: List[TaxonomyNode] = [][source]¶
- field parents: List[TaxonomyNode] = [][source]¶
- add_parents(parents: Iterable[TaxonomyNode])[source]¶
- get_localized_name(lang: str, add_xx: bool = False) str | None [source]¶
Return the localized name of the node.
The lookup first checks names for an entry under the provided lang. If there is none and add_xx=True, it falls back to the international name (xx). It returns None if both checks fail.
- Parameters:
lang – the language code
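The fallback order described above can be sketched as a plain function over a names dict. This is an illustration under assumptions, not the method itself: the free function and the sample names are hypothetical, and only the lookup order (lang, then xx when add_xx is set, then None) comes from the description.

```python
from typing import Dict, Optional


def get_localized_name(
    names: Dict[str, str], lang: str, add_xx: bool = False
) -> Optional[str]:
    # 1. Prefer the entry for the requested language.
    if lang in names:
        return names[lang]
    # 2. Optionally fall back to the international ("xx") name.
    if add_xx and "xx" in names:
        return names["xx"]
    # 3. Nothing matched.
    return None


# Hypothetical sample data for one node.
names = {"en": "Salmon", "fr": "Saumon", "xx": "Salmo salar"}

french = get_localized_name(names, "fr")
german_strict = get_localized_name(names, "de")
german_fallback = get_localized_name(names, "de", add_xx=True)
```

Here french is "Saumon", german_strict is None because no German name exists, and german_fallback is the xx entry.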
- get_parents_hierarchy() List[TaxonomyNode] [source]¶
Return the list of all parent nodes (direct and indirect).
- is_child_of(item: TaxonomyNode) bool [source]¶
Return True if self is a child (direct or indirect) of item in the taxonomy.
- is_parent_of(candidate: TaxonomyNode) bool [source]¶
Return True if self is parent of candidate, False otherwise.
- Parameters:
candidate – a TaxonomyNode of the same Taxonomy
- is_parent_of_any(candidates: Iterable[TaxonomyNode]) bool [source]¶
Return True if self is a parent of any of candidates, False otherwise.
- Parameters:
candidates – an iterable of TaxonomyNodes of the same Taxonomy
- pydantic model app.taxonomy.TaxonomyNodeResult[source]¶
Result for a taxonomy node transformation.
This makes it possible to skip an entry after preprocessing.
JSON schema:
{ "title": "TaxonomyNodeResult", "description": "Result for a taxonomy node transformation.\n\nThis is used to eventually skip entry after preprocessing", "type": "object", "properties": { "status": { "$ref": "#/$defs/FetcherStatus" }, "node": { "anyOf": [ { "$ref": "#/$defs/TaxonomyNode" }, { "type": "null" } ] } }, "$defs": { "FetcherStatus": { "description": "Status of a fetcher\n\n* FOUND - document was found, index it\n* REMOVED - document was removed, remove it\n* SKIP - skip this document / update\n* RETRY - retry this document / update later\n* OTHER - unknown error", "enum": [ 1, -1, 0, 2, 3 ], "title": "FetcherStatus", "type": "integer" }, "TaxonomyNode": { "description": "A taxonomy element.\n\nEach node has 0+ parents and 0+ children. Each node has the following\nattributes:\n\n- `id`: the node identifier, it starts with a language prefix (ex: `en:`)\n- `names`: a dict mapping language 2-letter code to the node name for this\n language\n- `parents`: the list of the node parents\n- `children`: the list of the node children\n- `properties`: additional properties of the node (taxonomy-dependent)\n- `synonyms`: a dict mapping language 2-letter code to a list of synonyms\n for this language", "properties": { "id": { "title": "Id", "type": "string" }, "names": { "additionalProperties": { "type": "string" }, "title": "Names", "type": "object" }, "parents": { "default": [], "items": { "$ref": "#/$defs/TaxonomyNode" }, "title": "Parents", "type": "array" }, "children": { "default": [], "items": { "$ref": "#/$defs/TaxonomyNode" }, "title": "Children", "type": "array" }, "synonyms": { "additionalProperties": { "items": { "type": "string" }, "type": "array" }, "default": {}, "title": "Synonyms", "type": "object" }, "properties": { "type": "object", "default": {}, "title": "Properties" } }, "required": [ "id", "names" ], "title": "TaxonomyNode", "type": "object" } }, "required": [ "status", "node" ] }
- Config:
arbitrary_types_allowed: bool = True
- Fields:
- field node: TaxonomyNode | None [Required][source]¶
- field status: FetcherStatus [Required][source]¶
- app.taxonomy.get_taxonomy(taxonomy_name: str, taxonomy_url: str, force_download: bool = False, download_newer: bool = False, cache_dir: Path | None = None) Taxonomy [source]¶
Return the taxonomy of the provided name.
The taxonomy file is downloaded and cached locally.
- Parameters:
taxonomy_name – the requested taxonomy name
taxonomy_url – the URL of the taxonomy
force_download – if True, (re)download the taxonomy even if it was cached, defaults to False
download_newer – if True, download the taxonomy if a more recent version is available (based on the file's ETag), defaults to False
cache_dir – the cache directory to use, defaults to ~/.cache/openfoodfacts/taxonomy
- Returns:
a Taxonomy
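How force_download and download_newer combine with the cache is not spelled out above, so the following is only a plausible sketch of the decision, written as a pure function with no network access. The function name should_download and the cache_exists, cached_etag and remote_etag parameters are hypothetical; the actual implementation may order or define these checks differently.

```python
from typing import Optional


def should_download(
    cache_exists: bool,
    force_download: bool = False,
    download_newer: bool = False,
    cached_etag: Optional[str] = None,
    remote_etag: Optional[str] = None,
) -> bool:
    """Decide whether the taxonomy file needs to be (re)downloaded."""
    # A forced refresh or a missing cache file always triggers a download.
    if force_download or not cache_exists:
        return True
    # Otherwise only download when asked to check for a newer version
    # and the ETags differ (assumed comparison rule).
    if download_newer:
        return cached_etag != remote_etag
    # Cached copy is good enough.
    return False


first_run = should_download(cache_exists=False)
cached_default = should_download(cache_exists=True)
stale = should_download(
    cache_exists=True, download_newer=True, cached_etag="a", remote_etag="b"
)
fresh = should_download(
    cache_exists=True, download_newer=True, cached_etag="a", remote_etag="a"
)
```

Under these assumptions, a first run always downloads, a cached file is reused by default, and download_newer only triggers a download when the remote ETag has changed.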
Health check¶
This module contains the health check functions for the application.
It is based upon the py-healthcheck library.