<<

NAME

ProductOpener::Tags - multilingual tags taxonomies (hierarchies of tags)

SYNOPSIS

ProductOpener::Tags provides functions to build multilingual tags taxonomies from source files, to use those taxonomies to canonicalize lists of tags, and to display them in different languages.

    use ProductOpener::Tags qw/:all/;

..

DESCRIPTION

..

GLOBAL VARIABLES

%tags_fields

This defines which fields hold lists of values. Taxonomized fields are added to this initial list by retrieve_tags_taxonomy.

FUNCTIONS

get_property_from_tags ($tagtype, $tags_ref, $property)

Return the value of a property for the first tag of a list that has this property.

Parameters

$tagtype

$tags_ref Reference to a list of tags

$property
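The lookup described above can be sketched as follows. This is a minimal illustration, not the module's implementation: %toy_properties and toy_get_property_from_tags are hypothetical stand-ins for the taxonomy data and for get_property_from_tags, and the tags and property values are made up.

```perl
use strict;
use warnings;

# Hypothetical toy storage: tag type => tag id => property => value.
my %toy_properties = (
    categories => {
        "en:drinks" => {},
        "en:sodas"  => {"color:en" => "brown"},
    },
);

# Return the value of $property for the first tag of the list that defines it.
sub toy_get_property_from_tags {
    my ($tagtype, $tags_ref, $property) = @_;
    return unless defined $tags_ref;
    foreach my $tagid (@$tags_ref) {
        my $value = $toy_properties{$tagtype}{$tagid}{$property};
        return $value if defined $value;
    }
    return;
}

# "en:drinks" has no color, so the value comes from "en:sodas".
my $value = toy_get_property_from_tags("categories", ["en:drinks", "en:sodas"], "color:en");
```

Tags that do not define the property are skipped, and the result is undef when no tag of the list defines it.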

get_inherited_property_from_tags ($tagtype, $tags_ref, $property)

Return the value of an inherited property for the first tag of a list that has this property, and the corresponding matching tag.

Parameters

$tagtype

$tags_ref Reference to a list of tags

$property

Return values

$property_value

$matching_tagid

get_matching_regexp_property_from_tags ($tagtype, $tags_ref, $property, $regexp)

Return the value of a property for the first tag of a list that has this property that matches the regexp.

Parameters

$tagtype

$tags_ref Reference to a list of tags

$property

$regexp

get_inherited_property_from_categories_tags ($product_ref, $property)

Iterating from the most specific category, try to get a property for a tag by exploring the taxonomy (using parents).

Parameters

$product_ref - the product reference

$property - the property - string

Return

$property_value

The property value if found.

$matching_category_id

The matching category id if we found a property value.

get_inherited_properties ($tagtype, $canon_tagid, $properties_names_ref, $fallback_lcs = ["xx", "en"])

Try to get a set of properties for a tag by exploring the taxonomy (using parents).

This method takes into account properties that are explicitly defined as "undef": such a value stops the search, but only for the branch being explored, so a value may still be found if there are multiple parent branches.

Warning: The algorithm is a bit rough and may not work as you would expect on a DAG. It does not (currently) handle nodes that are reached through multiple parent branches (in those cases you would expect the children from both branches to be explored first). To make this work, the algorithm would have to explore the parents first and then decide on the order, but this method is eager in order to save time.

Parameters

$tagtype - str, name of taxonomy

$canon_tagid - tag id for which we want properties

$properties_names_ref - ref to a list of property names

$fallback_lcs - fallback language codes to try

We may search for description:fr, but if the fallback is ['xx', 'en'] and we find a description:xx or description:en property, we will use that value.

Return

A ref to a hashmap where keys are property names and values are the values found. If a property name is not present, it means it was not found.
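The exploration of parents, the "undef" cut and the language fallback can be sketched as below. This is a simplified illustration under stated assumptions, not the module's algorithm: %toy_parents, %toy_props, toy_lookup and toy_get_inherited_properties are hypothetical names, the data is made up, and the sketch uses a plain breadth-first queue.

```perl
use strict;
use warnings;

# Hypothetical toy taxonomy: parents and properties of each tag.
my %toy_parents = (
    "en:lemon-soda" => ["en:soda"],
    "en:soda"       => ["en:drink"],
    "en:drink"      => [],
);
my %toy_props = (
    "en:lemon-soda" => {},
    "en:soda"       => {"description:en" => "Sweet fizzy drink", "packaging:en" => "undef"},
    "en:drink"      => {"packaging:en" => "bottle"},
);

# Look up a property on one tag, trying the fallback languages if needed.
sub toy_lookup {
    my ($tagid, $name, $fallback_lcs_ref) = @_;
    my $tag_props = $toy_props{$tagid} // {};
    return $tag_props->{$name} if defined $tag_props->{$name};
    if ($name =~ /^(.+):([a-z]{2})$/) {
        my $base = $1;
        foreach my $lc (@$fallback_lcs_ref) {
            my $value = $tag_props->{"$base:$lc"};
            return $value if defined $value;
        }
    }
    return;
}

sub toy_get_inherited_properties {
    my ($tagid, $names_ref, $fallback_lcs_ref) = @_;
    $fallback_lcs_ref //= ["xx", "en"];
    my %found;
    # Each queue item: [tag id, properties still searched on this branch]
    my @queue = ([$tagid, [@$names_ref]]);
    while (my $item = shift @queue) {
        my ($current, $wanted_ref) = @$item;
        my @still_wanted;
        foreach my $name (@$wanted_ref) {
            next if exists $found{$name};
            my $value = toy_lookup($current, $name, $fallback_lcs_ref);
            if (not defined $value) {
                push @still_wanted, $name;    # keep searching in the parents
            }
            elsif ($value ne "undef") {
                $found{$name} = $value;
            }
            # a value of "undef" cuts the search for this branch only
        }
        next unless @still_wanted;
        push @queue, map { [$_, [@still_wanted]] } @{$toy_parents{$current} // []};
    }
    return \%found;
}

my $result_ref = toy_get_inherited_properties("en:lemon-soda", ["description:fr", "packaging:en"]);
```

In this toy data, description:fr is resolved through the "en" fallback on the parent, while packaging:en is absent from the result because the parent defines it as "undef" and there is no other branch.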

get_tags_grouped_by_property ($tagtype, $tagids_ref, $prop_name, $props_ref, $inherited_props_ref, $fallback_lcs = ["xx", "en"])

Retrieve properties of a series of tags given in $tagids_ref and return them, but grouped by $prop_name, also fetching $props_ref and $inherited_props_ref

Return

A ref to a hashmap where keys are the values of property $prop_name, and values are in turn hashmaps where keys are tag ids and values are hashmaps of properties and their values.

Tags for which the property is undefined are grouped under the "undef" value.

Example

We ask for quality tags, grouped by fix_action, while getting descriptions:

    {
        "add_nutrition_facts" => {
            "en:kcal-does-not-match-other-nutrients" => {
                "description:en" => "Kcal is not matching value computed from other nutriments"
            },
            "en:kcal-does-not-match-kj" => {
                "description:en" => "Kcal is not matching kJ value"
            },
        },
        "add_categories" => {
            "en:detected-category-baby-milk" => {
                "description:en" => "Detected category … may be missing baby milks"
            }
        }
    }
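The grouping itself can be sketched as follows. This is a minimal illustration that ignores inherited properties and language fallbacks; %toy_props, toy_group_tags_by_property, the tag ids and the "fix_action:en" property name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy properties for a few made-up tags.
my %toy_props = (
    "en:tag-a" => {"fix_action:en" => "add_nutrition_facts", "description:en" => "Description A"},
    "en:tag-b" => {"fix_action:en" => "add_categories",      "description:en" => "Description B"},
    "en:tag-c" => {"description:en" => "Description C"},
);

# Group tags by the value of $prop_name, keeping only the properties listed in $props_ref.
sub toy_group_tags_by_property {
    my ($tagids_ref, $prop_name, $props_ref) = @_;
    my %grouped;
    foreach my $tagid (@$tagids_ref) {
        my $tag_props = $toy_props{$tagid} // {};
        # Tags without the grouping property go under the "undef" key
        my $group = $tag_props->{$prop_name} // "undef";
        my %selected = map { $_ => $tag_props->{$_} }
            grep { defined $tag_props->{$_} } @$props_ref;
        $grouped{$group}{$tagid} = \%selected;
    }
    return \%grouped;
}

my $grouped_ref = toy_group_tags_by_property(
    ["en:tag-a", "en:tag-b", "en:tag-c"], "fix_action:en", ["description:en"]);
```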

remove_stopwords_from_start_or_end_of_string ( $tagtype, $lc, $string )

Remove stopwords (that are specific to each category) from the start or end of a string that has not been normalized. This function differs from remove_stopwords(), which works on normalized tags instead of strings and also removes stopwords in the middle.

Arguments

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$lc - Language code

The language the string is in.

$string - string

The string to remove stopwords from.
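As an illustration of removing stopwords only at the edges of a string, here is a minimal sketch. %toy_stopwords and toy_remove_stopwords_from_start_or_end are hypothetical, the stopword list is made up, and the single pass at each end is an assumption of this sketch rather than the module's behavior.

```perl
use strict;
use warnings;

# Hypothetical stopwords, keyed by language code.
my %toy_stopwords = (en => ["the", "organic", "of"]);

sub toy_remove_stopwords_from_start_or_end {
    my ($lc, $string) = @_;
    my $words_ref = $toy_stopwords{$lc} or return $string;
    my $regexp = join("|", map { quotemeta($_) } @$words_ref);
    # One stopword at the start, one at the end; the middle is left untouched
    $string =~ s/^(?:$regexp)\s+//i;
    $string =~ s/\s+(?:$regexp)$//i;
    return $string;
}

my $cleaned = toy_remove_stopwords_from_start_or_end("en", "The Taste of Honey organic");
```

The "of" in the middle of the string is kept, which is the difference with a function that works on the whole normalized tag.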

remove_stopwords ( $tagtype, $lc, $tagid )

Remove stopwords (that are specific to each category) from a normalized tag.

Arguments

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$lc - Language code

The language the tagid is in.

$tagid - normalized tag

Lowercased, unaccented depending on language, non-alphanumeric chars turned to dash.

sanitize_taxonomy_line( $line )

Sanitize a taxonomy line before processing

Arguments

str $line - the line read from the file

get_lc_tagid( $synonyms_ref, $lc, $tagtype, $tag, $warning )

Search for "current tag" (tag at start of line) for a given tag

Arguments

str $tag - tag string for which we search

reference to hash map $synonyms_ref - ref to %synonyms for $tagtype

str $tagtype - tag type

str $lc - language

str $warning

An optional prefix to display errors if we had to use stopwords / plurals.

If empty, no warning will be displayed.

return str - found current tagid or undef

build_tags_taxonomy( $tagtype, $file, $publish)

Build taxonomy from the taxonomy file

Taxonomy will be stored in global hash maps under the entry $tagtype

Arguments

str $tagtype - the tagtype

Like "categories", "ingredients"

$file - name of the file to read in taxonomies folder

$publish - if 1, store the result in sto

build_all_taxonomies ( $publish )

Build all taxonomies, including the test taxonomy

Parameters

Publish STO file $publish

generate_tags_taxonomy_extract ( $tagtype, $tags_ref, $options_ref, $lcs_ref)

Generate an extract of the taxonomy for a specific set of tags.

Parameters

tag type $tagtype

reference to a list of tags ids $tags_ref

reference to a hash of key/value options $options_ref

Options:

- fields: comma separated list of fields (e.g. "name,description,vegan:en,inherited:vegetarian:en")

Properties can be requested with their name (e.g. "description") or name + a specific language (e.g. "vegan:en"). Only properties directly defined for the entry are returned. To include inherited properties from parents, prefix the property with "inherited:" (e.g. "inherited:vegan:en").

- include_parents: include entries for all direct parents of the requested tags

- include_children: include entries for all direct children of the requested tags

reference to an array of language codes $lcs_ref

Languages for which we want to extract names, synonyms, properties.
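To make the "fields" option concrete, here is a minimal sketch of how such a value could be split into directly defined and inherited properties. toy_parse_fields_option is a hypothetical helper written for this illustration; it is not part of the module.

```perl
use strict;
use warnings;

# Split a comma separated "fields" value into direct and inherited fields.
sub toy_parse_fields_option {
    my ($fields) = @_;
    my (@direct, @inherited);
    foreach my $field (split /\s*,\s*/, $fields) {
        if ($field =~ /^inherited:(.+)$/) {
            push @inherited, $1;    # property to look up in parents as well
        }
        else {
            push @direct, $field;
        }
    }
    return {direct => \@direct, inherited => \@inherited};
}

my $parsed_ref = toy_parse_fields_option("name,description,vegan:en,inherited:vegetarian:en");
```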

init_taxonomies($die_if_some_taxonomies_cannot_be_loaded = 0)

Initialize all taxonomies. This function is called when the Tags.pm module is loaded, in order to load all available taxonomies, as most scripts / modules that load Tags.pm expect taxonomies to be loaded.

It is also called by lib/startup_apache.pl startup script with the $die_if_some_taxonomies_cannot_be_loaded set to 1.

Parameters

die if some taxonomies cannot be loaded $die_if_some_taxonomies_cannot_be_loaded

If set to 1, the function will die if some taxonomies cannot be loaded.

canonicalize_taxonomy_tag_link ($target_lc, $tagtype, $tag, $tag_prefix = undef)

Returns a link to the canonicalized tag

Arguments

$target_lc

$tagtype

$tag

$tag_prefix (optional)

Can be "-" to indicate that the tag is a negative tag.

get_tag_image ( $target_lc, $tagtype, $canon_tagid )

If an image is associated to a tag, return its relative url, otherwise return undef.

Arguments

$target_lc

The desired language for the image. If an image is not available in the target language, it can be returned in English or in the tag language.

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$canon_tagid

display_tags_hierarchy_taxonomy ( $target_lc, $tagtype, $tags_ref )

Generates a comma separated list of tags in the target language, with links and images.

Arguments

$target_lc

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$tags_ref

Reference to a list of tags. (usually the *_tags field corresponding to the tag type)

list_taxonomy_tags_in_language ( $target_lc, $tagtype, $tags_ref )

Generates a comma separated (with a space after the comma) list of tags in the target language.

Arguments

$target_lc

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$tags_ref

Reference to a list of tags. (usually the *_tags field corresponding to the tag type)

The tags are expected to be in their canonical format.
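A minimal sketch of the resulting string, assuming a hypothetical %toy_names table of translations and falling back to the tag id when no translation exists (the fallback is an assumption of this sketch):

```perl
use strict;
use warnings;

# Hypothetical translations: tag id => language code => name.
my %toy_names = (
    "en:apples"  => {en => "Apples",  fr => "Pommes"},
    "en:bananas" => {en => "Bananas", fr => "Bananes"},
);

sub toy_list_taxonomy_tags_in_language {
    my ($target_lc, $tags_ref) = @_;
    return join(", ", map { $toy_names{$_}{$target_lc} // $_ } @{$tags_ref // []});
}

my $list = toy_list_taxonomy_tags_in_language("fr", ["en:apples", "en:bananas", "en:unknown"]);
```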

canonicalize_taxonomy_tag_or_die ($tag_lc, $tagtype, $tag)

Canonicalize a string to check if it matches an entry in a taxonomy, and die otherwise.

This function is used during initialization, to check that some initialization data has matching entries in taxonomies.

Arguments

$tag_lc

The language of the string.

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$tag

The string that we want to match to a tag.

Return value

If the string could be matched to an existing taxonomy entry, the canonical id for the entry is returned.

Otherwise, the function dies.

canonicalize_taxonomy_tag ($tag_lc, $tagtype, $tag, $exists_in_taxonomy_ref = undef)

Canonicalize a string to check if it matches an entry in a taxonomy.

Arguments

$tag_lc

The language of the string.

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$tag

The string that we want to match to a tag.

$exists_in_taxonomy_ref

A reference to a variable that will be assigned 1 if we found a matching taxonomy entry, or 0 otherwise.

Return value

If the string could be matched to an existing taxonomy entry, the canonical id for the entry is returned.

Otherwise, we return the string prefixed with the language code (e.g. en:An unknown entry)
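The two possible outcomes can be sketched as follows. This is an illustration only: %toy_synonyms and toy_canonicalize_taxonomy_tag are hypothetical, the normalization is a crude ASCII-only stand-in, and the real function handles stopwords, plurals and much more.

```perl
use strict;
use warnings;

# Hypothetical synonyms table: language code => normalized synonym => canonical id.
my %toy_synonyms = (
    en => {"soda" => "en:sodas", "sodas" => "en:sodas", "soft-drink" => "en:sodas"},
);

sub toy_canonicalize_taxonomy_tag {
    my ($tag_lc, $tag, $exists_in_taxonomy_ref) = @_;
    # Crude normalization: lowercase, non-alphanumeric runs turned to dashes
    my $tagid = lc($tag);
    $tagid =~ s/[^a-z0-9]+/-/g;
    $tagid =~ s/^-+|-+$//g;
    my $canon_tagid = $toy_synonyms{$tag_lc}{$tagid};
    if (defined $exists_in_taxonomy_ref) {
        $$exists_in_taxonomy_ref = (defined $canon_tagid) ? 1 : 0;
    }
    # Unknown entries are returned as-is, prefixed with the language code
    return (defined $canon_tagid) ? $canon_tagid : "$tag_lc:$tag";
}

my $known_exists;
my $known = toy_canonicalize_taxonomy_tag("en", "Soft Drink", \$known_exists);

my $unknown_exists;
my $unknown = toy_canonicalize_taxonomy_tag("en", "An unknown entry", \$unknown_exists);
```

Passing a reference for the last argument lets the caller distinguish a real taxonomy entry from a string that was merely prefixed.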

get_taxonomy_tag_synonyms ( $tagtype )

Return all entries in a taxonomy.

Arguments

$tagtype

Return values

- undef if the taxonomy does not exist or is not loaded

- or a list of all tags

get_taxonomy_tag_synonyms ( $target_lc, $tagtype, $canon_tagid )

Return all synonyms (including extended synonyms) in a specific language for a taxonomy entry.

Arguments

$target_lc

$tagtype

$canon_tagid

Return values

- undef if the taxonomy does not exist or is not loaded, or if the tag does not exist

- or a list of all synonyms

cached_display_taxonomy_tag ( $target_lc, $tagtype, $canon_tagid )

Return the name of a tag for displaying it to the user. This function builds a cache of the resulting names, in order to reduce execution time. The cache is an ever-growing hash of input parameters. This function should only be used in batch scripts, and not in code called from the Apache mod_perl processes.

Arguments

$target_lc - target language code

$tagtype

$canon_tagid

Return values

The tag translation if it exists in target language, otherwise, the tag id.
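The caching idea can be sketched with a nested hash keyed by the input parameters. The names below (%toy_names, toy_display_taxonomy_tag, toy_cached_display_taxonomy_tag, $toy_lookup_count) are hypothetical stand-ins, and the counter exists only to show that the second call does not recompute the name.

```perl
use strict;
use warnings;

my %toy_names        = ("en:sodas" => {fr => "Sodas"});
my $toy_lookup_count = 0;

# Stand-in for the uncached lookup: translation if it exists, tag id otherwise.
sub toy_display_taxonomy_tag {
    my ($target_lc, $tagtype, $canon_tagid) = @_;
    $toy_lookup_count++;
    my $names_ref = $toy_names{$canon_tagid} // {};
    return $names_ref->{$target_lc} // $canon_tagid;
}

# Ever-growing cache: one entry per (language, tag type, tag id) combination.
my %toy_cache;

sub toy_cached_display_taxonomy_tag {
    my ($target_lc, $tagtype, $canon_tagid) = @_;
    $toy_cache{$target_lc}{$tagtype}{$canon_tagid}
        //= toy_display_taxonomy_tag($target_lc, $tagtype, $canon_tagid);
    return $toy_cache{$target_lc}{$tagtype}{$canon_tagid};
}

my $first  = toy_cached_display_taxonomy_tag("fr", "categories", "en:sodas");
my $second = toy_cached_display_taxonomy_tag("fr", "categories", "en:sodas");
```

Because nothing is ever evicted, such a cache grows with the number of distinct inputs, which is why it suits short-lived batch scripts better than long-running processes.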

display_taxonomy_tag ( $target_lc, $tagtype, $canon_tagid )

Return the name of a tag for displaying it to the user

Arguments

$target_lc - target language code

$tagtype

$canon_tagid

Return values

The tag translation if it exists in target language, otherwise, the tag id.

display_taxonomy_tag_name ( $target_lc, $tagtype, $canon_tagid )

A version of display_taxonomy_tag that removes the language prefix, if any.

Arguments

$target_lc - target language code

$tagtype

$canon_tagid

Return values

The tag translation if it exists in target language, otherwise, the tag in its primary language
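The prefix removal can be sketched as below, assuming the prefix is a two-letter lowercase language code followed by a colon. %toy_names and both toy_ functions are hypothetical stand-ins for the module's data and functions.

```perl
use strict;
use warnings;

my %toy_names = ("en:sodas" => {fr => "Sodas"});

# Stand-in for display_taxonomy_tag: translation if it exists, tag id otherwise.
sub toy_display_taxonomy_tag {
    my ($target_lc, $canon_tagid) = @_;
    my $names_ref = $toy_names{$canon_tagid} // {};
    return $names_ref->{$target_lc} // $canon_tagid;
}

sub toy_display_taxonomy_tag_name {
    my ($target_lc, $canon_tagid) = @_;
    my $display = toy_display_taxonomy_tag($target_lc, $canon_tagid);
    $display =~ s/^[a-z]{2}://;    # drop a leading "en:", "fr:", ...
    return $display;
}

my $translated   = toy_display_taxonomy_tag_name("fr", "en:sodas");
my $untranslated = toy_display_taxonomy_tag_name("fr", "en:unknown-entry");
```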

canonicalize_tag_link ($tagtype, $tagid, $tag_prefix = undef)

Return a relative link to a tag page.

Arguments

$tagtype

$tagid

$tag_prefix (optional)

Can be "-" to indicate that the tag is a negative tag.

generate_regexps_matching_taxonomy_entries($taxonomy, $return_type, $options_ref)

Create regular expressions that will match entries of a taxonomy.

Arguments

$taxonomy

The type of the tag (e.g. categories, labels, allergens)

$return_type - string

Either "unique_regexp" to get one single regexp for all entries of one language.

Or "list_of_regexps" to get a list of regexps (1 per entry) for each language. For each entry, we return an array with the entry id and the regexp for that entry, e.g. ['en:coffee',"coffee|coffees"]

$options_ref

A reference to a hash to enable options to indicate how to match:

- add_simple_plurals: in some languages, like French, we will allow an extra "s" at the end of entries

- add_simple_singulars: same with removing the "s" at the end of entries

- match_space_with_dash: spaces or dashes in entries will match either a space or a dash (e.g. "South America" will match "South-America")
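A minimal sketch of the "list_of_regexps" shape and of the match_space_with_dash option, for a single language. %toy_entries and toy_list_of_regexps are hypothetical, the synonyms are made up, and the plural and singular options are left out.

```perl
use strict;
use warnings;

# Hypothetical entries of one language: entry id => list of synonyms.
my %toy_entries = (
    "en:coffee"        => ["coffee", "coffees"],
    "en:south-america" => ["South America"],
);

sub toy_list_of_regexps {
    my ($options_ref) = @_;
    my @result;
    foreach my $tagid (sort keys %toy_entries) {
        my @alternatives;
        foreach my $synonym (@{$toy_entries{$tagid}}) {
            my $pattern = quotemeta($synonym);
            if ($options_ref->{match_space_with_dash}) {
                # quotemeta escaped spaces and dashes: turn each into a [ -] class
                $pattern =~ s/\\[ -]/[ -]/g;
            }
            push @alternatives, $pattern;
        }
        push @result, [$tagid, join("|", @alternatives)];
    }
    return \@result;
}

my $regexps_ref = toy_list_of_regexps({match_space_with_dash => 1});
my ($south_america_id, $south_america_regexp) = @{$regexps_ref->[1]};
my $matches_dash = ("South-America" =~ /^(?:$south_america_regexp)$/) ? 1 : 0;
```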

cmp_taxonomy_tags_alphabetically($tagtype, $target_lc, $a, $b)

Comparison function for canonical tags entries in a taxonomy.

To be used as a sort function in a sort() call.

Each tag is converted to a string, by priority:

1 - the tag name in the target language

2 - the tag name in the xx language

3 - the tag id

Arguments

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$target_lc

$a

$b
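The priority order and the use inside sort() can be sketched as follows. %toy_names, toy_sort_key and toy_cmp_tags are hypothetical, and the plain case-sensitive cmp is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical names: tag id => language code => name.
my %toy_names = (
    "en:apples"  => {en => "Apples",  fr => "Pommes"},
    "en:bananas" => {en => "Bananas", fr => "Bananes"},
    "en:e330"    => {xx => "E330"},
);

# String used for sorting: target language name, then xx name, then tag id.
sub toy_sort_key {
    my ($target_lc, $tagid) = @_;
    my $names_ref = $toy_names{$tagid} // {};
    return $names_ref->{$target_lc} // $names_ref->{xx} // $tagid;
}

sub toy_cmp_tags {
    my ($target_lc, $tag_a, $tag_b) = @_;
    return toy_sort_key($target_lc, $tag_a) cmp toy_sort_key($target_lc, $tag_b);
}

my @sorted = sort { toy_cmp_tags("fr", $a, $b) } ("en:apples", "en:e330", "en:bananas");
```

With these made-up names, the French sort keys are "Pommes", "E330" and "Bananes", so the tags are ordered by their displayed names rather than by their ids.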

get_knowledge_content ($tagtype, $tagid, $target_lc, $target_cc)

Fetch knowledge content as HTML about additives, categories, etc.

This content is used in knowledge panels.

Content is stored as HTML files in `${lang_dir}/${target_lc}/knowledge_panels/${tagtype}`. We first check the existence of a file specific to the country specified by `${target_cc}`, with a fallback on `world` otherwise. This is useful to have a more specific description for some countries compared to the `world` base content.

Arguments

$tagtype

The type of the tag (e.g. categories, labels, allergens)

$tagid

The tag we want to match, with language prefix (ex: `en:e255`).

$target_lc

The user language as a 2-letter code (fr, it,...)

$target_cc

The user country as a 2-letter code (fr, it, ch) or `world`

Return value

If content exists for the tag type, tag value, language code and country code, return the HTML text; otherwise return undef.
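The country-then-world fallback can be sketched as below. The file naming used here (tag id with ":" replaced by "_", followed by "_" and the country code, with an .html extension) is an assumption made for this illustration, as are toy_get_knowledge_content and toy_write_file; the sketch builds its own temporary directory instead of using the real ${lang_dir}.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

sub toy_get_knowledge_content {
    my ($lang_dir, $tagtype, $tagid, $target_lc, $target_cc) = @_;
    my $base_dir = "$lang_dir/$target_lc/knowledge_panels/$tagtype";
    (my $file_id = $tagid) =~ s/:/_/g;    # hypothetical file naming
    # Try the country specific file first, then the world file
    foreach my $cc ($target_cc, "world") {
        my $path = "$base_dir/${file_id}_$cc.html";
        next unless -e $path;
        open(my $fh, "<:encoding(UTF-8)", $path) or return;
        local $/;                         # slurp the whole file
        my $html = <$fh>;
        close($fh);
        return $html;
    }
    return;
}

sub toy_write_file {
    my ($path, $content) = @_;
    open(my $fh, ">:encoding(UTF-8)", $path) or die "Cannot write $path: $!";
    print {$fh} $content;
    close($fh);
    return;
}

# Demo with a temporary directory and made-up content
my $lang_dir = tempdir(CLEANUP => 1);
my $dir      = "$lang_dir/fr/knowledge_panels/additives";
make_path($dir);
toy_write_file("$dir/en_e255_world.html", "<p>world content</p>");
toy_write_file("$dir/en_e255_ch.html",    "<p>ch content</p>");

my $for_ch  = toy_get_knowledge_content($lang_dir, "additives", "en:e255", "fr", "ch");
my $for_be  = toy_get_knowledge_content($lang_dir, "additives", "en:e255", "fr", "be");
my $missing = toy_get_knowledge_content($lang_dir, "additives", "en:e999", "fr", "ch");
```

Here the request for "ch" gets the country specific file, the request for "be" falls back to the world file, and the unknown tag yields undef.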

<<