<<

NAME

ProductOpener::Images - adds, processes, manages and displays product photos

DESCRIPTION

ProductOpener::Images is used to:

- upload product images
- select and crop product images
- run OCR on images
- display product images

Product images on disk

Product images are stored in html/images/products/[product barcode split with slashes]/

For each product, this directory contains:

[image number].[extension].orig (e.g. 1.jpg.orig, 2.jpg.orig etc.)

Original images uploaded by users or imported

[image number].jpg

Same image saved as JPEG with specific settings, and after some minimal processing (auto orientation, removing EXIF data, flattening PNG images to remove transparency).

Those images are not displayed on the web site (except on the product edit form), but can be selected and cropped.

[image number].[100|400].jpg

Same image saved with a maximum width and height of 100 and 400 pixels. Those thumbnails are used in the product edit form to show the available images.

[image number].json

OCR output from Google Cloud Vision.

When a new image is uploaded, a symbolic link to it is created in /new_images. This triggers a script to generate and save the OCR: run_cloud_vision_ocr.pl.

[front|ingredients|nutrition|packaging]_[2 letter language code].[product revision].[full|100|200|400].jpg

Cropped and selected image for the front of the product, the ingredients list, the nutrition facts table, and the packaging information / recycling instructions, in 4 different sizes (full size, 100 / 200 / 400 pixels maximum width or height).

The product revision is a number that is incremented for each change to the product (each image upload and each image selection are also individual changes that create a new revision).

The selected images are shown on the website, in the app etc.

When a new image is selected for a given field (e.g. ingredients) and language (e.g. French), the existing selected images are kept. (e.g. we can have ingredients_fr.21.100.jpg and a new ingredients_fr.28.100.jpg).

Previously selected images are shown only when people access old product revisions.

Cropping coordinates for all revisions are stored in the "images" field of the product, so we could regenerate old selected and cropped images on demand.
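As a minimal sketch of the naming scheme for selected images described above, the following helper builds a file name from its four components. The name selected_image_file_name is hypothetical and is not part of ProductOpener::Images.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a selected image file name of the form
# [image type]_[language code].[product revision].[size].jpg
sub selected_image_file_name {
    my ($image_type, $image_lc, $rev, $size) = @_;
    return sprintf("%s_%s.%d.%s.jpg", $image_type, $image_lc, $rev, $size);
}

# Two revisions of the French ingredients image, as in the example above
print selected_image_file_name("ingredients", "fr", 21, 100), "\n";
print selected_image_file_name("ingredients", "fr", 28, 100), "\n";
```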

SUPPORTED IMAGE TYPES

gif, jpeg, jpf, png, heic

VALID IMAGE TYPES

Depending on the product type, different image types are allowed. For example, food, pet food and beauty products have an "ingredients" image type.

FUNCTIONS

get_image_type_and_image_lc_from_imagefield ($imagefield)

We used to identify selected images with a field called "imagefield" which was of the form [image type]_[language code]. In some very old product revisions (e.g. from 2012), we had values with only the image type (e.g. "front").

This function splits the field name into its components, and is used to maintain backward compatibility.

Arguments

$imagefield e.g. "front_fr"

Return values

$image_type e.g. "front"

$image_lc e.g. "fr"
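The split can be sketched as follows. This is an illustration only, not the module's code: the name split_imagefield is hypothetical, and returning an undefined language code for legacy values such as "front" is an assumption.

```perl
use strict;
use warnings;

# Sketch: split "front_fr" into ("front", "fr").
# Assumption: legacy values without a language code (e.g. "front")
# yield an undefined language code.
sub split_imagefield {
    my ($imagefield) = @_;
    if ($imagefield =~ /^(.+)_([a-z]{2})$/) {
        return ($1, $2);
    }
    return ($imagefield, undef);
}

my ($image_type, $image_lc) = split_imagefield("front_fr");
print "$image_type $image_lc\n";
```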

display_select_crop ($object_ref, $image_type, $image_lc, $language, $request_ref)

This function is used in the product edit form to display the select cropper with the images that are already uploaded.

display_select_crop_init ($object_ref)

This function is used to generate the code to initialize the select cropper in the product edit form with the images that are already uploaded.

get_code_and_imagefield_from_file_name ( $l, $filename )

This function is used to guess if an image is the front of the product, its list of ingredients, or the nutrition facts table, based on the filename.

It is used in particular for bulk upload of photos sent by manufacturers. The file names have many different formats, but they very often include the barcode of the product, and sometimes an indication of what the image is.

Producers are advised to use the [front|ingredients|nutrition|packaging]_[language code] format, but in practice we receive many other names.

The %file_names_to_imagefield_regexps structure below contains some patterns used to guess what the image is about.
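To illustrate the idea of guessing from a file name, here is a small sketch. The patterns, the 8 to 14 digit barcode rule and the name guess_code_and_image_type are all hypothetical. The real %file_names_to_imagefield_regexps structure is not reproduced here and may differ substantially.

```perl
use strict;
use warnings;

# Hypothetical patterns, for illustration only
my @example_patterns = (
    [qr/front|face/i, "front"],
    [qr/ingredient/i, "ingredients"],
    [qr/nutrition/i, "nutrition"],
    [qr/packaging|recycl/i, "packaging"],
);

# Guess a barcode (assumed to be a run of 8 to 14 digits) and an image type
# from a file name. Either value is undef when nothing matches.
sub guess_code_and_image_type {
    my ($filename) = @_;
    my ($code) = $filename =~ /(\d{8,14})/;
    my $image_type;
    foreach my $pattern_ref (@example_patterns) {
        my ($regexp, $type) = @$pattern_ref;
        if ($filename =~ $regexp) {
            $image_type = $type;
            last;
        }
    }
    return ($code, $image_type);
}

my ($code, $type) = guess_code_and_image_type("3017620422003_ingredients_fr.jpg");
print "$code $type\n";
```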

generate_resized_images ($path, $filename, $image_source, $sizes_ref, @sizes)

This function generates resized images from the original image.

For uploaded images, we resize to 100 and 400 pixels maximum width or height.

For selected images, we resize to 100, 200, and 400 pixels maximum width or height.

Arguments

$path

The path to the image directory (e.g. html/images/products/1234567890123/).

$filename

The name of the image file (without the extension).

$image_source

The source image object (Image::Magick).

$sizes_ref

A reference to a hash that will be filled with the sizes of the generated images.

@sizes

An array of sizes to generate. The sizes are the maximum width or height of the image.

Return values

The function returns the error code from ImageMagick if there was an error writing the image.
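The "maximum width or height" rule can be illustrated without Image::Magick by computing the target dimensions only. This sketch assumes that the aspect ratio is preserved and that images already smaller than the limit are left unchanged. Neither assumption is stated in this section, and fit_within is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: compute dimensions that fit in a $max x $max box.
# Assumptions: aspect ratio is preserved, smaller images are not enlarged.
sub fit_within {
    my ($width, $height, $max) = @_;
    return ($width, $height) if $width <= $max && $height <= $max;
    my $ratio = $width > $height ? $max / $width : $max / $height;
    my $new_width = int($width * $ratio + 0.5);
    my $new_height = int($height * $ratio + 0.5);
    # Never return a zero dimension for very elongated images
    $new_width = 1 if $new_width < 1;
    $new_height = 1 if $new_height < 1;
    return ($new_width, $new_height);
}

# Sizes used for uploaded images
foreach my $max (100, 400) {
    my ($w, $h) = fit_within(1200, 800, $max);
    print "$max: ${w}x${h}\n";
}
```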

process_image_upload ( $product_ref, $imagefield, $user_id, $time, $comment, $imgid_ref, $debug_string_ref )

Process an image uploaded to a product (from the web site, from the API, or from an import):

- Read the image
- Create a JPEG version
- Create resized versions
- Store the image in the product data

Arguments

Product ref $product_ref

Image field $imagefield

Indicates what the image is and its language, or indicates a path to the image file (for imports and when uploading an image with a barcode).

Format: [front|ingredients|nutrition|packaging|other]_[2 letter language code]

User id $user_id

Timestamp of the image $time

Comment $comment

Reference to an image id $imgid_ref

Used to return the number identifying the image to the caller.

Debug string reference $debug_string_ref

Used to return some debug information to the caller.

Return values

-2: imgupload field not set
-3: we have already received an image with this file size
-4: the image is too small
-5: the image file cannot be read by ImageMagick
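A caller might translate these codes into log messages along the following lines. The helper describe_upload_result is hypothetical. The messages are copied from the list above, and any other value is simply reported as not being a documented error, since this section does not describe the success value.

```perl
use strict;
use warnings;

# Error codes documented for process_image_upload()
my %upload_error_messages = (
    -2 => "imgupload field not set",
    -3 => "we have already received an image with this file size",
    -4 => "the image is too small",
    -5 => "the image file cannot be read by ImageMagick",
);

# Hypothetical helper: turn a return code into a message for logs
sub describe_upload_result {
    my ($code) = @_;
    return $upload_error_messages{$code} // "no documented error (code $code)";
}

print describe_upload_result(-4), "\n";
```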

process_image_upload_using_filehandle ($product_ref, $filehandle, $user_id, $time, $comment, $imgid_ref, $debug_string_ref)

This function processes an image uploaded to a product using a file handle.

It is called by:

- the process_image_upload() function above when the image is uploaded with a CGI multipart form data encoded field (product form + API <= v2)
- APIProductImagesUpload.pm for API v3

Arguments

Product ref $product_ref

File handle $filehandle to the image data

User id $user_id

Timestamp of the image $time

Comment $comment

Reference to an image id $imgid_ref

Used to return the number identifying the image to the caller.

Debug string reference $debug_string_ref

Used to return some debug information to the caller.

Return values

-2: imgupload field not set
-3: we have already received an image with this file size
-4: the image is too small
-5: the image file cannot be read by ImageMagick

remove_images_by_prefix ( $product_ref, $prefix )

This function removes image files from a product by a given prefix (matching uploaded images or selected images). The image files are moved to a deleted directory.

Arguments

$product_ref

A reference to the product data structure.

$prefix

The prefix of the images to be removed.

For uploaded images, the prefix is the imgid.

For selected images, the prefix is the image type and language code + the product revision. e.g. "ingredients_en.5" or "nutrition_fr.6".
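The prefix matching can be sketched as a filter over file names. This is an illustration under one assumption that the section does not state: the prefix must be followed by a dot, so that prefix "1" matches "1.jpg" and "1.400.jpg" but not "10.jpg". The name files_matching_prefix is hypothetical.

```perl
use strict;
use warnings;

# Sketch: select the file names that belong to a prefix.
# Assumption: the prefix is always followed by a dot in matching file names.
sub files_matching_prefix {
    my ($prefix, @files) = @_;
    return grep { /^\Q$prefix\E\./ } @files;
}

my @files = (
    "1.jpg", "1.400.jpg", "10.jpg",
    "ingredients_en.5.full.jpg", "ingredients_en.50.full.jpg",
);
print join(", ", files_matching_prefix("ingredients_en.5", @files)), "\n";
```

Quoting the prefix with \Q...\E keeps the dot in "ingredients_en.5" from acting as a regular expression wildcard.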

delete_uploaded_image_and_associated_selected_images ( $product_ref, $imgid )

This function deletes an uploaded image and its associated selected images.

Note: the corresponding product is not saved by this function; it should be saved by the caller. We do not save it in this function so that we can delete multiple images and save the updated product only once.

Arguments

$product_ref

A reference to the product data structure.

$imgid

The image id to be deleted.

Return values

 1: success
-1: image not found

process_image_move ( $user_id, $code, $imgids, $move_to, $ownerid )

This function moves images from one product to another, or to the trash.

Arguments

$user_id

The user id of the person moving the image.

$code

The code of the product from which the image is moved.

$imgids

The image ids to be moved, in a comma-separated list.

$move_to

The product code to which the image is moved, or 'trash' if the image is deleted.

$ownerid

The owner id of the product from which the image is moved.

Return values

The function returns an error message if there was an error, or undef if the operation was successful.

same_image_generation_parameters ( $generation_1_ref, $generation_2_ref )

This function checks if the image generation parameters are the same for two images.

It is useful to avoid selecting the same image with the same parameters twice.

Arguments

$generation_1_ref

A reference to the first image generation parameters.

$generation_2_ref

A reference to the second image generation parameters.

Return values

1: the image generation parameters are the same

0: the image generation parameters are different
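One way to implement such a check is a key-by-key comparison, sketched below. It assumes that the generation parameters are flat hashes of defined scalar values and that a missing hash is equivalent to an empty one. The actual comparison in the module may differ, and same_parameters is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: compare two generation parameter hashes key by key.
# Assumptions: flat hashes of defined scalars; undef is treated as empty.
sub same_parameters {
    my ($generation_1_ref, $generation_2_ref) = @_;
    my %first = %{$generation_1_ref // {}};
    my %second = %{$generation_2_ref // {}};
    return 0 if keys(%first) != keys(%second);
    foreach my $key (keys %first) {
        return 0 if not exists $second{$key};
        return 0 if $first{$key} ne $second{$key};
    }
    return 1;
}

print same_parameters({angle => 90}, {angle => 90}), "\n";
```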

normalize_generation_ref ( $generation_ref )

This function normalizes the generation_ref so that we store only useful values.

- If the image is not rotated, we don't store the angle.
- If the image is not cropped, we don't store the coordinates.
- If the image is not normalized, we don't store the normalize value.
- If the image is not processed with white magic, we don't store the white magic value.

If generation_ref is empty, we return an undef value.

Arguments

$generation_ref

A reference to the image generation parameters.

Return values

A reference to the normalized generation_ref.

Or undef if the generation_ref is empty.
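The normalization rules can be sketched as follows. The key names mirror the process_image_crop() parameters, and the encodings of "not rotated" (angle 0 or missing), "not cropped" (all coordinates 0 or missing) and "off" flags (false values or the string "false") are assumptions made for this illustration. normalize_parameters is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the normalization rules, under the assumptions stated above
sub normalize_parameters {
    my ($generation_ref) = @_;
    my %in = %{$generation_ref // {}};
    my %out;

    # Keep the angle only if the image is rotated
    $out{angle} = $in{angle} if $in{angle};

    # Keep the coordinates only if the image is cropped
    my @coordinates = qw(x1 y1 x2 y2);
    if (grep { $in{$_} } @coordinates) {
        $out{$_} = $in{$_} // 0 foreach @coordinates;
        $out{coordinates_image_size} = $in{coordinates_image_size}
            if defined $in{coordinates_image_size};
    }

    # Keep the flags only if they are switched on
    foreach my $flag (qw(normalize white_magic)) {
        my $value = $in{$flag};
        $out{$flag} = $value if $value and $value ne "false";
    }

    # An empty result is reported as undef
    return %out ? \%out : undef;
}

my $normalized_ref = normalize_parameters({angle => 0, white_magic => "true"});
print join(",", sort keys %$normalized_ref), "\n";
```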

process_image_crop ( $user_id, $product_ref, $image_type, $image_lc, $imgid, $angle, $normalize, $white_magic, $x1, $y1, $x2, $y2, $coordinates_image_size )

Select and possibly crop an uploaded image to represent the front, ingredients, nutrition or packaging image in a specific language.

Return values

 1: crop done
-1: image not found
-2: image cannot be read

get_image_url ($product_ref, $image_ref, $size)

Return the URL of the image in the requested size.

Note: $image_ref in selected.images.[image type].[image code] does not contain the id field with the image type and language code (which are keys). It must be added to the image_ref before calling this function.

get_image_in_best_language ($product_ref, $image_type, $target_lc, $image_lc_ref = undef)

We return the image object in the best language available for the image type, in the order of preference:

- $target_lc
- main language of the product
- English
- any other available language (if any), in alphabetical order

Arguments

- $product_ref: the product reference
- $image_type: the image type (front, ingredients, nutrition, packaging)
- $target_lc: the target language code
- $image_lc_ref: a reference to return the language code of the image (optional)

Return values

- the image reference in the best language available, with an added "id" field containing the image type and language code (e.g. "front_en")

The language code of the best language is set in $image_lc_ref (if provided)
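The order of preference can be sketched with a simplified data shape. Here $images_by_lc_ref is a hypothetical hash of the selected images for one image type, keyed by language code, and best_image_lc is a hypothetical name. The real function works on the product structure and returns the image reference itself.

```perl
use strict;
use warnings;

# Sketch of the preference order: target language, main language of the
# product, English, then any other language in alphabetical order.
# Assumption: $images_by_lc_ref maps language codes to selected images.
sub best_image_lc {
    my ($images_by_lc_ref, $target_lc, $main_lc) = @_;
    foreach my $lc ($target_lc, $main_lc, "en", sort keys %$images_by_lc_ref) {
        return $lc if defined $lc and exists $images_by_lc_ref->{$lc};
    }
    return;
}

my %front_images = (de => {imgid => 2}, es => {imgid => 5});
print best_image_lc(\%front_images, "fr", "it"), "\n";
```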

add_images_urls_to_product ($product_ref, $target_lc, $specific_image_type = undef)

Add fields like image_[front|ingredients|nutrition|packaging]_[url|small_url|thumb_url] to a product object.

If it exists, the image for the target language will be returned, otherwise we will return the image in the main language of the product.

Parameters

$product_ref

Reference to a complete product or a subfield.

$target_lc

2 letter language code of the preferred language for the product images.

$specific_image_type

Optional parameter to specify the type of image to add. Default is to add all types.

data_to_display_image ( $product_ref, $image_type, $target_lc )

Generates a data structure to display a product image.

The resulting data structure can be passed to a template to generate HTML or the JSON data for a knowledge panel.

Arguments

Product reference $product_ref

Image type $image_type: one of [front|ingredients|nutrition|packaging]

Language code $target_lc

Return values

- Reference to a data structure with the data needed to display the image.
- undef if no image is available for the requested image type

display_image ( $product_ref, $image_type, $target_lc, $size )

Generate the HTML code to display a product image.

Arguments

Product reference $product_ref

Image type $image_type: one of [front|ingredients|nutrition|packaging]

Language code $target_lc

Size $size: one of $thumb_size, $small_size, $display_size

Return values

- HTML code to display the image

extract_text_from_image ( $product_ref, $id, $field, $ocr_engine, $results_ref )

Perform OCR for a specific image (either a source image, or a selected image) and return the results.

OCR can be performed with a locally installed Tesseract, or through Google Cloud Vision.

In the case of Google Cloud Vision, we also store the results of the OCR as a JSON file (requested through HTTP by Robotoff).

Arguments

product reference $product_ref

id of the image $id

Either a number like 1, 2 etc. to perform the OCR on a source image (1.jpg, 2.jpg) or a field name in the form of [front|ingredients|nutrition|packaging]_[2 letter language code].

If $id is a field name, the last selected image for that field is used.
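The two forms of $id can be told apart with simple pattern matching, as in this sketch. The helper classify_image_id and its return values are hypothetical. The module may validate $id differently.

```perl
use strict;
use warnings;

# Sketch: decide whether $id designates a source image (1, 2, ...) or a
# selected image field such as "ingredients_fr".
sub classify_image_id {
    my ($id) = @_;
    return "source" if $id =~ /^\d+$/;
    return "selected" if $id =~ /^(?:front|ingredients|nutrition|packaging)_[a-z]{2}$/;
    return "unknown";
}

foreach my $id (1, "ingredients_fr", "label") {
    print "$id: ", classify_image_id($id), "\n";
}
```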

field name $field

Field to update in the product object. e.g. ingredients_text_from_image, nutrition_text_from_image, packaging_text_from_image

OCR engine $ocr_engine

Either "tesseract" or "google_cloud_vision"

Results reference $results_ref

A hash reference to store the results.

send_image_to_cloud_vision ($image_path, $json_file, $features_ref, $gv_logs)

Calls the Google Cloud Vision API.

Arguments

$image_path - str path to image

$json_file - str path to the file where we will store OCR result as gzipped JSON

$features_ref - hash reference - the "features" parameter of Google Cloud Vision

This determines which detections will be performed. Remember that each feature has a cost.

@CLOUD_VISION_FEATURES_FULL and @CLOUD_VISION_FEATURES_TEXT are two constants you can use.

$gv_logs - file handle

A file where we write additional logs, specific to the service.

Response

Returns the JSON content of the response.
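For orientation, the body of a Google Cloud Vision images:annotate request can be built as sketched below, without sending anything over the network. This is not the module's code: build_annotate_request is a hypothetical name, and the sketch assumes that features are passed as a reference to a list of hashes such as {type => "TEXT_DETECTION"}, which is the shape the Vision API expects.

```perl
use strict;
use warnings;
use JSON::PP;
use MIME::Base64 qw(encode_base64);

# Sketch: build the JSON body of an images:annotate request.
# Assumption: $features_ref is a reference to a list of feature hashes.
sub build_annotate_request {
    my ($image_bytes, $features_ref) = @_;
    my $request_ref = {
        requests => [
            {
                # The image is sent inline as base64, without line breaks
                image => {content => encode_base64($image_bytes, "")},
                features => $features_ref,
            }
        ]
    };
    return JSON::PP->new->canonical->encode($request_ref);
}

my $body = build_annotate_request("fake image bytes", [{type => "TEXT_DETECTION"}]);
print "$body\n";
```

Requesting only the features that are needed keeps the cost of each call down, which is the point of having separate full and text-only feature lists.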

<<