# Predictions and insights
Robotoff's purpose is to generate predictions about Open Food Facts products from various sources: images, image OCR results, product metadata, etc.

A complete list of prediction types can be found in `robotoff.prediction.types`. The most common ones are `brand`, `label`, `category`, etc.

All predictions are stored in the PostgreSQL database, in the `prediction` table.
The most interesting prediction fields are the following:

- `barcode`: barcode of the product (string)
- `type`: prediction type
- `value`: the predicted untaxonomized value (ex: `carrefour` for the `brand` prediction type), optional
- `value_tag`: the predicted taxonomized value (ex: `en:organic` for the `label` prediction type), optional
- `source_image`: the path of the image the prediction was generated from (ex: `/847/000/700/5117/1.jpg`). May be null; it is mainly provided for OCR and object detection-based predictions.
- `automatic_processing`: a boolean indicating whether we're confident enough in the prediction to apply it automatically in Open Food Facts without human supervision. This does not mean it will indeed be applied automatically; please refer to the import mechanism description below to learn how automatic processing works.
- `data`: a JSON structure containing prediction data. It either complements `value` and `value_tag` with additional data, or contains the full prediction data.
- `predictor`: name of the predictor that generated the prediction. Every insight type has its own `predictor`s, but the most common ones are:
    - `universal-logo-detector` for predictions generated by the nearest-neighbors logo detector
    - `flashtext` for all predictions generated using the flashtext library
    - `regex` for all predictions generated using simple regexes
- `predictor_version`: a version ID used to decide, during import, when to replace existing predictions in the database with new ones and when to keep them. It is either an incrementing integer (for regex-based predictions) or the version of the model that generated the predictions.
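For illustration, a single prediction row could be represented as the Python dict below. All values here are made up for the example (except the image path, taken from above), and the real table has more columns:

```python
# Hypothetical prediction row, shown as a plain Python dict.
# Barcode, confidence and version values are invented for the example.
prediction = {
    "barcode": "8470007005117",
    "type": "label",
    "value": None,                        # untaxonomized value, unused here
    "value_tag": "en:organic",            # taxonomized value
    "source_image": "/847/000/700/5117/1.jpg",
    "automatic_processing": False,        # requires human validation
    "data": {"confidence": 0.92},         # extra predictor-specific data
    "predictor": "flashtext",
    "predictor_version": "1",
}
```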
From these predictions, we generate insights. Insights are refined predictions about the product that are directly actionable: if the insight is validated by a human, we can update the product accordingly. For example, we may have the following predictions on a (French) organic product:
- prediction 1:
    - type: `label`
    - value_tag: `en:organic`
- prediction 2:
    - type: `label`
    - value_tag: `fr:ab-agriculture-biologique`
As `fr:ab-agriculture-biologique` is a child of `en:organic` in the label taxonomy, we only generate a `label` insight with `value_tag=fr:ab-agriculture-biologique`. Furthermore, we don't generate any insight if the product already has the `fr:ab-agriculture-biologique` label.

This is the difference between insights and predictions: predictions can be viewed as raw data, and insights as actionable data.
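The "keep only the most specific tag" rule described above can be sketched as follows. This is a toy illustration, not Robotoff's actual implementation: the parent map is hand-written here, whereas Robotoff reads the real label taxonomy.

```python
# Toy parent map for the label taxonomy (hand-written assumption).
PARENTS = {
    "fr:ab-agriculture-biologique": {"en:organic"},
    "en:organic": set(),
}

def ancestors(tag):
    """Return all transitive ancestors of a taxonomy tag."""
    seen, stack = set(), [tag]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def most_specific(tags):
    """Keep only tags that are not ancestors of another predicted tag."""
    covered = set().union(*(ancestors(t) for t in tags)) if tags else set()
    return [t for t in tags if t not in covered]
```

With the two predictions above, `most_specific(["en:organic", "fr:ab-agriculture-biologique"])` keeps only the more specific `fr:ab-agriculture-biologique`.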
Most insight types are generated directly from their corresponding prediction types (ex: `brand`, `label`, etc.). However, an insight type can also require predictions of several types: this allows the generation of complex insight types.
Insights can either be applied automatically or require a human annotation.
Insights are saved in the `product_insight` table in the database. Insight fields are a superset of prediction fields; the most interesting additional fields are:

- `annotation`: either `0` (incorrect insight), `1` (correct), or `-1` (invalid). Automatically applied insights have `annotation=1`.
- `automatic_processing`: if `True`, the insight will be applied automatically
- `completed_at`: timestamp of the annotation (either by a human or automatic)
- `username`: username of the human annotator (if any)
- `process_after`: used to add a delay between insight generation and processing for automatically processable insights (to avoid the product being overwritten by third-party apps)
- `reserved_barcode`: if `True`, the product has a reserved barcode; it's probably a variable-weight product. By default, we don't show any question about reserved-barcode products in the `/questions/*` API routes.
## Import mechanism
Once the predictions are generated, we use the `import_insights` function (in `robotoff.insights.importer`) to import the predictions and insights into the database.
We start by importing predictions into the `prediction` table:

- We check that predictions are valid, i.e. that the product still has the image associated with the prediction (when `source_image` is not null) and that the product still exists in the Product Opener database. This check is disabled if `ENABLE_MONGODB_ACCESS=0` (the default setting only for local environments).
- We then group predictions by product barcode, and delete all existing predictions that have the same `barcode`, `server_type`, `type` and `source_image` but a different `predictor_version`.
- We import the remaining predictions, but only if no other prediction with the same `(type, server_type, source_image, value_tag, value, predictor, automatic_processing)` values exists.

`predictor_version` is therefore used to control when to keep previous predictions and when to delete them.
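The deletion rule from the second step can be sketched like this. It is a simplification with predictions as plain dicts; Robotoff's real importer works on ORM objects and also handles the grouping by barcode.

```python
# Fields whose combination identifies predictions that should be
# replaced when a newer predictor_version arrives (per the doc above).
STALE_KEY = ("barcode", "server_type", "type", "source_image")

def to_delete(existing, incoming):
    """Return existing predictions that share the stale key with an
    incoming prediction but carry a different predictor_version (sketch)."""
    new_versions = {
        tuple(p[f] for f in STALE_KEY): p["predictor_version"] for p in incoming
    }
    return [
        p for p in existing
        if (key := tuple(p[f] for f in STALE_KEY)) in new_versions
        and p["predictor_version"] != new_versions[key]
    ]
```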
From predictions, we then generate and import insights. Insight objects are created on the fly by `InsightImporter`s.
Every insight type has a single `InsightImporter` that is in charge of creating/updating/deleting insights from a list of predictions.
The importer must have a `generate_candidates` method that returns a list of candidate insights; `ProductInsight`s should be created from predictions in this method. This is also where all the selection logic lives (what prediction data to keep, what to ignore).
From the list of candidate insights, we update the `product_insight` table by deleting all existing insights and importing all candidates. The import mechanism is actually a bit smarter: we avoid unnecessary DB insertions/deletions by trying to match each candidate with a reference insight, i.e. an insight that is already in the DB.
Once insights are refreshed/imported, they become available as questions through the `/questions/*` API routes. Automatically processable insights are applied by the scheduler, once `current_timestamp >= ${process_after}`.
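The scheduler's selection rule can be sketched as below, using the insight fields described earlier. This is a simplification under assumed conditions; the actual scheduler query may check additional criteria.

```python
from datetime import datetime, timezone

def to_apply(insights, now=None):
    """Insights the scheduler would apply automatically: marked as
    automatically processable, not yet annotated, and whose
    process_after delay has elapsed (sketch, insights as dicts)."""
    now = now or datetime.now(timezone.utc)
    return [
        insight for insight in insights
        if insight["automatic_processing"]
        and insight["annotation"] is None
        and insight["process_after"] <= now
    ]
```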