# Predictions and insights
Robotoff's purpose is to generate predictions about Open Food Facts products from various sources: images, image OCR results, product metadata, etc.

A complete list of prediction types can be found in `robotoff.prediction.types`. The most common ones are `brand`, `label`, `category`, etc.

All predictions are stored in the PostgreSQL database, in the `prediction` table.
The most interesting prediction fields are the following:

- `barcode`: barcode of the product (string)
- `type`: prediction type
- `value`: the predicted untaxonomized value (ex: `carrefour` for the `brand` prediction type), optional
- `value_tag`: the predicted taxonomized value (ex: `en:organic` for the `label` prediction type), optional
- `source_image`: the path of the image the prediction was generated from (ex: `/847/000/700/5117/1.jpg`). May be null; it is mainly provided for OCR and object detection-based predictions.
- `automatic_processing`: a boolean indicating whether we're confident enough in the prediction to apply it automatically in Open Food Facts without human supervision. This does not mean it will indeed be applied automatically; please refer to the import mechanism description below to learn how automatic processing works.
- `data`: a JSON structure containing prediction data. It either complements `value` and `value_tag` with additional data, or contains the full prediction data.
- `predictor`: name of the predictor that generated the prediction. Every insight type has its own `predictor`s, but the most common ones are:
    - `universal-logo-detector` for predictions generated by the nearest-neighbors logo detector
    - `flashtext` for all predictions generated using the flashtext library
    - `regex` for all predictions generated using simple regexes
- `predictor_version`: a version ID used to decide, during import, when to replace existing predictions in the database with new ones and when to keep them. It is either an incrementing integer (for regex-based predictions) or the version of the model that generated the predictions.
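For illustration, a single prediction row could be represented as the Python dict below. All values here are made up for the example (except the image path, taken from above), and the real table has more columns:

```python
# Hypothetical prediction row, shown as a plain Python dict.
# Barcode, confidence and version values are invented for the example.
prediction = {
    "barcode": "8470007005117",
    "type": "label",
    "value": None,                        # untaxonomized value, unused here
    "value_tag": "en:organic",            # taxonomized value
    "source_image": "/847/000/700/5117/1.jpg",
    "automatic_processing": False,        # requires human validation
    "data": {"confidence": 0.92},         # extra predictor-specific data
    "predictor": "flashtext",
    "predictor_version": "1",
}
```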
From these predictions, we generate insights. Insights are refined predictions about the product that are directly actionable: if the insight is validated by a human, we can update the product accordingly. For example, we may have the following predictions on a (French) organic product:
- prediction 1:
    - type: `label`
    - value_tag: `en:organic`
- prediction 2:
    - type: `label`
    - value_tag: `fr:ab-agriculture-biologique`
As `fr:ab-agriculture-biologique` is a child of `en:organic` in the label taxonomy, we only generate a `label` insight with `value_tag=fr:ab-agriculture-biologique`. Furthermore, we don't generate any insight if the product already has the `fr:ab-agriculture-biologique` label.

This is the difference between insights and predictions: predictions can be viewed as raw data, and insights as actionable data.
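The "keep only the most specific tag" rule described above can be sketched as follows. This is a toy illustration, not Robotoff's actual implementation: the parent map is hand-written here, whereas Robotoff reads the real label taxonomy.

```python
# Toy parent map for the label taxonomy (hand-written assumption).
PARENTS = {
    "fr:ab-agriculture-biologique": {"en:organic"},
    "en:organic": set(),
}

def ancestors(tag):
    """Return all transitive ancestors of a taxonomy tag."""
    seen, stack = set(), [tag]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def most_specific(tags):
    """Keep only tags that are not ancestors of another predicted tag."""
    covered = set().union(*(ancestors(t) for t in tags)) if tags else set()
    return [t for t in tags if t not in covered]
```

With the two predictions above, `most_specific(["en:organic", "fr:ab-agriculture-biologique"])` keeps only the more specific `fr:ab-agriculture-biologique`.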
Most insight types are generated directly from their corresponding prediction types (ex: `brand`, `label`, etc.). However, an insight type can also require predictions of several types: this allows the generation of complex insight types.
Insights can either be applied automatically or require a human annotation.
Insights are saved in the `product_insight` table in the database. Insight fields are a superset of prediction fields; the most interesting additional fields are:

- `annotation`: either `0` (incorrect insight), `1` (correct), or `-1` (invalid). Automatically applied insights have `annotation=1`.
- `automatic_processing`: if `True`, the insight will be applied automatically
- `completed_at`: timestamp of the annotation (either by a human or automatic)
- `username`: username of the human annotator (if any)
- `process_after`: used to add a delay between insight generation and processing for automatically processable insights (to avoid the product being overwritten by third-party apps)
- `reserved_barcode`: if `True`, the product has a reserved barcode; it's probably a variable-weight product. By default, we don't show any question about reserved-barcode products in the `/questions/*` API routes.
## Import mechanism
Once the predictions are generated, we use the `import_insights` function (in `robotoff.insights.importer`) to import the predictions and insights into the database.
We start by importing predictions into the `prediction` table:

- We check that predictions are valid, i.e. that the product still has the image associated with the prediction (when `source_image` is not null) and that the product still exists in the Product Opener database. This check is disabled if `ENABLE_MONGODB_ACCESS=0` (the default setting only for local environments).
- We then group predictions by product barcode, and delete all existing predictions that have the same `barcode`, `server_type`, `type` and `source_image` but a different `predictor_version`.
- We import the remaining predictions, but only if no other prediction with the same `(type, server_type, source_image, value_tag, value, predictor, automatic_processing)` values exists.

`predictor_version` is therefore used to control when to keep previous predictions and when to delete them.
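The deletion rule from the second step can be sketched like this. It is a simplification with predictions as plain dicts; Robotoff's real importer works on ORM objects and also handles the grouping by barcode.

```python
# Fields whose combination identifies predictions that should be
# replaced when a newer predictor_version arrives (per the doc above).
STALE_KEY = ("barcode", "server_type", "type", "source_image")

def to_delete(existing, incoming):
    """Return existing predictions that share the stale key with an
    incoming prediction but carry a different predictor_version (sketch)."""
    new_versions = {
        tuple(p[f] for f in STALE_KEY): p["predictor_version"] for p in incoming
    }
    return [
        p for p in existing
        if (key := tuple(p[f] for f in STALE_KEY)) in new_versions
        and p["predictor_version"] != new_versions[key]
    ]
```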
From predictions, we then generate and import insights. Insight objects are created on the fly by `InsightImporter`s.
Every insight type has a single `InsightImporter` that is in charge of creating/updating/deleting insights from a list of predictions.
The importer must have a `generate_candidates` method that returns a list of candidate insights; `ProductInsight`s should be created from predictions in this method. This is also where all the selection logic lives (what prediction data to keep, what to ignore).
From the list of candidate insights, we update the `product_insight` table by deleting all existing insights and importing all candidates. The import mechanism is actually a bit smarter: we avoid unnecessary DB insertions/deletions by trying to match each candidate with a reference insight, i.e. an insight that is already in the DB.
Once insights are refreshed/imported, they become available as questions through the `/questions/*` API routes. Automatically processable insights are applied by the scheduler, once `current_timestamp >= ${process_after}`.
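The scheduler's selection rule can be sketched as below, using the insight fields described earlier. This is a simplification under assumed conditions; the actual scheduler query may check additional criteria.

```python
from datetime import datetime, timezone

def to_apply(insights, now=None):
    """Insights the scheduler would apply automatically: marked as
    automatically processable, not yet annotated, and whose
    process_after delay has elapsed (sketch, insights as dicts)."""
    now = now or datetime.now(timezone.utc)
    return [
        insight for insight in insights
        if insight["automatic_processing"]
        and insight["annotation"] is None
        and insight["process_after"] <= now
    ]
```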