Robotoff Architecture

Robotoff is made of several services:

the public API service
the scheduler, responsible for launching recurrent tasks (downloading new datasets, processing insights automatically, ...) ¹
the workers, responsible for all long-lasting tasks
a redis instance

The API and the workers communicate through Redis using rq. ²

Jobs are sent through rq messaging queues. We currently have two types of queues:

High-priority queues, used when a product is updated/deleted, or when a new image is uploaded. All jobs associated with a product are always sent to the same queue, based on the product barcode ³. This way, we greatly reduce the risk of concurrent processing for the same product (DB deadlocks or integrity errors).
A low-priority queue, robotoff-low, used for all lower-priority jobs.
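To illustrate the barcode-based routing described above, here is a minimal sketch. The queue names, the number of high-priority queues, and the use of a CRC32 hash are assumptions made for the example; the actual selection logic is in the get_high_queue function of robotoff.workers.queues and may differ.

```python
import zlib

# Hypothetical queue names and count, for illustration only.
HIGH_QUEUES = [f"robotoff-high-{i}" for i in range(1, 5)]
LOW_QUEUE = "robotoff-low"


def pick_high_queue(barcode: str, queues: list[str] = HIGH_QUEUES) -> str:
    """Deterministically map a product barcode to one high-priority queue."""
    # crc32 is stable across processes, unlike the built-in hash() on str,
    # so every API process routes a given barcode to the same queue.
    index = zlib.crc32(barcode.encode("utf-8")) % len(queues)
    return queues[index]
```

Because the mapping depends only on the barcode, two jobs for the same product always land in the same queue, which is the property the text relies on to limit concurrent processing.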

We also have two kinds of workers, low-priority and high-priority workers: worker_low and worker_high respectively. All types of workers handle high-priority jobs first. Each worker listens to a single high-priority queue. Only low-priority workers can handle low-priority jobs. Because only a limited number of workers can handle such jobs, low-priority jobs cannot use excessive system resources.
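The worker rules above can be sketched as a function that returns the queues a worker listens to, in priority order. The function name and the example queue name are hypothetical; the sketch relies on the fact that an rq worker given several queues drains them in the order they are listed.

```python
LOW_QUEUE = "robotoff-low"


def queues_for_worker(kind: str, high_queue: str) -> list[str]:
    """Return the queue names a worker should listen to, highest priority first."""
    if kind == "worker_high":
        # High-priority workers never pick up low-priority jobs.
        return [high_queue]
    if kind == "worker_low":
        # The high-priority queue comes first, so low-priority jobs
        # only run when that queue is empty.
        return [high_queue, LOW_QUEUE]
    raise ValueError(f"unknown worker kind: {kind}")
```

For example, `queues_for_worker("worker_low", "robotoff-high-1")` yields the high-priority queue followed by robotoff-low.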

Robotoff predicts many kinds of information about products (these predictions are the basis of insights), mostly from product images or OCR results.

Each time a contributor uploads a new image on Open Food Facts, the text on this image is extracted using Google Cloud Vision, an OCR (Optical Character Recognition) service. Robotoff receives a new event through a webhook each time this occurs, with the URLs of the image and the resulting OCR (as a JSON file). We use simple string matching algorithms to find patterns in the OCR text to generate new predictions ⁴.
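As an illustration of pattern matching on OCR text, the sketch below extracts product weight candidates with a regular expression. The pattern, the function name, and the shape of the returned dictionaries are assumptions for the example and do not reproduce Robotoff's actual extractors.

```python
import re

# Illustrative pattern: a number followed by a mass or volume unit,
# e.g. "250 g" or "1,5 kg".
WEIGHT_PATTERN = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*(kg|g|ml|cl|l)\b", re.IGNORECASE)


def find_weight_predictions(ocr_text: str) -> list[dict]:
    """Return one prediction-like dict per weight mention found in the OCR text."""
    predictions = []
    for match in WEIGHT_PATTERN.finditer(ocr_text):
        # Accept both "1.5" and "1,5" as decimal notation.
        value = float(match.group(1).replace(",", "."))
        predictions.append(
            {
                "type": "product_weight",
                "value": value,
                "unit": match.group(2).lower(),
                "text": match.group(0),
            }
        )
    return predictions
```

Keeping the matched text alongside the parsed value makes it possible to show an annotator exactly which part of the OCR output produced the prediction.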

We also use ML models to detect objects in images. ⁵

One model tries to detect any logo ⁷. Detected logos are then embedded in a vector space using the OpenAI pre-trained model CLIP-vit-base-patch32. In this space we use a k-nearest-neighbor approach to try to classify the logo, predicting a brand or a label. Hunger Games also collects user annotations to provide ground truth (logo game).
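The k-nearest-neighbor step can be sketched in a few lines with toy 3-dimensional vectors standing in for CLIP embeddings. Cosine similarity, the majority vote, and the value of k are assumptions for the example; in Robotoff the neighbor search is delegated to Elasticsearch rather than computed by brute force.

```python
import math
from collections import Counter


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_classify(query: list[float], labelled: list[tuple[list[float], str]], k: int = 3) -> str:
    """Return the majority label among the k annotated logos closest to the query."""
    ranked = sorted(
        labelled,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy annotated logos (embedding, label); real embeddings have many more dimensions.
ANNOTATED_LOGOS = [
    ([1.0, 0.0, 0.0], "brand:a"),
    ([0.9, 0.1, 0.0], "brand:a"),
    ([0.0, 1.0, 0.0], "label:organic"),
    ([0.0, 0.9, 0.1], "label:organic"),
]
```

With this toy data, a query such as `[0.95, 0.05, 0.0]` sits next to the two "brand:a" logos, so they win the vote among its three nearest neighbors.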

Another computer vision model tries to detect the Nutri-Score grade (A to E).

The above detections generate predictions which in turn generate many types of insights ⁶:

labels
stores
packager codes
packaging
product weight
expiration date
brand
...

Predictions and insights are both stored in the PostgreSQL database.

These new insights are then accessible to all annotation tools (Hunger Games, mobile apps, ...), which can validate or reject them.

If the insight is validated by an authenticated user, it's applied immediately and the product is updated through the Product Opener API ⁹. If it's reported as invalid, no update is performed, but the insight is marked as annotated so that it is not suggested to another annotator. If the user is not authenticated, a voting system is used (3 consistent votes trigger the insight application).
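The decision rules above can be sketched as follows. The function name, the return values, and the encoding of votes (1 for valid, 0 for invalid) are hypothetical. The text only states that 3 consistent votes trigger application; treating 3 "invalid" votes symmetrically is an assumption of this sketch.

```python
from collections import Counter

VOTES_REQUIRED = 3  # from the text: 3 consistent votes trigger application


def resolve_annotation(value: int, authenticated: bool, earlier_votes: list[int]) -> str:
    """Decide what to do with an annotation (1 = valid, 0 = invalid)."""
    if authenticated:
        # Authenticated users are trusted: their answer takes effect immediately.
        return "apply" if value == 1 else "mark_annotated"
    # Anonymous users only cast a vote.
    votes = Counter(earlier_votes)
    votes[value] += 1
    if votes[1] >= VOTES_REQUIRED:
        return "apply"
    if votes[0] >= VOTES_REQUIRED:
        # Assumption: enough "invalid" votes close the insight without an update.
        return "mark_annotated"
    return "wait_for_votes"
```

For instance, a third anonymous "valid" vote, `resolve_annotation(1, False, [1, 1])`, returns "apply", while a first or second one returns "wait_for_votes".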

Some insights with high confidence are applied automatically, 10 minutes after import.
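A minimal sketch of the 10-minute delay, assuming each insight records its import time. The function names are hypothetical; how Robotoff actually schedules automatic application is not described here.

```python
from datetime import datetime, timedelta, timezone

AUTOMATIC_DELAY = timedelta(minutes=10)  # from the text: applied 10 minutes after import


def process_after(imported_at: datetime) -> datetime:
    """Earliest time at which a high-confidence insight may be applied automatically."""
    return imported_at + AUTOMATIC_DELAY


def is_due(imported_at: datetime, now: datetime) -> bool:
    """True once the delay has elapsed."""
    return now >= process_after(imported_at)
```

A periodic task could call `is_due(insight_import_time, datetime.now(timezone.utc))` to select the insights that are ready to be applied.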

Robotoff is also notified by Product Opener every time a product is updated or deleted ⁸. This is used to delete insights associated with deleted products, or to update them accordingly.

Other services

Robotoff also depends on the following services:

a single-node Elasticsearch instance, used to index all logos and run approximate nearest neighbor (ANN) search for automatic logo classification ⁷
a Triton instance, used to serve object detection models (nutriscore, nutrition-table, universal-logo-detector) ¹⁰
MongoDB, used to fetch the latest version of a product without querying the Product Opener API

¹ See scheduler.run
² See robotoff.workers.queues and robotoff.workers.tasks
³ See the get_high_queue function in robotoff.workers.queues
⁴ See robotoff.models.Prediction
⁵ See robotoff.models.ImagePrediction and robotoff.workers.tasks.import_image.run_import_image_job
⁶ See robotoff.models.ProductInsight
⁷ See robotoff.models.ImageAnnotation and robotoff.logos
⁸ See workers.tasks.product_updated and workers.tasks.delete_product_insights_job
⁹ See robotoff.insights.annotate
¹⁰ See docker/ml.yml