How to download product images#
The preferred method of downloading Open Food Facts images depends on what you wish to achieve.
If you want to download a few images (say up to 10), especially if these images have been uploaded recently, you should download the image from the Open Food Facts server.
If you plan to download more images, you should instead use the Open Food Facts images dataset hosted on AWS.
Download from AWS#
This is the recommended option if you want to download many images: AWS S3 is faster and allows concurrent downloads, unlike the Open Food Facts server, where you should preferably download images one at a time. See AWS Images dataset for more information about how to download images from the AWS dataset.
Download from Open Food Facts server#
All images are hosted under the https://images.openfoodfacts.org/images/products/ folder, but you have to build the right URL from the product information.
Computing single product image folder#
Images of a product are stored in a single directory. The path of this directory can be inferred easily from the product barcode. There are two cases:
- If the product barcode is 8 digits long or shorter (ex: "22222222"), the directory path is simply the barcode: https://images.openfoodfacts.org/images/products/{barcode}.
- Otherwise, split the first 9 digits of the barcode into 3 groups of 3 digits to get the first 3 folder names, and use the rest of the barcode as the last folder name^[split-regexp]. For example, barcode 3435660768163 is split into 343/566/076/8163, thus product images will be in https://images.openfoodfacts.org/images/products/343/566/076/8163
^[split-regexp]: The following regex can be used to split the barcode into subfolders: /^(...)(...)(...)(.*)$/
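The two cases above can be sketched in Python as follows. This is a minimal illustration rather than official Open Food Facts code: the function name `product_image_folder` and the `BASE_URL` constant are names chosen for this example, and the regex is the one given in the footnote.

```python
import re

# Base folder given in the text above.
BASE_URL = "https://images.openfoodfacts.org/images/products"


def product_image_folder(barcode: str) -> str:
    """Return the image folder URL for a barcode (hypothetical helper)."""
    if len(barcode) <= 8:
        # Short barcodes are used directly as the folder name.
        return f"{BASE_URL}/{barcode}"
    # Longer barcodes: three groups of 3 digits, then the remainder.
    parts = re.match(r"^(...)(...)(...)(.*)$", barcode).groups()
    return f"{BASE_URL}/" + "/".join(parts)


print(product_image_folder("3435660768163"))
print(product_image_folder("22222222"))
```

The sketch assumes the barcode is passed as a string, so that any leading zeros are preserved.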
Computing single image file name#
The previous section gives the folder name. Now we need the filename inside that folder for a particular image.
Understanding images data#
To get the image file names, we have to use the database dump or the API.
All image information is stored in the images field.
For example, for product 3168930010883, we have (data trimmed):
{
  "1": {
    "sizes": {
      "full": {
        "w": 850,
        "h": 1200
      },
      "100": {
        "h": 100,
        "w": 71
      },
      "400": {
        "h": 400,
        "w": 283
      }
    },
    "uploader": "kiliweb",
    "uploaded_t": "1527184614"
  },
  "front_fr": {
    "x1": null,
    "angle": null,
    "y2": null,
    "white_magic": "0",
    "imgid": "1",
    "rev": "4",
    "sizes": {
      "200": {
        "w": 142,
        "h": 200
      },
      "full": {
        "w": 850,
        "h": 1200
      },
      "400": {
        "h": 400,
        "w": 283
      },
      "100": {
        "w": 71,
        "h": 100
      }
    },
    "y1": null,
    "normalize": "0",
    "geometry": "0x0-0-0",
    "x2": null
  }
}
The keys of the map are the keys of the images. These keys can be:
- digits: the image is the raw image sent by the contributor (full resolution).
- selected images:
  * front_{lang} corresponds to the front product image in the language with code lang
  * ingredients_{lang} corresponds to the ingredients image in the language with code lang
  * nutrition_{lang} is the same but for nutrition data
  * packaging_{lang} is the same but for packaging logos
lang is a 2-letter ISO 639-1 language code (fr, en, es, …).
Each image is available in different resolutions: 100, 200, 400 or full, each corresponding to image height (full means not resized).
The available resolutions can be found in the sizes subfield.
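As an illustration of these key conventions, the following sketch separates raw images from selected images and lists the resolutions found in each sizes subfield. The helper name `describe_image_keys` is hypothetical, and `sample_images` is a reduced copy of the data for product 3168930010883 shown earlier.

```python
def describe_image_keys(images: dict) -> dict:
    """Group image keys into raw and selected images (hypothetical helper)."""
    result = {"raw": {}, "selected": {}}
    for key, info in images.items():
        # Resolutions available for this image, e.g. ["100", "400", "full"].
        sizes = sorted(info.get("sizes", {}))
        if key.isdigit():
            # Numeric key: raw image sent by the contributor.
            result["raw"][key] = sizes
        else:
            # Selected keys look like "<type>_<lang>", e.g. "front_fr".
            image_type, _, lang = key.rpartition("_")
            result["selected"][key] = {"type": image_type, "lang": lang, "sizes": sizes}
    return result


# Reduced copy of the images field shown above.
sample_images = {
    "1": {
        "sizes": {
            "full": {"w": 850, "h": 1200},
            "100": {"h": 100, "w": 71},
            "400": {"h": 400, "w": 283},
        },
    },
    "front_fr": {
        "imgid": "1",
        "rev": "4",
        "sizes": {
            "200": {"w": 142, "h": 200},
            "full": {"w": 850, "h": 1200},
            "400": {"h": 400, "w": 283},
            "100": {"w": 71, "h": 100},
        },
    },
}

print(describe_image_keys(sample_images))
```

In this sample the raw image has no 200 entry in sizes, which is why reading the sizes subfield is safer than assuming every resolution exists.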
Filename for a raw image#
For a raw image (the one under a numeric key in the images field), the filename is very easy to compute:
- just take the image digit + .jpg for full resolution
- image digit + . + resolution + .jpg for a lower resolution
For our example above, the filename for image "1"
- in resolution 400px is 1.400.jpg
- in full resolution, it is 1.jpg
So, adding the folder part, the final url for our example is:
- https://images.openfoodfacts.org/images/products/316/893/001/0883/1.jpg for the full image
- https://images.openfoodfacts.org/images/products/316/893/001/0883/1.400.jpg for the 400px version
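The raw image rule can be written as a small function. This is only a sketch: `raw_image_filename` is a name chosen for this example, and the folder URL is the one computed for product 3168930010883.

```python
def raw_image_filename(image_id: str, resolution: str = "full") -> str:
    """Return the filename of a raw image (hypothetical helper)."""
    if resolution == "full":
        # Full resolution: no resolution part in the filename.
        return f"{image_id}.jpg"
    # Lower resolution: image id, a dot, then the resolution.
    return f"{image_id}.{resolution}.jpg"


folder = "https://images.openfoodfacts.org/images/products/316/893/001/0883"
print(f"{folder}/{raw_image_filename('1')}")
print(f"{folder}/{raw_image_filename('1', '400')}")
```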
Filename for a selected image#
In the structure, selected images have additional fields:
- rev (as revision) indicates the revision number of the image to use (each time a new image is selected, cropped or rotated, a new image with an incremented rev is generated).
- imgid, the image ID of the raw image used to generate the selected image.
- angle, x1, x2, y1, y2: rotation angle and cropping coordinates (so that the image can be regenerated from the raw image)
For selected images, the filename is the image key followed by the revision number and the resolution: <image_name>.<rev>.<resolution>.jpg.
The resolution must always be specified, but you can use the full keyword to get the full resolution image.
In our example above, the filename for the front image in French (front_fr key) is:
- front_fr.4.400.jpg for the 400px version
- front_fr.4.full.jpg for the full resolution version
So, adding the folder part, the final url for our example is:
- https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.full.jpg for the full image
- https://images.openfoodfacts.org/images/products/316/893/001/0883/front_fr.4.400.jpg for the 400px version
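The selected image rule can be sketched the same way. Here `selected_image_filename` is a hypothetical helper, and `front_fr` holds only the two fields of the example entry that matter for the filename.

```python
def selected_image_filename(image_key: str, image_info: dict, resolution: str = "full") -> str:
    """Return the filename of a selected image (hypothetical helper)."""
    # The revision is read from the image entry itself.
    rev = image_info["rev"]
    # Unlike raw images, the resolution is always part of the filename.
    return f"{image_key}.{rev}.{resolution}.jpg"


# Relevant fields of the "front_fr" entry shown earlier.
front_fr = {"imgid": "1", "rev": "4"}

folder = "https://images.openfoodfacts.org/images/products/316/893/001/0883"
print(f"{folder}/{selected_image_filename('front_fr', front_fr)}")
print(f"{folder}/{selected_image_filename('front_fr', front_fr, '400')}")
```

Because rev changes whenever the image is re-selected, cropped or rotated, it should be read from fresh product data rather than hard-coded.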
A Python snippet#
If the product data is available in a dict named product_data, the Python code to build the image URL would be something like:
import re


def get_image_url(product_data, image_name, resolution="full"):
    # unknown image key: nothing to build
    if image_name not in product_data["images"]:
        return None
    base_url = "https://images.openfoodfacts.org/images/products"
    # get product folder name
    folder_name = product_data["code"]
    if len(folder_name) > 8:
        # split into 3 groups of 3 digits, then the rest of the barcode
        folder_name = re.sub(r'^(...)(...)(...)(.*)$', r'\1/\2/\3/\4', folder_name)
    # get filename
    if re.match(r"^\d+$", image_name):  # only digits
        # raw image
        resolution_suffix = "" if resolution == "full" else f".{resolution}"
        filename = f"{image_name}{resolution_suffix}.jpg"
    else:
        # selected image
        rev = product_data["images"][image_name]["rev"]
        filename = f"{image_name}.{rev}.{resolution}.jpg"
    # join things together
    return f"{base_url}/{folder_name}/{filename}"