Moving OFF to scaleway#
After moving opff, obf, opf and off-pro, it's time to move the last component to scaleway: the openfoodfacts container (aka off).
We not only have to move the off container, but also all the ZFS datasets that are common to off and the other instances. So we will need to reconfigure and restart all VMs to point to the right ZFS datasets (currently they are using NFS mounts).
Preparing stunnel to off-query#
I first had to add an ipv6 ip to my stunnel client. I tried hard to use a ULA, having the proxmox host act as a NAT66, but I failed… So for the moment:
- I added a public ipv6 to the container
- although the iptables should stop non private connections, I preferred to only accept stunnel connections on 10.13.1.101 (for all exposed stunnel entries)
On OSM45, I edited stunnel server configuration just to add a PSK in /etc/stunnel/psk/off-query-psk.txt.
I added this PSK to scaleway-stunnel-client-secrets.yml and ran:
ansible-playbook sites/stunnel-client.yml -l scaleway-stunnel-client --tags stunnel
I edited the stunnel config on scaleway-stunnel-client to add:
# off query at moji
[MojiOffQuery]
client = yes
# BEWARE: as we have a public ipv6, only accept on private ipv4
accept = 10.13.1.101:16001
connect = 2a06:c484:5::102:16001
PSKsecrets = /etc/stunnel/psk/moji-off-query-psk.txt
Checked with stunnel /etc/stunnel/off.conf
And restarted: systemctl restart stunnel@off
Checked it's still ok: systemctl status stunnel@off
Checked that I can reach off-query:
curl 10.13.1.101:16001/health
{"status":"ok","info":{"postgres":{"status":"up","info":{"last_scheduled_update":"2026-04-09T14:31:19Z"}},"mongodb":{"status":"up"},"redis":{"status":"up","info":{"last-generated-id":"1777932576222-0","last-processed-id":"1777932576222-0"}}}}
FIXME try to use a ULA ipv6 nated by the proxmox host (seems harder than it should be…)
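The manual curl check above can be scripted. The following is a minimal sketch, assuming the health endpoint keeps returning compact JSON of the shape shown above; the function name health_is_ok is hypothetical, and it uses plain grep rather than a real JSON parser, so it is only a rough check.

```shell
# Hypothetical helper: reads a health JSON document on stdin and succeeds
# only if the top-level status is "ok" and no component reports "down".
# Assumptions: the JSON is compact (no spaces), as in the output shown above,
# and a failing component would report "status":"down".
health_is_ok() {
    local body
    body=$(cat)
    # the top-level status must be ok
    printf '%s' "$body" | grep -q '^{"status":"ok"' || return 1
    # no component may be reported as down
    if printf '%s' "$body" | grep -q '"status":"down"'; then
        return 1
    fi
    return 0
}
```

It could then be used as: curl -s 10.13.1.101:16001/health | health_is_ok && echo "off-query reachable"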
moving users and orgs to nvme#
We will take the opportunity of moving to scaleway to move some datasets to nvme,
because we write to them quite a lot.
I will move orgs and users.
(Note that off/logs is also a good candidate, but it is too heavy right now because of the archives;
we should find a better approach for that one.)
For this, I did a sync of the users and orgs datasets to the nvme disk (not directly to their final destination, to avoid sanoid creating conflicting snapshots)
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/users zfs-nvme/off-backups/off2-zfs-nvme/off/users
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/orgs zfs-nvme/off-backups/off2-zfs-nvme/off/orgs
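Before relying on the copied datasets, it can help to confirm that source and target end on the same snapshot. This is a minimal sketch, assuming zfs list prints the snapshots of a dataset oldest first (the same assumption as the tail -n 1 check used later in this document); latest_snap_suffix and same_latest_snap are hypothetical helper names.

```shell
# Hypothetical helper: reads `zfs list -H -t snap -o name <dataset>` output on
# stdin and prints the snapshot part (after the @) of the last line.
latest_snap_suffix() {
    tail -n 1 | sed -e 's/^[^@]*@//'
}

# Hypothetical helper: succeeds if both datasets end with the same snapshot name.
same_latest_snap() {
    local src dst
    src=$(zfs list -H -t snap -o name "$1" | latest_snap_suffix)
    dst=$(zfs list -H -t snap -o name "$2" | latest_snap_suffix)
    [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

For example: same_latest_snap zfs-hdd/off-backups/off2-zfs-hdd/off/users zfs-nvme/off-backups/off2-zfs-nvme/off/users && echo "users in sync"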
Preparing target off container#
First create the 111 container named off using ansible (see proxmox - How to create a new container with ansible) following what was done for 115 (opff).
On scaleway-01:
- shutdown the container:
pct shutdown 111
- edit the container configuration to add mountpoints: FIXME
mp0: /zfs-hdd/podata/off,mp=/mnt/off
mp1: /zfs-hdd/podata/off/cache,mp=/mnt/off/cache
mp2: /zfs-hdd/podata/off/html_data,mp=/mnt/off/html_data
mp3: /zfs-hdd/podata/off/logs,mp=/mnt/off/logs
mp4: /zfs-nvme/podata/products,mp=/mnt/off/products
mp5: /zfs-hdd/podata/images,mp=/mnt/off/images
mp6: /zfs-nvme/podata/users,mp=/mnt/off/users
mp7: /zfs-nvme/podata/orgs,mp=/mnt/off/orgs
mp8: /zfs-hdd/podata/pro_export_files,mp=/mnt/off-pro/cache/export_files
mp9: /zfs-hdd/podata/off-pro/images,mp=/mnt/off-pro/images
also added:
lxc.cap.drop: "sys_rawio audit_read"
- remove the created disk:
zfs destroy zfs-hdd/pve/subvol-111-disk-0
Do not start the container yet!
Preparing Reverse Proxy#
We have to set up the reverse proxy on scaleway-proxy.
First we copy the certificates from off2-proxy to scaleway-proxy (following https://charlesreid1.github.io/copying-letsencrypt-certs-between-machines.html).
On off2-proxy:
CONF=/etc/nginx/sites-enabled/openfoodfacts.org
DOM=$(sed -nr "s|.*letsencrypt/live/(.*)/privkey.*|\1|p" $CONF)
echo "COPYING CERTS FOR $DOM in $DOM.tar.gz"
sudo tar -cvzf $DOM.tar.gz \
/etc/letsencrypt/archive/${DOM} \
/etc/letsencrypt/renewal/${DOM}.conf \
/etc/letsencrypt/live/${DOM}
sudo chown alex:alex $DOM.tar.gz
sudo chmod go-rw $DOM.tar.gz
Then copy the archive to /home/alex on scaleway-proxy and, on scaleway-proxy:
sudo -i
DOM=openfoodfacts.org
cd /
tar xzf /home/alex/$DOM.tar.gz
# verify
ls -l /etc/letsencrypt/*/$DOM /etc/letsencrypt/renewal/$DOM.conf
I also had to do it for secondary websites: howmuchsugar.in and madenear.me
On scaleway-proxy, I copied the configuration from the off2 reverse proxy into the scaleway-proxy conf dir (common configs were already copied and linked while moving opff):
cd /opt/openfoodfacts-infrastructure/
cp -r confs/off2-reverse-proxy/nginx/snippets confs/scaleway-proxy/nginx/
ln -s /opt/openfoodfacts-infrastructure/confs/scaleway-proxy/nginx/snippets/backend-proxy.conf /etc/nginx/snippets
for hostname in openfoodfacts.org howmuchsugar.in madenear.me; do \
cp confs/off2-reverse-proxy/nginx/sites-enabled/$hostname confs/scaleway-proxy/nginx/sites-enabled/; \
ln -s /opt/openfoodfacts-infrastructure/confs/scaleway-proxy/nginx/sites-enabled/$hostname /etc/nginx/sites-enabled/; \
done
I also edited the openfoodfacts.org conf:
- listen 443 ... http2 is now deprecated in favour of http2 on;
- comment ssl_stapling and ssl_stapling_verify directives (deprecated)
- replace any occurrence of 10.1.0.113 with 10.13.1.111.
Also I prepared the maintenance page to use the "planned maintenance" page,
that is in /opt/openfoodfacts-infrastructure/html:
unlink 502.html
ln -s 502-planned.html 502.html
Preparing DNS#
On OVH web console: modify TTL on openfoodfacts.org and *.openfoodfacts.org to 60s
On your desktop prepare the following lines to add to your /etc/hosts (comment them until migration):
151.115.132.10 world.openfoodfacts.org fr.openfoodfacts.org static.openfoodfacts.org images.openfoodfacts.org
You can use this configuration for a quick test that your reverse proxy is set up correctly (but you will of course get a gateway timeout, as 111 is shut down).
(Note: didn't test madenear.me, howmuchsugar.in and so on, because downtime is far less important on those websites).
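As an alternative to editing /etc/hosts, curl can pin a hostname to an IP for a single request with its --resolve option. This is a minimal sketch; the helper name resolve_args is hypothetical and it assumes HTTPS on port 443.

```shell
# Hypothetical helper: print one "--resolve host:443:ip" pair per hostname,
# so that curl connects to the given IP instead of using DNS.
# Assumption: the sites are served over HTTPS on port 443.
resolve_args() {
    local ip=$1
    shift
    local host
    for host in "$@"; do
        printf -- '--resolve %s:443:%s\n' "$host" "$ip"
    done
}
```

For example: curl -sI $(resolve_args 151.115.132.10 world.openfoodfacts.org) https://world.openfoodfacts.org/ (the command substitution is deliberately left unquoted so that the option and its value are passed as separate arguments).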
Preparing KS1#
KS1 will have to sync images from scaleway; for this it needs an operator account on scaleway-01.
I just followed our sanoid doc on creating an operator on PROD_SERVER.
Migration#
- on off2-reverse-proxy, put the "planned maintenance" page on:
cd /opt/openfoodfacts-infrastructure/html
unlink 502.html
ln -s 502-planned.html 502.html
- login to ks1 and comment the syncoid line for images (until end of migration)
- on scaleway-01 comment off backups in /etc/sanoid/syncoid-args.conf (line for zfs-hdd and for nvme)
Now we hurry:
- on off2, create a next-to-last snapshot:
# mimic sanoid
SNAP_NAME=autosnap_$(date --utc +"%Y-%m-%d_%H:%M:%S")_hourly
for dataset in zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/off{,/cache,/html_data,/logs,/images,/orgs,/users,/pro_export_files} zfs-nvme/off{,/products}; \
do \
zfs snapshot $dataset@$SNAP_NAME; \
echo DONE: $dataset@$SNAP_NAME; \
done
- on scaleway-01 do a sync before shutting down off:
syncoid --no-sync-snap --no-privilege-elevation scaleway01operator@off2.openfoodfacts.org:zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/off-backups/off2-zfs-hdd/pve/subvol-113-disk-0
syncoid --no-sync-snap --no-privilege-elevation --recursive scaleway01operator@off2.openfoodfacts.org:zfs-hdd/off zfs-hdd/off-backups/off2-zfs-hdd/off
syncoid --no-sync-snap --no-privilege-elevation --recursive scaleway01operator@off2.openfoodfacts.org:zfs-nvme/off zfs-nvme/off-backups/off2-zfs-nvme/off
# sync our users and orgs new location
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/users zfs-nvme/off-backups/off2-zfs-nvme/off/users
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/orgs zfs-nvme/off-backups/off2-zfs-nvme/off/orgs
- on off2, stop off container
pct shutdown 113
and verify with pct list
- on scaleway-01 stop all containers accessing off data
echo 11{2,3,4,5}|xargs -P 4 -n 1 pct shutdown --forceStop=1
verify with pct list
- on off2, create a last snapshot:
# mimic sanoid
SNAP_NAME=autosnap_$(date --utc +"%Y-%m-%d_%H:%M:%S")_hourly
for dataset in zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/off{,/cache,/html_data,/logs,/images,/orgs,/users,/pro_export_files} zfs-nvme/off{,/products}; \
do \
zfs snapshot $dataset@$SNAP_NAME; \
echo DONE: $dataset@$SNAP_NAME; \
done
- on scaleway-01, make a last sync of datasets:
syncoid --no-sync-snap --no-privilege-elevation scaleway01operator@off2.openfoodfacts.org:zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/off-backups/off2-zfs-hdd/pve/subvol-113-disk-0
syncoid --no-sync-snap --no-privilege-elevation --recursive scaleway01operator@off2.openfoodfacts.org:zfs-hdd/off zfs-hdd/off-backups/off2-zfs-hdd/off
syncoid --no-sync-snap --no-privilege-elevation --recursive scaleway01operator@off2.openfoodfacts.org:zfs-nvme/off zfs-nvme/off-backups/off2-zfs-nvme/off
# sync our users and orgs new location
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/users zfs-nvme/off-backups/off2-zfs-nvme/off/users
syncoid --no-sync-snap zfs-hdd/off-backups/off2-zfs-hdd/off/orgs zfs-nvme/off-backups/off2-zfs-nvme/off/orgs
verify:
zfs list -t snap -o name zfs-hdd/off-backups/off2-zfs-hdd/pve/subvol-113-disk-0 |tail -n 1
for dataset in zfs-hdd/off-backups/off2-zfs-hdd/off{,/cache,/html_data,/logs,/images,/pro_export_files} zfs-nvme/off-backups/off2-zfs-nvme/off{,/users,/orgs,/products}; \
do \
zfs list -t snap -o name $dataset |tail -n 1; \
done
- move the backup zfs to their new location:
zfs rename zfs-hdd/off-backups/off2-zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/pve/subvol-111-disk-0
zfs rename zfs-hdd/off-backups/off2-zfs-hdd/off zfs-hdd/podata/off
# products, users, orgs on nvme are common
zfs rename zfs-nvme/off-backups/off2-zfs-nvme/off/products zfs-nvme/podata/products
zfs rename zfs-nvme/off-backups/off2-zfs-nvme/off/users zfs-nvme/podata/users
zfs rename zfs-nvme/off-backups/off2-zfs-nvme/off/orgs zfs-nvme/podata/orgs
# we don't need hdd users and orgs anymore
zfs destroy -r zfs-hdd/podata/off/users
zfs destroy -r zfs-hdd/podata/off/orgs
# images, and pro_export_files are common, move them up
zfs rename zfs-hdd/podata/off/images zfs-hdd/podata/images
zfs rename zfs-hdd/podata/off/pro_export_files zfs-hdd/podata/pro_export_files
- modify the configuration srv/off/lib/ProductOpener/Config2.pm in /zfs-hdd/pve/subvol-111-disk-0/:
```
$mongodb_host = "10.13.1.200";
...
$memd_servers = [ "10.13.1.102:11211" ];
...
$redis_url = '10.13.1.200:6379';
...
$query_url = "http://10.13.1.101:16001";
...
%server_options = (
cookie_domain => "openfoodfacts.org", # if not set, default to $server_domain
minion_backend => {'Pg' => 'postgresql://off:********@10.13.1.200/minion'},
minion_local_queue => "openfoodfacts.org",
```
- modify links to folders:
cd /zfs-hdd/pve/subvol-111-disk-0/
# some old refs
rm -rf srv/obf/new_images
rm -rf srv/opf/new_images
rm -rf srv/opff/new_images
# remove old refs
for dirname in srv/{obf,opf,opff}/{html/images/products,products}; \
do \
unlink $dirname; \
done
for dirname in mnt/{obf,opf,opff}/{images,products,} srv/{obf,opf,opff}/{html/{images,},}; \
do \
rmdir $dirname; \
done
- on scaleway-01, start the service
pct start 111
- on your computer, verify the service is working with a modified /etc/hosts
- in OVH web console, change the openfoodfacts.org A entry to point to 151.115.132.10
- on your computer remove your /etc/hosts specific configuration and test again
- unmount all nfs to be sure we are not writing to them again:
umount /mnt/nfs/off/orgs /mnt/nfs/off/images /mnt/nfs/off/users /mnt/nfs/off/products /mnt/nfs/off/pro_export_files /mnt/nfs/off/data
for dirname in /mnt/nfs/off/orgs /mnt/nfs/off/images /mnt/nfs/off/users /mnt/nfs/off/products /mnt/nfs/off/pro_export_files /mnt/nfs/off/data /mnt/nfs/off/off /mnt/nfs/off /mnt/nfs; do rmdir $dirname; done
- We now continue for the other containers, for each container (113 to 115, aka obf,opf,opff):
declare -x ct=<id>
- change the mountpoints for the container:
vim /etc/pve/lxc/$ct.conf
…
%s!/mnt/nfs/off!/zfs-hdd/podata!
%s!/zfs-hdd/podata/products!/zfs-nvme/podata/products
%s!/zfs-hdd/podata/users!/zfs-nvme/podata/users
%s!/zfs-hdd/podata/orgs!/zfs-nvme/podata/orgs
- verify your modification and save
- restart the container:
pct start $ct
- We now change off-pro (112):
- change the mountpoints for the container:
vim /etc/pve/lxc/112.conf
…
%s!/mnt/nfs/off!/zfs-hdd/podata!
%s!/zfs-hdd/podata/products!/zfs-nvme/podata/products
%s!/zfs-hdd/podata/users!/zfs-nvme/podata/users
%s!/zfs-hdd/podata/orgs!/zfs-nvme/podata/orgs
%s!/zfs-hdd/podata/data!/zfs-nvme/podata/off/data
- verify your modification and save
- restart the container:
pct start 112
- It's live !
- login to ks1 and change the sync for images to sync from scaleway
- deal with howmuchsugar.in and madenear.me
- change DNS for:
- howmuchsugar.in, ~~howmuchsugar.info~~, ~~combiendesucres.fr~~
- madenear.me, madenear.me.uk, cestemballepresdechezvous.fr, ~~cestemballepresdechezvous.info~~
- check it works
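The vim substitutions used above for the container mountpoints can also be expressed as a sed filter, which makes it easy to preview the result before touching the real file. This is a minimal sketch; the function name rewrite_mountpoints is hypothetical, and it only covers the four substitutions common to containers 112 to 115 (not the extra data line for off-pro).

```shell
# Hypothetical helper: reads a container configuration on stdin and prints it
# with NFS mount sources replaced by local ZFS paths.
# Order matters: the generic NFS -> hdd rewrite runs first, then the datasets
# that now live on nvme are redirected.
rewrite_mountpoints() {
    sed -e 's!/mnt/nfs/off!/zfs-hdd/podata!' \
        -e 's!/zfs-hdd/podata/products!/zfs-nvme/podata/products!' \
        -e 's!/zfs-hdd/podata/users!/zfs-nvme/podata/users!' \
        -e 's!/zfs-hdd/podata/orgs!/zfs-nvme/podata/orgs!'
}
```

With conf set to the path of the container configuration file, the change can be previewed with: rewrite_mountpoints < "$conf" | diff "$conf" -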
After migration#
- [DONE] on off2: rename subvol-113 to avoid confusion
zfs rename zfs-hdd/pve/subvol-113-disk-0 zfs-hdd/backups/subvol-113-disk-0
- [TODO] verify backups of the new datasets are done on scaleway-03:
zfs list zfs-hdd/off-backups/scaleway-01-podata-hdd -r
zfs list zfs-hdd/off-backups/scaleway-01-podata-nvme -r
- (you may need to manually sync big datasets like images !)
- rerun the ansible:
- container creation on scaleway-01:
ansible-playbook sites/proxmox-node.yml --tags containers -l scaleway-01 -e proxmox_containers__limit_to_containers=111
- jobs/configure for off:
ansible-playbook jobs/configure.yml -l off [^SSH_RESTART]
- change the backup datasets at ovh3:
- we don't want to sync images from scratch it's too heavy… see OVH3 backups update below
- Verify podata is synced on ovh3
- put back the TTL for domain to a normal level
- check any traffic still coming to off2 reverse proxy
- [DONE] change fallback server for images.openfoodfacts.org (commit 724f156e)
- [DONE] move query.openfoodfacts.org to scaleway reverse proxy (following the same procedure as in Preparing Reverse Proxy)
- [DONE] install fail2ban jails we had on off2 proxy on scaleway-proxy:
- [DONE] change sync_images to AWS parameters (see commit 9bd66e6)
- [TODO] change monitoring to scrape on scaleway. But we want to do it using the exporters pattern, so we need a new ansible role for that.
Post fixes:
- query.openfoodfacts.org was not responding, it was a DNS problem because proxy2.openfoodfacts.org was not defined any more, while query was a CNAME to it… So I redefined proxy2 as an A entry pointing to off2 reverse proxy IP
- https://robotoff.openfoodfacts.org/api/v1/health
- I had to whitelist scaleway servers in PMG because emails were not arriving (I previously thought it was not mandatory, because I thought the iptables redirect rule was masking the real ip)
- we had to whitelist the scaleway-01 ip on brevo, as openfoodfacts uses the API key to subscribe users to the mailing list.
- the certbot certificates renewal did not go well, see below
Later:
- [DONE] shutdown stunnel services that are not needed anymore
- shutdown container 105 on off1
- remove config to old services on scaleway-stunnel-client
- on off2: remove the pct 113:
pct destroy 113
- could we move off logs to nvme on scaleway ?
- for that we need to remove a lot of current logs that are not useful (maybe rewrite logrotate config)
- use ULA for container ipv6, nated by the proxmox host (investigate how on a test container)
- [TODO] decide whether we move the experimental opensuplementsfacts to scaleway… (note that its database is not merged with off)
OVH3 backup updates#
We need to have backups on OVH3. But resyncing images would take too long (and disk space may not even be there).
So here is the procedure:
- stop off staging instances (off-net) on ovh1 docker-staging VM, and add a NO_DEPLOY file to avoid deployments
- in /etc/sanoid/syncoid-args.conf,
- comment lines that sync old off2 data
- also comment lines that sync podata from scaleway-01
- remove the clones on ovh3:
to be able to remove the images clones, I had to restart nfs server.
# better restart the service
systemctl restart nfs-server.service
for dataset in rpool/staging-clones/{off-images,off-products,orgs,users};do echo $dataset; zfs destroy -r $dataset; done
- move old dataset taken from off2 to the new locations
# move root
zfs rename rpool/off rpool/off-backups/podata/off
# move common dataset up
zfs rename rpool/off-backups/podata{/off,}/users
zfs rename rpool/off-backups/podata{/off,}/orgs
zfs rename rpool/off-backups/podata{/off,}/products
zfs rename rpool/off-backups/podata{/off,}/images
zfs rename rpool/off-backups/podata{/off,}/pro_export_files
- come back to a pre migration snapshot:
(note: this took a lot of time)
for dataset in rpool/off-backups/podata/{off{,/cache,/html_data,/logs},users,orgs,products,images,pro_export_files}; \
do \
SNAP=$(zfs list -H -o name -t snap $dataset|grep "@autosnap_2026-05-05.*_daily"); \
echo $SNAP; \
time zfs rollback -r $SNAP; \
done
- verify structure is coherent with the one of scaleway-01, using zfs list -o name .../podata
- rerun syncoid for podata:
syncoid --no-sync-snap --no-privilege-elevation --recursive ovh3operator@scaleway-01.infra.openfoodfacts.org:zfs-hdd/podata rpool/off-backups/podata
- uncomment the line for podata sync in /etc/sanoid/syncoid-args.conf
- rerun the clone script: /opt/openfoodfacts-infrastructure/scripts/ovh3/maj-clones-nfs-VM-dockers.sh
(note: I had to kill the sync-s3-images script to be able to move images dataset, see d52f6099 and 7785b412)
- I also removed the old data that are not useful anymore (as they are in nvme):
for dataset in products orgs users; \
do \
zfs destroy -r rpool/off-backups/podata/$dataset; \
done
fix on certificates#
Certificate renewal did not work after install. I saw this on 26/05. This was due to several reasons:
- some live directories were not using symlinks…
this is because the PR corresponding to commit 62b39187 was not yet merged, and so I used wrong tar options (-h, which should not have been used)
- the ovh-dns certbot plugin was not installed on scaleway-proxy, so I added the installation in the ansible role. I also had to add the credential files (with ansible).
- the Let's Encrypt account name had to be changed in every renewal file, to use the one found in /etc/letsencrypt/accounts/acme-v02.api.letsencrypt.org/directory/, so in my case I used:
sed -e "s/account \?=.*$/account = a0d2da217c3d569b7964d2f500ba96ca/" -i /etc/letsencrypt/renewal/*.conf
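The first problem above (live directories containing regular files instead of symlinks) can be detected right after unpacking a certificate archive. This is a minimal sketch; the function name non_symlink_pems is hypothetical, and it assumes the usual certbot layout where every .pem file under the live directory is a symlink into the archive directory.

```shell
# Hypothetical helper: print every .pem file under the given live directory
# that is NOT a symbolic link; succeed only if there are none.
# Assumption: in a healthy certbot layout, live/<domain>/*.pem are all symlinks.
non_symlink_pems() {
    local live_dir=$1
    local bad
    # find does not follow symlinks by default, so "-type l" matches the links
    bad=$(find "$live_dir" -name '*.pem' ! -type l)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

For example: non_symlink_pems /etc/letsencrypt/live || echo "some certificates were copied as plain files"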