ovh3 server logs#

Report here the timeline of incidents and interventions on ovh3 server. Keep things short or write a report.

2023-12-05 certificates for images expired#

Images no longer displayed on the website because of an SSL problem (reported by Edouard, with an alert from the blackbox exporter). In syslog:

Dec  5 13:12:58 off2 certbot[385770]: Failed to renew certificate with error: The requested nginx plugin does not appear to be installed
Checked /etc/letsencrypt/renewal/: the renewal configuration uses the nginx authenticator, whose plugin was not installed. To resolve:
apt install python3-certbot-nginx
certbot renew
Also ran mv /etc/letsencrypt/renewal/{conf,backup}, as the domain is no longer served.
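
To spot this kind of problem before a certificate expires, one option is to list which renewal configurations depend on a given authenticator plugin. The sketch below is only illustrative: the function name renewals_using is hypothetical, and it assumes the renewal files contain a line of the form "authenticator = <plugin>".

```shell
# List certbot renewal configs that rely on a given authenticator plugin.
# Usage: renewals_using <authenticator> [renewal_dir]
# Assumption: each renewal file has a line like "authenticator = nginx".
renewals_using() {
    plugin="$1"
    dir="${2:-/etc/letsencrypt/renewal}"
    # -l prints only the names of matching files; -s hides errors such as
    # an unmatched glob when the directory contains no .conf file.
    grep -lsE "^authenticator[[:space:]]*=[[:space:]]*${plugin}[[:space:]]*\$" "$dir"/*.conf
}
```

For example, renewals_using nginx prints every renewal file that needs python3-certbot-nginx to be installed.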

2023-09-12 logrotate nginx#

Nginx had a very large static-access.log file (52G). I changed /etc/logrotate.d/nginx to include /rpool/logs-nginx/*.log, then ran /usr/sbin/logrotate /etc/logrotate.conf.
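
A quick way to catch log files that escape rotation is to search for oversized ones. This is a minimal sketch: the function name large_logs and the default threshold are assumptions, not something configured on the server.

```shell
# Print *.log files under a directory that exceed a size threshold.
# Usage: large_logs <dir> [threshold]
# The threshold uses find's -size syntax; +1G matches files larger than 1 GiB.
large_logs() {
    dir="$1"
    threshold="${2:-+1G}"
    find "$dir" -type f -name '*.log' -size "$threshold"
}
```

For example, large_logs /rpool/logs-nginx +10G would list any log there above 10 GiB.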

2023-09-07 ZFS dataset stalled#

The day before, we had switched back to ovh3. An alert reported that it was down. Restarting nginx failed and left behind zombie processes that could not be killed even with kill -9 (meaning they were stuck waiting on I/O). I could list a file (ls /rpool/off/images/products/376/002/924/8001/1.100.jpg), but reading file content hung indefinitely (cat /rpool/off/images/products/376/002/924/8001/1.json). zpool status took a long time to run and indicated 1 READ error on sdb. We did a hard reboot. See the Slack thread.
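
Processes that ignore kill -9 while waiting on I/O are usually in uninterruptible sleep, shown as state D by ps. The sketch below lists them; the function names are hypothetical and it was not part of the original intervention.

```shell
# Keep only processes in uninterruptible sleep (state starting with D)
# from "pid stat comm" lines read on stdin.
d_state_only() {
    awk '$2 ~ /^D/'
}

# List D-state processes on the live system.
# The empty "=" headers suppress the ps header line.
list_d_state() {
    ps -e -o pid= -o stat= -o comm= | d_state_only
}
```

Many nginx workers showing up in list_d_state would be consistent with a stalled dataset, though it does not identify the cause.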

2023-07-16 Disk sdc errors on OVH3#

See 2023-07-16 Disk sdc errors on OVH3

2023-06-08 ZFS dataset stalled#

Nginx images not responding, as well as

The ZFS dataset stopped working at 00:50 (UTC). We had to do a hard reboot; there were no symptoms in the logs. See the Slack thread.

2023-05-30 ZFS dataset stalled#

Hard reboot.

2023-05-04 ZFS dataset stalled#

Hard reboot.