2024-01-02 monitoring disk space - filebeat
problem
We had an alert because free disk space on the monitoring docker VM (203) was below 20%.
After a quick look at the size of the docker volumes, I saw that Elasticsearch was taking most of the space.
In Kibana (Stack Management, then Index Management), I saw that the logs data stream was huge. This was expected, since we have an index lifecycle management (ILM) problem. See https://github.com/openfoodfacts/openfoodfacts-infrastructure/issues/199
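The same check can be done from a shell instead of Kibana. The sketch below is only an illustration: it assumes Elasticsearch is reachable without authentication at `ES_URL` (a hypothetical variable, defaulting to `http://localhost:9200`), and `largest_indices` is a helper name made up for this note.

```shell
# Assumption: Elasticsearch answers on this URL without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print all indices sorted by on-disk size, largest first.
largest_indices() {
  curl -sS "${ES_URL}/_cat/indices?v&h=index,store.size,docs.count&s=store.size:desc"
}
```

Running `largest_indices | head` shows the biggest indices first; `docker system df -v` gives the per-volume view on the docker side.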
Resolution
short term
I abruptly removed all the data streams (we don't use the logs much, so it was an expedient fix).
But as filebeat was still running, a new logs-current index was created, and we would run into the same problem again.
long term
I logged in to the docker production VM (200), the staging VM (201) and the monitoring VM (203), and stopped the filebeat container on each of them.
Then I removed logs-current index.
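For reference, an index can also be removed through the Elasticsearch REST API rather than through Kibana. This is a minimal sketch, not the exact commands that were run: `ES_URL` and the `delete_index` helper are assumptions, and the guard against wildcards is an extra precaution added here.

```shell
# Assumption: Elasticsearch answers on this URL without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Delete one index by exact name. Empty names, wildcards and _all are
# refused so that a typo cannot wipe more than intended.
delete_index() {
  local index="$1"
  case "$index" in
    ""|*"*"*|_all)
      echo "refusing to delete '${index}'" >&2
      return 1
      ;;
  esac
  curl -sS -X DELETE "${ES_URL}/${index}"
}
```

With filebeat stopped everywhere, `delete_index logs-current` would remove the index without it being recreated immediately.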
In an attempt to repair ILM, I modified filebeat/config.yml on monitoring to add the setup.ilm.overwrite: true directive.
But before that, I backed up my policy by saving it as a new policy.
I added KIBANA_URI to docker-compose.node.yml because it was missing (see openfoodfacts-monitoring commit 7131f0c5).
Then, on the off monitoring VM, in /home/off/filebeat, I ran:
docker-compose run --rm filebeat bash
$ filebeat setup --index-management
$ filebeat setup --dashboards # does not work: it complains about the Kibana version
I then removed the setup.ilm.overwrite: true directive.
I restarted filebeat on all the VMs.
I can see the logs-current-logs-2024.01.02-000001 index created in Kibana. That's kind of a good sign.
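Beyond seeing that the index exists, one can ask Elasticsearch whether ILM actually manages it. The sketch below uses the ILM explain API; `ES_URL` and the `ilm_explain` helper name are assumptions made for this note.

```shell
# Assumption: Elasticsearch answers on this URL without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show the ILM state (managed flag, policy name, phase, step) of the
# indices matching a pattern. Defaults to the logs-current-* indices.
ilm_explain() {
  local pattern="${1:-logs-current-*}"
  curl -sS "${ES_URL}/${pattern}/_ilm/explain?pretty"
}
```

In the response, `"managed": true` together with the expected `"policy"` name would indicate that the new index is attached to the lifecycle policy.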
I can see in Kibana that the logs index lifecycle management policy was created, but it's not configured to delete old data. So I changed its setup, thanks to the backup I had made.
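As an illustration of what a policy with a deletion step looks like, here is a hedged sketch using the ILM policy API. The thresholds (`1d`, `5gb`, `7d`) are made-up example values, not the ones from the backup or the repository, and `ES_URL`, `ilm_policy_body` and `put_ilm_policy` are hypothetical names.

```shell
# Assumption: Elasticsearch answers on this URL without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a policy body: roll over in the hot phase, then delete the index
# once it is older than the retention period (illustrative values only).
ilm_policy_body() {
  local retention="${1:-7d}"
  cat <<EOF
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_size": "5gb" }
        }
      },
      "delete": {
        "min_age": "${retention}",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
}

# Create or replace the policy named in $1 with the body above.
put_ilm_policy() {
  local name="$1" retention="${2:-7d}"
  ilm_policy_body "$retention" |
    curl -sS -X PUT "${ES_URL}/_ilm/policy/${name}" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

For example, `put_ilm_policy logs 7d` would replace the logs policy; without a `delete` phase, rolled-over indices are kept forever, which is what fills the disk.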
Looking at the index templates, I'm not yet sure that everything is ok, because there is one template for logs-*-*-*, logs-*-*-*-*, logs-*.*.*, logs-*.*.*-* and one for logs-current-*, so I'm not totally sure which one will apply (I think logs-current-* would be the right one). Let's see tomorrow.
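Rather than waiting, the question of which template wins can be put to Elasticsearch directly with the simulate index API. This is a sketch under assumptions: `ES_URL` and `simulate_index_template` are hypothetical names, and the API only resolves composable index templates (legacy templates have to be inspected separately).

```shell
# Assumption: Elasticsearch answers on this URL without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask Elasticsearch which composable index template, settings and mappings
# an index with the given name would receive if it were created now.
simulate_index_template() {
  local index="$1"
  curl -sS -X POST "${ES_URL}/_index_template/_simulate_index/${index}?pretty"
}
```

Calling `simulate_index_template logs-current-logs-2024.01.02-000002` returns the resolved settings and an `overlapping` list of other templates that matched but lost; among composable templates, the one with the highest `priority` applies.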
I also added the index lifecycle management definition to the repository, so that it's correctly configured next time (see openfoodfacts-monitoring commit 234fd2da).