# 2025-10-15 off1 move to SSD
We have a problem with a disk on off1, and it puts containers at risk.
We have identified that, while the zfs-hdd pool is at risk (one disk already offline and another showing SMART alerts), the rpool is a mirror (mirror-0) and can survive a second disk loss.
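As a quick way to confirm the state of both pools and of the suspect disk (a sketch; `/dev/sdX` is a placeholder for the disk showing SMART alerts):

```
# health of both pools (pool names from this setup)
zpool status -x zfs-hdd rpool
# SMART health of the suspect disk (device path is a placeholder)
smartctl -H /dev/sdX
```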
## Analysis
Looking at the current CTs, we have:
| VMID | Status  | Lock | Name      |
|------|---------|------|-----------|
| 100  | stopped |      | proxy     |
| 102  | running |      | mongodb   |
| 104  | running |      | keycloak  |
| 115  | stopped |      | off-query |
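For reference, such a listing can be obtained directly from the Proxmox CLI:

```
pct list
```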
The proxy was an attempt to serve images but is not active anymore. off-query has been moved to Moji.
So the really important CTs are:
- mongodb
- keycloak
Both of them have volumes on zfs-hdd and zfs-nvme:
```
cat /etc/pve/lxc/10{2,4}.conf | grep -P 'mp\d|rootfs'
mp0: zfs-nvme:subvol-102-disk-0,mp=/mongo,mountoptions=noatime,size=96G
rootfs: zfs-hdd:subvol-102-disk-0,mountoptions=noatime,size=64G
mp0: zfs-nvme:subvol-104-disk-0,mp=/var/lib/docker/volumes,mountoptions=noatime,size=50G
rootfs: zfs-hdd:subvol-104-disk-0,mountoptions=noatime,size=40G
```
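The same information can also be read through the Proxmox CLI instead of cat'ing the config files (a sketch):

```
pct config 102 | grep -P 'mp\d|rootfs'
pct config 104 | grep -P 'mp\d|rootfs'
```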
Looking at the volume sizes on zfs-hdd (64G + 40G), and at the available space on zfs-nvme:
```
# zpool list -v zfs-nvme
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfs-nvme     1.81T   398G  1.42T        -         -    42%    21%  1.00x    ONLINE  -
  nvme0n1p1  1.82T   398G  1.42T        -         -    42%  21.5%      -    ONLINE
logs             -      -      -        -         -      -      -      -         -
  nvme1n1p1  6.71G  15.2M  6.49G        -         -     0%  0.22%      -    ONLINE
```
There is plenty of free space, so we will move the zfs-hdd volumes to zfs-nvme.
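To double-check, one can also look at the space actually used by the source datasets rather than their quota sizes (a quick sketch; the dataset paths are the ones used by syncoid below):

```
zfs list -o name,used,refer zfs-hdd/pve/subvol-102-disk-0 zfs-hdd/pve/subvol-104-disk-0
```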
## Doing it
Doing it first for container 102.
- Let's first sync the data to zfs-nvme using syncoid (took around 15 minutes):

  ```
  syncoid zfs-hdd/pve/subvol-102-disk-0 zfs-nvme/pve/subvol-102-disk-1
  ```
- Then in a quick move:

  ```
  # sync a last time
  syncoid zfs-hdd/pve/subvol-102-disk-0 zfs-nvme/pve/subvol-102-disk-1
  # stop container
  pct stop 102
  # sync again
  syncoid zfs-hdd/pve/subvol-102-disk-0 zfs-nvme/pve/subvol-102-disk-1
  # change config
  vim /etc/pve/lxc/102.conf
  # ... change zfs-hdd:subvol-102-disk-0 to zfs-nvme:subvol-102-disk-1 ...
  # start container
  pct start 102
  ```
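After the restart, a quick sanity check that 102 really uses the new volume (a sketch, using only names already introduced above):

```
# rootfs should now point to zfs-nvme
pct config 102 | grep rootfs
# new dataset exists and the container is back up
zfs list zfs-nvme/pve/subvol-102-disk-1
pct status 102
```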
There is no need to change the sanoid configuration to take snapshots of the new volume (instead of the old one), because the recursive configuration on zfs-nvme/pve already applies to it.
Same for syncoid backups: they are included on the remote machines thanks to the recursive option.
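One way to verify this after the fact (a sketch): the next sanoid run should create snapshots under the new datasets.

```
zfs list -t snapshot -r zfs-nvme/pve/subvol-102-disk-1 zfs-nvme/pve/subvol-104-disk-1
```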
Note: there was a reboot of the whole server between these operations, but I don't know why …
Now for 104:
- Let's first sync the data to zfs-nvme using syncoid (took around 2 minutes):

  ```
  syncoid zfs-hdd/pve/subvol-104-disk-0 zfs-nvme/pve/subvol-104-disk-1
  ```
- Then in a quick move:

  ```
  # sync a last time
  syncoid zfs-hdd/pve/subvol-104-disk-0 zfs-nvme/pve/subvol-104-disk-1
  # stop container
  pct stop 104
  # sync again
  syncoid zfs-hdd/pve/subvol-104-disk-0 zfs-nvme/pve/subvol-104-disk-1
  # change config
  vim /etc/pve/lxc/104.conf
  # ... change zfs-hdd:subvol-104-disk-0 to zfs-nvme:subvol-104-disk-1 ...
  # start container
  pct start 104
  ```
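Since the exact same sequence was run for both containers, the whole move could be wrapped in a small helper script. This is only a sketch of the steps above, not what was actually run on off1 (the script name, and using sed instead of the interactive vim edit, are my own choices here):

```
#!/usr/bin/env bash
# migrate-ct.sh -- illustrative sketch only.
# Repeats the manual steps above for a given container ID.
set -euo pipefail

CTID="$1"                                  # e.g. 102 or 104
SRC="zfs-hdd/pve/subvol-${CTID}-disk-0"    # source dataset (as in the configs above)
DST="zfs-nvme/pve/subvol-${CTID}-disk-1"   # target dataset name used in this migration

syncoid "$SRC" "$DST"                      # sync while the container keeps running
pct stop "$CTID"
syncoid "$SRC" "$DST"                      # catch the last changes
# point the rootfs at the new storage/volume
sed -i "s|zfs-hdd:subvol-${CTID}-disk-0|zfs-nvme:subvol-${CTID}-disk-1|" "/etc/pve/lxc/${CTID}.conf"
pct start "$CTID"
```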