During last Double 11 I bought two 1TB Micron drives to replace the Toshiba TC10 500G SSD in my NAS, and the freed-up drive then became the mirror redundancy for the PVE system disk. Even back then I figured this Kingston 128G drive was getting on in years and might die suddenly one day. Sure enough, in under half a year, the Kingston SV300 128G reached the end of its life.
--- Previous article: Proxmox VE ZFS system partition, from no redundancy to a mirror pool
root@vServer:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 00:01:27 with 0 errors on Sun Feb 13 00:25:28 2022
config:

        NAME                                                   STATE     READ WRITE CKSUM
        rpool                                                  DEGRADED     0     0     0
          mirror-0                                             DEGRADED     0     0     0
            ata-KINGSTON_SV300S37A120G_50026B7754003826-part3  FAULTED      3     0     0  too many errors
            ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3     ONLINE       0     0     0

errors: No known data errors
When I first saw the pool throw errors I assumed it was a false positive, so I ran zpool clear on the pool to see whether it would recover. The alert cleared, and in under two seconds the pool degraded again.
So it really was beyond saving.
This time I'll replace the SV300 and grow the storage pool from the original 128G to a dual-disk 500G mirror.
As before, partition the new 500G drive first. This time the drive is a Lenovo OEM Micron MTFDHBA512QFD QLC NVMe SSD, which should be more than enough for a system disk that mostly stores log files. For the partition layout, refer to the previous article: Proxmox VE ZFS system partition, from no redundancy to a mirror pool.
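For reference, the gist of that partitioning step is to clone the partition table from the healthy disk and then re-randomize the GUIDs, which is also the approach the Proxmox documentation suggests for replacing a bootable device. A rough sketch, assuming the healthy Kioxia disk shows up as /dev/sda and the new NVMe disk as /dev/nvme0n1 (check with lsblk and adjust for your system):

# Replicate the partition table of the healthy disk onto the new NVMe disk
# (/dev/sda and /dev/nvme0n1 are assumed device paths, verify them first)
sgdisk /dev/sda -R /dev/nvme0n1
# Give the new disk its own random GUIDs so it doesn't clash with the source
sgdisk -G /dev/nvme0n1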
Once partitioned, it shows the same two partitions as before:
nvme-Micron_MTFDHBA512QFD_20432B47499A
nvme-Micron_MTFDHBA512QFD_20432B47499A-part2
nvme-Micron_MTFDHBA512QFD_20432B47499A-part3
Then we can replace the failed disk directly with:
zpool replace -f <pool> <old-device> <new-device>
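Plugging in the device names from the status output above, the concrete call here would look something like:

zpool replace -f rpool \
    ata-KINGSTON_SV300S37A120G_50026B7754003826-part3 \
    nvme-Micron_MTFDHBA512QFD_20432B47499A-part3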
After the replacement, check the pool status:
root@vServer:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar  3 10:47:42 2022
        17.3G scanned at 393M/s, 16.8G issued at 383M/s, 17.3G total
        17.0G resilvered, 97.33% done, 00:00:01 to go
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    DEGRADED     0     0     0
          mirror-0                                               DEGRADED     0     0     0
            replacing-0                                          DEGRADED     0     0     0
              ata-KINGSTON_SV300S37A120G_50026B7754003826-part3  FAULTED      3     0     0  too many errors
              nvme-Micron_MTFDHBA512QFD_20432B47499A-part3       ONLINE       0     0     0  (resilvering)
            ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3       ONLINE       0     0     0

errors: No known data errors
It now shows the replacement in progress and the resilver running. With this little data it should finish quickly; if you're impatient you can watch the rebuild rate with iostat.
root@vServer:~# iostat -m -d nvme0n1 1 10
Linux 5.11.22-5-pve (vServer)   03/03/2022      _x86_64_        (12 CPU)

Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s    MB_read    MB_wrtn    MB_dscd
nvme0n1           9.94         0.17         0.08         0.02     786976     378278      94440
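Alternatively, zpool status itself prints the scan rate and an ETA (as seen above), so simply polling it works just as well, for example:

watch -n 1 zpool status rpool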
Then wait a while:
root@vServer:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: resilvered 17.4G in 00:00:46 with 0 errors on Thu Mar  3 10:48:28 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        rpool                                             ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            nvme-Micron_MTFDHBA512QFD_20432B47499A-part3  ONLINE       0     0     0
            ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3  ONLINE     0     0     0

errors: No known data errors
and the pool is back to a normal ONLINE state.
Once that's done, don't forget to install the UEFI bootloader on the new disk. Otherwise, on the next reboot the data would still be intact, but since the old disk carried the bootloader and the new one doesn't, the system would have nothing to boot from. That would be awkward.
First, format the EFI partition:
proxmox-boot-tool format /dev/nvme0n1p2
UUID="BC36-D69D" SIZE="536870912" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="nvme0n1" MOUNTPOINT=""
Formatting '/dev/nvme0n1p2' as vfat..
mkfs.fat 4.2 (2021-01-31)
Done.
Then install the UEFI bootloader:
proxmox-boot-tool init /dev/sdc2

root@vServer:~# proxmox-boot-tool init /dev/nvme0n1p
nvme0n1p2  nvme0n1p3
root@vServer:~# proxmox-boot-tool init /dev/nvme0n1p2
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
UUID="DC21-6837" SIZE="536870912" FSTYPE="vfat" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="nvme0n1" MOUNTPOINT=""
Mounting '/dev/nvme0n1p2' on '/var/tmp/espmounts/DC21-6837'.
Installing systemd-boot..
Created "/var/tmp/espmounts/DC21-6837/EFI/systemd".
Created "/var/tmp/espmounts/DC21-6837/EFI/BOOT".
Created "/var/tmp/espmounts/DC21-6837/loader".
Created "/var/tmp/espmounts/DC21-6837/loader/entries".
Created "/var/tmp/espmounts/DC21-6837/EFI/Linux".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/DC21-6837/EFI/systemd/systemd-bootx64.efi".
Copied "/usr/lib/systemd/boot/efi/systemd-bootx64.efi" to "/var/tmp/espmounts/DC21-6837/EFI/BOOT/BOOTX64.EFI".
Random seed file /var/tmp/espmounts/DC21-6837/loader/random-seed successfully written (512 bytes).
Created EFI boot entry "Linux Boot Manager".
Configuring systemd-boot..
Unmounting '/dev/nvme0n1p2'.
Adding '/dev/nvme0n1p2' to list of synced ESPs..
Refreshing kernels and initrds..
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
mount: /var/tmp/espmounts/794B-2A48: can't read superblock on /dev/sdb2.
mount of /dev/disk/by-uuid/794B-2A48 failed - skipping
Copying and configuring kernels on /dev/disk/by-uuid/9EE8-66BA
        Copying kernel and creating boot-entry for 5.11.22-5-pve
        Copying kernel and creating boot-entry for 5.4.143-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/DC21-6837
        Copying kernel and creating boot-entry for 5.11.22-5-pve
        Copying kernel and creating boot-entry for 5.4.143-1-pve
Check how many boot entries there are; barring surprises there should be two boot devices:
root@vServer:~# proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
mount: /var/tmp/espmounts/794B-2A48: can't read superblock on /dev/sdb2.
mount of /dev/disk/by-uuid/794B-2A48 failed - skipping
9EE8-66BA is configured with: uefi (versions: 5.11.22-5-pve, 5.4.143-1-pve)
DC21-6837 is configured with: uefi (versions: 5.11.22-5-pve, 5.4.143-1-pve)
root@vServer:~#
sdb2 is the original Kingston SV300; since it's dead, even its boot partition can no longer be read, so the mount attempt reports an error. That can be ignored. What matters is that two uefi entries actually show up.
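As a side note, once the dead disk has been physically removed, proxmox-boot-tool also has a clean subcommand that drops ESP UUIDs whose partitions no longer exist, which should silence that mount error. I haven't run it here, so take it as a pointer rather than a tested step:

proxmox-boot-tool clean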
Then, while we're at it, let's expand the pool to 500G; it was originally 128G:
root@vServer:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   111G  17.3G  93.7G        -      335G    33%    15%  1.00x  ONLINE  -
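Note the 335G under EXPANDSZ: that's the extra capacity ZFS already knows the bigger disks could provide. If you want a per-device breakdown before expanding, zpool list has a verbose flag:

zpool list -v rpool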
There are two ways to expand: manually, or automatically.
Option 1
First check the zpool status and the corresponding device names:
root@vServer:~# zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: resilvered 17.4G in 00:00:46 with 0 errors on Thu Mar  3 10:48:28 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        rpool                                             ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            nvme-Micron_MTFDHBA512QFD_20432B47499A-part3  ONLINE       0     0     0
            ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3  ONLINE     0     0     0

errors: No known data errors
This gives us the pool name rpool, and the member devices nvme-Micron_MTFDHBA512QFD_20432B47499A-part3 and ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3.
Then run the expand command:
zpool online -e <pool> <device>

root@vServer:~# zpool online -e rpool nvme-Micron_MTFDHBA512QFD_20432B47499A-part3
root@vServer:~# zpool online -e rpool ata-KIOXIA-EXCERIA_SATA_SSD_61IB837QKA93-part3
root@vServer:~#
Then check the zpool info again:
root@vServer:~# zpool list
NAME    SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   446G  17.3G  429G        -         -     8%     3%  1.00x  ONLINE  -
root@vServer:~#
It now reads 446G, so the expansion is complete.
Option 2 is even simpler.
Just turn on the pool's autoexpand feature. I haven't verified this myself, but the option exists, so feel free to try it:
zpool set autoexpand=on <pool>
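For this pool that would be the following, with zpool get to confirm the property stuck. My understanding is that autoexpand kicks in when a larger device comes online or is replaced in, so for disks that have already been swapped you may still need option 1:

zpool set autoexpand=on rpool
zpool get autoexpand rpool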
And with that, everything from replacing the failed disk to expanding the storage pool is done.