Releasing qemu rbd Volume Storage Space Synchronously

Jun 30, 2023 23:00 · 980 words · 2 minute read Virtualization Linux

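The problem in one sentence: on a libvirt + qemu instance backed by Ceph storage, deleting files from a data volume does not immediately release the corresponding storage in Ceph.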

The guest OS is CentOS 7.8 with kernel 3.10.0.

$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0  50G  0 disk
vda    253:0    0  20G  0 disk
└─vda1 253:1    0  20G  0 part /
vdb    253:16   0   1M  0 disk

sda is the data disk. Format it and mount it at /mnt:

$ mkfs.xfs /dev/sda
Discarding blocks...Done.
meta-data=/dev/sda               isize=512    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ mount /dev/sda /mnt/
[94035.914556] XFS (sda): Mounting V5 Filesystem
[94036.028276] XFS (sda): Ending clean mount
$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0  50G  0 disk /mnt
vda    253:0    0  20G  0 disk
└─vda1 253:1    0  20G  0 part /
vdb    253:16   0   1M  0 disk

Check how much storage the rbd volume backing the data disk uses in the Ceph cluster:

$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  44 MiB

The rbd volume is provisioned at 50 GiB, but only 44 MiB is actually in use.
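To compare usage across the steps that follow, the USED column can be pulled out of the plain-text `rbd du` output. This is a minimal sketch that assumes the column layout shown above; `rbd_used` is a hypothetical helper name, not part of the Ceph CLI.

```shell
# rbd_used IMAGE: read `rbd du` output on stdin and print the USED value
# (number and unit) for the named image. Warning and header lines are
# skipped because their first field never equals the image name.
rbd_used() {
    awk -v image="$1" '$1 == image { print $(NF-1), $NF }'
}

# Example (on a Ceph client):
#   rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c 2>/dev/null \
#       | rbd_used csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
```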

Now write 20 GB of data under /mnt:

$ dd if=/dev/zero of=/mnt/test bs=1GB count=20
20+0 records in
20+0 records out
20000000000 bytes (20 GB) copied, 268.295 s, 74.5 MB/s
$ df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda         50G   19G   32G  38% /mnt
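dd counts bs=1GB in decimal units, while df -h and rbd du report binary units, which is why 20 GB written shows up as roughly 19G. A small sketch of the conversion (`gib_from_bytes` is a hypothetical helper name):

```shell
# gib_from_bytes BYTES: convert a byte count to GiB with two decimals.
gib_from_bytes() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

gib_from_bytes 20000000000   # prints 18.63
```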

Check the rbd volume's storage usage in Ceph:

$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  19 GiB

Then delete the large test file generated by dd:

$ rm -rf /mnt/test

Check the rbd volume's storage usage in Ceph once more:

$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  19 GiB

On the Ceph side, the storage used by the rbd volume has not been released, which wastes storage resources.

Discard

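With Ceph 0.46 or later and qemu 1.1 or later, Ceph block devices support the Discard operation: the guest OS filesystem can send TRIM requests so that the block device reclaims space that is no longer in use.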

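The libvirt domain definition must satisfy the following:

  1. The data disk uses the SCSI bus
  2. The qemu disk driver has discard set to unmap
  3. The SCSI controller (a PCI device) has its model set to virtio-scsi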

Shut the instance down, then modify its domain with virsh edit:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' discard='unmap'/>
      <auth username='csi-rbd-provisioner'>
        <secret type='ceph' uuid='8fedf300-282c-4531-a66d-ca2691aaa88b'/>
      </auth>
      <source protocol='rbd' name='mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c' index='1'>
        <host name='192.168.81.37' port='6789'/>
        <host name='192.168.81.38' port='6789'/>
        <host name='192.168.81.39' port='6789'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <serial>ecs-test0-datavol</serial>
      <alias name='ua-datadisk'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
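After editing, a quick text check can confirm that the three settings made it into the saved definition. This is a rough sketch, not an XML parser: it assumes single-quoted attributes as libvirt emits them, it does not verify that all three settings belong to the same disk, and `check_discard_xml` is a hypothetical helper name.

```shell
# check_discard_xml FILE: report which of the three required settings are
# present in a saved domain definition. Returns 1 if any is missing.
check_discard_xml() {
    xml="$1"
    status=0
    for attr in "bus='scsi'" "discard='unmap'" "model='virtio-scsi'"; do
        if grep -q -- "$attr" "$xml"; then
            echo "ok       $attr"
        else
            echo "missing  $attr"
            status=1
        fi
    done
    return "$status"
}

# Example (on the hypervisor; "my-domain" is a placeholder domain name):
#   virsh dumpxml my-domain > /tmp/domain.xml && check_discard_xml /tmp/domain.xml
```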

After saving, boot the instance, log in, and run fstrim:

# guest os
$ mount /dev/sda /mnt/
$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0  50G  0 disk /mnt
vda    253:0    0  20G  0 disk
└─vda1 253:1    0  20G  0 part /
vdb    253:16   0   1M  0 disk
$ fstrim /mnt/

# ceph
$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  68 MiB

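fstrim explicitly triggered the release of the rbd volume's storage, and about 20 GB of space was reclaimed. To have file deletion trigger TRIM automatically, mount the data volume with the discard option: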

# guest os
$ mount -o discard /dev/sda /mnt/
[98639.925204] XFS (sda): Mounting V5 Filesystem
[98640.082683] XFS (sda): Ending clean mount
$ dd if=/dev/zero of=/mnt/test bs=1GB count=20
20+0 records in
20+0 records out
20000000000 bytes (20 GB) copied, 261.615 s, 76.4 MB/s
$ df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda         50G   19G   32G  38% /mnt

# ceph
$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  18 GiB

# guest os
$ rm -rf /mnt/test
$ df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda         50G   33M   50G   1% /mnt

# ceph
$ rbd du mec-ecs-pool/csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c
warning: fast-diff map is not enabled for csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-65a6dd12-162a-11ee-81b6-38ca843ae36c       50 GiB  76 MiB
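To confirm that a filesystem is actually mounted with online discard, the mount options can be read back from /proc/mounts. A minimal sketch follows; `mounted_with_discard` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a sample file.

```shell
# mounted_with_discard MOUNTPOINT [MOUNTS_FILE]: succeed if the mount
# options (4th field) for MOUNTPOINT contain "discard".
# Pass the mount point without a trailing slash, e.g. /mnt.
mounted_with_discard() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "discard") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example (inside the guest):
#   mounted_with_discard /mnt && echo "/mnt issues TRIM on delete"
```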

If the guest OS (Linux) kernel is 5.0 or later, the domain no longer needs the SCSI bus and driver; plain VirtIO is enough.
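Whichever bus is used, the guest can check whether its kernel sees the disk as discard-capable by reading the `discard_max_bytes` attribute in sysfs, where 0 means discard is not supported. A minimal sketch follows; `supports_discard` is a hypothetical helper name, and the optional sysfs root argument exists only so the function can be tried against a fake tree.

```shell
# supports_discard DEV [SYSFS_ROOT]: succeed if the kernel reports a
# non-zero discard_max_bytes for the block device.
# Returns 1 if the value is 0, and 2 if the attribute cannot be read.
supports_discard() {
    dev="$1"
    root="${2:-/sys}"
    f="$root/block/$dev/queue/discard_max_bytes"
    [ -r "$f" ] || return 2
    [ "$(cat "$f")" -gt 0 ]
}

# Example (inside the guest):
#   supports_discard sda && echo "sda accepts discard requests"
```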

References