【Symptom】
On a hyper-converged host running the 5.10.0-2-openeuler kernel, running a vdbench test inside a virtual machine reports the following error.
Dec 12, 2025   interval   i/o    MB/sec   bytes  read   resp   read   write  read   write  resp    queue  cpu%   cpu%
                          rate   1024**2  i/o    pct    time   resp   resp   max    max    stddev  depth  sys+u  sys
......
10:36:08.002 954 50931.0 397.90 8192 100.00 0.626 0.626 0.000 2.47 0.00 0.109 31.9 3.9 3.4
10:36:09.005 955 51180.0 399.84 8192 100.00 0.623 0.623 0.000 1.85 0.00 0.098 31.9 3.8 3.4
10:36:09.052 localhost-0: 10:36:09.050 op: read lun: /dev/sdb lba: 27240685568 0x657ABE000 xfer: 8192 errno: BAD_READ_RETURN: 'Read was successful, but data buffer contents not changed'
10:36:10.008 956 50444.0 394.09 8192 100.00 0.632 0.632 0.000 10.91 0.00 0.224 31.9 4.3 3.6
10:36:11.003 957 50884.0 397.53 8192 100.00 0.623 0.623 0.000 4.23 0.00 0.102 31.7 4.5 3.9
10:36:12.002 958 50665.0 395.82 8192 100.00 0.625 0.625 0.000 2.71 0.00 0.107 31.7 3.9 3.4
10:36:13.003 959 51145.0 399.57 8192 100.00 0.624 0.624 0.000 1.24 0.00 0.094 31.9 3.5 2.9
10:36:14.005 960 50960.0 398.13 8192 100.00 0.626 0.626 0.000 5.69 0.00 0.140 31.9 3.3 2.8
When this happens, no error is reported on either the host or the virtual machine.
【Upgrading the kernel to 5.10.0-294.0.0】
The kernel was updated to the latest version, 5.10.0-294.0.0:
commit 152c8aee0c8ae7e86857a648adec0a4166402e98 (HEAD, tag: 5.10.0-294.0.0, origin/OLK-5.10)
Merge: c7c9a8efa744 832d086ab595
Author: openeuler-ci-bot <george@openeuler.sh>
Date: Wed Dec 10 08:42:52 2025 +0000
After rebuilding the openEuler kernel, the problem still occurs:
10:44:30.563 localhost-0: 10:44:30.560 op: read lun: /dev/sdb lba: 341478957056 0x4F81BB6000 xfer: 8192 errno: BAD_READ_RETURN: 'Read was successful, but data buffer contents not changed'
10:48:29.800 localhost-0: 10:48:29.797 op: read lun: /dev/sdb lba: 885798019072 0xCE3DAD2000 xfer: 8192 errno: BAD_READ_RETURN: 'Read was successful, but data buffer contents not changed'
【Changing the VM storage type】
The error occurs regardless of which host storage backs the VM disk: ssd-lvm-thin, nvme-lvm-thin, raid-lvm, or zfs.
【Changing the host kernel】
When the host kernel is switched from 5.10 to 6.1.62-generic, the same VM running the same test does not report the error.
【Changing the test target】
When the same test is run directly on the host, neither the 5.10 kernel nor the 6.1 kernel reports the error.
【Explanation of the test error】
According to vdbench's explanation of:
errno: BAD_READ_RETURN: 'Read was successful, but data buffer contents not changed'
Before issuing a read(), vdbench pre-fills the read buffer with a known pattern (e.g. deadeeee). If the read() system call returns success but the buffer still contains that pre-fill pattern, vdbench reports this error.
Possible causes include the kernel or driver never actually reading data from the device, the read degenerating into a no-op, a cache hit that does not update the buffer, or some other silent failure.
This breaks a fundamental assumption of the I/O path, and the reliability of the data can no longer be trusted.
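For clarity, the sketch below shows (in C, not vdbench's actual implementation) how such a check can be expressed: pre-fill an aligned buffer, issue an O_DIRECT read, and flag the case where read() succeeds but the buffer is untouched. The 8 KiB transfer size and the LBA are taken from the failing log line; the single-byte 0xde fill is a simplification of the deadeeee-style pattern mentioned above, so real data that happens to match it would be a false positive.

```c
/* Minimal sketch of the BAD_READ_RETURN-style check (assumptions:
 * /dev/sdb and the offset are placeholders taken from the log). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define XFER  8192   /* matches the failing 8 KiB reads */
#define ALIGN 4096   /* O_DIRECT alignment requirement  */

int main(void)
{
    const char *dev = "/dev/sdb";      /* placeholder device           */
    off_t lba = (off_t)27240685568LL;  /* offset from the failing read */

    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, XFER)) { perror("posix_memalign"); return 1; }

    memset(buf, 0xde, XFER);           /* pre-fill with a known pattern */

    ssize_t n = pread(fd, buf, XFER, lba);
    if (n != XFER) { perror("pread"); return 1; }

    /* read() "succeeded"; verify that the buffer actually changed */
    unsigned char pattern[XFER];
    memset(pattern, 0xde, XFER);
    if (memcmp(buf, pattern, XFER) == 0)
        fprintf(stderr, "BAD_READ_RETURN-like condition: buffer unchanged\n");
    else
        printf("buffer was overwritten by device data, as expected\n");

    free(buf);
    close(fd);
    return 0;
}
```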
【Host configuration】
Kunpeng-920 host; lscpu output:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250 To be filled by O.E.M. CPU @ 2.6GHz
BIOS CPU family: 280
Model: 0
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
Stepping: 0x1
Frequency boost: disabled
CPU(s) scaling MHz: 100%
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
Caches (sum of all):
L1d: 6 MiB (96 instances)
L1i: 6 MiB (96 instances)
L2: 48 MiB (96 instances)
L3: 96 MiB (4 instances)
NUMA:
NUMA node(s): 4
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
【Virtual machine version】
Both qemu-kvm 9.2.0 and qemu-kvm 8.1.2 trigger the issue.
【Virtual machine configuration】
root@tghci-node2:~# qm config 189
bios: ovmf
boot: order=scsi0;scsi2;net0
cores: 32
efidisk0: hdd:vm-189-disk-0,efitype=4m,pre-enrolled-keys=1,size=64M
memory: 8192
meta: creation-qemu=8.1.2,ctime=1765269847
name: openeuler
net0: virtio=BC:24:11:65:BD:B0,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: newnvm:vm-189-disk-0,iothread=1,size=32G
scsi1: ssd-lvm-thin:vm-189-disk-0,iothread=1,size=32G
scsi2: cephfs:iso/openEuler-22.03-LTS-aarch64-dvd.iso,media=cdrom,size=3177624K
scsi3: ssd-lvm-thin:vm-189-disk-1,iothread=1,size=1T
scsihw: virtio-scsi-single
smbios1: uuid=76537a2e-d6bb-4bb6-81ec-ddda62e0a607
sockets: 1
【Virtual machine runtime information】
[root@localhost vdbench50407_arm64]# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: HiSilicon
BIOS Vendor ID: QEMU
Model name: Kunpeng-920
BIOS Model name: virt-8.1
Model: 0
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 1
Stepping: 0x1
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Not affected
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Not affected
Srbds: Not affected
Tsx async abort: Not affected
[root@localhost vdbench50407_arm64]# uname -a
Linux localhost.localdomain 5.10.0-60.18.0.50.oe2203.aarch64 #1 SMP Wed Mar 30 02:43:08 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
【How to reproduce】
1) On a host running the 5.10.0-2-openeuler kernel, create an openEuler 22.03 virtual machine and attach a large 1 TB disk to it.
2) Log in to the VM and test the disk performance with vdbench.
3) Pre-fill the disk with data: ./vdbench -f yumai.txt -o yumai.log, where yumai.txt contains:
messagescan=no
sd=default,openflags=o_direct
sd=sd1,lun=/dev/sdb
wd=wd1,sd=sd*,xfersize=1m,seekpct=eof,rdpct=0
rd=run1,wd=wd1,iorate=max,elapsed=60000000,interval=1,threads=1
4) Repeatedly run the test script: ./vdbench -f iotest.txt -o iotest.log (a standalone C sketch of the failing read pattern is given after this list), where iotest.txt contains:
messagescan=no
sd=default,openflags=o_direct
sd=sd1,lun=/dev/sdb
wd=wd11,sd=sd*,xfersize=256k,seekpct=0,rdpct=0
wd=wd12,sd=sd*,xfersize=256k,seekpct=0,rdpct=100
wd=wd21,sd=sd*,xfersize=8k,seekpct=100,rdpct=100
wd=wd22,sd=sd*,xfersize=8k,seekpct=100,rdpct=70
rd=run11,wd=wd11,iorate=max,elapsed=300,interval=1,warmup=300,threads=4
rd=run12,wd=wd12,iorate=max,elapsed=300,interval=1,warmup=300,threads=4
rd=run21,wd=wd21,iorate=max,elapsed=600,interval=1,warmup=600,threads=32
rd=run22,wd=wd22,iorate=max,elapsed=600,interval=1,warmup=600,threads=32
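If a reproducer without vdbench is useful for debugging, the sketch below approximates the run21-style workload: repeated 8 KiB random O_DIRECT reads, each preceded by a pattern pre-fill and followed by a buffer-change check. It is an illustration under stated assumptions (placeholder device path, simplified single-byte pattern, offsets aligned to the transfer size), not vdbench's algorithm.

```c
/* Approximate reproducer for the 8 KiB random-read verification loop.
 * Run until interrupted (Ctrl-C), mirroring vdbench's long elapsed times. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define XFER  8192
#define ALIGN 4096

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sdb";   /* placeholder */
    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t dev_bytes = 0;                /* total device size in bytes */
    if (ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0) { perror("BLKGETSIZE64"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, XFER)) { perror("posix_memalign"); return 1; }

    unsigned char pattern[XFER];
    memset(pattern, 0xde, sizeof(pattern));

    srandom(getpid());
    for (unsigned long i = 0; ; i++) {
        /* pick a random offset within the device, aligned to XFER */
        off_t off = (((off_t)random() << 31) ^ random()) % (off_t)(dev_bytes - XFER);
        off &= ~((off_t)XFER - 1);

        memcpy(buf, pattern, XFER);        /* pre-fill before every read */
        ssize_t n = pread(fd, buf, XFER, off);
        if (n != XFER) { perror("pread"); break; }

        if (memcmp(buf, pattern, XFER) == 0)   /* "success" but untouched */
            fprintf(stderr, "iteration %lu: buffer unchanged at offset %lld\n",
                    i, (long long)off);
    }

    free(buf);
    close(fd);
    return 0;
}
```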
【Additional information】
The issue has been reproduced in multiple environments. Comparatively, it reproduces far more readily when the VM disk is backed by a RAID array. With a single SSD carved into lvm-thin volumes for the VM, the reproduction rate is low, but it has still occurred once.
【Request for help】
The vdbench error, in which a read returns success but the data buffer is not updated, makes us doubt the reliability of virtual machines running on the 5.10.0 kernel. We ask the community to help clarify or resolve this issue.