oe2403sp2.aarch64: kernel Oops in mlx5e with AF_XDP under high load

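AF_XDP is used to receive packets on a specified UDP port, and each packet is dropped as soon as it is received. The kernel crashes when the load approaches full capacity.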

Crash log:
[root@kunpeng132 ~]# tail -80 /var/crash/127.0.0.1-2025-07-23-14:35:07/vmcore-dmesg.txt
[ 38.522899] hns3 0000:bd:00.2 enp189s0f2: link up
[ 40.064487] hns3 0000:7d:00.0 enp125s0f0: link up
[ 40.069484] hns3 0000:bd:00.1 enp189s0f1: link up
[ 40.078846] hns3 0000:bd:00.0 enp189s0f0: link up
[ 40.139358] hns3 0000:bd:00.3 enp189s0f3: link up
[ 40.386571] hns3 0000:7d:00.1 enp125s0f1: link up
[ 43.272627] FS-Cache: Loaded
[ 43.353186] Key type dns_resolver registered
[ 43.528670] NFS: Registering the id_resolver key type
[ 43.528957] Key type id_resolver registered
[ 43.529142] Key type id_legacy registered
[ 48.709115] block dm-0: the capability attribute has been deprecated.
[ 80.384525] usb 1-1.1: USB disconnect, device number 3
[ 82.261626] usb 1-1.1: new full-speed USB device number 4 using ehci-pci
[ 82.414529] usb 1-1.1: New USB device found, idVendor=12d1, idProduct=0003, bcdDevice= 1.00
[ 82.414553] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 82.414571] usb 1-1.1: Product: Keyboard/Mouse KVM 1.1.0
[ 82.427767] input: Keyboard/Mouse KVM 1.1.0 as /devices/pci0000:7a/0000:7a:01.0/usb1/1-1/1-1.1/1-1.1:1.0/0003:12D1:0003.0002/input/input2
[ 82.557674] hid-generic 0003:12D1:0003.0002: input,hidraw0: USB HID v1.10 Keyboard [Keyboard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-1.1/input0
[ 82.564952] input: Keyboard/Mouse KVM 1.1.0 as /devices/pci0000:7a/0000:7a:01.0/usb1/1-1/1-1.1/1-1.1:1.1/0003:12D1:0003.0003/input/input3
[ 82.565897] hid-generic 0003:12D1:0003.0003: input,hidraw1: USB HID v1.10 Mouse [Keyboard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-1.1/input1
[ 3335.769720] mlx5_core 0000:84:00.0: Using 48-bit DMA addresses
[ 3624.817576] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000034
[ 3624.817771] Mem abort info:
[ 3624.817838] ESR = 0x0000000096000006
[ 3624.817910] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3624.817990] SET = 0, FnV = 0
[ 3624.818072] EA = 0, S1PTW = 0
[ 3624.818155] FSC = 0x06: level 2 translation fault
[ 3624.818245] Data abort info:
[ 3624.818337] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 3624.818437] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 3624.818544] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 3624.818659] user pgtable: 4k pages, 48-bit VAs, pgdp=0000202099b38000
[ 3624.818782] [0000000000000034] pgd=0800202093036403, p4d=0800202093036403, pud=0800202093035403, pmd=0000000000000000
[ 3624.819052] Internal error: Oops: 0000000096000006 [#1] SMP
[ 3624.819203] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat ipmi_ssif hibmc_drm acpi_ipmi drm_vram_helper hns_roce_hw_v2 ses drm_ttm_helper bnxt_re mlx5_ib ttm enclosure ipmi_si ipmi_devintf ib_uverbs hisi_uncore_hha_pmu hisi_uncore_ddrc_pmu hisi_uncore_l3c_pmu drm_display_helper sg hisi_uncore_pmu ipmi_msghandler arm_smmuv3_pmu ib_core drm_kms_helper fuse drm nfnetlink ext4 mbcache jbd2 sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc64 realtek hclge hisi_sas_v3_hw crct10dif_ce hisi_sas_main ghash_ce libsas sha2_ce ahci mlx5_core sha256_arm64 sha1_ce sbsa_gwdt libahci scsi_transport_sas bnxt_en libata hns3 mlxfw megaraid_sas hnae3 host_edma_drv i2c_designware_platform i2c_designware_core dm_mirror dm_region_hash dm_log dm_multipath dm_mod aes_ce_blk aes_ce_cipher
[ 3624.821550] CPU: 38 PID: 0 Comm: swapper/38 Kdump: loaded Not tainted 6.6.0-101.0.0.107.oe2403sp2.aarch64 #1
[ 3624.822063] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDGA, BIOS 1.36 05/04/2020
[ 3624.822572] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3624.823091] pc : mlx5e_xsk_free_rx_wqe.part.0.isra.0+0x1c/0xf0 [mlx5_core]
[ 3624.823751] lr : mlx5e_post_rx_wqes.part.0+0x2fc/0x338 [mlx5_core]
[ 3624.824117] sp : ffff800082fabcf0
[ 3624.824402] x29: ffff800082fabcf0 x28: 0000000000000000 x27: ffff2020957d6940
[ 3624.824973] x26: ffff2020957d76c0 x25: ffff202073c63000 x24: ffff2020957d3fd0
[ 3624.825558] x23: 00000000000005cc x22: 0000000000000200 x21: 0000000000000000
[ 3624.826157] x20: 00000000000003cc x19: ffff2020957d70c0 x18: ffffffffffffffff
[ 3624.826760] x17: ffffa036ddfe5000 x16: ffff800082fa8000 x15: ffffffffffffffff
[ 3624.827362] x14: ffff20200cd4d800 x13: ffff20200cd4d7d8 x12: 0000000005f5e100
[ 3624.827963] x11: 0000000000000000 x10: ffff8000d24a4000 x9 : ffff80007b322254
[ 3624.828563] x8 : ffff8000d315a000 x7 : ffff202014d8a410 x6 : 0000000000000000
[ 3624.829161] x5 : 0000000000000034 x4 : 0000000000000040 x3 : 0000000000000000
[ 3624.829757] x2 : 0000000000000000 x1 : ffff20200c0a8000 x0 : ffff20200bc89000
[ 3624.830351] Call trace:
[ 3624.830643] mlx5e_xsk_free_rx_wqe.part.0.isra.0+0x1c/0xf0 [mlx5_core]
[ 3624.831295] mlx5e_post_rx_wqes.part.0+0x2fc/0x338 [mlx5_core]
[ 3624.831665] mlx5e_post_rx_wqes+0x40/0x68 [mlx5_core]
[ 3624.832025] mlx5e_napi_poll+0x29c/0x750 [mlx5_core]
[ 3624.832382] __napi_poll+0x40/0x1d8
[ 3624.832662] napi_poll+0x158/0x198
[ 3624.832932] net_rx_action+0xe0/0x270
[ 3624.833198] handle_softirqs+0x128/0x330
[ 3624.833465] __do_softirq+0x1c/0x28
[ 3624.833723] ____do_softirq+0x18/0x30
[ 3624.833975] call_on_irq_stack+0x24/0x30
[ 3624.834223] do_softirq_own_stack+0x24/0x38
[ 3624.834465] irq_exit_rcu+0x108/0x130
[ 3624.834701] el1_interrupt+0x58/0x120
[ 3624.834932] el1h_64_irq_handler+0x24/0x30
[ 3624.835156] el1h_64_irq+0x78/0x80
[ 3624.835373] default_idle_call+0x74/0x150
[ 3624.835590] cpuidle_idle_call+0x18c/0x200
[ 3624.835797] do_idle+0xbc/0x188
[ 3624.835994] cpu_startup_entry+0x3c/0x50
[ 3624.836187] secondary_start_kernel+0x14c/0x1d8
[ 3624.836378] __secondary_switched+0xb8/0xc0
[ 3624.836565] Code: a9bc7bfd 910003fd a9025bf5 f9400015 (b94036a0)
[ 3624.836752] SMP: stopping secondary CPUs
[ 3624.839238] Starting crashdump kernel...
[ 3624.839433] Bye!
[root@kunpeng132 ~]#
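
To make the Oops easier to read, the sketch below decodes the ESR value and the last two instruction words from the "Code:" line. It is only an interpretation aid, not a root-cause analysis. The helper names (`decode_esr`, `decode_ldr_uimm`) are hypothetical, the decoder handles only the A64 "LDR (immediate, unsigned offset)" form for 32-bit and 64-bit registers, and all constants are copied from the log above.

```python
# Values copied from the Oops above.
ESR = 0x96000006
PREV_INSN = 0xF9400015   # word just before the faulting one in the "Code:" line
FAULT_INSN = 0xB94036A0  # parenthesised (faulting) word in the "Code:" line


def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into the fields printed in the log."""
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class (0x25 = data abort, current EL)
        "il": (esr >> 25) & 0x1,   # 1 = 32-bit instruction
        "wnr": (esr >> 6) & 0x1,   # 0 = read access, 1 = write access
        "fsc": esr & 0x3F,         # fault status code (0x06 = level 2 translation fault)
    }


def decode_ldr_uimm(insn):
    """Disassemble LDR (immediate, unsigned offset), 32/64-bit forms only.

    Hypothetical helper for this report; anything else raises ValueError.
    """
    if insn & 0xBFC00000 != 0xB9400000:
        raise ValueError("not a 32/64-bit LDR (immediate, unsigned offset)")
    size = insn >> 30                      # 2 = W register, 3 = X register
    offset = ((insn >> 10) & 0xFFF) << size  # imm12 is scaled by the access size
    rn = (insn >> 5) & 0x1F                # base register
    rt = insn & 0x1F                       # destination register
    prefix = "w" if size == 2 else "x"
    dest = f"{prefix}zr" if rt == 31 else f"{prefix}{rt}"
    base = "sp" if rn == 31 else f"x{rn}"
    if offset == 0:
        return f"ldr {dest}, [{base}]"
    return f"ldr {dest}, [{base}, #{offset:#x}]"


if __name__ == "__main__":
    fields = decode_esr(ESR)
    print({name: hex(value) for name, value in fields.items()})
    print(decode_ldr_uimm(PREV_INSN))
    print(decode_ldr_uimm(FAULT_INSN))
```

The two instructions decode to `ldr x21, [x0]` followed by `ldr w0, [x21, #0x34]`. Together with `x21 = 0` in the register dump and `WnR = 0`, this suggests that a pointer loaded through the first argument of `mlx5e_xsk_free_rx_wqe` was NULL, and that the subsequent 32-bit read at offset 0x34 from it produced the fault at virtual address 0x34. Which structure field that offset corresponds to cannot be determined from the log alone.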

NIC model: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
NIC information:
[root@kunpeng132 ~]# ethtool -i enp132s0f0np0
driver: mlx5_core
version: 6.6.0-101.0.0.107.oe2403sp2.aar
firmware-version: 14.20.1010 (HUA0020040036)
expansion-rom-version:
bus-info: 0000:84:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
[root@kunpeng132 ~]#

[root@kunpeng132 ~]# modinfo mlx5_core
filename: /lib/modules/6.6.0-101.0.0.107.oe2403sp2.aarch64/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko.xz
license: Dual BSD/GPL
description: Mellanox 5th generation network adapters (ConnectX series) core driver
author: Eli Cohen