xhci_hcd host controller hang

[root@localhost ~]# uname -a
Linux localhost.localdomain 5.10.0-216.0.0.115.oe2203sp4.x86_64 #1 SMP Thu Jun 27 15:13:44 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# more /etc/os-release
NAME="openEuler"
VERSION="22.03 (LTS-SP4)"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 (LTS-SP4)"
ANSI_COLOR="0;31"

The messages log shows:
Jul 3 18:34:50 localhost kernel: [ 6892.491082] xhci_hcd 0000:08:00.0: xHCI host controller not responding, assume dead
Jul 3 18:34:50 localhost kernel: [ 6892.491179] xhci_hcd 0000:08:00.0: HC died; cleaning up
Jul 3 18:34:50 localhost kernel: [ 6892.491273] usb 1-2: USB disconnect, device number 29
Jul 3 18:34:50 localhost kernel: [ 6892.491275] usb 1-2.1: USB disconnect, device number 30
Jul 3 18:34:50 localhost kernel: [ 6892.491277] usb 1-2.1.1: USB disconnect, device number 33
Jul 3 18:34:50 localhost kernel: [ 6892.692753] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead
Jul 3 18:34:50 localhost kernel: [ 6892.692854] xhci_hcd 0000:0b:00.0: HC died; cleaning up
Jul 3 18:34:50 localhost kernel: [ 6892.692938] usb 7-2: USB disconnect, device number 24
Jul 3 18:34:50 localhost kernel: [ 6892.692941] usb 7-2.1: USB disconnect, device number 25
Jul 3 18:34:50 localhost kernel: [ 6892.692943] usb 7-2.1.1: USB disconnect, device number 28
Jul 3 18:34:51 localhost kernel: [ 6892.894524] xhci_hcd 0000:0a:00.0: xHCI host controller not responding, assume dead
Jul 3 18:34:51 localhost kernel: [ 6892.894595] usb 1-2.1.2: USB disconnect, device number 36
Jul 3 18:34:51 localhost kernel: [ 6892.894621] xhci_hcd 0000:0a:00.0: HC died; cleaning up
Jul 3 18:34:51 localhost kernel: [ 6892.894728] usb 5-2: USB disconnect, device number 29
Jul 3 18:34:51 localhost kernel: [ 6892.894730] usb 5-2.1: USB disconnect, device number 30
Jul 3 18:34:51 localhost kernel: [ 6892.894732] usb 5-2.1.1: USB disconnect, device number 33
Jul 3 18:34:51 localhost kernel: [ 6893.096288] xhci_hcd 0000:09:00.0: xHCI host controller not responding, assume dead
Jul 3 18:34:51 localhost kernel: [ 6893.096374] usb 1-2.1.3: USB disconnect, device number 39
Jul 3 18:34:51 localhost kernel: [ 6893.096390] xhci_hcd 0000:09:00.0: HC died; cleaning up
Jul 3 18:34:51 localhost kernel: [ 6893.096493] usb 3-2: USB disconnect, device number 24
Jul 3 18:34:51 localhost kernel: [ 6893.096496] usb 3-2.1: USB disconnect, device number 25
Jul 3 18:34:51 localhost kernel: [ 6893.096498] usb 3-2.1.1: USB disconnect, device number 28
Jul 3 18:34:51 localhost kernel: [ 6893.297930] usb 5-2.1.2: USB disconnect, device number 38
Jul 3 18:34:51 localhost kernel: [ 6893.499373] usb 7-2.1.2: USB disconnect, device number 30
Jul 3 18:34:51 localhost kernel: [ 6893.701685] usb 1-2.1.4: USB disconnect, device number 38
Jul 3 18:34:51 localhost kernel: [ 6893.701742] usb 3-2.1.2: USB disconnect, device number 33
Jul 3 18:34:52 localhost kernel: [ 6893.903215] usb 7-2.1.3: USB disconnect, device number 32
Jul 3 18:34:52 localhost kernel: [ 6893.903300] usb 3-2.1.3: USB disconnect, device number 30
Jul 3 18:34:52 localhost kernel: [ 6894.105227] usb 3-2.1.4: USB disconnect, device number 31
Jul 3 18:34:52 localhost kernel: [ 6894.306810] usb 7-2.1.4: USB disconnect, device number 33
Jul 3 18:34:52 localhost kernel: [ 6894.508219] usb 1-2.2: USB disconnect, device number 31
Jul 3 18:34:52 localhost kernel: [ 6894.508222] usb 1-2.2.1: USB disconnect, device number 32
Jul 3 18:34:52 localhost kernel: [ 6894.508300] usb 3-2.2: USB disconnect, device number 26
Jul 3 18:34:52 localhost kernel: [ 6894.508303] usb 3-2.2.1: USB disconnect, device number 27
Jul 3 18:34:52 localhost kernel: [ 6894.709947] usb 5-2.1.3: USB disconnect, device number 36
Jul 3 18:34:53 localhost kernel: [ 6894.911595] usb 3-2.2.2: USB disconnect, device number 34
Jul 3 18:34:53 localhost kernel: [ 6895.112884] usb 1-2.2.2: USB disconnect, device number 35
Jul 3 18:34:53 localhost kernel: [ 6895.314763] usb 1-2.2.3: USB disconnect, device number 37
Jul 3 18:34:53 localhost kernel: [ 6895.314854] usb 3-2.2.3: USB disconnect, device number 29
Jul 3 18:34:53 localhost kernel: [ 6895.516185] usb 7-2.2: USB disconnect, device number 26
Jul 3 18:34:53 localhost kernel: [ 6895.516189] usb 7-2.2.1: USB disconnect, device number 27
Jul 3 18:34:54 localhost kernel: [ 6895.717728] usb 5-2.1.4: USB disconnect, device number 37
Jul 3 18:34:54 localhost kernel: [ 6895.919191] usb 1-2.2.4: USB disconnect, device number 34
Jul 3 18:34:54 localhost kernel: [ 6895.919262] usb 3-2.2.4: USB disconnect, device number 32
Jul 3 18:34:54 localhost kernel: [ 6896.120715] usb 7-2.2.2: USB disconnect, device number 34
Jul 3 18:34:55 localhost kernel: [ 6896.725576] usb 7-2.2.3: USB disconnect, device number 31
Jul 3 18:34:55 localhost kernel: [ 6897.330214] usb 7-2.2.4: USB disconnect, device number 29
Jul 3 18:34:55 localhost kernel: [ 6897.531773] usb 5-2.2: USB disconnect, device number 31
Jul 3 18:34:55 localhost kernel: [ 6897.531779] usb 5-2.2.1: USB disconnect, device number 32
Jul 3 18:34:55 localhost kernel: [ 6897.733486] usb 5-2.2.2: USB disconnect, device number 35
Jul 3 18:34:56 localhost kernel: [ 6898.136809] usb 5-2.2.3: USB disconnect, device number 34
Jul 3 18:34:56 localhost kernel: [ 6898.338526] usb 5-2.2.4: USB disconnect, device number 39
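With this many disconnect lines it is easy to lose track of which controllers actually died and when. As a rough aid, here is a minimal sketch that pulls only the "assume dead" events out of a log file and prints the PCI address, the kernel timestamp, and the gap to the previous event. The function name `summarize_hc_died` is made up for this example, and it assumes the syslog line layout shown above.

```shell
# Summarize "HC died" events from a kernel log file.
# Hypothetical helper; assumes the syslog layout shown in this thread.
# Usage: summarize_hc_died /var/log/messages
summarize_hc_died() {
    awk '
        /xHCI host controller not responding, assume dead/ {
            ts = ""; dev = ""
            for (i = 1; i <= NF; i++) {
                # The PCI address is the field right after "xhci_hcd", e.g. 0000:08:00.0:
                if ($i == "xhci_hcd") { dev = $(i + 1); sub(/:$/, "", dev) }
                # The kernel timestamp looks like 6892.491082] or [6892.491082]
                if ($i ~ /^\[?[0-9]+\.[0-9]+\]$/) {
                    ts = $i; sub(/^\[/, "", ts); sub(/\]$/, "", ts)
                }
            }
            if (prev != "") printf "%s died at %s (+%.3f s)\n", dev, ts, ts - prev
            else            printf "%s died at %s\n", dev, ts
            prev = ts
        }
    ' "$1"
}
```

Run against the log above, it should list the four controllers (08:00.0, 0b:00.0, 0a:00.0, 09:00.0) in that order, each roughly 0.2 s after the previous one. That only describes the timing; it does not by itself say what caused it.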

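There is nothing wrong with the power supply. Could someone help analyze this and suggest an approach and direction for troubleshooting?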

I looked at the code; the log is printed by the following function:

void xhci_hc_died(struct xhci_hcd *xhci)
{
    int i, j;

    if (xhci->xhc_state & XHCI_STATE_DYING)
            return;

    xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
    xhci->xhc_state |= XHCI_STATE_DYING;

    xhci_cleanup_command_queue(xhci);

    /* return any pending urbs, remove may be waiting for them */
    for (i = 0; i <= HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
            if (!xhci->devs[i])
                    continue;
            for (j = 0; j < 31; j++)
                    xhci_kill_endpoint_urbs(xhci, i, j);
    }

    /* inform usb core hc died if PCI remove isn't already handling it */
    if (!(xhci->xhc_state & XHCI_STATE_REMOVING))
            usb_hc_died(xhci_to_hcd(xhci));
}
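However, this function is called from several places, and because the debug option is not enabled, many debug messages are never printed, so it is hard to tell which branch led here. I don't know how easily you can reproduce the problem on your side; if you can, I suggest enabling the xhci_dbg() and xhci_warn() output to see which branch it came in through.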
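To see how many candidate branches there are before turning on any logging, the call sites can simply be listed from the kernel source. A minimal sketch, assuming you have the source tree matching the running kernel unpacked somewhere; the function name `list_hc_died_callers` is made up for this example.

```shell
# List every call site of xhci_hc_died() in a kernel source tree.
# Hypothetical helper. Usage: list_hc_died_callers /path/to/kernel-source
list_hc_died_callers() {
    src="${1:?usage: list_hc_died_callers <kernel source dir>}"
    # -n prints line numbers, -H prints the file name;
    # the second grep drops the function definition itself.
    grep -nH 'xhci_hc_died(' "$src"/drivers/usb/host/*.c \
        | grep -v 'void xhci_hc_died('
}
```

Each line of output is one place that can trigger the "assume dead" message, which gives a short list of functions to watch once the debug output is enabled.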

How do I enable the debug option? Please advise.

To enable debugging, the main thing is to turn on the xhci_dbg macro,
which is defined in the following file: /drivers/usb/host/xhci.h
#define xhci_dbg(xhci, fmt, args...) \
    dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args)

So the question now becomes how to enable dev_dbg. I found a post online that covers the ways of enabling it fairly thoroughly; you can refer to it: 开启dev_dbg方法 (CSDN blog)

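When I debugged drivers before, I always used the approach from that post of adding the DEBUG macro. Depending on your situation, you can pick whichever of the methods suits you.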
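If rebuilding the driver with the DEBUG macro is inconvenient, dev_dbg() output can usually also be switched on at runtime through the kernel's dynamic debug interface. The sketch below assumes the running kernel was built with CONFIG_DYNAMIC_DEBUG and that debugfs is mounted at /sys/kernel/debug; I have not checked either on this openEuler build. The function name `xhci_dyndbg` is made up for this example.

```shell
# Enable or disable xhci_hcd dev_dbg() output via dynamic debug.
# Hypothetical helper; assumes CONFIG_DYNAMIC_DEBUG and a mounted debugfs.
# Usage (as root): xhci_dyndbg on   |   xhci_dyndbg off
# The optional second argument overrides the control file path.
xhci_dyndbg() {
    ctl="${2:-/sys/kernel/debug/dynamic_debug/control}"
    case "$1" in
        on)  flag='+p' ;;   # +p: print the matching pr_debug()/dev_dbg() call sites
        off) flag='-p' ;;   # -p: silence them again
        *)   echo "usage: xhci_dyndbg on|off [control-file]" >&2; return 2 ;;
    esac
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (not root, debugfs not mounted, or no CONFIG_DYNAMIC_DEBUG?)" >&2
        return 1
    fi
    # One query per write: select every debug call site in module xhci_hcd.
    echo "module xhci_hcd $flag" > "$ctl"
}
```

After `xhci_dyndbg on`, reproduce the problem and check dmesg for the xhci_dbg() lines printed just before "assume dead". Run `xhci_dyndbg off` afterwards, since the xHCI debug output can be very verbose.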