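Troubleshooting a broken reboot after upgrading 22.03 SP3/SP4 to 24.03

TL;DR:

Possibly the upgrade from 22.03 SP3/SP4 to 24.03 broke SELinux, which caused the errors after the post-upgrade reboot?

The fix: boot a live CD, chroot into the affected disk, reinstall all packages and turn off SELinux enforcement; then reboot, reinstall the SELinux-related packages, and finally let SELinux relabel the entire filesystem.

Symptoms:

I first upgraded from 22.03 SP3 to SP4 without problems, then changed the repository URLs and upgraded to 24.03.

During the upgrade, however, errors appeared, including warnings about missing files: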


  Running scriptlet: selinux-policy-targeted-40.7-2.oe2403.noarch                                                                                                                    665/3030 
uavc:  op=load_policy lsm=selinux seqno=2 res=1Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.42 2022-12-11
Regex version mismatch, expected: 10.39 2021-10-29 actual: 10.42 2022-12-11

  Running scriptlet: kmod-kvdo-8.2.1.2-4.oe2403.x86_64                                                                                                                               986/3030 
+ /usr/sbin/dkms --rpm_safe_upgrade add -m kmod-kvdo -v 8.2.1.2-4
Creating symlink /var/lib/dkms/kmod-kvdo/8.2.1.2-4/source -> /usr/src/kmod-kvdo-8.2.1.2-4
+ /usr/sbin/dkms --rpm_safe_upgrade build -m kmod-kvdo -v 8.2.1.2-4
Sign command: /lib/modules/5.10.0-232.0.0.131.oe2203sp4.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub
Certificate or key are missing, generating self signed certificate for MOK...

Building module:
Cleaning build area...
make -j2 KERNELRELEASE=5.10.0-232.0.0.131.oe2203sp4.x86_64 -C /lib/modules/5.10.0-232.0.0.131.oe2203sp4.x86_64/build M=/var/lib/dkms/kmod-kvdo/8.2.1.2-4/build...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.10.0-232.0.0.131.oe2203sp4.x86_64 (x86_64)
Consult /var/lib/dkms/kmod-kvdo/8.2.1.2-4/build/make.log for more information.
+ /usr/sbin/dkms --rpm_safe_upgrade install -m kmod-kvdo -v 8.2.1.2-4
Sign command: /lib/modules/5.10.0-232.0.0.131.oe2203sp4.x86_64/build/scripts/sign-file
Signing key: /var/lib/dkms/mok.key
Public certificate (MOK): /var/lib/dkms/mok.pub

Building module:
Cleaning build area...
make -j2 KERNELRELEASE=5.10.0-232.0.0.131.oe2203sp4.x86_64 -C /lib/modules/5.10.0-232.0.0.131.oe2203sp4.x86_64/build M=/var/lib/dkms/kmod-kvdo/8.2.1.2-4/build...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.10.0-232.0.0.131.oe2203sp4.x86_64 (x86_64)
Consult /var/lib/dkms/kmod-kvdo/8.2.1.2-4/build/make.log for more information.
warning: %post(kmod-kvdo-8.2.1.2-4.oe2403.x86_64) scriptlet failed, exit status 10

Error in POSTIN scriptlet in rpm package kmod-kvdo


  Running scriptlet: libwbclient-4.17.5-12.oe2203sp4.x86_64                                                                                                                         2002/3030 
/sbin/ldconfig: /usr/lib64/libproxy.so.1 is not a symbolic link


  Cleanup          : libwbclient-4.17.5-12.oe2203sp4.x86_64                                                                                                                         2002/3030 
warning: file /usr/lib64/samba/wbclient/libwbclient.so.0.15: remove failed: No such file or directory
warning: file /usr/lib64/samba/wbclient/libwbclient.so.0: remove failed: No such file or directory
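
With more than 3000 transaction steps, failures like the ones above are easy to miss. A minimal sketch for pulling them out afterwards, assuming you saved the upgrade output to a file (the function name and the log file are hypothetical, not part of dnf):

```shell
# List scriptlet failures recorded in a saved copy of the upgrade output.
# Usage: scriptlet_failures LOGFILE
scriptlet_failures() {
    # Matches the two message shapes seen in the output above:
    #   warning: %post(...) scriptlet failed, exit status N
    #   Error in POSTIN scriptlet in rpm package NAME
    grep -E 'scriptlet failed|^Error in [A-Z]+ scriptlet' "$1"
}
```

Example: `scriptlet_failures upgrade.log`, where upgrade.log is whatever file you captured the output in.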

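After rebooting into 24.03, the system either hung during boot or hit a kernel panic, and was unusable.

Several people on the forum have reported similar problems. One thing that may differ in my case: I had installed the DDE desktop on 22.03 SP3, although the default target was multi-user.target.

Troubleshooting steps:

After the upgrade to 24.03 the system could not boot. This is how I worked through it:

1. Boot a live CD of any Linux distribution, mount the affected 24.03 disk (assumed here to be mounted at /media/user/dddddddddd), and chroot into it: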

mount -o rbind /dev /media/user/dddddddddd/dev
mount -t proc none /media/user/dddddddddd/proc
mount -o bind /sys /media/user/dddddddddd/sys
mount -o bind /tmp /media/user/dddddddddd/tmp
cp /etc/resolv.conf /media/user/dddddddddd/etc/resolv.conf
chroot /media/user/dddddddddd
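
Before running the mount and chroot commands, it can help to confirm that the mount point really holds the upgraded root filesystem. A small sketch, assuming the disk has a standard etc/os-release file (the function name is hypothetical):

```shell
# Print the VERSION_ID recorded in ROOT/etc/os-release; fail if the file is missing.
# Usage: root_version_id ROOT
root_version_id() {
    osr="$1/etc/os-release"
    [ -r "$osr" ] || { echo "no os-release under $1" >&2; return 1; }
    # os-release holds KEY=value lines; strip the key and the optional quotes.
    sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$osr"
}
```

Example: `root_version_id /media/user/dddddddddd` should print the release you expect to be repairing.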

2. Reinstall all packages:

dnf reinstall --refresh $(rpm -qa)
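
The list printed by `rpm -qa` also contains gpg-pubkey entries, which are imported signing keys rather than packages that exist in a repository. The command above worked for me as written; the sketch below is an optional variation that filters those entries out first (the function name is hypothetical):

```shell
# Read package names on stdin (one per line, as printed by `rpm -qa`)
# and drop the gpg-pubkey pseudo-packages.
reinstallable() {
    grep -v '^gpg-pubkey-'
}

# Intended use inside the chroot (not executed here):
#   dnf reinstall --refresh $(rpm -qa | reinstallable)
```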

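3. After rebooting into 24.03, login failed: after I entered the username and password, the system dropped back to the login prompt.

4. Boot the live CD again. /var/log/messages on the affected disk contained the following: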

type=AVC msg=audit(1729844376.924:155): avc:  denied  { transition } for  pid=6432 comm="(systemd)" path="/usr/lib/systemd/systemd" dev="dm-0" ino=1078989 scontext=system_u:system_r:kernel_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0 tclass=process permissive=0

type=AVC msg=audit(1729844376.942:160): avc:  denied  { transition } for  pid=6439 comm="login" path="/usr/bin/bash" dev="dm-0" ino=1048733 scontext=system_u:system_r:kernel_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0 tclass=process permissive=0

Then, in /etc/selinux/config on the affected disk, change SELINUX=enforcing to SELINUX=permissive. After a reboot, 24.03 starts and login works.
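
The same edit can be scripted. A minimal sketch, assuming GNU sed (for `-i`) on the live CD; the function name is hypothetical:

```shell
# Rewrite the SELINUX= line of a SELinux config file in place.
# Usage: set_selinux_mode CONFIG_FILE MODE   (MODE: enforcing|permissive|disabled)
set_selinux_mode() {
    case $2 in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $2" >&2; return 1 ;;
    esac
    # Only the uncommented SELINUX= line is touched; SELINUXTYPE= is left alone.
    sed -i "s/^SELINUX=.*/SELINUX=$2/" "$1"
}
```

From the live CD this would be called as `set_selinux_mode /media/user/dddddddddd/etc/selinux/config permissive`.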

5. At this point SELinux on 24.03 was still in a bad state. Running audit2allow showed that a large number of SELinux rules were missing:

Command: ausearch -m AVC | audit2allow

Output (partial):

#============= kernel_t ==============
allow kernel_t sshd_net_t:process dyntransition;
allow kernel_t unconfined_t:process { dyntransition transition };

Command: audit2allow -w -a

Output (partial):

type=AVC msg=audit(1689734806.566:106): avc:  denied  { write } for  pid=2211 comm="onboard" name="dbus-CrRLvzOqua" dev="tmpfs" ino=28 scontext=system_u:system_r:xdm_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:session_dbusd_tmp_t:s0 tclass=sock_file permissive=0
	Was caused by:
		Unknown - would be allowed by active policy
		Possible mismatch between this policy and the one under which the audit message was generated.

		Possible mismatch between current in-memory boolean settings vs. permanent ones.

type=AVC msg=audit(1729844951.399:83): avc:  denied  { transition } for  pid=3106 comm="(systemd)" path="/usr/lib/systemd/systemd" dev="dm-0" ino=1078989 scontext=system_u:system_r:kernel_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0 tclass=process permissive=1
	Was caused by:
		Missing type enforcement (TE) allow rule.

		You can use audit2allow to generate a loadable module to allow this access.

type=AVC msg=audit(1729844980.333:100): avc:  denied  { dyntransition } for  pid=3305 comm="sshd" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:sshd_net_t:s0 tclass=process permissive=1
	Was caused by:
		Missing type enforcement (TE) allow rule.

		You can use audit2allow to generate a loadable module to allow this access.
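
To see at a glance which domain transitions are being denied, the raw AVC lines can be condensed. This is only a rough sketch built on sed, sort and uniq, and the function name is hypothetical; it assumes the audit line layout shown above:

```shell
# Summarise AVC denials read from stdin, one line per distinct
# "source_type -> target_type : permission", prefixed with a count.
avc_summary() {
    sed -n 's/.*avc:  *denied  *{ *\([^}]*[^ }]\) *}.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*/\2 -> \3 : \1/p' |
        sort | uniq -c
}
```

Example: `ausearch -m AVC | avc_summary`. In the output above, nearly every denial has kernel_t as the source type, which suggests that processes are not leaving the kernel's initial domain.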

6. I then decided to rebuild the whole SELinux environment. After some searching, I arrived at the following commands:

mv /etc/selinux/targeted /etc/selinux/targeted.bak
mv /etc/selinux/config /etc/selinux/config.bak
dnf remove 'selinux-policy*'
dnf install selinux-policy-targeted
dnf install selinux-policy-devel policycoreutils policycoreutils-devel
touch /.autorelabel

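(Note: install selinux-policy-targeted on its own first, then run the remaining commands. Combining them into a single install does not work; the reason is unknown.)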

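7. Reboot. SELinux now relabels the entire filesystem, which can take a long time.

Once the relabel finishes, the system reboots again automatically.

At this point the console login problem is resolved, and SSH login works normally too.

8. If the DDE desktop environment is not installed, the problem is now fully resolved. If it is installed, one more step is needed.

The DDE desktop still fails to start at this point. Remove DDE and reinstall it: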

dnf remove 'dde-*' startdde
dnf install dde

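Latest, shorter fix (faster than the procedure above, because it does not reinstall every package):

Take a live CD of any distribution and boot the machine from it.

Then chroot into the affected disk (assumed to be mounted at /media/user/dddddddddd):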

mount -o rbind /dev /media/user/dddddddddd/dev
mount -t proc none /media/user/dddddddddd/proc
mount -o bind /sys /media/user/dddddddddd/sys
mount -o bind /tmp /media/user/dddddddddd/tmp
cp /etc/resolv.conf /media/user/dddddddddd/etc/resolv.conf
chroot /media/user/dddddddddd

Next, repair the key systemd and SELinux components, in this order:

(Note: if systemd-udev is not reinstalled, systemd-remount-fs.service fails and the root filesystem ends up read-only.)

dnf reinstall systemd systemd-udev

mv /etc/selinux/targeted /etc/selinux/targeted.bak
mv /etc/selinux/config /etc/selinux/config.bakold

dnf remove 'selinux-policy*'
dnf install selinux-policy-targeted
dnf install selinux-policy-devel policycoreutils policycoreutils-devel

mv /etc/selinux/config /etc/selinux/config.bakrpmnew
mv /etc/selinux/config.bakold /etc/selinux/config

touch /.autorelabel

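Then reboot. SELinux now relabels the entire filesystem, which can take a long time.

Once the relabel finishes, the system reboots again automatically.

After that reboot, the problem is resolved.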
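
A quick way to confirm the end state is sketched below. It assumes you want the system back in enforcing mode, and it relies on the relabel run removing /.autorelabel when it finishes; the function name is hypothetical:

```shell
# Report whether a root tree looks fully repaired:
#  - /.autorelabel is gone (the relabel run removes it when done)
#  - /etc/selinux/config requests enforcing mode
# ROOT is "/" on the repaired system, or a mount point when checked from a live CD.
check_selinux_repair() {
    root=${1:-/}
    status=0
    if [ -e "$root/.autorelabel" ]; then
        echo "relabel still pending: $root/.autorelabel exists"
        status=1
    fi
    if ! grep -qx 'SELINUX=enforcing' "$root/etc/selinux/config" 2>/dev/null; then
        echo "config does not set SELINUX=enforcing"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}
```

On the running system, `getenforce` and a fresh `ausearch -m AVC` are the more direct checks; this function only inspects files, so it also works from the live CD.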