Pods fail to start when deploying EulerCopilot


Run kubectl describe pods -n NAMESPACE POD_NAME or kubectl logs -n NAMESPACE POD_NAME to see the error logs:
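When several pods are failing, the same check can be scripted. Below is a minimal sketch, assuming you first save a pod with kubectl get pod -n NAMESPACE POD_NAME -o json > pod.json. It reads the standard status.initContainerStatuses and status.containerStatuses fields of the pod JSON and prints a kubectl logs --previous command for each crash-looping container. The helper names and the set of "stuck" reasons are illustrative choices, not part of any EulerCopilot tooling.

```python
import json

# Waiting reasons treated as "stuck"; this selection is an assumption for illustration.
STUCK_REASONS = ("CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull")


def load_pod(path):
    """Load a pod saved with: kubectl get pod -n NAMESPACE POD_NAME -o json > pod.json"""
    with open(path) as f:
        return json.load(f)


def stuck_containers(pod):
    """Return (container name, waiting reason, restart count) for every
    init or regular container whose current state is a stuck waiting state."""
    status = pod.get("status", {})
    found = []
    # Init containers are checked first: if one of them crash-loops, the main
    # containers never start and only report PodInitializing.
    for key in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(key) or []:
            waiting = (cs.get("state") or {}).get("waiting") or {}
            reason = waiting.get("reason", "")
            if reason in STUCK_REASONS:
                found.append((cs["name"], reason, cs.get("restartCount", 0)))
    return found


def previous_logs_commands(pod):
    """Build a kubectl logs --previous command for each crash-looping container.
    Image pull failures are skipped: those containers never ran, so they have no logs."""
    meta = pod["metadata"]
    return [
        f"kubectl logs -n {meta['namespace']} {meta['name']} -c {name} --previous"
        for name, reason, _ in stuck_containers(pod)
        if reason == "CrashLoopBackOff"
    ]
```

For the rag pod in this thread, previous_logs_commands(load_pod("pod.json")) would produce kubectl logs -n euler-copilot rag-deploy-7457ffbcb-rp69w -c rag-copy-secret --previous, the same command used later in the thread. The -c flag matters because the failing container is an init container, not the main rag container.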

[root@localhost ~]# kubectl describe pods -n euler-copilot framework-deploy-777f94d468-x6t8h
Name: framework-deploy-777f94d468-x6t8h
Namespace: euler-copilot
Priority: 0
Service Account: default
Node: localhost/192.168.74.150
Start Time: Tue, 22 Apr 2025 17:06:22 +0800
Labels: app=framework
pod-template-hash=777f94d468
Annotations: checksum/secret: 7047af794b488bedcfa928f9031a6fcec3974ac63e7c8f87ce39a853d4e8b3dc
Status: Running
IP: 10.42.0.43
IPs:
IP: 10.42.0.43
Controlled By: ReplicaSet/framework-deploy-777f94d468
Init Containers:
framework-copy:
Container ID: containerd://e44f268f51ae14cae0a5f2fa567c3b083208e333ddac90973e2637517c73fd7f
Image: Harbor
Image ID: Harbor
Port:
Host Port:
Command:
python3
./main.py
--config
config.yaml
--copy
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 22 Apr 2025 17:06:25 +0800
Finished: Tue, 22 Apr 2025 17:06:25 +0800
Ready: True
Restart Count: 0
Environment:
Mounts:
/app/config.yaml from framework-config (rw,path="copy-config.yaml")
/config-rw from framework-shared (rw)
/config/config.toml from framework-config (rw,path="config.toml")
/db-secrets from database-secrets (rw)
/system-secrets from system-secrets (rw)
Containers:
framework:
Container ID: containerd://cdcdc5ce7acc5073c802b7419175fcd8bbbc7b97d15e3c412fb150c1994e87a3
Image: Harbor
Image ID: Harbor
Port: 8002/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 22 Apr 2025 22:26:38 +0800
Finished: Tue, 22 Apr 2025 22:26:39 +0800
Ready: False
Restart Count: 67
Requests:
cpu: 200m
memory: 512Mi
Liveness: http-get http://:8002/health_check delay=60s timeout=1s period=90s #success=1 #failure=5
Environment:
TZ: Asia/Shanghai
CONFIG: /app/config/config.toml
Mounts:
/app/config from framework-shared (rw)
/app/data from framework-semantics-vl (rw)
/tmp from framework-tmp-volume (rw)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
framework-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: framework-config
Optional: false
framework-semantics-vl:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: framework-semantics-claim
ReadOnly: false
database-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: euler-copilot-database
Optional: false
system-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: euler-copilot-system
Optional: false
framework-tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
framework-shared:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Warning BackOff 9m45s (x1472 over 5h24m) kubelet Back-off restarting failed container framework in pod framework-deploy-777f94d468-x6t8h_euler-copilot(d211b8a0-00c4-4ed5-9cf8-f371bd5ed4e0)
Normal Pulled 4m46s (x68 over 5h24m) kubelet Container image "Harbor" already present on machine

[root@localhost ~]# kubectl describe pods -n euler-copilot rag-deploy-7457ffbcb-rp69w
Name: rag-deploy-7457ffbcb-rp69w
Namespace: euler-copilot
Priority: 0
Service Account: default
Node: localhost/192.168.74.150
Start Time: Tue, 22 Apr 2025 17:06:22 +0800
Labels: app=rag
pod-template-hash=7457ffbcb
Annotations: checksum/config: de085db6a7460492b67807274c7c90a7126fd9e3b113cba671f2e90a5a0fd8a5
Status: Pending
IP: 10.42.0.40
IPs:
IP: 10.42.0.40
Controlled By: ReplicaSet/rag-deploy-7457ffbcb
Init Containers:
rag-copy-secret:
Container ID: containerd://209e107698fd735c083f93d66e677b812d39d76d93216c8c7f4580b04474319d
Image: Harbor
Image ID: Harbor
Port:
Host Port:
Command:
python3
./main.py
--config
config.yaml
--copy
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 22 Apr 2025 22:38:46 +0800
Finished: Tue, 22 Apr 2025 22:38:46 +0800
Ready: False
Restart Count: 70
Environment:
Mounts:
/app/config.yaml from rag-config-vl (rw,path="copy-config.yaml")
/config-rw from rag-shared (rw)
/config/.env from rag-config-vl (rw,path=".env")
/config/.env-sql from rag-config-vl (rw,path=".env-sql")
/db-secrets from database-secret (rw)
/system-secrets from system-secret (rw)
Containers:
rag:
Container ID:
Image: Harbor
Image ID:
Port: 9988/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
memory: 512Mi
Liveness: http-get http://:9988/health_check delay=60s timeout=1s period=90s #success=1 #failure=5
Environment:
TZ: Asia/Shanghai
Mounts:
/rag-service/chat2db/common/.env from rag-shared (rw,path=".env-sql")
/rag-service/data_chain/common/.env from rag-shared (rw,path=".env")
Conditions:
Type Status
PodReadyToStartContainers True
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
rag-config-vl:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rag-config
Optional: false
database-secret:
Type: Secret (a volume populated by a Secret)
SecretName: euler-copilot-database
Optional: false
system-secret:
Type: Secret (a volume populated by a Secret)
SecretName: euler-copilot-system
Optional: false
rag-shared:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message


Warning BackOff 3m59s (x1518 over 5h34m) kubelet Back-off restarting failed container rag-copy-secret in pod rag-deploy-7457ffbcb-rp69w_euler-copilot(b01cdc39-8d08-4bd1-a857-a8475f3347b8)

[root@localhost scripts]# kubectl logs -n euler-copilot rag-deploy-7457ffbcb-rp69w -c rag-copy-secret --previous
Traceback (most recent call last):
File "/app/./main.py", line 18, in <module>
with config.open("r") as f:
File "/usr/lib64/python3.9/pathlib.py", line 1252, in open
return io.open(self, mode, buffering, encoding, errors, newline,
IsADirectoryError: [Errno 21] Is a directory: 'config.yaml'

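Clearly something is wrong with the deployed environment: the init container rag-copy-secret exits with code 1 because config.yaml, which main.py opens as a file, is a directory inside the container.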
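One plausible cause, offered as an assumption rather than a confirmed diagnosis: /app/config.yaml is mounted from the rag-config ConfigMap with subPath copy-config.yaml, and when a subPath does not exist in the source volume, the kubelet creates an empty directory at that path instead of failing the mount. The minimal sketch below checks whether a ConfigMap parsed from kubectl get configmap rag-config -n euler-copilot -o json actually contains every key the rag-copy-secret mounts reference, and classifies what a script would find at its config path. The helper names are hypothetical.

```python
from pathlib import Path

# subPath keys used by the rag-copy-secret mounts in the describe output above
# (copy-config.yaml, .env, .env-sql from the rag-config ConfigMap).
EXPECTED_KEYS = ("copy-config.yaml", ".env", ".env-sql")


def missing_subpath_keys(configmap, expected=EXPECTED_KEYS):
    """Return the expected keys absent from a parsed ConfigMap.
    Both plain-text (data) and base64 (binaryData) entries count as present."""
    present = set(configmap.get("data") or {}) | set(configmap.get("binaryData") or {})
    return [key for key in expected if key not in present]


def describe_config_path(path):
    """Classify what a script like main.py would find at its config path."""
    p = Path(path)
    if p.is_dir():
        # This is the case that makes open() raise IsADirectoryError.
        return "directory: the subPath source is probably missing"
    if p.is_file():
        return "file"
    return "missing"
```

If missing_subpath_keys reports copy-config.yaml, regenerating the rag-config ConfigMap (for example by re-running the Helm install with the correct values) is the first thing to try; if the key is present, the cause lies elsewhere.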

Same problem here: the image pull fails. Everything else is OK on version 24.03 LTS SP1; only this one image is missing:
35m Warning Failed pod/framework-deploy-99d94f7bb-l7sls Failed to pull image "Harbor": rpc error: code = NotFound desc = failed to pull and unpack image "Harbor": failed to resolve reference "Harbor": Harbor not found
35m Warning Failed pod/framework-deploy-99d94f7bb-l7sls Error: ErrImagePull
34m Normal BackOff pod/framework-deploy-99d94f7bb-l7sls Back-off pulling image "Harbor"
34m Warning Failed pod/framework-deploy-99d94f7bb-l7sls Error: ImagePullBackOff

The missing image is euler-copilot-framework:0.9.6-x86.
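When more than one image may be missing, it helps to collect every failing image reference in one pass instead of reading events line by line. A minimal sketch, assuming the events were saved with kubectl get events -n euler-copilot -o json and that the kubelet message keeps the Failed to pull image "REF": ... form shown above; the regex and helper names are illustrative, not an official format or part of any EulerCopilot tooling.

```python
import re

# Matches the kubelet message shown above: Failed to pull image "<ref>": ...
PULL_FAILED = re.compile(r'Failed to pull image "([^"]+)"')


def failed_images(events):
    """Map each image reference that failed to pull to the set of pods reporting it."""
    result = {}
    for ev in events.get("items", []):
        # Only "Failed" events carry the pull error; BackOff events just repeat it.
        if ev.get("reason") != "Failed":
            continue
        m = PULL_FAILED.search(ev.get("message", ""))
        if m:
            pod = ev.get("involvedObject", {}).get("name", "?")
            result.setdefault(m.group(1), set()).add(pod)
    return result


def split_image_ref(ref):
    """Split an image reference into (name, tag or digest).
    A ':' inside a registry host:port is not treated as a tag separator."""
    if "@" in ref:
        name, digest = ref.split("@", 1)
        return name, digest
    last = ref.rsplit("/", 1)[-1]
    if ":" in last:
        name, tag = ref.rsplit(":", 1)
        return name, tag
    return ref, "latest"
```

split_image_ref("euler-copilot-framework:0.9.6-x86") returns ("euler-copilot-framework", "0.9.6-x86"), which makes it easy to compare the requested tag against the tags the registry actually publishes for that architecture.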