
Troubleshooting common k8s issues

ControlPlane

Check that kubelet is running

systemctl status kubelet
Example output:
[root@instance-master-1 bootsman]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2023-10-15 20:02:26 MSK; 3 weeks 4 days ago
       Docs: https://kubernetes.io/docs/
   Main PID: 7034 (kubelet)
      Tasks: 12 (limit: 9495)
     Memory: 67.9M
        CPU: 9h 9min 2.203s
     CGroup: /system.slice/kubelet.service
             └─7034 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock

Kubelet must be running: the Active line should read active (running).
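To run the same check from a script (for example over ssh on every node), the exit status of `systemctl is-active` is easier to consume than the full `systemctl status` output. A minimal sketch; the function name `check_unit` is hypothetical:

```shell
# Hypothetical helper: report whether a systemd unit (kubelet by default) is active.
# `systemctl is-active --quiet UNIT` prints nothing and exits with status 0
# only when the unit is active, so no text parsing is needed.
check_unit() {
  unit="${1:-kubelet}"
  if systemctl is-active --quiet "$unit"; then
    echo "$unit: active"
  else
    echo "$unit: NOT active, check the logs with: journalctl -u $unit"
    return 1
  fi
}
```

Usage: `check_unit kubelet`. A non-zero exit status means the unit is not active, which makes the helper easy to combine with `&&` or `||` in larger scripts.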

Checking the health of ControlPlane components

Run:

crictl ps

Example output:

[root@instance-master-1 bootsman]# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
7b4104c3d04cc       770cd897072cf       3 weeks ago         Running             promtail                  0                   365e5927d3842       loki-promtail-msv9x
0413f403d2696       1dbe0e9319764       3 weeks ago         Running             node-exporter             0                   842b0138b9938       rancher-monitoring-prometheus-node-exporter-qwb4k
64b0a16b34cd2       bbd91fd54b288       3 weeks ago         Running             pushprox-client           0                   fc63dbef28fa5       pushprox-kube-scheduler-client-b7nf5
fbf4f748fd336       bbd91fd54b288       3 weeks ago         Running             pushprox-client           0                   e2fbef708b9f1       pushprox-kube-etcd-client-sjthv
2a5cb5aea628f       bbd91fd54b288       3 weeks ago         Running             pushprox-client           0                   28e0982488730       pushprox-kube-controller-manager-client-s65nb
d7ac2a9bf51e0       ead0a4a53df89       3 weeks ago         Running             coredns                   0                   af506bca5c1a6       coredns-b4bf48566-rz4xn
8096b0477c64e       ead0a4a53df89       3 weeks ago         Running             coredns                   0                   cffa19f90e6cf       coredns-b4bf48566-5mhh4
fc2248a358ac6       d00a7abfa71a6       3 weeks ago         Running             cilium-agent              0                   4a72e9f6420a0       cilium-lkhwc
7566d5669fcb5       88429d3e5d05e       3 weeks ago         Running             cilium-operator           0                   2a3b675d452e5       cilium-operator-777ddbc998-fprsq
4a9093098d9c3       09067696476ff       3 weeks ago         Running             kube-vip                  0                   1096be86add48       kube-vip-instance-master-1.kobik-personal
1f7b3c72ca6ed       86b6af7dd652c       3 weeks ago         Running             etcd                      0                   aaaaaa125e89e       etcd-instance-master-1.kobik-personal
4b2c95aeaf6f6       f466468864b7a       3 weeks ago         Running             kube-controller-manager   0                   153da6ce2ed91       kube-controller-manager-instance-master-1.kobik-personal
9f444dde5cfc6       98ef2570f3cde       3 weeks ago         Running             kube-scheduler            0                   74180ee0dd901       kube-scheduler-instance-master-1.kobik-personal
7b3cbbd00bea6       e7972205b6614       3 weeks ago         Running             kube-apiserver            0                   4c2d9a7e758e1       kube-apiserver-instance-master-1.kobik-personal
The following containers must be in the Running state:

  • kube-apiserver
  • kube-vip
  • cilium-agent
  • kube-scheduler
  • kube-controller-manager
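The same check can be scripted by reading the `crictl ps` table. A minimal sketch, assuming the default table layout shown above; the function name `check_cp_containers` is hypothetical. Because the CREATED column contains several words ("3 weeks ago"), the script locates the STATE value instead of relying on a fixed column number and treats the next field as NAME:

```shell
# Hypothetical helper: read `crictl ps` output on stdin and check that each
# required control-plane container has a row with STATE=Running.
# Exits with status 1 if any of them is missing or not Running.
check_cp_containers() {
  awk -v required="kube-apiserver kube-vip cilium-agent kube-scheduler kube-controller-manager" '
    NR > 1 {
      # CREATED spans several words, so find the STATE field by value;
      # NAME is the field right after it.
      for (i = 3; i < NF; i++)
        if ($i == "Running") { running[$(i + 1)] = 1; break }
    }
    END {
      n = split(required, names, " ")
      for (k = 1; k <= n; k++) {
        if (names[k] in running) print names[k] ": Running"
        else { print names[k] ": MISSING or not Running"; bad = 1 }
      }
      exit bad
    }'
}
```

Usage: `crictl ps | check_cp_containers`. Edit the `required` list if your cluster uses a different CNI or VIP component.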

Worker

Check that kubelet is running

systemctl status kubelet
Example output:
[root@instance-master-1 bootsman]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Sun 2023-10-15 20:02:26 MSK; 3 weeks 4 days ago
       Docs: https://kubernetes.io/docs/
   Main PID: 7034 (kubelet)
      Tasks: 12 (limit: 9495)
     Memory: 67.9M
        CPU: 9h 9min 2.203s
     CGroup: /system.slice/kubelet.service
             └─7034 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock

Kubelet must be running: the Active line should read active (running).

Viewing kubelet logs

Run the command:

journalctl -u kubelet
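The full journal is long. To narrow it down, you can limit the time range with `--since` and keep only error lines. A minimal sketch, assuming kubelet writes logs in the default klog text format, where each message starts with a severity letter (`I`, `W`, `E`, `F`) followed by the date as `MMDD`; the function name `kubelet_errors` is hypothetical:

```shell
# Hypothetical helper: keep only journal lines in which kubelet logged an
# error (E) or fatal (F) klog message, i.e. lines where "kubelet[PID]: "
# is followed by E or F and a four-digit MMDD date.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}
```

Usage: `journalctl -u kubelet --since "1 hour ago" --no-pager | kubelet_errors`. To watch new messages as they arrive, use `journalctl -u kubelet -f`.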

ETCD

Checking the container

Run the command:

crictl ps --name etcd
Example output:
[root@instance-master-1 bootsman]# crictl ps --name etcd
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD
1f7b3c72ca6ed       86b6af7dd652c       3 weeks ago         Running             etcd                0                   aaaaaa125e89e       etcd-instance-master-1.kobik-personal
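Besides STATE, the ATTEMPT column is worth a look: it counts how many times the container has been recreated in its pod, so a value above 0 means etcd has restarted at least once and its logs should be read. Adding `-a` (`crictl ps -a --name etcd`) also lists exited containers, which helps when etcd is crash-looping. A minimal sketch that summarizes this from the table; the function name `etcd_status` is hypothetical:

```shell
# Hypothetical helper: read `crictl ps [-a] --name etcd` output on stdin and
# print STATE and ATTEMPT for every etcd container. Exits with status 1 if no
# etcd container is listed, any of them is not Running, or ATTEMPT is above 0.
etcd_status() {
  awk '
    NR > 1 {
      # CREATED spans several words, so locate STATE by value;
      # NAME and ATTEMPT are the two fields that follow it.
      for (i = 3; i + 2 <= NF; i++)
        if ($i ~ /^(Running|Exited|Created|Unknown)$/ && $(i + 1) == "etcd") {
          found = 1
          print "etcd: " $i ", attempt " $(i + 2)
          if ($i != "Running" || $(i + 2) + 0 > 0) bad = 1
          break
        }
    }
    END { if (!found) { print "etcd: no container listed"; bad = 1 }; exit bad }'
}
```

Usage: `crictl ps -a --name etcd | etcd_status`.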

Viewing ETCD logs

Get the container ID:

crictl ps --name etcd
The ID you need is in the first column (CONTAINER):
[root@instance-master-1 bootsman]# crictl ps --name etcd
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD
1f7b3c72ca6ed       86b6af7dd652c       3 weeks ago         Running             etcd                0                   aaaaaa125e89e       etcd-instance-master-1.kobik-personal
Then request the logs:
crictl logs 1f7b3c72ca6ed
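The two steps can be combined. `crictl ps -q` prints only container IDs, and `--name` accepts a regular expression, so `'^etcd$'` matches exactly the container named etcd. A minimal sketch; the function name `etcd_logs` and the default of 200 lines are assumptions:

```shell
# Hypothetical helper: print the last N lines (default 200) of the running
# etcd container's log without copying the container ID by hand.
etcd_logs() {
  id="$(crictl ps --name '^etcd$' -q | head -n 1)"
  if [ -z "$id" ]; then
    echo "no running etcd container found; try: crictl ps -a --name etcd" >&2
    return 1
  fi
  crictl logs --tail "${1:-200}" "$id"
}
```

Usage: `etcd_logs` or `etcd_logs 1000`. If etcd keeps crashing there may be no running container; `crictl ps -a --name etcd` also lists exited ones, and `crictl logs <ID>` works for an exited container as long as it has not been removed.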

Longhorn

Checking the deployment

Run the command:

kubectl get po -n longhorn-system
All pods must be in the Running state with all containers ready (the READY column shows, for example, 1/1 or 3/3). Example output:
[root@instance-worker-1 bootsman]# kubectl get po -n longhorn-system
NAME                                                READY   STATUS    RESTARTS   AGE
csi-attacher-76cfbcc684-7mcs9                       1/1     Running   0          25d
csi-attacher-76cfbcc684-82wc9                       1/1     Running   0          25d
csi-attacher-76cfbcc684-j6xgg                       1/1     Running   0          25d
csi-provisioner-7fdb5f4c6c-5m6gb                    1/1     Running   0          25d
csi-provisioner-7fdb5f4c6c-ns42z                    1/1     Running   0          25d
csi-provisioner-7fdb5f4c6c-qkn6l                    1/1     Running   0          25d
csi-resizer-7c4fc545-5c7d2                          1/1     Running   0          25d
csi-resizer-7c4fc545-6l8pp                          1/1     Running   0          25d
csi-resizer-7c4fc545-zhvjt                          1/1     Running   0          25d
csi-snapshotter-595bc9d4c7-5g8c7                    1/1     Running   0          25d
csi-snapshotter-595bc9d4c7-ghzzs                    1/1     Running   0          25d
csi-snapshotter-595bc9d4c7-l7wzt                    1/1     Running   0          25d
engine-image-ei-9619d2ae-6snnx                      1/1     Running   0          25d
engine-image-ei-9619d2ae-kk4x8                      1/1     Running   0          25d
engine-image-ei-9619d2ae-tz7vg                      1/1     Running   0          25d
instance-manager-bb4a3b2ff0ded1fa189e5c3f2f3aea72   1/1     Running   0          25d
instance-manager-e33c4122c6efaf2a01cfd66cad4bf6eb   1/1     Running   0          25d
instance-manager-f8931f593fcb7288c3f4777469cd7901   1/1     Running   0          25d
longhorn-csi-plugin-9z96f                           3/3     Running   0          25d
longhorn-csi-plugin-clgjg                           3/3     Running   0          25d
longhorn-csi-plugin-r6949                           3/3     Running   0          25d
longhorn-driver-deployer-77fbb76899-xgt6j           1/1     Running   0          25d
longhorn-manager-8wssz                              1/1     Running   0          25d
longhorn-manager-q5pvv                              1/1     Running   0          25d
longhorn-manager-v6s4k                              1/1     Running   0          25d
longhorn-ui-799696dd6c-54wrn                        1/1     Running   0          25d
longhorn-ui-799696dd6c-rjphr                        1/1     Running   0          25d
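With dozens of pods it is easy to miss one that is not ready. A minimal sketch that prints only problem rows from `kubectl get po` output, i.e. pods whose STATUS is not `Running` or whose READY column shows fewer ready containers than total; the function name `unhealthy_pods` is hypothetical:

```shell
# Hypothetical helper: read `kubectl get po` output on stdin and print pods
# that are not Running or not fully ready (READY "x/y" with x != y).
# Exits with status 1 if any such pod is found.
unhealthy_pods() {
  awk '
    NR > 1 {
      split($2, ready, "/")   # READY column, e.g. "3/3"
      if ($3 != "Running" || ready[1] != ready[2]) { print; bad = 1 }
    }
    END { exit bad }'
}
```

Usage: `kubectl get po -n longhorn-system | unhealthy_pods`. Empty output means every pod is Running and ready. A server-side filter such as `--field-selector=status.phase!=Running` looks only at the pod phase, so it misses pods that are Running but not ready.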

Checking the servers

On the worker nodes, check that the iscsid daemon is installed and running:

systemctl status iscsid
Example output with the daemon running:
[root@instance-worker-0 bootsman]# systemctl status iscsid
● iscsid.service - Open-iSCSI
     Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2023-10-15 19:55:08 MSK; 3 weeks 4 days ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
             man:iscsiuio(8)
             man:iscsiadm(8)
   Main PID: 4022 (iscsid)
     Status: "Ready to process requests"
      Tasks: 1 (limit: 9495)
     Memory: 2.4M
        CPU: 8ms
     CGroup: /system.slice/iscsid.service
             └─4022 /usr/sbin/iscsid -f
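To tell a missing daemon apart from a stopped one in a script, you can check for the unit file first and then for its state. A minimal sketch; the function name `check_iscsid` is hypothetical, and the package names in the message are the usual ones (`iscsi-initiator-utils` on RHEL-family systems, `open-iscsi` on Debian/Ubuntu) rather than something this guide specifies:

```shell
# Hypothetical helper: distinguish "iscsid is not installed" from
# "iscsid is installed but not active".
check_iscsid() {
  # `systemctl cat` fails when no unit file exists for the unit.
  if ! systemctl cat iscsid.service >/dev/null 2>&1; then
    echo "iscsid: unit not found, install iscsi-initiator-utils (RHEL) or open-iscsi (Debian/Ubuntu)"
    return 2
  fi
  # `is-active --quiet` exits with 0 only when the unit is active.
  if systemctl is-active --quiet iscsid.service; then
    echo "iscsid: active"
  else
    echo "iscsid: installed but not active"
    return 1
  fi
}
```

Run it on each worker. In the example above iscsid is socket-activated (TriggeredBy: iscsid.socket), so if the service reports as not active, also check `systemctl status iscsid.socket`.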