How to check Kubernetes cluster health?

A few essential tools give you a quick look at Kubernetes cluster health. Let’s review them here, so that you can quickly tell whether a cluster has any obvious issues.

Install node problem detector

node-problem-detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a daemon that runs on each node, detects node problems and reports them to the apiserver.

kubectl apply -f https://k8s.io/examples/debug/node-problem-detector.yaml

Use node-problem-detector together with a drainer daemon to quickly replace unhealthy nodes. Learn more about it at Monitor Node Health.

Kubernetes cluster info

To check that kubectl can connect to the master, that the master is running, and on which port, use kubectl cluster-info. To debug cluster state use kubectl cluster-info dump: it prints the full cluster state, including pod logs, to stdout, or writes it to a directory when you pass --output-directory.

kubectl cluster-info

Kubernetes master is running at https://10.0.0.10:6443
KubeDNS is running at https://10.0.0.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://10.0.0.10:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
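
The dump is usually easier to read from files than from stdout. A minimal sketch, assuming you want every namespace included; the function name dump_cluster_state and the target directory are made up for illustration:

```shell
# Write the full cluster state (including pod logs) into a directory.
# $1 = target directory, e.g. ./cluster-state (hypothetical path).
dump_cluster_state() {
  kubectl cluster-info dump --all-namespaces --output-directory="$1"
}

# Usage:
#   dump_cluster_state ./cluster-state
```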

Nodes information

Get extended node information. Pay attention to the STATUS, ROLES, AGE and INTERNAL-IP/EXTERNAL-IP columns: check that the IP addresses belong to your network and that the nodes can reach each other. Node age is a kind of uptime for the node and tells you whether nodes are stable enough, which is very useful if you use spot instances.

kubectl get nodes -o wide

NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master    Ready    master   11h   v1.18.2   10.0.0.10     <none>        Ubuntu 18.04.4 LTS   4.15.0-99-generic   docker://19.3.8
worker1   Ready    <none>   11h   v1.18.2   10.0.0.11     <none>        Ubuntu 18.04.4 LTS   4.15.0-99-generic   docker://19.3.8
worker2   Ready    <none>   11h   v1.18.2   10.0.0.12     <none>        Ubuntu 18.04.4 LTS   4.15.0-99-generic   docker://19.3.8
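
On a larger cluster it can help to filter the STATUS column instead of reading the table by eye. A minimal sketch, assuming the default column layout of kubectl get nodes --no-headers (name first, status second); the function name not_ready_nodes is made up for illustration:

```shell
# Print name and status of every node whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` style output on stdin.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get nodes --no-headers | not_ready_nodes
```

Because the comparison is exact, a cordoned node (Ready,SchedulingDisabled) is reported as well, which is usually what you want in a health check.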

API Component statuses

The status of the most important Kubernetes components apart from the apiserver can be retrieved with the get componentstatuses command.

kubectl get componentstatuses
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok                  
controller-manager   Healthy   ok                  
etcd-0               Healthy   {"health":"true"} 

Pods statuses

Listing pods that are not running, with extended output, helps you spot commonalities between failed pods, for example that they are all on the same node or all belong to the same availability zone.

kubectl get pods -o wide --all-namespaces |grep -v " Running "
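
The grep above also prints the header line and pods of finished Jobs (status Completed). A slightly stricter filter can match on the STATUS column itself. This is a sketch, assuming the --all-namespaces column order (NAMESPACE NAME READY STATUS ...); the function name unhealthy_pods is hypothetical:

```shell
# Print pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods --all-namespaces -o wide --no-headers` output on stdin,
# where the fourth column is STATUS.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed"'
}

# Usage against a live cluster:
#   kubectl get pods --all-namespaces -o wide --no-headers | unhealthy_pods
```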

Retrieve cluster events

Check events from all namespaces sorted by timestamp to see how the state of the cluster has changed recently. By default the apiserver keeps events for only one hour (its --event-ttl flag) so that they do not fill up etcd storage.

kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
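
On a busy cluster most events are of type Normal. To narrow the list down to problems, the same command can be combined with a field selector. A minimal sketch; the wrapper name warning_events is made up for illustration:

```shell
# List only Warning events across all namespaces, oldest first.
warning_events() {
  kubectl get events --all-namespaces \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# Usage:
#   warning_events
```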

API server health

You can check API server health using the healthz endpoint, which returns HTTP status 200 and the message ‘ok’ when it is healthy. This lets you keep an eye on the pulse of the cluster with simple tools like Pingdom or Nagios.

curl -k https://api-server-ip:6443/healthz
ok
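
For a cron job or a Nagios-style check, the curl call can be wrapped so that the exit code carries the result. A sketch under the same assumptions as the command above (TLS verification skipped with -k, anonymous access to /healthz allowed); the function name apiserver_healthy and the 5 second timeout are arbitrary choices:

```shell
# Exit 0 if the apiserver /healthz endpoint answers with body "ok", else non-zero.
# $1 = apiserver base URL, e.g. https://10.0.0.10:6443
apiserver_healthy() {
  # -k skips certificate verification, -s hides progress output,
  # --max-time bounds how long we wait for an answer.
  body=$(curl -k -s --max-time 5 "$1/healthz") || return 1
  [ "$body" = "ok" ]
}

# Usage:
#   apiserver_healthy https://10.0.0.10:6443 && echo healthy || echo unhealthy
```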

Multi node Kubernetes cluster on Vagrant

This is a fast and easy way to install Kubernetes on Vagrant with the Metrics server addon.

git clone https://github.com/vorozhko/practical-guide-to-kubernetes-administration-exam
cd practical-guide-to-kubernetes-administration-exam/vagrant/kubernetes
vagrant up

At this point you have one master node and two worker nodes ready.

Let’s check cluster health

vagrant ssh master

kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master    Ready    master   2m46s   v1.18.2
worker1   Ready    <none>   35s     v1.18.2
worker2   Ready    <none>   32s     v1.18.2

All nodes are ready.

Let’s install the Metrics server addon

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

Update the Metrics server startup flags to solve the node name resolution issue

kubectl -n kube-system edit deployment metrics-server

# Add the following settings to the metrics-server start command
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
- --kubelet-insecure-tls
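
If you prefer not to edit the deployment interactively, the same two flags can be appended with a JSON patch. This is a sketch, assuming the metrics-server container is the first container in the pod template and already has an args list (a JSON patch "add" to ".../args/-" fails otherwise); the function name patch_metrics_server is hypothetical:

```shell
# Append the two kubelet flags to the first container's args list.
# Assumption: container index 0 is metrics-server and its args list exists.
patch_metrics_server() {
  kubectl -n kube-system patch deployment metrics-server --type=json -p '[
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP"},
    {"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}
  ]'
}

# Usage:
#   patch_metrics_server
```

Note that --kubelet-insecure-tls disables verification of kubelet certificates, which is acceptable for a local Vagrant lab but not something to copy into production.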

At this point Metrics server is installed.

After a few minutes of collecting data you should see:

kubectl top node
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master    264m         13%    1091Mi          57%       
worker1   109m         5%     746Mi           39%       
worker2   109m         5%     762Mi           40% 

kubectl top pod
NAME                                       CPU(cores)   MEMORY(bytes)   
calico-kube-controllers-75d56dfc47-bdsxr   1m           5Mi             
calico-node-rvqwp                          20m          23Mi            
calico-node-thtd4                          31m          25Mi            
calico-node-vkhgs                          23m          22Mi            
coredns-66bff467f8-x68zs                   4m           5Mi             
coredns-66bff467f8-z7kzh                   4m           10Mi            
etcd-master                                22m          39Mi            
kube-apiserver-master                      52m          352Mi           
kube-controller-manager-master             18m          55Mi            
kube-proxy-tdwpf                           1m           18Mi            
kube-proxy-wvsb9                           1m           8Mi             
kube-proxy-zfd2c                           1m           9Mi             
kube-scheduler-master                      5m           23Mi            
metrics-server-7c557b6b9f-h4hz2            1m           11Mi

Build Kubernetes control plane image with Packer

The steps to prepare a single control plane image are quite simple:

  • Prepare Docker and Kubernetes packages and settings
  • Execute the kubeadm bootstrap script when the EC2 instance starts up for the first time

One unanswered question is: how do you add additional control plane nodes and worker nodes, which require tokens and certificates to be preset when joining the cluster?


Practical guide to Kubernetes Certified Administration exam

I have published a practical guide to the Kubernetes Certified Administration exam: https://github.com/vorozhko/practical-guide-to-kubernetes-administration-exam

Covered topics so far are:

Share your efforts

If you are also preparing for the Kubernetes Certified Administration exam, let’s combine our efforts by sharing the practical side of the exam.

Disaster recovery of single node Kubernetes control plane

Overview

There are many possible reasons why the control plane might become unavailable. Let’s review the most common scenarios and mitigation steps.

The mitigation steps in this article are built around AWS public cloud features, but all popular public cloud offerings have similar functionality.

Apiserver VM shutdown or apiserver crashing

Results

  • unable to stop, update, or start new pods, services, or replication controllers
  • existing pods and services should continue to work normally, unless they depend on the Kubernetes API

Thoughts on highly available Kubernetes cluster with single control plane node

Why single node control plane?

Benefits are:

  • Monitoring and alerting are simple and on point, which reduces the number of false positive alerts.
  • Setup and maintenance are quick and straightforward. A less complex install process leads to a more robust setup.
  • Disaster recovery and its documentation are clearer and shorter.
  • Applications continue to work even if the Kubernetes control plane is down.
  • Multiple worker nodes and multiple deployment replicas provide the necessary high availability for your applications.

Disadvantages are:

  • Downtime of the control plane node makes it impossible to change any Kubernetes object, for example to schedule new deployments, update application configuration or add/remove worker nodes.
  • If a worker node goes down during control plane downtime, then it will not be able to re-join the cluster until the control plane has recovered.

Conclusions:

  • If you have a heavy load on the Kubernetes API, like frequent deployments from many teams, then you might consider a multi control plane setup.
  • If changes to Kubernetes objects are infrequent and your team can tolerate a bit of downtime, then a single control plane Kubernetes cluster can be a great choice.

Go http middleware chain with context package

Middleware is a function which wraps an http.Handler to do pre- or post-processing of the request.

A chain of middleware is a popular pattern for handling HTTP requests in the Go language. Using a chain we can:

  • Log application requests
  • Rate limit requests
  • Set HTTP security headers
  • and more

The Go context package helps to set up communication between middleware handlers.


How to enable minikube kvm2 driver on Ubuntu 18.04

Verify kvm2 support

Confirm virtualization support by CPU

egrep -c '(svm|vmx)' /proc/cpuinfo

An output of 1 or more indicates that the CPU supports virtualization technology.

sudo kvm-ok

The output “KVM acceleration can be used” indicates that the system has virtualization enabled and KVM can be used.
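
The CPU flag check can also be scripted, for example as a guard at the top of a setup script. A minimal sketch using grep -E (the modern spelling of egrep); the function names are made up for illustration, and the optional file argument exists only so the check can be tried against a copy of cpuinfo:

```shell
# Count logical CPUs that advertise hardware virtualization
# (Intel vmx or AMD svm). $1 = cpuinfo file, defaults to /proc/cpuinfo.
virt_cpu_count() {
  grep -E -c '(svm|vmx)' "${1:-/proc/cpuinfo}"
}

# Succeed only when at least one CPU has the flag.
cpu_supports_virt() {
  [ "$(virt_cpu_count "$1")" -ge 1 ]
}

# Usage:
#   cpu_supports_virt || echo "CPU virtualization flags not found"
```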


Istio sidecar injection

There are several ways to inject the Istio sidecar configuration into Pods, for example: automated injection, updating the YAML/JSON deployment, using Helm or Kustomize, and updating an existing live deployment. We will look into each of them.

Automatic Sidecar injection

Istio uses ValidatingAdmissionWebhooks for validating Istio configuration and MutatingAdmissionWebhooks for automatically injecting the sidecar proxy into user pods.

For automatic sidecar injection to work, admissionregistration.k8s.io/v1beta1 should be enabled:

$ kubectl api-versions | grep admissionregistration.k8s.io/v1beta1
admissionregistration.k8s.io/v1beta1

Step two is to verify that the MutatingAdmissionWebhook and ValidatingAdmissionWebhook plugins are listed in the kube-apiserver --enable-admission-plugins flag. That can be done by cluster administrators.


How to organize Namespaces in Kubernetes

There are two main objectives:

  1. Users are able to do their job with the highest velocity possible
  2. Users are organized by groups in a multi-tenant setup

Multi tenancy

Kubernetes namespaces help to set up boundaries between groups of users and applications in a cluster.
To make working in a shared cluster more pleasant and secure for your users, Kubernetes has a number of policies and controls.

Access policies

The primary objective of RBAC is to authorize users and applications to do specific operations in a namespace or in the whole cluster. Use RBAC to give your users enough permissions in the namespace, so they can do day-to-day operations on their own.
Network Policy controls how pods can communicate with each other. Use it to firewall traffic between namespaces, or inside a namespace to critical components like databases.
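
As an illustration of the RBAC part, a group can be given day-to-day permissions in a single namespace by binding the built-in edit ClusterRole with a RoleBinding. This is a sketch only: the function name, the namespace team-a and the group dev-team are hypothetical, and your cluster's authentication setup decides what group names actually exist:

```shell
# Bind the built-in "edit" ClusterRole to a group inside one namespace only.
# $1 = namespace, $2 = group name (both hypothetical examples below).
grant_namespace_edit() {
  kubectl create rolebinding "$2-edit" \
    --clusterrole=edit \
    --group="$2" \
    --namespace="$1"
}

# Usage:
#   grant_namespace_edit team-a dev-team
```

Because a RoleBinding is namespaced, the group gets the edit permissions in that namespace only, even though the role it references is cluster-wide.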
