120 Days of AWS EKS in Staging - Site Reliability Engineer Blog

[Image: Felix Georgii wakeboarding at Wake Crane Project in Pula, Croatia on September 25, 2016]

My journey with Kubernetes started with Google Kubernetes Engine, continued a year later with self-managed Kubernetes, and then moved on to a migration to Amazon EKS.

As a managed Kubernetes service, EKS is not 100% managed. Core tools didn’t work as expected, and customer expectations were not aligned with the functionality provided. Here I have summarized the experience we gained by running an EKS cluster in Staging.

To run EKS, you still have to:

Prepare the network layer: VPC, subnets, firewalls…
Install worker nodes
Periodically apply security patches on worker nodes
Monitor worker node health by installing node problem detector and a monitoring stack
Set up security groups and NACLs
and more

How to run EKS in Staging

EKS Setup

Use the Terraform EKS module or eksctl to make installation and maintenance easier.
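
As a rough illustration of the eksctl route, a cluster definition might look like the sketch below. The cluster name, region, instance type and node counts are hypothetical placeholders, and the field names follow the eksctl v1alpha5 config schema, so check them against the eksctl version you run. It creates one node group per AZ (the layout suggested for stateful applications later in this post) and attaches the IAM policy the cluster autoscaler needs.

```yaml
# cluster.yaml: minimal eksctl sketch; names, region and sizes are hypothetical
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: staging              # hypothetical cluster name
  region: eu-west-1          # hypothetical region

# AZs in which eksctl creates the cluster's subnets
availabilityZones: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]

nodeGroups:
  # One node group per AZ keeps capacity next to AZ-bound EBS volumes
  - name: workers-1a
    instanceType: m5.large
    availabilityZones: ["eu-west-1a"]
    minSize: 1
    maxSize: 4
    desiredCapacity: 1
    iam:
      withAddonPolicies:
        autoScaler: true     # IAM permissions used by the cluster autoscaler
  - name: workers-1b
    instanceType: m5.large
    availabilityZones: ["eu-west-1b"]
    minSize: 1
    maxSize: 4
    desiredCapacity: 1
    iam:
      withAddonPolicies:
        autoScaler: true
  - name: workers-1c
    instanceType: m5.large
    availabilityZones: ["eu-west-1c"]
    minSize: 1
    maxSize: 4
    desiredCapacity: 1
    iam:
      withAddonPolicies:
        autoScaler: true
```

The file would be applied with `eksctl create cluster -f cluster.yaml`. Keeping the whole cluster in one versioned file is what makes later maintenance easier than clicking through the console.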

EKS Essentials

Install node problem detector to monitor for unexpected kernel or Docker issues
Scale kube-dns up to two or more replicas
See more EKS core tips in 90 Days EKS in Production
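
Running two or more DNS replicas only pays off if they are not all evicted at the same time. A minimal sketch, assuming the DNS pods carry the `k8s-app: kube-dns` label (as the CoreDNS deployment on EKS does), is a PodDisruptionBudget that keeps at least one replica up while nodes are drained, for example during security patching. The PDB name is hypothetical.

```yaml
# Keep at least one DNS replica running during voluntary disruptions
# such as node drains while patching worker nodes
apiVersion: policy/v1          # use policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb           # hypothetical name
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns        # assumed label on the DNS pods
```

With a single replica this budget would block drains entirely, which is one more reason to scale kube-dns up first.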

EKS Autoscaling

Kubernetes cluster autoscaling is without doubt a must-have addition to the EKS toolkit. Scale your cluster up, and down to 0 instances if you wish. Base your scaling on the cluster state of Pending/Running pods to get the most out of it.
Kubernetes custom metrics, node exporter and kube-state-metrics are a must-have to enable horizontal pod autoscaling, both on built-in metrics like CPU/memory and on application-specific metrics like request rate or data throughput.
Prometheus and cAdvisor are the other additions you need to enable metrics collection.
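
To make the built-in versus application-specific distinction concrete, here is a hedged HorizontalPodAutoscaler sketch that scales on both. It assumes the resource metrics API (`metrics.k8s.io`, e.g. from metrics-server) serves CPU usage, and that a custom metrics adapter backed by Prometheus (for example Prometheus Adapter) exposes a per-pod metric. The Deployment name `web-api` and the metric name `http_requests_per_second` are hypothetical.

```yaml
apiVersion: autoscaling/v2       # same shape as autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: web-api                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api                # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Built-in metric: average CPU utilization relative to the pods' requests
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Application-specific metric served through custom.metrics.k8s.io
  # (hypothetical metric name, exposed by a Prometheus-backed adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

The HPA computes a desired replica count for each metric and uses the largest, so either a CPU spike or a request-rate spike can trigger a scale-up. When the new pods no longer fit, they stay Pending, which is exactly the signal the cluster autoscaler reacts to by adding nodes.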

Ingress controller

Istio is one of the most advanced, but breaking changes and its beta status might introduce hard-to-debug bugs
Contour looks like a good replacement for Istio. It didn’t have as strong community support as Istio, but it is stable enough and has quite a cool CRD, IngressRoute, which makes Ingress fun to use
Nginx ingress is battle-tested and has the best community support. It has a huge number of features, so it is a good choice for setting up the most stable environment
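
For a feel of Contour's IngressRoute, here is a minimal sketch that routes one virtual host to two services by path prefix. The host name, namespace and service names are hypothetical, and the schema is the `contour.heptio.com/v1beta1` one; later Contour releases superseded IngressRoute with the HTTPProxy CRD (`projectcontour.io/v1`), so check which one your Contour version serves.

```yaml
apiVersion: contour.heptio.com/v1beta1
kind: IngressRoute
metadata:
  name: web                      # hypothetical name
  namespace: default
spec:
  virtualhost:
    fqdn: staging.example.com    # hypothetical host
  routes:
    # Requests under /api go to the API service
    - match: /api
      services:
        - name: api              # hypothetical Service
          port: 80
    # Everything else goes to the frontend
    - match: /
      services:
        - name: frontend         # hypothetical Service
          port: 80
```

Compared with a plain Ingress, routing lives in typed fields rather than controller-specific annotations, which is much of what makes it pleasant to use.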

Stateful applications

Ensure you have enough nodes in each AZ where your data volumes live: an EBS volume exists in a single AZ, so a pod that mounts it can only run on a node in that AZ. A good start is to create a dedicated node group for each AZ with the minimum number of nodes needed.
Ensure each persistent volume claim (PVC) is created in the desired AZ by creating a dedicated storage class for each AZ you need PVCs in. See allowedTopologies in the following example.

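```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-eu-west1-a
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# Create the EBS volume only once a pod using the claim is scheduled
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  # Restrict provisioning to one AZ; on Kubernetes 1.17+ the GA zone
  # label is topology.kubernetes.io/zone
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - eu-west-1a
```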
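
A claim then pins itself to that AZ simply by naming the storage class. In the sketch below the claim name, size and the busybox consumer pod are hypothetical. Because the class uses WaitForFirstConsumer, the claim stays Pending until a pod that uses it is scheduled; the volume is then created in the allowed zone, which is why the node group in that AZ needs spare capacity.

```yaml
# Claim that uses the zone-restricted StorageClass above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-eu-west-1a          # hypothetical name
spec:
  storageClassName: standard-eu-west1-a
  accessModes:
    - ReadWriteOnce              # an EBS volume attaches to one node at a time
  resources:
    requests:
      storage: 10Gi
---
# Hypothetical consumer; scheduling this pod triggers volume provisioning
apiVersion: v1
kind: Pod
metadata:
  name: data-consumer
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-eu-west-1a
```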

Summary

EKS is a good managed Kubernetes service. Some of the tasks mentioned are common to all Kubernetes platforms, but there is still a lot of room for the service to grow. The maintenance burden is still quite high, but fortunately the Kubernetes ecosystem has a lot of open-source tools to ease it.

Have fun!