AWS Archives - Site Reliability Engineer Blog

Build Kubernetes control plane image with Packer

Steps to prepare single control plane image is quite simple:

Prepare Docker and Kubernetes packages and settings
Execute kubeadm bootstrap script when EC2 start up first time

One unanswered question is: How to add additional control plane nodes and worker nodes which required tokens and certificates to be preset when joining the cluster?

Practical guide to Kubernetes Certified Administration exam

I have published practical guide to Kubernetes Certified Administration exam https://github.com/vorozhko/practical-guide-to-kubernetes-administration-exam

Covered topics so far are:

Share your efforts

If your are also working on preparation to Kubernetes Certified Administration exam lets combine our efforts by sharing the practical side of exam.

Disaster recovery of single node Kubernetes control plane

Overview

There are many possible root causes why control plane might become unavailable. Lets review most common scenarios and mitigation steps.

Mitigation steps in this article build around AWS public cloud features, but all popular public cloud offerings have similar functionality.

Apiserver VM shutdown or apiserver crashing

Results

unable to stop, update, or start new pods, services, replication controller
existing pods and services should continue to work normally, unless they depend on the Kubernetes API

120 Days of AWS EKS in Staging

Felix Georgii wakeboarding at Wake Crane Project in Pula, Croatia on September 25, 2016

My journey with Kubernetes started with Google Kubernetes Engine then one year later with self managed kuberntes and then with migration to Amazon EKS.

EKS as a managed kubernetes cluster is not 100% managed. Core tools didn’t work as expcted. Customers expectation was not aligned with functions provided. Here I have summarized all our experience we gained by running EKS cluster in Staging.

To run EKS you still have to:

Prepare network layer: VPC, subnets, firewalls…
Install worker nodes
Periodically apply security patches on workers nodes
Monitor worker nodes health by install node problem detector and monitoring stack
Setup security groups and NACLs
and more

Production Readiness Checklist

In bookmarks – production readiness checklist.
Very good job by Gruntwork !

Amazon EKS Workshop

I really hesitated to post about Amazon EKS workshop, but this one is really looking good and it is free and open source!

Going through workshop you will learn how:

Setup manager kubernetes cluster in AWS
Spin up services and deploy micro services
Autoscaling
Deployment with Helm charts
Continuous integration and continuous deployment
Monitoring and logging
Istio service mesh

AWS Transit Gateway to Simplify Your Network Architecture

This is how your network architecture probably looks now

this is how it can be with Transit Gateway

Transit gateway was one of the missing peaces to full fill network administration needs.

Amazon EKS IAM based authentication explained

One image worth 1000 words!

Read full article by Daniel Weibel on how authentication and authorization works on amazon EKS.

Terraform remote state and state locking

Terraform remote state and state locking is important part in team collaboration. What are challenges when working on Terraform in a team:
1. how to synchronize terraform state between people
2. how to avoid collisions of running terraform at the same time

Terraform remote state

Terraform remote state is a mechanism to share state file by hosting it on a shared resource like aws s3 bucket or consul server.

Example of storing state in s3 bucket.

terraform {
  backend "s3" {
    bucket         = "mybucket-terraform-state-file"
    key            = "example/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
  }
}

Bucket have to be created beforehand. You the key to separate states from difference modules and projects.

Terraform state locking

Terraform locking state isolate state changes. As soon as lock is acquired by terraform plan or apply no other terraform plan/apply command will succeed until lock is released.

To store lock in dynamodb table you need:
– Create dynamodb table in your aws account in the same region as specified in your terramform backend configuration (us-east-1 in our case)
– primary key must have a name LockID without it locking will not work

terraform {
  backend "s3" {
    bucket         = "mybucket-terraform-state-file"
    key            = "example/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform_example_lock"
  }
}

Note that terraform provide a way to disable locking from command line using -lock=false flag, but it is not recommended.

One improvement is to set “key” keyword as variable and base it on module or project, so you don’t have to set it manually. One caveat with that approach is to make sure they key is unique across projects.

Best,
Iaroslav

AWS best practices for Lambda functions in Production

Hey folks,

just a month ago I have been involved in AWS project based on Lambda functions. In this article I will explain what I learned so far and how to create production Lambda AWS environment with best practices in mind.

I will start from top level and will explain everything you need to have basic infrastructure supporting your Lambda functions and other applications in your cloud.

VPC

First, you need to created dedicated VPC and reserve range of IPs which doesn’t conflict with your other networks in case you would need to pair them together. As a general rule you should never use default VPC for production needs.
Create a security group which only allow 80 and 443 incoming traffic.

Subnets

You would need at least 4 subnets, two private and two public. Each type of subnet have to split in at least two different availability zones.
Public subnet have to contain AWS services endpoints and your servers which needs to have direct connection to internet like ELB, API gateway endpoints or bastion host (your ssh jump server).
Private subnet have to contain all your infrastructure servers like web servers, database server or backend applications.

Note that You should never place your infrastructure servers in public subnets.

Internet gateway and NAT

To function properly your VPC have to be attached to internet gateway and your private subnet should have NAT service enabled.

MySQL

For the database I use MySQL RDS. You need to disable public access to the instance and deploy it into private subnet. In security group add port 3306 for incoming connections and only from internal IP range. So, we have double protection here with security group and internal dns name for database.
There are a lot of best practices of how to setup production ready mysql instance, so I will skip most of it, but what you definitely need is to have read replica and shadow copy enabled. Make sure you set maintenance window which is right for you.

Lambda functions

To have access to our private database Lambda functions needs to be deployed inside the same VPC in private subnets. To setup https endpoints for lambda functions you would need to attach API gateway. In Lambda security groups add ports 80 and 443 for incoming connections.

That’s pretty much it, but very often you will have other web applications running in your vpc and to route traffic properly between Lambda and other apps you would need some web proxy like nginx.

Nginx

To have common entry point for your web applications and Lambda function Nginx is the best way to go. There is a new possibility to use ELB for that, but it isn’t good enough yet.

To have reliable and secure setup of nginx you would need to use common pattern of AWS which include: ELB, Autoscaling group, Launch configuration and security groups.

On the configuration side nginx will proxy traffic to Lambda functions through API gateway.

Elastic load balancer

Here you need to decide what kind of ELB suits your needs. I choose ELB with HTTPS support which provide SSL termination. In the ELB security group I added ports 80 and 443 for all incoming traffic.

Launch configuration

Within Launch configuration you need to define what kind of instance you want to launch when autoscaling is trigger in.

Autoscaling group

ASG define what is desired number of instance you want to run at any given moment. Using metrics such as CPU you can setup it to scale up or down to desired maximum or minimum number of instances.

Almost there!

Last step is to connect ELB with ASG and with Launch configuration!

Note I have skipped setting up of Target group and health checks, but they are pretty much basics.

That’s it!

Now you have a good start to develop with AWS Lambda in conjunction with general approach of web tier architecture.

What’s next?

Second part of the topic is to setup CI and automation. Next time I will write how to code infrastructure with terraform, create nginx image with packer and run configuration management with ansible.