Backup and restore of an etcd cluster

A Kubernetes disaster recovery plan usually consists of backing up the etcd cluster and having infrastructure as code to provision a new set of servers in the cloud. Let’s see how to do the first part – back up etcd – in two basic and easy ways.

Etcd backup

The only stateful component of a Kubernetes cluster is the etcd server. The etcd server is where Kubernetes stores all API objects and configuration.
Backing up this storage is sufficient for a complete recovery of the Kubernetes cluster state.

Backup with etcdctl

etcdctl is a command line tool for managing an etcd server and its data.

Making a backup

The command to make a backup is:

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db

The command to restore a snapshot is:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db
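
By default the restored data ends up in a new local directory next to where the command runs. To restore straight into a specific directory, add the --data-dir flag; the target path below is only an example:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-from-backup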

Note: For HTTPS endpoints you might need to specify paths to the certificate and key files so that etcdctl can access the etcd server API.
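
For example, on a kubeadm-provisioned control plane the etcd certificates usually live under /etc/kubernetes/pki/etcd. The exact paths below are assumptions; adjust them to your setup:

ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    snapshot save snapshot.db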

Store backups in remote storage

It’s important to back up the data to remote storage such as S3. This guarantees that a copy of the etcd data will be available even if the control plane volume is inaccessible or corrupted.

  • Create an S3 bucket.
  • Copy snapshot.db to S3 under a new filename.
  • Set up S3 object lifecycle rules to clean up old backup files.
# new s3 bucket for etcd backups
aws s3 mb s3://etcd-backup
# define a backup filename based on current date and time
filename=$(date +%F-%H-%M).db
aws s3 cp ./snapshot.db s3://etcd-backup/etcd-data/$filename
# set backup lifecycle configuration for backup files rotation
aws s3api put-bucket-lifecycle-configuration --bucket etcd-backup --lifecycle-configuration file://lifecycle.json
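
The snapshot and upload steps can also be combined into one small script that runs from cron on a control plane node. This is only a sketch: the endpoint, temporary path, and bucket name are assumptions, and TLS flags may be needed as noted above.

#!/usr/bin/env bash
set -euo pipefail

ENDPOINT=https://127.0.0.1:2379   # assumed local etcd endpoint
BUCKET=etcd-backup                # bucket created above
filename=$(date +%F-%H-%M).db

# take a fresh snapshot and ship it to S3
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save /tmp/snapshot.db
aws s3 cp /tmp/snapshot.db s3://$BUCKET/etcd-data/$filename
rm /tmp/snapshot.db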

Example of a lifecycle.json which transitions backups to S3 Glacier:

{
    "Rules": [
        {
            "ID": "Move rotated backups to Glacier",
            "Prefix": "etcd-data/",
            "Status": "Enabled",
            "Transitions": [
                {
                    "Date": "2015-11-10T00:00:00.000Z",
                    "StorageClass": "GLACIER"
                }
            ]
        },
        {
            "ID": "Move old versions to Glacier",
            "Prefix": "",
            "Status": "Enabled",
            "NoncurrentVersionTransitions": [
                {
                    "NoncurrentDays": 2,
                    "StorageClass": "GLACIER"
                }
            ]
        }
    ]
}
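
To double-check that the rules were applied, the configuration can be read back (bucket name as above):

aws s3api get-bucket-lifecycle-configuration --bucket etcd-backup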

Simplify etcd backup with Velero

Velero is a powerful Kubernetes backup tool that simplifies many operational tasks. With Velero it’s easier to:

  • Choose what to back up (objects, volumes, or everything)
  • Choose what NOT to back up (e.g. secrets; see the sketch after this list)
  • Schedule cluster backups
  • Store backups on remote storage
  • Recover quickly after a disaster
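
As a sketch of the first two points, a backup can be scoped with flags such as --include-namespaces and --exclude-resources; the backup and namespace names below are made up:

# back up only the production namespace and skip Secret objects
velero backup create prod-backup \
    --include-namespaces production \
    --exclude-resources secrets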

Install and configure Velero

1) Download the latest version from the Velero GitHub page.

2) Create an AWS credentials file (e.g. ./aws-iam-creds, referenced by the install command below):

[default]
aws_access_key_id=<your AWS access key ID>
aws_secret_access_key=<your AWS secret access key>

3) Create an S3 bucket for the backups:

aws s3 mb s3://kubernetes-velero-backup-bucket

4) Install Velero into the Kubernetes cluster:

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket kubernetes-velero-backup-bucket \
    --secret-file ./aws-iam-creds \
    --backup-location-config region=us-east-1 \
    --snapshot-location-config region=us-east-1

Note: we use the AWS plugin to access remote storage in S3. Velero supports many different storage providers; see which works best for you.
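
A quick sanity check after the install (assuming the default velero namespace) is to make sure the Velero pod is running and the backup location is reachable:

kubectl get pods --namespace velero
velero backup-location get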

Schedule automated backups

1) Schedule daily backups (the cron expression below runs every day at 07:00):

velero schedule create <SCHEDULE NAME> --schedule "0 7 * * *"
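
For example, a schedule that runs daily at 07:00 and keeps each backup for a week could look like this (the schedule name is made up, and --ttl is optional):

velero schedule create daily-cluster-backup \
    --schedule "0 7 * * *" \
    --ttl 168h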

2) Or create a backup manually:

velero backup create <BACKUP NAME>

Disaster Recovery with Velero

Note: You might need to reinstall Velero in case of full etcd data loss.

When Velero is up, the disaster recovery process is simple and straightforward:

1) Switch your backup storage location to read-only mode:

kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
    --namespace velero \
    --type merge \
    --patch '{"spec":{"accessMode":"ReadOnly"}}'

By default, <STORAGE LOCATION NAME> is expected to be default; however, the name can be changed by specifying --default-backup-storage-location when running velero server.
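
To pick the backup to restore from, list what Velero has stored:

velero backup get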

2) Create a restore from your most recent Velero backup:

velero restore create --from-backup <SCHEDULE NAME>-<TIMESTAMP>
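
The restore can then be monitored, and any warnings or errors inspected, with the restore subcommands (the restore name is the one reported by the previous command):

velero restore get
velero restore describe <RESTORE NAME>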

3) When ready, revert your backup storage location to read-write mode:

kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
   --namespace velero \
   --type merge \
   --patch '{"spec":{"accessMode":"ReadWrite"}}'

Conclusions

  • A Kubernetes cluster whose API objects change infrequently is a good fit for a single control plane setup.
  • Frequent backups of the etcd cluster minimize the time window of potential data loss.