Etcd and SkyDNS on UpCloud.

This article is part 5 in a series: Server setup on UpCloud



What is a DNS server

DNS (short for Domain Name System) is a system which gives machines and services a domain name and lets other machines and services query it to find out which IP address belongs to which service or machine.
In this post, I will go through how to set up a dedicated DNS server for the internal cluster. Kubernetes has a built-in DNS server which gives each and every service running in the cluster its own domain name, but in this case, we want names for all our machines.
When a new machine connects to the cluster, it will be given a domain name, so that each certificate we issue can be set to a wildcard domain instead of an IP range or specific IP addresses. We use our own certificates and CA for this, so it makes things a lot easier to manage. There are more benefits, but this one is the most important in our case.

Installing SkyDNS and Etcd

For DNS we use a program called SkyDNS. It’s a package which uses etcd as a backend for service discovery, so when we add a new domain we just put it into the etcd storage and it will be used in the lookup process.
First we need to install etcd. To make the installation as easy as possible, I have chosen to use version 18.04 of Ubuntu in this tutorial, as it has both etcd and skydns in its repositories already (I know, I know, CoreOS would be a perfect fit for this, but I have very little CoreOS experience, so I will stick to Ubuntu for now!).

Thanks to that Ubuntu version, the following commands are all that is needed to install the services:

apt-get install etcd
apt-get install skydns

SkyDNS and etcd should of course run on a server, and I prefer to use a dedicated server for only this. In a huge cluster, the work should be split up, with a dedicated etcd cluster and possibly a single machine for skydns. But in this case a small instance will be enough, as we don’t have that much traffic between the servers (yet).

Initially we need to set up a server, and for that, we use Terraform!

The server setup we use is nearly identical to earlier examples: it’s a 1xCPU-1GB plan, the zone is defined in the variables file, and so on.
We set the server resource name to skydns and the hostname will become dns-0, which is easy to remember!

Note that the storage key in the devices is set to the 18.04 ubuntu version.

resource "upcloud_server" "skydns" {
  count = 1

  zone     = "${var.zone}"
  hostname = "dns-${count.index}"

  plan               = "1xCPU-1GB"
  private_networking = true
  ipv4               = true
  ipv6               = false

  login {
    user            = "root"
    keys            = "${var.ssh_keys}"
    create_password = false
  }

  storage_devices = [
    {
      tier    = "maxiops"
      size    = 25
      action  = "clone"
      storage = "Ubuntu Server 18.04 (Bionic Beaver)"
    },
  ]

  provisioner "remote-exec" {
    inline = [
      "apt-get -qq update",
      "apt-get -qq install python -y",
    ]

    connection {
      host        = "${self.ipv4_address}"
      type        = "ssh"
      user        = "root"
      private_key = "${file(var.ssh_key_private)}"
    }
  }
}

With the server created and python installed we can move on to installing the actual software.

The first thing we need to install is the etcd service. Etcd is a very fast key-value store which can handle a whole lot of lookups in a short time, so it’s perfect as a backend for this type of thing.

Requirements

Before we can begin, we need certificates. If you do not know how to create a CA and issue certificates, I will try to go through that in a later post. For now, search for it and you will likely find a few good examples (I use cfssl to create certificates easily).
We want the following certificates generated:

Root

This is the certificate of the CA (ONLY THE PUBLIC part). It allows all the machines to recognize a properly signed certificate.

Server

The server certificate (server.crt, server.key) requires the following usages:

  • Signing
  • Key Encipherment
  • Server Auth

Client

The client certificate (client.crt, client.key) requires the following usages:

  • Signing
  • Key Encipherment
  • Client Auth

Peer

The peer certificate (peer.crt, peer.key) requires the following usages:

  • Signing
  • Key Encipherment
  • Client Auth
  • Server Auth

The following is the configuration I use for my test servers:

{
    "signing": {
        "profiles": {
            "server": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth"
                ]
            },
            "client": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "client auth"
                ]
            },
            "peer": {
                "expiry": "43800h",
                "usages": [
                    "signing",
                    "key encipherment",
                    "server auth",
                    "client auth"
                ]
            }
        }
    }
}
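Before handing a file like this to cfssl, it can be handy to check what each profile actually allows. The following is only a rough Python sketch: the configuration is inlined as a string (trimmed to the peer profile), and the helper name expiry_years is made up for this example. It also shows that 43800h works out to five years.

```python
import json

# A trimmed copy of the signing configuration above (peer profile only).
CONFIG = """
{
    "signing": {
        "profiles": {
            "peer": {
                "expiry": "43800h",
                "usages": ["signing", "key encipherment", "server auth", "client auth"]
            }
        }
    }
}
"""


def expiry_years(expiry: str) -> float:
    """Convert an expiry given in hours, such as "43800h", to 365-day years."""
    if not expiry.endswith("h"):
        raise ValueError("this sketch only handles expiries given in hours")
    return int(expiry[:-1]) / (24 * 365)


profiles = json.loads(CONFIG)["signing"]["profiles"]
for name, profile in profiles.items():
    # Print the profile name, its lifetime in years and its key usages.
    print(name, expiry_years(profile["expiry"]), sorted(profile["usages"]))
```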

When generating the certificates we ought to be sure to set the CN and possibly the hosts/SANs to the correct values.
The domain name I chose to use in this example is example.tdl, so we set the CN to *.example.tdl on each of our certificates.
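To get a feeling for which machine names a wildcard like *.example.tdl covers, here is a small Python sketch. It is a simplification that I wrote for illustration (real TLS clients also look at the SANs and have more rules), and the function name matches_wildcard is not from any library. The assumption it encodes is that the wildcard stands for exactly one left-most label.

```python
def matches_wildcard(pattern: str, hostname: str) -> bool:
    """Return True if hostname is covered by a name such as *.example.tdl.

    Simplified rule: the wildcard replaces exactly one left-most label.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must stand in for a non-empty label.
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


print(matches_wildcard("*.example.tdl", "dns-0.example.tdl"))    # True
print(matches_wildcard("*.example.tdl", "example.tdl"))          # False
print(matches_wildcard("*.example.tdl", "a.dns-0.example.tdl"))  # False
```

So every machine directly under example.tdl is covered, while the bare domain and deeper sub-domains are not.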

Moving and Installing certificates

When the server is running, it needs a way to know that a certificate is signed with the correct authority.
The certificate that allows for this verification is the root.crt file, the public certificate of the CA. This is also the only certificate that we have to actually “install” on the server, so it has to be moved to the /usr/local/share/ca-certificates directory of the server, and then we need to update the certificate store of the machine with the /usr/sbin/update-ca-certificates command.

As this is a very common thing for us to do, we might as well tread into the concept of roles in ansible, because we will use them for the common tasks.

Ansible roles

A role in ansible is a set of commonly used features that a playbook can use on a given resource. Roles have a pre-defined common file structure which looks like this:

playbook.yml
  /roles/
    /rolename/
      /files/
      /vars/
      /tasks/
      /templates/
      /handlers/
      /meta/

If a playbook uses a role, all the tasks in the role will be run, the handlers will be added, and the files and templates will be available to use.
In this case, we only care about the files and tasks directories, so we don’t even need to add the others. We name our role common.

In the files directory, we put the keys and certs that we can be sure all the machines will use: client and root. Then we add a new YAML file named main.yml in the tasks directory.
At this point, we should have a structure like this:

playbook.yml
  /roles/
    /common/
      /files/
        root.crt
        client.crt
        client.key
      /tasks/
        main.yml
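If you would rather script the creation of that skeleton than click it together by hand, something like the following Python sketch would do. The function name scaffold_role is my own, and the demo runs in a throwaway temporary directory. Point base at your playbook directory to use it for real, and copy the certificates into files/ yourself.

```python
import tempfile
from pathlib import Path
from typing import List


def scaffold_role(base: Path, role: str = "common") -> List[Path]:
    """Create roles/<role>/files and roles/<role>/tasks plus an empty tasks/main.yml."""
    role_dir = base / "roles" / role
    created = []
    for sub in ("files", "tasks"):
        path = role_dir / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    main_yml = role_dir / "tasks" / "main.yml"
    main_yml.touch()
    created.append(main_yml)
    return created


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for path in scaffold_role(Path(tmp)):
        print(path.relative_to(tmp))
```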

Now we need to write some tasks. Inside a role there is no need for “relative” paths, as we can always start from the role’s own directory, so when a file is copied, instead of writing ../files/client.crt we use "{{ role_path }}/files/client.crt".
In this specific role, we know that we want to add the root certificate to the machine and also copy the client certificate to some directory. This could easily be done with a few copy clauses, but I wanted to show how to use loops, lists and maps, so the first task in the main.yml file creates the directories needed and the second task copies the certificates.

- name: Create certificate directories.
  file:
    path: "{{ item }}"
    state: directory
    mode: 0700 # Directories need the execute bit to be entered, so 0700 rather than 0600.
  with_items:
    - /usr/local/share/ca-certificates
    - /home/root/.certs

- name: Add certificates to machine.
  copy:
    src: "{{ role_path }}/files/{{ item.file }}"
    dest: "{{ item.dest }}"
    owner: root
    mode: 0600
  loop:
    - { file: "root.crt", dest: "/usr/local/share/ca-certificates/" }
    - { file: "client.crt", dest: "/home/root/.certs/" }
    - { file: "client.key", dest: "/home/root/.certs/" }

In the first task, instead of stating the directories in the path and using two different tasks to create them, we use a template placeholder ("{{ item }}"). That allows us to add a with_items list which we populate with the directories. When the ansible file runs, both directories are created in the same task, making it noticeably quicker.

The second task uses a similar type of template placeholder, but in this case the items are maps. Each map contains a file key and a dest key, holding our cert and its destination. In each iteration the "{{ item.file }}" placeholder is replaced with the current object’s file value, while "{{ item.dest }}" is replaced with its dest value. Three iterations, each adding a new cert, but only one task.
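If the loop syntax feels abstract, it may help to see roughly what ansible does with it. The Python below is just an analogy, not ansible code: ROLE_PATH is a stand-in for the real role_path value and expand is a name I picked for the example. It produces the source and destination that each iteration hands to the copy module.

```python
ROLE_PATH = "roles/common"  # stand-in for Ansible's {{ role_path }}

items = [
    {"file": "root.crt", "dest": "/usr/local/share/ca-certificates/"},
    {"file": "client.crt", "dest": "/home/root/.certs/"},
    {"file": "client.key", "dest": "/home/root/.certs/"},
]


def expand(items, role_path=ROLE_PATH):
    """Return the (src, dest) pair each loop iteration would pass to the copy module."""
    return [(f"{role_path}/files/{item['file']}", item["dest"]) for item in items]


for src, dest in expand(items):
    print(f"copy {src} -> {dest}")
```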

When we have copied said certificates to the machine, we want the machine to add the root certificate to its ca-certificates store. The easiest way of doing it on Ubuntu is with a shell command:

- name: Import root cert.
  shell: /usr/sbin/update-ca-certificates

Now the machine will know which certificates are safe, and we have our first role!

Installing the required software

The software that we need installed on the server is skydns and etcd. Due to using ubuntu, we will use the ones in the apt repository right away to make it as easy as possible. If you want the absolutely latest version of both programs, you can compile them yourself, but I won’t go through that in this part.

When we create the playbook, we should attach the newly created role in the initial part, the following will be sufficient for this tutorial:

---
- name: Etcd and SkyDns installation.
  become: true
  hosts: all
  gather_facts: yes # I will explain this further down.
  roles:
    - common # This is the newly created role.
  tasks:
    ...

When a role is added, its tasks will run before the tasks defined in the playbook. You can use the pre_tasks or post_tasks clauses if you want more control over what runs when.

When gather_facts is set to yes, ansible will collect a bunch of information about the remote machine. It’s possible to filter what it collects, making it faster, but in this tutorial we stick to collecting it all.

- name: Install etcd and skydns
  apt:
    name: [ etcd, skydns ]
    state: present

In the above snippet, we use a list instead of running the installation twice, that is pretty much the same as running:

apt install etcd skydns
# Instead of
apt install etcd
apt install skydns

Which, of course, is a bit swifter.

Installing the programs is not enough. We need to set them up, and we especially want to run them as services, not directly in the terminal. So, using Ubuntu, we create a couple of files… service files!

First: etcd.service.j2 (as you might notice, it’s a j2 or Jinja2 file, which is the file type that ansible templates use).

[Unit]
# Start etcd only after the network has been initialized.
After=network.target
Description=Etcd for our DNS!
StartLimitIntervalSec=5

# This is the service definition, the important part!
[Service]
# Allow the etcd service to use up to 65536 open file descriptors.
LimitNOFILE=65536
Restart=always
RestartSec=1
Type=notify
User=root
# The following is the startup command. It's intended as a one line command, but to make it easier to read,
# I usually put every argument on its own line and use the \ character at the end of each line to let the
# system know that it is not to be seen as a new line! systemd does not allow a comment after a value on
# the same line, so the arguments are explained here instead:
#   --data-dir              can be changed, this is where etcd stores its data.
#   --client-cert-auth      ALL clients require certificates to authorize.
#   --trusted-ca-file       the client certs need to be signed by the root CA.
#   --cert-file, --key-file the cert and key etcd presents to its clients (both files must exist on the server).
ExecStart=/usr/bin/etcd \
  --name={{ ansible_hostname }} \
  --data-dir=/home/root/etcd \
  --client-cert-auth \
  --trusted-ca-file=/usr/local/share/ca-certificates/root.crt \
  --cert-file=/home/root/.certs/server.crt \
  --key-file=/home/root/.certs/server.key \
  --advertise-client-urls=https://{{ ansible_eth1.ipv4.address }}:2379 \
  --listen-client-urls=https://{{ ansible_eth1.ipv4.address }}:2379 \
  --initial-cluster-state=new

[Install]
WantedBy=multi-user.target

By specifying the advertise and listen URLs, and setting them to the internal IPv4 address of the machine, we make sure that etcd can only be reached over the private network. Port 2379 is the standard port for etcd to listen to clients on.

On to the skydns.service file:

[Unit]
Description=SkyDNS Service
After=etcd.service
StartLimitIntervalSec=5

[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=/usr/bin/skydns \
  -machines=https://{{ ansible_hostname }}.example.tdl:2379 \
  -tls-key=/home/root/.certs/client.key \
  -tls-pem=/home/root/.certs/client.crt \
  -ca-cert=/usr/local/share/ca-certificates/root.crt

[Install]
WantedBy=multi-user.target

This service is much like the etcd service. The command points to the skydns program, and we point the -machines argument to the local etcd server (by the URL the server will have). The certificates the service uses are the client certificates. The root certificate is also included, so that the responses from etcd can be verified.

Thanks to the certificates, all the communication between etcd and its clients will use TLS. This is good, as it increases the security of the cluster a whole lot.

At this point you might wonder what the ansible_eth1.ipv4.address and ansible_hostname placeholders are.
Well, because gather_facts is set to yes in the ansible playbook, we are able to fetch the current machine’s hostname and IP address, on any interface. In this case we use the eth1 interface, as we know that is the private network. If you have more network interfaces, it might be worth checking before running the script!
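To make the placeholder idea a bit more concrete, here is a tiny Python stand-in for what the template step does. It is not Jinja2 and not how ansible is implemented, just a sketch: the facts dictionary is hand-written with made-up values (dns-0 and 10.0.0.2), and lookup and render are names I chose for the example.

```python
import re

# Made-up facts shaped like the ones Ansible gathers; the values are placeholders.
facts = {
    "ansible_hostname": "dns-0",
    "ansible_eth1": {"ipv4": {"address": "10.0.0.2"}},
}


def lookup(facts: dict, dotted: str):
    """Follow a dotted name such as ansible_eth1.ipv4.address through nested dicts."""
    value = facts
    for part in dotted.split("."):
        value = value[part]  # raises KeyError on a misspelled fact name
    return value


def render(template: str, facts: dict) -> str:
    """Replace every {{ name }} placeholder with the matching fact."""
    return re.sub(
        r"\{\{\s*([\w.]+)\s*\}\}",
        lambda match: str(lookup(facts, match.group(1))),
        template,
    )


print(render("--listen-client-urls=https://{{ ansible_eth1.ipv4.address }}:2379", facts))
```

A misspelled fact name (eht1 instead of eth1, for example) fails here with a KeyError, and ansible likewise refuses to render a template that uses an undefined variable, so it pays off to double check the interface name.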

Copying the services with ansible is not hard when using the template task, just make sure they are put in the correct directory!

- name: Copy service files.
  template:
    # Change this to the path you wish to use for the configuration. The file names must match the names
    # the templates are saved under (append .j2 here if you keep that suffix).
    src: "/services/path/{{ item }}"
    dest: "/lib/systemd/system/{{ item }}"
    mode: 0600
  with_items: [ skydns.service, etcd.service ]
# While we are at it, we add the internal ip of the server to the hosts file, this is
# because the dns won't be able to start without first adding some configs.
- name: Update hosts file.
  shell: "echo '{{ ansible_eth1.ipv4.address }} {{ ansible_hostname }}.example.tdl' >> /etc/hosts"
# We also make sure to reload the systemd daemon, as the etcd unit file it already had has now changed!
- name: Reload systemctl
  shell: "systemctl daemon-reload"
# And NOW we start the etcd server.
- name: Restart etcd.
  service:
    name: etcd
    enabled: true
    state: restarted
The reason restarted is used is to make sure that the service is either started or, if already running, restarted with the correct configuration.
It’s very important that the etcd server is running before the skydns server is started, because we need to put some configuration into the etcd storage that skydns can use when it’s starting.

The easiest way to add the configuration is to use curl and send a couple of requests to the etcd server.
The following is quite a minimal config. There is more that can be added, so make sure to check out the skydns documentation for more info.

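- name: Skydns Configuration
  shell: >- # etcd runs with --client-cert-auth, so curl has to present the client certificate and key.
    curl -XPUT --cert /home/root/.certs/client.crt --key /home/root/.certs/client.key
    https://{{ ansible_hostname }}.example.tdl:2379/v2/keys/skydns/config -d 'value={"host": "{{ ansible_hostname }}", "dns_addr": "{{ ansible_eth1.ipv4.address }}"}'
- name: Add skydns to etcd as a domain
  shell: >-
    curl -XPUT --cert /home/root/.certs/client.crt --key /home/root/.certs/client.key
    https://{{ ansible_hostname }}.example.tdl:2379/v2/keys/skydns/tdl/example/{{ ansible_hostname }} -d 'value={"host": "{{ ansible_eth1.ipv4.address }}"}'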

It’s also recommended to add a nameserver, but I will not go through that at the moment.

As you see in the second task, we add the server itself to the DNS. It’s added by using the etcd route /v2/keys/skydns/<top_domain>/<domain>/<sub_domain> and putting host data which points to the IP of the machine.
We will use basically the exact same command to add future servers to the DNS too!
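Since the route is just the domain name with its parts in reverse order, the key for any future server can be computed. Here is a short Python sketch of that mapping. The function name etcd_key is my own, and the /skydns prefix is simply the one used in the route above.

```python
def etcd_key(domain: str, prefix: str = "/skydns") -> str:
    """Map a domain name to its etcd key: labels reversed and separated by slashes."""
    labels = [label for label in domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return prefix + "/" + "/".join(reversed(labels))


print(etcd_key("dns-0.example.tdl"))  # /skydns/tdl/example/dns-0
```

Put /v2/keys in front of the result and you have the path to PUT the host data to.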

When the commands have been run, we are ready to start the skydns service too.

- name: Start skydns service.
  service:
    name: skydns
    enabled: true
    state: restarted

Now, running the playbook would install the services and set everything up to make it possible to query the DNS from your other servers!

If you encounter any problems and don’t know what the cause is, connect to the server via ssh and check journalctl for skydns and etcd info:

journalctl -u etcd
journalctl -u skydns

Run it from terraform

When the playbook is done, we want to wrap it all up so that the terraform script can run the playbook for us. To make it as easy to work with as possible, we will create a new resource for this. When you want to create a resource that is not really bound to anything, the easiest thing to use is a so-called null_resource.

The one we use should look something like this:

resource "null_resource" "skydns-ansible" {
  depends_on = ["upcloud_server.skydns"]

  provisioner "local-exec" {
    environment {
      ANSIBLE_HOST_KEY_CHECKING = "False"
    }

    working_dir = "../ansible/etcd"
    command     = "ansible-playbook -u root --private-key ${var.ssh_key_private} playbook.yml -i ${join(",", upcloud_server.skydns.*.ipv4_address)},"
  }
}

It’s quite simple: the null_resource depends on the upcloud_server resource named skydns. When that is ready, it runs a local-exec provisioner, which in turn runs the ansible playbook with the private key we use and a list of IP addresses (in this case just one) as inventory, ending with a , to make sure ansible treats it as a list!
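The trailing comma is easy to forget, so here is the same command assembled in Python to show what the join produces. This is only an illustration of the string building, not something terraform runs. The address 203.0.113.10 is a documentation address, /path/to/private_key is a placeholder, and the function names are mine.

```python
import shlex
from typing import List


def inline_inventory(addresses: List[str]) -> str:
    """Join addresses with commas and keep a trailing comma, so a single
    address is read as a host list rather than as an inventory file path."""
    return ",".join(addresses) + ","


def playbook_command(addresses: List[str], private_key: str) -> List[str]:
    """Build the ansible-playbook argument list used by the local-exec provisioner."""
    return [
        "ansible-playbook", "-u", "root",
        "--private-key", private_key,
        "playbook.yml",
        "-i", inline_inventory(addresses),
    ]


print(shlex.join(playbook_command(["203.0.113.10"], "/path/to/private_key")))
```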

Make sure that you add some type of host key remove/add step to either the ansible script or somewhere in terraform. The above example uses ANSIBLE_HOST_KEY_CHECKING="False", which does not check the host keys at all. This is not ideal from a security perspective, but makes it a lot easier to work with when debugging the scripts.

Final words

The DNS server is the first part of the actual infrastructure we are to use for kubernetes, and it is quite a core piece of that infrastructure!

As always: if you find any errors, have any questions or just want to say hi, leave a comment below!