Introduction to Kubernetes

We have been using Amazon ECS to deploy containerised applications for most of our clients. Now, we are exploring Kubernetes and its integration with Google Cloud, to keep up with the new possibilities out there and to choose the best tools for each project.

Lately, I have been investigating Kubernetes because at MarsBased we want to learn more about container orchestration and deployment automation, to see if it would be a good fit for any of our clients.

Here is a summary of what I have learnt, peppered with some personal opinions. I hope it's useful!

What is Kubernetes?

In short, Kubernetes is like an open source, self-contained PaaS (Platform as a Service). It allows you to deploy containerised applications and it does all the heavy lifting of placing the containers, restarting them, scaling them, updating them, and other frequent actions.

From the outside to the inside, these are the parts Kubernetes is composed of:

Cluster: A Cluster is a set of machines (either virtual machines or physical machines) where applications run. Each of these machines is a Node.
Node: A Node is a machine where Pods run. A Pod is a set of containers that run together.
Pod: The atomic unit in Kubernetes. A Pod is a set of containers that always run together and are always scheduled on the same Node. The containers in a Pod share a single IP address, and usually there will be only one container (the Puma server serving a Rails app, for instance). But the ability to run multiple containers as a single unit allows for powerful patterns, such as:
- A logger daemon that sends the logs to an aggregation service.
- A proxy to connect to the DB (this is how the Google Cloud SQL service works).
- A monitoring daemon performing checks.
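
As a rough illustration of the multi-container pattern, here is a minimal sketch of a Pod that runs an application container next to a log-shipping container. The image names, labels and paths are placeholders I made up for the example, and the shared emptyDir volume is just one possible way to hand logs from one container to the other.

```yaml
# Hypothetical Pod: a Rails app container plus a log-shipping sidecar.
# Image names, labels and paths are placeholders, not taken from a real setup.
apiVersion: v1
kind: Pod
metadata:
  name: rails-app
  labels:
    app: rails-app
spec:
  containers:
    # Main container: the application server.
    - name: web
      image: example.com/my-rails-app:1.0.0
      ports:
        - containerPort: 3000
      volumeMounts:
        - name: logs
          mountPath: /app/log
    # Sidecar container: reads the same log directory and forwards it elsewhere.
    - name: log-shipper
      image: example.com/log-shipper:1.0.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    # Scratch volume that lives as long as the Pod and is visible to both containers.
    - name: logs
      emptyDir: {}
```

Both containers are always placed on the same Node and share the Pod's IP address, which is what makes this kind of pairing practical.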

Apart from the physical components that Kubernetes manages, there is a set of virtual components that allow running an application. These resources are defined in YAML configuration files.

The most important ones are:

Deployment: This is the core of running an application in Kubernetes. A Deployment defines a Pod configuration. That is, it defines the containers to run, the Docker options to apply, resource constraints, deployment strategy (rolling update or recreate), environment variables, and more.
Service: By default, Pods are isolated from the rest of the world. Services make these Pods accessible within the Cluster (by assigning a stable private IP to the Service) and even outside the Cluster, forwarding traffic to certain ports on the running Containers. There are several types of Services; the most interesting one is the LoadBalancer, which creates a load balancer in the provider of choice (Google Cloud, AWS, etc.). The Pods targeted by a Service are selected by label.
Volumes: These are like Docker volumes, but with a custom implementation and some differences. Kubernetes volumes are more powerful (or so the documentation claims). Their usage is the same as in Docker: you define Volumes and mount them on certain paths in the Container.
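
To make these resources a bit more concrete, here is a minimal sketch of a Deployment and a LoadBalancer Service for a Rails app. All names, the image, the port and the resource figures are assumptions chosen for the example; the point is only to show how the Service finds the Pods through the shared label.

```yaml
# Hypothetical Deployment: two replicas of a Rails web container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rails-web
  template:
    metadata:
      labels:
        app: rails-web        # label that the Service below selects on
    spec:
      containers:
        - name: web
          image: example.com/my-rails-app:1.0.0   # placeholder image
          ports:
            - containerPort: 3000
          env:
            - name: RAILS_ENV
              value: production
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi
---
# Hypothetical Service: exposes the Pods above through a cloud load balancer.
apiVersion: v1
kind: Service
metadata:
  name: rails-web
spec:
  type: LoadBalancer
  selector:
    app: rails-web            # matches the Pod label, not the Deployment name
  ports:
    - port: 80                # port exposed by the load balancer
      targetPort: 3000        # port the container listens on
```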

How does it work?

Kubernetes works declaratively. That is, you define the state you want to have (these deployments with these containers and resources, these services to access that one, etc.) using YAML files, pass them to Kubernetes and forget about them. Kubernetes takes care of always maintaining that state by taking the necessary actions: starting containers, terminating containers, moving containers from one node to another, etc.

Thus, to deploy an application, for instance, what you would do is to just update the deployment with the new image (built from the new code) and Kubernetes would take care of shutting down the old Pods and running new ones, making sure that there are always available Pods to serve requests.
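
The part of a Deployment that controls this behaviour could look roughly like the fragment below. It is a sketch, not a complete manifest: the replica count, the image tag and the two rolling update values are assumptions picked to illustrate "never go below the desired number of available Pods".

```yaml
# Fragment of a Deployment spec (not a complete manifest).
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take a Pod down before its replacement is ready
      maxSurge: 1         # allow one extra Pod while the update is in progress
  template:
    spec:
      containers:
        - name: web
          # Changing this tag is what triggers the rollout (placeholder image).
          image: example.com/my-rails-app:1.0.1
```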

A normal Kubernetes workflow is:

Create a deployment YAML for every service that we want to deploy.
- As we are using it to deploy monolithic Ruby on Rails applications for now, we define one deployment for the application and another one for the job workers (using Sidekiq).
Create a service YAML to be able to access the application.
Send all of these YAMLs to Kubernetes.
When instructed to deploy a new version of the app: update the image in both deployments.
Enjoy.
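
For the worker side of that workflow, a Deployment could be sketched as follows. The file name, resource names, image and Sidekiq command are all assumptions for the sake of the example; a real setup will differ.

```yaml
# sidekiq-deployment.yaml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-sidekiq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rails-sidekiq
  template:
    metadata:
      labels:
        app: rails-sidekiq
    spec:
      containers:
        - name: sidekiq
          # Same application image as the web deployment would use (placeholder).
          image: example.com/my-rails-app:1.0.0
          # Only the command differs: run the job workers instead of the web server.
          command: ["bundle", "exec", "sidekiq"]
```

A file like this is typically sent to the Cluster with kubectl apply -f sidekiq-deployment.yaml. Deploying a new version then means changing the image tag in each deployment and applying the files again.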

Kubernetes Architecture

Kubernetes requires a Master Node. So, in every Cluster, you will need to dedicate a Node to be the master, and by default you will not deploy applications into this Node. The Master Node provides an API to interact with the Cluster and stores the Cluster state in etcd (a distributed, highly available key-value store).

There is a CLI (Command Line Interface), kubectl, to operate Kubernetes, which in turn talks to this API in the background. The CLI has commands both to manage the state of the Cluster (deployments, services, etc.) and to interact with the Pods (execute a command, view logs, etc.).

Furthermore, all Nodes in the Cluster run a process called kubelet to receive commands from the master and send data to it.

Apart from that, you have the option to install add-ons - which are optional components - such as:

A dashboard to visualize and interact with the cluster.
An internal DNS server.
A monitoring service.
A logging aggregation service.

Secrets in Kubernetes

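Kubernetes has a clever way to manage Secrets. You can create Secrets from the CLI, either from literals or from a file, and these are stored in the cluster, separate from your deployment configuration (note that, by default, Secret values are only base64-encoded, so how securely they are stored depends on how the cluster is configured). Then, from a deployment configuration file, you can reference these Secrets by name.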

The most common scenario is to define an environment variable in a deployment to get the value from a Secret (API key, for instance).
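
As a sketch of that scenario, the fragment below reads an environment variable from a Secret. The Secret name (app-secrets), its key (api-key), the variable name and the image are all hypothetical.

```yaml
# Fragment of a container definition inside a Deployment's Pod template.
# Assumes a Secret created beforehand, for example with:
#   kubectl create secret generic app-secrets --from-literal=api-key=changeme
containers:
  - name: web
    image: example.com/my-rails-app:1.0.0   # placeholder image
    env:
      - name: API_KEY            # variable name seen by the application
        valueFrom:
          secretKeyRef:
            name: app-secrets    # name of the Secret resource
            key: api-key         # key inside that Secret
```

This way the value itself never appears in the deployment file, only the reference to it.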

Running Jobs

Kubernetes has another useful resource which is the Job. As its name implies, the idea of this resource is to execute a command once and wait for it to complete. Jobs are also defined using YAML files.

One situation for which Jobs are really useful is database migrations. However, there is a big caveat with Jobs: they (and the Pods they create) are not deleted after they finish. The containers are shut down, but the Job and Pod resources are kept so that you can still view the status and the logs. This makes Jobs more difficult to use for migrations, since you need to delete the Job and its Pods manually.
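
A migration Job could be sketched like this. The name, image and command are assumptions for the example, and the commented-out ttlSecondsAfterFinished field is only available on more recent Kubernetes versions, so check whether your cluster supports it before relying on it.

```yaml
# Hypothetical Job that runs the Rails database migrations once.
apiVersion: batch/v1
kind: Job
metadata:
  name: rails-db-migrate
spec:
  backoffLimit: 0                  # do not retry a failed migration automatically
  # ttlSecondsAfterFinished: 600   # on recent versions: clean up the finished Job after 10 minutes
  template:
    spec:
      restartPolicy: Never         # Jobs require Never or OnFailure
      containers:
        - name: migrate
          image: example.com/my-rails-app:1.0.0   # placeholder image
          command: ["bundle", "exec", "rails", "db:migrate"]
```

Without some form of automatic cleanup, a Job with this name has to be deleted (for example with kubectl delete job rails-db-migrate) before it can be created again for the next release.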

Conclusions

Kubernetes is a truly powerful beast. It has excellent and extensive (maybe a bit too much!) documentation and tutorials, as well as great tooling available overall.

Besides the CLI, there is Minikube, which is a local Kubernetes Cluster that can be installed on your machine to play with, using Docker or a virtual machine. It has a built-in dashboard, too, as well as built-in monitoring and built-in logging. The community is huge, too, and you can find tonnes of courses (both free and paid), tutorials and blog posts such as this one.

I think it's a very good option for MarsBased as a Rails consultancy, and we will benefit from using it, although we have the challenge of finding the right provider to go with it.

Google Cloud seems like the best option by far - after all, Google are the creators of Kubernetes - but this means that we'll need to become as expert in it as we already are with AWS, our cloud provider of choice.

However, Kubernetes is not meant for small deployments. It's overkill to use it for a low-throughput application as you have the extra overhead of having to keep the Master Node which takes some room just on its own.

Imagine an app that you want to deploy on a single small machine. The master would then be 50% of the infrastructure. But for a medium-to-large deployment, the overhead is next to negligible. Also, note that, at the time of writing, Google Cloud does not charge for the Cluster itself (that is, it does not charge for the Master Node); it only charges for the Nodes in the Cluster.

Future work

As I mentioned earlier in the post, this is just the beginning. It's been an exploratory ride into the exciting world of Kubernetes.

Currently, we are deploying our first production application into Google Cloud using Kubernetes. There's more stuff that we will cover in upcoming posts, exploring some of the challenges that we have been encountering and their solutions:

Jobs not being deleted after their execution, requiring manual intervention to remove them.
How to share resource configurations. It's not trivial and it's not built-in. For similar services, like the Rails app YAML and the Sidekiq one, it requires a lot of duplication in the specification.
Finding an easy way to run commands such as the Rails console. You need to do it similarly to Docker, by finding the name of the Pod and then running kubectl exec -ti [pod_name] -- [command].
Exploring other Kubernetes providers, including Amazon EKS or other PaaS like Cloud66.

I will follow up soon with more findings to keep you posted!
