Following on from my first post on the different resources I have used to familiarize myself with containers and Kubernetes, I wanted to follow up with a more technical post covering the core components of a Kubernetes cluster. Kubernetes has many components that allow application operators and developers to focus entirely on container-centric primitives for self-service operation. As a network engineer working on the infrastructure side of IT, I won't be covering some of the deeper components of Kubernetes in this post; instead, I will focus on what I feel are the fundamental ones. All of the components discussed below can be found in the resources I mentioned in my previous post; these are my own notes from studying Kubernetes.
Kubernetes from above
Those who are familiar with SDN, where a central controller manages resources on distributed hardware, will find that Kubernetes at a 1,000-foot view has a lot of similarities. Kubernetes gives us constructs to deal with management at the application or service level while providing high availability. By handing orchestration of your containers to Kubernetes, it can manage when your application/container is deployed, where it will be deployed, and how many containers are deployed across the cluster, while also managing resource usage, such as CPU, memory, and disk space, across the infrastructure. This is done by setting what is called the desired state for your application. To keep to this desired state, Kubernetes constantly monitors the actual state of the cluster and synchronizes it with the desired state. The cluster administrator interacts with Kubernetes through kubectl commands and/or RESTful calls to the API server.
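The desired state is typically declared in a YAML manifest and submitted to the API server with kubectl. A minimal sketch (the names and image tag below are illustrative, not from the original post):

```yaml
# A minimal Pod manifest declaring a desired state:
# one pod running a single nginx container.
apiVersion: v1
kind: Pod
metadata:
  name: web-pod        # illustrative name
spec:
  containers:
  - name: web
    image: nginx:1.25  # illustrative image tag
```

Submitting this with `kubectl create -f pod.yaml` records the desired state in the cluster; the scheduler and kubelets then work to converge the actual state towards it.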
The master controller is the brains of the cluster. It runs the core API server, which exposes RESTful web services for querying and defining our desired cluster and workload state. This is why we only access the master to initiate changes, rather than the nodes directly.
The master controller also includes the scheduler, which works with the API server to schedule workloads, in the form of pods, onto the nodes. These pods include the various containers that make up our application. The scheduler spreads pods across the cluster, matching each pod to a suitable node. We can also specify the resources each container needs, so scheduling is influenced by these additional factors.
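As a sketch of how resource requirements feed into scheduling, a container can declare resource requests in its pod spec; the scheduler will only place the pod on a node with enough spare capacity to satisfy them (names and values below are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend-pod      # illustrative name
spec:
  containers:
  - name: backend
    image: redis:7       # illustrative image
    resources:
      requests:
        cpu: "250m"      # a quarter of a CPU core
        memory: "256Mi"  # 256 mebibytes of RAM
```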
The replication controller works with the API server to ensure that the correct number of pod replicas are running at any given time. The replication controller is what keeps the cluster in the desired state. If our replication controller defines three replicas and our actual state is two copies of the pod running, the scheduler will be invoked to add a third pod somewhere on the cluster. The same is true if there are too many pods running in the cluster at any given time. Remember, Kubernetes is always pushing towards that desired state.
etcd runs as a distributed configuration store. The Kubernetes cluster state is stored here, and etcd allows values to be watched for changes.
The Kubernetes node has the services necessary to run application containers and be managed from the master systems. A node may be a VM or physical machine, depending on the cluster. Each node has the services necessary to run pods. These services include kubelet and kube-proxy:
- The Kubelet interacts with the API server to update state and to start new workloads that have been invoked by the scheduler. The Kubelet is seen as one of the most important components within Kubernetes.
- Kube-proxy provides basic load balancing and directs traffic destined for specific services to the proper pod on the backend.
Pods allow you to keep related containers close in terms of the network and hardware infrastructure. Data can live near the application, so processing can be done without incurring a high latency from network traversal. Each Pod is meant to run a single instance of a given application. If you want to scale your application horizontally you should use multiple Pods, one for each instance. In Kubernetes, this is generally referred to as replication.
The containers in a Pod are automatically co-located and co-scheduled on the same physical or virtual machine in the cluster. Multiple Pods can run on the same Node host, meaning that the resources (memory and CPU) on the Node host are shared by allocation across all Pods.
Pods provide two kinds of shared resources for their constituent containers: networking and storage.
- Networking – Each Pod is assigned a unique IP address. Every container in a Pod shares the network namespace, including the IP address and network ports. Containers inside a Pod can communicate with one another using localhost. When containers in a Pod communicate with entities outside the Pod, they must coordinate how they use the shared network resources (such as ports).
- Storage – A Pod can specify a set of shared storage volumes. All containers in the Pod can access the shared volumes, allowing those containers to share data. Volumes also allow persistent data in a Pod to survive in case one of the containers within needs to be restarted.
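Both kinds of sharing can be sketched in a single two-container pod: the containers reach each other over localhost, and both mount the same volume (names, images, and paths below are illustrative, assuming an `emptyDir` scratch volume):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar       # illustrative name
spec:
  volumes:
  - name: shared-data
    emptyDir: {}               # scratch volume shared by both containers
  containers:
  - name: web
    image: nginx:1.25          # illustrative image
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-writer
    image: busybox:1.36        # illustrative image
    command: ["sh", "-c", "echo hello > /data/index.html && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data         # same volume, different mount path
```

Here the `content-writer` container writes a file that the `web` container serves, without any network transfer between them.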
When a Pod gets created (directly by you, or indirectly by a Controller), it is scheduled to run on a Node in your cluster. The Pod remains on that Node until the process is terminated, the pod object is deleted, the pod is evicted for lack of resources, or the Node fails. Pods do not, by themselves, self-heal. If a Pod is scheduled to a Node that fails, or if the scheduling operation itself fails, the Pod is deleted; likewise, a Pod won’t survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a Controller, that handles the work of managing the relatively disposable Pod instances. Thus, while it is possible to use Pods directly, it’s far more common in Kubernetes to manage your pods using a Controller.
Pods essentially allow you to logically group containers and the pieces of your application stack together.
Replication controllers (rc for short in kubectl commands) manage the number of replicas of a pod, and its included container images, running at any one time. They ensure that an instance of an image is always up and available.
RCs are mostly used for scaling, rescheduling & rolling updates for your application and container environment.
RCs are simply charged with ensuring that you have the desired scale for your application. You define the number of pod replicas you want running and give it a template for how to create new pods.
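Putting those two pieces together, an RC manifest declares the replica count and carries a pod template for creating new copies. A minimal sketch (names and image are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc          # illustrative name
spec:
  replicas: 3           # desired number of pod copies
  selector:
    app: web            # pods matching this label count towards the 3
  template:             # how to create new pods when scaling up
    metadata:
      labels:
        app: web        # must match the selector above
    spec:
      containers:
      - name: web
        image: nginx:1.25  # illustrative image
```

If a pod carrying `app: web` dies, the RC notices the actual count has dropped below 3 and creates a replacement from the template.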
Labels give you another level of categorization, which becomes very helpful in terms of everyday operations and management. Similar to tags, labels can be used as the basis of service discovery as well as a useful grouping tool for day-to-day operations and management tasks.
Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but which do not directly imply semantics to the core system. Labels can be used to organize and to select subsets of objects. Labels can be attached to objects at creation time and subsequently added and modified at any time. Each object can have a set of key/value labels defined. Each key must be unique for a given object. You will see them on pods, replication controllers, services, and so on.
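In a manifest, labels are just key/value pairs under `metadata`; the keys and values below are illustrative choices, not anything Kubernetes mandates:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod         # illustrative name
  labels:
    app: frontend            # arbitrary key/value pairs chosen by you
    tier: web
    environment: production
spec:
  containers:
  - name: frontend
    image: nginx:1.25        # illustrative image
```

You can then select subsets of objects with label selectors, e.g. `kubectl get pods -l environment=production,tier=web`.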
To make services reachable, Kubernetes ensures that every node in the cluster runs a proxy named kube-proxy. As the name suggests, kube-proxy’s job is to proxy communication from a service endpoint back to the corresponding pod that is running the actual application.
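A Service ties this together with labels: it selects a set of pods by label and exposes a stable endpoint, and kube-proxy on each node forwards traffic arriving at that endpoint to one of the matching pods. A minimal sketch (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service   # illustrative name
spec:
  selector:
    app: web          # traffic is proxied to pods carrying this label
  ports:
  - port: 80          # port the service exposes inside the cluster
    targetPort: 8080  # port the selected pods listen on
```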