Kubernetes Basics #6 - Scheduling & Autoscaling

Last Edited: 5/20/2025

This blog post introduces concepts regarding scheduling and autoscaling in Kubernetes.

DevOps

In the previous article, we covered how to achieve data persistence for databases and file storage systems, and mentioned how difficult it is to horizontally scale those systems. When we resort to vertical scaling instead, we need to ensure that the pod is always assigned to the specific node whose resources we are upgrading, which we haven't yet discussed. There are also other situations where we want to schedule pods to a specific set of nodes. Hence, in this article, we will discuss several features that allow us to do that, as well as autoscaling, which lets us avoid manually changing the number of replicas as the workload changes.

Node Selectors

The easiest way to assign a single pod to a particular node is by providing the node's name (visible via kubectl get nodes) in the nodeName field in the pod's specification. This is sufficient for situations where a pod is restricted to being scheduled to only one node, such as for vertical scaling of a database or file storage system in the cluster. However, we might want to allow the pod to be scheduled to multiple nodes when those nodes meet certain requirements (e.g., when the pod is running a machine learning workload, we want to assign it to any node equipped with a GPU).
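As a minimal sketch (the node name cluster-worker1 and the image my-image are placeholders), such a pod specification could look like this:

example-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  nodeName: cluster-worker1 # a node name listed by kubectl get nodes
  containers:
    - name: container
      image: my-image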

Just like deployments and services use labels and selectors to identify the pods they are responsible for, a pod can identify the nodes it may be assigned to using nodeSelector in its specification, and it will be scheduled only to nodes with matching labels. We can assign labels to nodes using kubectl label node <node-name> <label-name>=<label-value>. The mechanism is intuitive and easy to use, but it cannot enforce complex scheduling logic.
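For example, assuming we labeled a node with kubectl label node cluster-worker1 gpu=true, a pod could target any node carrying that label as follows (a sketch; the gpu label is illustrative):

example-pod.yaml
# ...
spec:
  # ...
  nodeSelector:
    gpu: "true" # only nodes labeled gpu=true are eligible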

Taints & Tolerations

While node selectors specify which nodes pods should be assigned to, taints and tolerations control which nodes pods should be kept away from. Specifically, a node can be tainted so that new pods without a matching toleration are not scheduled onto it (NoSchedule), are neither scheduled onto it nor allowed to keep running on it (NoExecute), or are scheduled onto it only if the scheduler cannot avoid it (PreferNoSchedule). For example, we can taint a GPU-equipped node with kubectl taint node cluster-worker1 gpu=true:NoExecute, which prevents pods without a toleration for the taint gpu=true from being scheduled or executed on the node.

example-pod.yaml
# ...
spec:
  # ...
  tolerations:
    - key: "gpu"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"

A toleration for a specific taint can be declared as shown above. While taints and tolerations can guarantee that certain pods are not scheduled or executed on certain nodes, they do not guarantee that pods tolerating a taint are scheduled only to nodes carrying that taint. Hence, they are often used together with node selectors to ensure that scheduling behaves exactly as desired.
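For instance, reusing the gpu label and taint from the examples above, a pod can be restricted to the GPU node and allowed to run there at the same time (a sketch under those assumptions):

example-pod.yaml
# ...
spec:
  # ...
  nodeSelector:
    gpu: "true" # only consider nodes labeled gpu=true
  tolerations:
    - key: "gpu" # tolerate the gpu=true:NoExecute taint on those nodes
      operator: "Equal"
      value: "true"
      effect: "NoExecute"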

Node Affinity

While node selectors are appealing due to their simplicity, they cannot express complex logic, such as requiring one label to match either of two values while another label matches a particular value. For example, node selectors cannot express "schedule this pod to a node that has an SSD or an HDD and also has a GPU available." Node affinity allows us to do exactly that, and it can also express preferences instead of hard requirements.

example-pod.yaml
# ...
spec:
  # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: "disk"
                operator: "In"
                values:
                  - "ssd"
                  - "hdd"
              - key: "gpu"
                operator: "In"
                values:
                  - "true"

The above example uses node affinity to require that a node's disk label matches either ssd or hdd and that its gpu label matches true. We can also use preferredDuringSchedulingIgnoredDuringExecution to express that pods prefer nodes with matching labels to a certain degree, while still allowing them to be scheduled elsewhere if no such node is available. You can see the Kubernetes official documentation (cited below) for more information. Though node affinity is a capable substitute for node selectors, it can be unnecessarily complicated and less intuitive for simple use cases, so it's important to choose between them wisely.
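As a sketch of the preferred variant (the disk label and the weight are illustrative), the following asks the scheduler to favor SSD nodes without making them a hard requirement:

example-pod.yaml
# ...
spec:
  # ...
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1 # weights range from 1 to 100; higher weights are favored more
          preference:
            matchExpressions:
              - key: "disk"
                operator: "In"
                values:
                  - "ssd"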

Requests & Limits

As we have seen in previous discussions, we often specify nodes because of the resource requirements of pods. For memory and CPU, Kubernetes has features called requests and limits, which let us declare the minimum resources a pod needs and the maximum it may consume. This prevents situations where a pod consumes all of a node's memory and takes the node down with an out-of-memory (OOM) condition, or where a pod with high resource requirements gets scheduled to a node that cannot satisfy them.

example-pod.yaml
# ...
spec:
  containers:
    - name: container
      image: my-image
      resources:
        requests:
          memory: "100Mi" # 100MiB
          cpu: "250m" # 0.25 CPU (25% of a CPU core power)
        limits:
          memory: "200Mi" # 200MiB
          cpu: "500m" # 0.5 CPU (50% of a CPU core power)

Since the processes that consume resources run inside containers, requests and limits are set per container, as shown above. The requests above ask for at least 100MiB of memory and 0.25 CPU, preventing the pod from being scheduled to a node that cannot satisfy them, while the limits cap consumption at 200MiB of memory and 0.5 CPU. If the container exceeds its memory limit, it is OOM-killed on its own rather than taking the whole node down.

Horizontal Pod Autoscaling

So far, we have been scaling workloads by manually increasing the number of pods in a deployment. However, we cannot monitor the cluster every second to keep the number of replicas appropriate for both cost efficiency and availability. Hence, Kubernetes allows us to autoscale a deployment by adjusting its number of pods between defined bounds based on CPU utilization. This mechanism is called horizontal pod autoscaling (HPA), and it can be set up with a command like kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10, which creates an HPA that keeps the number of pods between 1 and 10 while targeting an average CPU utilization of 50%.
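The same autoscaler can also be defined declaratively; a sketch of an equivalent manifest (the deployment name example-deployment is a placeholder) might look like this:

example-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment # the deployment whose replicas are adjusted
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50 # target 50% average CPU utilization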

We can obtain information about the HPA, such as the current CPU utilization and number of replicas compared to their targets, using the command kubectl get hpa. Horizontal node autoscaling (automatically adding more nodes, often called cluster autoscaling) is another option for horizontally scaling the cluster, though the implementation depends on the cloud provider. Vertical pod autoscaling and vertical node autoscaling also exist for limited use cases, but they are generally less well-suited to Kubernetes clusters and are outside the scope of this article.

Conclusion

In this article, we covered the basics of how we can configure scheduling and autoscaling to meet the resource requirements of pods, prevent node crashes, and maintain the cluster easily. Though we have covered many foundational concepts for setting up a cluster so far, there are still many other important concepts remaining, especially for setting up a cluster for production. Hence, we will continue the discussion on some of those concepts in the next article.

Resources