Kubernetes Basics #8 - Monitoring

Last Edited: 6/9/2025

This blog post introduces some monitoring tools in Kubernetes.

DevOps

In the previous article, we briefly mentioned how service accounts are used primarily to assign appropriate privileges to build tools and monitoring services. For proper cluster management, setting up monitoring services that measure multiple aspects of cluster performance is essential. Hence, in this article, we will cover some prominent tools for monitoring in Kubernetes.

Health Probes

Health probes are HTTP requests, TCP connection attempts, or command executions performed by the kubelet against a container, defined per container in the pod spec. The kubelet uses the responses to check whether the container has started (startup probe), whether the pod is ready to receive requests (readiness probe), and whether the container is still live (liveness probe). When a startup or liveness probe fails, the kubelet restarts the container; when a readiness probe fails, the pod is removed from the Service's endpoints so it stops receiving traffic until it becomes ready again.

# http-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    args:
    - liveness
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
---
# tcp-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10

The example above shows a liveness probe (HTTP) and a readiness probe (TCP) configuration. Depending on how the application in the pod exposes itself, we can probe an HTTP endpoint or simply attempt a TCP connection on a port. (We can also make the kubelet run a command inside the container for health checks, as sketched below.) The initialDelaySeconds field specifies the delay before the first probe is performed, and periodSeconds specifies how often the probe runs afterwards. While the readiness probe fails, the pod is removed from the Service's endpoints and receives no traffic; when the liveness probe fails, the kubelet kills and restarts the container.
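For completeness, here is a minimal sketch of a command-based (exec) liveness probe. It assumes a container that creates and later deletes a /tmp/healthy marker file; the kubelet runs the command in the container and treats a non-zero exit code as a failure.

# exec-liveness.yaml (illustrative sketch)
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/busybox
    args:
    - /bin/sh
    - -c
    # create the marker file, remove it after 30s so the probe starts failing
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5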

Metrics Server

The Metrics Server is a Deployment in the kube-system namespace that collects CPU and memory usage from the kubelets and exposes it through the Metrics API. The Horizontal Pod Autoscaler (HPA), which we discussed in the article Kubernetes Basics #6 - Scheduling & Autoscaling, depends on the Metrics Server and returns an error when it is not properly set up. To install the Metrics Server, you can follow the instructions here. (On local or test clusters, you will likely need to add the --kubelet-insecure-tls flag to the Metrics Server container's arguments, which can be done by editing the Deployment manifest before applying it, or afterwards with kubectl edit deploy metrics-server -n kube-system.)
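As a rough sketch of that change (assuming the layout of the upstream components.yaml, where the Deployment and its container are both named metrics-server; the image tag and other arguments may differ in your version), the flag is appended to the container's args:

# metrics-server deployment excerpt (illustrative; based on the upstream components.yaml layout)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
        args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls   # skip kubelet TLS verification; for local/test clusters only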

To confirm that the Metrics Server is configured correctly, you can run kubectl top pods to list pods with their CPU and memory usage (add --sort-by=cpu or --sort-by=memory to sort them). Although the Metrics Server is essential for the HPA (and VPA) and useful for a quick look at the cluster, it is not suited for production clusters with hundreds of resources and failure points: it cannot provide application-level metrics like HTTP response times, notify us of potential failures, or aggregate and store metrics over time for analysis. Hence, we typically resort to more sophisticated third-party monitoring tools.

Prometheus

Prometheus is one of the most popular monitoring tools, especially for projects with containerized microservice architectures, due to its strong integration with Docker and Kubernetes. It's well suited for production environments, as its stack includes an alerting mechanism (Alertmanager), a time-series database, a query language (PromQL), a built-in web UI (commonly complemented by Grafana for dashboards), and numerous prebuilt exporters and client libraries that expose metrics for Prometheus to collect. The following describes the system architecture of Prometheus.

[Figure: Prometheus system architecture]

Prometheus integrates with the Kubernetes API server for service discovery, finding the targets it needs to collect metrics from. It pulls (scrapes) metrics from HTTP endpoints exposed by exporters and by applications instrumented with its client libraries. The Alertmanager can be configured to alert administrators when potential issues are detected in the cluster, based on custom-defined rules such as alerting when CPU usage exceeds 90%. The stored data can be queried with PromQL, following Prometheus's data model, and visualized in Grafana.
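To make the rule-based alerting concrete, below is a hedged sketch of a Prometheus alerting rules file for the CPU example above. It assumes node_exporter is deployed (so the node_cpu_seconds_total metric is available), and the group, alert, and label names are illustrative.

# cpu-alert-rules.yaml (illustrative sketch; assumes node_exporter metrics)
groups:
- name: node-cpu
  rules:
  - alert: HighNodeCPUUsage
    # average CPU busy percentage per node over the last 5 minutes, above 90%
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage on {{ $labels.instance }} has been above 90% for 10 minutes"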

While you can manually set up each component of the system architecture, including the time-series database (local or remote storage), HTTP server, retrieval component, exporters, and Alertmanager, and wire up service discovery and the Prometheus configuration on Kubernetes, this requires considerable effort (multiple Secrets, ConfigMaps, StatefulSets, DaemonSets, Ingresses, cluster roles, etc.). Therefore, you can use the Prometheus Operator, which automatically sets up and manages the resources needed for basic usage and exposes custom resources for configuring them. You can install the Prometheus Operator by following the instructions here.
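As an illustration of how the Operator is configured, here is a hedged sketch of a ServiceMonitor, one of the Operator's custom resources, telling Prometheus to scrape Services labeled app: my-app. The names, namespaces, and labels are hypothetical and must match your own setup (in particular, the release label must match the serviceMonitorSelector of your Prometheus resource).

# my-app-servicemonitor.yaml (illustrative sketch; all names are hypothetical)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: prometheus          # must match the Prometheus resource's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app                # selects Services carrying this label
  namespaceSelector:
    matchNames:
    - default
  endpoints:
  - port: metrics                # named port on the Service to scrape
    interval: 30s
    path: /metrics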

Although Prometheus is a widely used monitoring tool, it can be difficult to set up even with the Prometheus Operator, and its data model, query language, and client libraries also take time to learn. I might dedicate a separate article series to Prometheus in the future (when I feel it's necessary), but for now, we can consider this an introduction to what a production-level monitoring system looks like.

Conclusion

In this article, we covered health probes for pod health checks and recovery, the Metrics Server for monitoring pod resource utilization and supporting autoscaling, and Prometheus, one of the most popular production-grade monitoring tools. We didn't cover how to set up or use Prometheus in its entirety due to its complexity, but the basic system architecture we discussed should be a helpful starting point for further exploration.

Resources