This blog post introduces persistent volumes and stateful sets in Kubernetes.

Almost all applications require file storage and/or a database where data persists. However, containers do not provide data persistence by default. In Docker, this can be achieved using volumes (which we've previously covered in the context of setting up development environments), and Kubernetes offers various abstractions around volumes for horizontally scaled systems. In this article, we will briefly discuss persistent volumes and stateful sets, which allow us to achieve data persistence in Kubernetes.
Persistent Volumes
There are multiple ways to set up persistent volumes, but the recommended approach is to use persistent volume claims (PVCs) and storage classes. A PVC specifies the persistent storage that one or more pods require and is usually written by developers. A storage class is an abstraction of the underlying local or cloud storage that creates and provides persistent volumes based on the PVC, and is typically set up by cluster administrators. These abstractions separate the two concerns, so developers and cluster administrators can work independently.

The above is a simplified diagram of how PVCs and storage classes can be set up to provide persistent storage for pods. The diagram only mentions cloud storage, since persistent volumes backed by local storage are tied to a specific node in the cluster and do not survive if that node is lost, making them unsuitable for truly persistent file storage and databases. Also, as the diagram shows, persistent volumes are not namespaced and are accessible to the entire cluster.
# Pod ("postgres.yaml")
# ...
spec:
  containers:
    - name: postgres
      image: postgres # official image from Docker Hub
      ports:
        - containerPort: 5432
      volumeMounts: # mount the volume into the container
        - mountPath: "/var/lib/postgresql/data"
          name: postgres-volume # must match a volume name below
  volumes: # volumes made available to the pod
    - name: postgres-volume # names may not contain underscores
      persistentVolumeClaim:
        claimName: pvc-postgres
# PVC ("pvc-postgres.yaml")
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-postgres
spec:
  storageClassName: sc-default # name of the storage class to request the PV from
  accessModes:
    - ReadWriteOnce # read/write from only 1 node
    # (There are other access modes like ReadWriteOncePod, ReadOnlyMany, ReadWriteMany)
  resources:
    requests:
      storage: 10Gi # requesting 10 GiB (gibibytes) of storage
The above shows example YAML files for developers setting up a pod and a PVC for PostgreSQL. The PVC abstraction lets developers declaratively request persistent volumes, regardless of the storage location or chosen cloud solution. ConfigMaps and Secrets can be mounted to pods and containers in a similar way: they do not need a PVC, but they are declared under volumes and volumeMounts using the same structure, allowing containers to access configurations and secrets (e.g., PostgreSQL configurations and credentials) as files.
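As a minimal sketch of mounting a ConfigMap and a Secret as volumes, the pod below uses the configMap and secret volume types directly. The object names postgres-config and postgres-credentials, the key name password, the mount paths, and the pinned image tag are all assumptions made for illustration; they are not taken from the setup above.

```yaml
# Hypothetical pod mounting a ConfigMap and a Secret as files.
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16 # tag pinned for the example (assumption)
      env:
        # The official image can read the password from a file
        # through the *_FILE variant of the variable.
        - name: POSTGRES_PASSWORD_FILE
          value: /run/secrets/postgres/password
      volumeMounts:
        - name: config
          mountPath: /etc/postgresql # each ConfigMap key becomes a file here
          readOnly: true
        - name: credentials
          mountPath: /run/secrets/postgres # each Secret key becomes a file here
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: postgres-config # assumed ConfigMap, created separately
    - name: credentials
      secret:
        secretName: postgres-credentials # assumed Secret with a "password" key
```

The volumes and volumeMounts entries have the same shape as in the PVC example; only the volume type under each volume name changes.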
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # PVCs that omit storageClassName use this storage class
provisioner: csi-driver.example-vendor.example
reclaimPolicy: Retain # what to do after the volume is released (default value is Delete)
allowVolumeExpansion: true # available depending on the provisioner
parameters: # provisioner-specific
  #...
The above shows an example YAML file for the administrator setting up a storage class. By creating a storage class, administrators can avoid configuring persistent volumes for each pod. Depending on the provisioner (AWS, Azure, etc.), the available parameters differ, so administrators need to check the available parameters to configure the storage class appropriately.
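To make the provisioner-specific part concrete, here is a sketch of a storage class for the AWS EBS CSI driver. It assumes the cluster runs on AWS with that driver installed; the class name sc-ebs-gp3 is hypothetical, and the chosen parameter values are only one reasonable configuration.

```yaml
# Sketch of a storage class backed by AWS EBS (assumes the EBS CSI driver is installed).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-ebs-gp3 # hypothetical name
provisioner: ebs.csi.aws.com # provisioner name of the AWS EBS CSI driver
reclaimPolicy: Retain # keep the underlying volume after the PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer # create the volume once a pod is scheduled
parameters:
  type: gp3 # EBS volume type
  encrypted: "true" # encrypt the volume at rest
```

Other provisioners accept different keys under parameters, so the values here should not be copied to another cloud without checking that driver's documentation.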
Stateful Sets
In the previous section, we covered how to set up a persistent volume claim (PVC) for a PostgreSQL pod and mount the persistent volume to the pod and its containers. However, we usually want to create multiple replicas of the pod to avoid bottlenecks, which isn't as straightforward as it is with stateless applications. When horizontally scaling a database through replication, we need to ensure data consistency by setting up a master database, which allows reads and writes, and worker databases that are read-only and synchronized with the master.
Deployments don't allow configuring different roles for pods, as they assume identical, interchangeable pods created in random order with random hashes. Therefore, we need to use stateful sets instead. Stateful sets create pods in a specific order with fixed identities and endpoints, which lets us assign different roles to them. This allows us to create a master database, postgres-0, accessible via postgres-0.svc-db, for reads and writes, and worker databases, which are replicas of the master (or of a previously created worker), accessible via postgres-<number>.svc-db for reads.
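The naming scheme described here can be sketched with a headless service and a stateful set. This is a minimal sketch under several assumptions: the app: postgres label, the postgres-credentials Secret, the pinned image tag, and the PGDATA subdirectory are illustrative choices, and the manifest does not configure replication between the pods.

```yaml
# Headless service: gives each pod a stable DNS name such as postgres-0.svc-db.
apiVersion: v1
kind: Service
metadata:
  name: svc-db
spec:
  clusterIP: None # headless, so DNS resolves to individual pods
  selector:
    app: postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres # pods are named postgres-0, postgres-1, ...
spec:
  serviceName: svc-db # must reference the headless service
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16 # tag pinned for the example (assumption)
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials # assumed Secret, created separately
                  key: password
            - name: PGDATA # use a subdirectory of the mounted volume
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates: # one PVC per pod: data-postgres-0, data-postgres-1, ...
    - metadata:
        name: data
      spec:
        storageClassName: sc-default
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

With volumeClaimTemplates, each pod gets its own PVC and keeps it across restarts, so postgres-0 always comes back with the same data. Making postgres-1 and postgres-2 actual read replicas would still require replication setup inside the containers, which this sketch leaves out.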
While stateful sets offer characteristics helpful for setting up replicas of stateful applications like file storage and databases, we still need to manually manage cloning, data synchronization, and backups. Also, containerization isn't inherently well-suited for scaling stateful applications like file storage and databases, and these applications often have their own solutions for horizontal scaling, such as sharding (which we briefly mentioned in the database series). Therefore, it's often better to use cloud solutions like AWS S3 and RDS for scalable stateful applications requiring persistent storage.
Conclusion
In this article, we briefly covered persistent volumes and Stateful Sets, which are important for setting up and scaling stateful applications. Although they might be relevant in a relatively large project involving the development of in-house, scalable solutions requiring data persistence, they are rarely used compared to cloud solutions due to the inherent incompatibility between stateful applications and containerization. We may use persistent volumes with PVCs and storage classes on a pod for smaller projects, though we can simply use Docker Compose and volumes to set up a single container for the database and storage instead of using such an over-engineered solution.
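For comparison, the Docker Compose alternative for a small project might look like the sketch below. The service name db, the volume name pgdata, the pinned image tag, and the placeholder password are assumptions for illustration.

```yaml
# compose.yaml: a single PostgreSQL container with a named volume (illustrative sketch).
services:
  db:
    image: postgres:16 # tag pinned for the example (assumption)
    environment:
      POSTGRES_PASSWORD: change-me # placeholder, replace before use
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data # named volume keeps data across restarts

volumes:
  pgdata:
```

The named volume plays the role that the PVC and storage class play in Kubernetes, without any cluster-level setup.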
Resources
- Kubernetes. n.d. Kubernetes Documentation. Kubernetes.
- TechWorld with Nana. 2021. Kubernetes StatefulSet simply explained | Deployment vs StatefulSet. YouTube.
- TechWorld with Nana. 2021. Kubernetes Volumes explained | Persistent Volume, Persistent Volume Claim & Storage Class. YouTube.