Contents
- Namespace & Labels
- Deployment Manifest
- Resource Requests & Limits
- Service — ClusterIP & LoadBalancer
- Ingress
- Rolling Updates & Rollback
- HorizontalPodAutoscaler
- Essential kubectl Commands
## Namespace & Labels

A Namespace is a virtual cluster within a Kubernetes cluster. It provides an isolated scope for resource names — two teams can each have a payment-service Deployment without conflicting, as long as they live in different namespaces. Namespaces are the primary way to organise resources by team, application, or environment (e.g., payments-dev, payments-prod). They also act as the boundary for RBAC permissions and resource quotas.
Labels are arbitrary key-value pairs attached to any Kubernetes object. They are not just metadata — Kubernetes uses them actively: a Service finds the pods it should route to by matching labels, an HPA targets a Deployment by label, and kubectl get pods -l app=payment-service filters by label. Consistent labelling across all your manifests is essential for manageability.
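The Kubernetes documentation also recommends a shared set of label keys under the app.kubernetes.io/ prefix. Adopting them is optional (the manifests in this article use plain app/version labels), but they keep labelling consistent across teams and tooling. An illustrative metadata fragment:

```yaml
# Illustrative fragment using the recommended label keys
metadata:
  labels:
    app.kubernetes.io/name: payment-service           # the application's name
    app.kubernetes.io/instance: payment-service-prod  # unique name of this instance
    app.kubernetes.io/version: "1.0.0"                # application version
    app.kubernetes.io/part-of: payments               # higher-level system it belongs to
    app.kubernetes.io/managed-by: helm                # tool managing the resource
```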
```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    team: platform
    env: production
```
```bash
# Create the namespace from the manifest
kubectl apply -f namespace.yaml

# Set it as the default namespace for your current kubectl context
# so you don't have to add -n payments to every command
kubectl config set-context --current --namespace=payments
```
## Deployment Manifest

A Deployment is the standard Kubernetes resource for running a stateless application. You describe the desired state — which container image to run, how many replicas to keep alive, what environment variables and resource limits to apply — and Kubernetes continuously reconciles the actual cluster state to match it. If a pod crashes or a node goes down, the Deployment controller automatically schedules a replacement. The manifest below shows a production-ready setup with JVM container-aware flags, health probes, and topology spread to distribute pods across nodes for high availability.
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
  labels:
    app: payment-service
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: "1.0.0"
    spec:
      containers:
        - name: payment-service
          image: myorg/payment-service:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
            - name: SERVER_PORT
              value: "8080"
            # JVM flags — use container-aware memory settings
            - name: JAVA_TOOL_OPTIONS
              value: >-
                -XX:+UseContainerSupport
                -XX:MaxRAMPercentage=75.0
                -XX:+ExitOnOutOfMemoryError
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          # Probes covered in the dedicated article
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
      # Spread pods across nodes for HA
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-service
      # Graceful termination — allow in-flight requests to drain
      terminationGracePeriodSeconds: 60
```
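The Deployment keeps three replicas running, but voluntary disruptions (for example node drains during a cluster upgrade) can still evict several pods at once. A PodDisruptionBudget limits how many pods may be down at the same time. It is not part of the manifest above; the fragment below is an illustrative companion:

```yaml
# pdb.yaml — illustrative companion to the Deployment above (not from the original manifest)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service
  namespace: payments
spec:
  minAvailable: 2          # at least 2 of the 3 replicas must stay up during voluntary disruptions
  selector:
    matchLabels:
      app: payment-service
```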
## Resource Requests & Limits

Correctly sizing resource requests and limits is critical for reliable scheduling and avoiding OOMKilled pods.
| Setting | Effect | Rule of Thumb |
|---------|--------|---------------|
| requests.memory | Reserved on the node — used for scheduling | Set to typical heap + off-heap (metaspace, code cache, threads) |
| limits.memory | Hard cap — pod is OOMKilled if exceeded | 1.5–2× requests; use -XX:+ExitOnOutOfMemoryError |
| requests.cpu | Guaranteed CPU share | Set to average utilisation, not peak |
| limits.cpu | Throttled when exceeded (not killed) | 2–4× requests; tight limits cause throttling, which lengthens GC pauses |
```bash
# Profile actual usage before setting limits
kubectl top pods -n payments

# Watch for OOMKilled pods
kubectl get events -n payments --field-selector reason=OOMKilling
```
Set -XX:MaxRAMPercentage=75.0 and -XX:+UseContainerSupport so the JVM sizes its heap from the container's cgroup memory limit rather than the host's total memory. (UseContainerSupport has been enabled by default since JDK 10, but stating it explicitly documents the intent.) Without a container-aware heap cap, a JVM on a large node may allocate a heap bigger than the pod's limit and be OOMKilled soon after startup.
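The arithmetic behind those flags is worth making explicit. A small illustrative sketch using the numbers from the manifest above:

```python
# Sketch of the JVM container memory math (illustrative, not an official formula).
# With -XX:MaxRAMPercentage=75.0 the JVM picks its max heap as a percentage of
# the cgroup memory limit, leaving the remainder for off-heap usage
# (metaspace, code cache, thread stacks, direct buffers).

def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    """Approximate max heap the JVM selects under -XX:MaxRAMPercentage."""
    return int(container_limit_bytes * max_ram_percentage / 100)

GiB = 1024 ** 3
MiB = 1024 ** 2

limit = 1 * GiB                       # limits.memory: "1Gi"
heap = max_heap_bytes(limit, 75.0)    # -XX:MaxRAMPercentage=75.0

print(f"max heap: {heap // MiB} MiB")                     # 768 MiB
print(f"off-heap headroom: {(limit - heap) // MiB} MiB")  # 256 MiB left for off-heap
```

If the off-heap headroom is too small for your workload (many threads, large direct buffers), lower the percentage or raise the limit.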
## Service — ClusterIP & LoadBalancer

Pods are ephemeral — they are created and destroyed constantly, and each new pod gets a different IP address. A Service sits in front of your pods and gives your application a stable DNS name and virtual IP that other services (or external clients) can rely on regardless of how many times pods have been restarted or rescheduled.
Kubernetes offers several Service types. The two most common are:
- ClusterIP (the default) — exposes the Service on an internal cluster IP. Reachable only from within the cluster. Use this for service-to-service communication.
- LoadBalancer — provisions an external load balancer from the cloud provider (AWS ELB, GCP LB, Azure LB). Gives the Service a public IP. Use sparingly — each LoadBalancer costs money and consumes a public IP. Prefer a single Ingress instead (see the next section).
```yaml
# service.yaml — ClusterIP for internal cluster traffic
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payments
spec:
  selector:
    app: payment-service
  ports:
    - name: http
      port: 80          # port exposed within the cluster
      targetPort: 8080  # container port
  type: ClusterIP       # internal only — use Ingress for external access
```
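With the ClusterIP Service in place, other workloads reach it through cluster DNS: the short name payment-service resolves within the payments namespace, and the fully qualified form payment-service.payments.svc.cluster.local works from anywhere in the cluster. A hypothetical client configuration (the payment.base-url property name is an assumption for illustration, not part of the manifests above):

```yaml
# application.yaml of a hypothetical order-service in another namespace
payment:
  # Service port 80, not the container's 8080 — the Service does the mapping
  base-url: http://payment-service.payments.svc.cluster.local:80
```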
```yaml
# LoadBalancer — for direct external access (cloud providers provision an ELB/NLB)
apiVersion: v1
kind: Service
metadata:
  name: payment-service-lb
  namespace: payments
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  selector:
    app: payment-service
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```
## Ingress

An Ingress is a Kubernetes resource that manages external HTTP and HTTPS traffic into your cluster. The key advantage over a LoadBalancer Service is that a single Ingress can route traffic to many backend services using host-based or path-based rules — api.example.com/payments goes to the payment service, api.example.com/orders goes to the order service — all through one external IP address, which is significantly cheaper and simpler to manage at scale.
An Ingress Controller (a separate pod running in the cluster, commonly nginx-ingress or Traefik) watches for Ingress resources and configures the actual reverse proxy. The example below uses cert-manager to automatically provision and renew a TLS certificate from Let's Encrypt.
```yaml
# ingress.yaml — route external traffic via nginx-ingress-controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-service
  namespace: payments
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /payments
            pathType: Prefix
            backend:
              service:
                name: payment-service
                port:
                  number: 80
```
## Rolling Updates & Rollback

Kubernetes performs a rolling update by default when you change a Deployment's image or configuration. Rather than stopping all old pods and starting new ones simultaneously (which causes downtime), it gradually replaces them — starting new pods, waiting for them to pass the readiness probe, then terminating old ones. The result is a zero-downtime deployment.
Two settings control the pace: maxSurge — how many extra pods are allowed above the desired count during the update — and maxUnavailable — how many pods can be unavailable. Setting maxUnavailable: 0 means capacity never drops below the desired replica count, which is the safest choice for production. If anything goes wrong, a single kubectl rollout undo reverts to the previous working version within seconds.
```yaml
# In the Deployment spec — zero-downtime rolling update strategy
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow 1 extra pod during update
      maxUnavailable: 0  # never reduce below desired replica count
```
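The pod-count envelope these two settings imply can be sketched in a few lines (illustrative arithmetic, with maxSurge and maxUnavailable given as absolute values; Kubernetes also accepts percentages):

```python
def rolling_update_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple:
    """Pod-count bounds enforced during a RollingUpdate:
    at least (replicas - maxUnavailable) ready pods,
    at most (replicas + maxSurge) total pods."""
    return (replicas - max_unavailable, replicas + max_surge)

# With this section's settings: 3 replicas, maxSurge: 1, maxUnavailable: 0
low, high = rolling_update_bounds(3, 1, 0)
print(low, high)  # 3 4 — capacity never drops below 3; at most 4 pods run at once
```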
```bash
# Deploy a new image version
kubectl set image deployment/payment-service \
  payment-service=myorg/payment-service:1.1.0 \
  -n payments

# Watch rollout progress
kubectl rollout status deployment/payment-service -n payments

# View rollout history
kubectl rollout history deployment/payment-service -n payments

# Rollback to previous version immediately
kubectl rollout undo deployment/payment-service -n payments

# Rollback to a specific revision
kubectl rollout undo deployment/payment-service --to-revision=2 -n payments
```
## HorizontalPodAutoscaler

The HorizontalPodAutoscaler (HPA) watches your Deployment's resource utilisation and automatically adjusts the replica count to match demand. When traffic spikes and average CPU climbs above your threshold, the HPA adds pods to spread the load. When traffic drops, it removes pods to reduce cost. This removes the need to scale manually and keeps your application right-sized.
The behavior block is important for production: scaleUp.stabilizationWindowSeconds prevents the HPA from adding pods too eagerly in response to a momentary spike, and scaleDown.stabilizationWindowSeconds prevents it from removing pods too quickly after a burst — giving traffic time to settle before cutting capacity.
The HPA requires the Metrics Server to be installed in the cluster. On managed clusters (EKS, GKE, AKS) it is usually pre-installed. Run kubectl top pods to verify metrics are available before applying an HPA.
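The scale-out decision itself follows the formula documented for the HPA algorithm: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA scaling rule: desired = ceil(current * currentMetric / targetMetric).
    (The real controller adds tolerances, min/max clamping, and behavior policies.)"""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 pods averaging 90% CPU against a 70% target -> scale up to 4
print(desired_replicas(3, 90, 70))  # 4

# 6 pods averaging 30% CPU against a 70% target -> scale down to 3
print(desired_replicas(6, 30, 70))  # 3
```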
```yaml
# hpa.yaml — scale based on CPU and memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 768Mi      # scale up when avg memory > 768Mi
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # wait 30s before scaling up again
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60            # add max 2 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5min before scaling down
```
## Essential kubectl Commands

kubectl is the command-line tool for interacting with a Kubernetes cluster. The commands below cover the most common day-to-day operations you'll use when deploying, inspecting, and debugging your Spring Boot application — from applying manifests and watching rollouts to tailing logs and opening a shell inside a running pod.
```bash
# Apply all manifests in a directory
kubectl apply -f k8s/ -n payments

# Get all resources in namespace
kubectl get all -n payments

# Describe pod for events and status
kubectl describe pod payment-service-abc123 -n payments

# View logs (follow)
kubectl logs -f deployment/payment-service -n payments

# Exec into a running pod
kubectl exec -it payment-service-abc123 -n payments -- sh

# Port-forward for local testing (no Ingress needed)
kubectl port-forward svc/payment-service 8080:80 -n payments

# Scale manually
kubectl scale deployment/payment-service --replicas=5 -n payments

# Check HPA status
kubectl get hpa -n payments

# Force pod restart (rolling)
kubectl rollout restart deployment/payment-service -n payments
```