Contents
- Namespace & Labels
- Deployment Manifest
- Resource Requests & Limits
- Service — ClusterIP & LoadBalancer
- Ingress
- Rolling Updates & Rollback
- HorizontalPodAutoscaler
- Essential kubectl Commands
## Namespace & Labels

A Namespace is a virtual cluster within a Kubernetes cluster. It provides an isolated scope for resource names — two teams can each have a payment-service Deployment without conflicting, as long as they live in different namespaces. Namespaces are the primary way to organise resources by team, application, or environment (e.g., payments-dev, payments-prod). They also act as the boundary for RBAC permissions and resource quotas.
Labels are arbitrary key-value pairs attached to any Kubernetes object. They are not just metadata — Kubernetes uses them actively: a Service finds the pods it should route to by matching labels, an HPA targets a Deployment by label, and kubectl get pods -l app=payment-service filters by label. Consistent labelling across all your manifests is essential for manageability.
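The Kubernetes documentation also recommends a shared set of label keys under the app.kubernetes.io/ prefix. Adopting them is optional (the manifests in this article use plain app/version labels), but they keep labelling consistent across teams and tooling. An illustrative metadata fragment:

```yaml
# Illustrative fragment using the recommended label keys
metadata:
  labels:
    app.kubernetes.io/name: payment-service           # the application's name
    app.kubernetes.io/instance: payment-service-prod  # unique name of this instance
    app.kubernetes.io/version: "1.0.0"                # application version
    app.kubernetes.io/part-of: payments               # higher-level system it belongs to
    app.kubernetes.io/managed-by: helm                # tool managing the resource
```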
```yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    team: platform
    env: production
```
```bash
# Create the namespace from the manifest
kubectl apply -f namespace.yaml

# Set it as the default namespace for your current kubectl context
# so you don't have to add -n payments to every command
kubectl config set-context --current --namespace=payments
```
## Deployment Manifest

A Deployment is the standard Kubernetes resource for running a stateless application. You describe the desired state — which container image to run, how many replicas to keep alive, what environment variables and resource limits to apply — and Kubernetes continuously reconciles the actual cluster state to match it. If a pod crashes or a node goes down, the Deployment controller automatically schedules a replacement. The manifest below shows a production-ready setup with JVM container-aware flags, health probes, and topology spread to distribute pods across nodes for high availability.
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
  labels:
    app: payment-service
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: "1.0.0"
    spec:
      containers:
        - name: payment-service
          image: myorg/payment-service:1.0.0
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
            - name: SERVER_PORT
              value: "8080"
            # JVM flags — use container-aware memory settings
            - name: JAVA_TOOL_OPTIONS
              value: >-
                -XX:+UseContainerSupport
                -XX:MaxRAMPercentage=75.0
                -XX:+ExitOnOutOfMemoryError
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          # Probes covered in the dedicated article
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
      # Spread pods across nodes for HA
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-service
      # Graceful termination — allow in-flight requests to drain
      terminationGracePeriodSeconds: 60
```
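The Deployment keeps three replicas running, but voluntary disruptions (for example node drains during a cluster upgrade) can still evict several pods at once. A PodDisruptionBudget limits how many pods may be down at the same time. It is not part of the manifest above; the fragment below is an illustrative companion:

```yaml
# pdb.yaml — illustrative companion to the Deployment above (not from the original manifest)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service
  namespace: payments
spec:
  minAvailable: 2          # at least 2 of the 3 replicas must stay up during voluntary disruptions
  selector:
    matchLabels:
      app: payment-service
```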
## Resource Requests & Limits

Correctly sizing resource requests and limits is critical for reliable scheduling and avoiding OOMKilled pods.
| Setting | Effect | Rule of Thumb |
|---------|--------|---------------|
| requests.memory | Reserved on the node — used for scheduling | Set to typical heap + off-heap (metaspace, code cache, threads) |
| limits.memory | Hard cap — pod is OOMKilled if exceeded | 1.5–2× requests; use -XX:+ExitOnOutOfMemoryError |
| requests.cpu | Guaranteed CPU share | Set to average utilisation, not peak |
| limits.cpu | Throttled when exceeded (not killed) | 2–4× requests; tight limits cause throttling, which lengthens GC pauses |
```bash
# Profile actual usage before setting limits
kubectl top pods -n payments

# Watch for OOMKilled pods
kubectl get events -n payments --field-selector reason=OOMKilling
```
Set -XX:MaxRAMPercentage=75.0 and -XX:+UseContainerSupport so the JVM sizes its heap from the container's cgroup memory limit rather than the host's total memory. (UseContainerSupport has been enabled by default since JDK 10, but stating it explicitly documents the intent.) Without a container-aware heap cap, a JVM on a large node may allocate a heap bigger than the pod's limit and be OOMKilled soon after startup.
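The arithmetic behind those flags is worth making explicit. A small illustrative sketch using the numbers from the manifest above:

```python
# Sketch of the JVM container memory math (illustrative, not an official formula).
# With -XX:MaxRAMPercentage=75.0 the JVM picks its max heap as a percentage of
# the cgroup memory limit, leaving the remainder for off-heap usage
# (metaspace, code cache, thread stacks, direct buffers).

def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    """Approximate max heap the JVM selects under -XX:MaxRAMPercentage."""
    return int(container_limit_bytes * max_ram_percentage / 100)

GiB = 1024 ** 3
MiB = 1024 ** 2

limit = 1 * GiB                       # limits.memory: "1Gi"
heap = max_heap_bytes(limit, 75.0)    # -XX:MaxRAMPercentage=75.0

print(f"max heap: {heap // MiB} MiB")                     # 768 MiB
print(f"off-heap headroom: {(limit - heap) // MiB} MiB")  # 256 MiB left for off-heap
```

If the off-heap headroom is too small for your workload (many threads, large direct buffers), lower the percentage or raise the limit.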
## Service — ClusterIP & LoadBalancer

Pods are ephemeral — they are created and destroyed constantly, and each new pod gets a different IP address. A Service sits in front of your pods and gives your application a stable DNS name and virtual IP that other services (or external clients) can rely on regardless of how many times pods have been restarted or rescheduled.
Kubernetes offers several Service types. The two most common are:
- ClusterIP (the default) — exposes the Service on an internal cluster IP. Reachable only from within the cluster. Use this for service-to-service communication.
- LoadBalancer — provisions an external load balancer from the cloud provider (AWS ELB, GCP LB, Azure LB). Gives the Service a public IP. Use sparingly — each LoadBalancer costs money and consumes a public IP. Prefer a single Ingress instead (see the next section).
```yaml
# service.yaml — ClusterIP for internal cluster traffic
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: payments
spec:
  selector:
    app: payment-service
  ports:
    - name: http
      port: 80          # port exposed within the cluster
      targetPort: 8080  # container port
  type: ClusterIP       # internal only — use Ingress for external access
```
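With the ClusterIP Service in place, other workloads reach it through cluster DNS: the short name payment-service resolves within the payments namespace, and the fully qualified form payment-service.payments.svc.cluster.local works from anywhere in the cluster. A hypothetical client configuration (the payment.base-url property name is an assumption for illustration, not part of the manifests above):

```yaml
# application.yaml of a hypothetical order-service in another namespace
payment:
  # Service port 80, not the container's 8080 — the Service does the mapping
  base-url: http://payment-service.payments.svc.cluster.local:80
```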
```yaml
# LoadBalancer — for direct external access (cloud providers provision an ELB/NLB)
apiVersion: v1
kind: Service
metadata:
  name: payment-service-lb
  namespace: payments
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  selector:
    app: payment-service
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```
## Ingress

An Ingress is a Kubernetes resource that manages external HTTP and HTTPS traffic into your cluster. The key advantage over a LoadBalancer Service is that a single Ingress can route traffic to many backend services using host-based or path-based rules — api.example.com/payments goes to the payment service, api.example.com/orders goes to the order service — all through one external IP address, which is significantly cheaper and simpler to manage at scale.
An Ingress Controller (a separate pod running in the cluster, commonly nginx-ingress or Traefik) watches for Ingress resources and configures the actual reverse proxy. The example below uses cert-manager to automatically provision and renew a TLS certificate from Let's Encrypt.
```yaml
# ingress.yaml — route external traffic via nginx-ingress-controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-service
  namespace: payments
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /payments
            pathType: Prefix
            backend:
              service:
                name: payment-service
                port:
                  number: 80
```
## Rolling Updates & Rollback

Kubernetes performs a rolling update by default when you change a Deployment's image or configuration. Rather than stopping all old pods and starting new ones simultaneously (which causes downtime), it gradually replaces them — starting new pods, waiting for them to pass the readiness probe, then terminating old ones. The result is a zero-downtime deployment.
Two settings control the pace: maxSurge — how many extra pods are allowed above the desired count during the update — and maxUnavailable — how many pods can be unavailable. Setting maxUnavailable: 0 means capacity never drops below the desired replica count, which is the safest choice for production. If anything goes wrong, a single kubectl rollout undo reverts to the previous working version within seconds.
```yaml
# In the Deployment spec — zero-downtime rolling update strategy
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow 1 extra pod during update
      maxUnavailable: 0  # never reduce below desired replica count
```
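The pod-count envelope these two settings imply can be sketched in a few lines (illustrative arithmetic, with maxSurge and maxUnavailable given as absolute values; Kubernetes also accepts percentages):

```python
def rolling_update_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple:
    """Pod-count bounds enforced during a RollingUpdate:
    at least (replicas - maxUnavailable) ready pods,
    at most (replicas + maxSurge) total pods."""
    return (replicas - max_unavailable, replicas + max_surge)

# With this section's settings: 3 replicas, maxSurge: 1, maxUnavailable: 0
low, high = rolling_update_bounds(3, 1, 0)
print(low, high)  # 3 4 — capacity never drops below 3; at most 4 pods run at once
```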
```bash
# Deploy a new image version
kubectl set image deployment/payment-service \
  payment-service=myorg/payment-service:1.1.0 \
  -n payments

# Watch rollout progress
kubectl rollout status deployment/payment-service -n payments

# View rollout history
kubectl rollout history deployment/payment-service -n payments

# Rollback to previous version immediately
kubectl rollout undo deployment/payment-service -n payments

# Rollback to a specific revision
kubectl rollout undo deployment/payment-service --to-revision=2 -n payments
```
## HorizontalPodAutoscaler

The HorizontalPodAutoscaler (HPA) watches your Deployment's resource utilisation and automatically adjusts the replica count to match demand. When traffic spikes and average CPU climbs above your threshold, the HPA adds pods to spread the load. When traffic drops, it removes pods to reduce cost. This removes the need to scale manually and keeps your application right-sized.
The behavior block is important for production: scaleUp.stabilizationWindowSeconds prevents the HPA from adding pods too eagerly in response to a momentary spike, and scaleDown.stabilizationWindowSeconds prevents it from removing pods too quickly after a burst — giving traffic time to settle before cutting capacity.
The HPA requires the Metrics Server to be installed in the cluster. On managed clusters (EKS, GKE, AKS) it is usually pre-installed. Run kubectl top pods to verify metrics are available before applying an HPA.
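The scale-out decision itself follows the formula documented for the HPA algorithm: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Core HPA scaling rule: desired = ceil(current * currentMetric / targetMetric).
    (The real controller adds tolerances, min/max clamping, and behavior policies.)"""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 pods averaging 90% CPU against a 70% target -> scale up to 4
print(desired_replicas(3, 90, 70))  # 4

# 6 pods averaging 30% CPU against a 70% target -> scale down to 3
print(desired_replicas(6, 30, 70))  # 3
```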
```yaml
# hpa.yaml — scale based on CPU and memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-service
  namespace: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 768Mi      # scale up when avg memory > 768Mi
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30   # wait 30s before scaling up again
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60            # add max 2 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5min before scaling down
```
## Essential kubectl Commands

kubectl is the command-line tool for interacting with a Kubernetes cluster. The commands below cover the most common day-to-day operations you'll use when deploying, inspecting, and debugging your Spring Boot application — from applying manifests and watching rollouts to tailing logs and opening a shell inside a running pod.
```bash
# Apply all manifests in a directory
kubectl apply -f k8s/ -n payments

# Get all resources in namespace
kubectl get all -n payments

# Describe pod for events and status
kubectl describe pod payment-service-abc123 -n payments

# View logs (follow)
kubectl logs -f deployment/payment-service -n payments

# Exec into a running pod
kubectl exec -it payment-service-abc123 -n payments -- sh

# Port-forward for local testing (no Ingress needed)
kubectl port-forward svc/payment-service 8080:80 -n payments

# Scale manually
kubectl scale deployment/payment-service --replicas=5 -n payments

# Check HPA status
kubectl get hpa -n payments

# Force pod restart (rolling)
kubectl rollout restart deployment/payment-service -n payments
```