Kubernetes Best Practices for Production Environments

Learn essential Kubernetes practices for running reliable, scalable applications in production with real-world examples and proven strategies.

BryanLabs Team
Tags: Kubernetes, DevOps, Production, Best Practices

Running Kubernetes in production requires careful planning, sound operational practices, and continuous monitoring. After managing numerous production Kubernetes clusters, we've compiled the practices that contribute most to reliability, security, and scalability.

Resource Management

1. Resource Requests and Limits

Always define resource requests and limits for your containers:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

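Why this matters:

  • Requests tell the scheduler how much CPU and memory to reserve, so a pod is only placed on a node with enough unreserved capacity
  • Limits cap what a container can use: CPU above the limit is throttled, and a container that exceeds its memory limit is OOM-killed
  • Together they keep one misbehaving pod from starving its neighbors on the same node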

2. Quality of Service Classes

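Kubernetes assigns every pod one of three QoS classes based on its requests and limits, and uses the class when deciding which pods to evict first under node resource pressure:

  • Guaranteed: Every container sets CPU and memory requests equal to its limits
  • Burstable: At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria
  • BestEffort: No container sets any requests or limits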

Recommendation: Use Guaranteed for critical workloads, Burstable for most applications.
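
As a sketch of that recommendation, the following resources block would place a pod in the Guaranteed class, assuming it is the pod's only container (or that every other container in the pod does the same). The values are illustrative, not sizing advice:

```yaml
# Guaranteed QoS: requests and limits are identical for both CPU and memory.
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

You can check which class Kubernetes assigned by reading the pod's status.qosClass field.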

Security Hardening

1. Pod Security Standards

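Enforce the Pod Security Standards per namespace with the built-in Pod Security Admission controller. The labels below reject pods that violate the restricted profile, and also record audit events and return client warnings for violations: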

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

2. Network Policies

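Segment network traffic with Network Policies. Start from a default-deny policy that selects every pod in the namespace, then allow only the traffic each workload needs. Policies take effect only if the cluster's network plugin enforces them, and denying all egress also blocks DNS lookups until you allow them explicitly: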

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
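
A default-deny policy is usually followed by narrow allow rules. The sketch below shows two such rules under stated assumptions: the namespace is production, a hypothetical frontend workload (label app: frontend) calls my-app on port 8080, and cluster DNS runs in kube-system. Adjust all of these to your cluster:

```yaml
# Allow pods labeled app: frontend to reach my-app on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-my-app
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend        # hypothetical client label
    ports:
    - protocol: TCP
      port: 8080
---
# Allow every pod in the namespace to send DNS queries to kube-system.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system   # assumes DNS runs here
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Network policies are additive: a connection is allowed if any policy selecting the pod allows it, so these rules open only the listed paths on top of the deny-all baseline.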

3. RBAC (Role-Based Access Control)

Follow the principle of least privilege:

  • Create specific roles for different teams
  • Use service accounts for applications
  • Regularly audit permissions
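
A minimal sketch of least privilege for an application, assuming a hypothetical my-app workload in the production namespace that only needs to read ConfigMaps. The names and the chosen resource are illustrative:

```yaml
# Dedicated identity for the application instead of the default service account.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: production
---
# Namespaced role granting read-only access to ConfigMaps, nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
# Bind the role to the service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: production
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```

Using a Role and RoleBinding rather than their cluster-wide counterparts keeps the grant confined to one namespace. To audit the result, kubectl auth can-i with the --as flag lets you test what a given service account is allowed to do.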

High Availability and Reliability

1. Pod Disruption Budgets

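Protect your applications during voluntary disruptions such as node drains and cluster upgrades. A PodDisruptionBudget limits how many matching pods can be evicted at once. With minAvailable: 2, run at least three replicas; with only two, a drain cannot evict either of them and will stall: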

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

2. Health Checks

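Define both liveness and readiness probes. A failed liveness probe restarts the container; a failed readiness probe removes the pod from Service endpoints without restarting it: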

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
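
For applications that start slowly, a startup probe is often a better fit than a long initialDelaySeconds on the liveness probe, because liveness and readiness checks are held back until the startup probe succeeds. A sketch, assuming the same /health endpoint and a startup time of at most five minutes:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 * 10s = 300s before the container is restarted
```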

3. Anti-Affinity Rules

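Spread replicas across failure domains. The rule below prefers nodes that are not already running a my-app pod, so it spreads pods across nodes (topologyKey: kubernetes.io/hostname) but not across zones: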

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app
        topologyKey: kubernetes.io/hostname
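
To spread across zones as well as nodes, one option is a topology spread constraint in the pod spec. This is a sketch, assuming nodes carry the standard topology.kubernetes.io/zone label (most managed clusters set it) and pods are labeled app: my-app:

```yaml
topologySpreadConstraints:
- maxSkew: 1                          # zones may differ by at most one matching pod
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway   # soft rule; use DoNotSchedule to make it strict
  labelSelector:
    matchLabels:
      app: my-app
```

ScheduleAnyway mirrors the "preferred" behavior of the anti-affinity rule: the scheduler tries to balance zones but still places the pod if it cannot.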

Monitoring and Observability

1. The Three Pillars

Implement comprehensive observability:

  • Metrics: Prometheus + Grafana
  • Logs: ELK Stack or Loki
  • Traces: Jaeger or Zipkin

2. Key Metrics to Monitor

Cluster Level:

  • Node resource utilization
  • Pod scheduling success rate
  • API server latency

Application Level:

  • Request rate, errors, duration (RED)
  • CPU, memory, disk usage
  • Custom business metrics

3. Alerting Strategy

Tie every alert to a severity level that defines the expected response:

  • Critical: Immediate action required
  • Warning: Investigate within hours
  • Info: For awareness only
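
As an illustration of a critical alert, here is a sketch of a Prometheus rule file that fires on a sustained 5xx error ratio. The metric http_requests_total and its app and status labels are assumptions about how the application is instrumented, and the 5% threshold and 10-minute window are placeholders to tune:

```yaml
groups:
- name: my-app-alerts
  rules:
  - alert: HighErrorRate
    # Share of requests answered with a 5xx status over the last 5 minutes.
    expr: |
      sum(rate(http_requests_total{app="my-app", status=~"5.."}[5m]))
        /
      sum(rate(http_requests_total{app="my-app"}[5m])) > 0.05
    for: 10m                 # condition must hold this long before the alert fires
    labels:
      severity: critical     # route on this label in Alertmanager
    annotations:
      summary: "my-app 5xx error ratio has been above 5% for 10 minutes"
```

Alerting on a symptom users feel (error ratio) rather than a cause (CPU usage) tends to keep the critical tier small and actionable.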

Deployment Strategies

1. Rolling Updates

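Use rolling updates to replace pods gradually. Zero downtime also depends on a working readiness probe, so that traffic reaches new pods only once they are ready. maxUnavailable and maxSurge bound how many pods may be missing or added during the rollout: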

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1

2. Blue-Green Deployments

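Run the new version (green) alongside the current one (blue) and switch traffic in a single step once green is verified. This suits critical applications that need near-instant rollback, at the cost of temporarily running two full copies of the workload.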
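
One common way to implement the switch with plain Kubernetes objects is a Service whose selector includes a version label. This sketch assumes two Deployments whose pods are labeled app: my-app plus version: blue or version: green; the label name and values are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue   # change to "green" to cut over; change back to roll back
  ports:
  - port: 80
    targetPort: 8080
```

Because only the selector changes, the cutover and the rollback are the same small edit, and the idle Deployment can stay running until you are confident in the new version.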

3. Canary Deployments

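Send a small share of traffic to the new version first, watch error rates and latency, and increase the share only while the metrics stay healthy.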
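
Without a service mesh or ingress-level traffic splitting, a rough canary can be built from replica counts alone. The sketch below assumes an existing stable Deployment with 9 replicas labeled app: my-app and track: stable, and a Service that selects only app: my-app; the image reference is hypothetical:

```yaml
# Canary Deployment: 1 replica next to 9 stable replicas, so roughly
# 1 in 10 requests through the shared Service reaches the new version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      track: canary      # keeps this selector distinct from the stable Deployment
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:2.0.0   # hypothetical image and tag
        ports:
        - containerPort: 8080
```

The traffic split here is only as fine-grained as the replica ratio. If you need precise percentages or header-based routing, that requires traffic-splitting support in your ingress controller or mesh.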

Storage Best Practices

1. Persistent Volumes

Match the storage class to the workload. Class names and backing hardware vary by provider, but a typical split looks like this:

  • Fast SSD: For databases and high-IOPS workloads
  • Standard: For general-purpose storage
  • Cold Storage: For backups and archives
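
A workload picks its tier through the storageClassName of its PersistentVolumeClaim. A sketch for a database volume, assuming a class named fast-ssd exists in the cluster (the name is hypothetical; kubectl get storageclass lists the real ones):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd   # hypothetical class name
  resources:
    requests:
      storage: 100Gi
```

If storageClassName is omitted, the claim typically falls back to the cluster's default class, which may not be the tier you intended.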

2. Backup Strategy

Implement regular backups:

  • Application Data: Database dumps, file backups
  • Kubernetes Resources: YAML manifests, secrets
  • etcd: Regular etcd snapshots

Scaling Strategies

1. Horizontal Pod Autoscaler (HPA)

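Scale the number of replicas automatically based on observed metrics. CPU utilization is measured relative to the containers' CPU requests, so the target Deployment must define them, and the cluster needs a metrics source such as metrics-server: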

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

2. Vertical Pod Autoscaler (VPA)

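VPA adjusts container CPU and memory requests (and scales limits proportionally) based on observed usage. It is an add-on, not part of core Kubernetes. Avoid letting VPA and HPA act on the same CPU or memory metric for the same workload, because the two controllers will work against each other.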
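
A low-risk way to start is recommendation-only mode, where VPA computes suggested values without changing running pods. This sketch assumes the VPA components and CRDs are installed in the cluster and targets the same my-app Deployment used in the HPA example:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only; do not evict or resize pods
```

The recommendations appear in the object's status, where you can review them before applying any values to the Deployment by hand.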

3. Cluster Autoscaler

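Cluster Autoscaler adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized. It reasons about resource requests rather than actual usage, which is another reason to set requests accurately.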

Cost Optimization

1. Right-sizing

Regularly review and adjust resource allocations:

  • Use VPA recommendations
  • Monitor actual vs. requested resources
  • Remove unused resources

2. Spot Instances

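Use spot (preemptible) instances for workloads that tolerate interruption, since the provider can reclaim these nodes at short notice:

  • Batch jobs
  • Development environments
  • Stateless applications that shut down gracefully and can be rescheduled elsewhere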

3. Resource Quotas

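Set per-namespace quotas to cap the total resources a team or application can claim. Once a quota covers CPU or memory requests and limits, pods that omit those values are rejected in that namespace: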

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
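
A compute quota is commonly paired with a LimitRange, so that containers without explicit values receive defaults instead of being rejected. A sketch to apply in the same namespace as the quota, with illustrative default values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-compute
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when a container sets no request
      cpu: 250m
      memory: 256Mi
    default:             # applied when a container sets no limit
      cpu: 500m
      memory: 512Mi
```

Defaults are a safety net, not a substitute for sizing: workloads with known needs should still declare their own requests and limits.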

Disaster Recovery

1. Multi-Region Setup

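Run your workloads in more than one region so that a regional outage does not take down the whole service. In practice this usually means a separate cluster per region with traffic routed between them, rather than one cluster stretched across regions.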

2. Backup and Restore Procedures

Test your backup and restore procedures regularly:

  • Document the process
  • Automate where possible
  • Practice disaster recovery scenarios

3. Data Replication

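Choose a replication strategy that matches your recovery objectives: synchronous replication minimizes data loss but adds write latency, while asynchronous replication is faster but can lose the most recent writes on failover.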

Conclusion

Running Kubernetes in production successfully requires attention to many details. Start with these fundamentals and gradually implement more advanced practices as your team's expertise grows.

Remember: Start simple, monitor everything, and iterate based on real-world usage patterns.

The key to success is not implementing every best practice at once, but rather building a solid foundation and continuously improving based on your specific needs and constraints.