Understanding Pod Priority and Preemption in Kubernetes: A Detailed Guide

Introduction
In Kubernetes, Pod Priority and Preemption is a powerful scheduling feature that ensures critical workloads are placed and maintained on your cluster, even when resources are scarce. With this mechanism, Kubernetes can automatically preempt (evict) lower-priority pods to make room for higher-priority ones, helping orchestrate resource-efficient and reliable workload execution. Introduced as generally available in Kubernetes v1.14, this feature has become a staple for cluster operations.
1. What Is Pod Priority?
Pod Priority is an integer value assigned to a Pod, representing its importance relative to others. Higher values indicate higher importance in scheduling decisions.
Pods without an explicit priority use a default value of 0.
Priorities are defined through PriorityClass objects, which are non-namespaced (cluster-scoped) resources that map a name to an integer priority.
2. Defining Priority: PriorityClass
A PriorityClass defines both the name and numerical value of a priority:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-apps
value: 1000000
globalDefault: false
description: "Pods critical to business logic."
value: Higher numbers mean higher priority.
globalDefault: If true, this is the default for pods without a specified priorityClassName, but only for pods created after the class exists.
Kubernetes ships with two default system-critical classes:
system-node-critical (value 2000001000) and system-cluster-critical (value 2000000000).
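To confirm which priority classes exist on a cluster, including the two built-in system classes, a quick check with kubectl might look like this (requires access to a running cluster):

```shell
# List all PriorityClass objects in the cluster, including the
# built-in system-node-critical and system-cluster-critical classes.
kubectl get priorityclass

# Show the full definition of one class, e.g. the node-critical one.
kubectl get priorityclass system-node-critical -o yaml
```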
3. Scheduling: How Priority Influences Order
Once Pod Priority is in place, the scheduler sorts pending pods by priority. High-priority pods are attempted first. If scheduling a high-priority pod fails due to resource constraints, the scheduler may then preempt lower-priority pods to make room.
4. Preemption: Making Space for What Matters
When a pending pod cannot be scheduled:
The scheduler looks for nodes where evicting one or more lower-priority pods would free enough capacity.
It evicts the minimal necessary set of pods to schedule the higher-priority pod.
When a pod is evicted (whether due to preemption or node pressure eviction):
The pod is terminated on the node where it is running.
The pod is deleted from the current node.
If the pod belongs to a controller (e.g., Deployment, StatefulSet, ReplicaSet, Job, etc.), that controller will notice the missing replica and create a new pod.
The scheduler will then place this new pod on another suitable node.
So effectively:
A standalone pod (not managed by a controller) is gone permanently once evicted.
A pod managed by a controller is recreated, usually on another node, assuming resources are available.
Scheduling metadata:
The pending pod’s status.nominatedNodeName indicates which node is targeted for preemption. However, the pod may ultimately be scheduled elsewhere if conditions change.
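You can read this field directly with kubectl; the pod name below is a hypothetical example:

```shell
# Print the node nominated for preemption on behalf of a pending pod.
# "critical-app" is a hypothetical pod name.
kubectl get pod critical-app -o jsonpath='{.status.nominatedNodeName}'
```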
Important Constraints:
Victim pods terminate using their graceful termination period (30 seconds by default), which delays when the freed capacity actually becomes available.
A PodDisruptionBudget (PDB) is respected on a best-effort basis but can be violated if no alternative set of victims exists.
Inter-pod affinity: if the pending pod requires co-location with lower-priority pods on a node, preemption won't occur on that node.
Cross-node preemption is not supported: the scheduler does not preempt pods on other nodes, for example to satisfy the pending pod's anti-affinity constraints.
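The PDB constraint above can be made concrete. A minimal PodDisruptionBudget that asks the cluster to keep at least two matching pods available might look like this (name and labels are hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  # Preemption tries (best-effort) to keep at least 2 pods
  # matching this selector available at all times.
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```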
5. Non-Preempting Priority Classes
Stable since Kubernetes v1.24 (introduced as alpha in v1.15), you can define a PriorityClass with:
preemptionPolicy: Never
This means pods with this class will:
Queue ahead of lower-priority pods.
Not preempt other pods.
Be preempted by even higher-priority pods.
This is useful, for example, in ML or data science workflows where you want to ensure high scheduling priority without disrupting running services.
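A non-preempting class might be defined like this (name, value, and description are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 100000
# Pods with this class queue ahead of lower-priority pods
# but never trigger preemption of running pods.
preemptionPolicy: Never
globalDefault: false
description: "High scheduling priority without preempting running pods."
```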
6. Interplay with QoS and Eviction
While Pod QoS classes (Guaranteed, Burstable, BestEffort) affect eviction order during node-pressure scenarios, they don't influence scheduling preemption; there, the scheduler considers only priority values.
At node pressure, the kubelet ranks pods for eviction by:
Whether resource usage exceeds requests
Pod priority
Resource usage relative to requests
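A pod whose usage stays within its requests ranks last for node-pressure eviction; setting requests equal to limits (which yields Guaranteed QoS) makes that bound explicit. A minimal sketch, with hypothetical names and resource figures:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      # requests == limits for every resource in every container
      # gives the pod the Guaranteed QoS class.
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```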
7. Why Use Pod Priority and Preemption?
Reliability: Ensures critical workloads are scheduled promptly without over-provisioning clusters.
Resource utilization: Hosts both mission-critical and lower-priority workloads together, evicting non-essential pods under pressure.
Operational flexibility: You can finely control behavior using PriorityClass values, preemptionPolicy settings, and PodDisruptionBudgets.
8. Sample YAML Snippet
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-apps
value: 1000000
globalDefault: false
description: "Priority for critical services."
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  priorityClassName: high-priority-apps
  containers:
  - name: nginx
    image: nginx
To create a non-preempting class via kubectl:
kubectl create priorityclass high-priority --value=1000 \
--description="High priority but non-preempting" \
--preemption-policy="Never"
9. Best Practices & Troubleshooting
| Scenario | Guidance |
| --- | --- |
| Unintended preemptions | Verify priority levels are assigned correctly; a pod without a priorityClassName defaults to priority 0 (unless a globalDefault class exists). |
| Pending pods not scheduling after preemption | Another higher-priority pod may have taken the freed capacity. This is expected behavior. |
| Higher-priority pods evicted first | The scheduler prefers victim sets with the lowest priorities, but may choose higher-priority victims if that is the only way to avoid violating a PDB. |
| Affinity issues | Avoid inter-pod affinity that ties a high-priority pod to lower-priority pods, as it can block preemption on a node. |
| Termination latency in the scheduling gap | Set terminationGracePeriodSeconds to a smaller value on lower-priority pods. |
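For the last row, a lower-priority pod that yields its node quickly might look like this (the pod name, priority class, and workload are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  priorityClassName: low-priority-batch  # hypothetical class
  # Short grace period: when preempted, this pod frees its
  # capacity in ~5s instead of the default 30s.
  terminationGracePeriodSeconds: 5
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
```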
