You cannot directly "change" the node an already running pod resides on. Instead, you control which node a pod is scheduled on when it is created or recreated. To effectively "move" a pod, you typically delete the existing pod (or the controller managing it, like a Deployment) and allow Kubernetes to reschedule a new instance based on updated node placement rules.
Kubernetes' scheduler automatically assigns pods to suitable nodes based on available resources and other constraints. However, you can influence this decision using various mechanisms.
Understanding Pod Scheduling in Kubernetes
When you create a pod, the Kubernetes scheduler identifies the best node to run it on. This decision considers several factors:
- Resource requirements: CPU, memory, GPU, etc.
- Node capacity: Available resources on each node.
- Node selectors: Directives to run on specific nodes.
- Node affinity/anti-affinity: More flexible rules for preferred or required nodes.
- Taints and tolerations: To prevent pods from scheduling on certain nodes unless explicitly allowed.
Methods to Control Pod Placement (Initial Scheduling)
To specify where a pod should run when it's first created or recreated, you use scheduling constructs within the pod's YAML definition.
1. Node Selectors
Node selectors are the simplest way to constrain a pod to run on nodes with specific labels. This is a common and straightforward method for basic node placement.
How it works:
- You apply a label to one or more nodes (e.g., disktype=ssd, env=production).
- You add a nodeSelector field to your pod's specification, matching the node label.
Example:
Imagine you want a pod to run only on nodes designated for "high-performance" workloads.
Step 1: Label the Node
First, label the target node(s). This helps ensure that future pods with matching selectors are directed there.
kubectl label nodes <your-node-name> custom-role=high-perf
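To verify the label was applied (and see which nodes already carry it), you can run a standard label query:
kubectl get nodes -l custom-role=high-perf --show-labels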
Step 2: Add Node Selector to Pod Definition
Then, update your pod's YAML to include the nodeSelector:
apiVersion: v1
kind: Pod
metadata:
  name: my-high-perf-app
spec:
  containers:
  - name: app-container
    image: nginx:latest
  nodeSelector:
    custom-role: high-perf
When this pod is created, the scheduler will only place it on nodes that have the custom-role=high-perf label. You cannot add a nodeSelector directly to an existing, already scheduled pod. To apply this change to a running pod, you would need to recreate it.
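For a standalone pod, recreating it is simply a delete followed by an apply of the updated manifest. A minimal sketch, assuming the YAML above is saved as my-high-perf-app.yaml (the filename is illustrative):
kubectl delete pod my-high-perf-app
kubectl apply -f my-high-perf-app.yaml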
2. Node Affinity
Node affinity is a more expressive and flexible way to constrain pods to nodes, offering "soft" (preferred) and "hard" (required) rules, as well as logical operators (e.g., In, NotIn, Exists).
Types of Node Affinity:
- requiredDuringSchedulingIgnoredDuringExecution: The pod must meet the rules to be scheduled. If the node labels change after the pod is scheduled, the pod will continue to run.
- preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to meet the rules, but will still schedule the pod elsewhere if it can't.
Example (Required Affinity):
apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-affinity
spec:
  containers:
  - name: app-container
    image: my-image:latest
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node-1
            - node-2
          - key: disktype
            operator: In
            values:
            - ssd
This pod will only be scheduled on nodes that are not node-1 or node-2, AND have the disktype=ssd label.
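For comparison, a "soft" rule uses preferredDuringSchedulingIgnoredDuringExecution with a weight. The sketch below (reusing the disktype=ssd label from the example above; name and image are illustrative) prefers SSD nodes but still lets the pod schedule elsewhere if none are available:
apiVersion: v1
kind: Pod
metadata:
  name: my-app-preferred-affinity
spec:
  containers:
  - name: app-container
    image: my-image:latest
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      # Weight (1-100) expresses how strongly this preference counts during scoring.
      - weight: 80
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd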
3. Taints and Tolerations
Taints are applied to nodes, preventing pods from being scheduled on them unless the pod explicitly "tolerates" that taint. This is useful for dedicating nodes to specific purposes or for maintenance.
How it works:
- Taint: A node is "tainted" with a key-value pair and an effect (e.g., NoSchedule, PreferNoSchedule, NoExecute).
- Toleration: A pod specifies a "toleration" for a matching taint, allowing it to be scheduled on the tainted node.
Example (Taint and Toleration):
Step 1: Taint a Node
kubectl taint nodes <your-node-name> dedicated=gpu:NoSchedule
This prevents any pod without a matching toleration from being scheduled on this node.
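If you later want to open the node back up, the same taint can be removed by appending a trailing minus sign to the taint specification:
kubectl taint nodes <your-node-name> dedicated=gpu:NoSchedule-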
Step 2: Add Toleration to Pod Definition
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-app
spec:
  containers:
  - name: gpu-container
    image: my-gpu-image:latest
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
This pod can now be scheduled on the tainted node.
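Note that a toleration only permits scheduling on the tainted node; it does not guarantee it. To both tolerate the taint and steer the pod onto those nodes, you can combine the toleration with a nodeSelector. A sketch, assuming the dedicated GPU nodes also carry a gpu-node=true label (the label and names are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-app-pinned
spec:
  containers:
  - name: gpu-container
    image: my-gpu-image:latest
  # Assumed label on the dedicated GPU nodes; adjust to your cluster.
  nodeSelector:
    gpu-node: "true"
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"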
"Moving" an Existing Pod to a Different Node
As established, you cannot directly move a running pod to another node or add a nodeSelector to an already scheduled pod. To achieve the effect of "moving" a pod, you must trigger a rescheduling event.
1. Recreate the Pod (or its Controller)
This is the most common approach:
- For standalone Pods: Delete the pod using kubectl delete pod <pod-name>. Then, recreate it using kubectl apply -f <pod-definition.yaml> with the updated scheduling rules (node selector, affinity, tolerations).
- For Pods managed by a Deployment, StatefulSet, or DaemonSet: Update the controller's definition with the new scheduling rules. The controller will then terminate the old pods and create new ones that adhere to the updated rules, scheduling them on appropriate nodes.
  - Example for a Deployment: Modify the spec.template.spec of your Deployment YAML to include the desired nodeSelector or affinity, then apply the updated Deployment with kubectl apply -f <deployment.yaml>. This will trigger a rolling update (see the sketch after this list).
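As a sketch, here is a Deployment whose pod template pins its replicas to the high-perf nodes labeled earlier (the names and image are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-high-perf-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-high-perf-app
  template:
    metadata:
      labels:
        app: my-high-perf-app
    spec:
      containers:
      - name: app-container
        image: nginx:latest
      # Scheduling rules go under spec.template.spec, just like in a bare Pod.
      nodeSelector:
        custom-role: high-perf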
2. Node Draining
For maintenance or decommissioning a node, you can use kubectl drain. This command gracefully evicts all pods from a node, marking it as unschedulable. The evicted pods (if managed by a controller) will then be recreated by their respective controllers on other available nodes in the cluster.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
This effectively "moves" the pods off the specified node.
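Once maintenance is complete, you can mark the node as schedulable again:
kubectl uncordon <node-name>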
Summary of Node Placement Options
| Feature | Description | Use Case |
|---|---|---|
| Node Selectors | Simplest way to constrain a pod to nodes with specific labels. | Basic, direct placement on nodes with matching labels (e.g., disktype=ssd). |
| Node Affinity | More expressive, flexible rules (required or preferred) using logical operators. | Advanced placement based on node attributes, e.g., "prefer nodes with high CPU, but require SSD storage." |
| Taints & Tolerations | Prevents pods from scheduling on a node unless they explicitly "tolerate" the node's taint. | Dedicating nodes (e.g., GPU nodes), isolating noisy neighbors, or for maintenance. |
| Resource Requests | Pod requests specific CPU/memory; scheduler places it on a node with sufficient available resources. | Indirectly influences placement by ensuring resource availability. |
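Resource requests are set per container; a minimal sketch of a pod whose requests the scheduler must be able to satisfy on the chosen node (the amounts are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: my-sized-app
spec:
  containers:
  - name: app-container
    image: nginx:latest
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"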
To summarize, while you can't "move" a running pod, Kubernetes provides powerful mechanisms through node selectors, node affinity, and taints/tolerations to precisely control where your pods are initially scheduled or rescheduled.