How Do You Control Which Node a Pod Runs On or Move a Pod to a Different Node in Kubernetes?

You cannot directly move a pod that is already running to a different node. Instead, you control which node a pod is scheduled on when it is created or recreated. To effectively "move" a pod, you typically delete the existing pod (or update the controller managing it, such as a Deployment) and allow Kubernetes to reschedule a new instance based on your updated node placement rules.

Kubernetes' scheduler automatically assigns pods to suitable nodes based on available resources and other constraints. However, you can influence this decision using various mechanisms.

Understanding Pod Scheduling in Kubernetes

When you create a pod, the Kubernetes scheduler identifies the best node to run it on. This decision considers several factors:

  • Resource requirements: CPU, memory, GPU, etc.
  • Node capacity: Available resources on each node.
  • Node selectors: Simple label-matching rules that restrict the pod to specific nodes.
  • Node affinity/anti-affinity: More flexible rules for preferred or required nodes.
  • Taints and tolerations: To prevent pods from scheduling on certain nodes unless explicitly allowed.
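
Before adding any placement rules, it can help to inspect what the scheduler sees. The commands below are standard kubectl usage; <your-node-name> is a placeholder for one of your nodes.

# List nodes together with their labels
kubectl get nodes --show-labels

# Show a node's capacity, allocatable resources, taints, and currently running pods
kubectl describe node <your-node-name>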

Methods to Control Pod Placement (Initial Scheduling)

To specify where a pod should run when it's first created or recreated, you use scheduling constructs within the pod's YAML definition.

1. Node Selectors

Node selectors are the simplest way to constrain a pod to run on nodes with specific labels. This is a common and straightforward method for basic node placement.

How it works:

  • You apply a label to one or more nodes (e.g., disktype=ssd, env=production).
  • You add a nodeSelector field to your pod's specification, matching the node label.

Example:
Imagine you want a pod to run only on nodes designated for "high-performance" workloads.

Step 1: Label the Node
First, label the target node(s). This helps ensure that future pods with matching selectors are directed there.

kubectl label nodes <your-node-name> custom-role=high-perf
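
To confirm the label was applied (and see every node that now matches it), you can query nodes by label selector; custom-role=high-perf is just the example label from this step.

# List only the nodes carrying the example label
kubectl get nodes -l custom-role=high-perf --show-labels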

Step 2: Add Node Selector to Pod Definition
Then, update your pod's YAML to include the nodeSelector:

apiVersion: v1
kind: Pod
metadata:
  name: my-high-perf-app
spec:
  containers:
  - name: app-container
    image: nginx:latest
  nodeSelector:
    custom-role: high-perf

When this pod is created, the scheduler will only place it on nodes that have the custom-role=high-perf label. You cannot add a nodeSelector to an existing, already scheduled pod, because a pod's scheduling fields are immutable after creation. To apply this change to a running pod, you would need to recreate it.
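
If you save the definition above to a file (here a hypothetical high-perf-pod.yaml), you can create the pod and confirm which node it landed on using the wide output of kubectl get:

# Create the pod and check the NODE column
kubectl apply -f high-perf-pod.yaml
kubectl get pod my-high-perf-app -o wide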

2. Node Affinity

Node affinity is a more expressive and flexible way to constrain pods to nodes, offering "soft" (preferred) and "hard" (required) rules, as well as logical operators (e.g., In, NotIn, Exists).

Types of Node Affinity:

  • requiredDuringSchedulingIgnoredDuringExecution: The pod is only scheduled onto nodes that satisfy the rules. If a node's labels change after the pod is scheduled, the pod continues to run (the rules are ignored during execution).
  • preferredDuringSchedulingIgnoredDuringExecution: The scheduler favors nodes that satisfy the rules but still schedules the pod elsewhere if none are available.

Example (Required Affinity):

apiVersion: v1
kind: Pod
metadata:
  name: my-app-with-affinity
spec:
  containers:
  - name: app-container
    image: my-image:latest
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - node-1
            - node-2
          - key: disktype
            operator: In
            values:
            - ssd

This pod will only be scheduled on nodes that are not node-1 or node-2, AND have the disktype=ssd label.
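
For a softer constraint, the same idea can be expressed with preferredDuringSchedulingIgnoredDuringExecution. The sketch below (pod name chosen for illustration) tells the scheduler to favor ssd nodes via a weight between 1 and 100, while still allowing the pod to run elsewhere if no such node is available:

apiVersion: v1
kind: Pod
metadata:
  name: my-app-preferred-affinity
spec:
  containers:
  - name: app-container
    image: my-image:latest
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80              # 1-100; higher weights are favored more strongly
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd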

3. Taints and Tolerations

Taints are applied to nodes, preventing pods from being scheduled on them unless the pod explicitly "tolerates" that taint. This is useful for dedicating nodes to specific purposes or for maintenance.

How it works:

  • Taint: A node is "tainted" with a key-value pair and an effect (e.g., NoSchedule, PreferNoSchedule, NoExecute).
  • Toleration: A pod specifies a "toleration" for a matching taint, allowing it to be scheduled on the tainted node.

Example (Taint and Toleration):

Step 1: Taint a Node

kubectl taint nodes <your-node-name> dedicated=gpu:NoSchedule

This prevents any pod without a matching toleration from being scheduled on this node.
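
You can verify the taint with kubectl describe, and remove it later by appending a minus sign to the same taint specification; both are standard kubectl usage.

# Check which taints are set on the node
kubectl describe node <your-node-name> | grep -i taint

# Remove the taint when the node no longer needs to be dedicated
kubectl taint nodes <your-node-name> dedicated=gpu:NoSchedule-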

Step 2: Add Toleration to Pod Definition

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-app
spec:
  containers:
  - name: gpu-container
    image: my-gpu-image:latest
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

This pod can now be scheduled on the tainted node.
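
Keep in mind that a toleration only permits scheduling on the tainted node; it does not force the pod there. To truly dedicate the node, combine the toleration with a node label and selector, roughly as in this sketch (the gpu=true label is an assumed example you would apply yourself):

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-app-pinned
spec:
  containers:
  - name: gpu-container
    image: my-gpu-image:latest
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  nodeSelector:
    gpu: "true"    # assumed label, e.g. kubectl label nodes <your-node-name> gpu=true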

"Moving" an Existing Pod to a Different Node

As established, you cannot directly move a running pod to another node or add a nodeSelector to an already scheduled pod. To achieve the effect of "moving" a pod, you must trigger a rescheduling event.

1. Recreate the Pod (or its Controller)

This is the most common approach:

  • For standalone Pods: Delete the pod using kubectl delete pod <pod-name>. Then, recreate it using kubectl apply -f <pod-definition.yaml> with the updated scheduling rules (node selector, affinity, tolerations).
  • For Pods managed by a Deployment, StatefulSet, or DaemonSet: Update the controller's definition with the new scheduling rules. The controller will then terminate the old pods and create new ones that adhere to the updated rules, scheduling them on appropriate nodes.
    • Example for a Deployment: Modify the spec.template.spec of your Deployment YAML to include the desired nodeSelector or affinity. Then, apply the updated Deployment: kubectl apply -f <deployment.yaml>. This will trigger a rolling update.
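
As a quick alternative to editing the YAML file, the node selector can be patched into an existing Deployment directly; my-deployment is a placeholder name, and the patch triggers the same rolling update:

# Add a nodeSelector to the Deployment's pod template (rolls out new pods)
kubectl patch deployment my-deployment --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"custom-role":"high-perf"}}}}}'

# Watch the replacement pods land on matching nodes
kubectl get pods -o wide -w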

2. Node Draining

For maintenance or decommissioning a node, you can use kubectl drain. This command marks the node as unschedulable (cordons it) and gracefully evicts all pods from it. Evicted pods managed by a controller are recreated by that controller on other available nodes in the cluster; standalone pods are simply deleted (drain refuses to evict them unless you pass --force).

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

This effectively "moves" the pods off the specified node.
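
After maintenance, the node remains cordoned (marked unschedulable) until you explicitly allow scheduling on it again:

# Make the node schedulable again
kubectl uncordon <node-name>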

Summary of Node Placement Options

  • Node Selectors: The simplest way to constrain a pod to nodes with specific labels. Use case: basic, direct placement on nodes with matching labels (e.g., disktype=ssd).
  • Node Affinity: More expressive, flexible rules (required or preferred) using logical operators. Use case: advanced placement based on node attributes, e.g., "prefer nodes with high CPU, but require SSD storage."
  • Taints & Tolerations: Prevent pods from scheduling on a node unless they explicitly "tolerate" the node's taint. Use case: dedicating nodes (e.g., GPU nodes), isolating noisy neighbors, or preparing a node for maintenance.
  • Resource Requests: The pod requests specific CPU/memory, and the scheduler places it on a node with sufficient available resources. Use case: indirectly influences placement by ensuring resource availability.

To summarize, while you can't "move" a running pod, Kubernetes provides powerful mechanisms through node selectors, node affinity, and taints/tolerations to precisely control where your pods are initially scheduled or rescheduled.