How do I stop a Kubernetes cluster?

To stop a Kubernetes cluster, you'll generally perform a graceful shutdown of its underlying nodes and then manage any external dependencies. This process ensures data integrity and a smooth restart when needed.

Gracefully Shutting Down Your Kubernetes Cluster

Stopping a Kubernetes cluster involves halting the compute nodes that run your applications and the control plane, followed by addressing any external services. This is distinct from deleting a cluster, which permanently removes all its resources.

1. Identify and Shut Down Cluster Nodes

The core of your Kubernetes cluster consists of worker nodes and control plane nodes. Shutting these down is the primary step.

Get the Node List: First, you need to identify all the nodes participating in your cluster. You can do this using the kubectl command-line tool.
```
nodes=$(kubectl get nodes -o name)
echo "Identified nodes: $nodes"
```
This command fetches the names of all nodes in the format node/<node-name> and stores them in the nodes variable as a single newline-separated string.

Execute Graceful Node Shutdown: Once you have the list, you can iterate over it and send a shutdown command to each node via SSH. This method is common for self-managed or on-premises clusters where you have direct access to the nodes, and it assumes each node name resolves as an SSH hostname.
```
# $nodes is a plain string, so rely on word splitting rather than array syntax
for node in $nodes; do
  echo "==== Shutting down $node ===="
  # Strip the 'node/' prefix to get the bare node name
  node_name="${node#node/}"
  # Assumes the node name resolves as an SSH host
  ssh "$node_name" sudo shutdown -h 1
done
```
Explanation:
- ssh "$node_name" ...: This initiates an SSH connection to each node. Ensure your SSH key is properly configured for passwordless access or you'll be prompted for a password for each node.
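- sudo shutdown -h 1: This command schedules the node's operating system to shut down in 1 minute. The -h flag halts or powers off the machine (on systemd-based distributions it powers off), and 1 is the delay in minutes, equivalent to +1. During shutdown the init system signals running services to stop before the machine goes down, which gives processes a chance to terminate cleanly.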
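
The loop above shuts nodes down in whatever order kubectl returns them. A minimal sketch of one way to power off workers before control plane nodes follows. It assumes control plane nodes carry the node-role.kubernetes.io/control-plane label (the kubeadm default; other installers may differ) and that node names resolve over SSH. The function names and the DRY_RUN variable are hypothetical helpers, not part of kubectl.

```shell
#!/usr/bin/env bash
# Sketch: power off worker nodes first, then control plane nodes.

# Print bare node names (without the "node/" prefix) matching a label selector.
list_nodes() {
  kubectl get nodes -l "$1" -o name | sed 's|^node/||'
}

# Shut down every node name read from stdin.
# With DRY_RUN=1 (hypothetical switch), only print what would happen.
shutdown_nodes() {
  local name
  while read -r name; do
    [ -n "$name" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would shut down $name"
    else
      # -n stops ssh from swallowing the rest of the node list on stdin;
      # +1 means one minute from now
      ssh -n "$name" sudo shutdown -h +1
    fi
  done
}

stop_cluster() {
  # Workers: every node WITHOUT the control plane label
  list_nodes '!node-role.kubernetes.io/control-plane' | shutdown_nodes
  # Control plane nodes last, so the API server stays reachable longest
  list_nodes 'node-role.kubernetes.io/control-plane' | shutdown_nodes
}
```

Running `DRY_RUN=1 stop_cluster` first prints the planned order without touching any node. Because each shutdown is only scheduled one minute ahead, the ordering is approximate; a stricter version would wait for the workers to go down before continuing.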
Alternative (Cordon and Drain): For a more controlled, application-aware shutdown, especially when you want to take a single node down for maintenance without affecting the others, cordon and drain the node first:
- kubectl cordon <node-name>: Marks the node as unschedulable, preventing new pods from being placed on it.
- kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data: Evicts the node's pods so they can be rescheduled elsewhere. It skips DaemonSet-managed pods and discards emptyDir data, and it cordons the node itself, so the separate cordon step is optional.

After draining, run the ssh sudo shutdown -h 1 command for that specific node.
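
The cordon, drain, and shutdown sequence can be sketched as a single function that only powers the node off when the drain succeeds. The function name is hypothetical, the 120-second timeout is an arbitrary choice, and the node name is assumed to double as its SSH hostname.

```shell
#!/usr/bin/env bash
# Sketch: drain one node, then power it off only if the drain succeeded.
drain_and_shutdown() {
  local node="$1"
  # Stop new pods from being scheduled onto the node
  kubectl cordon "$node" || return 1
  # Evict running pods; give up after the (arbitrary) timeout
  if ! kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=120s; then
    echo "drain failed for $node; leaving it running" >&2
    return 1
  fi
  # Schedule the power-off one minute from now
  ssh "$node" sudo shutdown -h +1
}
```

Calling `drain_and_shutdown worker-1` (a placeholder name) leaves the node running if eviction is blocked, for example by a PodDisruptionBudget, so the failure can be investigated first.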

2. Address External Dependencies

After the Kubernetes nodes are shut down, it's critical to consider any services that your cluster relies on but are not running directly on the cluster nodes.

External Storage: If your cluster uses external storage solutions (e.g., network file systems, cloud block storage, shared databases), you must manage these separately. This might involve:
- Unmounting volumes.
- Backing up data.
- Shutting down the storage appliances or services if they are dedicated to this cluster.
- Example: If you use a cloud-managed database like AWS RDS or Google Cloud SQL, you might want to stop or snapshot it.
Load Balancers: External load balancers (e.g., cloud provider load balancers, bare-metal load balancers) that route traffic to your Kubernetes services will continue to exist. You may need to:
- Deregister node IPs from the load balancer.
- Delete the load balancer if it's no longer needed.
External Databases or Message Queues: Any services your applications consume that run outside the cluster (e.g., a shared Kafka cluster, external Redis instances) should be managed according to their own operational procedures.
Networking Infrastructure: Depending on your setup, you might have specific network configurations, VPNs, or firewalls that need adjustment or shutdown if they are solely for this cluster.
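
Before powering anything off, it can help to take an inventory of the external resources the cluster has provisioned. The sketch below lists Services of type LoadBalancer and PersistentVolumes with their reclaim policy. It has to run while the API server is still up, the function names are hypothetical, and it only covers resources Kubernetes knows about, not externally managed databases or queues.

```shell
#!/usr/bin/env bash
# Sketch: list cluster-provisioned external resources before a shutdown.

# Print "namespace/name external-ip" for every LoadBalancer Service.
# Columns of `kubectl get services --all-namespaces`:
#   NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
list_load_balancers() {
  kubectl get services --all-namespaces --no-headers |
    awk '$3 == "LoadBalancer" { print $1 "/" $2 " " $5 }'
}

# Print each PersistentVolume with its reclaim policy and bound claim.
list_persistent_volumes() {
  kubectl get pv -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name
}
```

The output gives a checklist of load balancers and volumes to review or back up before the nodes go down.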

3. Consider Managed Kubernetes Services

If you are using a managed Kubernetes service (like Amazon EKS, Google GKE, Azure AKS, or DigitalOcean Kubernetes), the process for "stopping" a cluster is often different and more integrated with the cloud provider's console or CLI.

Scaling Down: Many managed services allow you to scale the node pools down to zero nodes, which stops the compute cost, while the control plane may still incur a fee. This is often the closest equivalent to "stopping" in a managed environment. Some providers also offer an explicit stop operation, for example az aks stop on AKS.
Cluster Deletion: To end all costs, managed Kubernetes clusters are typically deleted rather than "stopped" in the traditional sense, because the control plane is operated by the provider. Always check your cloud provider's documentation for the exact steps.
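
As a rough illustration of scaling down, the helper below prints the provider CLI command that scales a node pool to zero (GKE, EKS via eksctl) or stops the cluster (AKS). The function name and argument order are hypothetical. The cluster, pool, and resource group names are placeholders, and each command may need extra flags such as a region or zone, so treat the output as a starting point to check against the provider's documentation.

```shell
#!/usr/bin/env bash
# Sketch: print the scale-to-zero (or stop) command for a managed cluster.
# Usage: scale_down_cmd <gke|eks|aks> <cluster> <node-pool-or-resource-group>
scale_down_cmd() {
  local provider="$1" cluster="$2" target="$3"
  case "$provider" in
    gke)
      # target = node pool name; may also need --zone or --region
      echo "gcloud container clusters resize $cluster --node-pool $target --num-nodes 0"
      ;;
    eks)
      # target = nodegroup name; the minimum size must also allow zero
      echo "eksctl scale nodegroup --cluster=$cluster --name=$target --nodes=0 --nodes-min=0"
      ;;
    aks)
      # target = resource group; stops the whole cluster, not one pool
      echo "az aks stop --name $cluster --resource-group $target"
      ;;
    *)
      echo "unknown provider: $provider" >&2
      return 1
      ;;
  esac
}
```

Printing the command instead of running it keeps the sketch safe to try; the output can be reviewed and then executed by hand.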

Important Considerations

Data Backup: Always perform a comprehensive backup of critical data, especially persistent volumes and Kubernetes configuration, before initiating a full cluster shutdown.
Stateful Applications: Be aware of how stateful applications (e.g., databases, message queues running as Pods) handle sudden or graceful shutdowns. Ensure their data is preserved.
Reversibility: Understand the steps to restart your cluster. For node-level shutdowns, powering the nodes back on will typically bring the cluster back up, though control plane components might take some time to reconcile. Nodes that were cordoned or drained before shutdown stay unschedulable until you run kubectl uncordon on them.
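
A restart check can be sketched as follows: wait for every node to report Ready, then uncordon all nodes so that any that were drained before shutdown accept pods again. The function name and the 300-second timeout are assumptions, and the sketch presumes the API server is already reachable; right after power-on, kubectl wait may fail until the control plane is back.

```shell
#!/usr/bin/env bash
# Sketch: after powering nodes back on, wait for Ready and re-enable scheduling.
resume_cluster() {
  local node
  # Block until all nodes report the Ready condition (timeout is arbitrary)
  kubectl wait --for=condition=Ready nodes --all --timeout=300s || return 1
  # Uncordon every node; this is harmless for nodes that were never cordoned
  for node in $(kubectl get nodes -o name | sed 's|^node/||'); do
    kubectl uncordon "$node"
  done
}
```

If some nodes should stay out of service, replace the loop with explicit kubectl uncordon calls for the nodes you want back.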

Comparison: Stopping vs. Deleting a Cluster

| Feature | Stopping a Cluster | Deleting a Cluster |
| --- | --- | --- |
| Purpose | Temporary halt (maintenance, cost saving) | Permanent removal |
| Resources | Nodes powered off, but resources (IPs, volumes) often reserved | Cluster resources (nodes, control plane, and usually volumes) removed |
| Data Retention | Data on persistent volumes should be retained | All data typically lost unless explicitly backed up |
| Cost Impact | Reduces compute costs, but some resources might still incur fees | Ends cluster costs, though retained volumes, snapshots, or load balancers may still be billed |
| Restart | Nodes can be powered back on, cluster eventually recovers | Requires creating a brand new cluster from scratch |

By following these steps, you can effectively stop your Kubernetes cluster in a controlled manner, whether for maintenance or to manage operational costs.