What is the Topology Key?

The topology key is the node label key that defines how nodes are grouped into logical topology domains within a cluster. Nodes that carry a label with this key and the same value are considered part of the same topology domain.

Understanding Node Topologies

In a distributed system, especially within container orchestration platforms like Kubernetes, nodes are often categorized based on their physical or logical location. The topology key acts as the identifier for these categories. Each unique combination of the topology key and its corresponding value represents a distinct "domain" or logical grouping. For instance, if the topology key is topology.kubernetes.io/zone, then all nodes labeled topology.kubernetes.io/zone: us-east-1a belong to one domain, and nodes labeled topology.kubernetes.io/zone: us-east-1b belong to another.

How Topology Keys Work

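The primary function of topology keys is to give the scheduler the information it needs to place workloads with awareness of failure domains. When a pod specification references a topology key, for example in a topology spread constraint or an affinity rule, the scheduler takes the resulting domains into account. For spread constraints, it tries to distribute matching pods evenly across those domains. This mechanism is crucial for achieving high availability, fault tolerance, and efficient resource utilization.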

Consider the following breakdown:

Node Labels: Each node in a cluster has labels attached to it, which are key-value pairs used to identify node attributes.
Topology Key: This is a specific label key (e.g., kubernetes.io/hostname, topology.kubernetes.io/zone) designated to represent a topological dimension.
Topology Domain: An instance of a topology, defined by a <key, value> pair. All nodes sharing the same topology key and value belong to the same domain.

Example of Nodes and Topology Domains

Node Name	Node Labels (Relevant)	Topology Key	Topology Value	Topology Domain
`node-1`	`topology.kubernetes.io/zone: zone-a`	`topology.kubernetes.io/zone`	`zone-a`	`topology.kubernetes.io/zone=zone-a`
`node-2`	`topology.kubernetes.io/zone: zone-a`	`topology.kubernetes.io/zone`	`zone-a`	`topology.kubernetes.io/zone=zone-a`
`node-3`	`topology.kubernetes.io/zone: zone-b`	`topology.kubernetes.io/zone`	`zone-b`	`topology.kubernetes.io/zone=zone-b`
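
The grouping in the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name group_by_topology_key is hypothetical, and the kubernetes.io/hostname labels are an assumption added here to show a second topology key, since the table lists zone labels only.

```python
from collections import defaultdict


def group_by_topology_key(nodes, topology_key):
    """Group node names into domains, keyed by the value of `topology_key`.

    Nodes without the label are skipped: they belong to no domain for this key.
    """
    domains = defaultdict(list)
    for name, labels in nodes.items():
        value = labels.get(topology_key)
        if value is not None:
            domains[value].append(name)
    return dict(domains)


# Zone labels come from the table; hostname labels are assumed for illustration.
NODES = {
    "node-1": {"topology.kubernetes.io/zone": "zone-a", "kubernetes.io/hostname": "node-1"},
    "node-2": {"topology.kubernetes.io/zone": "zone-a", "kubernetes.io/hostname": "node-2"},
    "node-3": {"topology.kubernetes.io/zone": "zone-b", "kubernetes.io/hostname": "node-3"},
}

zone_domains = group_by_topology_key(NODES, "topology.kubernetes.io/zone")
host_domains = group_by_topology_key(NODES, "kubernetes.io/hostname")
```

With the zone key the three nodes fall into two domains; with the hostname key each node forms its own domain.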

Role in Workload Scheduling and Balance

The scheduler uses topology keys to spread workloads across domains. With a topology spread constraint, it tries to keep the number of matching pods in each topology domain balanced, within a tolerance set by the constraint's maxSkew value. This reduces the risk that a single domain becomes a single point of failure and improves the resilience of applications. For example, if a cluster spans multiple availability zones, the scheduler can use the zone topology key to ensure that replicas of an application are spread across different zones. If one zone experiences an outage, the application can continue running in other zones.
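
The balancing idea can be sketched as a small simulation. This is a simplified model and not the scheduler's actual algorithm: it assumes a single spread constraint, treats skew as the pod count in the target domain after placement minus the smallest count in any domain, and ignores resources, taints, and every other scheduling factor. The function names and zone names are hypothetical.

```python
def allowed_domains(pod_counts, max_skew=1):
    """Domains where adding one pod keeps the skew within `max_skew`."""
    global_min = min(pod_counts.values())
    return sorted(
        domain
        for domain, count in pod_counts.items()
        if count + 1 - global_min <= max_skew
    )


def place_replicas(domains, replicas, max_skew=1):
    """Place replicas one at a time into the least-loaded allowed domain."""
    counts = {domain: 0 for domain in domains}
    for _ in range(replicas):
        candidates = allowed_domains(counts, max_skew)
        # Ties go to the alphabetically first domain, purely for determinism.
        target = min(candidates, key=lambda d: counts[d])
        counts[target] += 1
    return counts


counts = place_replicas(["zone-a", "zone-b", "zone-c"], replicas=5)
```

In this model, five replicas over three zones never leave one zone more than one pod ahead of the emptiest zone.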

Common Examples of Topology Keys

Several standard topology keys are widely adopted, particularly in Kubernetes environments:

kubernetes.io/hostname: This key's value is the hostname of the node, effectively making each node its own topology domain. Useful for anti-affinity rules that keep pods off the same node.
topology.kubernetes.io/zone (or the deprecated failure-domain.beta.kubernetes.io/zone on older versions): Identifies the geographical or logical zone (e.g., an availability zone in a cloud provider) a node belongs to.
topology.kubernetes.io/region (or failure-domain.beta.kubernetes.io/region): Identifies the broader geographical region for a node.
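kubernetes.io/os: Indicates the operating system running on the node (e.g., linux, windows). It describes a node attribute rather than a location, so it is more commonly used with node selectors or node affinity than as a topology key.
kubernetes.io/arch: Specifies the architecture of the node's CPU (e.g., amd64, arm64). Like kubernetes.io/os, it is typically used for node selection rather than for spreading pods.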

Practical Applications and Benefits

The intelligent use of topology keys provides significant advantages for managing distributed applications:

Ensuring High Availability

By spreading pods across different topology domains (e.g., zones or regions), applications become resilient to localized failures. If one domain goes offline, the application remains operational due to replicas running in other domains. This is a cornerstone of building robust, fault-tolerant systems.
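
The benefit can be made concrete with a small calculation of how many replicas survive the loss of the most heavily loaded domain. This is an illustrative sketch; the function name and the replica counts are hypothetical.

```python
def worst_case_survivors(pods_per_domain):
    """Replicas still running if the most heavily loaded domain fails."""
    total = sum(pods_per_domain.values())
    return total - max(pods_per_domain.values())


# Six replicas spread evenly across zones versus packed into a single zone.
spread = {"zone-a": 2, "zone-b": 2, "zone-c": 2}
packed = {"zone-a": 6, "zone-b": 0, "zone-c": 0}

spread_survivors = worst_case_survivors(spread)
packed_survivors = worst_case_survivors(packed)
```

With an even spread, four of six replicas survive the loss of any one zone; with everything packed into one zone, losing that zone leaves none.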

Optimizing Resource Utilization and Performance

Placing pods in specific domains can optimize network latency or leverage specialized hardware. For instance, data-intensive workloads might be placed closer to their data sources, or GPU-dependent tasks on nodes within a specific GPU-equipped domain. This also helps in avoiding resource hotspots.

Facilitating Cost Efficiency

In cloud environments, distributing workloads across different zones can sometimes be more cost-effective, especially when leveraging spot instances or specific pricing models. It also helps in capacity planning by understanding load distribution per domain.

Enabling Advanced Scheduling Policies

Topology keys are fundamental to advanced scheduling features like Pod Anti-Affinity, which allows you to prevent certain pods from being scheduled on nodes within the same topology domain as other specified pods. This ensures separation for high availability or performance reasons.
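
The effect of a required anti-affinity rule can be approximated as a filter over nodes. The sketch below is a simplified model rather than the scheduler's implementation: the node and pod data are hypothetical, the selector supports only exact label matches, and treating nodes that lack the topology label as feasible is an assumption of this sketch.

```python
def matches(labels, selector):
    """True if every selector key/value pair is present in `labels`."""
    return all(labels.get(key) == value for key, value in selector.items())


def feasible_nodes(nodes, pods, topology_key, selector):
    """Nodes whose topology domain holds no pod matching `selector`."""
    occupied = {
        nodes[pod["node"]].get(topology_key)
        for pod in pods
        if matches(pod["labels"], selector)
    }
    occupied.discard(None)  # pods on unlabeled nodes occupy no domain
    return sorted(
        name
        for name, labels in nodes.items()
        if labels.get(topology_key) not in occupied
    )


# Hypothetical cluster state: one "web" pod already runs on node-1.
NODES = {
    "node-1": {"topology.kubernetes.io/zone": "zone-a", "kubernetes.io/hostname": "node-1"},
    "node-2": {"topology.kubernetes.io/zone": "zone-a", "kubernetes.io/hostname": "node-2"},
    "node-3": {"topology.kubernetes.io/zone": "zone-b", "kubernetes.io/hostname": "node-3"},
}
PODS = [{"node": "node-1", "labels": {"app": "web"}}]

by_zone = feasible_nodes(NODES, PODS, "topology.kubernetes.io/zone", {"app": "web"})
by_host = feasible_nodes(NODES, PODS, "kubernetes.io/hostname", {"app": "web"})
```

The choice of topology key sets the granularity of the separation: the zone key excludes every node in zone-a, while the hostname key excludes only node-1.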

Configuration and Usage

Topology keys are typically defined by the infrastructure provider (for cloud-managed clusters) or by cluster administrators through node labeling. They are then referenced in pod specifications through the topologyKey field, which appears in pod affinity and anti-affinity terms and in topologySpreadConstraints entries, to guide the scheduler on where to place pods relative to each other or across topology domains.
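
As a sketch of how such a reference might look, the following builds a pod manifest as a Python dictionary and serializes it to JSON, which kubectl accepts alongside YAML. The pod name, the app label, and the container image are placeholders chosen for illustration; the field names follow the Kubernetes Pod API.

```python
import json

ZONE_KEY = "topology.kubernetes.io/zone"
HOST_KEY = "kubernetes.io/hostname"
APP_LABELS = {"app": "web"}  # hypothetical label identifying the replicas

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-0", "labels": APP_LABELS},
    "spec": {
        # Keep matching pods balanced across zones, at most one pod apart.
        "topologySpreadConstraints": [
            {
                "maxSkew": 1,
                "topologyKey": ZONE_KEY,
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": APP_LABELS},
            }
        ],
        # Never place two matching pods on the same node.
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": APP_LABELS},
                        "topologyKey": HOST_KEY,
                    }
                ]
            }
        },
        "containers": [{"name": "web", "image": "nginx"}],  # placeholder image
    },
}

manifest_json = json.dumps(pod_manifest, indent=2)
```

Here the zone key drives the spread constraint while the hostname key drives the anti-affinity rule, so one pod specification can reference different topology keys for different purposes.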