Kubernetes v1.36 Enhances Route Syncing with New Metric and Watch-Based Reconciliation

Kubernetes v1.36 introduces a new alpha metric that helps operators monitor and validate a more efficient method of route synchronization. This article answers key questions about the metric, the associated feature gate, and how to test it.

What is the new metric introduced in Kubernetes v1.36 for route syncing?

Kubernetes v1.36 adds a new alpha counter metric, route_controller_route_sync_total, to the Cloud Controller Manager (CCM) route controller implementation in the k8s.io/cloud-provider package. The counter increments by one each time the route controller syncs routes with the cloud provider. This gives operators a direct way to track how often route reconciliation occurs, which is especially useful when comparing reconciliation strategies.
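As a minimal sketch of reading the counter, the snippet below parses text in the Prometheus exposition format and returns the value of route_controller_route_sync_total. The sample text and the helper name read_counter are illustrative assumptions, not output captured from a real CCM, and the parser assumes label values contain no "}" character.

```python
from typing import Optional

METRIC_NAME = "route_controller_route_sync_total"


def read_counter(exposition_text: str, name: str = METRIC_NAME) -> Optional[float]:
    """Sum all samples of `name` in Prometheus text format; None if absent."""
    total = None
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(name + "{"):
            # Sample with labels: the value follows the closing brace.
            _, _, rest = line.partition("}")
        elif line.startswith(name + " "):
            # Sample without labels: the value follows the metric name.
            rest = line[len(name):]
        else:
            continue
        value = float(rest.split()[0])
        total = value if total is None else total + value
    return total


# Hypothetical scrape body, for illustration only.
SAMPLE = """\
# TYPE route_controller_route_sync_total counter
route_controller_route_sync_total 60
"""

print(read_counter(SAMPLE))  # prints 60.0
```

Returning None when the metric is missing makes it easy to notice that the component being scraped does not expose the counter, for example because it runs an older release.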

Why was the route_controller_route_sync_total metric added?

The metric was introduced to help operators validate the CloudControllerManagerWatchBasedRoutesReconciliation feature gate, which became available in Kubernetes v1.35. By comparing the sync count with the feature gate enabled versus disabled, operators can assess the impact of switching from a fixed-interval loop to a watch-based approach. The comparison shows how far the number of API calls to the cloud provider drops, which matters for staying within rate limits and quotas. In effect, the metric is a simple tool for A/B testing this optimization.

What does the CloudControllerManagerWatchBasedRoutesReconciliation feature gate do?

This feature gate, introduced in Kubernetes v1.35, changes how the route controller reconciles routes. By default, the controller runs a fixed-interval loop that syncs routes at regular intervals, regardless of whether any nodes have changed. When the feature gate is enabled, the controller switches to a watch-based approach: it only reconciles when it detects that nodes have been added, removed, or updated. This reduces unnecessary API calls to the infrastructure provider, lowering pressure on rate-limited endpoints and allowing operators to use their available quota more wisely.

How can operators A/B test the new route reconciliation approach?

To perform an A/B test, operators should compare the route_controller_route_sync_total metric under two conditions: with the feature gate disabled (the default loop) and with it enabled. In clusters where node changes are infrequent, the watch-based approach should show a significant drop in the sync rate. You can view the metric using tools like Prometheus or kubectl. For a fair comparison, measure both conditions over time windows of equal length with similar node churn, either in a staging environment first or on a single cluster at different times. The resulting counts give a clear numerical baseline for evaluating efficiency gains.
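One way to turn two counter readings into a comparable number is to compute a sync rate per minute for each condition and then the relative reduction. The sketch below assumes you have already collected two readings per condition. The Scrape type and both function names are hypothetical helpers, and the input values are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    seconds: float  # when the counter was read, in seconds since the test began
    count: float    # value of route_controller_route_sync_total at that time


def syncs_per_minute(first: Scrape, last: Scrape) -> float:
    """Average sync rate between two readings of the counter."""
    elapsed = last.seconds - first.seconds
    if elapsed <= 0:
        raise ValueError("last scrape must be later than first scrape")
    if last.count < first.count:
        # A counter only goes down when the process restarted.
        raise ValueError("counter reset between scrapes; pick another window")
    return (last.count - first.count) * 60.0 / elapsed


def relative_reduction(baseline_rate: float, candidate_rate: float) -> float:
    """Fraction of syncs avoided by the candidate relative to the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return 1.0 - candidate_rate / baseline_rate


# Illustrative readings: gate disabled (A) and gate enabled (B), same window length.
rate_a = syncs_per_minute(Scrape(600, 60), Scrape(1200, 120))
rate_b = syncs_per_minute(Scrape(600, 1), Scrape(1200, 2))
print(f"A: {rate_a:.1f}/min, B: {rate_b:.1f}/min, "
      f"reduction: {relative_reduction(rate_a, rate_b):.1%}")
```

Using rates rather than raw totals keeps the comparison valid when the two windows start at different counter values.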

What is the expected behavior of the metric with the feature gate disabled vs enabled?

With the feature gate disabled (default fixed-interval loop), the counter increments steadily regardless of node changes. For example, after 10 minutes with no changes, you might see route_controller_route_sync_total 60; after 20 minutes, 120. With the feature gate enabled (watch-based), the counter increments only when nodes actually change. After 10 minutes with no changes it reads 1 (the initial sync), and after 20 minutes it still reads 1. When a new node joins, it increments to 2. The difference is most visible in stable clusters where nodes rarely change, which is where the watch-based approach saves the most work.
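The numbers above can be reproduced with a toy model. This is a sketch under stated assumptions, not the controller's actual logic: the 10-second interval is inferred from the example of 60 syncs in 10 minutes, and the watch-based model counts exactly one sync per node event, ignoring any batching or retries a real controller might perform. Both function names are hypothetical.

```python
from typing import Iterable


def fixed_interval_syncs(elapsed_s: int, interval_s: int = 10) -> int:
    """Counter value after elapsed_s seconds of a periodic loop.

    interval_s=10 is an assumption inferred from the example figures.
    """
    return elapsed_s // interval_s


def watch_based_syncs(elapsed_s: int, node_event_times_s: Iterable[int]) -> int:
    """Counter value in a simplified watch-based model.

    One initial sync, plus one sync per node event observed so far.
    """
    return 1 + sum(1 for t in node_event_times_s if t <= elapsed_s)


# Gate disabled: the counter grows with time alone.
print(fixed_interval_syncs(600), fixed_interval_syncs(1200))    # prints 60 120

# Gate enabled, no node changes: only the initial sync is counted.
print(watch_based_syncs(600, []), watch_based_syncs(1200, []))  # prints 1 1

# Gate enabled, one node joins at t=1300s: the counter moves to 2.
print(watch_based_syncs(1500, [1300]))                          # prints 2
```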

Where can users give feedback or learn more about this feature?

If you have feedback or questions, you can reach the Kubernetes community through several channels. Join the #sig-cloud-provider channel on Kubernetes Slack for real-time discussion. You can also visit the KEP-5237 issue on GitHub to provide input or track progress. For broader community interaction, check the SIG Cloud Provider community page for links to mailing lists, meetings, and other resources. For detailed technical information, refer directly to KEP-5237, which contains the full design and implementation details.
