Change Deployment UpdateStrategy #384

kattz-kawa · 2025-11-18T14:47:38Z

Why we need this PR

After new placement rule introduced in Use strict pod placement for controller-manager #383, the manager pods can only be scheduled on two specific nodes. Under these conditions, the default strategy may cause rollout deadlocks during node recovery.
Certificate rotation managed by OLM can trigger Deployment rollouts by changing the olmcahash annotation. With the current strategy, this may lead to deadlocks if a node is down during recovery, causing rollout failures. To improve reliability and ensure safe updates under these conditions, we need a more controlled approach.

Changes made

Added RollingUpdate strategy with maxSurge: 0 and maxUnavailable: 1 for safer deployment updates.
Before (default): When a rollout starts, the Deployment tries to keep the old pod until the new one is ready. On clusters with only 2 nodes, this can fail because there is no room to schedule the new pod, leading to a stuck rollout.
After: Pods are updated one at a time. The controller deletes one pod first, then creates a new one, ensuring that rollout can proceed safely even when only 2 nodes are available.

Which issue(s) this PR fixes

RHWA-366

Summary by CodeRabbit

Chores

Updated deployment strategies for the node-healthcheck-operator and controller-manager to use controlled RollingUpdate parameters, ensuring no pod surge and sequential pod replacement during rollouts for improved reliability.

…vailable=1) to mitigate rollout deadlocks during automatic certificate rotation by OLM Signed-off-by: Katsuya Kawakami <kkawakami-je@nec.com>

openshift-ci · 2025-11-18T14:47:50Z

Hi @kattz-kawa. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai · 2025-11-18T14:47:54Z

📝 Walkthrough

Walkthrough

Two deployment configuration files were updated to define explicit RollingUpdate strategies with conservative settings: maxSurge: 0 and maxUnavailable: 1. This ensures controlled pod rollouts by replacing one pod at a time without over-provisioning during updates.

Changes

Cohort / File(s)	Summary
Deployment strategy configuration `bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml`, `config/manager/manager.yaml`	Added RollingUpdate deployment strategy with maxSurge: 0 and maxUnavailable: 1 to both operator and controller-manager deployments for controlled rollout behavior

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Identical, homogeneous configuration changes applied consistently across both files
No logic modifications or complex structural changes
Straightforward YAML configuration additions

Suggested labels

do-not-merge/hold, needs-ok-to-test

Suggested reviewers

clobrano
slintes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Change Deployment UpdateStrategy' accurately reflects the main change in the PR, which modifies the Deployment update strategy configuration in two files.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml (1)
428-432: Strategy configuration is correct; align field order with config/manager/manager.yaml.

The RollingUpdate strategy settings match those in config/manager/manager.yaml and are correct for the stated purpose. However, the field order differs: this file places rollingUpdate before type, whereas the peer Deployment places type first. While both orderings are functionally equivalent in Kubernetes YAML, standardizing the order improves consistency and maintainability.

Suggested reorder for consistency:
           strategy:
-            rollingUpdate:
-              maxSurge: 0
-              maxUnavailable: 1
             type: RollingUpdate
+            rollingUpdate:
+              maxSurge: 0
+              maxUnavailable: 1
This aligns the field order with config/manager/manager.yaml (type → rollingUpdate).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09db172 and 008ad55.

📒 Files selected for processing (2)

bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml (1 hunks)
config/manager/manager.yaml (1 hunks)

🔇 Additional comments (1)

config/manager/manager.yaml (1)

20-24: RollingUpdate strategy appropriate for two-node deployment constraint.

The configuration correctly implements one-at-a-time pod replacement (maxSurge: 0, maxUnavailable: 1) on a 2-replica Deployment, avoiding the scheduling deadlock described in the PR objectives. With this strategy, the controller will delete one pod before scheduling its replacement, ensuring forward progress on capacity-constrained clusters during certificate rotation–induced rollouts.

openshift-ci · 2025-12-02T11:48:08Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kattz-kawa, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [slintes]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

slintes · 2025-12-02T11:48:57Z

/ok-to-test

slintes · 2025-12-02T12:25:17Z

/retest

slintes · 2025-12-02T14:32:58Z

/retest

slintes · 2025-12-02T14:39:24Z

only 4.19 wasn't green yet

/override ci/prow/4.20-openshift-e2e ci/prow/4.18-openshift-e2e ci/prow/4.17-openshift-e2e ci/prow/4.16-openshift-e2e

openshift-ci · 2025-12-02T14:39:33Z

@slintes: Overrode contexts on behalf of slintes: ci/prow/4.16-openshift-e2e, ci/prow/4.17-openshift-e2e, ci/prow/4.18-openshift-e2e, ci/prow/4.20-openshift-e2e

In response to this:

only 4.19 wasn't green yet

/override ci/prow/4.20-openshift-e2e ci/prow/4.18-openshift-e2e ci/prow/4.17-openshift-e2e ci/prow/4.16-openshift-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

slintes · 2025-12-03T08:01:46Z

/retest

openshift-ci · 2025-12-03T09:36:32Z

@kattz-kawa: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/4.19-openshift-e2e	`008ad55`	link	true	`/test 4.19-openshift-e2e`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

slintes · 2025-12-03T10:35:54Z

probably not related to this PR, but we need to investigate test failures...

Change Deployment updateStrategy to RollingUpdate (maxSurge=0, maxUna…

008ad55

…vailable=1) to mitigate rollout deadlocks during automatic certificate rotation by OLM Signed-off-by: Katsuya Kawakami <kkawakami-je@nec.com>

openshift-ci bot added do-not-merge/work-in-progress needs-ok-to-test labels Nov 18, 2025

coderabbitai bot reviewed Nov 18, 2025

View reviewed changes

slintes approved these changes Dec 2, 2025

View reviewed changes

openshift-ci bot assigned slintes Dec 2, 2025

openshift-ci bot added the lgtm label Dec 2, 2025

openshift-ci bot added the approved label Dec 2, 2025

openshift-ci bot added ok-to-test and removed needs-ok-to-test labels Dec 2, 2025

slintes marked this pull request as ready for review December 2, 2025 14:34

openshift-ci bot requested review from beekhof and clobrano December 2, 2025 14:34

slintes changed the title ~~[WIP] Change Deployment UpdateStrategy~~ Change Deployment UpdateStrategy Dec 2, 2025

openshift-ci bot removed the do-not-merge/work-in-progress label Dec 2, 2025

Change Deployment UpdateStrategy #384

Are you sure you want to change the base?

Change Deployment UpdateStrategy #384

Uh oh!

Conversation

kattz-kawa commented Nov 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why we need this PR

Changes made

Which issue(s) this PR fixes

Summary by CodeRabbit

Chores

Uh oh!

openshift-ci bot commented Nov 18, 2025

Uh oh!

coderabbitai bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Dec 2, 2025

Uh oh!

slintes commented Dec 2, 2025

Uh oh!

slintes commented Dec 2, 2025

Uh oh!

slintes commented Dec 2, 2025

Uh oh!

slintes commented Dec 2, 2025

Uh oh!

openshift-ci bot commented Dec 2, 2025

Uh oh!

slintes commented Dec 3, 2025

Uh oh!

openshift-ci bot commented Dec 3, 2025

Uh oh!

slintes commented Dec 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kattz-kawa commented Nov 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 18, 2025 •

edited

Loading