Change Deployment UpdateStrategy #384
Conversation
…vailable=1) to mitigate rollout deadlocks during automatic certificate rotation by OLM

Signed-off-by: Katsuya Kawakami <kkawakami-je@nec.com>
Hi @kattz-kawa. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
📝 Walkthrough

Two deployment configuration files were updated to define explicit RollingUpdate strategies with conservative settings: maxSurge: 0 and maxUnavailable: 1. This ensures controlled pod rollouts by replacing one pod at a time without over-provisioning during updates.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml (1)
428-432: Strategy configuration is correct; align field order with config/manager/manager.yaml.

The RollingUpdate strategy settings match those in config/manager/manager.yaml and are correct for the stated purpose. However, the field order differs: this file places `rollingUpdate` before `type`, whereas the peer Deployment places `type` first. While both orderings are functionally equivalent in Kubernetes YAML, standardizing the order improves consistency and maintainability. Suggested reorder for consistency:
```diff
 strategy:
-  rollingUpdate:
-    maxSurge: 0
-    maxUnavailable: 1
   type: RollingUpdate
+  rollingUpdate:
+    maxSurge: 0
+    maxUnavailable: 1
```

This aligns the field order with config/manager/manager.yaml (type → rollingUpdate).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- bundle/manifests/node-healthcheck-operator.clusterserviceversion.yaml (1 hunks)
- config/manager/manager.yaml (1 hunks)
🔇 Additional comments (1)
config/manager/manager.yaml (1)
20-24: RollingUpdate strategy appropriate for two-node deployment constraint.The configuration correctly implements one-at-a-time pod replacement (maxSurge: 0, maxUnavailable: 1) on a 2-replica Deployment, avoiding the scheduling deadlock described in the PR objectives. With this strategy, the controller will delete one pod before scheduling its replacement, ensuring forward progress on capacity-constrained clusters during certificate rotation–induced rollouts.
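The capacity argument in this comment can be illustrated with a toy simulation (an illustrative model written for this review, not Kubernetes code): assume one pod per node, and at each step the rollout either creates a new pod (which needs a free node and surge budget) or deletes an old one (which needs availability budget).

```python
def rollout_succeeds(nodes, replicas, max_surge, max_unavailable):
    """Toy model of a rolling update on a capacity-constrained cluster.

    Assumes exactly one pod fits per node (e.g. anti-affinity).
    Returns True if the rollout can replace every old pod, False if
    it reaches a state where no action is allowed (deadlock).
    """
    old, new = replicas, 0
    for _ in range(4 * replicas + 4):  # bounded number of steps
        if old == 0 and new == replicas:
            return True
        total = old + new
        # Create a new pod: needs a free node and surge budget.
        if total < nodes and total < replicas + max_surge:
            new += 1
        # Otherwise delete an old pod if the availability budget allows.
        elif old > 0 and total - 1 >= replicas - max_unavailable:
            old -= 1
        else:
            return False  # neither action possible: rollout is stuck
    return False

# Surge-first strategy (maxSurge: 1, maxUnavailable: 0) deadlocks on 2 nodes:
print(rollout_succeeds(nodes=2, replicas=2, max_surge=1, max_unavailable=0))  # False
# The strategy from this PR (maxSurge: 0, maxUnavailable: 1) makes progress:
print(rollout_succeeds(nodes=2, replicas=2, max_surge=0, max_unavailable=1))  # True
```

With a third node available, the surge-first strategy also succeeds, which is why the deadlock only shows up on capacity-constrained (e.g. two-node) clusters.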
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kattz-kawa, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/ok-to-test
/retest
Only 4.19 wasn't green yet.

/override ci/prow/4.20-openshift-e2e ci/prow/4.18-openshift-e2e ci/prow/4.17-openshift-e2e ci/prow/4.16-openshift-e2e
@slintes: Overrode contexts on behalf of slintes: ci/prow/4.16-openshift-e2e, ci/prow/4.17-openshift-e2e, ci/prow/4.18-openshift-e2e, ci/prow/4.20-openshift-e2e In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest
@kattz-kawa: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Probably not related to this PR, but we need to investigate the test failures...
Why we need this PR
OLM's automatic certificate rotation triggers a rollout of the operator Deployment by updating the `olmcahash` annotation. With the current strategy, this may lead to deadlocks if a node is down during recovery, causing rollout failures. To improve reliability and ensure safe updates under these conditions, we need a more controlled approach.

Changes made
Added `RollingUpdate` strategy with `maxSurge: 0` and `maxUnavailable: 1` for safer deployment updates.

Before (default): When a rollout starts, the Deployment tries to keep the old pod until the new one is ready. On clusters with only 2 nodes, this can fail because there is no room to schedule the new pod, leading to a stuck rollout.
After: Pods are updated one at a time. The controller deletes one pod first, then creates a new one, ensuring that rollout can proceed safely even when only 2 nodes are available.
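As a sketch, the resulting Deployment spec fragment looks like the following (the replica count is shown for the two-node case discussed above; only the strategy block is what this PR adds):

```yaml
spec:
  replicas: 2            # illustrative; the 2-replica case discussed above
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0        # never schedule an extra pod during the rollout
      maxUnavailable: 1  # allow one pod down, so the old pod is removed
                         # before its replacement is created
```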
Which issue(s) this PR fixes
RHWA-366
Summary by CodeRabbit
Chores