Codacy — Up to standards ✅

| Metric | Results |
|---|---|
| Duplication | 3 |
📝 Walkthrough

Refactored Patroni service management to implement a reload-and-wait pattern across the systemd and Docker Swarm orchestration backends, replacing direct restart operations. Added new utility functions, a Patroni loop wait constant, refactored database instance resource operations, and introduced a system log maintenance task.
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lima/roles/stop_dbs/tasks/main.yaml`:
- Around line 41-43: The task "Clear journalctl" currently runs
ansible.builtin.command with "journalctl --flush --rotate --vacuum-time=1s",
which will destructively purge the host journal; change this so it is not run by
default: remove or guard the task with an explicit opt-in variable (e.g., add
when: vacuum_system_journal | default(false) or require a tag like
"vacuum_journal") and/or replace the vacuum-time flag with a less destructive,
targeted operation (e.g., vacuum by --unit or --vacuum-files and/or a
configurable --vacuum-time variable). Update the task that invokes journalctl to
reference the new variable (vacuum_system_journal or vacuum_time) and ensure
changed_when remains correct so the action is only performed when explicitly
enabled.
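A sketch of the guarded task the comment describes. Note that `vacuum_system_journal` and `vacuum_time` are the variable names suggested in the review comment, not existing role variables:

```yaml
# Illustrative only: guards the destructive journal purge behind an
# explicit opt-in variable and makes the retention window configurable.
- name: Clear journalctl
  ansible.builtin.command: >-
    journalctl --flush --rotate
    --vacuum-time={{ vacuum_time | default('1s') }}
  when: vacuum_system_journal | default(false)
  changed_when: true
```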
In `@server/internal/orchestrator/systemd/unit.go`:
- Around line 98-100: Update the error message produced after calling
client.ReloadOrRestartUnit so it reflects the actual operation name: change the
fmt.Errorf call in the ReloadOrRestartUnit error path to use "failed to reload
or restart unit '%s': %w" and pass the correct identifier (use r.Name rather
than path if r.Name was supplied to ReloadOrRestartUnit); the symbols to edit
are the ReloadOrRestartUnit call, the fmt.Errorf invocation, and the r.Name/path
argument used in that fmt.Errorf.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fedeb80b-50fa-4eac-8e1d-5a6e04d917e3
📒 Files selected for processing (10)
- lima/roles/stop_dbs/tasks/main.yaml
- server/internal/database/instance_resource.go
- server/internal/database/utils.go
- server/internal/orchestrator/common/patroni_config_generator.go
- server/internal/orchestrator/swarm/patroni_config.go
- server/internal/orchestrator/systemd/client.go
- server/internal/orchestrator/systemd/patroni_config.go
- server/internal/orchestrator/systemd/unit.go
- server/internal/patroni/config.go
- server/internal/utils/utils.go
Patroni has an activity loop that they refer to as a 'run cycle'. Each run cycle is `loop_wait` seconds long. When a cycle takes less than `loop_wait` seconds, which is the typical case, Patroni will sleep until that time has elapsed. If the system is degraded and the cycle takes longer than `loop_wait`, it'll sleep for 1ms and log a warning.

When we signal Patroni to reload its configuration, it will perform the reload at the beginning of the next run cycle. This means that under non-degraded conditions, it can take up to `loop_wait` seconds to reload its configuration.

This commit fixes a bug where we were not waiting for the reload to complete. It was working most of the time because we signal Patroni to reload in the `UnitResource`, so the timing worked out that in a lot of cases we would get into `InstanceResource.Create` before the new config took effect, and it would take effect before we evaluated whether a restart was needed.

Rather than doing the wait in the `InstanceResource`, which would happen on every update, I've moved the reload operation to each orchestrator's `PatroniConfig` resource. I've also added a sleep operation here to ensure that `loop_wait` seconds have elapsed before we proceed. This way, we can assume that the new configuration has taken effect in the `InstanceResource` lifecycle methods.

This additional sleep does slow down some operations - especially in Swarm databases because the Patroni Config resource contains the entire instance spec, so there are plenty of conditions where there will be a resource diff even though there isn't a Patroni config change. We'll improve this condition when we adopt the new `common.PatroniConfigGenerator` in the `swarm.patroni_config` resource.

PLAT-548
Force-pushed f2f6428 to acbdeeb
tsivaprasad
left a comment
Except for a minor change, everything looks good.
Verification:
2026/04/24 00:21:07 initializing cluster
2026/04/24 00:21:21 cluster initialized
=== RUN TestPortChange
=== PAUSE TestPortChange
=== CONT TestPortChange
port_change_test.go:23: [TestPortChange] creating database
port_change_test.go:56: [TestPortChange] updating database to change ports
port_change_test.go:85: [TestPortChange] validating that the database is usable
port_change_test.go:108: [TestPortChange] waiting for replication
database_test.go:357: [TestPortChange] waiting for replication to catch up on all nodes
port_change_test.go:112: [TestPortChange] validating that replication is functioning
fixture_test.go:200: cleaning up database 68cdaaeb-b419-4a30-8784-c58062e4bc35
--- PASS: TestPortChange (110.39s)
PASS
ok github.com/pgEdge/control-plane/e2e 124.320s
DONE 1 tests in 124.321s
2026/04/24 00:29:01 initializing cluster
2026/04/24 00:29:01 cluster initialized
=== RUN TestUpdateAddNode12
=== PAUSE TestUpdateAddNode12
=== CONT TestUpdateAddNode12
db_update_add_node_test.go:34: Creating 1 node database with 1 primary per node and no replicas
db_update_add_node_test.go:42: Verifying state of node n1
db_update_add_node_test.go:43: Verifying state of primary
db_update_add_node_test.go:43: Inserting test data on node n1 primary
db_update_add_node_test.go:51: Adding new node with zero down time enabled
db_update_add_node_test.go:53: nodes updated to total number 2
db_update_add_node_test.go:56: Inserting test data on node n2 primary
database_test.go:357: [TestUpdateAddNode12] waiting for replication to catch up on all nodes
db_update_add_node_test.go:269: Verifying data created on peer node n2 is present on n1
db_update_add_node_test.go:269: Verifying data created on peer node n1 is present on n2
fixture_test.go:200: cleaning up database 849bf860-43da-42a3-9917-71ef1f7e81ee
--- PASS: TestUpdateAddNode12 (91.81s)
PASS
ok github.com/pgEdge/control-plane/e2e 92.335s
DONE 1 tests in 92.336s
```go
func SleepContext(ctx context.Context, duration time.Duration) error {
	select {
	case <-time.After(duration):
```
Looks like it will sleep even if there is a cancellation.
Do you think we can change it like below?
```go
func SleepContext(ctx context.Context, duration time.Duration) error {
	t := time.NewTimer(duration)
	defer t.Stop()
	select {
	case <-t.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```
Testing

This mainly affects the `TestPortChange` E2E on systemd.

PLAT-548