Skip to content

<feature>[zns]: add ZNS-Cloud data sync reconciliation#3777

Closed
MatheMatrix wants to merge 1 commit intofeature-5.5.12-znsfrom
sync/shixin.ruan/shixin.ruan-zcf-2133@@2
Closed

<feature>[zns]: add ZNS-Cloud data sync reconciliation#3777
MatheMatrix wants to merge 1 commit intofeature-5.5.12-znsfrom
sync/shixin.ruan/shixin.ruan-zcf-2133@@2

Conversation

@MatheMatrix
Copy link
Copy Markdown
Owner

Implement Cloud-ZNS periodic sync to ensure data consistency.

  • DirtySyncContext, SyncSdnControllerDataMsg/Reply extensions
  • SdnControllerFactory: getSyncChain/isSyncEnabled/isDryRun
  • SdnControllerBase: sync handler with ChainTask, events, alarm
  • DirtySyncTracker interface + impl for dirty marking
  • SdnControllerManagerImpl: dirty marking on NIC op failures
  • VmHostChangedExtensionPoint + VmHostChangeReason enum
  • VmInstanceBase: fire VmHostChanged on host binding changes

Resolves: ZCF-2133
Change-Id: I969632a01d77d8d63249f5c25230462d32f6f58e

sync from gitlab !9647

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 16, 2026

Walkthrough

新增脏同步追踪接口与线程安全实现,扩展同步流程以支持干运行、同步统计与连续失败告警;在管理器与 ping 监控中集成脏同步标记与基于连续成功 ping 的自动触发同步;新增事件类型与回复字段以承载同步元数据。

Changes

Cohort / File(s) Summary
脏同步追踪
plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java, plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java
新增公共接口 DirtySyncTracker(markNeedSync/needsSync/clearNeedSync)及其线程安全内存实现(ConcurrentHashMap)。
同步流程与工厂配置
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java, plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
在处理 SyncSdnControllerDataMsg 时加入 sync 启用检查、dryRun 注入、SYNC_STATS 读取与统计回填;增加工厂默认方法 isSyncEnabled/isDryRun/getSyncFailAlarmThreshold/getSyncPingSuccessThreshold
事件定义与流参数
plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java, plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java, plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java
新增同步相关事件常量与 JSON 模式数据类(完成/失败/告警/diffOverThreshold),为回复添加 toCreate/toDelete/toUpdate/failed/dryRun 字段及访问器。
管理器与 ping 监控集成
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java, plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java
在操作失败处标记需同步(markNeedSync);Ping 跟踪维护连续成功计数,达到阈值时 clearNeedSync 并发送同步请求,失败时重新标记。

Sequence Diagram(s)

sequenceDiagram
    participant PingTracker as Ping追踪器
    participant DirtySync as 脏同步追踪器
    participant SdnBase as SdnControllerBase
    participant Factory as 控制器工厂
    participant EventBus as 事件总线

    PingTracker->>DirtySync: check needsSync(controllerUuid)
    alt 控制器标记为需同步且成功ping计数达阈值
        PingTracker->>DirtySync: clearNeedSync(controllerUuid)
        PingTracker->>SdnBase: Send SyncSdnControllerDataMsg(controllerUuid)
        SdnBase->>Factory: isDryRun()/isSyncEnabled()/getSyncFailAlarmThreshold()
        SdnBase->>SdnBase: 执行同步 FlowChain (注入 SYNC_DRY_RUN)
        alt 同步成功
            SdnBase->>EventBus: 发布 SDN_CONTROLLER_SYNC_COMPLETED(统计, dryRun, scope)
        else 同步失败
            SdnBase->>SdnBase: 增加 consecutiveFailCount
            SdnBase->>EventBus: 发布 SDN_CONTROLLER_SYNC_FAILED(错误详情, consecutiveFailCount)
            alt 告警条件满足
                SdnBase->>EventBus: 发布 SDN_CONTROLLER_SYNC_ALARM(consecutiveFailCount, threshold)
            end
            SdnBase->>DirtySync: markNeedSync(controllerUuid)
        end
    else ping 失败或未达阈值
        PingTracker->>DirtySync: (在失败路径)markNeedSync(controllerUuid)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 我是小兔子来巡视,
脏同步旗帜轻轻立,
ping 连续算阈值,消息悄发送,
成功回报数统计,失败触发警示,
我跳着庆祝新变化,咕噜咕噜喜! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title follows the required format '[scope]: ' and is within the 72 character limit.
Description check ✅ Passed The description is directly related to the changeset, describing the implementation of Cloud-ZNS periodic synchronization.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/shixin.ruan/shixin.ruan-zcf-2133@@2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java (1)

137-138: ⚠️ Potential issue | 🟠 Major

SdnControllerBase.handleMessage() 缺少 @MessageSafe 注解,新增的消息分支无法正确处理异常。

验证结果显示,handleMessage(SdnControllerMessage msg) 方法(第 112 行)未标注 @MessageSafe。这意味着新增的 SyncSdnControllerDataMsg 分支(第 137-138 行)及现有的所有消息处理分支中抛出的异常都无法被 MessageSafeAspect.aj 自动转换为 reply error。根据 ZStack 设计,缺少 @MessageSafe 注解的 message handler 异常会导致消息总线丢失应答,造成调用方永久挂起。需添加 @MessageSafe(来自 org.zstack.core.cloudbus.MessageSafe)注解在 handleMessage 方法上,确保所有消息分支的异常都能正确转换为 reply error。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`
around lines 137 - 138, The handleMessage(SdnControllerMessage msg) method lacks
the `@MessageSafe` annotation, so exceptions thrown by any branch (including the
new SyncSdnControllerDataMsg branch handled in the added case) won’t be
converted to reply errors by MessageSafeAspect.aj; fix this by adding the
org.zstack.core.cloudbus.MessageSafe `@MessageSafe` annotation to the
handleMessage(SdnControllerMessage msg) method declaration so all exceptions in
methods like handle((SyncSdnControllerDataMsg) msg) are caught and transformed
into reply errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 329-341: The deriveHostChangeReason method currently maps any
unhandled VmInstanceStateEvent to VmHostChangeReason.START which mislabels
migrate/host-switch flows; update the logic in
deriveHostChangeReason(VmInstanceStateEvent) to explicitly handle
migration-related events (or add a new VmHostChangeReason value like
MIGRATE/HOST_CHANGE) instead of falling back to START, and remove the broad
default-to-START behavior (or return a distinct UNKNOWN/null and update callers
of deriveHostChangeReason to accept or be provided an explicit reason),
referencing the deriveHostChangeReason method, VmInstanceStateEvent enum values,
and VmHostChangeReason so callers such as doMigrateVm/checkState can correctly
pass or interpret the migration/host-change reason.
- Around line 411-419: The VmHostChangedExtensionPoint is only fired when host
changes go through changeVmStateInDb(), but other code paths in VmInstanceBase
directly modify hostUuid (e.g., the HA pre-start clear logic and the
ChangeVmMetaDataMsg handling), so unify host-binding updates by extracting a
single helper (e.g., updateHostBindingAndNotify(oldHostUuid, newHostUuid,
VmHostChangeReason)) and replace direct hostUuid writes with calls to that
helper, or alternatively add the same VmHostChangedExtensionPoint notification
in each branch that currently updates hostUuid directly (reference
VmHostChangedExtensionPoint, changeVmStateInDb(), hostUuid, the HA pre-start
clear and ChangeVmMetaDataMsg handling).

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java`:
- Line 16: DirtySyncTrackerImpl is not registered as a Spring bean, causing
UnsatisfiedDependencyException when SdnControllerManagerImpl autowires
DirtySyncTracker; fix by annotating the implementation class
DirtySyncTrackerImpl with `@Component` (and add the corresponding import
org.springframework.stereotype.Component) so Spring can discover and register
the bean for the DirtySyncTracker interface used by SdnControllerManagerImpl
(which relies on `@Autowired`).

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/header/DirtySyncContext.java`:
- Around line 56-62: getDirtyResourceUuids() currently returns the internal Set
directly and setDirtyResourceUuids(...) assigns the incoming Set reference,
allowing external mutation and allowing nulls that can cause NPEs in merge();
fix by making defensive copies and null-safe assignments: in
getDirtyResourceUuids() return either an unmodifiable copy or a new HashSet copy
of the internal field, and in setDirtyResourceUuids(Set<String> s) assign
this.dirtyResourceUuids = (s == null ? new HashSet<>() : new HashSet<>(s));
ensure any other code (e.g., merge()) relies on the non-null internal set.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`:
- Around line 270-278: The alarm currently triggers on every sync when failCount
>= alarmThreshold, causing repeated alarms; change the trigger to fire only once
when the threshold is first crossed (e.g. use failCount == alarmThreshold
instead of >=) so that
SdnControllerCanonicalEvents.SDN_CONTROLLER_SYNC_ALARM_PATH is fired only on the
first exceedance (evtf.fire with SdnControllerSyncAlarmData) and subsequent
failures do not repeatedly log until the counter is cleared on success.
- Around line 175-176: The FlowChain data map mixes enum keys and string keys
(e.g., SYNC_CONTEXT.toString(), SYNC_DRY_RUN.toString(), SYNC_STATS.toString())
while other places use enum constants (e.g., SDN_CONTROLLER_UUID,
SDN_CONTROLLER_USERNAME); remove all .toString() uses and put the enum constants
themselves into flowChain.getData() (e.g., use SYNC_CONTEXT, SYNC_DRY_RUN,
SYNC_STATS directly) and update any corresponding getters/consumers to use the
enum keys consistently so all reads/writes to FlowChain.getData() use the same
enum objects (check SdnControllerBase and any code that reads these keys).

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java`:
- Around line 30-32: SdnControllerFactory defines default methods
isSyncEnabled() and isDryRun() returning false, but SugonSdnControllerFactory
and H3cVcfcSdnControllerFactory do not override them, silently disabling
sync/dry-run; update each concrete factory (SugonSdnControllerFactory and
H3cVcfcSdnControllerFactory) to explicitly override isSyncEnabled() and
isDryRun() with the intended boolean values for that vendor (or compute them
from config), and ensure any callers use the overridden methods rather than
assuming the defaults.

---

Outside diff comments:
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`:
- Around line 137-138: The handleMessage(SdnControllerMessage msg) method lacks
the `@MessageSafe` annotation, so exceptions thrown by any branch (including the
new SyncSdnControllerDataMsg branch handled in the added case) won’t be
converted to reply errors by MessageSafeAspect.aj; fix this by adding the
org.zstack.core.cloudbus.MessageSafe `@MessageSafe` annotation to the
handleMessage(SdnControllerMessage msg) method declaration so all exceptions in
methods like handle((SyncSdnControllerDataMsg) msg) are caught and transformed
into reply errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 7749b611-9df5-4384-a1a2-7f897c02c7f1

📥 Commits

Reviewing files that changed from the base of the PR and between fb1a08b and 1afd19d.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/sdnController.xml is excluded by !**/*.xml
📒 Files selected for processing (13)
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • header/src/main/java/org/zstack/header/vm/VmHostChangeReason.java
  • header/src/main/java/org/zstack/header/vm/VmHostChangedExtensionPoint.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/DirtySyncContext.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataMsg.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java

Comment on lines +329 to +341
private VmHostChangeReason deriveHostChangeReason(VmInstanceStateEvent stateEvent) {
switch (stateEvent) {
case running:
return VmHostChangeReason.START;
case stopped:
return VmHostChangeReason.STOP;
case destroying:
case destroyed:
return VmHostChangeReason.DELETE;
default:
return VmHostChangeReason.START;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

不要把所有未覆盖场景都默认归类成 START

迁移完成时这里也会返回 STARTdoMigrateVm() 在 Line 7046 之后会走 checkState(),最终以 VmInstanceStateEvent.running + 新 hostUuid 回到这里。这样扩展点收到的是“启动”而不是“迁移/宿主切换”,下游如果按 reason 选择同步逻辑会走错分支。建议把这些场景显式建模,或者直接由调用方传入 reason,避免 default -> START

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
329 - 341, The deriveHostChangeReason method currently maps any unhandled
VmInstanceStateEvent to VmHostChangeReason.START which mislabels
migrate/host-switch flows; update the logic in
deriveHostChangeReason(VmInstanceStateEvent) to explicitly handle
migration-related events (or add a new VmHostChangeReason value like
MIGRATE/HOST_CHANGE) instead of falling back to START, and remove the broad
default-to-START behavior (or return a distinct UNKNOWN/null and update callers
of deriveHostChangeReason to accept or be provided an explicit reason),
referencing the deriveHostChangeReason method, VmInstanceStateEvent enum values,
and VmHostChangeReason so callers such as doMigrateVm/checkState can correctly
pass or interpret the migration/host-change reason.

Comment on lines +411 to +419
// Fire VmHostChangedExtensionPoint if host binding changed
String oldHostUuid = oldHostHolder[0];
String newHostUuid = self.getHostUuid();
if (!Objects.equals(oldHostUuid, newHostUuid)) {
VmHostChangeReason reason = deriveHostChangeReason(stateEvent);
VmInstanceInventory inv = getSelfInventory();
CollectionUtils.safeForEach(pluginRgty.getExtensionList(VmHostChangedExtensionPoint.class),
ext -> ext.vmHostChanged(inv, oldHostUuid, newHostUuid, reason));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

这个回调目前没有覆盖本类里所有 hostUuid 变更路径。

现在只有走 changeVmStateInDb() 的 host 变化才会触发扩展点,但本类还有直接改 hostUuid 的分支,例如 Line 1036-1045 的 HA 启动前清 host,以及 Line 1959-1964 的 ChangeVmMetaDataMsg 分支。那些路径不会触发 VmHostChangedExtensionPoint,新加的同步逻辑会漏掉这部分宿主绑定变化。建议把 host 绑定更新统一收敛到一个 helper,或者在这些直接写库的分支补发同样的通知。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
411 - 419, The VmHostChangedExtensionPoint is only fired when host changes go
through changeVmStateInDb(), but other code paths in VmInstanceBase directly
modify hostUuid (e.g., the HA pre-start clear logic and the ChangeVmMetaDataMsg
handling), so unify host-binding updates by extracting a single helper (e.g.,
updateHostBindingAndNotify(oldHostUuid, newHostUuid, VmHostChangeReason)) and
replace direct hostUuid writes with calls to that helper, or alternatively add
the same VmHostChangedExtensionPoint notification in each branch that currently
updates hostUuid directly (reference VmHostChangedExtensionPoint,
changeVmStateInDb(), hostUuid, the HA pre-start clear and ChangeVmMetaDataMsg
handling).

* Dirty state is ephemeral — lost on MN restart, which is acceptable because
* the periodic timer provides eventual consistency as a fallback.
*/
public class DirtySyncTrackerImpl implements DirtySyncTracker {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 查找 DirtySyncTracker 相关实现与注解 =="
rg -n --type=java -C2 'interface\s+DirtySyncTracker|class\s+DirtySyncTrackerImpl|@Component|@Service|@Configuration|@Bean'

echo
echo "== 查找 XML 中是否有 DirtySyncTracker/DirtySyncTrackerImpl Bean 注册 =="
rg -n -C2 'DirtySyncTracker|DirtySyncTrackerImpl|<bean' --iglob '*.xml'

echo
echo "== 查找是否存在其他 DirtySyncTracker 实现 =="
rg -n --type=java -C2 'class\s+\w+\s+implements\s+DirtySyncTracker'

Repository: MatheMatrix/zstack

Length of output: 50375


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 检查 DirtySyncTrackerImpl 完整内容(前30行)=="
head -30 plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java

echo
echo "== 检查 SdnControllerManagerImpl 的 `@Autowired` 位置 =="
sed -n '65,75p' plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java

echo
echo "== 查找 sdnController 目录下是否有配置文件 =="
find plugin/sdnController -name "*.xml" -o -name "*.properties" | head -20

Repository: MatheMatrix/zstack

Length of output: 1753


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 查找 sdnController 模块中的 `@Configuration` 类 =="
rg -n --type=java '@Configuration' plugin/sdnController/

echo
echo "== 查找 sdnController 模块中的 `@SpringBootApplication` 或其他配置入口 =="
rg -n --type=java '@SpringBootApplication|@EnableAutoConfiguration|ComponentScan' plugin/sdnController/

echo
echo "== 查找 SdnControllerManagerImpl 类声明及其注解 =="
sed -n '1,30p' plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java

Repository: MatheMatrix/zstack

Length of output: 106


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 检查 SdnControllerManagerImpl 类声明(前50行)=="
head -50 plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java

echo
echo "== 查找 sdnController 模块中所有 `@Component/`@Service/@Aspect 注解的类 =="
rg -n --type=java '@Component|@Service|@Aspect' plugin/sdnController/ -l

echo
echo "== 检查是否存在 Bean 工厂方法创建 DirtySyncTrackerImpl =="
rg -n --type=java '@Bean.*DirtySyncTracker|new DirtySyncTrackerImpl' plugin/sdnController/

Repository: MatheMatrix/zstack

Length of output: 2544


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 查找 AbstractService 的定义 =="
rg -n --type=java 'class AbstractService' --max-count=5

echo
echo "== 检查 AbstractService 中的注解 =="
rg -B5 'class AbstractService' --type=java | head -20

echo
echo "== 查找所有实现 DirtySyncTracker 的类 =="
rg -n --type=java 'implements DirtySyncTracker'

echo
echo "== 检查 DirtySyncTrackerImpl 是否在测试或其他模块中被显式注册 =="
rg -n 'DirtySyncTrackerImpl' --type=java

Repository: MatheMatrix/zstack

Length of output: 1258


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 最终确认: 检查 DirtySyncTrackerImpl 完整文件内容 =="
cat plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java | head -20

echo
echo "== 检查使用 DirtySyncTracker 注入的其他类 =="
rg -n '@Autowired.*DirtySyncTracker|@Resource.*DirtySyncTracker' --type=java

Repository: MatheMatrix/zstack

Length of output: 897


添加 @Component 注解到 DirtySyncTrackerImpl 以完成 Spring Bean 注册。

SdnControllerManagerImpl 在第 69-70 行通过 @Autowired 注入 DirtySyncTracker 接口,但当前 DirtySyncTrackerImpl 缺少任何 Spring Bean 注册注解(@Component@Service 等)。由于这是该接口的唯一实现,且找不到其他显式的 Bean 注册机制(XML 配置、@Configuration 类或 @Bean 工厂方法),应用启动时将抛出 UnsatisfiedDependencyException

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java`
at line 16, DirtySyncTrackerImpl is not registered as a Spring bean, causing
UnsatisfiedDependencyException when SdnControllerManagerImpl autowires
DirtySyncTracker; fix by annotating the implementation class
DirtySyncTrackerImpl with `@Component` (and add the corresponding import
org.springframework.stereotype.Component) so Spring can discover and register
the bean for the DirtySyncTracker interface used by SdnControllerManagerImpl
(which relies on `@Autowired`).

Comment on lines +56 to +62
public Set<String> getDirtyResourceUuids() {
return dirtyResourceUuids;
}

public void setDirtyResourceUuids(Set<String> dirtyResourceUuids) {
this.dirtyResourceUuids = dirtyResourceUuids;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

避免暴露和接收可变/可空集合,防止状态漂移与 NPE。

Line 56-62 当前直接返回并保存内部 Set 引用,外部可绕过 merge() 直接修改状态;且若传入 null,后续在 Line 43/77 会触发空指针风险。

建议修复
 import java.io.Serializable;
+import java.util.Collections;
 import java.util.HashSet;
 import java.util.Set;
@@
     public Set<String> getDirtyResourceUuids() {
-        return dirtyResourceUuids;
+        return Collections.unmodifiableSet(dirtyResourceUuids);
     }
 
     public void setDirtyResourceUuids(Set<String> dirtyResourceUuids) {
-        this.dirtyResourceUuids = dirtyResourceUuids;
+        this.dirtyResourceUuids = dirtyResourceUuids == null
+                ? new HashSet<>()
+                : new HashSet<>(dirtyResourceUuids);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public Set<String> getDirtyResourceUuids() {
return dirtyResourceUuids;
}
public void setDirtyResourceUuids(Set<String> dirtyResourceUuids) {
this.dirtyResourceUuids = dirtyResourceUuids;
}
public Set<String> getDirtyResourceUuids() {
return Collections.unmodifiableSet(dirtyResourceUuids);
}
public void setDirtyResourceUuids(Set<String> dirtyResourceUuids) {
this.dirtyResourceUuids = dirtyResourceUuids == null
? new HashSet<>()
: new HashSet<>(dirtyResourceUuids);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/header/DirtySyncContext.java`
around lines 56 - 62, getDirtyResourceUuids() currently returns the internal Set
directly and setDirtyResourceUuids(...) assigns the incoming Set reference,
allowing external mutation and allowing nulls that can cause NPEs in merge();
fix by making defensive copies and null-safe assignments: in
getDirtyResourceUuids() return either an unmodifiable copy or a new HashSet copy
of the internal field, and in setDirtyResourceUuids(Set<String> s) assign
this.dirtyResourceUuids = (s == null ? new HashSet<>() : new HashSet<>(s));
ensure any other code (e.g., merge()) relies on the non-null internal set.

Comment on lines +270 to +278
if (failCount >= alarmThreshold) {
logger.error(String.format("[sync-alarm] controller[uuid:%s] has %d consecutive sync failures (threshold=%d)",
self.getUuid(), failCount, alarmThreshold));
SdnControllerCanonicalEvents.SdnControllerSyncAlarmData d =
new SdnControllerCanonicalEvents.SdnControllerSyncAlarmData();
d.setSdnControllerUuid(self.getUuid());
d.setConsecutiveFailCount(failCount);
d.setAlarmThreshold(alarmThreshold);
evtf.fire(SdnControllerCanonicalEvents.SDN_CONTROLLER_SYNC_ALARM_PATH, d);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

告警会在阈值之后每次失败都重复触发。

这里用 >=,意味着一旦达到阈值,后续每个同步周期都会再次 fire SDN_CONTROLLER_SYNC_ALARM 并打 error log,直到 Line 186 成功后才清零。这样很容易把“阈值告警”变成告警风暴。

🔕 建议至少只在首次越过阈值时触发一次
-        if (failCount >= alarmThreshold) {
+        if (failCount == alarmThreshold) {
             logger.error(String.format("[sync-alarm] controller[uuid:%s] has %d consecutive sync failures (threshold=%d)",
                     self.getUuid(), failCount, alarmThreshold));
             SdnControllerCanonicalEvents.SdnControllerSyncAlarmData d =
                     new SdnControllerCanonicalEvents.SdnControllerSyncAlarmData();
             d.setSdnControllerUuid(self.getUuid());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`
around lines 270 - 278, The alarm currently triggers on every sync when
failCount >= alarmThreshold, causing repeated alarms; change the trigger to fire
only once when the threshold is first crossed (e.g. use failCount ==
alarmThreshold instead of >=) so that
SdnControllerCanonicalEvents.SDN_CONTROLLER_SYNC_ALARM_PATH is fired only on the
first exceedance (evtf.fire with SdnControllerSyncAlarmData) and subsequent
failures do not repeatedly log until the counter is cleared on success.

Comment on lines +30 to +32
default boolean isSyncEnabled() { return false; }

default boolean isDryRun() { return false; }
Copy link
Copy Markdown

@coderabbitai coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== 查找 SdnControllerFactory 实现类 =="
rg -nP --type=java -C2 'class\s+\w+.*implements\s+[^{]*\bSdnControllerFactory\b'

echo
echo "== 查找 isSyncEnabled 覆写 =="
rg -nP --type=java -C2 '\bboolean\s+isSyncEnabled\s*\('

echo
echo "== 查找 isDryRun 覆写 =="
rg -nP --type=java -C2 '\bboolean\s+isDryRun\s*\('

Repository: MatheMatrix/zstack

Length of output: 8881


确认目标厂商实现需覆写 isSyncEnabled()

当前 SdnControllerFactory 的两个已识别实现类(SugonSdnControllerFactoryH3cVcfcSdnControllerFactory)均未覆写此方法,将采用默认的 false 返回值,导致同步功能被静默禁用。建议各实现类明确覆写 isSyncEnabled()isDryRun() 方法,避免意外行为。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java`
around lines 30 - 32, SdnControllerFactory defines default methods
isSyncEnabled() and isDryRun() returning false, but SugonSdnControllerFactory
and H3cVcfcSdnControllerFactory do not override them, silently disabling
sync/dry-run; update each concrete factory (SugonSdnControllerFactory and
H3cVcfcSdnControllerFactory) to explicitly override isSyncEnabled() and
isDryRun() with the intended boolean values for that vendor (or compute them
from config), and ensure any callers use the overridden methods rather than
assuming the defaults.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from shixin.ruan:

这两个类原来有sync功能吗?如果没有这个这个功能,本次修改也不需要增加功能

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "== Check SugonSdnControllerFactory for sync-related code =="
rg -nP --type=java -C3 'getSyncChain|isSyncEnabled|isDryRun|[Ss]ync' --iglob '*SugonSdnControllerFactory*'

echo
echo "== Check H3cVcfcSdnControllerFactory for sync-related code =="
rg -nP --type=java -C3 'getSyncChain|isSyncEnabled|isDryRun|[Ss]ync' --iglob '*H3cVcfcSdnControllerFactory*'

echo
echo "== Find which factory DOES override getSyncChain =="
rg -nP --type=java -C3 'getSyncChain\s*\('

Length of output: 4702


@MatheMatrix,shixin.ruan 的判断完全正确。

脚本确认 SugonSdnControllerFactoryH3cVcfcSdnControllerFactory 均无任何同步相关代码,也没有任何工厂类覆写了 getSyncChain()。这两个方法的默认值 false 正是用于表示"该厂商不支持同步",是合理的设计,无需在这两个实现类中显式覆写。

我之前的 review 评论有误,抱歉造成困扰,可以忽略该建议。

(∩`-´)⊃━☆゚.*・。゚

   (\_/)
  ( •_•)
  / >🙏  my bad!

✏️ Learnings added
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 3777
File: plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java:30-32
Timestamp: 2026-04-17T02:23:21.142Z
Learning: In the ZStack SDN controller plugin, `SdnControllerFactory.isSyncEnabled()` and `isDryRun()` are intentional default methods returning `false`. Concrete factory implementations (e.g., SugonSdnControllerFactory, H3cVcfcSdnControllerFactory) that do NOT have sync functionality should NOT override these methods — the default `false` correctly signals that sync is disabled for that vendor. Only factories that actually implement the sync chain need to override these methods.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnController.java:568-570
Timestamp: 2025-08-13T02:40:34.873Z
Learning: 在ZStack的H3C SDN控制器实现中,getDeleteH3cNetworksCmd()方法需要保持private访问修饰符,因为它是非overwrite方法,改为protected可能导致H3cVcfcSdnController和H3cVcfcV2SdnController之间的意外继承行为。

Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnController.java:560-562
Timestamp: 2025-08-13T02:41:52.024Z
Learning: 在ZStack的H3C SDN控制器实现中,getCreateH3cNetworksRspClass()方法需要保持private访问修饰符,因为它是非overwrite方法,改为protected可能导致H3cVcfcSdnController和H3cVcfcV2SdnController之间的意外继承行为。

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnController.java:556-558
Timestamp: 2025-08-13T02:42:07.774Z
Learning: 在ZStack的H3C SDN控制器实现中,getCreateH3cNetworksCmd()方法需要保持private访问修饰符,因为它是非overwrite方法,改为protected可能导致H3cVcfcSdnController和H3cVcfcV2SdnController之间的意外继承行为。

Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnController.java:552-554
Timestamp: 2025-08-12T03:30:52.097Z
Learning: 在ZStack的H3C SDN控制器实现中,getNetworkCmd()方法在H3cVcfcSdnController和H3cVcfcV2SdnController中各自有独立的实现,需要保持private访问修饰符以防止意外的继承/重写行为,不能为了保持一致性而改为protected。

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 0
File: :0-0
Timestamp: 2026-04-02T01:40:40.916Z
Learning: 在 ZStack 中,资源创建成功路径(如 `handle(APIAddSdnControllerMsg)` 中)对 `tagMgr.createTagsFromAPICreateMessage(msg, vo.getUuid(), SdnControllerVO.class.getSimpleName())` 和 `dbf.findByUuid(vo.getUuid(), SdnControllerVO.class)` 等后续操作,不需要用 try/catch 来防御"资源不存在"异常。ZStack 的设计哲学是:在资源创建流程中,通过生命周期管理和数据库外键约束来保证一致性,一般假设此类情况不会发生;若需要解决框架层面的一致性问题,应通过声明式 API 等机制统一处理,而不是在每个操作中无休止地判断异常。(plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java)

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 2419
File: plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnController.java:540-542
Timestamp: 2025-08-12T03:30:08.976Z
Learning: 在ZStack的H3C VCFC SDN控制器实现中,getGetH3cTenantsRspClass()方法在H3cVcfcSdnController和H3cVcfcV2SdnController中各自有独立的实现,因此需要保持private访问修饰符以防止意外的继承/重写行为,不能为了保持一致性而改为protected。

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 3563
File: plugin/kvm/src/main/java/org/zstack/kvm/efi/KvmSecureBootManager.java:328-334
Timestamp: 2026-03-23T09:33:01.215Z
Learning: In ZStack message handlers, ensure your `handleMessage` method is annotated with `MessageSafe` (`org.zstack.core.cloudbus.MessageSafe`). This enables `MessageSafeAspect.aj` to automatically convert exceptions thrown by the handler into appropriate reply errors. Note that `ExceptionSafeAspect.aj` only applies to methods annotated with `ExceptionSafe` and will only log/swallow exceptions without sending a reply—so missing `MessageSafe` can break reply/error semantics for the message bus.

@MatheMatrix MatheMatrix force-pushed the sync/shixin.ruan/shixin.ruan-zcf-2133@@2 branch from 1afd19d to c645d3d Compare April 17, 2026 10:48
- Add sync reconciliation flow (segments, ports, DHCP)
- DirtySyncTracker for marking controllers needing sync
- Sync stats, dry-run mode, threshold-based auto-degrade
- Canonical events for sync lifecycle

Resolves: ZCF-2133

Change-Id: I969632a01d77d8d63249f5c25230462d32f6f58e
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java (1)

215-232: 建议使用接口替代反射以提高类型安全性。

当前使用反射读取 SYNC_STATS 对象的方法,虽然灵活但存在以下风险:

  • 如果统计对象的方法签名变更,运行时会静默失败
  • 缺少编译时类型检查

建议定义一个 SyncStats 接口,要求所有厂商实现该接口:

♻️ 建议的接口定义
public interface SyncStats {
    int getToCreateCount();
    int getToDeleteCount();
    int getToUpdateCount();
    int getFailedCount();
}

然后在 populateSyncReply 中使用:

-        Object statsObj = data.get(SYNC_STATS);
-        if (statsObj == null) return;
-        try {
-            java.lang.reflect.Method getToCreateCount = statsObj.getClass().getMethod("getToCreateCount");
-            // ... reflection calls
-        } catch (Exception e) {
-            logger.warn(String.format("failed to read sync stats: %s", e.getMessage()));
-        }
+        Object statsObj = data.get(SYNC_STATS);
+        if (statsObj instanceof SyncStats) {
+            SyncStats stats = (SyncStats) statsObj;
+            reply.setToCreateCount(stats.getToCreateCount());
+            reply.setToDeleteCount(stats.getToDeleteCount());
+            reply.setToUpdateCount(stats.getToUpdateCount());
+            reply.setFailedCount(stats.getFailedCount());
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`
around lines 215 - 232, The method populateSyncReply currently uses reflection
to read SYNC_STATS which is fragile; define a SyncStats interface with methods
getToCreateCount, getToDeleteCount, getToUpdateCount, and getFailedCount and
change populateSyncReply to cast data.get(SYNC_STATS) to SyncStats (check for
null/instanceof) and call those methods directly to populate
reply.setToCreateCount, setToDeleteCount, setToUpdateCount and setFailedCount;
remove the reflection block and its catch, ensuring compile-time type safety and
clearer error handling.
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java (1)

110-125: 计数器增长策略存在设计不匹配。

当前实现中,无论 needsSync 是否为 true,每次成功的 ping 都会递增 consecutivePingSuccessCount。这导致:

  • needsSync=false 时,计数器在内存中持续增长,造成不必要的开销
  • markNeedSync 被调用后,计数器可能已经远大于阈值,使得下一次 ping 立即触发同步,而不是等待新的"连续成功 ping"

这与代码注释中"count consecutive successes and trigger sync if needed"的语义不符。建议仅在 needsSync=true 时递增计数器,或在 needsSync 状态改变时重置计数器,确保"连续"的准确含义。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`
around lines 110 - 125, The consecutive success counter is incremented
unconditionally which breaks the "consecutive" semantics; change the logic in
the SdnControllerPingTracker block that references consecutivePingSuccessCount
and dirtySyncTracker.needsSync so that you only increment the AtomicInteger when
dirtySyncTracker.needsSync(resourceUuid) returns true; if needsSync is false,
remove or reset the entry in consecutivePingSuccessCount to avoid unbounded
growth; additionally ensure any method that transitions a controller to needing
sync (e.g., markNeedSync or equivalent in dirtySyncTracker) resets the counter
for that resourceUuid so the next sync waits for freshly observed consecutive
successes before calling triggerSync(resourceUuid).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java`:
- Around line 215-232: The method populateSyncReply currently uses reflection to
read SYNC_STATS which is fragile; define a SyncStats interface with methods
getToCreateCount, getToDeleteCount, getToUpdateCount, and getFailedCount and
change populateSyncReply to cast data.get(SYNC_STATS) to SyncStats (check for
null/instanceof) and call those methods directly to populate
reply.setToCreateCount, setToDeleteCount, setToUpdateCount and setFailedCount;
remove the reflection block and its catch, ensuring compile-time type safety and
clearer error handling.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`:
- Around line 110-125: The consecutive success counter is incremented
unconditionally which breaks the "consecutive" semantics; change the logic in
the SdnControllerPingTracker block that references consecutivePingSuccessCount
and dirtySyncTracker.needsSync so that you only increment the AtomicInteger when
dirtySyncTracker.needsSync(resourceUuid) returns true; if needsSync is false,
remove or reset the entry in consecutivePingSuccessCount to avoid unbounded
growth; additionally ensure any method that transitions a controller to needing
sync (e.g., markNeedSync or equivalent in dirtySyncTracker) resets the counter
for that resourceUuid so the next sync waits for freshly observed consecutive
successes before calling triggerSync(resourceUuid).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 0bcea848-8e84-44fa-84c1-f8381f5a4cf5

📥 Commits

Reviewing files that changed from the base of the PR and between 1afd19d and c645d3d.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/sdnController.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java
🚧 Files skipped from review as they are similar to previous changes (6)
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java

@MatheMatrix MatheMatrix force-pushed the sync/shixin.ruan/shixin.ruan-zcf-2133@@2 branch from c645d3d to 5e6d938 Compare April 17, 2026 10:57
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java (1)

53-53: 建议补齐 ownership 变更场景下的计数清理

consecutivePingSuccessCount 目前只在删除/失败/重连/触发同步后清理。若在 Line 155 的 untrackAll() 路径发生资源归属迁移,本节点可能残留不再追踪资源的计数项。建议在 trackOurs() 切换追踪集时同步清理,避免长期堆积陈旧状态。

Also applies to: 201-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`
at line 53, The map consecutivePingSuccessCount in SdnControllerPingTracker can
retain stale entries when ownership changes via trackOurs(); update trackOurs()
to clean up counts for resources that are no longer tracked (same approach as
untrackAll()/onDelete/onFail/reconnect/sync triggers): after computing the new
tracked set in trackOurs() remove any keys from consecutivePingSuccessCount that
are not present in the new tracked set (or rebuild the map for the new set) so
stale counters do not accumulate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java`:
- Around line 122-134: The event SdnControllerSyncDiffOverThresholdData and path
SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH are defined but never fired; either
remove these unused definitions or add a firing point: locate SdnControllerBase
(state sync logic) and when computing the sync difference compare against the
threshold, instantiate SdnControllerSyncDiffOverThresholdData with
sdnControllerUuid, totalDiff and threshold and call
evtf.fire(SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH, thatData) to publish the
event; if you choose removal, delete both the constant/path and the
SdnControllerSyncDiffOverThresholdData class to avoid dead API surface.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`:
- Around line 117-123: 当前逻辑先调用 dirtySyncTracker.needsSync(resourceUuid) 再调用
dirtySyncTracker.clearNeedSync(resourceUuid) 并忽略其返回值,存在竞态导致重复触发
triggerSync(resourceUuid)。修改为以 clearNeedSync(...) 的返回值作为原子门闩:在
factory.isSyncEnabled() 且 count >= factory.getSyncPingSuccessThreshold() 时,调用
dirtySyncTracker.clearNeedSync(resourceUuid);只有当该调用返回 true 时才调用
consecutivePingSuccessCount.remove(resourceUuid) 并触发
triggerSync(resourceUuid);否则不做任何操作,从而避免并发下重复发送 SyncSdnControllerDataMsg。

---

Nitpick comments:
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`:
- Line 53: The map consecutivePingSuccessCount in SdnControllerPingTracker can
retain stale entries when ownership changes via trackOurs(); update trackOurs()
to clean up counts for resources that are no longer tracked (same approach as
untrackAll()/onDelete/onFail/reconnect/sync triggers): after computing the new
tracked set in trackOurs() remove any keys from consecutivePingSuccessCount that
are not present in the new tracked set (or rebuild the map for the new set) so
stale counters do not accumulate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: bd587ab6-d3ac-41f6-935e-16bd59034ec7

📥 Commits

Reviewing files that changed from the base of the PR and between c645d3d and 5e6d938.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/sdnController.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java
✅ Files skipped from review due to trivial changes (3)
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTracker.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SyncSdnControllerDataReply.java
🚧 Files skipped from review as they are similar to previous changes (4)
  • plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerFlowDataParam.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/DirtySyncTrackerImpl.java
  • plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java

Comment on lines +122 to +134
@NeedJsonSchema
public static class SdnControllerSyncDiffOverThresholdData {
private String sdnControllerUuid;
private int totalDiff;
private int threshold;

public String getSdnControllerUuid() { return sdnControllerUuid; }
public void setSdnControllerUuid(String sdnControllerUuid) { this.sdnControllerUuid = sdnControllerUuid; }
public int getTotalDiff() { return totalDiff; }
public void setTotalDiff(int totalDiff) { this.totalDiff = totalDiff; }
public int getThreshold() { return threshold; }
public void setThreshold(int threshold) { this.threshold = threshold; }
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 目的:确认 SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH 是否被实际发布(evtf.fire)
# 预期:至少存在一处 fire(SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH, ...)

rg -n -C3 'SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH' plugin/sdnController/src/main/java/org/zstack/sdnController
rg -n -C3 'evtf\.fire\(' plugin/sdnController/src/main/java/org/zstack/sdnController
rg -n -C3 'diffOverThreshold|DiffOverThreshold|totalDiff|threshold' plugin/sdnController/src/main/java/org/zstack/sdnController

Repository: MatheMatrix/zstack

Length of output: 11938


🏁 Script executed:

# 搜索 SdnControllerSyncDiffOverThresholdData 是否被实例化或使用
rg -n 'new SdnControllerSyncDiffOverThresholdData|SdnControllerSyncDiffOverThresholdData' plugin/sdnController/src/

# 搜索所有的 evtf.fire 调用,查看是否有任何地方触发该事件
rg -n 'fire.*SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD' plugin/sdnController/src/

# 搜索是否在其他模块中有这个事件的引用
fd -t f . plugin/ --exec grep -l 'SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH' {} \;

Repository: MatheMatrix/zstack

Length of output: 332


删除未使用的事件定义,或添加相应的触发点。

当前代码定义了事件路径 SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH 和数据类 SdnControllerSyncDiffOverThresholdData,但在整个 sdnController 模块中未找到任何 evtf.fire() 调用来发布此事件。这将导致事件契约漂移(定义但永不触发)。建议:

  • 若此事件暂不需要,移除相关的定义;
  • 若需要,应在 SdnControllerBase.java 中检查状态同步逻辑,补充事件触发点(例如在同步差值超过阈值时调用 evtf.fire(SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH, ...))。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/header/SdnControllerCanonicalEvents.java`
around lines 122 - 134, The event SdnControllerSyncDiffOverThresholdData and
path SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH are defined but never fired;
either remove these unused definitions or add a firing point: locate
SdnControllerBase (state sync logic) and when computing the sync difference
compare against the threshold, instantiate
SdnControllerSyncDiffOverThresholdData with sdnControllerUuid, totalDiff and
threshold and call evtf.fire(SDN_CONTROLLER_SYNC_DIFF_OVER_THRESHOLD_PATH,
thatData) to publish the event; if you choose removal, delete both the
constant/path and the SdnControllerSyncDiffOverThresholdData class to avoid dead
API surface.

Comment on lines +117 to +123
if (factory.isSyncEnabled() && dirtySyncTracker.needsSync(resourceUuid)) {
int threshold = factory.getSyncPingSuccessThreshold();
if (count >= threshold) {
consecutivePingSuccessCount.remove(resourceUuid);
dirtySyncTracker.clearNeedSync(resourceUuid);
triggerSync(resourceUuid);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

clearNeedSync() 作为原子门闸,避免并发下重复触发全量同步

当前先 needsSync()clearNeedSync(),且忽略 clearNeedSync() 返回值,存在竞态窗口;并发回调下可能重复发送 SyncSdnControllerDataMsg。建议只在 clearNeedSync(...) == true 时触发同步。

建议修改
-            if (factory.isSyncEnabled() && dirtySyncTracker.needsSync(resourceUuid)) {
-                int threshold = factory.getSyncPingSuccessThreshold();
-                if (count >= threshold) {
-                    consecutivePingSuccessCount.remove(resourceUuid);
-                    dirtySyncTracker.clearNeedSync(resourceUuid);
-                    triggerSync(resourceUuid);
-                }
-            }
+            if (factory.isSyncEnabled()) {
+                int threshold = Math.max(1, factory.getSyncPingSuccessThreshold());
+                if (count >= threshold && dirtySyncTracker.clearNeedSync(resourceUuid)) {
+                    consecutivePingSuccessCount.remove(resourceUuid);
+                    triggerSync(resourceUuid);
+                }
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (factory.isSyncEnabled() && dirtySyncTracker.needsSync(resourceUuid)) {
int threshold = factory.getSyncPingSuccessThreshold();
if (count >= threshold) {
consecutivePingSuccessCount.remove(resourceUuid);
dirtySyncTracker.clearNeedSync(resourceUuid);
triggerSync(resourceUuid);
}
if (factory.isSyncEnabled()) {
int threshold = Math.max(1, factory.getSyncPingSuccessThreshold());
if (count >= threshold && dirtySyncTracker.clearNeedSync(resourceUuid)) {
consecutivePingSuccessCount.remove(resourceUuid);
triggerSync(resourceUuid);
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerPingTracker.java`
around lines 117 - 123, 当前逻辑先调用 dirtySyncTracker.needsSync(resourceUuid) 再调用
dirtySyncTracker.clearNeedSync(resourceUuid) 并忽略其返回值,存在竞态导致重复触发
triggerSync(resourceUuid)。修改为以 clearNeedSync(...) 的返回值作为原子门闩:在
factory.isSyncEnabled() 且 count >= factory.getSyncPingSuccessThreshold() 时,调用
dirtySyncTracker.clearNeedSync(resourceUuid);只有当该调用返回 true 时才调用
consecutivePingSuccessCount.remove(resourceUuid) 并触发
triggerSync(resourceUuid);否则不做任何操作,从而避免并发下重复发送 SyncSdnControllerDataMsg。

@MatheMatrix MatheMatrix deleted the sync/shixin.ruan/shixin.ruan-zcf-2133@@2 branch April 17, 2026 11:19
@MatheMatrix MatheMatrix restored the sync/shixin.ruan/shixin.ruan-zcf-2133@@2 branch April 17, 2026 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants