Skip to content

fix(trie): make TrieImpl.insert() idempotent for duplicate key-value#6696

Open
halibobo1205 wants to merge 3 commits intotronprotocol:developfrom
halibobo1205:fix/trie-insert-order-bug
Open

fix(trie): make TrieImpl.insert() idempotent for duplicate key-value#6696
halibobo1205 wants to merge 3 commits intotronprotocol:developfrom
halibobo1205:fix/trie-insert-order-bug

Conversation

@halibobo1205
Copy link
Copy Markdown
Collaborator

@halibobo1205 halibobo1205 commented Apr 20, 2026

Summary

Fix a correctness bug in TrieImpl.insert() that breaks Merkle Trie idempotency when the same key is put more than once (regardless of whether the value changes), and add a related optimization that avoids unnecessary recomputation on the same code path:

  1. Correctness (root-hash corruption). Reorder the condition checks in insert() so that commonPrefix.equals(k) is evaluated before commonPrefix.isEmpty(). The two conditions overlap only when both k and currentNodeKey are empty, which the recursion reaches exclusively on a duplicate put of an already-inserted key on a fully-split KV leaf. In that state, the old order incorrectly triggered the "no common prefix" branch, replacing KVNode("", v_old) with BranchNode{terminal: v_new}, thereby corrupting the trie structure and, in turn, the root hash. This happens whether or not the new value equals the old one.

  2. Optimization (avoid unnecessary recomputation). Short-circuit kvNodeSetValueOrNode() when the new byte[] value equals the existing one (by reference or byte content) to skip redundant dirty marking and downstream hash recomputation on the KV-leaf path. This is purely a performance optimization — it is not part of the correctness fix, and it does not by itself address the root-hash corruption; it only reduces cache churn. Branch-terminal writes and Node-valued writes are intentionally not short-circuited (parent nodes still need dirty propagation when a child is mutated in place).

Closes #6608.

What this fix really does

Restore the Merkle Trie invariant: the root hash must be a pure function of the {key → value} set, independent of insertion order. Before this fix, TrieImpl violated that definition — it only produced consistent roots in practice because callers fed it data in a deterministic order. After this fix, TrieImpl is a true Merkle Trie by definition.

When the latent defect is exercised

The defect lies in TrieImpl.insert() itself, not in any caller. Whether it has any observable effect depends on two independent conditions:

1. Is the buggy code path reachable?

  • ❌ Every key put at most once per trie → path not reachable, no effect.

  • ✅ Same key put ≥ 2 times (same or different value) → necessary but not sufficient. The buggy branch is reached only when the recursion lands on a node whose stored key and the incoming key both reduce to empty — i.e. the trie has evolved into a fully-split KV leaf at that position. This structural condition is rare in random workloads but not negligible, and is exactly what the regression sequences in testOrder reproduce.

    In practice, the only known production write path exercising TrieImpl.put() for account-state-root generation is AccountStateCallBack. Its trieEntryList is an undeduplicated ArrayList: if multiple transactions in a block touch the same account, trie.put(addr, …) is invoked repeatedly with the same key, so the precondition for triggering the path is satisfied during normal operation.

2. Does it produce a divergent root?

  • ❌ All callers feed the trie in the same order → all of them compute the same (non-canonical) root. No divergence, no observable problem.

  • ✅ Two callers feed the trie in different orders for the same {key → value} set → they compute different roots.

    On-chain block processing is deterministic: every node executes the same transactions in the same order, so condition 2 is never met in production. It is only met in synthetic scenarios — e.g. the unit test, which deliberately shuffles the input to expose the defect.

Net effect on a running chain: the bug is exercised, but its symptom is masked by deterministic execution order. All nodes converge on the same root. This is why it has gone unnoticed and why it is not a consensus issue — but it also means TrieImpl is not a true Merkle Trie: its root depends on insertion order rather than on the {key → value} set alone.

Consensus impact

None on mainnet and Nile. AccountStateCallBack is gated behind proposal 25 (ALLOW_ACCOUNT_STATE_ROOT), which is not activated on either network — the computed state root does not enter block headers and does not participate in consensus.

Even if proposal 25 were activated, all honest nodes would still agree on the root with or without this fix, because transaction execution order within a block is deterministic and identical across nodes. The shuffle used in the unit test is a synthetic construct to surface the latent defect; no such reordering occurs in production. In other words, on-chain TrieImpl happens to produce consistent roots today because the caller feeds it a deterministic sequence, not because the implementation satisfies the Merkle invariant.

So the fix is about correctness of the data structure itself, not about preventing a fork.

⚠️ Compatibility note for downstream fork chains

Any fork chain that has already activated proposal 25 in production has committed accountStateRoot values to block headers that were computed with the buggy TrieImpl. For those chains, this fix is not a transparent in-place upgrade:

  • Historical-block replay would recompute different roots than those stored in existing headers → validation failures.
  • New-block consensus would split between pre- and post-upgrade nodes.

Such forks must coordinate the upgrade via a hard-fork activation height (or gate the new behavior behind a dedicated proposal). Mainnet and Nile are not affected since proposal 25 has never been activated there; they can take this fix as an ordinary release.

Release scope

  • Mainnet / Nile: safe to include in any upcoming release. No consensus implications.
  • Recommended (defense-in-depth): include this fix before any future release that could enable proposal 25, so that the data structure is correct by definition before the state root is ever committed to headers.

See #6608 for the full reproduction and affected code paths.

…puts

Reorder condition checks in insert() so that commonPrefix.equals(k) is
evaluated before commonPrefix.isEmpty(). When both k and currentNodeKey
are empty (which happens on a duplicate put of a fully-split key), the
old order incorrectly fired the "no common prefix" branch and replaced
KVNode("",v) with BranchNode{terminal:v}, corrupting the root hash.

Also short-circuit kvNodeSetValueOrNode() when the new value equals the
existing one (by reference or by byte content) to avoid unnecessary
dirty marking and downstream hash recomputation.

close tronprotocol#6608
Comment thread framework/src/main/java/org/tron/core/trie/TrieImpl.java
Comment thread framework/src/main/java/org/tron/core/trie/TrieImpl.java
Comment thread framework/src/test/java/org/tron/core/tire/TrieTest.java Outdated
Comment thread framework/src/test/java/org/tron/core/tire/TrieTest.java Outdated
Comment thread framework/src/test/java/org/tron/core/tire/TrieTest.java
@waynercheung
Copy link
Copy Markdown
Collaborator

waynercheung commented Apr 20, 2026

[MUST] The consensus-impact analysis is already documented in #6608, but per the repo’s consensus-change requirements, please mirror a short summary into the PR description itself and explicitly document:

  1. the target release / upgrade scope for this fix;
  2. that this fix must be included in any release intended to run before / when Proposal 25 (ALLOW_ACCOUNT_STATE_ROOT) may activate.

A short summary referencing #6608 is sufficient; no need to restate the full analysis.

Optional cleanup: since this PR already touches TrieTest.java, it would also be nice to remove the two pre-existing System.out.println lines at TrieTest.java:67-68 in this PR if convenient. If you prefer to keep this PR tightly scoped, a small follow-up cleanup PR is also fine.

@halibobo1205
Copy link
Copy Markdown
Collaborator Author

@waynercheung

  1. The PR description is updated.
  2. TrieTest.java:67-68 is cleaned up.

@halibobo1205 halibobo1205 added the topic:DB Database label Apr 21, 2026
@halibobo1205 halibobo1205 added this to the GreatVoyage-v4.8.2 milestone Apr 21, 2026
@lvs0075
Copy link
Copy Markdown
Collaborator

lvs0075 commented Apr 22, 2026

I have a question regarding the description: the current code does not affect the consensus. The modification is merely intended to ensure that the results of the random write are consistent.

@halibobo1205
Copy link
Copy Markdown
Collaborator Author

@lvs0075

You're right, and thanks for pointing this out — I'll reword the description.

To be precise: this fix does not affect consensus on mainnet or Nile. On-chain write order is deterministic, so all honest nodes compute the same root regardless of this bug. The shuffle in the unit test is a synthetic scenario to expose the defect; it does not correspond to anything that happens in production execution.

What this PR actually does is restore the Merkle trie invariant itself: the root must be a pure function of the {key → value} set, independent of insertion order. Without this fix, TrieImpl is not a true Merkle trie by definition — it just happens to produce consistent roots in practice because the caller feeds it a deterministic order.

One caveat worth flagging in the description (not a consensus issue on mainnet/Nile, but a compatibility note for downstream): any fork chain that has already activated proposal 25 in production has committed accountStateRoot values computed with the buggy TrieImpl. Those forks cannot hot-upgrade to this fix — they'd need a coordinated activation height, otherwise historical-block validation would fail, and new blocks would split between pre- and post-upgrade nodes. Mainnet / Nile are unaffected since proposal 25 is not activated.

I'll update the PR description to remove the misleading "must be merged before proposal 25 or nodes will fork" framing and replace it with the invariant-restoration framing plus the downstream warning.

Copy link
Copy Markdown
Collaborator

@waynercheung waynercheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic:DB Database

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: TrieImpl.insert() is not idempotent for duplicate key-value pairs

4 participants