fix(query): clarify condition resolution semantics for label queries by contrueCT · Pull Request #2994 · apache/hugegraph

contrueCT · 2026-04-13T11:15:30Z

Purpose of the PR

close [Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions #2992

ConditionQuery.condition() currently mixes several different meanings in one API, including:

no condition
conflicting conditions resolved to empty
a unique resolved value
a raw multi-value result
an exception for ambiguous resolved values

This PR keeps the legacy condition() behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-risk LABEL call sites to use the clearer semantics.

Main Changes

Add explicit condition-resolution APIs to ConditionQuery
- containsCondition(Object key)
- conditionValues(Object key)
- conditionValue(Object key)
Keep the legacy condition() method backward-compatible
Document the semantic differences between the legacy API and the new explicit APIs
Migrate LABEL-related high-risk callers to the new APIs in:
- graph/index transactions
- serializers
- traversers
- in-memory / hstore paths
Preserve the old behavior for non-LABEL legacy usages in this first step

Verifying these changes

Added and extended regression coverage for the new semantics:

QueryTest#testConditionWithoutLabel
QueryTest#testConditionWithEqAndIn
QueryTest#testConditionWithSingleInValues
QueryTest#testConditionWithConflictingEqAndIn
QueryTest#testConditionWithMultipleMatchedInValues

Added a targeted regression for the label-index fallback path:

VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedProperties

This test verifies:

a multi-label query can conservatively fall back and still match the indexed label
conflicting label conditions produce no matched indexes

Existing label-query regressions were also rechecked to ensure no behavior regression:

EdgeCoreTest#testQueryInEdgesOfVertexByLabels
EdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabels
EdgeCoreTest#testQueryInEdgesOfVertexBySortkey
VertexCoreTest#testQueryByJointLabels

Trivial rework / code cleanup without any test coverage. (No Need)
Already covered by existing tests, such as (please modify tests here).
Need tests and can be verified as follows:
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='QueryTest#testConditionWithoutLabel+testConditionWithEqAndIn+testConditionWithSingleInValues+testConditionWithConflictingEqAndIn+testConditionWithMultipleMatchedInValues' test
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='EdgeCoreTest#testQueryInEdgesOfVertexByLabels+testQueryInEdgesOfVertexByConflictingLabels+testQueryInEdgesOfVertexBySortkey' test
- mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='VertexCoreTest#testQueryByJointLabels+testCollectMatchedIndexesByJointLabelsWithIndexedProperties' test

Does this PR potentially affect the following parts?

Dependencies (add/update license info & regenerate_known_dependencies.sh)
Modify configurations
The public API
Other affects (typed here)
Nope

Documentation Status

Doc - TODO
Doc - Done
Doc - No Need

Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics. Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.

imbajin · 2026-04-18T20:08:59Z

+            return null;
+        }
+        return (Id) labels.iterator().next();
+    }


‼️ Code Duplication: uniqueLabel() duplicated in 4 files — consider consolidating

The uniqueLabel(ConditionQuery) helper is copy-pasted into 4 files with identical bodies:

GraphTransaction.java (private static)

GraphIndexTransaction.java (private static)

RamTable.java (private static)

HstoreStore.java (private instance method — inconsistent with the other 3)

I understand these callers need "return null when multiple labels exist" semantics (unlike conditionValue() which throws). Two suggestions:

Option A — Add this as a first-class method on ConditionQuery (e.g., uniqueConditionValue(Object key) or singleConditionValueOrNull(Object key)) to eliminate all 4 copies:

public <T> T uniqueConditionValue(Object key) { Set<Object> values = this.conditionValues(key); if (values.size() != 1) { return null; } @SuppressWarnings("unchecked") T value = (T) values.iterator().next(); return value; }

Option B — If you prefer not to expand the ConditionQuery API surface, at minimum consolidate to a static utility in one shared location (e.g., a package-visible helper in ConditionQuery or a QueryUtil).

Also note: the HstoreStore version is an instance method while the other 3 are static — this inconsistency should be fixed either way.

imbajin · 2026-04-18T20:09:00Z

            direction = Directions.OUT;
        }
-        Id label = cq.condition(HugeKeys.LABEL);
+        Id label = cq.conditionValue(HugeKeys.LABEL);


⚠️ Inconsistent migration: mixed conditionValue() vs uniqueLabel() for LABEL key

In this PR, some callers migrate to conditionValue(HugeKeys.LABEL) (serializers, traverser), while others use the new private uniqueLabel() helper (transactions, RamTable, HstoreStore). The two have different failure behavior for multi-label queries:

conditionValue() → throws IllegalStateException if multiple values remain

uniqueLabel() → returns null silently

Is this difference intentional per call site? If so, a brief comment at each call site explaining why that particular semantic was chosen would help future readers. Otherwise, consider unifying to a single approach — either all callers that need "single or nothing" use one API, and those that need "single or fail" use the other, with the distinction documented.

imbajin · 2026-04-18T20:09:01Z

        for (HugeKeys key : EdgeId.KEYS) {
-            Object value = cq.condition(key);
+            Object value = key == HugeKeys.LABEL ?
+                           cq.conditionValue(key) : cq.condition(key);


⚠️ Ternary for LABEL-only routing is fragile

This introduces a special-case ternary:

Object value = key == HugeKeys.LABEL ? cq.conditionValue(key) : cq.condition(key);

The iteration is over EdgeId.KEYS (OWNER_VERTEX, DIRECTION, LABEL, SORT_VALUES, OTHER_VERTEX), and only LABEL gets the new API — a fact that's not self-documenting. If another key eventually needs the same treatment, a reader must remember to add it here too.

Consider either:

Adding a brief inline comment explaining why only LABEL needs conditionValue() (e.g., "LABEL may have multiple IN conditions; other EdgeId keys are always single-valued"), or

Applying conditionValue() uniformly for all keys in this loop if the semantics are safe (since the other keys are guaranteed single-valued, conditionValue would behave identically to condition for them)

imbajin · 2026-04-18T20:09:03Z

+
+    private Set<Object> resolveConditionValues(List<Object> valuesEQ,
+                                               List<Object> valuesIN) {
+        boolean initialized = false;


⚠️ resolveConditionValues duplicates logic already in condition() — consider extracting the shared intersection

The new resolveConditionValues() method contains the same intersection logic as the existing condition() method. The only difference is condition() has extra fast-path returns and the final single-value check.

Since condition() now delegates collectConditionValues() to the new private method (good!), it could also delegate the intersection to resolveConditionValues() to fully eliminate the duplicated loop:

public <T> T condition(Object key) { List<Object> valuesEQ = InsertionOrderUtil.newList(); List<Object> valuesIN = InsertionOrderUtil.newList(); this.collectConditionValues(key, valuesEQ, valuesIN); if (valuesEQ.isEmpty() && valuesIN.isEmpty()) { return null; } // Keep legacy fast paths for backward compatibility if (valuesEQ.size() == 1 && valuesIN.isEmpty()) { ... } if (valuesEQ.isEmpty() && valuesIN.size() == 1) { ... } // Delegate to shared intersection instead of duplicating Set<Object> intersectValues = this.resolveConditionValues(valuesEQ, valuesIN); // ... rest of legacy checks }

This would make condition() and conditionValues() share the same intersection code path, reducing divergence risk.

imbajin · 2026-04-18T20:09:04Z

    }

+    private static Id uniqueLabel(ConditionQuery query) {
+        java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL);


🧹 Minor: RamTable.uniqueLabel uses fully-qualified java.util.Set instead of import

private static Id uniqueLabel(ConditionQuery query) { java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL);

The other 3 copies of uniqueLabel() use the imported Set. This one uses java.util.Set — likely a missing import. Trivial, but worth cleaning up for consistency (or moot if the duplication is resolved per the earlier comment).

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 13, 2026

test(core): avoid unnecessary data setup in label index regression

43d4541

imbajin reviewed Apr 18, 2026

View reviewed changes

contrueCT changed the title ~~improve(query): clarify condition resolution semantics for label queries~~ fix(query): clarify condition resolution semantics for label queries Apr 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(query): clarify condition resolution semantics for label queries#2994

fix(query): clarify condition resolution semantics for label queries#2994
contrueCT wants to merge 2 commits intoapache:masterfrom
contrueCT:task/improve-condition-query-semantics

contrueCT commented Apr 13, 2026

Uh oh!

imbajin Apr 18, 2026

Uh oh!

imbajin Apr 18, 2026

Uh oh!

imbajin Apr 18, 2026

Uh oh!

imbajin Apr 18, 2026

Uh oh!

imbajin Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

contrueCT commented Apr 13, 2026

Purpose of the PR

Main Changes

Verifying these changes

Does this PR potentially affect the following parts?

Documentation Status

Uh oh!

imbajin Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

imbajin Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

imbajin Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

imbajin Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

imbajin Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants