fix(query): clarify condition resolution semantics for label queries#2994
fix(query): clarify condition resolution semantics for label queries#2994contrueCT wants to merge 2 commits intoapache:masterfrom
Conversation
Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics. Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
| return null; | ||
| } | ||
| return (Id) labels.iterator().next(); | ||
| } |
There was a problem hiding this comment.
uniqueLabel() duplicated in 4 files — consider consolidating
The uniqueLabel(ConditionQuery) helper is copy-pasted into 4 files with identical bodies:
GraphTransaction.java(private static)GraphIndexTransaction.java(private static)RamTable.java(private static)HstoreStore.java(private instance method — inconsistent with the other 3)
I understand these callers need "return null when multiple labels exist" semantics (unlike conditionValue() which throws). Two suggestions:
Option A — Add this as a first-class method on ConditionQuery (e.g., uniqueConditionValue(Object key) or singleConditionValueOrNull(Object key)) to eliminate all 4 copies:
public <T> T uniqueConditionValue(Object key) {
Set<Object> values = this.conditionValues(key);
if (values.size() != 1) {
return null;
}
@SuppressWarnings("unchecked")
T value = (T) values.iterator().next();
return value;
}Option B — If you prefer not to expand the ConditionQuery API surface, at minimum consolidate to a static utility in one shared location (e.g., a package-visible helper in ConditionQuery or a QueryUtil).
Also note: the HstoreStore version is an instance method while the other 3 are static — this inconsistency should be fixed either way.
| direction = Directions.OUT; | ||
| } | ||
| Id label = cq.condition(HugeKeys.LABEL); | ||
| Id label = cq.conditionValue(HugeKeys.LABEL); |
There was a problem hiding this comment.
conditionValue() vs uniqueLabel() for LABEL key
In this PR, some callers migrate to conditionValue(HugeKeys.LABEL) (serializers, traverser), while others use the new private uniqueLabel() helper (transactions, RamTable, HstoreStore). The two have different failure behavior for multi-label queries:
conditionValue()→ throwsIllegalStateExceptionif multiple values remainuniqueLabel()→ returns null silently
Is this difference intentional per call site? If so, a brief comment at each call site explaining why that particular semantic was chosen would help future readers. Otherwise, consider unifying to a single approach — either all callers that need "single or nothing" use one API, and those that need "single or fail" use the other, with the distinction documented.
| for (HugeKeys key : EdgeId.KEYS) { | ||
| Object value = cq.condition(key); | ||
| Object value = key == HugeKeys.LABEL ? | ||
| cq.conditionValue(key) : cq.condition(key); |
There was a problem hiding this comment.
This introduces a special-case ternary:
Object value = key == HugeKeys.LABEL ?
cq.conditionValue(key) : cq.condition(key);The iteration is over EdgeId.KEYS (OWNER_VERTEX, DIRECTION, LABEL, SORT_VALUES, OTHER_VERTEX), and only LABEL gets the new API — a fact that's not self-documenting. If another key eventually needs the same treatment, a reader must remember to add it here too.
Consider either:
- Adding a brief inline comment explaining why only LABEL needs
conditionValue()(e.g., "LABEL may have multiple IN conditions; other EdgeId keys are always single-valued"), or - Applying
conditionValue()uniformly for all keys in this loop if the semantics are safe (since the other keys are guaranteed single-valued,conditionValuewould behave identically toconditionfor them)
|
|
||
| private Set<Object> resolveConditionValues(List<Object> valuesEQ, | ||
| List<Object> valuesIN) { | ||
| boolean initialized = false; |
There was a problem hiding this comment.
resolveConditionValues duplicates logic already in condition() — consider extracting the shared intersection
The new resolveConditionValues() method contains the same intersection logic as the existing condition() method. The only difference is condition() has extra fast-path returns and the final single-value check.
Since condition() now delegates collectConditionValues() to the new private method (good!), it could also delegate the intersection to resolveConditionValues() to fully eliminate the duplicated loop:
public <T> T condition(Object key) {
List<Object> valuesEQ = InsertionOrderUtil.newList();
List<Object> valuesIN = InsertionOrderUtil.newList();
this.collectConditionValues(key, valuesEQ, valuesIN);
if (valuesEQ.isEmpty() && valuesIN.isEmpty()) {
return null;
}
// Keep legacy fast paths for backward compatibility
if (valuesEQ.size() == 1 && valuesIN.isEmpty()) { ... }
if (valuesEQ.isEmpty() && valuesIN.size() == 1) { ... }
// Delegate to shared intersection instead of duplicating
Set<Object> intersectValues = this.resolveConditionValues(valuesEQ, valuesIN);
// ... rest of legacy checks
}This would make condition() and conditionValues() share the same intersection code path, reducing divergence risk.
| } | ||
|
|
||
| private static Id uniqueLabel(ConditionQuery query) { | ||
| java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL); |
There was a problem hiding this comment.
🧹 Minor: RamTable.uniqueLabel uses fully-qualified java.util.Set instead of import
private static Id uniqueLabel(ConditionQuery query) {
java.util.Set<Object> labels = query.conditionValues(HugeKeys.LABEL);The other 3 copies of uniqueLabel() use the imported Set. This one uses java.util.Set — likely a missing import. Trivial, but worth cleaning up for consistency (or moot if the duplication is resolved per the earlier comment).
Purpose of the PR
ConditionQuery.condition()currently mixes several different meanings in one API, including:This PR keeps the legacy
condition()behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-riskLABELcall sites to use the clearer semantics.Main Changes
ConditionQuerycontainsCondition(Object key)conditionValues(Object key)conditionValue(Object key)condition()method backward-compatibleLABEL-related high-risk callers to the new APIs in:LABELlegacy usages in this first stepVerifying these changes
Added and extended regression coverage for the new semantics:
QueryTest#testConditionWithoutLabelQueryTest#testConditionWithEqAndInQueryTest#testConditionWithSingleInValuesQueryTest#testConditionWithConflictingEqAndInQueryTest#testConditionWithMultipleMatchedInValuesAdded a targeted regression for the label-index fallback path:
VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedPropertiesThis test verifies:
Existing label-query regressions were also rechecked to ensure no behavior regression:
EdgeCoreTest#testQueryInEdgesOfVertexByLabelsEdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabelsEdgeCoreTest#testQueryInEdgesOfVertexBySortkeyVertexCoreTest#testQueryByJointLabelsDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need