Skip to content

deps: [DO NOT MERGE] Start testing what will be DataFusion 54.0#3916

Draft
mbutrovich wants to merge 21 commits intoapache:mainfrom
mbutrovich:df54
Draft

deps: [DO NOT MERGE] Start testing what will be DataFusion 54.0#3916
mbutrovich wants to merge 21 commits intoapache:mainfrom
mbutrovich:df54

Conversation

@mbutrovich
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich commented Apr 8, 2026

Starting to test what will be DataFusion 54.0 to see if that helps find issues in DataFusion when the RC comes around. Lately we haven't gotten all the work done in time in Comet to test an RC, and find issues after the .0 release and have to wait for at least .1 to bump.

@mbutrovich mbutrovich closed this Apr 14, 2026
@mbutrovich mbutrovich reopened this Apr 15, 2026
@mbutrovich
Copy link
Copy Markdown
Contributor Author

Added a test for the Spark SQL failure for FULL join with filter on NULL. Testing the fix now, bumped the dependency for DataFusion to test apache/datafusion#21660

github-merge-queue bot pushed a commit to apache/datafusion that referenced this pull request Apr 16, 2026
…er evaluates to NULL (#21660)

## Which issue does this PR close?

- Closes #.

## Rationale for this change

While enabling `SortMergeJoinExec` with filters in [DataFusion
Comet](https://github.com/apache/datafusion-comet), I hit a correctness
bug in DataFusion's `SortMergeJoinExec` for full outer joins with
nullable filter columns. The bug was originally surfaced via the
[SPARK-43113](https://issues.apache.org/jira/browse/SPARK-43113)
reproducer.

When a join filter expression evaluates to `NULL` (e.g., `l.b < (r.b +
1)` where `r.b` is `NULL`), the full outer join treats the row pair as
**matched** instead of **unmatched**. Per SQL semantics, `NULL` in a
boolean filter context is `false` (not satisfied), so the rows should be
emitted as separate unmatched rows.

The bug has been present since filtered full outer join support was
added to SMJ in #12764 / #13369. It was never caught because:
1. The join fuzz tests generate filter column data with
`Int32Array::from_iter_values()`, which never produces `NULL` values.
2. No existing unit test or sqllogictest exercised a full outer join
filter that evaluates to `NULL`.

## What changes are included in this PR?

**Root cause:** The full outer join code path had a special case that
preserved raw `NULL` values from the filter expression result
(`pre_mask`) instead of converting them to `false` via
`prep_null_mask_filter` like left/right outer joins do. This caused two
problems:

1. **Streamed (left) side:** In `get_corrected_filter_mask()`, `NULL`
entries in `filter_mask` are treated as "pass through" (for
pre-null-joined rows from `append_nulls()`). But when the filter
expression itself returns `NULL`, those entries also appear as `NULL` in
the mask — and get incorrectly treated as matched. This produced wrong
join output (matched rows instead of unmatched).

2. **Buffered (right) side:** `BooleanArray::value()` was called on
`NULL` positions in `pre_mask` to update `FilterState`. At NULL
positions, the values buffer contains a deterministic but semantically
meaningless result (computed from the default zero-storage of NULL
inputs). For some rows this value happens to be `true`, which
incorrectly marks unmatched buffered rows as `SomePassed` and silently
drops them from the output.

**Fix:** Remove the full outer join special case in
`materializing_stream.rs`. All outer join types now uniformly use the
null-corrected `mask` (where `NULL` → `false` via
`prep_null_mask_filter`) for both deferred filtering metadata and
`FilterState` tracking. Semi/anti/mark joins are unaffected — they use
`BitwiseSortMergeJoinStream` which already converts NULLs to `false`.

**Tests:**
- New unit test `join_full_null_filter_result` reproducing the
SPARK-43113 scenario with a nullable right-side column.
- Modified `make_staggered_batches_i32` in `join_fuzz.rs` to inject ~10%
`NULL` values into the filter column (`x`), so the fuzz tests exercise
`NULL` filter handling across all join types.

## Are these changes tested?

Yes.
- New unit test (`join_full_null_filter_result`) directly reproduces the
bug.
- Existing 57 SMJ unit tests all pass.
- All 41 join fuzz tests pass with the new nullable filter column data,
including `test_full_join_1k_filtered` which compares `HashJoinExec` vs
`SortMergeJoinExec` and would have caught this bug if the fuzz data had
included `NULL`s.
- Will run 100 iterations of the fuzz tests overnight to shake out any
remaining nondeterministic issues.
- Testing in Comet CI (all Spark SQL tests)
apache/datafusion-comet#3916

## Are there any user-facing changes?

Full outer sort-merge joins with filters involving nullable columns now
produce correct results. Previously, rows where the filter evaluated to
`NULL` were incorrectly included as matched; they are now correctly
emitted as unmatched (null-joined) rows.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant