
SOLR-18198 Add support for "missing" stats in rollup for streaming expressions#4286

Open
KhushJain wants to merge 3 commits into apache:main from KhushJain:khushjain/SOLR-18198


@KhushJain KhushJain commented Apr 15, 2026

https://issues.apache.org/jira/browse/SOLR-18198

Description

Add support for the "missing" statistic to the rollup function in the /stream handler.

Solution

Add a new MissingMetric to the streaming expressions framework:

  1. MissingMetric.java: New metric class modeled after CountMetric. Increments a counter when tuple.get(columnName) == null. Registered as function name missing.
  2. Lang.java: Register missing function name.
  3. ParallelMetricsRollup.java: Added SumMetric aggregation so that missing counts are summed across tiers.
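
The counting rule in step 1 can be sketched without Solr on the classpath. The snippet below is a minimal illustration and not the PR's code: `MissingCounterSketch` is a hypothetical class, and a plain `Map<String, Object>` stands in for Solr's `Tuple`. It assumes, as the description states, that a field counts as missing only when the lookup returns null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the metric described above; not the PR's MissingMetric.
final class MissingCounterSketch {
  private final String columnName;
  private long count;

  MissingCounterSketch(String columnName) {
    this.columnName = columnName;
  }

  // A Map plays the role of a Tuple here: get() returns null when the key is absent.
  void update(Map<String, Object> tuple) {
    if (tuple.get(columnName) == null) {
      count++;
    }
  }

  long getValue() {
    return count;
  }

  public static void main(String[] args) {
    MissingCounterSketch missing = new MissingCounterSketch("b_f");
    for (int id = 0; id < 10; id++) {
      Map<String, Object> doc = new HashMap<>();
      doc.put("id", id);
      // Mirror the test fixture: only ids 0, 4, 5, 6 and 9 carry b_f.
      if (id == 0 || id == 4 || id == 5 || id == 6 || id == 9) {
        doc.put("b_f", 1.0f);
      }
      missing.update(doc);
    }
    System.out.println(missing.getValue()); // prints 5: half of the 10 documents lack b_f
  }
}
```

The value stored in `b_f` is arbitrary sample data; only the presence or absence of the key affects the count.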

Tests

Updated existing tests in StreamingTest.java:

  • Fixture data: Added a new field b_f to only 5 of the 10 documents in helloDocsUpdateRequest (ids 0, 4, 5, 6, 9), leaving the other 5 without it. This avoids changing existing a_f
    assertions.
  • testRollupStream and testParallelRollupStream: Added MissingMetric("b_f") to the metrics array and added assertions on its value.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change


Copilot AI left a comment


Pull request overview

Adds a new missing(field) metric to SolrJ streaming expressions and ensures it can be correctly rolled up when metrics are computed in parallel/tiered contexts.

Changes:

  • Introduces MissingMetric (new metric function missing) for counting null/missing values.
  • Registers the new metric in Lang and adds rollup support in ParallelMetricsRollup.
  • Updates streaming rollup tests and adds a changelog entry.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| solr/solrj-streaming/src/test/org/apache/solr/client/solrj/io/stream/StreamingTest.java | Extends rollup/parallel rollup tests and fixtures to validate missing(b_f) behavior. |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/metrics/MissingMetric.java | Adds the new MissingMetric implementation and streaming-expression parsing/toExpression support. |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/ParallelMetricsRollup.java | Adds rollup aggregation logic for MissingMetric (sum across tiers). |
| solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/Lang.java | Registers missing as a streaming expression metric function. |
| changelog/unreleased/SOLR-18198-support-missing-stats-count-in-rollup.yml | Documents the feature addition in the changelog. |
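
To illustrate why the rollup tier sums missing counts, here is a rough sketch of the merge step. It is not the code in ParallelMetricsRollup: `MissingRollupSketch`, `mergeMissing`, and the bucket names are hypothetical, and each worker's partial result is modeled as a plain map from bucket to missing count.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of summing per-worker missing counts; not Solr code.
final class MissingRollupSketch {
  // Each worker reports bucket -> missing count; the merged value is the per-bucket sum.
  static Map<String, Long> mergeMissing(List<Map<String, Long>> perWorker) {
    Map<String, Long> merged = new TreeMap<>();
    for (Map<String, Long> partial : perWorker) {
      for (Map.Entry<String, Long> entry : partial.entrySet()) {
        merged.merge(entry.getKey(), entry.getValue(), Long::sum);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Made-up partial results from two workers.
    Map<String, Long> worker1 = Map.of("hello0", 1L, "hello3", 2L);
    Map<String, Long> worker2 = Map.of("hello0", 1L, "hello4", 1L);
    System.out.println(mergeMissing(List.of(worker1, worker2)));
    // prints {hello0=2, hello3=2, hello4=1}
  }
}
```

Summing is the natural choice in this sketch because each partial result is already a count of documents; counting the partial results themselves would report the number of workers that saw the bucket instead.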



@epugh epugh left a comment


Very close!

Comment thread changelog/unreleased/SOLR-18198-support-missing-stats-count-in-rollup.yml Outdated

  @Override
  public void update(Tuple tuple) {
    if (tuple.get(columnName) == null) {

Could "" ever be a value?

Contributor Author


"" is a valid value for a field. tuple.get(columnName) is null only when the field doesn't exist. The missing metric counts the documents that don't have the field at all, which is consistent with how the stats component computes missing.
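
The distinction between an empty string and an absent field can be shown with a small sketch. It is only an illustration under stated assumptions: `EmptyVersusMissingSketch` and the field name `b_s` are hypothetical, and a `Map<String, Object>` stands in for Solr's `Tuple`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for Solr's Tuple.
final class EmptyVersusMissingSketch {
  // Same null check as the reviewed line: only an absent (or null) value is missing.
  static boolean isMissing(Map<String, Object> tuple, String field) {
    return tuple.get(field) == null;
  }

  public static void main(String[] args) {
    Map<String, Object> withEmptyValue = new HashMap<>();
    withEmptyValue.put("b_s", "");
    Map<String, Object> withoutField = new HashMap<>();

    System.out.println(isMissing(withEmptyValue, "b_s")); // prints false: "" is a value
    System.out.println(isMissing(withoutField, "b_s"));   // prints true: the field is absent
  }
}
```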


epugh commented Apr 16, 2026

Can you update the docs as well???


KhushJain commented Apr 16, 2026

> Can you update the docs as well???

Yes, updated the doc.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 16, 2026
@KhushJain KhushJain requested a review from epugh April 16, 2026 16:15

Labels: client:solrj, documentation, tests
