feat(queryengine): adaptive Plan Cache for table model queries by Young-Leo · Pull Request #17490 · apache/iotdb

Young-Leo · 2026-04-15T10:30:59Z

Description

Introduce an adaptive Plan Cache that caches optimized logical plans for table model queries,
skipping repeated Logical Plan Generation and Logical Optimization for structurally identical SQL.

Core mechanism

- Query fingerprinting: Literals in the SQL are parameterized and the resulting template is hashed with MD5 to form the cache key, so `SELECT * FROM t WHERE device_id = 'd1'` and `SELECT * FROM t WHERE device_id = 'd2'` share the same template.
- Three-state machine per template: MONITOR → ACTIVE → BYPASS. Each template independently accumulates EWMA samples before promotion; low-benefit templates are bypassed with a cooldown.
- Safe plan reuse: Only the logical plan is cached. On HIT, literals are re-bound and adjustSchema re-resolves devices, time filters, and partition info. The Distribution Plan is never cached, which avoids stale region/shard topology.
- LRU eviction: LinkedHashMap(accessOrder=true) with dual limits (1000 entries, 64 MB).
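To make the fingerprinting idea concrete, here is a minimal sketch. It is not the PR's implementation: the PR parameterizes literals with an AST visitor (`LiteralMarkerReplacer`), whereas this example swaps in a simple regex, and the class and method names (`FingerprintSketch`, `parameterize`, `md5Hex`) are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; the PR works on the AST, not on raw SQL text. */
final class FingerprintSketch {
  // Matches single-quoted strings, or standalone integer/decimal literals.
  private static final Pattern LITERAL =
      Pattern.compile("'(?:[^']|'')*'|\\b\\d+(?:\\.\\d+)?\\b");

  /** Replaces each literal with '?' and appends the original literal text to {@code out}. */
  static String parameterize(String sql, List<String> out) {
    Matcher m = LITERAL.matcher(sql);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      out.add(m.group());
      m.appendReplacement(sb, "?");
    }
    m.appendTail(sb);
    return sb.toString();
  }

  /** Hashes the parameterized template into a 32-character hex MD5 key. */
  static String md5Hex(String template) {
    try {
      byte[] digest =
          MessageDigest.getInstance("MD5").digest(template.getBytes(StandardCharsets.UTF_8));
      return String.format("%032x", new BigInteger(1, digest));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
  }
}
```

Under these assumptions, the two example queries above both reduce to `SELECT * FROM t WHERE device_id = ?` and therefore hash to the same key, while the collected literals (`'d1'` or `'d2'`) are what would be re-bound on a HIT.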

Admission control

| Transition | Condition |
| --- | --- |
| MONITOR → ACTIVE | `sampleCount ≥ 5` ∧ `ewmaPlanCost ≥ 1 ms` ∧ `benefitRatio ≥ 30%` |
| MONITOR → BYPASS | `sampleCount ≥ 5` ∧ (`planCost < 1 ms` ∨ `benefitRatio < 20%`) |
| ACTIVE → MONITOR | `benefitRatio < 30%` ∨ `planCost < 1 ms` (reset & re-evaluate) |
| BYPASS → MONITOR | cooldown expires (default 10 min) |
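The transition table can be read as a pure function from the current state and metrics to the next state. The sketch below is one possible encoding, not the code in `PlanCacheManager`: the class name, method signature, and constants are hypothetical, and keeping a template in MONITOR when its benefit ratio falls between 20% and 30% is an inference from the table rather than something the PR states.

```java
/** Hypothetical encoding of the admission table; names and structure are assumptions. */
final class TemplateStateSketch {
  enum State { MONITOR, ACTIVE, BYPASS }

  static final int MIN_SAMPLES = 5;
  static final double MIN_PLAN_COST_MS = 1.0;
  static final double ADMIT_RATIO = 0.30;
  static final double BYPASS_RATIO = 0.20;
  static final long COOLDOWN_MS = 10 * 60 * 1000L; // default 10 min

  static State next(
      State current, int sampleCount, double planCostMs, double benefitRatio, long msInBypass) {
    switch (current) {
      case MONITOR:
        if (sampleCount < MIN_SAMPLES) {
          return State.MONITOR; // not enough samples to decide yet
        }
        if (planCostMs < MIN_PLAN_COST_MS || benefitRatio < BYPASS_RATIO) {
          return State.BYPASS;
        }
        // planCostMs >= 1 ms is guaranteed here, so only the ratio remains to check.
        // Assumption: ratios in [0.20, 0.30) keep the template in MONITOR.
        return benefitRatio >= ADMIT_RATIO ? State.ACTIVE : State.MONITOR;
      case ACTIVE:
        return (benefitRatio < ADMIT_RATIO || planCostMs < MIN_PLAN_COST_MS)
            ? State.MONITOR
            : State.ACTIVE;
      case BYPASS:
        return msInBypass >= COOLDOWN_MS ? State.MONITOR : State.BYPASS;
      default:
        throw new IllegalStateException("unknown state: " + current);
    }
  }
}
```

The gap between the admit threshold (30%) and the bypass threshold (20%) gives a band in which a template is neither promoted nor bypassed, so a ratio hovering near either threshold does not flip the state on every sample.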

HIT-path latency tracking: during sustained cache hits, firstResponseLatency is still
recorded and ewmaBenefitRatio is refreshed using the historical ewmaReusablePlanningCost,
so a template degrades naturally from ACTIVE to MONITOR when execution cost grows and the benefit ratio falls.
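A small sketch of how that refresh could work follows. It rests on two assumptions that the description does not spell out: that the benefit ratio is the reusable planning cost divided by the first response latency (which matches the sample diagnostics, 5.23 ms / 11.80 ms ≈ 0.44), and that the EWMA uses alpha = 0.5 as mentioned in the commit notes. The class `HitPathSketch` and its members are hypothetical.

```java
/** Hypothetical sketch of HIT-path tracking; the ratio definition is an assumption. */
final class HitPathSketch {
  static final double ALPHA = 0.5; // smoothing factor mentioned in the commit notes

  double ewmaReusablePlanningCostMs; // frozen during hits: no planning actually ran
  double ewmaFirstResponseLatencyMs;
  double ewmaBenefitRatio;

  HitPathSketch(double planningCostMs, double firstResponseLatencyMs) {
    this.ewmaReusablePlanningCostMs = planningCostMs;
    this.ewmaFirstResponseLatencyMs = firstResponseLatencyMs;
    this.ewmaBenefitRatio = planningCostMs / firstResponseLatencyMs;
  }

  static double ewma(double previous, double sample) {
    return ALPHA * sample + (1 - ALPHA) * previous;
  }

  /** Called on a cache HIT with the latency measured for that execution. */
  void recordHit(double firstResponseLatencyMs) {
    if (firstResponseLatencyMs <= 0) {
      return; // ignore unusable samples
    }
    ewmaFirstResponseLatencyMs = ewma(ewmaFirstResponseLatencyMs, firstResponseLatencyMs);
    // Reuse the historical planning cost as the numerator, since a HIT has none of its own.
    double sampleRatio = ewmaReusablePlanningCostMs / firstResponseLatencyMs;
    ewmaBenefitRatio = ewma(ewmaBenefitRatio, sampleRatio);
  }
}
```

For example, a template that starts at 5 ms planning cost and 10 ms latency (ratio 0.5) and then sees two hits at 50 ms latency moves to 0.3 and then 0.2, which would cross the 30% threshold and trigger the ACTIVE → MONITOR transition.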

Observability

EXPLAIN ANALYZE VERBOSE outputs Plan Cache diagnostics:
Plan Cache State: ACTIVE
Plan Cache Saved: 5.23 ms
EWMA Reusable Planning Cost: 5.23 ms (threshold: 1.00 ms)
EWMA First Response Latency: 11.80 ms
EWMA Benefit Ratio: 0.4434 (admit: 0.30, bypass: 0.20)
Profile Counters: samples=7, hits=12, misses=0, bypasses=0

Files changed (production)

| File | Change |
| --- | --- |
| `PlanCacheManager.java` | New. LRU cache + three-state machine + EWMA evaluation |
| `CachedValue.java` | New. Cached plan container with deep-clone and memory estimation |
| `LiteralMarkerReplacer.java` | New. AST visitor for literal parameterization |
| `TableNameRewriter.java` | New. AST rewriter for table name normalization in cache key |
| `TableLogicalPlanner.java` | Integrate cache lookup / write-back / statistics feedback |
| `IoTDBConfig.java` / `IoTDBDescriptor.java` | 10 new config properties (`smart_plan_cache_*`) |
| `MPPQueryContext.java` | Bridge planner diagnostics → EXPLAIN ANALYZE output |
| `QueryPlanStatistics.java` | New fields for EWMA metrics and profile counters |
| `QueryExecution.java` | Record execution time back to `PlanCacheManager` |
| `FragmentInstanceStatisticsDrawer.java` | Render Plan Cache diagnostics in VERBOSE mode |
| `ExplainAnalyzeOperator.java` | Pass `verbose` flag to statistics renderer |
| `Literal.java` / `StringLiteral` / `LongLiteral` / `BinaryLiteral` | Add `generalizedCopy()` for parameterization |
| `SymbolAllocator.java` | Expose `setNextId()` for cached symbol counter restore |
| `PushPredicateIntoTableScan.java` | Minor adjustment for cached plan compatibility |
| `ExpressionFormatter.java` | Support formatting of parameterized marker literals |

Tests

PlanCacheTest.java — unit tests for state machine transitions and cache eviction
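For readers unfamiliar with the eviction behavior these tests exercise, the following is a rough sketch of an access-ordered `LinkedHashMap` with both an entry limit and a byte limit. It is not taken from `PlanCacheManager`; the class `DualLimitLruSketch` is hypothetical, and to stay short it stores only each entry's estimated size instead of a cached plan.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU sketch with dual limits; values stand in for estimated plan sizes. */
final class DualLimitLruSketch<K> {
  private final int maxEntries;
  private final long maxBytes;
  private long usedBytes;
  // accessOrder=true moves an entry to the tail on get()/put(), so the head is the LRU entry.
  private final LinkedHashMap<K, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);

  DualLimitLruSketch(int maxEntries, long maxBytes) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
  }

  synchronized void put(K key, long estimatedBytes) {
    Long old = sizes.put(key, estimatedBytes);
    usedBytes += estimatedBytes - (old == null ? 0L : old);
    // Evict from the least recently used end until both limits hold again.
    // An entry larger than maxBytes on its own ends up evicting itself.
    Iterator<Map.Entry<K, Long>> it = sizes.entrySet().iterator();
    while ((sizes.size() > maxEntries || usedBytes > maxBytes) && it.hasNext()) {
      usedBytes -= it.next().getValue();
      it.remove();
    }
  }

  /** Simulates a cache lookup; a successful get() refreshes the entry's recency. */
  synchronized boolean touch(K key) {
    return sizes.get(key) != null;
  }

  /** containsKey() does not change the access order. */
  synchronized boolean contains(K key) {
    return sizes.containsKey(key);
  }

  synchronized long usedBytes() {
    return usedBytes;
  }
}
```

With the limits from the description, this would be constructed as `new DualLimitLruSketch<String>(1000, 64L * 1024 * 1024)`. Whichever limit is exceeded first triggers eviction, so a few very large plans can push out entries long before the 1000-entry cap is reached.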

…prove planning observability

- Add MONITOR/ACTIVE/BYPASS state machine and admission control to Plan Cache
- Make cache decisions based on sample metrics and benefit ratio, with hit/miss/bypass reason tracking
- Extend query planning statistics with cache state, lookup cost, reusable planning cost, and related metrics
- Expose Plan Cache metrics in plan statistics output for easier performance analysis and debugging
- Improve logical planner cache path and plan cloning details (including ExplainAnalyze handling)
- Refine predicate pushdown metadata collection to support one-to-one mapping for multiple scan nodes
- Add Plan Cache tests for state promotion and bypass behavior, and fix related minor issues
- Add SQL generation script and sample batch insert file for validation/testing

…externalize config

P0: Move recordExecution from TableLogicalPlanner to QueryExecution.recordExecutionTime()
- Use first RPC server-side time as real firstResponseLatency instead of FE-only estimate
- Store cachedKey and lookupAttempted in MPPQueryContext for deferred evaluation

P1: Add ACTIVE->MONITOR degradation path in TemplateProfile state machine
- Prevents historical hot templates from polluting cache indefinitely

P2: Replace hardcoded constants with IoTDBConfig-backed configuration
- Add smart_plan_cache_* properties to IoTDBConfig and IoTDBDescriptor
- Support runtime config via iotdb-system.properties

… output

When a query template is ACTIVE in plan cache, EXPLAIN ANALYZE was returning raw query results instead of the analysis report because the cache HIT path early-returned without wrapping in ExplainAnalyzeNode.

Fix: skip cache early-return for EXPLAIN ANALYZE but still report the correct Plan Cache Status (HIT) and State (ACTIVE) in output.

When using EXPLAIN ANALYZE VERBOSE, the output now includes detailed plan cache profile diagnostics:
- EWMA Reusable Planning Cost with threshold comparison
- EWMA First Response Latency
- EWMA Benefit Ratio with admit/bypass threshold comparison
- Profile Counters (samples, hits, misses, bypasses)

This helps diagnose why a query template is in ACTIVE/MONITOR/BYPASS state and how its EWMA metrics compare against configured thresholds.

The first execution of a query template after JVM startup has significantly inflated planning costs due to class loading, JIT compilation, and cold internal caches. With EWMA alpha=0.5, this single outlier takes 4-5 additional samples to decay, delaying accurate steady-state estimation. Skip the first recordExecution call per TemplateProfile so that EWMA metrics reflect true steady-state planning performance.
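The warm-up skip can be illustrated with a short sketch. This is a guess at the shape of the logic, not the PR's `TemplateProfile` code: the class `WarmupSkippingEwma` is hypothetical, and whether the skipped call counts toward `sampleCount` is an assumption (here it does not).

```java
/** Hypothetical EWMA that discards the first (cold-start) sample. */
final class WarmupSkippingEwma {
  private final double alpha;
  private boolean firstCallSeen;
  private boolean initialized;
  private double value;
  private int sampleCount;

  WarmupSkippingEwma(double alpha) {
    this.alpha = alpha;
  }

  void record(double sample) {
    if (!firstCallSeen) {
      firstCallSeen = true; // cold-start outlier: class loading, JIT, cold caches
      return;
    }
    // The first retained sample seeds the average; later ones are blended in.
    value = initialized ? alpha * sample + (1 - alpha) * value : sample;
    initialized = true;
    sampleCount++; // assumption: the skipped call is not counted as a sample
  }

  double value() {
    return value;
  }

  int sampleCount() {
    return sampleCount;
  }
}
```

The motivation follows from the arithmetic: if the outlier seeded the average, its residual weight with alpha = 0.5 would halve on each new sample, leaving about 6% after four samples (0.5^4) and about 3% after five (0.5^5), which is consistent with the 4-5 samples quoted above.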

…cking design

- Change default admitRatio from 0.20 to 0.30 and bypassRatio from 0.10 to 0.20 in IoTDBConfig to better separate high-benefit (LAST ~44%) from marginal (stddev ~23%) query templates
- Document HIT-path latency tracking strategy: during sustained cache hits, continue recording firstResponseLatency and updating ewmaBenefitRatio using historical ewmaReusablePlanningCost, enabling natural ACTIVE to MONITOR degradation when execution cost grows
- Update design doc to V1.4 with threshold tuning rationale, stddev observations, and degradation refresh mechanism

- Fix misleading adjustSchema arrow in system overview: replace single loopback arrow with explicit HIT vs normal path flow
- Add table explaining why only MISS and MONITOR invoke recordExecution (HIT has no real planning cost; BYPASS is in cooldown)

Remove design docs, test scripts, and data generation files that were used during development but should not be included in the code PR.

Young-Leo added 8 commits April 15, 2026 14:47

chore: remove auxiliary files not intended for PR

761d803


Young-Leo wants to merge 8 commits into apache:master from Young-Leo:ly/planCache2
