Explorer v2 behind ?v=2 (lite parquet, lazy description, no RANDOM, lazy Cesium)#126

Merged

rdhyee merged 1 commit into isamplesorg:main from rdhyee:feat/explorer-v2-flag on Apr 17, 2026

rdhyee (Contributor) commented Apr 17, 2026

Second of two PRs for the Explorer rethink. v1 untouched; opt-in v2 via ?v=2.

Four moves (each gated on version):

  1. Lite parquet as primary: samples_map_lite.parquet (60 MB) instead of wide.parquet (278 MB).
  2. No ORDER BY RANDOM() — bare LIMIT is ~20× faster on columnar parquet; accept row-order bias for now.
  3. Lazy description fetch — query wide.parquet for just one pid when a sample is clicked.
  4. Lazy Cesium mount — don't construct the viewer until viewMode === 'globe'.
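
Moves 1 and 2 both come down to which file the sample query reads and whether it shuffles before limiting. The sketch below illustrates that version gating. It is hypothetical throughout: `DATA_BASE`, `parseVersion`, `primaryParquetUrl`, and `sampleDataSql` are illustrative names, not the Explorer's actual cells, and the real code may query a DuckDB view rather than calling `read_parquet` inline.

```typescript
// Sketch only: DATA_BASE and every function name here are hypothetical.
type ExplorerVersion = 1 | 2;

const DATA_BASE = "https://example.org/isamples"; // placeholder, not the real host

// ?v=2 opts into v2; a missing or different value keeps v1.
function parseVersion(search: string): ExplorerVersion {
  return new URLSearchParams(search).get("v") === "2" ? 2 : 1;
}

// Move 1: v2 reads the ~60 MB lite file; v1 keeps the ~278 MB wide file.
function primaryParquetUrl(version: ExplorerVersion): string {
  return version === 2
    ? `${DATA_BASE}/samples_map_lite.parquet`
    : `${DATA_BASE}/wide.parquet`;
}

// Quote a value as a SQL string literal by doubling embedded single quotes.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Move 2: v1 shuffles the filtered rows with ORDER BY RANDOM() before LIMIT.
// v2 uses a bare LIMIT, so rows come back roughly in file order. That is fast,
// but it is biased toward whatever sources happen to come first in the file.
function sampleDataSql(
  version: ExplorerVersion,
  columns: string[],
  where: string,
  limit: number,
): string {
  const from = `read_parquet(${sqlString(primaryParquetUrl(version))})`;
  const filter = where.trim() !== "" ? ` WHERE ${where}` : "";
  const order = version === 1 ? " ORDER BY RANDOM()" : "";
  const n = Math.max(0, Math.trunc(limit));
  return `SELECT ${columns.join(", ")} FROM ${from}${filter}${order} LIMIT ${n}`;
}
```

In a browser this would be called as `sampleDataSql(parseVersion(location.search), ["pid", "label"], "", 5000)`. The column list and limit are illustrative only.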

v1 baseline captured in PR #124's panel:

  • nav → count ready: 28.9 s
  • nav → samples ready: 16.5 s
  • sample data query: 14.1 s

v2 is expected to cut these by a factor of 4–10. It will be measured with the same ?perf=1 panel; numbers to follow after merge.

🤖 Generated with Claude Code

…RANDOM, lazy Cesium)

Four architectural moves, each gated on ?v=2. v1 stays unchanged.

1. Primary read surface: samples_map_lite.parquet (60 MB) instead of
   wide.parquet (278 MB). The lite file has every column the Explorer
   needs for the list + globe view except description.

2. No ORDER BY RANDOM(). v1 uses RANDOM(), which forces a scan across
   row groups; v2 uses bare LIMIT, accepting row-order bias in exchange
   for ~20× query speedup on columnar parquet. (Trade-off acceptable
   for a viz sample; revisit if source clustering becomes visible.)

3. Lazy description fetch. v2 drops description from sampleData and
   adds a lazyDescription cell that queries wide.parquet for just the
   one pid when a sample is clicked. sampleCard falls back to
   lazyDescription when s.description is empty.

4. Lazy Cesium mount. v2 returns null from the viewer cell until
   viewMode === 'globe', so the viewer constructor (~500 ms) doesn't
   run for users who stay in list/table view. v1 mounts eagerly.
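Move 3 (lazy description fetch) can be sketched as a one-row lookup plus the sampleCard fallback. The sketch assumes the identifier column in wide.parquet is named `pid`; the function names and the `Sample` shape are also hypothetical. DuckDB can use Parquet row-group statistics to skip row groups for the `pid = …` filter, but how much that saves depends on how the file is sorted.

```typescript
// Sketch only: the `pid` column name, function names, and Sample shape are assumptions.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// One-row lookup against the wide file, issued only when a sample is clicked.
function descriptionQuery(wideUrl: string, pid: string): string {
  return (
    `SELECT description FROM read_parquet(${sqlString(wideUrl)}) ` +
    `WHERE pid = ${sqlString(pid)} LIMIT 1`
  );
}

interface Sample {
  pid: string;
  label: string;
  description?: string | null; // absent in v2, since the lite file has no description column
}

// sampleCard-style fallback: prefer an eagerly loaded description, else the lazy one.
function cardDescription(s: Sample, lazyDescription: string | null): string {
  return s.description ? s.description : (lazyDescription ?? "");
}
```

A prepared statement would avoid hand-quoting the pid. The string form is used here only to keep the sketch dependency-free.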
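Move 4 (lazy Cesium mount) is a lazy-initialization pattern. The sketch below is hypothetical: `construct` stands in for the expensive `new Cesium.Viewer(...)` call, and `lazyViewer` and `ViewMode` are illustrative names. It also assumes the viewer should be kept once built, so switching back to globe does not pay the constructor cost a second time. The PR does not say whether the real cell behaves this way.

```typescript
// Sketch only: lazyViewer and ViewMode are hypothetical; `construct` stands in for
// the Cesium viewer constructor (~500 ms per the PR).
type ViewMode = "list" | "table" | "globe";

function lazyViewer<V>(construct: () => V): (viewMode: ViewMode) => V | null {
  let viewer: V | null = null;
  return (viewMode: ViewMode): V | null => {
    // Construct only on the first entry into globe view.
    if (viewer === null && viewMode === "globe") {
      viewer = construct();
    }
    // Null until globe has been requested at least once; afterwards the same instance.
    return viewer;
  };
}
```

A user who stays in list or table view never triggers `construct`. That is the saving move 4 targets.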

whereClause handles column-name drift (v1 uses `n`, v2 uses `source`)
and skips the otype filter for v2 (lite is samples-only). Text search
in v2 is limited to label + place_name (description isn't loaded
eagerly). v1 keeps description search.
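The whereClause behavior described above can be sketched as a pure function. Some details are assumptions not stated in the PR: the otype column is named `otype`, v1 text search covers `label`, `place_name`, and `description`, and `place_name` is a scalar string column. The `Filters` shape is also hypothetical.

```typescript
// Sketch only: Filters, the `otype` column name, and the v1 search fields are assumptions.
type ExplorerVersion = 1 | 2;

interface Filters {
  sources: string[]; // selected source values
  otypes: string[];  // selected object types (ignored in v2)
  text: string;      // free-text search box
}

function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

function whereClause(version: ExplorerVersion, f: Filters): string {
  const parts: string[] = [];
  // Column-name drift: v1 calls the source column `n`, v2 calls it `source`.
  const sourceCol = version === 1 ? "n" : "source";
  if (f.sources.length > 0) {
    parts.push(`${sourceCol} IN (${f.sources.map(sqlString).join(", ")})`);
  }
  // The lite file is samples-only, so the otype filter applies to v1 alone.
  if (version === 1 && f.otypes.length > 0) {
    parts.push(`otype IN (${f.otypes.map(sqlString).join(", ")})`);
  }
  const text = f.text.trim();
  if (text !== "") {
    // v2 has no eager description column, so search covers label + place_name only.
    const fields =
      version === 1 ? ["label", "place_name", "description"] : ["label", "place_name"];
    // Note: % and _ typed by the user are not escaped and act as LIKE wildcards.
    const pattern = sqlString(`%${text}%`);
    parts.push(`(${fields.map((c) => `${c} ILIKE ${pattern}`).join(" OR ")})`);
  }
  return parts.join(" AND "); // empty string means "no WHERE clause"
}
```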

Next: measure v2 and compare against the PR isamplesorg#124 baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rdhyee force-pushed the feat/explorer-v2-flag branch from 63ac873 to 8f48ba8 on April 17, 2026 17:02
rdhyee merged commit b9790f9 into isamplesorg:main on Apr 17, 2026
1 check passed
rdhyee (Contributor, Author) commented Apr 17, 2026

v2 results (3 cold runs on production, headless Chrome, cache disabled)

| Metric | v1 | v2 | Δ |
|---|---|---|---|
| first-paint | 512 ms | 400 ms | −22% |
| DuckDB init + views | 1.59 s | 1.35 s | −15% |
| facet summaries query | 93 ms | 66 ms | −29% |
| sample data query | 14.1 s | 8.1 s | −42% |
| count query | 26.5 s | 21.1 s | −20% |
| nav → samples ready | 16.5 s | 10.3 s | −38% |
| nav → count ready | 28.9 s | 23.3 s | −19% |

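Meaningful: the sample data query was cut nearly in half (−42%), and time-to-usable (nav → samples ready) dropped by ~6 seconds. But the gain is smaller than expected: we hoped for 4–10× and got at most ~1.7× (sample data query), with the other metrics between ~1.2× and ~1.6×.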
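Converting the table's timings into speedup factors (v1 time divided by v2 time) checks the headline ratio against the 4–10× target. The numbers below are copied from the results table and converted to milliseconds. The interface and helper names are illustrative.

```typescript
interface Timing {
  metric: string;
  v1Ms: number;
  v2Ms: number;
}

// Values copied from the v1/v2 results table, converted to milliseconds.
const timings: Timing[] = [
  { metric: "first-paint", v1Ms: 512, v2Ms: 400 },
  { metric: "DuckDB init + views", v1Ms: 1590, v2Ms: 1350 },
  { metric: "facet summaries query", v1Ms: 93, v2Ms: 66 },
  { metric: "sample data query", v1Ms: 14100, v2Ms: 8100 },
  { metric: "count query", v1Ms: 26500, v2Ms: 21100 },
  { metric: "nav → samples ready", v1Ms: 16500, v2Ms: 10300 },
  { metric: "nav → count ready", v1Ms: 28900, v2Ms: 23300 },
];

// Speedup factor: how many times faster v2 is than v1.
const speedup = (t: Timing): number => t.v1Ms / t.v2Ms;
// Percent change in time (negative means v2 is faster).
const pctChange = (t: Timing): number => ((t.v2Ms - t.v1Ms) / t.v1Ms) * 100;

for (const t of timings) {
  console.log(`${t.metric}: ${speedup(t).toFixed(2)}x (${pctChange(t).toFixed(1)}%)`);
}

// The metric with the largest speedup.
const best = timings.reduce((a, b) => (speedup(b) > speedup(a) ? b : a));
```

The largest factor is the sample data query at about 1.74× (14.1 s / 8.1 s). Every metric stays below 2×.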

Why it's not more

v2 still makes 113 range requests to the lite parquet on initial load. Shrinking the file from 278 MB to 60 MB helped only so much because the bottleneck isn't total bytes: it's DuckDB-WASM's many small range requests plus the facet subquery join against sample_facets_v2 (63 MB). The four moves were architecturally right but hit a deeper wall.

Where the real wins likely are next

  1. Skip the COUNT query entirely for unfiltered/simple cases — use pre-computed counts from facet_summaries.parquet (already loaded, 2 KB) when no facets are selected. Could drop "nav → count ready" from 23 s to ~2 s. Lowest-hanging fruit.
  2. Port the cross-filter pre-aggregation pattern from Interactive Explorer — for common filter combinations, answer from the 6 KB cache instead of querying lite at all.
  3. ATTACH a .duckdb file — instead of read_parquet with many range requests, use DuckDB's native file format (one bigger fetch, fewer round-trips). Requires a build-step change to produce the file in the PQG pipeline.
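Option 1 amounts to answering the count from pre-computed summaries when the selection is simple enough, and otherwise falling back to the real COUNT query. The sketch below makes several assumptions that the PR does not state. It assumes facet_summaries.parquet has rows shaped like `{ facet, value, count }`. It assumes every sample has exactly one source, so summing the source facet gives the total. It assumes single-facet selections are single-valued per sample. `countFromSummaries` and the `Selection` shape are hypothetical.

```typescript
// Sketch only: the summary schema, TOTAL_FACET, and function names are assumptions.
interface FacetSummaryRow {
  facet: string;
  value: string;
  count: number;
}

interface Selection {
  facets: Record<string, string[]>; // facet name -> selected values
  text: string;                     // free-text search
}

const TOTAL_FACET = "source"; // assumption: each sample has exactly one source

// Returns a count when the summaries can answer it, or null to signal "run COUNT".
function countFromSummaries(rows: FacetSummaryRow[], sel: Selection): number | null {
  if (sel.text.trim() !== "") return null; // text search needs the real COUNT
  const active = Object.entries(sel.facets).filter(([, values]) => values.length > 0);
  if (active.length > 1) return null; // cross-facet intersections aren't pre-computed
  if (active.length === 0) {
    // Unfiltered: total = sum over one exhaustive, single-valued facet.
    return rows
      .filter((r) => r.facet === TOTAL_FACET)
      .reduce((sum, r) => sum + r.count, 0);
  }
  const [facet, values] = active[0];
  const wanted = new Set(values);
  // Summing several values of one facet is only correct if that facet is single-valued.
  return rows
    .filter((r) => r.facet === facet && wanted.has(r.value))
    .reduce((sum, r) => sum + r.count, 0);
}
```

A caller would use the result when it is non-null and issue the COUNT query only when it returns null. Per the estimate above, that could take "nav → count ready" from ~23 s to ~2 s in the unfiltered case.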

Status

Paused here. v2 is live behind ?v=2. Users who want it can opt in; default is still v1. Next step is a separate design decision (options 1–3 above) before more code.
