Explorer v2 behind ?v=2 (lite parquet, lazy description, no RANDOM, lazy Cesium) by rdhyee · Pull Request #126 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-04-17T17:01:45Z

Second of two PRs for the Explorer rethink. v1 untouched; opt-in v2 via ?v=2.

Four moves (each gated on version):

1. Lite parquet as primary: samples_map_lite.parquet (60 MB) instead of wide.parquet (278 MB).
2. No ORDER BY RANDOM(): bare LIMIT is ~20× faster on columnar parquet; accept row-order bias for now.
3. Lazy description fetch: query wide.parquet for just one pid when a sample is clicked.
4. Lazy Cesium mount: don't construct the viewer until viewMode === 'globe'.
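
A minimal sketch of how the version gate, the bare-LIMIT query, and the lazy viewer mount could fit together. It is not the Explorer's actual code: `SOURCES`, `explorerVersion`, `sampleQuery`, and `viewerCell` are hypothetical names, and the file paths are placeholders rather than the real data URLs.

```javascript
// Sketch only: file paths are placeholders, and all names here are hypothetical.
const SOURCES = {
  1: { file: "wide.parquet", randomOrder: true },              // v1: wide file, random sample
  2: { file: "samples_map_lite.parquet", randomOrder: false }, // v2: lite file, bare LIMIT
};

// Anything other than ?v=2 falls back to v1, so v1 stays the default.
function explorerVersion(search) {
  return new URLSearchParams(search).get("v") === "2" ? 2 : 1;
}

// Build the sample query; only v1 pays for ORDER BY RANDOM().
function sampleQuery(version, limit) {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new RangeError("limit must be a positive integer");
  }
  const { file, randomOrder } = SOURCES[version];
  const order = randomOrder ? " ORDER BY RANDOM()" : "";
  return `SELECT * FROM read_parquet('${file}')${order} LIMIT ${limit}`;
}

// Lazy mount: v2 constructs the viewer only in globe mode; v1 mounts eagerly.
function viewerCell(version, viewMode, constructViewer) {
  if (version === 2 && viewMode !== "globe") return null;
  return constructViewer();
}
```

Keeping the gate in one small function means every other cell can branch on a plain number instead of re-parsing the URL.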

v1 baseline captured in PR #124's panel:

nav → count ready: 28.9 s
nav → samples ready: 16.5 s
sample data query: 14.1 s

v2 is expected to cut these by a factor of 4–10. It will be measured via the same ?perf=1 panel; numbers to follow after merge.

🤖 Generated with Claude Code

…RANDOM, lazy Cesium)

Four architectural moves, each gated on ?v=2. v1 stays unchanged.

1. Primary read surface: samples_map_lite.parquet (60 MB) instead of wide.parquet (278 MB). The lite file has every column the Explorer needs for the list + globe view except description.
2. No ORDER BY RANDOM(). v1 uses RANDOM() which forces a scan across row groups; v2 uses bare LIMIT, accepting row-order bias in exchange for ~20× query speedup on columnar parquet. (Trade-off acceptable for a viz sample; revisit if source clustering becomes visible.)
3. Lazy description fetch. v2 drops description from sampleData and adds a lazyDescription cell that queries wide.parquet for just the one pid when a sample is clicked. sampleCard falls back to lazyDescription when s.description is empty.
4. Lazy Cesium mount. v2 returns null from the viewer cell until viewMode === 'globe', so the viewer constructor (~500 ms) doesn't run for users who stay in list/table view. v1 mounts eagerly.

whereClause handles column-name drift (v1 uses `n`, v2 uses `source`) and skips the otype filter for v2 (lite is samples-only).

Text search in v2 is limited to label + place_name (description isn't loaded eagerly). v1 keeps description search.

Next: measure v2 and compare against the PR isamplesorg#124 baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
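
The whereClause and lazy-description behaviour described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: the option shape, the `sqlString` helper, the otype literal, the bare file path, and the exact v1 search columns are all guesses made for the example.

```javascript
// Quote a value as a SQL string literal by doubling single quotes.
const sqlString = (s) => `'${String(s).replace(/'/g, "''")}'`;

// Hypothetical whereClause: handles the n/source column drift and the v2 search limits.
function whereClause(version, { sources = [], text = "" } = {}) {
  const parts = [];
  // v1 reads the wide file and must filter to samples; the literal is an assumption.
  if (version !== 2) parts.push("otype = 'MaterialSampleRecord'");
  const sourceCol = version === 2 ? "source" : "n";
  if (sources.length > 0) {
    parts.push(`${sourceCol} IN (${sources.map(sqlString).join(", ")})`);
  }
  if (text) {
    // v2 has no eager description column, so it searches fewer fields.
    const cols = version === 2
      ? ["label", "place_name"]
      : ["label", "place_name", "description"];
    const pattern = sqlString(`%${text}%`); // note: % and _ in text are not escaped here
    parts.push("(" + cols.map((c) => `${c} ILIKE ${pattern}`).join(" OR ") + ")");
  }
  return parts.length > 0 ? "WHERE " + parts.join(" AND ") : "";
}

// Query for the lazy description fetch: one pid, one column, from the wide file.
function descriptionQuery(pid) {
  return `SELECT description FROM read_parquet('wide.parquet') WHERE pid = ${sqlString(pid)} LIMIT 1`;
}
```

For example, `whereClause(2, { sources: ["SESAR"], text: "basalt" })` yields a clause on `source`, `label`, and `place_name` only, while the same call with version 1 adds the otype filter and the description column.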

rdhyee · 2026-04-17T17:15:04Z

v2 results (3 cold runs on production, headless Chrome, cache disabled)

| Metric | v1 | v2 | Δ |
| --- | --- | --- | --- |
| first-paint | 512 ms | 400 ms | −22% |
| DuckDB init + views | 1.59 s | 1.35 s | −15% |
| facet summaries query | 93 ms | 66 ms | −29% |
| sample data query | 14.1 s | 8.1 s | −42% |
| count query | 26.5 s | 21.1 s | −20% |
| nav → samples ready | 16.5 s | 10.3 s | −38% |
| nav → count ready | 28.9 s | 23.3 s | −19% |

Meaningful: the sample data query was cut nearly in half, and overall time-to-usable dropped by ~6 seconds. But that is less than expected (we hoped for 4–10×, got ~1.7× on the sample data query).
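
To make the gap against the 4–10× target concrete, the speedup factors can be recomputed from the published timings. The snippet below is just that arithmetic; `RESULTS` and `speedup` are names chosen for this example, and the numbers are copied from the results above.

```javascript
// [metric, v1, v2]; each pair shares a unit (ms or s), so the ratio is unit-free.
const RESULTS = [
  ["first-paint", 512, 400],
  ["DuckDB init + views", 1.59, 1.35],
  ["facet summaries query", 93, 66],
  ["sample data query", 14.1, 8.1],
  ["count query", 26.5, 21.1],
  ["nav → samples ready", 16.5, 10.3],
  ["nav → count ready", 28.9, 23.3],
];

// Speedup factor: how many times faster v2 is than v1.
function speedup(v1, v2) {
  return v1 / v2;
}

const factors = RESULTS.map(([metric, v1, v2]) => ({
  metric,
  factor: speedup(v1, v2),
}));

// The largest improvement across all metrics.
const best = factors.reduce((a, b) => (b.factor > a.factor ? b : a));

for (const { metric, factor } of factors) {
  console.log(`${metric}: ${factor.toFixed(2)}x`);
}
console.log(`best: ${best.metric} at ${best.factor.toFixed(2)}x`);
```

The largest factor is about 1.7× (sample data query), and the two count-related metrics sit near 1.25×, so none of the measured metrics reaches the hoped-for 4–10×.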

Why it's not more

v2 still makes 113 range requests to the lite parquet on initial load. Shrinking the file from 278 MB to 60 MB helped only so much because the bottleneck isn't total bytes; it's DuckDB-WASM's many small range requests plus the facet subquery join against sample_facets_v2 (63 MB). The four moves were architecturally right but hit a deeper wall.

Where the real wins likely are next

1. Skip the COUNT query entirely for unfiltered/simple cases: use pre-computed counts from facet_summaries.parquet (already loaded, 2 KB) when no facets are selected. Could drop "nav → count ready" from 23 s to ~2 s. Lowest-hanging fruit.
2. Port the cross-filter pre-aggregation pattern from Interactive Explorer: for common filter combinations, answer from the 6 KB cache instead of querying lite at all.
3. ATTACH a .duckdb file: instead of read_parquet with many range requests, use DuckDB's native file format (one bigger fetch, fewer round-trips). Requires a build-step change to produce the file in the PQG pipeline.
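
The first option (skipping the COUNT query when nothing is filtered) might look roughly like the sketch below. The row shape of facet_summaries.parquet is not given here, so `{ facet_type, facet_value, count }` is an assumption, as are the function names. The sum equals the total only if every sample has exactly one value for the chosen facet, which is assumed to hold for `source`.

```javascript
// True when any facet has at least one selected value.
function hasActiveFilter(selected) {
  return Object.values(selected).some(
    (values) => Array.isArray(values) && values.length > 0
  );
}

// Return a total from the pre-computed summaries, or null when the caller
// must fall back to the real COUNT query.
// Assumed row shape: { facet_type, facet_value, count }.
function precomputedCount(selected, text, summaryRows, facetType = "source") {
  if (text || hasActiveFilter(selected)) return null; // filtered: cache does not apply
  const rows = summaryRows.filter((r) => r.facet_type === facetType);
  if (rows.length === 0) return null; // nothing to sum: let the query decide
  // Number() also accepts BigInt, which query results may use for counts.
  return rows.reduce((sum, r) => sum + Number(r.count), 0);
}
```

A caller would try `precomputedCount(...)` first and only issue the slow COUNT query when it returns null, so filtered views keep their current behaviour.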

Status

Paused here. v2 is live behind ?v=2. Users who want it can opt in; default is still v1. Next step is a separate design decision (options 1–3 above) before more code.

rdhyee force-pushed the feat/explorer-v2-flag branch from 63ac873 to 8f48ba8 · April 17, 2026 17:02

rdhyee merged commit b9790f9 into isamplesorg:main Apr 17, 2026
1 check passed

rdhyee merged 1 commit into isamplesorg:main from rdhyee:feat/explorer-v2-flag