
Add /current/ alias layer + OC-thumbnail enrichment script #133

Merged: rdhyee merged 1 commit into isamplesorg:main from rdhyee:feat/current-alias-redirects on Apr 17, 2026.


rdhyee commented Apr 17, 2026

Follow-up to #131. Adds /current/<flavor>.parquet Worker redirects for stable URLs that point at rotating dated files, and the DuckDB enrichment script used to build isamples_202604_wide.parquet (47,717 thumbnails recovered from Eric's OC PQG).

isamples_202601_wide.parquet stays untouched. /current/wide.parquet now serves the enriched file.

Verified live: a DuckDB query through /current/wide.parquet returns 47,717 rows with populated thumbnails.

Two additions for stable-URL access to rotating versioned parquets:

1. Worker alias route: GET /current/<flavor>.parquet reads
   current/manifest.json from R2 and 302-redirects to the dated file it
   points to. The redirect response carries a short (5-minute)
   Cache-Control so rotation propagates quickly; the target (versioned
   file) keeps its immutable 1-year cache. DuckDB-WASM, browsers, and
   curl (with -L) follow the 302, so range requests hit the target
   directly.

2. scripts/enrich_wide_with_oc_thumbnails.py: DuckDB LEFT-JOIN script
   that takes the unified Zenodo wide parquet (thumbnail_url all NULL,
   see isamplesorg#131) and Eric Kansa's oc_isamples_pqg.parquet (48K thumbnails)
   and produces an enriched wide file with ~47.7K thumbnails populated
   for MaterialSampleRecord pids that overlap both.
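The alias route in item 1 can be sketched as a pure function from request path plus manifest to a redirect response. This is a minimal Python sketch, not the Worker's code: the manifest shape (a flat `{flavor: dated_key}` object), the assumption that dated files are served from the site root, the 404 for unknown flavors, and the exact `Cache-Control` string (`max-age=300` for the 5-minute alias TTL) are all assumptions. The real Worker reads `current/manifest.json` from R2 rather than taking it as a string.

```python
import json
import re

# Hypothetical manifest layout (flavor -> dated object key); the real
# current/manifest.json schema is not shown in this PR.
EXAMPLE_MANIFEST = json.dumps({"wide": "isamples_202604_wide.parquet"})

# Assumed header for the alias: 5 minutes, so a manifest rotation is
# visible to clients quickly. The dated target keeps its own 1-year cache.
ALIAS_CACHE_CONTROL = "public, max-age=300"

ALIAS_RE = re.compile(r"^/current/([A-Za-z0-9_-]+)\.parquet$")


def resolve_alias(path, manifest_json):
    """Return (status, headers) for a GET on /current/<flavor>.parquet,
    or None if the path is not an alias route."""
    match = ALIAS_RE.match(path)
    if match is None:
        return None  # not an alias; other handlers serve dated files
    flavor = match.group(1)
    target = json.loads(manifest_json).get(flavor)
    if target is None:
        # Assumed behavior for a flavor missing from the manifest.
        return 404, {"Cache-Control": "no-store"}
    # Assumes dated files are served from the site root.
    return 302, {"Location": "/" + target, "Cache-Control": ALIAS_CACHE_CONTROL}
```

Because only the small 302 is short-lived, a client's subsequent `Range` requests go to the dated file, which stays cacheable for a year; rotating `/current/` means rewriting the manifest, never the data files.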

Used today to build and ship isamples_202604_wide.parquet via
https://data.isamples.org/current/wide.parquet. The older
isamples_202601_wide.parquet stays in place, untouched, still
immutable. Verified via DuckDB query through the /current/ URL:
47,717 rows with thumbnail_url populated.
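The enrichment in item 2 is a LEFT JOIN on `pid`. The sketch below uses Python's stdlib `sqlite3` in place of DuckDB so it runs without extra dependencies. The table and column names (`wide`, `oc`, `pid`, `otype`, `thumbnail_url`) are assumptions, not the script's actual schema, and the sample rows in the test are hypothetical. Collapsing the OC side to one thumbnail per `pid` before joining (here via `MIN`, an arbitrary but deterministic pick) keeps the output row count equal to the wide file's.

```python
import sqlite3


def enrich(wide_rows, oc_rows):
    """LEFT JOIN OC thumbnails onto wide rows, filling thumbnail_url
    only for MaterialSampleRecord pids that appear in both inputs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE wide (pid TEXT, otype TEXT, thumbnail_url TEXT)")
    con.execute("CREATE TABLE oc (pid TEXT, thumbnail_url TEXT)")
    con.executemany("INSERT INTO wide VALUES (?, ?, ?)", wide_rows)
    con.executemany("INSERT INTO oc VALUES (?, ?)", oc_rows)
    rows = con.execute(
        """
        SELECT w.pid, w.otype,
               COALESCE(w.thumbnail_url, o.thumbnail_url) AS thumbnail_url
        FROM wide AS w
        LEFT JOIN (
            -- one thumbnail per pid so the join cannot duplicate wide rows
            SELECT pid, MIN(thumbnail_url) AS thumbnail_url
            FROM oc
            WHERE thumbnail_url IS NOT NULL
            GROUP BY pid
        ) AS o
          ON o.pid = w.pid AND w.otype = 'MaterialSampleRecord'
        ORDER BY w.pid
        """
    ).fetchall()
    con.close()
    return rows
```

In DuckDB the same SELECT could read both inputs directly with `read_parquet(...)` and write the result with `COPY (...) TO '...' (FORMAT PARQUET)`; counting rows where `thumbnail_url IS NOT NULL` is the kind of check behind the 47,717 figure above.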

Closes the "soft-link" piece of isamplesorg#131.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>