Problem
The wide parquet (`isamples_202601_wide.parquet`) exposes a `thumbnail_url` column, but it is NULL or empty in every row.
```sql
-- Per-source thumbnail coverage for sample records
SELECT n AS source,
       COUNT(*) AS total,
       -- count only rows with a non-NULL, non-empty thumbnail URL
       COUNT(*) FILTER (WHERE thumbnail_url IS NOT NULL AND thumbnail_url <> '') AS with_thumb
FROM read_parquet('https://data.isamples.org/isamples_202601_wide.parquet')
WHERE otype = 'MaterialSampleRecord'
GROUP BY 1;
```
| source | total | with_thumb |
|---|---|---|
| OPENCONTEXT | 1,064,831 | 0 |
| SESAR | 4,688,386 | 0 |
| SMITHSONIAN | 322,161 | 0 |
| GEOME | 605,554 | 0 |
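The same coverage check can also be run against in-memory records, for example as a regression test in the pipeline. The following is a minimal sketch, assuming each record is a plain dict carrying the `n`, `otype`, and `thumbnail_url` fields used in the query; the function name `thumbnail_coverage` is hypothetical and not part of PQG.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple


def thumbnail_coverage(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[Optional[str], Tuple[int, int]]:
    """Return {source: (total, with_thumb)} for MaterialSampleRecord rows.

    Mirrors the SQL check: a thumbnail counts only if it is neither
    None nor the empty string.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        # Same filter as the WHERE clause in the query.
        if row.get("otype") != "MaterialSampleRecord":
            continue
        source = row.get("n")
        counts[source][0] += 1
        if row.get("thumbnail_url"):  # falsy for both None and ''
            counts[source][1] += 1
    return {source: (total, thumbs) for source, (total, thumbs) in counts.items()}
```

A pipeline test could then assert that at least one source reports a non-zero `with_thumb` count before a new parquet file is published.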
Impact
- No image-rich UX is possible in the browser explorers without per-source API calls, which are slow and rate-limited.
- Sample cards in the Interactive Explorer currently cannot show a preview.
Upstream availability (spot-checks)
So the raw data exists; it is the PQG pipeline that isn't populating the column.
Proposal
Decide whether the pipeline should populate `thumbnail_url`:
Option 1: Populate it
Extend the narrow → wide conversion (or the source → PQG ingester) to extract a thumbnail URL per source:
Estimated effort: ~2 days, modest ongoing maintenance.
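One way to structure the per-source extraction is a small dispatch table keyed by source name. This is only a sketch under stated assumptions: the record field names (`thumbnail`, `image_url`) and the helper names are hypothetical placeholders, and the real per-source mappings would need to be confirmed against each provider's records.

```python
from typing import Any, Callable, Dict, Mapping, Optional

Record = Mapping[str, Any]
Extractor = Callable[[Record], Optional[str]]


def _first_http_url(record: Record, *keys: str) -> Optional[str]:
    """Return the first value among `keys` that looks like an http(s) URL."""
    for key in keys:
        value = record.get(key)
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            return value
    return None


# HYPOTHETICAL field names: placeholders for illustration only. The actual
# keys must be taken from each provider's source records.
EXTRACTORS: Dict[str, Extractor] = {
    "OPENCONTEXT": lambda r: _first_http_url(r, "thumbnail", "image_url"),
    "SESAR": lambda r: _first_http_url(r, "image_url"),
    "SMITHSONIAN": lambda r: _first_http_url(r, "thumbnail"),
    "GEOME": lambda r: _first_http_url(r, "image_url"),
}


def extract_thumbnail_url(source: str, record: Record) -> Optional[str]:
    """Return a thumbnail URL for `record`, or None if none is found.

    Unknown sources yield None, so the column stays NULL rather than wrong.
    """
    extractor = EXTRACTORS.get(source)
    return extractor(record) if extractor else None
```

Keeping the mapping in one table means adding or correcting a source is a one-line change, and records without a usable URL fall through to NULL instead of an empty string.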
Option 2: Remove the column
If no one is going to populate it, drop it from the schema so consumers don't have false expectations. Estimated effort: trivial.
Option 3: Leave as-is, document the gap
Update `how-to-use.qmd` or the PQG spec to note the column exists but is empty pending future work.
Related