Enrich dataset by filling NA/null fields from URL content #32

Open
likeajumprope wants to merge 5 commits into ReproNim:main from likeajumprope:main

@likeajumprope
Collaborator

No description provided.

likeajumprope and others added 3 commits April 16, 2026 16:09
Visited each entry's URL and enriched ~35 entries across these fields:
- platform: inferred from course type (Mac/Linux for shell courses, Jupyter
  for notebook courses, Mac/Windows/Linux for cross-platform web content)
- delivery: self-paced for reference books/videos, instructor for workshops
  and university courses, both for Carpentries-style materials
- language: filled English for all entries with null language (6 entries)
- instruction_medium: added for Brainhack School, NeuroHackademy, NMA,
  ABCD-ReproNim, and OHBM workshop
- course_length: set 1+ weeks for Brainhack School, NeuroHackademy,
  Neuromatch; 1-4 hrs for OHBM reproducible pipelines workshop
- assessment: true for courses with projects/quizzes (NMA, NeuroHackademy,
  Brainhack School, ABCD-ReproNim, QLS612); false for reference books
- neuroimaging_software: FSL for courses explicitly using FSL (IDs 18, 47);
  NA for Neuromatch (computational neuroscience, not imaging)
- imaging_modality: updated for guide to reproducible neuroimaging (ID 18),
  reproducibility science syllabus (ID 25), Brainhack School (ID 54)
- programming_language: Python+shell scripting for ID 18, shell scripting
  for MRI Analysis syllabus (ID 47)
- open_dataset: false for syllabi/handbooks/reference docs; true for
  ReproNim workshop (ID 39, uses published dataset hands-on)

Also fixed duplicate YAML keys in entry 58 (Andy's Brain Book) caused by
a prior partial edit leaving null placeholders alongside richer values.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
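
Duplicate keys like the ones fixed in entry 58 are easy to miss because most parsers silently keep only the last value. The following is a minimal sketch, not part of this PR, of how such duplicates could be detected in the JSON copy of the inventory using only the standard library. The function name `find_duplicate_keys` and the sample record are hypothetical; the YAML file would need an equivalent check in a YAML loader, which is not shown here.

```python
import json


def find_duplicate_keys(text):
    """Return keys that occur more than once within a single JSON object."""
    duplicates = []

    def hook(pairs):
        # json calls this once per object with its (key, value) pairs in
        # document order, before any duplicate has been collapsed.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Made-up record: a null placeholder left next to a richer value.
sample = '{"id": 58, "platform": null, "platform": ["Mac", "Linux"]}'
print(find_duplicate_keys(sample))
```

The same key appearing in two different objects (for example an outer and a nested one) is not reported, since each object is checked separately.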
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides extensive metadata updates for the neuroimaging resource inventory in both JSON and YAML formats, including platform details, delivery methods, and software specifications. Feedback highlights several data integrity issues, such as the accidental removal of a description for entry 35, multiple typos in resource descriptions, and schema inconsistencies where dates were provided as integers rather than strings. Additionally, a placeholder 'test course' entry was identified that should be removed prior to merging.

Comment on lines +1021 to +1025
source:
- Adina Wagner
- Max Planck Institute
alias_links:
notes:

high

The description field for entry id: 35 was removed during this update. It should be retained to maintain dataset completeness.

  source:
  - Adina Wagner
  - Max Planck Institute
  description: Course outline for Data Lad Course
  alias_links:
  notes:
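
A removal like this can be caught mechanically before review. The following is a minimal sketch, not part of this PR, that compares the inventory before and after an edit and reports fields that had a value and are now missing or null. It assumes the data is a list of records that each carry an `id` key; the function name `dropped_fields` and the sample records are hypothetical.

```python
def dropped_fields(before, after):
    """Map entry id -> sorted names of fields that had a value before
    but are missing or null afterwards."""
    after_by_id = {entry["id"]: entry for entry in after}
    report = {}
    for entry in before:
        new = after_by_id.get(entry["id"])
        if new is None:
            # Entry was deleted entirely; out of scope for this check.
            continue
        lost = sorted(
            key
            for key, value in entry.items()
            if value is not None and new.get(key) is None
        )
        if lost:
            report[entry["id"]] = lost
    return report


# Made-up records illustrating the situation described for entry 35.
before = [{"id": 35, "description": "Course outline", "notes": None}]
after = [{"id": 35, "notes": None}]
print(dropped_fields(before, after))
```

Fields that were already null before the edit are ignored, so the report only lists genuine losses.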

Comment thread frontend/public/data/reproinventory_data.json
(uclear if links work once you do that)
alias_links: null
notes: null
description: JupyterHub no longer accessible. Need to link GitHubClassroom to GitHub (uclear if links work once you do that)

medium

Typo in description: "uclear" should be "unclear".

  description: JupyterHub no longer accessible. Need to link GitHubClassroom to GitHub (unclear if links work once you do that)

source:
- Andrew Jahn
- University of Michigan
last_updated: 2026

medium

The last_updated value is an integer, but the schema defines it as a string and other entries use string values (e.g., "2024"). It should be quoted to ensure consistency.

  last_updated: '2026'
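
Because the same type mismatch appears in several entries, it may be easier to scan for it than to fix it one occurrence at a time. The following is a minimal sketch, not part of this PR, that works on already-parsed records (from either the JSON or the YAML file). It assumes each record is a dict with an `id` key; the function names and the sample records are hypothetical.

```python
def non_string_last_updated(entries):
    """Return ids of entries whose last_updated is set but is not a string."""
    return [
        entry["id"]
        for entry in entries
        if entry.get("last_updated") is not None
        and not isinstance(entry["last_updated"], str)
    ]


def with_string_last_updated(entries):
    """Return copies of the entries with last_updated coerced to a string."""
    fixed = []
    for entry in entries:
        entry = dict(entry)  # shallow copy so the input is left untouched
        value = entry.get("last_updated")
        if value is not None and not isinstance(value, str):
            entry["last_updated"] = str(value)
        fixed.append(entry)
    return fixed


# Made-up records: one integer year, one string year, one null.
entries = [
    {"id": 1, "last_updated": 2026},
    {"id": 2, "last_updated": "2024"},
    {"id": 3, "last_updated": None},
]
print(non_string_last_updated(entries))
print(non_string_last_updated(with_string_last_updated(entries)))
```

Null values are left alone, on the assumption that the schema allows a missing date.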

"Andrew Jahn",
"University of Michigan"
],
"last_updated": 2026,

medium

The last_updated value is an integer, but it should be a string to remain consistent with the schema and other entries in the dataset.

Suggested change
- "last_updated": 2026,
+ "last_updated": "2026",

"ABCD-ReproNim",
"ReproNim"
],
"description": "3 week machone learning course",

medium

Typo in description: "machone" should be "machine".

Suggested change
- "description": "3 week machone learning course",
+ "description": "3 week machine learning course",

@@ -2395,20 +2536,39 @@
"open_dataset": true,
"last_updated": 2025,

medium

The last_updated value is an integer, but it should be a string to remain consistent with the schema and other entries.

Suggested change
- "last_updated": 2025,
+ "last_updated": "2025",

Comment on lines +2558 to 2573
{
"id": 62,
"course_name": "test course",
"platform": [
"Mac"
],
"instruction_medium": [
"blog post"
],
"description": "Comprehensive website describing the necessity of reproducible research and how to achieve it (notebooks, visualisation)"
"delivery": [
"instructor"
],
"imaging_modality": [
"Behavioral"
]
}

medium

The entry with id: 62 appears to be a placeholder "test course" and should likely be removed from the production dataset.
