
speculative decoding: use mto config subsystem #1328

Draft
h-guo18 wants to merge 13 commits into main from haoguo/spec-mto-config

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Apr 23, 2026

What does this PR do?

Type of change: new feature

Port the speculative-decoding example to ModelOpt's recipe/config subsystem: model / data / training / <algo> now load from a single YAML with Pydantic validation and OmegaConf dotlist overrides. Adds built-in eagle3 / dflash recipes, drops the redundant training.mode field (inferred from recipe class), and shrinks main.py by ~145 lines (−208 / +63).

JIRA: https://jirasw.nvidia.com/browse/OMNIML-3859

Usage

python main.py --config general/speculative_decoding/eagle3 \
    model.model_name_or_path=meta-llama/Llama-3.2-1B \
    data.data_path=train.jsonl \
    training.output_dir=ckpts/test
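
The recipe-plus-dotlist pattern shown above can be approximated with a short sketch. This is illustrative only: the real loader uses OmegaConf for merging and Pydantic for validation, and the config keys below merely mirror the usage example, not ModelOpt's actual schema.

```python
# Minimal stdlib approximation of applying "a.b.c=value" dotlist
# overrides onto a nested config dict, as OmegaConf does for the
# recipe loader. All names here are hypothetical.

def apply_dotlist(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' style overrides onto a nested dict."""
    for item in overrides:
        path, _, value = item.partition("=")
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config

# Base values as a built-in recipe YAML might provide them.
base = {
    "model": {"model_name_or_path": "default/model"},
    "training": {"output_dir": "ckpts/default"},
}
cfg = apply_dotlist(base, [
    "model.model_name_or_path=meta-llama/Llama-3.2-1B",
    "training.output_dir=ckpts/test",
])
print(cfg["model"]["model_name_or_path"])
```

In the actual subsystem the merged result is additionally validated against Pydantic models, so a typo in a key or a wrong value type fails fast instead of silently creating a new field.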

Testing

  • pytest tests/unit/recipe/test_loader.py — new coverage for Eagle / DFlash YAML loading, dotlist overrides, and field-level validation.
  • Smoke-trained both built-in eagle3 and dflash recipes end-to-end.
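
To illustrate the field-level validation the new tests exercise, here is a hedged sketch; the actual `tests/unit/recipe/test_loader.py` and the recipe schema are not reproduced here, and `validate_training` is a hypothetical stand-in.

```python
# Hypothetical validator illustrating the kind of field-level check
# that Pydantic performs on a loaded recipe config.

def validate_training(cfg: dict) -> dict:
    """Reject configs whose output_dir is missing or empty."""
    if not isinstance(cfg.get("output_dir"), str) or not cfg["output_dir"]:
        raise ValueError("training.output_dir must be a non-empty string")
    return cfg

# A valid config passes through unchanged; an empty path is rejected.
validate_training({"output_dir": "ckpts/test"})
try:
    validate_training({"output_dir": ""})
except ValueError as exc:
    print("rejected:", exc)
```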

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ❌ — main.py CLI switched to --config <recipe> (+ dotlist overrides); the old argparse flags are removed.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A — no new deps (pydantic, omegaconf already in core).
  • Did you write any new necessary tests?: ✅ — tests/unit/recipe/test_loader.py.
  • Did you update Changelog?: ❌ — to be added.

Additional Information

Follow-up to the modelopt.recipe subsystem introduced for PTQ; this PR extends the same declarative-YAML pattern to speculative decoding (Eagle3 / DFlash / Medusa).

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 81fa0404-4177-4c8b-81f4-35b4837c3b47

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1328/

Built to branch gh-pages at 2026-04-23 08:25 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@h-guo18 h-guo18 changed the title mto config subsystem speculative decoding: use mto config subsystem Apr 23, 2026
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.63%. Comparing base (c796611) to head (69f3c40).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...lopt/torch/speculative/plugins/hf_training_args.py 86.95% 6 Missing ⚠️
modelopt/recipe/loader.py 90.90% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1328      +/-   ##
==========================================
+ Coverage   74.60%   74.63%   +0.02%     
==========================================
  Files         467      468       +1     
  Lines       50176    50260      +84     
==========================================
+ Hits        37435    37511      +76     
- Misses      12741    12749       +8     
Flag Coverage Δ
unit 52.40% <90.90%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

h-guo18 added 9 commits April 23, 2026 04:26
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Comment on lines -266 to -272
if use_offline_training:
    # Load config first to preserve original num_hidden_layers before
    # load_vlm_or_llm may reduce layers for offline space savings.
    model_config = transformers.AutoConfig.from_pretrained(
        model_args.model_name_or_path,
        trust_remote_code=model_args.trust_remote_code,
    )
Contributor Author

@h-guo18 h-guo18 Apr 23, 2026


Hi @yeyu-nvidia, I removed these lines you added two weeks ago since they seem to duplicate what's inside load_vlm_or_llm. Could you take a look and confirm this is fine? Thanks!

Contributor


LGTM

Comment on lines -281 to -292
if use_offline_training:
    # When doing offline training, we need to set num_hidden_layers
    # since we override it when loading the model for space savings.
    # Some models (e.g. Kimi-K2.5) use non-standard config attributes,
    # so fall back to the model's own config if the attribute is missing.
    model.config.num_orig_hidden_layers = getattr(
        model_config, "num_hidden_layers", model.config.num_hidden_layers
    )
    if hasattr(model.config, "layer_types"):
        # remove layer_types to avoid mismatch with the modified model
        del model.config.layer_types
Contributor Author


@yeyu-nvidia Same as above, removed due to duplication. PTAL, thanks!

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thx

@h-guo18 h-guo18 self-assigned this Apr 23, 2026
h-guo18 added 2 commits April 23, 2026 08:10
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>