
speculative decoding: use mto config subsystem #1328

Draft
h-guo18 wants to merge 13 commits into main from haoguo/spec-mto-config

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Apr 23, 2026

What does this PR do?

Type of change: new feature

Port the speculative-decoding example to ModelOpt's recipe/config subsystem: model / data / training / <algo> now load from a single YAML with Pydantic validation and OmegaConf dotlist overrides. Adds built-in eagle3 / dflash recipes, drops the redundant training.mode field (inferred from recipe class), and shrinks main.py by ~145 lines (−208 / +63).

JIRA: https://jirasw.nvidia.com/browse/OMNIML-3859

Usage

python main.py --config general/speculative_decoding/eagle3 \
    model.model_name_or_path=meta-llama/Llama-3.2-1B \
    data.data_path=train.jsonl \
    training.output_dir=ckpts/test
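
The recipe-plus-dotlist pattern shown above can be approximated with a short sketch. This is illustrative only: the real loader uses OmegaConf for merging and Pydantic for validation, and the config keys below merely mirror the usage example, not ModelOpt's actual schema.

```python
# Minimal stdlib approximation of applying "a.b.c=value" dotlist
# overrides onto a nested config dict, as OmegaConf does for the
# recipe loader. All names here are hypothetical.

def apply_dotlist(config: dict, overrides: list[str]) -> dict:
    """Apply 'a.b.c=value' style overrides onto a nested dict."""
    for item in overrides:
        path, _, value = item.partition("=")
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config

# Base values as a built-in recipe YAML might provide them.
base = {
    "model": {"model_name_or_path": "default/model"},
    "training": {"output_dir": "ckpts/default"},
}
cfg = apply_dotlist(base, [
    "model.model_name_or_path=meta-llama/Llama-3.2-1B",
    "training.output_dir=ckpts/test",
])
print(cfg["model"]["model_name_or_path"])
```

In the actual subsystem the merged result is additionally validated against Pydantic models, so a typo in a key or a wrong value type fails fast instead of silently creating a new field.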

Testing

  • pytest tests/unit/recipe/test_loader.py — new coverage for Eagle / DFlash YAML loading, dotlist overrides, and field-level validation.
  • Smoke-trained both built-in eagle3 and dflash recipes end-to-end.
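
To illustrate the field-level validation the new tests exercise, here is a hedged sketch; the actual `tests/unit/recipe/test_loader.py` and the recipe schema are not reproduced here, and `validate_training` is a hypothetical stand-in.

```python
# Hypothetical validator illustrating the kind of field-level check
# that Pydantic performs on a loaded recipe config.

def validate_training(cfg: dict) -> dict:
    """Reject configs whose output_dir is missing or empty."""
    if not isinstance(cfg.get("output_dir"), str) or not cfg["output_dir"]:
        raise ValueError("training.output_dir must be a non-empty string")
    return cfg

# A valid config passes through unchanged; an empty path is rejected.
validate_training({"output_dir": "ckpts/test"})
try:
    validate_training({"output_dir": ""})
except ValueError as exc:
    print("rejected:", exc)
```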

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ❌ — main.py CLI switched to --config <recipe> (+ dotlist overrides); the old argparse flags are removed.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A — no new deps (pydantic, omegaconf already in core).
  • Did you write any new necessary tests?: ✅ — tests/unit/recipe/test_loader.py.
  • Did you update Changelog?: ❌ — to be added.

Additional Information

Follow-up to the modelopt.recipe subsystem introduced for PTQ; this PR extends the same declarative-YAML pattern to speculative decoding (Eagle3 / DFlash / Medusa).

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 81fa0404-4177-4c8b-81f4-35b4837c3b47

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions Bot commented Apr 23, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1328/

Built to branch gh-pages at 2026-04-23 08:25 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@h-guo18 h-guo18 changed the title mto config subsystem speculative decoding: use mto config subsystem Apr 23, 2026
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.63%. Comparing base (c796611) to head (69f3c40).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...lopt/torch/speculative/plugins/hf_training_args.py 86.95% 6 Missing ⚠️
modelopt/recipe/loader.py 90.90% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1328      +/-   ##
==========================================
+ Coverage   74.60%   74.63%   +0.02%     
==========================================
  Files         467      468       +1     
  Lines       50176    50260      +84     
==========================================
+ Hits        37435    37511      +76     
- Misses      12741    12749       +8     
Flag Coverage Δ
unit 52.40% <90.90%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

h-guo18 added 9 commits April 23, 2026 04:26
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Comment on lines -266 to -272
if use_offline_training:
    # Load config first to preserve original num_hidden_layers before
    # load_vlm_or_llm may reduce layers for offline space savings.
    model_config = transformers.AutoConfig.from_pretrained(
        model_args.model_name_or_path,
        trust_remote_code=model_args.trust_remote_code,
    )
Contributor Author

@h-guo18 h-guo18 Apr 23, 2026


Hi @yeyu-nvidia, I removed these lines you added two weeks ago since they seem to duplicate what's inside load_vlm_or_llm. Could you take a look and confirm this is fine? Thanks!

Contributor


LGTM

Comment on lines -281 to -292
if use_offline_training:
    # When doing offline training, we need to set num_hidden_layers
    # since we override it when loading the model for space savings.
    # Some models (e.g. Kimi-K2.5) use non-standard config attributes,
    # so fall back to the model's own config if the attribute is missing.
    model.config.num_orig_hidden_layers = getattr(
        model_config, "num_hidden_layers", model.config.num_hidden_layers
    )
    if hasattr(model.config, "layer_types"):
        # remove layer_types to avoid mismatch with the modified model
        del model.config.layer_types
Contributor Author


@yeyu-nvidia Same as above, removed due to duplication. PTAL, thanks!

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thx

@h-guo18 h-guo18 self-assigned this Apr 23, 2026
h-guo18 added 2 commits April 23, 2026 08:10
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>