speculative decoding: use mto config subsystem #1328
Conversation
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Codecov Report

```
@@            Coverage Diff             @@
##             main    #1328      +/-   ##
==========================================
+ Coverage   74.60%   74.63%   +0.02%
==========================================
  Files         467      468       +1
  Lines       50176    50260      +84
==========================================
+ Hits        37435    37511      +76
- Misses      12741    12749       +8
```
```python
if use_offline_training:
    # Load config first to preserve original num_hidden_layers before
    # load_vlm_or_llm may reduce layers for offline space savings.
    model_config = transformers.AutoConfig.from_pretrained(
        model_args.model_name_or_path,
        trust_remote_code=model_args.trust_remote_code,
    )
```
Hi @yeyu-nvidia, I removed these lines you added two weeks ago, since they seem to duplicate what's already inside load_vlm_or_llm. Could you take a look and confirm this looks good to you? Thanks!
```python
if use_offline_training:
    # When doing offline training, we need to set num_hidden_layers
    # since we override it when loading the model for space savings.
    # Some models (e.g. Kimi-K2.5) use non-standard config attributes,
    # so fall back to the model's own config if the attribute is missing.
    model.config.num_orig_hidden_layers = getattr(
        model_config, "num_hidden_layers", model.config.num_hidden_layers
    )
    if hasattr(model.config, "layer_types"):
        # Remove layer_types to avoid mismatch with the modified model.
        del model.config.layer_types
```
@yeyu-nvidia Same as above, removed due to duplication. PTAL, thanks!
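The block above relies on a small `getattr` fallback pattern. A minimal self-contained sketch of that pattern, using `SimpleNamespace` stand-ins for the HF configs (the helper name here is hypothetical, not part of the PR):

```python
from types import SimpleNamespace


def preserve_orig_layer_count(model_config, pretrained_config):
    # Preserve the original layer count before the model is truncated
    # for offline training; fall back to the model's own config when
    # the attribute is missing (e.g. non-standard config attributes).
    model_config.num_orig_hidden_layers = getattr(
        pretrained_config, "num_hidden_layers", model_config.num_hidden_layers
    )
    # Drop layer_types if present so it cannot mismatch the truncated model.
    if hasattr(model_config, "layer_types"):
        del model_config.layer_types
    return model_config


# Standard config: the pretrained value (32) wins over the truncated one (4).
cfg = preserve_orig_layer_count(
    SimpleNamespace(num_hidden_layers=4, layer_types=["full"] * 32),
    SimpleNamespace(num_hidden_layers=32),
)
print(cfg.num_orig_hidden_layers)  # 32
print(hasattr(cfg, "layer_types"))  # False
```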
What does this PR do?

Type of change: new feature

Port the speculative-decoding example to ModelOpt's recipe/config subsystem: the `model`/`data`/`training`/`<algo>` sections now load from a single YAML with Pydantic validation and OmegaConf dotlist overrides. Adds built-in `eagle3`/`dflash` recipes, drops the redundant `training.mode` field (now inferred from the recipe class), and shrinks `main.py` by ~145 lines (−208 / +63).

JIRA: https://jirasw.nvidia.com/browse/OMNIML-3859
Usage
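A recipe such as `general/speculative_decoding/eagle3` presumably bundles the same sections the dotlist overrides touch; a hypothetical shape (field names inferred from the overrides in the command below, not the shipped file):

```yaml
model:
  model_name_or_path: meta-llama/Llama-3.2-1B  # overridable via dotlist
data:
  data_path: train.jsonl
training:
  output_dir: ckpts/test
```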
```shell
python main.py --config general/speculative_decoding/eagle3 \
    model.model_name_or_path=meta-llama/Llama-3.2-1B \
    data.data_path=train.jsonl \
    training.output_dir=ckpts/test
```

Testing
- `pytest tests/unit/recipe/test_loader.py` — new coverage for Eagle / DFlash YAML loading, dotlist overrides, and field-level validation.
- Verified the `eagle3` and `dflash` recipes end-to-end.

Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- `main.py` CLI switched to `--config <recipe>` (+ dotlist overrides); the old argparse flags are removed.
- CONTRIBUTING.md: N/A — no new deps (`pydantic`, `omegaconf` already in core).
- New tests added in `tests/unit/recipe/test_loader.py`.

Additional Information
Follow-up to the `modelopt.recipe` subsystem introduced for PTQ; this PR extends the same declarative-YAML pattern to speculative decoding (Eagle3 / DFlash / Medusa).