Add a general composable $import system for YAML configs, and use it to implement composable recipes #1253

shengliangxu wants to merge 43 commits into main
Conversation
📝 Walkthrough

Adds a composable YAML imports system.

Changes
Sequence Diagram(s)

sequenceDiagram
participant U as User
participant L as modelopt.recipe.loader
participant R as _load_raw_config
participant S as _resolve_imports
participant FS as Files / Builtins
U->>L: load_recipe(path)
L->>R: _load_raw_config(path)
R->>FS: read YAML (file or builtin)
FS-->>R: raw documents
R-->>L: parsed raw config (may include imports)
alt imports present
L->>S: _resolve_imports(parsed_config)
S->>R: _load_raw_config(import_path)
R->>FS: read imported snippet
FS-->>R: snippet content
R-->>S: snippet (dict/list/_list_content)
S->>S: merge / splice / detect circular refs
S-->>L: resolved config
end
L-->>U: final resolved recipe
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Commit description:
Introduce an import mechanism that lets recipe YAML files reference reusable
config snippets by name, reducing duplication across recipes.
Syntax:

imports:
  fp8: configs/numerics/fp8
  base_disable_all: configs/ptq/base_disable_all

quant_cfg:
  - base_disable_all              # string entry → replaced with imported dict or spliced list
  - quantizer_name: '*weight_quantizer'
    cfg: fp8                      # string cfg → replaced with imported dict
Features:
- Dict-based imports (keys are names, values are config paths) — no name conflicts
- Three resolution modes: string cfg value, string list entry (dict), string list entry (list splice)
- Recursive resolution with circular import detection
- Path resolution via load_config (built-in library first, then filesystem)
- Works with both single-file and directory recipe formats
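The three resolution modes above can be sketched in a few lines. Below is a hypothetical resolver, not the PR's actual implementation: snippet contents are passed as an in-memory dict rather than loaded from files via load_config, and circular-import detection is omitted for brevity.

```python
def resolve(node, snippets):
    """Resolve string references against an imports mapping.

    `snippets` maps import names to parsed YAML content (dict or list).
    Illustrates the three modes: string cfg value, string list entry
    replaced by a dict, and string list entry spliced from a list.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            # mode 1: string cfg value -> replaced with imported dict
            if key == "cfg" and isinstance(value, str) and value in snippets:
                out[key] = resolve(snippets[value], snippets)
            else:
                out[key] = resolve(value, snippets)
        return out
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, str) and item in snippets:
                target = resolve(snippets[item], snippets)
                if isinstance(target, list):
                    # mode 3: list-valued snippet is spliced into place
                    out.extend(target)
                else:
                    # mode 2: dict-valued snippet replaces the entry
                    out.append(target)
            else:
                out.append(resolve(item, snippets))
        return out
    return node
```

Resolution is recursive, so an imported snippet may itself contain further string references as long as they are in scope.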
New reusable config snippets (modelopt_recipes/configs/):
- numerics/fp8.yml, nvfp4_dynamic.yml, nvfp4_static.yml
- ptq/base_disable_all.yaml, default_disabled_quantizers.yaml
All 6 built-in PTQ recipes converted to use imports, reducing each by ~30 lines.
Pre-commit hook updated to skip configs/ directory and allow string entries in
quant_cfg. load_config() now accepts YAML lists for list-valued snippets.
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
h-guo18 left a comment:
Discussion regarding $import and Jinja makes sense to me.
ChenhanYu left a comment:
Review
The problem is real — 6 recipes duplicating the same 14-entry disable list and NVFP4 format definition is a maintenance nightmare. The $import system is well-designed (scoped, recursive, circular detection) and the test coverage is excellent (837 lines). The before/after on nvfp4_experts_only (75 → 18 lines) is compelling.
However, before we introduce a custom config composition system, I want to understand why we can't reuse what we already have:
1. Why not OmegaConf?
We already depend on OmegaConf — EAGLE training (main.py, PR #1134) uses it with __base__ inheritance and OmegaConf.merge() for config composition. OmegaConf natively supports:
- Structured configs with inheritance (defaults: list)
- Variable interpolation (${numerics.fp8})
- Merge with override semantics
The $import system reimplements composition with different syntax ($import: markers, --- multi-doc for list snippets, scoped import namespaces). This means we now have two config composition systems in the codebase — OmegaConf for speculative decoding recipes and $import for quantization recipes. Contributors need to learn both.
Could the same snippet library (configs/numerics/, configs/ptq/units/) work with OmegaConf's native composition instead of a custom resolver?
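For concreteness, the override-wins merge OmegaConf provides boils down to a recursive dict merge; a minimal stdlib sketch of those semantics (not OmegaConf's actual implementation, and without its interpolation or schema features):

```python
def deep_merge(base, override):
    """Recursive dict merge with override-wins semantics.

    Nested dicts are merged key by key; any other value in `override`
    replaces the corresponding value in `base`. Inputs are not mutated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Two snippet dicts compose with the later one winning on conflicts, which is exactly the merge-with-override behavior referenced in the bullet list above.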
2. Why not a Python factory?
The *_CFG dicts are already Python. A factory approach like:
def build_ptq_config(format="nvfp4", kv="fp8", experts_only=False):
    cfg = BASE_DISABLE_ALL + DEFAULT_DISABLED
    if experts_only:
        cfg += expert_quantizers(format)
    else:
        cfg += all_linear_quantizers(format)
    if kv:
        cfg += kv_quantizers(kv)
    return cfg

would be more debuggable (breakpoints, type checking, IDE navigation) and avoids YAML-specific complexity (multi-doc --- separator, list splicing semantics). This is essentially what nemo_run does with run.Config — Python factories with composable config objects.
The YAML approach has advantages (user-editable, diffable, declarative), but our recipes are primarily consumed programmatically by mtq.quantize(), not hand-edited by end users.
3. Migration path
If we go with $import, what's the end state? Do all hardcoded *_CFG dicts eventually move to YAML? Or do we maintain both Python dicts and YAML presets indefinitely? The PR removes 51 lines from config.py but the Python API (mtq.FP8_DEFAULT_CFG) still needs to work — how does that load path change?
I'd like to hear the rationale before approving. The implementation quality is high — this is about whether we want a second composition system in the codebase vs. reusing OmegaConf or Python factories.
Thanks @ChenhanYu — these are legitimate questions. Point-by-point reply below.

1. OmegaConf

Two points on the premise before the substance:
Inheritance vs Composition

I actually did explore an inheritance-style design. Inheritance answers "B is a specialization of A; override these fields." It expresses a single parent + overrides:

# inheritance flavor
__base__: fp8_default
quant_cfg:
  override_field: new_value

The configs we target aren't shaped that way. Concretely, a recipe combines four independent snippets (base_disable_all, fp8, fp8_kv, default_disabled).
None of these four is the parent of another. They're siblings. The "base" the author has in mind is the combination, not any single snippet.

# composition flavor (what this PR lands)
imports:
  base_disable_all: configs/ptq/units/base_disable_all
  fp8: configs/numerics/fp8
  fp8_kv: configs/ptq/units/fp8_kv
  default_disabled: configs/ptq/units/default_disabled_quantizers
quant_cfg:
  - $import: base_disable_all          # splice list snippet
  - quantizer_name: "*weight_quantizer"
    cfg: { $import: fp8 }              # merge dict snippet
  - quantizer_name: "*input_quantizer"
    cfg: { $import: fp8 }              # re-use same snippet
  - $import: fp8_kv                    # splice another list snippet
  - $import: default_disabled          # splice one more

To express the same recipe as inheritance you'd have to pick one snippet as the "base" and encode the others as overrides on top of it — which doesn't match the actual structure (no snippet is a specialization of another) and forces authors to invent an artificial ordering. Two concrete capabilities fall out of composition that inheritance can't provide cleanly:
So Rebuilding
Where I agree: the "two composition systems" concern is legitimate — and the plan is to converge. Once this PR merges, we'll migrate the existing flat YAML configs (EAGLE, DFlash, etc.) onto the $import system.

2. Python factory

Agreed that a factory wins on debuggability. But I want to reframe the choice before arguing specifics.
So the question isn't "YAML vs. Python factory" — it's "what belongs at each layer." My claim: structural composition within a single config is YAML's sweet spot.

Two side points on your specific examples:
3. Migration path

Direct answer:
cjluo-nv left a comment:
Reviewed the general/ptq and the refactor LGTM
> - ``$import: fp8`` under ``cfg`` is a **dict value** — the snippet (a YAML dict of quantizer attributes) replaces the ``cfg`` field.
This is a bit confusing.
Should we be doing something like this?
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all        # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      $import: fp8                     # cfg value replaced with imported dict
    - $import: default_disabled        # spliced from a multi-element list snippet
no, unless the fp8 snippet is defined using:

cfg:
  num_bits: e4m3
  axis:

I don't think we want to do that, because cfg includes both the base format and many other components:
{
"num_bits": 8, # 8-bit integer quantization
"axis": None, # per-tensor scale (no per-channel axis)
"fake_quant": True, # simulate quantization in forward pass (PTQ / QAT)
"unsigned": False, # signed integer range, e.g. [-128, 127] for INT8
"narrow_range": False, # full range; True would restrict to [-127, 127] for INT8
"type": "static", # static calibration (not dynamic per-inference)
"block_sizes": None, # no block quantization; set for NF4 / MXFP formats
"bias": None, # no affine bias correction
"calibrator": "max", # use max-abs calibration to determine amax
"rotate": False, # no Hadamard rotation (QuaRot / SpinQuant)
"pass_through_bwd": True, # straight-through estimator for QAT gradients
"trt_high_precision_dtype": "Float", # cast QDQ nodes to fp32 for TRT StronglyType export
"backend": None, # use the built-in quantization backend
"backend_extra_args": None, # no extra args for custom backends
"use_constant_amax": False, # calibrate amax; True hard-codes FP8 E4M3 max (448.0)
}
Using the current design you can configure beyond just the base format:
quantize:
  algorithm: max
  quant_cfg:
    - $import: base_disable_all        # spliced from a single-element list snippet
    - quantizer_name: '*weight_quantizer'
      cfg:
        $import: fp8                   # base format
        rotate: true                   # example: more config beyond the base format
        $import: other_components
    - $import: default_disabled        # spliced from a multi-element list snippet
Thanks @shengliangxu — the composition-vs-inheritance distinction is well-argued and I agree that these configs are horizontal bags of independent pieces, not a parent-child chain. That point is convincing. However, your nemo_run response actually strengthens the case for using it rather than building a custom system.

This describes exactly what we want. nemo_run already provides:
Your recipe could be expressed as:

def fp8_default_recipe():
    return run.Config(
        PTQRecipe,
        quant_cfg=[
            base_disable_all(),
            weight_quantizer(format=fp8()),
            input_quantizer(format=fp8()),
            fp8_kv(),
            default_disabled(),
        ],
    )

Which serializes to:

_target_: modelopt.torch.quantization.PTQRecipe
quant_cfg:
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*'
    enable: false
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*weight_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*input_quantizer'
    cfg:
      num_bits: e4m3
  - _target_: modelopt.torch.quantization.QuantEntry
    quantizer_name: '*[kv]_bmm_quantizer'
    cfg:
      num_bits: e4m3

The YAML is still readable and diffable. That said, I see the tradeoff: nemo_run's serialized YAML carries a _target_ line on every entry. I'm not blocking this PR — the implementation quality is high and the problem is real. But before we commit to a custom DSL, I'd like the alternatives weighed.
Follow-up: I looked into nemo_run's _factory_ mechanism. Side-by-side:

| | $import | _factory_ |
|---|---|---|
| References resolve to | YAML snippet files | Python factory functions |
| Debuggable | No | Yes (breakpoints, IDE navigation) |
| Type-checked | No | Yes |
| Overridable at any level | Only at YAML level | At any level — override the top-level factory, a mid-level factory, or a leaf numeric format |
| Custom DSL | Yes ($import, --- multi-doc, list splicing) | No (standard nemo_run _factory_) |
| Already in our ecosystem | No (new) | Yes (tools/launcher uses it for Slurm configs) |
Nested factory overrides — the key advantage
With _factory_, composition is overridable at every level:
Leaf factory (numeric format — defined once):
@run.cli.factory
def nvfp4():
    return {"num_bits": "e2m1", "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": "e4m3"}}

Mid-level factory (expert quantizer pattern):
@run.cli.factory
def experts_quantizer(pattern="*mlp.experts*", numeric=nvfp4):
return [
{"quantizer_name": f"{pattern}weight_quantizer", "enable": True, "cfg": numeric()},
{"quantizer_name": f"{pattern}input_quantizer", "enable": True, "cfg": numeric()},
    ]

Top-level factory:
@run.cli.factory
def nvfp4_experts_only_fp8_kv():
return PTQRecipe(
quant_cfg=[*base_disable_all(), *experts_quantizer("*mlp.experts*", nvfp4), ...]
    )

User can operate at any level of abstraction:
# Level 1: Use the whole recipe as-is (1 line)
_factory_: nvfp4_experts_only_fp8_kv

# Level 2: See and customize the composition (same length as $import)
_factory_: ptq_recipe
quant_cfg:
- _factory_: base_disable_all
- quantizer_name: '*mlp.experts*weight_quantizer'
cfg: { _factory_: nvfp4 }
- quantizer_name: '*mlp.experts*input_quantizer'
cfg: { _factory_: nvfp4 }
- _factory_: fp8_kv
- _factory_: default_disabled

# Level 3: Override a leaf — swap NVFP4 for FP8 on one quantizer
_factory_: ptq_recipe
quant_cfg:
- _factory_: base_disable_all
- quantizer_name: '*mlp.experts*weight_quantizer'
cfg: { _factory_: fp8 } # ← changed from nvfp4 to fp8
- quantizer_name: '*mlp.experts*input_quantizer'
cfg: { _factory_: nvfp4 }
- _factory_: fp8_kv
- _factory_: default_disabled

CLI overrides — nemo_run's dotlist syntax
nemo_run supports OmegaConf-style dotlist overrides from the command line, which compose on top of _factory_ resolution. This means any nested field can be overridden without editing the YAML at all:
# Use the standard recipe, override calibration size
python quantize.py --factory nvfp4_experts_only_fp8_kv \
calibration.calib_size=512
# Override a nested numeric format at the leaf level
python quantize.py --factory nvfp4_experts_only_fp8_kv \
quant_cfg.1.cfg.num_bits=e4m3 \
quant_cfg.1.cfg.block_sizes='{-1: 128}'
# Swap the expert quantizer pattern to target a different module
python quantize.py --factory nvfp4_experts_only_fp8_kv \
  quant_cfg.1.quantizer_name='*moe.experts*weight_quantizer'

With $import, CLI overrides would require a separate OmegaConf layer on top — two systems working together. With _factory_, it's built in: the factory produces the config, dotlist overrides patch it, done. One system, end to end.
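The dotlist mechanics themselves are small; a stdlib approximation of what a dotlist override does to a nested config (hypothetical apply_dotlist helper; the real OmegaConf version also parses value types and supports interpolation):

```python
def apply_dotlist(config, overrides):
    """Apply dotlist-style overrides such as 'quant_cfg.1.cfg.num_bits=e4m3'
    to a nested dict/list structure, in place.

    Integer path segments index into lists, everything else into dicts.
    Values are kept as plain strings in this sketch.
    """
    for entry in overrides:
        path, _, value = entry.partition("=")
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            node = node[int(key)] if isinstance(node, list) else node[key]
        last = keys[-1]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return config
```

This is the whole trick behind "any nested field can be overridden without editing the YAML": the factory (or resolver) produces a plain nested structure, and the dotlist patches it afterwards.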
Again — not blocking this PR. But I think _factory_ achieves the same deduplication goals with less custom infrastructure and more flexibility. Worth considering before we commit to a custom DSL.
> # ---------------------------------------------------------------------------
> # $import resolution
> # ---------------------------------------------------------------------------
nit:
This is one of the Claude comment styles I do not like.
this comment is simply adding redundancy IMO.
> # ``axis: null`` is explicit to match the hardcoded ``FP8_DEFAULT_CFG`` shape —
> # downstream code that keys on ``"axis" in cfg`` sees the same dict layout.
> num_bits: e4m3
> axis:
How can we specify None explicitly?
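For what it's worth, YAML has explicit spellings for null; each of the following alternatives loads as Python None under PyYAML (shown as variants, not as one document):

```yaml
axis: null   # explicit null keyword
axis: ~      # tilde shorthand for null
axis:        # empty value also loads as null
```

So ``axis: null`` would make the intent visible without changing the loaded config.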
> # See the License for the specific language governing permissions and
> # limitations under the License.
>
> # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (static calibration).
Suggested change:
- # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (static calibration).
+ # NVFP4 E2M1 blockwise quantizer attributes with FP8 E4M3 scales (used for NVFP4 weights since weight scales can be static).
can we change this name to kv_fp8?
I am trying to establish a good naming convention:
- numeric presets are the basics:
  - nvfp4
  - mxfp4
  - fp8
- operator-level presets live above that (format: {operator_name}_{numeric_format}):
  - weight_act_nvfp4_nvfp4
  - weight_act_fp8_fp8
  - weight_act_nvfp4_f16
  - weight_act_mxfp4_fp8
  - kv_nvfp4
  - kv_mxfp4
  - kv_fp8
Let's use an example to compare more clearly. Side-by-side below.

Scaffolding (required before any factory can register)
Numeric leaf
List snippet with an inline import
Flat list of 14 exclusions
Top-level recipe (same granularity, no wrapper factory)
Summary

The actual differences:
Thanks @shengliangxu — your side-by-side comparison is thorough and the fiddle limitations are real. I want to step back and structure the discussion around three separable questions.

1. The circular dependency — why the resolver should move out of modelopt.recipe
Option A: Implement _factory_ resolution ourselves

| | Option A |
|---|---|
| External deps | 0 (pure Python + pyyaml + omegaconf) |
| Code to write | ~300 lines (registry + resolver + OmegaConf override) |
| Maintenance | We own it — bugs are ours, must track nemo_run _factory_ spec for compatibility |
| Factories | Regular Python functions, no fiddle restrictions, no wrapper types needed |
| Compatibility | Must ensure our _factory_ semantics match nemo_run's so tools/launcher YAMLs work with both |
| Risk | Our implementation drifts from nemo_run's over time |
Option B: Depend on nemo_run [core] (if they add the extra)

| | Option B |
|---|---|
| External deps | fiddle (~2 MB) + omegaconf (~0.6 MB) + catalogue (~0.1 MB) ≈ 3 MB, 4-6 packages |
| Code to write | 0 — use nemo_run's resolver as-is |
| Maintenance | nemo_run team owns it — just upgrade version |
| Factories | Subject to fiddle's auto_config restrictions (no list comprehensions, needs wrapper types for bare dict/list returns — as you demonstrated) |
| Compatibility | Guaranteed — same code path as tools/launcher |
| Risk | nemo_run doesn't add [core] extra → stuck with 179 MB ray dependency from leptonai |
Option C: $import (this PR as-is)

| | Option C |
|---|---|
| External deps | 0 |
| Code to write | 336 lines (already done) |
| Maintenance | We own it |
| Factories | N/A — YAML snippets, not Python functions |
| Compatibility | Does not converge with tools/launcher — two systems in the repo |
| Risk | Future recipe types (QAT, QAD) mix $import + OmegaConf in the same file |
My preference is A (custom _factory_) if you're willing to implement it, with the explicit goal of compatibility with nemo_run's _factory_ spec so we can switch to B later if they trim their dependencies.
But I also recognize that C (this PR) solves the immediate problem well, and the two-system concern is a future issue that can be addressed when QAT/QAD recipes actually ship. What do you think?
What does this PR do?
Type of change: New feature
Add a composable $import system for recipe YAML configs, plus reorganize config snippets and begin migrating hardcoded *_CFG dicts to YAML presets.

Problem
- *_CFG dicts in config.py (e.g., FP8_DEFAULT_CFG) had no connection to the YAML recipe system
- modelopt.recipe caused circular imports when modelopt.torch.quantization.config needed to load YAML configs

Solution
Composable $import system

Recipes declare an imports section mapping names to config snippet files. The {$import: name} marker is explicitly resolved at load time. For example, a recipe can import shared FP8 attributes and standard exclusions rather than duplicating them inline.

$import semantics:
- multiple imports ($import: [a, b]) and inline override keys with ordered precedence (later imports override earlier, inline keys override all)
- list snippets use a --- separator (first doc = imports, second doc = list content)

Config snippet library (modelopt_recipes/configs/):
- numerics/ — all numeric format definitions live here (fp8, nvfp4, nvfp4_static). Any new numeric format (e.g., int8, mxfp8) should be added to this directory as the single source of truth for quantizer attributes
- ptq/units/ — reusable quant_cfg entries (base_disable_all, default_disabled, fp8_kv, w8a8_fp8_fp8, w4a4_nvfp4_nvfp4)
- ptq/presets/model/ — complete configs replacing hardcoded *_CFG dicts
- ptq/presets/kv/ — KV cache config presets

Hardcoded config migration:
- FP8_DEFAULT_CFG and FP8_KV_CFG now load from YAML presets via load_config()
- Loader moved from modelopt.recipe._config_loader to modelopt.torch.opt.config_loader to eliminate circular imports
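As a concrete illustration of the two-document layout for list snippets (a hypothetical snippet file; names assumed, first doc declares imports, second doc is the list content spliced into the caller):

```yaml
# first document: imports visible to this snippet
imports:
  fp8: configs/numerics/fp8
---
# second document: list content spliced into the importing recipe
- quantizer_name: '*[kv]_bmm_quantizer'
  enable: true
  cfg:
    $import: fp8
```

A recipe that imports this snippet and references it as a quant_cfg list entry gets the entry spliced in with fp8 already resolved.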
Changes by area
Config loading infrastructure:
- modelopt/torch/opt/config_loader.py — new home for YAML loading + $import resolution (zero modelopt imports, no circular dependency risk)
- modelopt/recipe/_config_loader.py — thin re-export from torch.opt.config_loader
- modelopt/recipe/loader.py — uses shared _resolve_imports from config_loader
- modelopt/torch/quantization/config.py — FP8_DEFAULT_CFG and FP8_KV_CFG loaded from YAML presets

Recipe YAML files:
- All 6 built-in PTQ recipes converted to $import style

Pre-commit:
- .pre-commit-config.yaml — recipe validator excludes configs/ directory
- tools/precommit/check_modelopt_recipes.py — recognizes $import entries

Documentation:
- docs/source/guides/10_recipes.rst — full $import specification with examples, inline/import style comparison, multi-document snippets, override precedence

Tests:
- tests/unit/recipe/test_loader.py — 20+ new tests covering all import features

Testing
All new and existing recipe loader tests pass. Built-in recipe smoke tests pass with converted recipes.
Before your PR is "Ready for review"
- FP8_DEFAULT_CFG / FP8_KV_CFG produce identical dicts

Summary by CodeRabbit
New Features
Documentation
Tests
Chores