
Fix PTQ for VLMs with image calibration#1318

Open
LianaMikael wants to merge 1 commit into main from lmikaelyan/fix-vlm-compression

Conversation

Contributor

@LianaMikael LianaMikael commented Apr 22, 2026

What does this PR do?

This PR fixes post-training quantization (PTQ) with image calibration for VLMs.

Usage

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-VL-8B-Instruct --qformat fp8 --export_path Qwen3-VL-8B-Instruct-fp8 --trust_remote_code --kv_cache_qformat none --calib_with_images --calib_size 512

Summary by CodeRabbit

  • Bug Fixes
    • Image-text calibration now extends support to additional model architectures when image calibration is enabled.
    • Improved tokenizer truncation handling in multimodal dataset processing to prevent configuration conflicts when image inputs are present.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>

copy-pr-bot Bot commented Apr 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough


Two changes generalize image-text calibration support in the PTQ pipeline and refine tokenizer truncation handling in VLM dataset processing. The first broadens image-text calibration setup to any model when a flag is set, rather than restricting it to Nemotron VL models. The second prevents truncation parameters from being applied when image-based multimodal processing is active.

Changes

Image-Text Calibration Condition Broadening (examples/llm_ptq/hf_ptq.py)
Removed the model-type check from load_model so that image-text calibration (AutoProcessor initialization with padding_side="left") runs for any model when --calib_with_images is set; updated the associated comment from "For Nemotron VL image calibration" to "For VLM image calibration".
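The processor setup this change gates on can be pictured with a small, hypothetical sketch. The helper name build_processor_kwargs is illustrative, not the repository's actual code; the AutoProcessor.from_pretrained call itself is elided because it needs a real checkpoint on disk or the Hub:

```python
def build_processor_kwargs(ckpt_path: str, trust_remote_code: bool) -> dict:
    """Kwargs that would be forwarded to AutoProcessor.from_pretrained.

    padding_side="left" matters for decoder-only VLMs, whose generation-style
    batching expects prompt padding on the left.
    """
    return {
        "pretrained_model_name_or_path": ckpt_path,
        "trust_remote_code": trust_remote_code,
        "padding_side": "left",
    }

# After this PR, the processor is built whenever --calib_with_images is set,
# with no Nemotron-specific model-type check in front of it.
kwargs = build_processor_kwargs("Qwen/Qwen3-VL-8B-Instruct", trust_remote_code=True)
```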
Conditional Tokenizer Truncation (modelopt/torch/utils/vlm_dataset_utils.py)
Modified _collate_fn to apply tokenizer truncation parameters only when max_length is provided and the processor call does not include an "images" key, preventing truncation conflicts in image-based multimodal processing.
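The collation logic can be sketched self-contained as follows. Names are illustrative rather than the repository's exact code, and the sketch gates on actual image presence rather than key membership:

```python
from typing import Any

def build_collate_kwargs(prompts, images, max_length=None) -> dict[str, Any]:
    # Illustrative reconstruction: truncation kwargs are only attached when no
    # images are present, so image-based multimodal processing never receives
    # conflicting truncation parameters.
    kwargs: dict[str, Any] = {
        "text": list(prompts),
        "return_tensors": "pt",
        "padding": True,
    }
    has_images = any(img is not None for img in images)
    if has_images:
        kwargs["images"] = list(images)
    if max_length is not None and not has_images:
        kwargs.update({"truncation": True, "max_length": max_length})
    return kwargs
```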

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (5 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The title directly addresses the main change, fixing PTQ for VLMs with image calibration, which matches the core modifications to image-based calibration in both the model loading logic and dataset collation.
Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns: ✅ Passed. The PR does not introduce security anti-patterns; trust_remote_code is user-configurable, and there is no unsafe deserialization or forbidden security bypass.


@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1318/

Built to branch gh-pages at 2026-04-22 10:51 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.67%. Comparing base (785d3a2) to head (5dfda82).
⚠️ Report is 1 commit behind head on main.

Files with missing lines:
modelopt/torch/utils/vlm_dataset_utils.py: patch 0.00% (1 line missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   74.40%   75.67%   +1.27%     
==========================================
  Files         464      464              
  Lines       50036    50293     +257     
==========================================
+ Hits        37227    38058     +831     
+ Misses      12809    12235     -574     
Flag Coverage Δ
examples 41.53% <0.00%> (+5.49%) ⬆️
gpu 58.59% <0.00%> (-0.51%) ⬇️
regression 14.86% <0.00%> (+0.07%) ⬆️
unit 52.44% <0.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@LianaMikael LianaMikael marked this pull request as ready for review April 22, 2026 11:28
@LianaMikael LianaMikael requested review from a team as code owners April 22, 2026 11:28
@LianaMikael LianaMikael requested a review from sugunav14 April 22, 2026 11:28
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
modelopt/torch/utils/vlm_dataset_utils.py (1)

460-461: The truncation guard is currently a tautology against local state.

Because Line 456 always inserts "images" into kwargs, the Line 460 condition is always false, so max_length is never applied in this path. Consider gating by actual image presence instead of key membership.

Proposed refactor
diff --git a/modelopt/torch/utils/vlm_dataset_utils.py b/modelopt/torch/utils/vlm_dataset_utils.py
@@
-        kwargs: dict[str, Any] = {
-            "text": list(prompts),
-            "images": list(images),
-            "return_tensors": "pt",
-            "padding": True,
-        }
-        if max_length is not None and "images" not in kwargs:
+        has_images = any(img is not None for img in images)
+        kwargs: dict[str, Any] = {
+            "text": list(prompts),
+            "return_tensors": "pt",
+            "padding": True,
+        }
+        if has_images:
+            kwargs["images"] = list(images)
+        if max_length is not None and not has_images:
             kwargs.update({"truncation": True, "max_length": max_length})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/utils/vlm_dataset_utils.py` around lines 460 - 461, The
current guard uses '"images" not in kwargs' which always fails because earlier
code always inserts the "images" key; change the check to detect actual image
data instead of key membership. In the block that sets truncation (referencing
max_length, kwargs and the "images" key), replace the condition with a runtime
presence check such as "if max_length is not None and not kwargs.get('images'):"
(or equivalent that treats empty/None image values as absent) so
max_length/truncation is applied when there truly are no images, not just when
the key is missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9eedd8b2-571e-4f28-9afe-b0dc1908bbca

📥 Commits

Reviewing files that changed from the base of the PR and between c417e6f and 5dfda82.

📒 Files selected for processing (2)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/utils/vlm_dataset_utils.py

Comment on lines +490 to +491
elif args.calib_with_images:
# For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
Contributor


⚠️ Potential issue | 🟠 Major

Broadened image-calibration entrypoint is not matched by downstream loop selection.

After this change, --calib_with_images can enter the multimodal path for any model, but Line 643 still only uses create_vlm_calibration_loop(...) for Nemotron. That can misroute multimodal batches for non-Nemotron VLMs and break calibration.

Proposed fix
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@
-    elif args.calib_with_images:
+    elif args.calib_with_images:
+        if not is_multimodal_model(full_model):
+            raise ValueError("--calib_with_images requires a multimodal/VLM checkpoint.")
         # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
         processor = AutoProcessor.from_pretrained(
             args.pyt_ckpt_path,
             trust_remote_code=args.trust_remote_code,
             padding_side="left",
         )
@@
-            if args.calib_with_images and is_nemotron_vl_model:
+            if args.calib_with_images and is_multimodal_model(full_model):
                 calibrate_loop = create_vlm_calibration_loop(full_model, calib_dataloader)
             else:
                 calibrate_loop = create_forward_loop(
                     dataloader=calib_dataloader,
                     allowed_non_tensor_keys={"base_model_outputs"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/hf_ptq.py` around lines 490 - 491, The change allows
args.calib_with_images to enter the multimodal branch for any model but the
downstream selection still always uses create_vlm_calibration_loop(...) only for
Nemotron, which can misroute batches; update the calibration loop selection so
that when args.calib_with_images is true you choose the correct loop based on
the actual model type (e.g., call create_vlm_calibration_loop(model, ...) only
for models that implement the Nemotron-style VLM interface and otherwise call
the existing create_calibration_loop(...) or a proper VLM-compatible loop),
using the same identifying symbols args.calib_with_images,
create_vlm_calibration_loop, create_calibration_loop and the model/type check to
route multimodal batches to the appropriate loop.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 22, 2026