
Fix PTQ for VLMs with image calibration#1318

Open
LianaMikael wants to merge 1 commit into main from lmikaelyan/fix-vlm-compression

Conversation

Contributor

@LianaMikael LianaMikael commented Apr 22, 2026

What does this PR do?

This PR fixes post-training quantization (PTQ) with image calibration for VLMs.

Usage

python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-VL-8B-Instruct --qformat fp8 --export_path Qwen3-VL-8B-Instruct-fp8 --trust_remote_code --kv_cache_qformat none --calib_with_images --calib_size 512

Summary by CodeRabbit

  • Bug Fixes
    • Image-text calibration now extends support to additional model architectures when image calibration is enabled.
    • Improved tokenizer truncation handling in multimodal dataset processing to prevent configuration conflicts when image inputs are present.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>

copy-pr-bot Bot commented Apr 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough


Two changes generalize image-text calibration support in the PTQ pipeline and refine tokenizer truncation handling in VLM dataset processing. The first broadens image-text calibration setup to any model when a flag is set, rather than restricting it to Nemotron VL models. The second prevents truncation parameters from being applied when image-based multimodal processing is active.

Changes

Image-Text Calibration Condition Broadening (examples/llm_ptq/hf_ptq.py)
Removed the model-type check from load_model so that image-text calibration (AutoProcessor initialization with padding_side="left") runs for any model when --calib_with_images is set; updated the associated comment from "For Nemotron VL image calibration" to "For VLM image calibration".
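The processor setup this change gates on can be pictured with a small, hypothetical sketch. The helper name build_processor_kwargs is illustrative, not the repository's actual code; the AutoProcessor.from_pretrained call itself is elided because it needs a real checkpoint on disk or the Hub:

```python
def build_processor_kwargs(ckpt_path: str, trust_remote_code: bool) -> dict:
    """Kwargs that would be forwarded to AutoProcessor.from_pretrained.

    padding_side="left" matters for decoder-only VLMs, whose generation-style
    batching expects prompt padding on the left.
    """
    return {
        "pretrained_model_name_or_path": ckpt_path,
        "trust_remote_code": trust_remote_code,
        "padding_side": "left",
    }

# After this PR, the processor is built whenever --calib_with_images is set,
# with no Nemotron-specific model-type check in front of it.
kwargs = build_processor_kwargs("Qwen/Qwen3-VL-8B-Instruct", trust_remote_code=True)
```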
Conditional Tokenizer Truncation (modelopt/torch/utils/vlm_dataset_utils.py)
Modified _collate_fn to apply tokenizer truncation parameters only when max_length is provided and the processor call does not include an "images" key, preventing truncation conflicts in image-based multimodal processing.
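The collation logic can be sketched self-contained as follows. Names are illustrative rather than the repository's exact code, and the sketch gates on actual image presence rather than key membership:

```python
from typing import Any

def build_collate_kwargs(prompts, images, max_length=None) -> dict[str, Any]:
    # Illustrative reconstruction: truncation kwargs are only attached when no
    # images are present, so image-based multimodal processing never receives
    # conflicting truncation parameters.
    kwargs: dict[str, Any] = {
        "text": list(prompts),
        "return_tensors": "pt",
        "padding": True,
    }
    has_images = any(img is not None for img in images)
    if has_images:
        kwargs["images"] = list(images)
    if max_length is not None and not has_images:
        kwargs.update({"truncation": True, "max_length": max_length})
    return kwargs
```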

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (5 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The title directly addresses the main change, fixing PTQ for VLMs with image calibration, which matches the core modifications to image-based calibration in both the model loading logic and dataset collation.
Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns: ✅ Passed. The PR does not introduce security anti-patterns; trust_remote_code is user-configurable, and there is no unsafe deserialization or forbidden security bypass.


@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1318/

Built to branch gh-pages at 2026-04-22 10:51 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 75.67%. Comparing base (785d3a2) to head (5dfda82).
⚠️ Report is 1 commit behind head on main.

Files with missing lines:
modelopt/torch/utils/vlm_dataset_utils.py: patch 0.00% (1 line missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   74.40%   75.67%   +1.27%     
==========================================
  Files         464      464              
  Lines       50036    50293     +257     
==========================================
+ Hits        37227    38058     +831     
+ Misses      12809    12235     -574     
Flag Coverage Δ
examples 41.53% <0.00%> (+5.49%) ⬆️
gpu 58.59% <0.00%> (-0.51%) ⬇️
regression 14.86% <0.00%> (+0.07%) ⬆️
unit 52.44% <0.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@LianaMikael LianaMikael marked this pull request as ready for review April 22, 2026 11:28
@LianaMikael LianaMikael requested review from a team as code owners April 22, 2026 11:28
@LianaMikael LianaMikael requested a review from sugunav14 April 22, 2026 11:28
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
modelopt/torch/utils/vlm_dataset_utils.py (1)

460-461: The truncation guard is currently a tautology against local state.

Because Line 456 always inserts "images" into kwargs, the Line 460 condition is always false, so max_length is never applied in this path. Consider gating by actual image presence instead of key membership.

Proposed refactor
diff --git a/modelopt/torch/utils/vlm_dataset_utils.py b/modelopt/torch/utils/vlm_dataset_utils.py
@@
-        kwargs: dict[str, Any] = {
-            "text": list(prompts),
-            "images": list(images),
-            "return_tensors": "pt",
-            "padding": True,
-        }
-        if max_length is not None and "images" not in kwargs:
+        has_images = any(img is not None for img in images)
+        kwargs: dict[str, Any] = {
+            "text": list(prompts),
+            "return_tensors": "pt",
+            "padding": True,
+        }
+        if has_images:
+            kwargs["images"] = list(images)
+        if max_length is not None and not has_images:
             kwargs.update({"truncation": True, "max_length": max_length})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/utils/vlm_dataset_utils.py` around lines 460 - 461, The
current guard uses '"images" not in kwargs' which always fails because earlier
code always inserts the "images" key; change the check to detect actual image
data instead of key membership. In the block that sets truncation (referencing
max_length, kwargs and the "images" key), replace the condition with a runtime
presence check such as "if max_length is not None and not kwargs.get('images'):"
(or equivalent that treats empty/None image values as absent) so
max_length/truncation is applied when there truly are no images, not just when
the key is missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9eedd8b2-571e-4f28-9afe-b0dc1908bbca

📥 Commits

Reviewing files that changed from the base of the PR and between c417e6f and 5dfda82.

📒 Files selected for processing (2)
  • examples/llm_ptq/hf_ptq.py
  • modelopt/torch/utils/vlm_dataset_utils.py

Comment on lines +490 to +491
elif args.calib_with_images:
# For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
Contributor


⚠️ Potential issue | 🟠 Major

Broadened image-calibration entrypoint is not matched by downstream loop selection.

After this change, --calib_with_images can enter the multimodal path for any model, but Line 643 still only uses create_vlm_calibration_loop(...) for Nemotron. That can misroute multimodal batches for non-Nemotron VLMs and break calibration.

Proposed fix
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@
-    elif args.calib_with_images:
+    elif args.calib_with_images:
+        if not is_multimodal_model(full_model):
+            raise ValueError("--calib_with_images requires a multimodal/VLM checkpoint.")
         # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
         processor = AutoProcessor.from_pretrained(
             args.pyt_ckpt_path,
             trust_remote_code=args.trust_remote_code,
             padding_side="left",
         )
@@
-            if args.calib_with_images and is_nemotron_vl_model:
+            if args.calib_with_images and is_multimodal_model(full_model):
                 calibrate_loop = create_vlm_calibration_loop(full_model, calib_dataloader)
             else:
                 calibrate_loop = create_forward_loop(
                     dataloader=calib_dataloader,
                     allowed_non_tensor_keys={"base_model_outputs"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/hf_ptq.py` around lines 490 - 491, The change allows
args.calib_with_images to enter the multimodal branch for any model but the
downstream selection still always uses create_vlm_calibration_loop(...) only for
Nemotron, which can misroute batches; update the calibration loop selection so
that when args.calib_with_images is true you choose the correct loop based on
the actual model type (e.g., call create_vlm_calibration_loop(model, ...) only
for models that implement the Nemotron-style VLM interface and otherwise call
the existing create_calibration_loop(...) or a proper VLM-compatible loop),
using the same identifying symbols args.calib_with_images,
create_vlm_calibration_loop, create_calibration_loop and the model/type check to
route multimodal batches to the appropriate loop.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Apr 22, 2026