Conversation
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

Two changes generalize image-text calibration support in the PTQ pipeline and refine tokenizer truncation handling in VLM dataset processing. The first broadens image-text calibration setup to any model when a flag is set, rather than restricting it to Nemotron VL models. The second prevents truncation parameters from being applied when image-based multimodal processing is active.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 5 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   74.40%   75.67%   +1.27%
==========================================
  Files         464      464
  Lines       50036    50293     +257
==========================================
+ Hits        37227    38058     +831
+ Misses      12809    12235     -574
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Actionable comments posted: 1
🧹 Nitpick comments (1)
modelopt/torch/utils/vlm_dataset_utils.py (1)
460-461: The truncation guard is currently a tautology against local state.

Because line 456 always inserts `"images"` into `kwargs`, the line 460 condition is always false, so `max_length` is never applied in this path. Consider gating by actual image presence instead of key membership.

Proposed refactor:

```diff
diff --git a/modelopt/torch/utils/vlm_dataset_utils.py b/modelopt/torch/utils/vlm_dataset_utils.py
@@
-    kwargs: dict[str, Any] = {
-        "text": list(prompts),
-        "images": list(images),
-        "return_tensors": "pt",
-        "padding": True,
-    }
-    if max_length is not None and "images" not in kwargs:
+    has_images = any(img is not None for img in images)
+    kwargs: dict[str, Any] = {
+        "text": list(prompts),
+        "return_tensors": "pt",
+        "padding": True,
+    }
+    if has_images:
+        kwargs["images"] = list(images)
+    if max_length is not None and not has_images:
         kwargs.update({"truncation": True, "max_length": max_length})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/utils/vlm_dataset_utils.py` around lines 460 - 461, The current guard uses '"images" not in kwargs' which always fails because earlier code always inserts the "images" key; change the check to detect actual image data instead of key membership. In the block that sets truncation (referencing max_length, kwargs and the "images" key), replace the condition with a runtime presence check such as "if max_length is not None and not kwargs.get('images'):" (or equivalent that treats empty/None image values as absent) so max_length/truncation is applied when there truly are no images, not just when the key is missing.
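To make the tautology concrete, here is a minimal standalone sketch of the fixed logic. The helper name `build_processor_kwargs` is hypothetical (it is not the repository's actual function); the body mirrors the presence-based gating proposed in the diff above:

```python
from typing import Any


def build_processor_kwargs(prompts, images, max_length=None):
    # Gate truncation on actual image presence, not on key membership.
    # The original check `"images" not in kwargs` was always false because
    # the "images" key was inserted unconditionally a few lines above it.
    has_images = any(img is not None for img in images)
    kwargs: dict[str, Any] = {
        "text": list(prompts),
        "return_tensors": "pt",
        "padding": True,
    }
    if has_images:
        kwargs["images"] = list(images)
    if max_length is not None and not has_images:
        kwargs.update({"truncation": True, "max_length": max_length})
    return kwargs


# Text-only batch: truncation settings are applied.
text_only = build_processor_kwargs(["a", "b"], [None, None], max_length=512)

# Image batch: images are forwarded and truncation is skipped.
with_images = build_processor_kwargs(["a"], ["<PIL image>"], max_length=512)
```

With the original guard, `text_only` would never receive `truncation`/`max_length`; with the presence check it does, while image batches are left untruncated.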
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9eedd8b2-571e-4f28-9afe-b0dc1908bbca
📒 Files selected for processing (2)
- examples/llm_ptq/hf_ptq.py
- modelopt/torch/utils/vlm_dataset_utils.py
```python
elif args.calib_with_images:
    # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
```
Broadened image-calibration entrypoint is not matched by downstream loop selection.
After this change, --calib_with_images can enter the multimodal path for any model, but Line 643 still only uses create_vlm_calibration_loop(...) for Nemotron. That can misroute multimodal batches for non-Nemotron VLMs and break calibration.
Proposed fix:

```diff
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@
-    elif args.calib_with_images:
+    elif args.calib_with_images:
+        if not is_multimodal_model(full_model):
+            raise ValueError("--calib_with_images requires a multimodal/VLM checkpoint.")
         # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
         processor = AutoProcessor.from_pretrained(
             args.pyt_ckpt_path,
             trust_remote_code=args.trust_remote_code,
             padding_side="left",
         )
@@
-    if args.calib_with_images and is_nemotron_vl_model:
+    if args.calib_with_images and is_multimodal_model(full_model):
         calibrate_loop = create_vlm_calibration_loop(full_model, calib_dataloader)
     else:
         calibrate_loop = create_forward_loop(
             dataloader=calib_dataloader,
             allowed_non_tensor_keys={"base_model_outputs"}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_ptq/hf_ptq.py` around lines 490 - 491, The change allows
args.calib_with_images to enter the multimodal branch for any model but the
downstream selection still always uses create_vlm_calibration_loop(...) only for
Nemotron, which can misroute batches; update the calibration loop selection so
that when args.calib_with_images is true you choose the correct loop based on
the actual model type (e.g., call create_vlm_calibration_loop(model, ...) only
for models that implement the Nemotron-style VLM interface and otherwise call
the existing create_calibration_loop(...) or a proper VLM-compatible loop),
using the same identifying symbols args.calib_with_images,
create_vlm_calibration_loop, create_calibration_loop and the model/type check to
route multimodal batches to the appropriate loop.
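The routing logic the comment asks for can be sketched as a small dispatcher. This is a hypothetical illustration, not the file's actual code: the factory functions are passed in as parameters (stubbed below), and the multimodal check is reduced to a boolean, since the real `create_vlm_calibration_loop`/`create_forward_loop` signatures are only known from the diff above:

```python
def select_calibration_loop(full_model, calib_dataloader, calib_with_images,
                            is_multimodal, create_vlm_loop, create_forward_loop):
    # Route to the VLM loop only when image calibration is requested AND the
    # model actually supports multimodal inputs; otherwise fall back to the
    # plain forward loop, so text-only models are never handed image batches.
    if calib_with_images and is_multimodal:
        return create_vlm_loop(full_model, calib_dataloader)
    return create_forward_loop(dataloader=calib_dataloader)


# Stub factories so the routing can be exercised in isolation.
vlm = lambda model, dl: "vlm_loop"
fwd = lambda dataloader: "forward_loop"

chosen = select_calibration_loop("model", [], True, True, vlm, fwd)
fallback = select_calibration_loop("model", [], True, False, vlm, fwd)
```

Here `chosen` takes the VLM path and `fallback` takes the forward-loop path, which is exactly the misroute the review flags: before the fix, any model entering the image branch was handed the Nemotron-only loop.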
What does this PR do?
This PR fixes PTQ with image calibration for VLMs.
Usage
Summary by CodeRabbit