feat: add embedded adapters (granite switch) to openai backend#881
feat: add embedded adapters (granite switch) to openai backend#881jakelorocco wants to merge 17 commits intomainfrom
Conversation
Enable calling intrinsics on Granite Switch models via the OpenAI backend. Granite Switch models embed adapter weights directly in the checkpoint and activate them via chat template control tokens, so no PEFT loading is needed. - Add EmbeddedIntrinsicAdapter class that carries only I/O config (no weights) with factory methods to load from a model directory or HuggingFace Hub - Add register_granite_switch_model() and add_embedded_adapter() to OpenAIBackend - Add _generate_from_intrinsic() that reuses IntrinsicsRewriter/ResultProcessor and injects intrinsic_name into chat_template_kwargs for the switch model - Ensure serialized messages always include 'role' (latent issue in rewriter instruction messages, newly exposed by OpenAI API serialization path) - Add unit tests for adapter loading, registration, and rewriting Signed-off-by: lastrasl <lastrasl@us.ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
|
I'm still in the process of testing some of these changes from an e2e perspective. I also still need to add documentation for how to use embedded models along with examples. These changes do seem to work though and allow utilizing intrinsics with an embedded adapter / granite switch model. |
|
Bug: The Fix: mirror the if ModelOption.TEMPERATURE in model_options:
api_params["temperature"] = model_options[ModelOption.TEMPERATURE]Verified: with this fix, 5 back-to-back pipeline runs (guardian → rewrite → answerability → QC → base model, no sleep between calls) produce bit-for-bit identical outputs including long-form base model answers. Without the fix, outputs vary run-to-run despite |
|
@lastras Your comment states that all model options get dropped but that the fix is to add (another) special case for It sounds from your comment like there's an underlying bug here that should be fixed. Can you clarify? |
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
c0b163c to
3227a5b
Compare
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
|
I fixed a few random issues and also the issues we talked about (call_intrinsic and seed/temperature setting). I'm working on adding some documentation and minor examples. The functional parts of the code should be stable now. I have additional things that I'd like to clean up, but I will delay them until after we make sure everything is working as intended and post-release, so I don't impact any functionality. |
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
|
Updated the PR description. Fixed a few additional errors / bugs. Added documentation and examples. We should be good to start reviewing. I need to run one more round of tests with the latest from the granite-switch team. I have plans to open a few additional issues to clean things up after this PR is merged. Additional changes that should be made before the official release:
|
ajbozarth
left a comment
There was a problem hiding this comment.
Good overall direction — the EmbeddedIntrinsicAdapter class is well-structured and the design of delegating I/O config loading to the existing io.yaml / IntrinsicsRewriter machinery makes sense. A few issues need addressing before this merges, ranging from a runtime bug to a dependency footprint concern.
|
This is pretty cool, I had Claude explain it to me along with its review. Afaik the review comments look important to address, but I didn't block on them just incase some are intentional. I also ran |
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
|
Thank you for the initial review @ajbozarth; I've addressed all the changes that I believe were necessary and commented on the others with explanations for why no changes were made. I also confirmed that all the vllm tests run and that the new examples do as well. |
|
@jakelorocco looked at the change - I see backwards compatibility with |
I will update that shortly then. Your recommended fix mentioned supporting both so I went ahead and did it that. I will move the change to be just looking for the new key in the entry dict. |
|
Yes, apologies for that overreach - simpler, easier. |
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com> Assisted-by: CLAUDE:OPUS
74ad5f8 to
6bb2dd4
Compare
|
Pushed the updated changes. Granite switch tests and examples ran successfully. |
|
Do we need to add any documentation for granite-switch in our mellea docs? |
I think we should at least add pointers to the official documentation. Otherwise, no one will be able to utilize these changes in Mellea. |
|
PR is looking good, I had Claude draft up a summary of my review status Thanks for addressing the feedback — the critical bugs (temperature drop, STREAM guard, Two things before I'll approve:
Three deferred items I'm fine leaving post-merge, but please open tracking issues for
|
Misc PR
Type of PR
Description
Adds support for granite switch models for OpenAIBackends.
Changes:
load_embedded_adapters=Trueat init.IBM_GRANITE_SWITCH_4_1_8Bthat points to the temp repo for now.Please see most recent comment below for additional context: #881 (comment)
Testing
Attribution