
feat: add embedded adapters (granite switch) to openai backend #881

Open
jakelorocco wants to merge 17 commits into main from jal/granite-switch-intrinsics

Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented Apr 17, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

  • Link to Issue: Fixes N/A

Adds support for granite switch models for OpenAIBackends.

Changes:

  • OpenAIBackends can utilize EmbeddedIntrinsicAdapters with any of the regular intrinsic functionality.
  • Adapters are loaded at init when load_embedded_adapters=True (the default).
  • Added a placeholder IBM_GRANITE_SWITCH_4_1_8B that points to the temp repo for now.
  • Added documentation, examples, tests.
  • When generating from an intrinsic, we now only grab temperature and seed from model options; this limitation is now documented.
  • The call_intrinsic function now works with both OpenAIBackends and LocalHFBackends.
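As a rough illustration of the option-filtering behavior described above (only temperature and seed are read from caller-supplied model options when generating from an intrinsic), here is a minimal sketch. The function and key names are hypothetical, not the PR's actual API:

```python
# Hypothetical sketch of the intrinsic option filtering described in this PR:
# only temperature and seed are forwarded to the API when generating from an
# intrinsic; all other model options are intentionally dropped.
# Names are illustrative, not Mellea's real internals.

ALLOWED_INTRINSIC_OPTIONS = {"temperature", "seed"}

def filter_intrinsic_options(model_options: dict) -> dict:
    """Keep only the options an intrinsic call forwards to the API."""
    return {
        key: value
        for key, value in model_options.items()
        if key in ALLOWED_INTRINSIC_OPTIONS
    }

api_params = filter_intrinsic_options(
    {"temperature": 0.0, "seed": 42, "max_tokens": 256}
)
print(api_params)  # {'temperature': 0.0, 'seed': 42}
```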

Please see the most recent comment below for additional context: #881 (comment)

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

Luis Lastras and others added 3 commits April 10, 2026 18:11
Enable calling intrinsics on Granite Switch models via the OpenAI backend.
Granite Switch models embed adapter weights directly in the checkpoint and
activate them via chat template control tokens, so no PEFT loading is needed.

- Add EmbeddedIntrinsicAdapter class that carries only I/O config (no weights)
  with factory methods to load from a model directory or HuggingFace Hub
- Add register_granite_switch_model() and add_embedded_adapter() to OpenAIBackend
- Add _generate_from_intrinsic() that reuses IntrinsicsRewriter/ResultProcessor
  and injects intrinsic_name into chat_template_kwargs for the switch model
- Ensure serialized messages always include 'role' (latent issue in rewriter
  instruction messages, newly exposed by OpenAI API serialization path)
- Add unit tests for adapter loading, registration, and rewriting

Signed-off-by: lastrasl <lastrasl@us.ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
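The two mechanisms named in the commit message above (injecting the intrinsic name into chat_template_kwargs so the switch model activates the right embedded adapter, and ensuring every serialized message carries a 'role') can be sketched roughly as follows. All names here are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical sketch: build an OpenAI-style request for a granite switch
# intrinsic call. (1) Normalize messages so each one has a 'role' (the latent
# rewriter issue mentioned above), defaulting to "user". (2) Pass the intrinsic
# name via chat_template_kwargs so the chat template can emit the control
# tokens that activate the embedded adapter. Illustrative only.

def build_request(messages: list[dict], intrinsic_name: str) -> dict:
    normalized = [
        {"role": m.get("role", "user"),
         **{k: v for k, v in m.items() if k != "role"}}
        for m in messages
    ]
    return {
        "messages": normalized,
        "chat_template_kwargs": {"intrinsic_name": intrinsic_name},
    }

req = build_request([{"content": "Is this question answerable?"}], "answerability")
print(req["messages"][0]["role"])   # user
print(req["chat_template_kwargs"])  # {'intrinsic_name': 'answerability'}
```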
@github-actions github-actions Bot added the enhancement New feature or request label Apr 17, 2026
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco
Contributor Author

I'm still in the process of testing some of these changes from an e2e perspective. I also still need to add documentation for how to use embedded models along with examples.

That said, these changes do work and allow using intrinsics with an embedded adapter / granite switch model.

@jakelorocco jakelorocco changed the title from "feat: add embedded adapters to openai backend" to "feat: add embedded adapters (granite switch) to openai backend" Apr 17, 2026
@lastras

lastras commented Apr 19, 2026

Bug: model_options temperature is silently dropped for intrinsic calls

The seed parameter is correctly forwarded to api_params at line 639, but temperature (and any other model_options entries) are not. They are written into request_json at line 617, which gets absorbed into the ChatCompletion object by rewriter.transform(), but never extracted back out when building api_params. The result is that callers passing model_options={ModelOption.TEMPERATURE: 0.0} get the model's default temperature instead — causing non-deterministic outputs even when greedy decoding is explicitly requested.

Fix: mirror the seed pattern immediately after line 640:

if ModelOption.TEMPERATURE in model_options:
    api_params["temperature"] = model_options[ModelOption.TEMPERATURE]

Verified: with this fix, 5 back-to-back pipeline runs (guardian → rewrite → answerability → QC → base model, no sleep between calls) produce bit-for-bit identical outputs including long-form base model answers. Without the fix, outputs vary run-to-run despite temperature=0.0 being passed by the caller. The warning at line 557 ("some model options may be overwritten / ignored") is technically accurate but understated — temperature is always silently dropped, not just sometimes.
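A hypothetical generalization of the seed-mirroring pattern described above, forwarding every known model option in one place instead of adding a special case per option (names are illustrative, not Mellea's actual internals):

```python
# Hypothetical sketch: instead of mirroring the seed pattern once per option,
# declare the mapping from model-option keys to OpenAI API parameter names in
# one table and forward everything present. Illustrative names only.

OPTION_TO_API_PARAM = {
    "ModelOption.TEMPERATURE": "temperature",
    "ModelOption.SEED": "seed",
}

def forward_model_options(model_options: dict, api_params: dict) -> dict:
    """Copy every recognized model option into the outgoing API params."""
    for option_key, api_name in OPTION_TO_API_PARAM.items():
        if option_key in model_options:
            api_params[api_name] = model_options[option_key]
    return api_params

params = forward_model_options(
    {"ModelOption.TEMPERATURE": 0.0, "ModelOption.SEED": 7}, {}
)
print(params)  # {'temperature': 0.0, 'seed': 7}
```

This keeps the fix for temperature from silently regressing when the next option is added, since new options only require a table entry.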

@nrfulton
Member

@lastras Your comment states that all model options get dropped but that the fix is to add (another) special case for temperature in model_options.

It sounds from your comment like there's an underlying bug here that should be fixed. Can you clarify?

@jakelorocco
Contributor Author

@lastras @nrfulton, I am going to move this convo to slack for faster resolution.

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco jakelorocco self-assigned this Apr 20, 2026
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco jakelorocco force-pushed the jal/granite-switch-intrinsics branch from c0b163c to 3227a5b April 20, 2026 18:20
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@jakelorocco
Contributor Author

I fixed a few random issues and also the issues we talked about (call_intrinsic and seed/temperature setting). I'm working on adding some documentation and minor examples.

The functional parts of the code should be stable now. I have additional things I'd like to clean up, but I will delay them until after release, once we've confirmed everything is working as intended, so I don't impact any functionality.

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco
Contributor Author

jakelorocco commented Apr 20, 2026

Updated the PR description. Fixed a few additional errors / bugs. Added documentation and examples. We should be good to start reviewing. I need to run one more round of tests with the latest from the granite-switch team.

I have plans to open a few additional issues to clean things up after this PR is merged.

Additional changes that should be made before the official release:

  • Update IBM_GRANITE_SWITCH_4_1_3B to point to the correct model and update all references to the model
  • Update our documentation to point to the official granite switch documentation when published

@jakelorocco jakelorocco marked this pull request as ready for review April 20, 2026 21:45
@jakelorocco jakelorocco requested a review from a team as a code owner April 20, 2026 21:45
Contributor

@ajbozarth ajbozarth left a comment


Good overall direction — the EmbeddedIntrinsicAdapter class is well-structured and the design of delegating I/O config loading to the existing io.yaml / IntrinsicsRewriter machinery makes sense. A few issues need addressing before this merges, ranging from a runtime bug to a dependency footprint concern.

Comment thread pyproject.toml Outdated
Comment thread mellea/backends/model_ids.py Outdated
Comment thread mellea/backends/openai.py
Comment thread mellea/backends/openai.py Outdated
Comment thread mellea/backends/openai.py
Comment thread mellea/backends/openai.py Outdated
Comment thread mellea/stdlib/components/intrinsic/_util.py
Comment thread mellea/backends/adapters/adapter.py
Comment thread mellea/backends/adapters/adapter.py Outdated
Comment thread test/backends/test_openai_intrinsics.py
@ajbozarth
Contributor

This is pretty cool; I had Claude explain it to me along with its review. Afaik the review comments look important to address, but I didn't block on them just in case some are intentional. I also ran uv run pytest locally, and it failed on the same test as CI.

Comment thread mellea/backends/adapters/adapter.py Outdated
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco
Contributor Author

jakelorocco commented Apr 21, 2026

Thank you for the initial review @ajbozarth; I've addressed all the changes that I believe were necessary and commented on the others with explanations for why no changes were made.

I also confirmed that all the vllm tests run and that the new examples do as well.

@jakelorocco jakelorocco requested a review from ajbozarth April 21, 2026 14:11
@lastras

lastras commented Apr 21, 2026

@jakelorocco looked at the change - I see backwards compatibility with intrinsic_name - do note that from the granite switch perspective, there's no need to maintain that compatibility.

@jakelorocco
Contributor Author

@jakelorocco looked at the change - I see backwards compatibility with intrinsic_name - do note that from the granite switch perspective, there's no need to maintain that compatibility.

I will update that shortly then. Your recommended fix mentioned supporting both, so I went ahead and did it that way. I will change it to look only for the new key in the entry dict.

@lastras

lastras commented Apr 21, 2026

Yes, apologies for that overreach - simpler, easier.

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
@jakelorocco jakelorocco force-pushed the jal/granite-switch-intrinsics branch from 74ad5f8 to 6bb2dd4 April 21, 2026 15:04
@jakelorocco
Contributor Author

Pushed the updated changes. Granite switch tests and examples ran successfully.

@psschwei
Member

Do we need to add any documentation for granite-switch in our mellea docs?

@jakelorocco
Contributor Author

Do we need to add any documentation for granite-switch in our mellea docs?

I think we should at least add pointers to the official documentation. Otherwise, no one will be able to utilize these changes in Mellea.

@ajbozarth
Contributor

PR is looking good, I had Claude draft up a summary of my review status


Thanks for addressing the feedback — the critical bugs (temperature drop, STREAM guard,
exception chaining, hard dependency) are all fixed and the new unit tests cover the
generate path well.

Two things before I'll approve:

  1. IBM_GRANITE_SWITCH_4_1_3B personal repo placeholder — still points to
    GrizleeBer/gs-test-1. I'd like to see this updated (or at minimum have a GitHub
    release-blocker issue filed so it can't ship in a release) before approving.

  2. Docs pointer: @psschwei raised this and I agree. Even a stub page or a note in
    the existing backends guide saying "Granite Switch support is available; official docs
    at [link when published]" would be enough. Worth resolving before merge so users
    know the feature exists.

Three deferred items I'm fine leaving post-merge, but please open tracking issues for
them so they don't get lost:

  • _uses_embedded_adapters = True unconditionally on all OpenAIBackend instances
    (openai.py:226) — the TODO comment is there but
    a GitHub issue with the proposed interface change would be better tracking.
  • _uses_embedded_adapters not on the AdapterMixin interface
    (_util.py:45) — same, needs an
    issue.
  • self.intrinsic_name / self.name ambiguity on EmbeddedIntrinsicAdapter
    (adapter.py:338) — you mentioned opening
    one, just flagging it here for the record.


Labels

enhancement New feature or request
