feat: cli OpenAI-compatible API response_format support#884
feat: cli OpenAI-compatible API response_format support#884markstur wants to merge 3 commits intogenerative-computing:mainfrom
response_format support#884Conversation
- Added `JsonSchemaFormat` model to represent JSON schema definitions
- Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
- Used field alias to avoid conflict with Pydantic's `schema` method
- Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
- Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
- Modified `make_chat_endpoint()` to:
- Parse `response_format` from requests
- Convert `json_schema` type to Pydantic models using the utility function
- Detect if the serve function accepts a `format` parameter using `inspect.signature()`
- Pass the generated Pydantic model as `format=` parameter to serve functions that support it
- Handle backward compatibility with serve functions that don't accept `format`
- Added proper error handling for invalid schemas
- Test json_schema format is converted to Pydantic model and passed to serve
- Test json_object format doesn't pass a schema
- Test text format doesn't pass a schema
- Test error handling for missing json_schema field
- Test error handling for invalid JSON schemas
- Test backward compatibility with serve functions without format parameter
- Test optional fields in JSON schemas
When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method
This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
response_format support
| @@ -186,6 +289,7 @@ async def endpoint(request: ChatCompletionRequest): | |||
| created=created_timestamp, | |||
| stream_options=request.stream_options, | |||
| system_fingerprint=system_fingerprint, | |||
There was a problem hiding this comment.
The non-streaming path (return ChatCompletion just below, line 297) returns output.value without validating against format_model. Suggest adding before that return (also needs import json at the top and ValidationError added to the pydantic import):
if format_model is not None and output.value is not None:
try:
format_model.model_validate(json.loads(output.value))
except (json.JSONDecodeError, ValidationError) as e:
return create_openai_error_response(
status_code=400,
message=f"Output does not match required schema: {e!s}",
error_type="invalid_response_error",
)| ) | ||
| yield f"data: {chunk.model_dump_json()}\n\n" | ||
|
|
||
| # Validate format if format_model is provided |
There was a problem hiding this comment.
Validation runs after all content chunks are already sent (lines 68–106), so the error arrives after the client has consumed the data. A few options:
- Buffer when
format_modelis set, validate, then stream or error before emitting anything. - Return a 400 upfront when
stream=True+json_schema— simplest for now. - Keep post-hoc but document it — callers can pass
format=to the backend for constrained decoding instead.
| ) | ||
|
|
||
|
|
||
| def _json_schema_to_pydantic( |
There was a problem hiding this comment.
this handles type, but will not handle enum, additionalProperties, nested types, array, $ref, allOf, anyOf
Suggest clarifying caveats in comments? or figuring out if any more validation is viable
|
|
||
| # Check if serve function accepts format parameter | ||
| serve_sig = inspect.signature(module.serve) | ||
| accepts_format = "format" in serve_sig.parameters |
There was a problem hiding this comment.
cacheable/could be done up front? Here it's done in every request but won't change?
| schema_: dict[str, Any] = Field(alias="schema") | ||
| """JSON Schema definition.""" | ||
|
|
||
| strict: bool | None = None |
There was a problem hiding this comment.
not used? See related comment - more is needed to really be strict or at least clarify behaviour?
| accepts_format = "format" in serve_sig.parameters | ||
|
|
||
| # Detect if serve is async or sync and handle accordingly | ||
| if inspect.iscoroutinefunction(module.serve): |
There was a problem hiding this comment.
similar (not identical) code is repeated multiple times - possible opportunity for making common - minor.
Misc PR
Type of PR
Description
Testing
Attribution