feat: streaming support in m serve OpenAI API server#823
feat: streaming support in m serve OpenAI API server#823markstur merged 23 commits intogenerative-computing:mainfrom
Conversation
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
planetf1
left a comment
There was a problem hiding this comment.
noticed a few things to tighten up.
Minor, but looking at the tests - The TestStreamingEndpoint tests (e.g. line 166) are marked @pytest.mark.asyncio and declared async def but contain no await — TestClient is synchronous and doesn't need the marker. TestStreamingHelpers is fine.
Suspect it will still work but it implies the wrong behaviour?
fixed |
|
thanks for the review! Addressed all the comments. See replies if there was some question, but I think they are covered. |
psschwei
left a comment
There was a problem hiding this comment.
couple of things flagged by claude
planetf1
left a comment
There was a problem hiding this comment.
Thanks for the PR — streaming support is well-structured and the test coverage is thorough. Two things worth addressing before merge:
|
Also opened #873 as I noted example is importing from the cli package, which I would suggest is inappropriate - it's not new to this PR though. |
|
Thanks for the review! Even the little things are really good for me to look closer at. |
|
I guess now that another PR merged, I have to get used to saying Assisted-by: IBM Bob |
Fixes: generative-computing#822 Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Added streaming support w/ setting system_fingerprint. Make it consistent. We are currently just setting it to None but now it is consistent for future use. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Doesn't belong in library. This is for cli. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
astream() returns deltas (only new fragments), not accumulated text. Update docstring. Fix unused previous_length in streaming.py. Rename vars for clarity. Fix streaming tests to be consistent with the non-accumulating behavoir. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Adds missing yield of the [DONE] that clients will expect. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Since stream defaults to False, a regression was introduced where stream=False is now passed to backends where it used to be default. Fix is to only forward stream=True and not add stream=False. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Stuff was marked asyncio and async def when it didn't need to be. Moved the usage tests a little for readability. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
In m serve, usage was included for backward compatibility but this is a new feature so that's not an issue. Instead the OpenAI spec is what we want to follow. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Use content=None instead of content="" to be more correct with OpenAI API. Remove unneeded check for "" in the test. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Test-first for... Bug where we get into a loop without checking for this first. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix bug where we get into loop despite already having computed chunk. Update test which was written to test existing bug -- use wording that makes it current and not referencing the non-existent bug. Make the asserts more precise with expected chunks. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…vior The default is now to only include usage when it is asked for when streaming. This is consistent with the OpenAI API spec. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Following OpenAI API spec and better for validation. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Some validation tests including how coercion from 1 or "true" to True is handled. Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
planetf1
left a comment
There was a problem hiding this comment.
All looks good! Thanks for addressing (and followup)
ec8cc18
Misc PR
Type of PR
Description
Add OpenAI API compatible support for streaming in
m serveapp.Testing