Fix: provide valid Prompt and Completion Token usage counts from create_stream (#3972)

* Fix: `create_stream` to return valid usage token counts
* documentation

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
Anthony Uphof 2024-10-30 12:20:03 +13:00 committed by GitHub
parent bd9c371605
commit 87bd1de396
4 changed files with 341 additions and 43 deletions
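
At a glance, this is what the fix enables on the caller side: requesting usage in a streamed completion and reading real token counts from the final result. The following is a minimal sketch condensed from the notebook example in this commit, assuming an OpenAI-compatible endpoint and an API key available in the environment:

import asyncio

from autogen_core.components.models import UserMessage
from autogen_ext.models import OpenAIChatCompletionClient


async def main() -> None:
    # Assumes OPENAI_API_KEY is set in the environment.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    messages = [UserMessage(content="Write a very short story about a dragon.", source="user")]

    # Without stream_options the final CreateResult reports zero token counts;
    # with include_usage the API emits a final usage chunk that this fix surfaces.
    stream = model_client.create_stream(
        messages=messages,
        extra_create_args={"stream_options": {"include_usage": True}},
    )
    async for response in stream:
        if isinstance(response, str):
            print(response, end="", flush=True)  # partial content
        else:
            print("\n", response.usage)  # e.g. RequestUsage(prompt_tokens=17, completion_tokens=146)


asyncio.run(main())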

View File

@ -74,6 +74,24 @@
"print(response.content)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RequestUsage(prompt_tokens=15, completion_tokens=7)\n"
]
}
],
"source": [
"# Print the response token usage\n",
"print(response.usage)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -86,7 +104,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@ -94,24 +112,26 @@
"output_type": "stream",
"text": [
"Streamed responses:\n",
"In a secluded valley where the sun painted the sky with hues of gold, a solitary dragon named Bremora stood guard. Her emerald scales shimmered with an ancient light as she watched over the village below. Unlike her fiery kin, Bremora had no desire for destruction; her soul was bound by a promise to protect.\n",
"In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n",
"\n",
"Generations ago, a wise elder had befriended Bremora, offering her companionship instead of fear. In gratitude, she vowed to shield the village from calamity. Years passed, and children grew up believing in the legends of a watchful dragon who brought them prosperity and peace.\n",
"One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elaras domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n",
"\n",
"One summer, an ominous storm threatened the valley, with ravenous winds and torrents of rain. Bremora rose into the tempest, her mighty wings defying the chaos. She channeled her breath—not of fire, but of warmth and tranquility—calming the storm and saving her cherished valley.\n",
"\n",
"When dawn broke and the village emerged unscathed, the people looked to the sky. There, Bremora soared gracefully, a guardian spirit woven into their lives, silently promising her eternal vigilance.\n",
"From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elaras bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n",
"\n",
"------------\n",
"\n",
"The complete response:\n",
"In a secluded valley where the sun painted the sky with hues of gold, a solitary dragon named Bremora stood guard. Her emerald scales shimmered with an ancient light as she watched over the village below. Unlike her fiery kin, Bremora had no desire for destruction; her soul was bound by a promise to protect.\n",
"In the heart of an ancient forest, beneath the shadow of snow-capped peaks, a dragon named Elara lived secretly for centuries. Elara was unlike any dragon from the old tales; her scales shimmered with a deep emerald hue, each scale engraved with symbols of lost wisdom. The villagers in the nearby valley spoke of mysterious lights dancing across the night sky, but none dared venture close enough to solve the enigma.\n",
"\n",
"Generations ago, a wise elder had befriended Bremora, offering her companionship instead of fear. In gratitude, she vowed to shield the village from calamity. Years passed, and children grew up believing in the legends of a watchful dragon who brought them prosperity and peace.\n",
"One cold winter's eve, a young girl named Lira, brimming with curiosity and armed with the innocence of youth, wandered into Elaras domain. Instead of fire and fury, she found warmth and a gentle gaze. The dragon shared stories of a world long forgotten and in return, Lira gifted her simple stories of human life, rich in laughter and scent of earth.\n",
"\n",
"One summer, an ominous storm threatened the valley, with ravenous winds and torrents of rain. Bremora rose into the tempest, her mighty wings defying the chaos. She channeled her breath—not of fire, but of warmth and tranquility—calming the storm and saving her cherished valley.\n",
"From that night on, the villagers noticed subtle changes—the crops grew taller, and the air seemed sweeter. Elara had infused the valley with ancient magic, a guardian of balance, watching quietly as her new friend thrived under the stars. And so, Lira and Elaras bond marked the beginning of a timeless friendship that spun tales of hope whispered through the leaves of the ever-verdant forest.\n",
"\n",
"When dawn broke and the village emerged unscathed, the people looked to the sky. There, Bremora soared gracefully, a guardian spirit woven into their lives, silently promising her eternal vigilance.\n"
"\n",
"------------\n",
"\n",
"The token usage was:\n",
"RequestUsage(prompt_tokens=0, completion_tokens=0)\n"
]
}
],
@ -133,7 +153,10 @@
" # The last response is a CreateResult object with the complete message.\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The complete response:\", flush=True)\n",
" print(response.content, flush=True)"
" print(response.content, flush=True)\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The token usage was:\", flush=True)\n",
" print(response.usage, flush=True)"
]
},
{
@ -143,7 +166,86 @@
"```{note}\n",
"The last response in the streaming response is always the final response\n",
"of the type {py:class}`~autogen_core.components.models.CreateResult`.\n",
"```"
"```\n",
"\n",
"**NB the default usage response is to return zero values**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A Note on Token usage counts with streaming example\n",
"Comparing usage returns in the above Non Streaming `model_client.create(messages=messages)` vs streaming `model_client.create_stream(messages=messages)` we see differences.\n",
"The non streaming response by default returns valid prompt and completion token usage counts. \n",
"The streamed response by default returns zero values.\n",
"\n",
"as documented in the OPENAI API Reference an additional parameter `stream_options` can be specified to return valid usage counts. see [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options)\n",
"\n",
"Only set this when you using streaming ie , using `create_stream` \n",
"\n",
"to enable this in `create_stream` set `extra_create_args={\"stream_options\": {\"include_usage\": True}},`\n",
"\n",
"- **Note whilst other API's like LiteLLM also support this, it is not always guarenteed that it is fully supported or correct**\n",
"\n",
"#### Streaming example with token usage\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Streamed responses:\n",
"In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n",
"\n",
"Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n",
"\n",
"As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n",
"\n",
"------------\n",
"\n",
"The complete response:\n",
"In a lush, emerald valley hidden by towering peaks, there lived a dragon named Ember. Unlike others of her kind, Ember cherished solitude over treasure, and the songs of the stream over the roar of flames. One misty dawn, a young shepherd stumbled into her sanctuary, lost and frightened. \n",
"\n",
"Instead of fury, he was met with kindness as Ember extended a wing, guiding him back to safety. In gratitude, the shepherd visited yearly, bringing tales of his world beyond the mountains. Over time, a friendship blossomed, binding man and dragon in shared stories and laughter.\n",
"\n",
"As the years passed, the legend of Ember the gentle-hearted spread far and wide, forever changing the way dragons were seen in the hearts of many.\n",
"\n",
"\n",
"------------\n",
"\n",
"The token usage was:\n",
"RequestUsage(prompt_tokens=17, completion_tokens=146)\n"
]
}
],
"source": [
"messages = [\n",
" UserMessage(content=\"Write a very short story about a dragon.\", source=\"user\"),\n",
"]\n",
"\n",
"# Create a stream.\n",
"stream = model_client.create_stream(messages=messages, extra_create_args={\"stream_options\": {\"include_usage\": True}})\n",
"\n",
"# Iterate over the stream and print the responses.\n",
"print(\"Streamed responses:\")\n",
"async for response in stream: # type: ignore\n",
" if isinstance(response, str):\n",
" # A partial response is a string.\n",
" print(response, flush=True, end=\"\")\n",
" else:\n",
" # The last response is a CreateResult object with the complete message.\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The complete response:\", flush=True)\n",
" print(response.content, flush=True)\n",
" print(\"\\n\\n------------\\n\")\n",
" print(\"The token usage was:\", flush=True)\n",
" print(response.usage, flush=True)"
]
},
{
@ -234,7 +336,8 @@
"from autogen_core.application import SingleThreadedAgentRuntime\n",
"from autogen_core.base import MessageContext\n",
"from autogen_core.components import RoutedAgent, message_handler\n",
"from autogen_core.components.models import ChatCompletionClient, OpenAIChatCompletionClient, SystemMessage, UserMessage\n",
"from autogen_core.components.models import ChatCompletionClient, SystemMessage, UserMessage\n",
"from autogen_ext.models import OpenAIChatCompletionClient\n",
"\n",
"\n",
"@dataclass\n",

View File

@ -39,6 +39,7 @@ from openai.types.chat import (
completion_create_params,
)
from openai.types.chat.chat_completion import Choice
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice
from openai.types.shared_params import FunctionDefinition, FunctionParameters
from pydantic import BaseModel
from typing_extensions import Unpack
@ -555,6 +556,31 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
extra_create_args: Mapping[str, Any] = {},
cancellation_token: Optional[CancellationToken] = None,
) -> AsyncGenerator[Union[str, CreateResult], None]:
"""
Creates an AsyncGenerator that will yield a stream of chat completions based on the provided messages and tools.
Args:
messages (Sequence[LLMMessage]): A sequence of messages to be processed.
tools (Sequence[Tool | ToolSchema], optional): A sequence of tools to be used in the completion. Defaults to `[]`.
json_output (Optional[bool], optional): If True, the output will be in JSON format. Defaults to None.
extra_create_args (Mapping[str, Any], optional): Additional arguments for the creation process. Defaults to `{}`.
cancellation_token (Optional[CancellationToken], optional): A token to cancel the operation. Defaults to None.
Yields:
AsyncGenerator[Union[str, CreateResult], None]: A generator yielding the completion results as they are produced.
In streaming, the default behaviour is not to return token usage counts. See: [OpenAI API reference for possible args](https://platform.openai.com/docs/api-reference/chat/create).
However, `extra_create_args={"stream_options": {"include_usage": True}}` will (if supported by the accessed API)
return a final chunk with usage set to a RequestUsage object containing prompt and completion token counts,
while all preceding chunks will have usage as None. See: [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options).
Other examples of OpenAI-supported arguments that can be included in `extra_create_args`:
- `temperature` (float): Controls the randomness of the output. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (int): The maximum number of tokens to generate in the completion.
- `top_p` (float): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- `frequency_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeated phrases.
- `presence_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on whether they appear in the text so far, encouraging the model to talk about new topics.
"""
# Make sure all extra_create_args are valid
extra_create_args_keys = set(extra_create_args.keys())
if not create_kwargs.issuperset(extra_create_args_keys):
@ -601,7 +627,8 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
if cancellation_token is not None:
cancellation_token.link_future(stream_future)
stream = await stream_future
choice: Union[ParsedChoice[Any], ParsedChoice[BaseModel], ChunkChoice] = cast(ChunkChoice, None)
chunk = None
stop_reason = None
maybe_model = None
content_deltas: List[str] = []
@ -614,8 +641,23 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
if cancellation_token is not None:
cancellation_token.link_future(chunk_future)
chunk = await chunk_future
choice = chunk.choices[0]
stop_reason = choice.finish_reason
# To process the usage chunk in streaming situations,
# add stream_options={"include_usage": True} in the initialization of OpenAIChatCompletionClient(...).
# However, the APIs differ:
# the OpenAI API usage chunk produces no choices, so we need to check whether there is a choice,
# while the LiteLLM API usage chunk does produce choices.
choice = (
chunk.choices[0]
if len(chunk.choices) > 0
else choice
if chunk.usage is not None and stop_reason is not None
else cast(ChunkChoice, None)
)
# For the LiteLLM usage chunk, apply the following hack, keeping the previous chunk's stop_reason (if set):
# set the stop_reason for the usage chunk to the prior stop_reason.
stop_reason = choice.finish_reason if chunk.usage is None and stop_reason is None else stop_reason
maybe_model = chunk.model
# First try get content
if choice.delta.content is not None:
@ -657,17 +699,21 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
model = maybe_model or create_args["model"]
model = model.replace("gpt-35", "gpt-3.5") # hack for Azure API
# TODO fix count token
prompt_tokens = 0
# prompt_tokens = count_token(messages, model=model)
if chunk and chunk.usage:
prompt_tokens = chunk.usage.prompt_tokens
else:
prompt_tokens = 0
if stop_reason is None:
raise ValueError("No stop reason found")
content: Union[str, List[FunctionCall]]
if len(content_deltas) > 1:
content = "".join(content_deltas)
completion_tokens = 0
# completion_tokens = count_token(content, model=model)
if chunk and chunk.usage:
completion_tokens = chunk.usage.completion_tokens
else:
completion_tokens = 0
else:
completion_tokens = 0
# TODO: fix assumption that dict values were added in order and actually order by int index

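The usage-chunk handling introduced above has to cope with two different shapes of the final chunk, as the inline comments note. Below is an illustrative sketch of those two shapes, built with the same openai types the updated tests construct; the token values are placeholders:

from openai.types.chat.chat_completion_chunk import ChatCompletionChunk, ChoiceDelta
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice
from openai.types.completion_usage import CompletionUsage

usage = CompletionUsage(prompt_tokens=3, completion_tokens=3, total_tokens=6)

# OpenAI API: the usage chunk carries no choices, so chunk.choices is empty.
openai_usage_chunk = ChatCompletionChunk(
    id="id", created=0, model="gpt-4o", object="chat.completion.chunk",
    choices=[],
    usage=usage,
)

# LiteLLM proxy: the usage chunk still carries a content-less choice.
litellm_usage_chunk = ChatCompletionChunk(
    id="id", created=0, model="gpt-4o", object="chat.completion.chunk",
    choices=[ChunkChoice(finish_reason=None, index=0, delta=ChoiceDelta(content=None, role="assistant"))],
    usage=usage,
)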
View File

@ -60,6 +60,7 @@ from openai.types.chat import (
completion_create_params,
)
from openai.types.chat.chat_completion import Choice
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice
from openai.types.shared_params import FunctionDefinition, FunctionParameters
from pydantic import BaseModel
from typing_extensions import Unpack
@ -556,6 +557,31 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
extra_create_args: Mapping[str, Any] = {},
cancellation_token: Optional[CancellationToken] = None,
) -> AsyncGenerator[Union[str, CreateResult], None]:
"""
Creates an AsyncGenerator that will yield a stream of chat completions based on the provided messages and tools.
Args:
messages (Sequence[LLMMessage]): A sequence of messages to be processed.
tools (Sequence[Tool | ToolSchema], optional): A sequence of tools to be used in the completion. Defaults to `[]`.
json_output (Optional[bool], optional): If True, the output will be in JSON format. Defaults to None.
extra_create_args (Mapping[str, Any], optional): Additional arguments for the creation process. Defaults to `{}`.
cancellation_token (Optional[CancellationToken], optional): A token to cancel the operation. Defaults to None.
Yields:
AsyncGenerator[Union[str, CreateResult], None]: A generator yielding the completion results as they are produced.
In streaming, the default behaviour is not to return token usage counts. See: [OpenAI API reference for possible args](https://platform.openai.com/docs/api-reference/chat/create).
However, `extra_create_args={"stream_options": {"include_usage": True}}` will (if supported by the accessed API)
return a final chunk with usage set to a RequestUsage object containing prompt and completion token counts,
while all preceding chunks will have usage as None. See: [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options).
Other examples of OpenAI-supported arguments that can be included in `extra_create_args`:
- `temperature` (float): Controls the randomness of the output. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (int): The maximum number of tokens to generate in the completion.
- `top_p` (float): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- `frequency_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on their existing frequency in the text so far, decreasing the likelihood of repeated phrases.
- `presence_penalty` (float): A value between -2.0 and 2.0 that penalizes new tokens based on whether they appear in the text so far, encouraging the model to talk about new topics.
"""
# Make sure all extra_create_args are valid
extra_create_args_keys = set(extra_create_args.keys())
if not create_kwargs.issuperset(extra_create_args_keys):
@ -602,7 +628,8 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
if cancellation_token is not None:
cancellation_token.link_future(stream_future)
stream = await stream_future
choice: Union[ParsedChoice[Any], ParsedChoice[BaseModel], ChunkChoice] = cast(ChunkChoice, None)
chunk = None
stop_reason = None
maybe_model = None
content_deltas: List[str] = []
@ -615,8 +642,23 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
if cancellation_token is not None:
cancellation_token.link_future(chunk_future)
chunk = await chunk_future
choice = chunk.choices[0]
stop_reason = choice.finish_reason
# To process the usage chunk in streaming situations,
# add stream_options={"include_usage": True} in the initialization of OpenAIChatCompletionClient(...).
# However, the APIs differ:
# the OpenAI API usage chunk produces no choices, so we need to check whether there is a choice,
# while the LiteLLM API usage chunk does produce choices.
choice = (
chunk.choices[0]
if len(chunk.choices) > 0
else choice
if chunk.usage is not None and stop_reason is not None
else cast(ChunkChoice, None)
)
# For the LiteLLM usage chunk, apply the following hack, keeping the previous chunk's stop_reason (if set):
# set the stop_reason for the usage chunk to the prior stop_reason.
stop_reason = choice.finish_reason if chunk.usage is None and stop_reason is None else stop_reason
maybe_model = chunk.model
# First try get content
if choice.delta.content is not None:
@ -658,17 +700,21 @@ class BaseOpenAIChatCompletionClient(ChatCompletionClient):
model = maybe_model or create_args["model"]
model = model.replace("gpt-35", "gpt-3.5") # hack for Azure API
# TODO fix count token
prompt_tokens = 0
# prompt_tokens = count_token(messages, model=model)
if chunk and chunk.usage:
prompt_tokens = chunk.usage.prompt_tokens
else:
prompt_tokens = 0
if stop_reason is None:
raise ValueError("No stop reason found")
content: Union[str, List[FunctionCall]]
if len(content_deltas) > 1:
content = "".join(content_deltas)
completion_tokens = 0
# completion_tokens = count_token(content, model=model)
if chunk and chunk.usage:
completion_tokens = chunk.usage.completion_tokens
else:
completion_tokens = 0
else:
completion_tokens = 0
# TODO: fix assumption that dict values were added in order and actually order by int index

View File

@ -11,6 +11,7 @@ from autogen_core.components.models import (
FunctionExecutionResult,
FunctionExecutionResultMessage,
LLMMessage,
RequestUsage,
SystemMessage,
UserMessage,
)
@ -24,28 +25,83 @@ from openai.types.chat.chat_completion_chunk import ChatCompletionChunk, ChoiceD
from openai.types.chat.chat_completion_chunk import Choice as ChunkChoice
from openai.types.chat.chat_completion_message import ChatCompletionMessage
from openai.types.completion_usage import CompletionUsage
from pydantic import BaseModel
class MockChunkDefinition(BaseModel):
# defining elements for differentiating mock chunks
chunk_choice: ChunkChoice
usage: CompletionUsage | None
async def _mock_create_stream(*args: Any, **kwargs: Any) -> AsyncGenerator[ChatCompletionChunk, None]:
model = resolve_model(kwargs.get("model", "gpt-4o"))
chunks = ["Hello", " Another Hello", " Yet Another Hello"]
for chunk in chunks:
mock_chunks_content = ["Hello", " Another Hello", " Yet Another Hello"]
# The OpenAI API implementations (OpenAI and LiteLLM) stream chunks of tokens
# with content as a string, then at the end a chunk with the stop reason set, and finally,
# if usage was requested with `"stream_options": {"include_usage": True}`, a chunk with the usage data.
mock_chunks = [
# generate the list of mock chunk content
MockChunkDefinition(
chunk_choice=ChunkChoice(
finish_reason=None,
index=0,
delta=ChoiceDelta(
content=mock_chunk_content,
role="assistant",
),
),
usage=None,
)
for mock_chunk_content in mock_chunks_content
] + [
# generate the stop chunk
MockChunkDefinition(
chunk_choice=ChunkChoice(
finish_reason="stop",
index=0,
delta=ChoiceDelta(
content=None,
role="assistant",
),
),
usage=None,
)
]
# generate the usage chunk if configured
if kwargs.get("stream_options", {}).get("include_usage") is True:
mock_chunks = mock_chunks + [
# ---- API differences
# OPENAI API does NOT create a choice
# LITELLM (proxy) DOES create a choice
# Not simulating all the API options, just implementing the LITELLM variant
MockChunkDefinition(
chunk_choice=ChunkChoice(
finish_reason=None,
index=0,
delta=ChoiceDelta(
content=None,
role="assistant",
),
),
usage=CompletionUsage(prompt_tokens=3, completion_tokens=3, total_tokens=6),
)
]
elif kwargs.get("stream_options", {}).get("include_usage") is False:
pass
else:
pass
for mock_chunk in mock_chunks:
await asyncio.sleep(0.1)
yield ChatCompletionChunk(
id="id",
choices=[
ChunkChoice(
finish_reason="stop",
index=0,
delta=ChoiceDelta(
content=chunk,
role="assistant",
),
)
],
choices=[mock_chunk.chunk_choice],
created=0,
model=model,
object="chat.completion.chunk",
usage=mock_chunk.usage,
)
@ -95,17 +151,64 @@ async def test_openai_chat_completion_client_create(monkeypatch: pytest.MonkeyPa
@pytest.mark.asyncio
async def test_openai_chat_completion_client_create_stream(monkeypatch: pytest.MonkeyPatch) -> None:
async def test_openai_chat_completion_client_create_stream_with_usage(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.setattr(AsyncCompletions, "create", _mock_create)
client = OpenAIChatCompletionClient(model="gpt-4o", api_key="api_key")
chunks: List[str | CreateResult] = []
async for chunk in client.create_stream(messages=[UserMessage(content="Hello", source="user")]):
async for chunk in client.create_stream(
messages=[UserMessage(content="Hello", source="user")],
# include_usage is not the default of the OpenAI API and must be explicitly set
extra_create_args={"stream_options": {"include_usage": True}},
):
chunks.append(chunk)
assert chunks[0] == "Hello"
assert chunks[1] == " Another Hello"
assert chunks[2] == " Yet Another Hello"
assert isinstance(chunks[-1], CreateResult)
assert chunks[-1].content == "Hello Another Hello Yet Another Hello"
assert chunks[-1].usage == RequestUsage(prompt_tokens=3, completion_tokens=3)
@pytest.mark.asyncio
async def test_openai_chat_completion_client_create_stream_no_usage_default(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.setattr(AsyncCompletions, "create", _mock_create)
client = OpenAIChatCompletionClient(model="gpt-4o", api_key="api_key")
chunks: List[str | CreateResult] = []
async for chunk in client.create_stream(
messages=[UserMessage(content="Hello", source="user")],
# include_usage is not the default of the OpenAI API;
# it can be explicitly set,
# or simply not declared, which is the default.
# extra_create_args={"stream_options": {"include_usage": False}},
):
chunks.append(chunk)
assert chunks[0] == "Hello"
assert chunks[1] == " Another Hello"
assert chunks[2] == " Yet Another Hello"
assert isinstance(chunks[-1], CreateResult)
assert chunks[-1].content == "Hello Another Hello Yet Another Hello"
assert chunks[-1].usage == RequestUsage(prompt_tokens=0, completion_tokens=0)
@pytest.mark.asyncio
async def test_openai_chat_completion_client_create_stream_no_usage_explicit(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.setattr(AsyncCompletions, "create", _mock_create)
client = OpenAIChatCompletionClient(model="gpt-4o", api_key="api_key")
chunks: List[str | CreateResult] = []
async for chunk in client.create_stream(
messages=[UserMessage(content="Hello", source="user")],
# include_usage is not the default of the OpenAI API;
# it can be explicitly set,
# or simply not declared, which is the default.
extra_create_args={"stream_options": {"include_usage": False}},
):
chunks.append(chunk)
assert chunks[0] == "Hello"
assert chunks[1] == " Another Hello"
assert chunks[2] == " Yet Another Hello"
assert isinstance(chunks[-1], CreateResult)
assert chunks[-1].content == "Hello Another Hello Yet Another Hello"
assert chunks[-1].usage == RequestUsage(prompt_tokens=0, completion_tokens=0)
@pytest.mark.asyncio