{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/oai_client_cost.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Usage tracking with AutoGen\n",
    "## 1. Use AutoGen's OpenAIWrapper for cost estimation\n",
    "The `OpenAIWrapper` from `autogen` tracks token counts and costs of your API calls. Use the `create()` method to initiate requests and `print_usage_summary()` to retrieve a detailed usage report, including the total cost and token usage for both cached and actual requests.\n",
    "\n",
    "- `mode=[\"actual\", \"total\"]` (default): print the usage summary for non-cached completions and for all completions (including cache).\n",
    "- `mode='actual'`: print only the non-cached usage.\n",
    "- `mode='total'`: print only the total usage (including cache).\n",
    "\n",
    "Reset your session's usage data with `clear_usage_summary()` when needed.\n",
    "\n",
    "## 2. Track cost and token count for agents\n",
    "We also support cost estimation for agents. Use `Agent.print_usage_summary()` to print a cost summary for the agent.\n",
    "You can retrieve the usage summary as a dict using `Agent.get_actual_usage()` and `Agent.get_total_usage()`. Note that `Agent.reset()` will also reset the usage summary.\n",
    "\n",
    "To gather usage data for a list of agents, we provide a utility function, `autogen.gather_usage_summary(agents)`, which takes a list of agents and returns the combined usage summary.\n",
    "\n",
    "## Caution when using Azure OpenAI!\n",
    "If you are using Azure OpenAI, the model returned from a completion doesn't include version information: it is either 'gpt-35-turbo' or 'gpt-4'. Costs are therefore estimated using the prices for gpt-3.5-turbo-0125 ((0.0005, 0.0015) per 1k prompt and completion tokens) and gpt-4-0613 ((0.03, 0.06)), so the reported cost can be wrong if you are using a different model version on Azure OpenAI.\n",
    "\n",
    "This will be improved in the future. The token count summary, however, is accurate, so you can use the token counts to calculate the cost yourself.\n",
    "\n",
    "## Requirements\n",
    "\n",
    "AutoGen requires `Python>=3.8`:\n",
    "```bash\n",
    "pip install \"pyautogen\"\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set your API Endpoint\n",
    "\n",
    "The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import autogen\n",
    "from autogen import AssistantAgent, OpenAIWrapper, UserProxyAgent, gather_usage_summary\n",
    "\n",
    "config_list = autogen.config_list_from_json(\n",
    "    \"OAI_CONFIG_LIST\",\n",
    "    filter_dict={\n",
    "        \"tags\": [\"gpt-3.5-turbo\", \"gpt-3.5-turbo-16k\"],  # comment out to get all\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It first looks for an environment variable named \"OAI_CONFIG_LIST\", which must be a valid JSON string. If that variable is not found, it looks for a JSON file named \"OAI_CONFIG_LIST\". It then filters the configs by tags (you can filter by other keys as well).\n",
    "\n",
    "The config list looks like the following:\n",
    "```python\n",
    "config_list = [\n",
    "    {\n",
    "        \"model\": \"gpt-3.5-turbo\",\n",
    "        \"api_key\": \"<your OpenAI API key>\",\n",
    "        \"tags\": [\"gpt-3.5-turbo\"],\n",
    "    },  # OpenAI API endpoint for gpt-3.5-turbo\n",
    "    {\n",
    "        \"model\": \"gpt-35-turbo-0613\",  # 0613 or newer is needed to use functions\n",
    "        \"base_url\": \"<your Azure OpenAI API base>\",\n",
    "        \"api_type\": \"azure\",\n",
    "        \"api_version\": \"2024-02-15-preview\",  # 2023-07-01-preview or newer is needed to use functions\n",
    "        \"api_key\": \"<your Azure OpenAI API key>\",\n",
    "        \"tags\": [\"gpt-3.5-turbo\", \"0613\"],\n",
    "    }\n",
    "]\n",
    "```\n",
    "\n",
    "You can set the value of config_list in any way you prefer. Please refer to this [notebook](https://github.com/microsoft/autogen/blob/main/website/docs/topics/llm_configuration.ipynb) for full code examples of the different methods."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## OpenAIWrapper with cost estimation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.00020600000000000002\n"
     ]
    }
   ],
   "source": [
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
    "]\n",
    "response = client.create(messages=messages, cache_seed=None)\n",
    "print(response.cost)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Summary for OpenAIWrapper\n",
    "\n",
    "When creating an instance of `OpenAIWrapper`, the cost of all completions from that instance is recorded. Call `print_usage_summary()` to check your usage summary, and `clear_usage_summary()` to reset it.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "client = OpenAIWrapper(config_list=config_list)\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Can you give me 3 useful tips on learning Python? Keep it simple and short.\"},\n",
    "]\n",
    "client.print_usage_summary()  # print usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00023\n",
      "* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00023\n",
      "* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
      "----------------------------------------------------------------------------------------------------\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00023\n",
      "* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# The first creation.\n",
    "# By default, cache_seed is set to 41 and caching is enabled. If you don't want to use the cache, set cache_seed to None.\n",
    "response = client.create(messages=messages, cache_seed=41)\n",
    "client.print_usage_summary()  # defaults to [\"actual\", \"total\"]\n",
    "client.print_usage_summary(mode=\"actual\")  # print actual usage summary\n",
    "client.print_usage_summary(mode=\"total\")  # print total usage summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'total_cost': 0.0002255, 'gpt-35-turbo': {'cost': 0.0002255, 'prompt_tokens': 25, 'completion_tokens': 142, 'total_tokens': 167}}\n",
      "{'total_cost': 0.0002255, 'gpt-35-turbo': {'cost': 0.0002255, 'prompt_tokens': 25, 'completion_tokens': 142, 'total_tokens': 167}}\n"
     ]
    }
   ],
   "source": [
    "# Retrieve the recorded usage summaries as dicts.\n",
    "print(client.actual_usage_summary)\n",
    "print(client.total_usage_summary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00023\n",
      "* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00045\n",
      "* Model 'gpt-35-turbo': cost: 0.00045, prompt_tokens: 50, completion_tokens: 284, total_tokens: 334\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# Since the cache is enabled, the same completion will be returned from the cache, which will not incur any actual cost.\n",
    "# So the actual cost doesn't change, but the total cost doubles.\n",
    "response = client.create(messages=messages, cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No usage summary. Please call \"create\" first.\n"
     ]
    }
   ],
   "source": [
    "# Clear the usage summary.\n",
    "client.clear_usage_summary()\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------------------------------------------------------------------\n",
      "No actual cost incurred (all completions are using cache).\n",
      "\n",
      "Usage summary including cached usage: \n",
      "Total cost: 0.00023\n",
      "* Model 'gpt-35-turbo': cost: 0.00023, prompt_tokens: 25, completion_tokens: 142, total_tokens: 167\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "# All completions are returned from the cache, so no actual cost is incurred.\n",
    "response = client.create(messages=messages, cache_seed=41)\n",
    "client.print_usage_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage Summary for Agents\n",
    "\n",
    "- `Agent.print_usage_summary()` will print the cost summary for the agent.\n",
    "- `Agent.get_actual_usage()` and `Agent.get_total_usage()` will return the usage summary as a dict. If an agent doesn't use an LLM, they return `None`.\n",
    "- `Agent.reset()` will reset the usage summary.\n",
    "- `autogen.gather_usage_summary` will gather the usage summary for a list of agents, as in the sketch below.\n",
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mai_user\u001b[0m (to assistant):\n",
      "\n",
      "$x^3=125$. What is x?\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33massistant\u001b[0m (to ai_user):\n",
      "\n",
      "To find x, we need to take the cube root of 125. The cube root of a number is the number that, when multiplied by itself three times, gives the original number.\n",
      "\n",
      "In this case, the cube root of 125 is 5 since 5 * 5 * 5 = 125. Therefore, x = 5.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mai_user\u001b[0m (to assistant):\n",
      "\n",
      "That's correct! Well done. The value of x is indeed 5, as you correctly found by taking the cube root of 125. Keep up the good work!\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33massistant\u001b[0m (to ai_user):\n",
      "\n",
      "Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "ChatResult(chat_id=None, chat_history=[{'content': '$x^3=125$. What is x?', 'role': 'assistant'}, {'content': 'To find x, we need to take the cube root of 125. The cube root of a number is the number that, when multiplied by itself three times, gives the original number.\\n\\nIn this case, the cube root of 125 is 5 since 5 * 5 * 5 = 125. Therefore, x = 5.', 'role': 'user'}, {'content': \"That's correct! Well done. The value of x is indeed 5, as you correctly found by taking the cube root of 125. Keep up the good work!\", 'role': 'assistant'}, {'content': \"Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\", 'role': 'user'}], summary=\"Thank you! I'm glad I could help. If you have any more questions, feel free to ask!\", cost={'usage_including_cached_inference': {'total_cost': 0.000333, 'gpt-35-turbo': {'cost': 0.000333, 'prompt_tokens': 282, 'completion_tokens': 128, 'total_tokens': 410}}, 'usage_excluding_cached_inference': {'total_cost': 0.000333, 'gpt-35-turbo': {'cost': 0.000333, 'prompt_tokens': 282, 'completion_tokens': 128, 'total_tokens': 410}}}, human_input=[])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "assistant = AssistantAgent(\n",
    "    \"assistant\",\n",
    "    system_message=\"You are a helpful assistant.\",\n",
    "    llm_config={\n",
    "        \"timeout\": 600,\n",
    "        \"cache_seed\": None,\n",
    "        \"config_list\": config_list,\n",
    "    },\n",
    ")\n",
    "\n",
    "ai_user_proxy = UserProxyAgent(\n",
    "    name=\"ai_user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=1,\n",
    "    code_execution_config=False,\n",
    "    llm_config={\n",
    "        \"config_list\": config_list,\n",
    "    },\n",
    "    # In the system message the \"user\" always refers to the other agent.\n",
    "    system_message=\"You ask a user for help. You check the answer from the user and provide feedback.\",\n",
    ")\n",
    "assistant.reset()\n",
    "\n",
    "math_problem = \"$x^3=125$. What is x?\"\n",
    "ai_user_proxy.initiate_chat(\n",
    "    assistant,\n",
    "    message=math_problem,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent 'ai_user':\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00011\n",
      "* Model 'gpt-35-turbo': cost: 0.00011, prompt_tokens: 114, completion_tokens: 35, total_tokens: 149\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n",
      "\n",
      "Agent 'assistant':\n",
      "----------------------------------------------------------------------------------------------------\n",
      "Usage summary excluding cached usage: \n",
      "Total cost: 0.00022\n",
      "* Model 'gpt-35-turbo': cost: 0.00022, prompt_tokens: 168, completion_tokens: 93, total_tokens: 261\n",
      "\n",
      "All completions are non-cached: the total cost with cached completions is the same as actual cost.\n",
      "----------------------------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "ai_user_proxy.print_usage_summary()\n",
    "print()\n",
    "assistant.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "No cost incurred from agent 'user'.\n"
     ]
    }
   ],
   "source": [
    "user_proxy = UserProxyAgent(\n",
    "    name=\"user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=2,\n",
    "    code_execution_config=False,\n",
    "    default_auto_reply=\"That's all. Thank you.\",\n",
    ")\n",
    "user_proxy.print_usage_summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Actual usage summary for assistant (excluding completion from cache): {'total_cost': 0.0002235, 'gpt-35-turbo': {'cost': 0.0002235, 'prompt_tokens': 168, 'completion_tokens': 93, 'total_tokens': 261}}\n",
      "Total usage summary for assistant (including completion from cache): {'total_cost': 0.0002235, 'gpt-35-turbo': {'cost': 0.0002235, 'prompt_tokens': 168, 'completion_tokens': 93, 'total_tokens': 261}}\n",
      "Actual usage summary for ai_user_proxy: {'total_cost': 0.0001095, 'gpt-35-turbo': {'cost': 0.0001095, 'prompt_tokens': 114, 'completion_tokens': 35, 'total_tokens': 149}}\n",
      "Total usage summary for ai_user_proxy: {'total_cost': 0.0001095, 'gpt-35-turbo': {'cost': 0.0001095, 'prompt_tokens': 114, 'completion_tokens': 35, 'total_tokens': 149}}\n",
      "Actual usage summary for user_proxy: None\n",
      "Total usage summary for user_proxy: None\n"
     ]
    }
   ],
   "source": [
    "print(\"Actual usage summary for assistant (excluding completion from cache):\", assistant.get_actual_usage())\n",
    "print(\"Total usage summary for assistant (including completion from cache):\", assistant.get_total_usage())\n",
    "\n",
    "print(\"Actual usage summary for ai_user_proxy:\", ai_user_proxy.get_actual_usage())\n",
    "print(\"Total usage summary for ai_user_proxy:\", ai_user_proxy.get_total_usage())\n",
    "\n",
    "print(\"Actual usage summary for user_proxy:\", user_proxy.get_actual_usage())\n",
    "print(\"Total usage summary for user_proxy:\", user_proxy.get_total_usage())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'total_cost': 0.000333,\n",
       " 'gpt-35-turbo': {'cost': 0.000333,\n",
       "  'prompt_tokens': 282,\n",
       "  'completion_tokens': 128,\n",
       "  'total_tokens': 410}}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usage_summary = gather_usage_summary([assistant, ai_user_proxy, user_proxy])\n",
    "usage_summary[\"usage_including_cached_inference\"]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "msft",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}