Together AI Client (#2919)

* First pass together.ai client class * Config handling, models and cost * Added tests, moved param management to create function * Tests, parameter, validation, logging updates * Added use of client_utils PR 2949 * Updated to return OAI response * Notebook example * Improved function calling, updated tests, updated notebook with Chess example * Tidied up together client class, better parameter handling, simpler exception capture, warning for no cost, reuse in tests, cleaner tests * Update of documentation notebook, replacement of old version * Fix of messages parameter for hide_tools function call * Update autogen/oai/together.py Co-authored-by: Qingyun Wu <qingyun0327@gmail.com> * Update together.py to fix text --------- Co-authored-by: Qingyun Wu <qingyun0327@gmail.com> Co-authored-by: Yiran Wu <32823396+yiranwu0@users.noreply.github.com> Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2024-06-22 03:14:44 +10:00 · 2024-06-22 03:14:44 +10:00 · b1ec3ae545
parent 843c343383
commit b1ec3ae545
10 changed files with 1701 additions and 185 deletions
--- a/.github/workflows/contrib-tests.yml
+++ b/.github/workflows/contrib-tests.yml
@ -558,3 +558,43 @@ jobs:
        with:
          file: ./coverage.xml
          flags: unittests
+
+  TogetherTest:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-2019]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
+        exclude:
+          - os: macos-latest
+            python-version: "3.9"
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          lfs: true
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install packages and dependencies for all tests
+        run: |
+          python -m pip install --upgrade pip wheel
+          pip install pytest-cov>=5
+      - name: Install packages and dependencies for Together
+        run: |
+          pip install -e .[together,test]
+      - name: Set AUTOGEN_USE_DOCKER based on OS
+        shell: bash
+        run: |
+          if [[ ${{ matrix.os }} != ubuntu-latest ]]; then
+            echo "AUTOGEN_USE_DOCKER=False" >> $GITHUB_ENV
+          fi
+      - name: Coverage
+        run: |
+          pytest test/oai/test_together.py --skip-openai
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v3
+        with:
+          file: ./coverage.xml
+          flags: unittests
--- a/autogen/logger/file_logger.py
+++ b/autogen/logger/file_logger.py
@ -20,6 +20,7 @@ if TYPE_CHECKING:
    from autogen.oai.anthropic import AnthropicClient
    from autogen.oai.gemini import GeminiClient
    from autogen.oai.mistral import MistralAIClient
+    from autogen.oai.together import TogetherClient

 logger = logging.getLogger(__name__)

@ -203,7 +204,7 @@ class FileLogger(BaseLogger):

    def log_new_client(
        self,
-        client: AzureOpenAI | OpenAI | GeminiClient | AnthropicClient | MistralAIClient,
+        client: AzureOpenAI | OpenAI | GeminiClient | AnthropicClient | MistralAIClient | TogetherClient,
        wrapper: OpenAIWrapper,
        init_args: Dict[str, Any],
    ) -> None:
--- a/autogen/logger/sqlite_logger.py
+++ b/autogen/logger/sqlite_logger.py
@ -21,6 +21,7 @@ if TYPE_CHECKING:
    from autogen.oai.anthropic import AnthropicClient
    from autogen.oai.gemini import GeminiClient
    from autogen.oai.mistral import MistralAIClient
+    from autogen.oai.together import TogetherClient

 logger = logging.getLogger(__name__)
 lock = threading.Lock()
@ -390,7 +391,7 @@ class SqliteLogger(BaseLogger):

    def log_new_client(
        self,
-        client: Union[AzureOpenAI, OpenAI, GeminiClient, AnthropicClient, MistralAIClient],
+        client: Union[AzureOpenAI, OpenAI, GeminiClient, AnthropicClient, MistralAIClient, TogetherClient],
        wrapper: OpenAIWrapper,
        init_args: Dict[str, Any],
    ) -> None:
--- a/autogen/oai/client.py
+++ b/autogen/oai/client.py
@ -63,6 +63,13 @@ try:
 except ImportError as e:
    mistral_import_exception = e

+try:
+    from autogen.oai.together import TogetherClient
+
+    together_import_exception: Optional[ImportError] = None
+except ImportError as e:
+    together_import_exception = e
+
 logger = logging.getLogger(__name__)
 if not logger.handlers:
    # Add the console handler.
@ -473,6 +480,10 @@ class OpenAIWrapper:
                    raise ImportError("Please install `mistralai` to use the Mistral.AI API.")
                client = MistralAIClient(**openai_config)
                self._clients.append(client)
+            elif api_type is not None and api_type.startswith("together"):
+                if together_import_exception:
+                    raise ImportError("Please install `together` to use the Together.AI API.")
+                self._clients.append(TogetherClient(**config))
            else:
                client = OpenAI(**openai_config)
                self._clients.append(OpenAIClient(client))
--- a/autogen/oai/together.py
+++ b/autogen/oai/together.py
@ -0,0 +1,351 @@
+"""Create an OpenAI-compatible client using Together.AI's API.
+
+Example:
+    llm_config={
+        "config_list": [{
+            "api_type": "together",
+            "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+            "api_key": os.environ.get("TOGETHER_API_KEY")
+            }
+    ]}
+
+    agent = autogen.AssistantAgent("my_agent", llm_config=llm_config)
+
+Install Together.AI python library using: pip install --upgrade together
+
+Resources:
+- https://docs.together.ai/docs/inference-python
+"""
+
+from __future__ import annotations
+
+import base64
+import copy
+import os
+import random
+import re
+import time
+import warnings
+from io import BytesIO
+from typing import Any, Dict, List, Mapping, Tuple, Union
+
+import requests
+from openai.types.chat import ChatCompletion, ChatCompletionMessageToolCall
+from openai.types.chat.chat_completion import ChatCompletionMessage, Choice
+from openai.types.completion_usage import CompletionUsage
+from PIL import Image
+from together import Together, error
+
+from autogen.oai.client_utils import should_hide_tools, validate_parameter
+
+
+class TogetherClient:
+    """Client for Together.AI's API."""
+
+    def __init__(self, **kwargs):
+        """Requires api_key or environment variable to be set
+
+        Args:
+            api_key (str): The API key for using Together.AI (or environment variable TOGETHER_API_KEY needs to be set)
+        """
+        # Ensure we have the api_key upon instantiation
+        self.api_key = kwargs.get("api_key", None)
+        if not self.api_key:
+            self.api_key = os.getenv("TOGETHER_API_KEY")
+
+        assert (
+            self.api_key
+        ), "Please include the api_key in your config list entry for Together.AI or set the TOGETHER_API_KEY env variable."
+
+    def message_retrieval(self, response) -> List:
+        """
+        Retrieve and return a list of strings or a list of Choice.Message from the response.
+
+        NOTE: if a list of Choice.Message is returned, it currently needs to contain the fields of OpenAI's ChatCompletion Message object,
+        since that is expected for function or tool calling in the rest of the codebase at the moment, unless a custom agent is being used.
+        """
+        return [choice.message for choice in response.choices]
+
+    def cost(self, response) -> float:
+        return response.cost
+
+    @staticmethod
+    def get_usage(response) -> Dict:
+        """Return usage summary of the response using RESPONSE_USAGE_KEYS."""
+        # ...  # pragma: no cover
+        return {
+            "prompt_tokens": response.usage.prompt_tokens,
+            "completion_tokens": response.usage.completion_tokens,
+            "total_tokens": response.usage.total_tokens,
+            "cost": response.cost,
+            "model": response.model,
+        }
+
+    def parse_params(self, params: Dict[str, Any]) -> Dict[str, Any]:
+        """Loads the parameters for Together.AI API from the passed in parameters and returns a validated set. Checks types, ranges, and sets defaults"""
+        together_params = {}
+
+        # Check that we have what we need to use Together.AI's API
+        together_params["model"] = params.get("model", None)
+        assert together_params[
+            "model"
+        ], "Please specify the 'model' in your config list entry to nominate the Together.AI model to use."
+
+        # Validate allowed Together.AI parameters
+        # https://github.com/togethercomputer/together-python/blob/94ffb30daf0ac3e078be986af7228f85f79bde99/src/together/resources/completions.py#L44
+        together_params["max_tokens"] = validate_parameter(params, "max_tokens", int, True, 512, (0, None), None)
+        together_params["stream"] = validate_parameter(params, "stream", bool, False, False, None, None)
+        together_params["temperature"] = validate_parameter(params, "temperature", (int, float), True, None, None, None)
+        together_params["top_p"] = validate_parameter(params, "top_p", (int, float), True, None, None, None)
+        together_params["top_k"] = validate_parameter(params, "top_k", int, True, None, None, None)
+        together_params["repetition_penalty"] = validate_parameter(
+            params, "repetition_penalty", float, True, None, None, None
+        )
+        together_params["presence_penalty"] = validate_parameter(
+            params, "presence_penalty", (int, float), True, None, (-2, 2), None
+        )
+        together_params["frequency_penalty"] = validate_parameter(
+            params, "frequency_penalty", (int, float), True, None, (-2, 2), None
+        )
+        together_params["min_p"] = validate_parameter(params, "min_p", (int, float), True, None, (0, 1), None)
+        together_params["safety_model"] = validate_parameter(
+            params, "safety_model", str, True, None, None, None
+        )  # We won't enforce the available models as they are likely to change
+
+        # Check if they want to stream and use tools, which isn't currently supported (TODO)
+        if together_params["stream"] and "tools" in params:
+            warnings.warn(
+                "Streaming is not supported when using tools, streaming will be disabled.",
+                UserWarning,
+            )
+
+            together_params["stream"] = False
+
+        return together_params
+
+    def create(self, params: Dict) -> ChatCompletion:
+
+        messages = params.get("messages", [])
+
+        # Convert AutoGen messages to Together.AI messages
+        together_messages = oai_messages_to_together_messages(messages)
+
+        # Parse parameters to Together.AI API's parameters
+        together_params = self.parse_params(params)
+
+        # Add tools to the call if we have them and aren't hiding them
+        if "tools" in params:
+            hide_tools = validate_parameter(
+                params, "hide_tools", str, False, "never", None, ["if_all_run", "if_any_run", "never"]
+            )
+            if not should_hide_tools(together_messages, params["tools"], hide_tools):
+                together_params["tools"] = params["tools"]
+
+        together_params["messages"] = together_messages
+
+        # We use chat model by default
+        client = Together(api_key=self.api_key)
+
+        # Token counts will be returned
+        prompt_tokens = 0
+        completion_tokens = 0
+        total_tokens = 0
+
+        max_retries = 5
+        for attempt in range(max_retries):
+            ans = None
+            try:
+                response = client.chat.completions.create(**together_params)
+            except Exception as e:
+                raise RuntimeError(f"Together.AI exception occurred: {e}")
+            else:
+
+                if together_params["stream"]:
+                    # Read in the chunks as they stream
+                    ans = ""
+                    for chunk in response:
+                        ans = ans + (chunk.choices[0].delta.content or "")
+
+                    prompt_tokens = chunk.usage.prompt_tokens
+                    completion_tokens = chunk.usage.completion_tokens
+                    total_tokens = chunk.usage.total_tokens
+                else:
+                    ans: str = response.choices[0].message.content
+
+                    prompt_tokens = response.usage.prompt_tokens
+                    completion_tokens = response.usage.completion_tokens
+                    total_tokens = response.usage.total_tokens
+                break
+
+        if response is not None:
+            # If we have tool calls as the response, populate completed tool calls for our return OAI response
+            if response.choices[0].finish_reason == "tool_calls":
+                together_finish = "tool_calls"
+                tool_calls = []
+                for tool_call in response.choices[0].message.tool_calls:
+                    tool_calls.append(
+                        ChatCompletionMessageToolCall(
+                            id=tool_call.id,
+                            function={"name": tool_call.function.name, "arguments": tool_call.function.arguments},
+                            type="function",
+                        )
+                    )
+            else:
+                together_finish = "stop"
+                tool_calls = None
+
+        else:
+            raise RuntimeError(f"Failed to get response from Together.AI after retrying {attempt + 1} times.")
+
+        # 3. convert output
+        message = ChatCompletionMessage(
+            role="assistant",
+            content=response.choices[0].message.content,
+            function_call=None,
+            tool_calls=tool_calls,
+        )
+        choices = [Choice(finish_reason=together_finish, index=0, message=message)]
+
+        response_oai = ChatCompletion(
+            id=response.id,
+            model=together_params["model"],
+            created=int(time.time() * 1000),
+            object="chat.completion",
+            choices=choices,
+            usage=CompletionUsage(
+                prompt_tokens=prompt_tokens,
+                completion_tokens=completion_tokens,
+                total_tokens=total_tokens,
+            ),
+            cost=calculate_together_cost(prompt_tokens, completion_tokens, together_params["model"]),
+        )
+
+        return response_oai
+
+
+def oai_messages_to_together_messages(messages: list[Dict[str, Any]]) -> list[dict[str, Any]]:
+    """Convert messages from OAI format to Together.AI format.
+    We correct for any specific role orders and types.
+    """
+
+    together_messages = copy.deepcopy(messages)
+
+    # If we have a message with role='tool', which occurs when a function is executed, change it to 'user'
+    for msg in together_messages:
+        if "role" in msg and msg["role"] == "tool":
+            msg["role"] = "user"
+
+    return together_messages
+
+
+# MODELS AND COSTS
+chat_lang_code_model_sizes = {
+    "zero-one-ai/Yi-34B-Chat": 34,
+    "allenai/OLMo-7B-Instruct": 7,
+    "allenai/OLMo-7B-Twin-2T": 7,
+    "allenai/OLMo-7B": 7,
+    "Austism/chronos-hermes-13b": 13,
+    "deepseek-ai/deepseek-coder-33b-instruct": 33,
+    "deepseek-ai/deepseek-llm-67b-chat": 67,
+    "garage-bAInd/Platypus2-70B-instruct": 70,
+    "google/gemma-2b-it": 2,
+    "google/gemma-7b-it": 7,
+    "Gryphe/MythoMax-L2-13b": 13,
+    "lmsys/vicuna-13b-v1.5": 13,
+    "lmsys/vicuna-7b-v1.5": 7,
+    "codellama/CodeLlama-13b-Instruct-hf": 13,
+    "codellama/CodeLlama-34b-Instruct-hf": 34,
+    "codellama/CodeLlama-70b-Instruct-hf": 70,
+    "codellama/CodeLlama-7b-Instruct-hf": 7,
+    "meta-llama/Llama-2-70b-chat-hf": 70,
+    "meta-llama/Llama-2-13b-chat-hf": 13,
+    "meta-llama/Llama-2-7b-chat-hf": 7,
+    "meta-llama/Llama-3-8b-chat-hf": 8,
+    "meta-llama/Llama-3-70b-chat-hf": 70,
+    "mistralai/Mistral-7B-Instruct-v0.1": 7,
+    "mistralai/Mistral-7B-Instruct-v0.2": 7,
+    "mistralai/Mistral-7B-Instruct-v0.3": 7,
+    "NousResearch/Nous-Capybara-7B-V1p9": 7,
+    "NousResearch/Nous-Hermes-llama-2-7b": 7,
+    "NousResearch/Nous-Hermes-Llama2-13b": 13,
+    "NousResearch/Nous-Hermes-2-Yi-34B": 34,
+    "openchat/openchat-3.5-1210": 7,
+    "Open-Orca/Mistral-7B-OpenOrca": 7,
+    "Qwen/Qwen1.5-0.5B-Chat": 0.5,
+    "Qwen/Qwen1.5-1.8B-Chat": 1.8,
+    "Qwen/Qwen1.5-4B-Chat": 4,
+    "Qwen/Qwen1.5-7B-Chat": 7,
+    "Qwen/Qwen1.5-14B-Chat": 14,
+    "Qwen/Qwen1.5-32B-Chat": 32,
+    "Qwen/Qwen1.5-72B-Chat": 72,
+    "Qwen/Qwen1.5-110B-Chat": 110,
+    "Qwen/Qwen2-72B-Instruct": 72,
+    "snorkelai/Snorkel-Mistral-PairRM-DPO": 7,
+    "togethercomputer/alpaca-7b": 7,
+    "teknium/OpenHermes-2-Mistral-7B": 7,
+    "teknium/OpenHermes-2p5-Mistral-7B": 7,
+    "togethercomputer/Llama-2-7B-32K-Instruct": 7,
+    "togethercomputer/RedPajama-INCITE-Chat-3B-v1": 3,
+    "togethercomputer/RedPajama-INCITE-7B-Chat": 7,
+    "togethercomputer/StripedHyena-Nous-7B": 7,
+    "Undi95/ReMM-SLERP-L2-13B": 13,
+    "Undi95/Toppy-M-7B": 7,
+    "WizardLM/WizardLM-13B-V1.2": 13,
+    "upstage/SOLAR-10.7B-Instruct-v1.0": 11,
+}
+
+# Cost per million tokens based on up to X Billion parameters, e.g. up 4B is $0.1/million
+chat_lang_code_model_costs = {4: 0.1, 8: 0.2, 21: 0.3, 41: 0.8, 80: 0.9, 110: 1.8}
+
+mixture_model_sizes = {
+    "cognitivecomputations/dolphin-2.5-mixtral-8x7b": 56,
+    "databricks/dbrx-instruct": 132,
+    "mistralai/Mixtral-8x7B-Instruct-v0.1": 47,
+    "mistralai/Mixtral-8x22B-Instruct-v0.1": 141,
+    "NousResearch/Nous-Hermes-2-Mistral-7B-DPO": 7,
+    "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO": 47,
+    "NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT": 47,
+    "Snowflake/snowflake-arctic-instruct": 480,
+}
+
+# Cost per million tokens based on up to X Billion parameters, e.g. up 56B is $0.6/million
+mixture_costs = {56: 0.6, 176: 1.2, 480: 2.4}
+
+
+def calculate_together_cost(input_tokens: int, output_tokens: int, model_name: str) -> float:
+    """Cost calculation for inference"""
+
+    if model_name in chat_lang_code_model_sizes or model_name in mixture_model_sizes:
+        cost_per_mil = 0
+
+        # Chat, Language, Code models
+        if model_name in chat_lang_code_model_sizes:
+            size_in_b = chat_lang_code_model_sizes[model_name]
+
+            for top_size in chat_lang_code_model_costs.keys():
+                if size_in_b <= top_size:
+                    cost_per_mil = chat_lang_code_model_costs[top_size]
+                    break
+
+        else:
+            # Mixture-of-experts
+            size_in_b = mixture_model_sizes[model_name]
+
+            for top_size in mixture_costs.keys():
+                if size_in_b <= top_size:
+                    cost_per_mil = mixture_costs[top_size]
+                    break
+
+        if cost_per_mil == 0:
+            warnings.warn("Model size doesn't align with cost structure.", UserWarning)
+
+        return cost_per_mil * ((input_tokens + output_tokens) / 1e6)
+
+    else:
+        # Model is not in our list of models, can't determine the cost
+        warnings.warn(
+            "The model isn't catered for costing, to apply costs you can use the 'price' key on your config_list.",
+            UserWarning,
+        )
+
+        return 0
--- a/autogen/runtime_logging.py
+++ b/autogen/runtime_logging.py
@ -16,6 +16,7 @@ if TYPE_CHECKING:
    from autogen.oai.anthropic import AnthropicClient
    from autogen.oai.gemini import GeminiClient
    from autogen.oai.mistral import MistralAIClient
+    from autogen.oai.together import TogetherClient

 logger = logging.getLogger(__name__)

@ -109,7 +110,7 @@ def log_new_wrapper(wrapper: OpenAIWrapper, init_args: Dict[str, Union[LLMConfig


 def log_new_client(
-    client: Union[AzureOpenAI, OpenAI, GeminiClient, AnthropicClient, MistralAIClient],
+    client: Union[AzureOpenAI, OpenAI, GeminiClient, AnthropicClient, MistralAIClient, TogetherClient],
    wrapper: OpenAIWrapper,
    init_args: Dict[str, Any],
 ) -> None:
--- a/setup.py
+++ b/setup.py
@ -81,6 +81,7 @@ extra_require = {
    "lmm": ["replicate", "pillow"],
    "graph": ["networkx", "matplotlib"],
    "gemini": ["google-generativeai>=0.5,<1", "google-cloud-aiplatform", "google-auth", "pillow", "pydantic"],
+    "together": ["together>=1.2"],
    "websurfer": ["beautifulsoup4", "markdownify", "pdfminer.six", "pathvalidate"],
    "redis": ["redis"],
    "cosmosdb": ["azure-cosmos>=4.2.0"],
--- a/test/oai/test_together.py
+++ b/test/oai/test_together.py
@ -0,0 +1,264 @@
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+try:
+    from openai.types.chat.chat_completion import ChatCompletionMessage, Choice
+
+    from autogen.oai.together import TogetherClient, calculate_together_cost
+
+    skip = False
+except ImportError:
+    TogetherClient = object
+    InternalServerError = object
+    skip = True
+
+
+# Fixtures for mock data
+@pytest.fixture
+def mock_response():
+    class MockResponse:
+        def __init__(self, text, choices, usage, cost, model):
+            self.text = text
+            self.choices = choices
+            self.usage = usage
+            self.cost = cost
+            self.model = model
+
+    return MockResponse
+
+
+@pytest.fixture
+def together_client():
+    return TogetherClient(api_key="fake_api_key")
+
+
+# Test initialization and configuration
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+def test_initialization():
+
+    # Missing any api_key
+    with pytest.raises(AssertionError) as assertinfo:
+        TogetherClient()  # Should raise an AssertionError due to missing api_key
+
+    assert (
+        "Please include the api_key in your config list entry for Together.AI or set the TOGETHER_API_KEY env variable."
+        in str(assertinfo.value)
+    )
+
+    # Creation works
+    TogetherClient(api_key="fake_api_key")  # Should create okay now.
+
+
+# Test standard initialization
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+def test_valid_initialization(together_client):
+    assert together_client.api_key == "fake_api_key", "Config api_key should be correctly set"
+
+
+# Test parameters
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+def test_parsing_params(together_client):
+    # All parameters
+    params = {
+        "model": "Qwen/Qwen2-72B-Instruct",
+        "max_tokens": 1000,
+        "stream": False,
+        "temperature": 1,
+        "top_p": 0.8,
+        "top_k": 50,
+        "repetition_penalty": 0.5,
+        "presence_penalty": 1.5,
+        "frequency_penalty": 1.5,
+        "min_p": 0.2,
+        "safety_model": "Meta-Llama/Llama-Guard-7b",
+    }
+    expected_params = {
+        "model": "Qwen/Qwen2-72B-Instruct",
+        "max_tokens": 1000,
+        "stream": False,
+        "temperature": 1,
+        "top_p": 0.8,
+        "top_k": 50,
+        "repetition_penalty": 0.5,
+        "presence_penalty": 1.5,
+        "frequency_penalty": 1.5,
+        "min_p": 0.2,
+        "safety_model": "Meta-Llama/Llama-Guard-7b",
+    }
+    result = together_client.parse_params(params)
+    assert result == expected_params
+
+    # Only model, others set as defaults
+    params = {
+        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+    }
+    expected_params = {
+        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+        "max_tokens": 512,
+        "stream": False,
+        "temperature": None,
+        "top_p": None,
+        "top_k": None,
+        "repetition_penalty": None,
+        "presence_penalty": None,
+        "frequency_penalty": None,
+        "min_p": None,
+        "safety_model": None,
+    }
+    result = together_client.parse_params(params)
+    assert result == expected_params
+
+    # Incorrect types, defaults should be set, will show warnings but not trigger assertions
+    params = {
+        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+        "max_tokens": "512",
+        "stream": "Yes",
+        "temperature": "0.5",
+        "top_p": "0.8",
+        "top_k": "50",
+        "repetition_penalty": "0.5",
+        "presence_penalty": "1.5",
+        "frequency_penalty": "1.5",
+        "min_p": "0.2",
+        "safety_model": False,
+    }
+    result = together_client.parse_params(params)
+    assert result == expected_params
+
+    # Values outside bounds, should warn and set to defaults
+    params = {
+        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+        "max_tokens": -200,
+        "presence_penalty": -5,
+        "frequency_penalty": 5,
+        "min_p": -0.5,
+    }
+    result = together_client.parse_params(params)
+    assert result == expected_params
+
+
+# Test cost calculation
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+def test_cost_calculation(mock_response):
+    response = mock_response(
+        text="Example response",
+        choices=[{"message": "Test message 1"}],
+        usage={"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
+        cost=None,
+        model="mistralai/Mixtral-8x22B-Instruct-v0.1",
+    )
+    assert (
+        calculate_together_cost(response.usage["prompt_tokens"], response.usage["completion_tokens"], response.model)
+        == 0.000018
+    ), "Cost for this should be $0.000018"
+
+
+# Test text generation
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+@patch("autogen.oai.together.TogetherClient.create")
+def test_create_response(mock_create, together_client):
+    # Mock TogetherClient.chat response
+    mock_together_response = MagicMock()
+    mock_together_response.choices = [
+        MagicMock(finish_reason="stop", message=MagicMock(content="Example Llama response", tool_calls=None))
+    ]
+    mock_together_response.id = "mock_together_response_id"
+    mock_together_response.model = "meta-llama/Llama-3-8b-chat-hf"
+    mock_together_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20)  # Example token usage
+
+    mock_create.return_value = mock_together_response
+
+    # Test parameters
+    params = {
+        "messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "World"}],
+        "model": "meta-llama/Llama-3-8b-chat-hf",
+    }
+
+    # Call the create method
+    response = together_client.create(params)
+
+    # Assertions to check if response is structured as expected
+    assert (
+        response.choices[0].message.content == "Example Llama response"
+    ), "Response content should match expected output"
+    assert response.id == "mock_together_response_id", "Response ID should match the mocked response ID"
+    assert response.model == "meta-llama/Llama-3-8b-chat-hf", "Response model should match the mocked response model"
+    assert response.usage.prompt_tokens == 10, "Response prompt tokens should match the mocked response usage"
+    assert response.usage.completion_tokens == 20, "Response completion tokens should match the mocked response usage"
+
+
+# Test functions/tools
+@pytest.mark.skipif(skip, reason="Together.AI dependency is not installed")
+@patch("autogen.oai.together.TogetherClient.create")
+def test_create_response_with_tool_call(mock_create, together_client):
+
+    # Define the mock response directly within the patch
+    mock_function = MagicMock(name="currency_calculator")
+    mock_function.name = "currency_calculator"
+    mock_function.arguments = '{"base_currency": "EUR", "quote_currency": "USD", "base_amount": 123.45}'
+
+    # Define the mock response directly within the patch
+    mock_create.return_value = MagicMock(
+        choices=[
+            MagicMock(
+                finish_reason="tool_calls",
+                message=MagicMock(
+                    content="",  # Message is empty for tool responses
+                    tool_calls=[MagicMock(id="gdRdrvnHh", function=mock_function)],
+                ),
+            )
+        ],
+        id="mock_together_response_id",
+        model="meta-llama/Llama-3-8b-chat-hf",
+        usage=MagicMock(prompt_tokens=10, completion_tokens=20),
+    )
+
+    # Test parameters
+    converted_functions = [
+        {
+            "type": "function",
+            "function": {
+                "description": "Currency exchange calculator.",
+                "name": "currency_calculator",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "base_amount": {"type": "number", "description": "Amount of currency in base_currency"},
+                        "base_currency": {
+                            "enum": ["USD", "EUR"],
+                            "type": "string",
+                            "default": "USD",
+                            "description": "Base currency",
+                        },
+                        "quote_currency": {
+                            "enum": ["USD", "EUR"],
+                            "type": "string",
+                            "default": "EUR",
+                            "description": "Quote currency",
+                        },
+                    },
+                    "required": ["base_amount"],
+                },
+            },
+        }
+    ]
+
+    together_messages = [
+        {
+            "role": "user",
+            "content": "How much is 123.45 EUR in USD?",
+            "name": None,
+            "tool_calls": None,
+            "tool_call_id": None,
+        },
+    ]
+
+    # Call the create method (which is now mocked)
+    response = together_client.create(
+        {"messages": together_messages, "tools": converted_functions, "model": "meta-llama/Llama-3-8b-chat-hf"}
+    )
+
+    # Assertions to check if response is structured as expected
+    assert response.choices[0].message.content == ""
+    assert response.choices[0].message.tool_calls[0].function.name == "currency_calculator"
--- a/website/docs/topics/non-openai-models/cloud-togetherai.ipynb
+++ b/website/docs/topics/non-openai-models/cloud-togetherai.ipynb
--- a/website/docs/topics/non-openai-models/cloud-togetherai.md
+++ b/website/docs/topics/non-openai-models/cloud-togetherai.md
@ -1,182 +0,0 @@
-# Together AI
-This cloud-based proxy server example, using [together.ai](https://www.together.ai/), is a group chat between a Python developer
-and a code reviewer, who are given a coding task.
-
-Start by [installing AutoGen](/docs/installation/) and getting your [together.ai API key](https://api.together.xyz/settings/profile).
-
-Put your together.ai API key in an environment variable, TOGETHER_API_KEY.
-
-Linux / Mac OSX:
-
-```bash
-export TOGETHER_API_KEY=YourTogetherAIKeyHere
-```
-
-Windows (command prompt):
-
-```powershell
-set TOGETHER_API_KEY=YourTogetherAIKeyHere
-```
-
-Create your LLM configuration, with the [model you want](https://docs.together.ai/docs/inference-models).
-
-```python
-import os
-
-config_list = [
-    {
-        # Available together.ai model strings:
-        # https://docs.together.ai/docs/inference-models
-        "model": "mistralai/Mistral-7B-Instruct-v0.1",
-        "api_key": os.environ['TOGETHER_API_KEY'],
-        "base_url": "https://api.together.xyz/v1"
-    }
-]
-```
-
-## Construct Agents
-
-```python
-from pathlib import Path
-from autogen import AssistantAgent, UserProxyAgent
-from autogen.coding import LocalCommandLineCodeExecutor
-
-work_dir = Path("groupchat")
-work_dir.mkdir(exist_ok=True)
-
-# Create local command line code executor.
-code_executor = LocalCommandLineCodeExecutor(work_dir=work_dir)
-
-# User Proxy will execute code and finish the chat upon typing 'exit'
-user_proxy = UserProxyAgent(
-    name="UserProxy",
-    system_message="A human admin",
-    code_execution_config={
-        "last_n_messages": 2,
-        "executor": code_executor,
-    },
-    human_input_mode="TERMINATE",
-    is_termination_msg=lambda x: "TERMINATE" in x.get("content"),
-)
-
-# Python Coder agent
-coder = AssistantAgent(
-    name="softwareCoder",
-    description="Software Coder, writes Python code as required and reiterates with feedback from the Code Reviewer.",
-    system_message="You are a senior Python developer, a specialist in writing succinct Python functions.",
-    llm_config={"config_list": config_list},
-)
-
-# Code Reviewer agent
-reviewer = AssistantAgent(
-    name="codeReviewer",
-    description="Code Reviewer, reviews written code for correctness, efficiency, and security. Asks the Software Coder to address issues.",
-    system_message="You are a Code Reviewer, experienced in checking code for correctness, efficiency, and security. Review and provide feedback to the Software Coder until you are satisfied, then return the word TERMINATE",
-    is_termination_msg=lambda x: "TERMINATE" in x.get("content"),
-    llm_config={"config_list": config_list},
-)
-```
-
-## Establish the group chat
-
-```python
-from autogen import GroupChat, GroupChatManager
-
-# Establish the Group Chat and disallow a speaker being selected consecutively
-groupchat = GroupChat(agents=[user_proxy, coder, reviewer], messages=[], max_round=12, allow_repeat_speaker=False)
-
-# Manages the group of multiple agents
-manager = GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})
-```
-
-## Start Chat
-
-```python
-from autogen.cache import Cache
-
-# Cache LLM responses.
-with Cache.disk() as cache:
-    # Start the chat with a request to write a function
-    user_proxy.initiate_chat(
-        manager,
-        message="Write a Python function for the Fibonacci sequence, the function will have one parameter for the number in the sequence, which the function will return the Fibonacci number for.",
-        cache=cache,
-    )
-    # type exit to terminate the chat
-```
-
-Output:
-```` text
-UserProxy (to chat_manager):
-
-Write a Python function for the Fibonacci sequence, the function will have one parameter for the number in the sequence, which the function will return the Fibonacci number for.
-
--------------------------------------------------------------------------------
-softwareCoder (to chat_manager):
-
- Sure, here is a simple Python function that uses recursion to calculate the Fibonacci number:
-
-```python
-def fibonacci(n):
-    if n <= 0:
-        return "Input should be a positive integer."
-    elif n == 1:
-        return 0
-    elif n == 2:
-        return 1
-    else:
-        return fibonacci(n-1) + fibonacci(n-2)
-```
-
-This function takes an integer `n` as input and returns the `n`th number in the Fibonacci sequence. The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1.
-
-Note that this implementation uses recursion and may not be efficient for large values of `n`. In such cases, an iterative approach or memoization would be more appropriate.
-
--------------------------------------------------------------------------------
-codeReviewer (to chat_manager):
-
- I see a couple of issues with the current implementation of the `fibonacci` function:
-
-1. The function does not handle negative inputs correctly. Currently, it returns a string message for any input less than or equal to 0. It would be better to raise a `ValueError` or return a more informative message.
-2. The function uses recursion to calculate the Fibonacci number, which can be inefficient for large inputs. A more efficient approach would be to use an iterative approach or memoization.
-
-Here's an updated implementation that addresses these issues:
-
-```python
-def fibonacci(n):
-    if n <= 0:
-        raise ValueError("Input should be a positive integer.")
-    elif n == 1:
-        return 0
-    elif n == 2:
-        return 1
-    else:
-        a, b = 0, 1
-        for _ in range(n - 2):
-            a, b = b, a + b
-        return b
-```
-
-This implementation uses a loop to calculate the Fibonacci number iteratively, which is more efficient than the recursive approach. It also raises a `ValueError` for negative inputs, which is a more appropriate way to handle invalid inputs.
-
--------------------------------------------------------------------------------
-
->>>>>>>> USING AUTO REPLY...
-
->>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
-UserProxy (to chat_manager):
-
-exitcode: 0 (execution succeeded)
-Code output:
-
-
--------------------------------------------------------------------------------
-codeReviewer (to chat_manager):
-
- I'm glad the updated implementation addresses the issues with the original code. Let me know if you have any further questions or if there's anything else I can help you with.
-
-To terminate the conversation, please type "TERMINATE".
-
--------------------------------------------------------------------------------
-Please give feedback to chat_manager. Press enter or type 'exit' to stop the conversation: exit
-````