mirror of https://github.com/microsoft/autogen.git
coding agent; logging (#1011)
* coding agent
* tsp
* tsp
* aoai
* logging
* compact
* Handle Import Error
* cost function
* reset counter; doc
* reset_counter
* home page update
* use case
* catboost in linux
* catboost
* catboost
* catboost
* doc
* intro
* catboost
parent 39b9a9a417
commit 19aee67f55

README.md
@@ -18,13 +18,14 @@

## What is FLAML

FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting
models and hyperparameters for each model. It can also be used to tune generic hyperparameters for foundation models, MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.
FLAML is a lightweight Python library for efficient automation of machine
learning, including selection of
models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations).

1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
1. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
1. It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code).
* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).
@@ -58,6 +59,25 @@ Use the following guides to get started with FLAML in .NET:

## Quickstart

* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.

```python
from flaml import oai

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
```
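
Here, `tune_data` is a list of validation data instances and `eval_func` scores the generated responses. As an illustrative sketch (the metric and names below are an assumption for this example, not text from the commit), an evaluation function could look like:

```python
# hypothetical success metric: count a data instance as solved if any response is non-empty
def eval_func(responses, **data):
    success = any(r.strip() for r in responses)
    return {"success": success}
```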

The automated experimentation and optimization can help you maximize the utility out of these expensive models.
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.
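
For example, responses can be cached on disk so that repeated experiments do not re-issue identical requests; a minimal sketch using the `set_cache` method added in this commit (the seed and path shown are simply its defaults):

```python
from flaml import oai

# cached responses are keyed by the request content and this seed
oai.Completion.set_cache(seed=41, cache_path=".cache")
```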

* With three lines of code, you can start using this economical and fast
AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).

@@ -92,22 +112,6 @@ estimator = LGBMRegressor()
estimator.fit(X_train, y_train)
```

* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.

```python
from flaml import oai

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
```

## Documentation

You can find detailed documentation about FLAML [here](https://microsoft.github.io/FLAML/), including the API documentation, use cases and examples.
@@ -0,0 +1,43 @@
from collections import defaultdict


class Agent:
    """An abstract class for AI agents.
    An agent can communicate with other agents and humans, and perform actions.
    Different agents can differ in how and with whom they communicate, and what actions they can perform. For example, an autonomous agent can communicate with humans and other agents, and perform actions by creating agents and sending messages to other agents. A planning agent can communicate with other agents to make a plan and keep track of tasks. An execution agent can only communicate with other agents, and perform actions such as executing a command or code.
    """

    def __init__(self, name, system_message=""):
        # empty memory
        self._memory = []
        # a dictionary of conversations, default value is list
        self._conversations = defaultdict(list)
        self._name = name
        self._system_message = system_message

    @property
    def name(self):
        """Get the name of the agent."""
        return self._name

    def _remember(self, memory):
        """Remember something."""
        self._memory.append(memory)

    def _send(self, message, recipient):
        """Send a message to another agent."""
        self._conversations[recipient.name].append({"content": message, "role": "assistant"})
        recipient.receive(message, self)

    def _receive(self, message, sender):
        """Receive a message from another agent."""
        # print(self.name, "received message from", sender.name, ":", message)
        self._conversations[sender.name].append({"content": message, "role": "user"})

    def receive(self, message, sender):
        """Receive a message from another agent.
        This method is called by the sender.
        It needs to be overridden by the subclass to perform followup actions.
        """
        self._receive(message, sender)
        # perform actions based on the message
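
For illustration only (not part of this commit), a subclass overrides `receive` to react to incoming messages; a minimal sketch that uses nothing beyond the methods defined above:

```python
class EchoAgent(Agent):
    """Hypothetical example subclass: replies to every message by echoing it back."""

    def receive(self, message, sender):
        # record the incoming message in self._conversations
        super().receive(message, sender)
        # followup action: send a reply back to the sender
        self._send(f"echo: {message}", sender)
```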

@@ -0,0 +1,53 @@
from .agent import Agent
from .execution_agent import ExecutionAgent
from flaml.autogen.code_utils import generate_code, DEFAULT_MODEL
from flaml import oai


class PythonAgent(Agent):
    """Suggest code blocks."""

    DEFAULT_SYSTEM_MESSAGE = """You are a coding agent. You suggest python code for a user to execute for a given task. Don't suggest shell commands. Output the code in a code block. Check the execution result. If the result indicates there is an error, fix the error and output the code again.
    """

    DEFAULT_CONFIG = {
        "model": DEFAULT_MODEL,
    }
    EXECUTION_AGENT_PREFIX = "execution_agent4"
    SUCCESS_EXIT_CODE = "exitcode: 0\n"

    def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, work_dir=None, **config):
        super().__init__(name, system_message)
        self._work_dir = work_dir
        self._config = self.DEFAULT_CONFIG.copy()
        self._config.update(config)
        self._sender_dict = {}

    def receive(self, message, sender):
        if sender.name not in self._sender_dict:
            self._sender_dict[sender.name] = sender
            self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
        super().receive(message, sender)
        if sender.name.startswith(self.EXECUTION_AGENT_PREFIX) and message.startswith(self.SUCCESS_EXIT_CODE):
            # the code is correct, respond to the original sender
            name = sender.name[len(self.EXECUTION_AGENT_PREFIX) :]
            original_sender = self._sender_dict[name]
            output = message[len(self.SUCCESS_EXIT_CODE) :]
            if output:
                self._send(f"{output}", original_sender)
            else:
                self._send("Done. No output.", original_sender)
            return
        responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
        # cost = oai.ChatCompletion.cost(responses)
        response = oai.ChatCompletion.extract_text(responses)[0]
        if sender.name.startswith(self.EXECUTION_AGENT_PREFIX):
            execution_agent = sender
        else:
            # create an execution agent
            execution_agent = ExecutionAgent(f"{self.EXECUTION_AGENT_PREFIX}{sender.name}", work_dir=self._work_dir)
            # initialize the conversation
            self._conversations[execution_agent.name] = self._conversations[sender.name].copy()
            self._sender_dict[execution_agent.name] = execution_agent
        # send the response to the execution agent
        self._send(response, execution_agent)
@@ -0,0 +1,24 @@
from .agent import Agent
from flaml.autogen.code_utils import execute_code, extract_code


class ExecutionAgent(Agent):
    """Perform actions based on instructions from other agents.
    An execution agent can only communicate with other agents, and perform actions such as executing a command or code.
    """

    def __init__(self, name, system_message="", work_dir=None):
        super().__init__(name, system_message)
        self._work_dir = work_dir

    def receive(self, message, sender):
        super().receive(message, sender)
        # extract code
        code, lang = extract_code(message)
        if lang == "bash":
            assert code.startswith("python ")
            file_name = code[len("python ") :]
            exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir)
        else:
            exitcode, logs = execute_code(code, work_dir=self._work_dir)
        self._send(f"exitcode: {exitcode}\n{logs.decode('utf-8')}", sender)

@@ -6,10 +6,11 @@ import pathlib
from typing import List, Dict, Tuple, Optional, Union, Callable
import re
import time
from hashlib import md5
from flaml.autogen import oai, DEFAULT_MODEL, FAST_MODEL

# Regular expression for finding a code block
CODE_BLOCK_PATTERN = r"```\w*\n(.*?)\n```"
CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"
WORKING_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "extensions")


@@ -18,9 +19,9 @@ def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> str:
    match = re.search(pattern, text, flags=re.DOTALL)
    # If a match is found, return the code
    if match:
        return match.group(1)
        return match.group(2), match.group(1)
    # If no code block is found, return the whole text
    return text
    return text, "unknown"


def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]:
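
For illustration (not part of the diff), the updated pattern makes `extract_code` return a (code, language) pair instead of a bare string; a minimal usage sketch:

```python
from flaml.autogen.code_utils import extract_code

code, lang = extract_code("```python\nprint('hello')\n```")
# code == "print('hello')", lang == "python"
```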

@@ -36,7 +37,7 @@ def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]:
        float: The cost of the generation.
    """
    response = oai.Completion.create(**config)
    cost = oai.Completion.cost(config["model"], response)
    cost = oai.Completion.cost(response)
    return extract_code(oai.Completion.extract_text(response)[0], pattern), cost


@@ -58,7 +59,7 @@ def improve_function(file_name, func_name, objective, **config):
    response = oai.Completion.create(
        {"func_name": func_name, "objective": objective, "file_string": file_string}, **params
    )
    cost = oai.Completion.cost(params["model"], response)
    cost = oai.Completion.cost(response)
    return oai.Completion.extract_text(response)[0], cost


@@ -96,7 +97,7 @@ def improve_code(files, objective, suggest_only=True, **config):
    params = {**_IMPROVE_CODE_CONFIG, **config}
    followup = "" if suggest_only else " followed by the improved code"
    response = oai.Completion.create({"objective": objective, "code": code, "followup": followup}, **params)
    cost = oai.Completion.cost(params["model"], response)
    cost = oai.Completion.cost(response)
    return oai.Completion.extract_text(response)[0], cost


@@ -141,7 +142,7 @@ def execute_code(
    original_filename = filename
    if filename is None:
        code_hash = hash(code)
        code_hash = md5(code.encode()).hexdigest()
        # create a file with an automatically generated name
        filename = f"tmp_code_{code_hash}.py"
    if work_dir is None:

@@ -151,7 +152,6 @@
    os.makedirs(file_dir, exist_ok=True)

    if code is not None:
        code = code.strip()
        with open(filepath, "w") as fout:
            fout.write(code)
    # check if already running in a docker container

@@ -281,7 +281,7 @@ def generate_assertions(definition: str, **config) -> Tuple[str, float]:
        {"definition": definition},
        **params,
    )
    cost = oai.Completion.cost(params["model"], response)
    cost = oai.Completion.cost(response)
    assertions = oai.Completion.extract_text(response)[0]
    return assertions, cost


@@ -410,7 +410,7 @@ def implement(
    assertions, cost = assertions(definition)
    for i, config in enumerate(configs):
        response = oai.Completion.create({"definition": definition}, **config)
        cost += oai.Completion.cost(config["model"], response)
        cost += oai.Completion.cost(response)
        responses = oai.Completion.extract_text(response)
        metrics = eval_function_completions(responses, definition, assertions=assertions)
        assertions = metrics["assertions"]


@@ -20,7 +20,7 @@ def solve_problem(problem: str, **config) -> str:
    """
    params = {**_MATH_CONFIG, **config}
    response = oai.Completion.create({"problem": problem}, **params)
    cost = oai.Completion.cost(params["model"], response)
    cost = oai.Completion.cost(response)
    results = eval_math_responses(oai.Completion.extract_text(response))
    return results.get("voted_answer"), cost
@@ -4,6 +4,7 @@ import numpy as np
import time
from typing import List, Optional, Dict
import sys
import json
from flaml import tune, BlendSearch
from flaml.automl.logger import logger_formatter


@@ -41,11 +42,12 @@ def get_key(config):
    Returns:
        tuple: A unique identifier which can be used as a key for a dict.
    """
    if isinstance(config, dict):
        return tuple(get_key(x) for x in sorted(config.items()))
    if isinstance(config, list):
        return tuple(get_key(x) for x in config)
    return config
    # if isinstance(config, dict):
    #     return tuple(get_key(x) for x in sorted(config.items()))
    # if isinstance(config, list):
    #     return tuple(get_key(x) for x in config)
    # return config
    return json.dumps(config, sort_keys=True)


class Completion(openai_Completion):

@@ -117,6 +119,8 @@ class Completion(openai_Completion):
    _total_cost = 0
    optimization_budget = None

    _history_dict = _count_create = None

    @classmethod
    def set_cache(cls, seed=41, cache_path=".cache"):
        """Set cache path.

@@ -130,6 +134,35 @@ class Completion(openai_Completion):
        cls.seed = seed
        cls.cache_path = f"{cache_path}/{seed}"

    @classmethod
    def _book_keeping(cls, config: Dict, response):
        """Book keeping for the created completions."""
        if cls._history_dict is None:
            return
        if cls._history_compact:
            value = {
                "created_at": [],
                "cost": [],
            }
            if "messages" in config:
                messages = config["messages"]
                if len(messages) > 1 and messages[-1]["role"] != "assistant":
                    existing_key = get_key(messages[:-1])
                    value = cls._history_dict.pop(existing_key, value)
                key = get_key(messages + [choice["message"] for choice in response["choices"]])
            else:
                key = get_key([config["prompt"]] + [choice.get("text") for choice in response["choices"]])
            value["created_at"].append(cls._count_create)
            value["cost"].append(cls.cost(response))
            cls._history_dict[key] = value
            cls._count_create += 1
            return
        cls._history_dict[cls._count_create] = {
            "request": config,
            "response": response.to_dict_recursive(),
        }
        cls._count_create += 1

    @classmethod
    def _get_response(cls, config: dict, eval_only=False, use_cache=True):
        """Get the response from the openai api call.

@@ -141,6 +174,7 @@ class Completion(openai_Completion):
        response = cls._cache.get(key, None)
        if response is not None and (response != -1 or not eval_only):
            # print("using cached response")
            cls._book_keeping(config, response)
            return response
        openai_completion = openai.ChatCompletion if config["model"] in cls.chat_models else openai.Completion
        start_time = time.time()

@@ -195,6 +229,7 @@ class Completion(openai_Completion):
            else:
                if use_cache:
                    cls._cache.set(key, response)
                cls._book_keeping(config, response)
                return response
        logger.warning(
            f"Failed to get response from openai api due to getting RateLimitError or Timeout for {cls.retry_timeout} seconds."

@@ -642,8 +677,7 @@ class Completion(openai_Completion):
        Args:
            context (dict, Optional): The context to instantiate the prompt.
                It needs to contain keys that are used by the prompt template.
                E.g., `prompt="Complete the following sentence: {prefix}"`.
                `context={"prefix": "Today I feel"}`.
                E.g., `prompt="Complete the following sentence: {prefix}", context={"prefix": "Today I feel"}`.
                The actual prompt sent to OpenAI will be:
                "Complete the following sentence: Today I feel".
            use_cache (bool, Optional): Whether to use cached responses.

@@ -781,13 +815,12 @@ class Completion(openai_Completion):
        result_agg, responses_list, result_list = {}, [], []
        metric_keys = None
        cost = 0
        model = config["model"]
        old_level = logger.getEffectiveLevel()
        logger.setLevel(logging_level)
        for i, data_i in enumerate(data):
            logger.info(f"evaluating data instance {i}")
            response = cls.create(data_i, use_cache, **config)
            cost += cls.cost(model, response)
            cost += cls.cost(response)
            # evaluate the quality of the responses
            responses = cls.extract_text(response)
            if eval_func is not None:

@@ -846,16 +879,16 @@ class Completion(openai_Completion):
        return result_agg

    @classmethod
    def cost(cls, model: str, response: dict):
    def cost(cls, response: dict):
        """Compute the cost of an API call.

        Args:
            model (str): The model name.
            response (dict): The response from OpenAI API.

        Returns:
            The cost in USD.
        """
        model = response["model"]
        if model not in cls.price1K:
            raise ValueError(f"Unknown model: {model}")
        usage = response["usage"]

@@ -881,6 +914,68 @@ class Completion(openai_Completion):
            return [choice["text"] for choice in choices]
        return [choice["message"].get("content", "") for choice in choices]

    @classmethod
    @property
    def logged_history(cls) -> Dict:
        """Return the book keeping dictionary."""
        return cls._history_dict

    @classmethod
    def start_logging(
        cls, history_dict: Optional[Dict] = None, compact: Optional[bool] = True, reset_counter: Optional[bool] = True
    ):
        """Start book keeping.

        Args:
            history_dict (Dict): A dictionary for book keeping.
                If not provided, a new one will be created.
            compact (bool): Whether to keep the history dictionary compact.
                Compact history contains one key per conversation, and the value is a dictionary
                like:
                ```python
                {
                    "created_at": [0, 1],
                    "cost": [0.1, 0.2],
                }
                ```
                where "created_at" is the index of API calls indicating the order of all the calls,
                and "cost" is the cost of each call. This example shows that the conversation is based
                on two API calls. The compact format is useful for condensing the history of a conversation.
                If compact is False, the history dictionary will contain all the API calls: the key
                is the index of the API call, and the value is a dictionary like:
                ```python
                {
                    "request": request_dict,
                    "response": response_dict,
                }
                ```
                where request_dict is the request sent to the OpenAI API, and response_dict is the response.
                For a conversation containing two API calls, the non-compact history dictionary will be like:
                ```python
                {
                    0: {
                        "request": request_dict_0,
                        "response": response_dict_0,
                    },
                    1: {
                        "request": request_dict_1,
                        "response": response_dict_1,
                    },
                }
                ```
                The first request's messages plus the response is equal to the second request's messages.
                For a conversation with many turns, the non-compact history dictionary has a quadratic size
                while the compact history dict has a linear size.
            reset_counter (bool): whether to reset the counter of the number of API calls.
        """
        cls._history_dict = {} if history_dict is None else history_dict
        cls._history_compact = compact
        cls._count_create = 0 if reset_counter or cls._count_create is None else cls._count_create

    @classmethod
    def stop_logging(cls):
        """End book keeping."""
        cls._history_dict = cls._count_create = None


class ChatCompletion(Completion):
    """A class for OpenAI API ChatCompletion."""
@@ -1 +1 @@
__version__ = "1.2.2"
__version__ = "1.2.3"

setup.py
@@ -77,7 +77,7 @@ setuptools.setup(
            "ipykernel",
            "pytorch-lightning<1.9.1",  # test_forecast_panel
        ],
        "catboost": ["catboost>=0.26,<1.2"],
        "catboost": ["catboost>=0.26"],
        "blendsearch": ["optuna==2.8.0"],
        "ray": [
            "ray[tune]~=1.13",

@@ -118,7 +118,7 @@ setuptools.setup(
            "hcrystalball==0.1.10",
            "pytorch-forecasting>=0.9.0",
        ],
        "benchmark": ["catboost>=0.26,<1.2", "psutil==5.8.0", "xgboost==1.3.3"],
        "benchmark": ["catboost>=0.26", "psutil==5.8.0", "xgboost==1.3.3"],
        "openai": ["openai==0.27.4", "diskcache"],
        "autogen": ["openai==0.27.4", "diskcache", "docker"],
        "synapse": ["joblibspark>=0.5.0", "optuna==2.8.0", "pyspark>=3.2.0"],
@@ -0,0 +1,77 @@
"""Solve a non-symmetric TSP problem.

The triangle inequality is not required in this problem.
"""
import math
import pdb
import random
import sys
from itertools import combinations, permutations


def solve_tsp(dists: dict) -> float:
    """Solve the TSP problem.

    Args:
        dists (dict): the distance matrix between each pair of nodes. Each item in the
            dict maps a pair (node A, node B) to the distance from A to B.

    Returns:
        float: the optimal cost
    """
    # Get the unique nodes from the distance matrix
    nodes = set()
    for pair in dists.keys():
        nodes.add(pair[0])
        nodes.add(pair[1])

    # Generate all possible routes (permutations of nodes)
    routes = permutations(nodes)

    # Initialize the optimal cost as infinite
    optimal_cost = float("inf")
    optimal_route = None

    # Iterate through all possible routes
    for route in routes:
        cost = 0
        # Calculate the cost of the current route
        for i in range(len(route)):
            current_node = route[i]
            next_node = route[(i + 1) % len(route)]
            cost += dists[(current_node, next_node)]

        # Update the optimal cost if the current cost is smaller
        if cost < optimal_cost:
            optimal_cost = cost
            optimal_route = route

    print("Cost:", optimal_cost, "with route", optimal_route)
    return optimal_cost


def tsp_data(n: int, seed: int = 2022) -> dict:
    """Generate some sample data for the non-symmetric TSP problem.

    Args:
        n (int): number of nodes in the problem
        seed (int): the random seed.

    Returns:
        dict: the pairwise distance matrix.
    """
    # Initialize the random seed
    random.seed(seed)

    # Initialize the distance matrix
    dist_matrix = {}

    # Generate distances for each pair of nodes
    for i in range(n):
        for j in range(n):
            if i != j:
                # Generate a random distance between nodes i and j
                distance = round(random.uniform(1, 100), 2)
                dist_matrix[(i, j)] = distance

    return dist_matrix
@@ -0,0 +1,35 @@
from .tsp import tsp_data


def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
    """Change the distance between two points.

    Args:
        dist (dict): distance matrix, where the key is a pair and value is
            the cost (aka, distance).
        i (int): the source node
        j (int): the destination node
        new_cost (float): the new cost for the distance

    Returns:
        float: the previous cost
    """
    prev_cost = dist[i, j]
    dist[i, j] = new_cost
    return prev_cost


def compare_costs(prev_cost, new_cost) -> float:
    """Compare the previous cost and the new cost.

    Args:
        prev_cost (float): the previous cost
        new_cost (float): the updated cost

    Returns:
        float: the ratio between these two costs
    """
    return (new_cost - prev_cost) / prev_cost


dists = tsp_data(5, seed=1)
@@ -0,0 +1,72 @@
from flaml.autogen.code_utils import extract_code
from flaml import oai


def test_extract_code():
    print(extract_code("```bash\npython temp.py\n```"))


def test_coding_agent():
    try:
        import openai
    except ImportError:
        return
    from flaml.autogen.agent.coding_agent import PythonAgent
    from flaml.autogen.agent.agent import Agent

    conversations = {}
    oai.ChatCompletion.start_logging(conversations)
    agent = PythonAgent("coding_agent")
    user = Agent("user")
    agent.receive(
        """Create a temp.py file with the following content:
```
print('Hello world!')
```""",
        user,
    )
    print(conversations)
    oai.ChatCompletion.start_logging(compact=False)
    agent.receive("""Execute temp.py""", user)
    print(oai.ChatCompletion.logged_history)
    oai.ChatCompletion.stop_logging()


def test_tsp():
    try:
        import openai
    except ImportError:
        return
    from flaml.autogen.agent.coding_agent import PythonAgent
    from flaml.autogen.agent.agent import Agent

    hard_questions = [
        "What if we must go from node 1 to node 2?",
        "Can we double all distances?",
        "Can we add a new point to the graph? Its distance to each of the existing points should be a random value between 0 and 5.",
    ]

    oai.ChatCompletion.start_logging()
    agent = PythonAgent("coding_agent", work_dir="test/autogen", temperature=0)
    user = Agent("user")
    with open("test/autogen/tsp_prompt.txt", "r") as f:
        prompt = f.read()
    # agent.receive(prompt.format(question=hard_questions[0]), user)
    # agent.receive(prompt.format(question=hard_questions[1]), user)
    agent.receive(prompt.format(question=hard_questions[2]), user)
    print(oai.ChatCompletion.logged_history)
    oai.ChatCompletion.stop_logging()


if __name__ == "__main__":
    import openai

    openai.api_key_path = "test/openai/key.txt"
    # if you use Azure OpenAI, comment the above line and uncomment the following lines
    # openai.api_type = "azure"
    # openai.api_base = "https://<your_endpoint>.openai.azure.com/"
    # openai.api_version = "2023-03-15-preview"  # change if necessary
    # openai.api_key = "<your_api_key>"
    # test_extract_code()
    test_coding_agent()
    test_tsp()
@@ -0,0 +1,115 @@

Now, we have a system to solve TSP problems. Let's try to solve a problem.

Given a distance dictionary `dists`, where the key is a pair of nodes and the
value is the distance between them. For example, `dists[(1, 2)]` is the distance
between node 1 and node 2. We want to find the optimal cost for the TSP problem.

The users might have some questions regarding the solution. So, you are
responsible for writing code to answer their questions. Note that you usually
would need to run `solve_tsp` and `compare_costs` to compare the costs before
and after the change.

Here are the functions and their information that you can use directly:

----------
def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
    """Change the distance between two points.

    Args:
        dist (dict): distance matrix, where the key is a pair and value is
            the cost (aka, distance).
        i (int): the source node
        j (int): the destination node
        new_cost (float): the new cost for the distance

    Returns:
        float: the previous cost
    """
----------

----------
def compare_costs(prev_cost, new_cost) -> float:
    """Compare the previous cost and the new cost.

    Args:
        prev_cost (float): the previous cost
        new_cost (float): the updated cost

    Returns:
        float: the ratio between these two costs
    """
----------

----------
def solve_tsp(dists: dict) -> float:
    """Solve the TSP problem.

    Args:
        dists (dict): the distance matrix between each pair of nodes. Each item in the
            dict maps a pair (node A, node B) to the distance from A to B.

    Returns:
        float: the optimal cost
    """
----------


We also provide some sample questions and answers here:
----------
Question: Why should we go from point 1 to point 2?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost = solve_tsp(dists)
change_dist(dists, 1, 2, float('inf'))
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If not, then the cost will increase', gap * 100, 'percent.')
```

----------
Question: Can we double the distance between point 4 and 2?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost = solve_tsp(dists)
change_dist(dists, 3, 4, dists[(3, 4)] * 2)
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we double the distance between 4 and 2, then the cost will decrease', - gap * 100, 'percent.')
```

----------
Question: what would happen if we remove point 2?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import compare_costs, dists
prev_cost = solve_tsp(dists)
for i, j in list(dists.keys()):
    if i == 2 or j == 2:
        del dists[i, j]  # remove the edge cost
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we remove point 2, then the cost will decrease', - gap * 100, 'percent.')
```

----------
Question: What if the edge between point 2 to 3 is removed?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost = solve_tsp(dists)
change_dist(dists, 2, 3, float('inf'))
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we remove the edge, then the cost will increase', gap * 100, 'percent.')
```

Now, answer the questions by using Python code:
Question: {question}
Code:
@@ -100,7 +100,7 @@ def test_nocontext():
    )
    print(code)
    # test extract_code from markdown
    code = extract_code(
    code, _ = extract_code(
        """
Example:
```

@@ -110,7 +110,7 @@ print("hello extract code")
    )
    print(code)

    code = extract_code(
    code, _ = extract_code(
        """
Example:
```python
@@ -2,14 +2,16 @@

<!-- ### Welcome to FLAML, a Fast Library for Automated Machine Learning & Tuning! -->

FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting models and hyperparameters for each model.
FLAML is a lightweight Python library for efficient automation of machine
learning, including selection of
models, hyperparameters, and other tunable choices of an application.

### Main Features

1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
2. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
3. It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
* It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
and model selection method invented by Microsoft Research, and many followup [research studies](Research).
@@ -19,6 +21,27 @@ Install FLAML from pip: `pip install flaml`. Find more options in [Installation]

There are several ways of using flaml:

#### (New) [Auto Generation](Use-Cases/Auto-Generation)

For example, you can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.

```python
from flaml import oai

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
```

The automated experimentation and optimization can help you maximize the utility out of these expensive models.
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.

#### [Task-oriented AutoML](Use-Cases/task-oriented-automl)

For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
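
As a sketch of what those three lines look like (the task name and the data variables are illustrative assumptions, not text from this page):

```python
from flaml import AutoML

automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)
```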

@@ -86,26 +109,6 @@ from flaml.default import LGBMClassifier

Then, you can use it just like you use the original `LGBMClassifier`. Your other code can remain unchanged. When you call the `fit()` function from `flaml.default.LGBMClassifier`, it will automatically instantiate a good data-dependent hyperparameter configuration for your dataset, which is expected to work better than the default configuration.
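
For illustration, a minimal sketch of that drop-in usage (the variable names are assumptions):

```python
from flaml.default import LGBMClassifier

clf = LGBMClassifier()  # a data-dependent configuration is instantiated at fit time
clf.fit(X_train, y_train)
```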

#### (New) [Auto Generation](Use-Cases/Auto-Generation)

You can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.

```python
from flaml import oai

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,
    optimization_budget=3,
    num_samples=-1,
)
```

The optimization can help you maximize the utility out of these expensive models.

### Where to Go Next?

* Understand the use cases for [Auto Generation](Use-Cases/Auto-Generation), [Task-oriented AutoML](Use-Cases/Task-Oriented-Automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
@@ -124,7 +124,172 @@ If the provided prompt or message is a template, it will be automatically materi
response = oai.Completion.create(problem=problem, prompt="{problem} Solve the problem carefully.", **config)
```

A template is either a format str, or a function which produces a str from several input fields.
A template is either a format str, like the example above, or a function which produces a str from several input fields, like the example below.

```python
from functools import partial


def content(turn, **context):
    return "\n".join(
        [
            context[f"user_message_{turn}"],
            context[f"external_info_{turn}"],
        ]
    )


messages = [
    {
        "role": "system",
        "content": "You are a teaching assistant of math.",
    },
    {
        "role": "user",
        "content": partial(content, turn=0),
    },
]
context = {
    "user_message_0": "Could you explain the solution to Problem 1?",
    "external_info_0": "Problem 1: ...",
}

response = oai.ChatCompletion.create(context, messages=messages, **config)
messages.append(
    {
        "role": "assistant",
        "content": oai.ChatCompletion.extract_text(response)[0],
    }
)
messages.append(
    {
        "role": "user",
        "content": partial(content, turn=1),
    },
)
context.update(
    {
        "user_message_1": "Why can't we apply Theorem 1 to Equation (2)?",
        "external_info_1": "Theorem 1: ...",
    }
)
response = oai.ChatCompletion.create(context, messages=messages, **config)
```

### Logging (Experimental)

When debugging or diagnosing an LLM-based system, it is often convenient to log the API calls and analyze them. `flaml.oai.Completion` and `flaml.oai.ChatCompletion` offer an easy way to collect the API call histories. For example, to log the chat histories, simply run:
```python
flaml.oai.ChatCompletion.start_logging()
```
The API calls made after this will be automatically logged. They can be retrieved at any time by:
```python
flaml.oai.ChatCompletion.logged_history
```
To stop logging, use
```python
flaml.oai.ChatCompletion.stop_logging()
```
If one would like to append the history to an existing dict, pass the dict like:
```python
flaml.oai.ChatCompletion.start_logging(history_dict=existing_history_dict)
```
By default, the counter of API calls will be reset at `start_logging()`. If no reset is desired, set `reset_counter=False`.
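
For example (an illustrative combination of the two options above, not a new API):

```python
# append to an existing history dict without resetting the call counter
flaml.oai.ChatCompletion.start_logging(history_dict=existing_history_dict, reset_counter=False)
```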

There are two types of logging formats: compact logging and individual API call logging. The default format is compact.
Set `compact=False` in `start_logging()` to switch.

* Example of a history dict with compact logging.
```python
{
    """
    [
        {
            'role': 'system',
            'content': system_message,
        },
        {
            'role': 'user',
            'content': user_message_1,
        },
        {
            'role': 'assistant',
            'content': assistant_message_1,
        },
        {
            'role': 'user',
            'content': user_message_2,
        },
        {
            'role': 'assistant',
            'content': assistant_message_2,
        },
    ]""": {
        "created_at": [0, 1],
        "cost": [0.1, 0.2],
    }
}
```

* Example of a history dict with individual API call logging.
```python
{
    0: {
        "request": {
            "messages": [
                {
                    "role": "system",
                    "content": system_message,
                },
                {
                    "role": "user",
                    "content": user_message_1,
                }
            ],
            ...  # other parameters in the request
        },
        "response": {
            "choices": [
                "messages": {
                    "role": "assistant",
                    "content": assistant_message_1,
                },
            ],
            ...  # other fields in the response
        }
    },
    1: {
        "request": {
            "messages": [
                {
                    "role": "system",
                    "content": system_message,
                },
                {
                    "role": "user",
                    "content": user_message_1,
                },
                {
                    "role": "assistant",
                    "content": assistant_message_1,
                },
                {
                    "role": "user",
                    "content": user_message_2,
                },
            ],
            ...  # other parameters in the request
        },
        "response": {
            "choices": [
                "messages": {
                    "role": "assistant",
                    "content": assistant_message_2,
                },
            ],
            ...  # other fields in the response
        }
    },
}
```
It can be seen that the individual API call history contains redundant information about the conversation. For a long conversation, the degree of redundancy is high.
The compact history is more efficient, while the individual API call history contains more details.

## Other utilities
@@ -8,9 +8,9 @@ const FeatureList = [
    Svg: require('../../static/img/auto.svg').default,
    description: (
      <>
        FLAML finds accurate ML models with low computational resources
        for common ML tasks.
        It frees users from selecting learners and hyperparameters.
        FLAML finds accurate models or configurations with low computational resources
        for common ML/AI tasks.
        It frees users from selecting models and hyperparameters for training or inference.
        {/* It is fast and economical. */}
      </>
    ),