mirror of https://github.com/microsoft/autogen.git
Agent notebook example with human feedback; Support shell command and multiple code blocks; Improve the system message for assistant agent; Improve utility functions for config lists; reuse docker image (#1056)
* add agent notebook and documentation
* fix bug
* set flush to True when printing msg in agent
* add a math problem in agent notebook
* remove
* header
* improve notebook doc
* notebook update
* improve notebook example
* improve doc
* agent notebook example with user feedback
* log
* log
* improve notebook doc
* improve print
* doc
* human_input_mode
* human_input_mode str
* indent
* indent
* Update flaml/autogen/agent/user_proxy_agent.py
  Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* shell command and multiple code blocks
* Update notebook/autogen_agent.ipynb
  Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update notebook/autogen_agent.ipynb
  Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update notebook/autogen_agent.ipynb
  Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* coding agent
* math notebook
* renaming and doc format
* typo
* infer lang
* sh
* docker
* docker
* reset consecutive autoreply counter
* fix explanation
* paper talk
* human feedback
* web info
* rename test
* config list explanation
* link to blogpost
* installation
* homepage features
* features
* features
* rename agent
* remove notebook
* notebook test
* docker command
* notebook update
* lang -> cmd
* notebook
* make it work for gpt-3.5
* return full log
* quote
* docker
* docker
* docker
* docker
* docker
* docker image list
* notebook
* notebook
* use_docker
* use_docker
* use_docker
* doc
* agent
* doc
* abs path
* pandas
* docker
* reuse docker image
* context window
* news
* print format
* pyspark version in py3.8
* pyspark in py3.8
* pyspark and ray
* quote
* pyspark
* pyspark
* pyspark

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
This commit is contained in:
parent
d36b2afe7f
commit
5387a0a607
@@ -49,22 +49,21 @@ jobs:
           export CFLAGS="$CFLAGS -I/usr/local/opt/libomp/include"
           export CXXFLAGS="$CXXFLAGS -I/usr/local/opt/libomp/include"
           export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp"
-      - name: On Linux + python 3.8, install pyspark 3.2.3
-        if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.8'
-        run: |
-          python -m pip install --upgrade pip wheel
-          pip install pyspark==3.2.3
       - name: Install packages and dependencies
         run: |
           python -m pip install --upgrade pip wheel
           pip install -e .
           python -c "import flaml"
           pip install -e .[test]
+      - name: On Ubuntu python 3.8, install pyspark 3.2.3
+        if: matrix.python-version == '3.8' && matrix.os == 'ubuntu-latest'
+        run: |
+          pip install pyspark==3.2.3
           pip list | grep "pyspark"
       - name: If linux, install ray 2
         if: matrix.os == 'ubuntu-latest'
         run: |
-          pip install ray[tune]
+          pip install "ray[tune]<2.5.0"
       - name: If mac, install ray
         if: matrix.os == 'macOS-latest'
         run: |
@@ -77,8 +76,8 @@ jobs:
         if: matrix.python-version != '3.10'
         run: |
           pip install -e .[vw]
-      - name: Uninstall pyspark on python 3.9
-        if: matrix.python-version == '3.9'
+      - name: Uninstall pyspark on python 3.8 or 3.9 for windows
+        if: matrix.python-version == '3.8' || matrix.python-version == '3.9' && matrix.os == 'windows-2019'
        run: |
          # Uninstall pyspark to test env without pyspark
          pip uninstall -y pyspark
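A note on the rewritten `if:` condition above: in GitHub Actions expressions, `&&` binds tighter than `||`, so `matrix.python-version == '3.8' || matrix.python-version == '3.9' && matrix.os == 'windows-2019'` reads as "any 3.8 job, or 3.9 only on windows-2019".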
README.md
@@ -14,7 +14,9 @@
 <br>
 </p>

-:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
+:fire: FLAML is highlighted in OpenAI's [cookbook](https://github.com/openai/openai-cookbook#related-resources-from-around-the-web).
+:fire: [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) is released with support for ChatGPT and GPT-4, based on [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673).
+:fire: FLAML supports AutoML and Hyperparameter Tuning features in [Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) private preview. Sign up for these features at: https://aka.ms/fabric/data-science/sign-up.


 ## What is FLAML
@@ -22,10 +24,9 @@ FLAML is a lightweight Python library for efficient automation of machine
 learning and AI operations, including selection of
 models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations).

-* For foundation models like the GPT series and AI agents based on them, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost.
-* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
-* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code).
-* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
+* For foundation models like the GPT models, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. FLAML enables users to build and use adaptive AI agents with minimal effort.
+* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., search space and metric), or full customization (arbitrary training/inference/evaluation code).
+* It supports fast and economical automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
 hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
 and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).
@@ -42,13 +43,14 @@ FLAML requires **Python version >= 3.7**. It can be installed from pip:
 pip install flaml
 ```

-To run the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook),
-install flaml with the [notebook] option:
+Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. For example, use the following to install the dependencies needed by the [`autogen`](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package.

 ```bash
-pip install flaml[notebook]
+pip install "flaml[autogen]"
 ```

+Find more options in [Installation](Installation).
+Each of the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook) may require a specific option to be installed.

 ### .NET

 Use the following guides to get started with FLAML in .NET:
@@ -59,25 +61,31 @@ Use the following guides to get started with FLAML in .NET:

 ## Quickstart

-* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
+* (New) The [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package can help you maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4, including:
+  - A drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.

 ```python
 from flaml import oai

+# perform tuning
 config, analysis = oai.Completion.tune(
     data=tune_data,
     metric="success",
     mode="max",
     eval_func=eval_func,
     inference_budget=0.05,
     optimization_budget=3,
     num_samples=-1,
 )
-```

-The automated experimentation and optimization can help you maximize the utility out of these expensive models.
-A suite of utilities are offered to accelerate the experimentation and application development, such as low-level inference API with caching, templating, filtering, and higher-level components like LLM-based coding and interactive agents.
+# perform inference for a test instance
+response = oai.Completion.create(context=test_instance, **config)
+```
+  - LLM-driven intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code.

+```python
+assistant = AssistantAgent("assistant")
+user = UserProxyAgent("user", human_input_mode="TERMINATE")
+assistant.receive("Draw a rocket and save to a file named 'rocket.svg'")
+```
 * With three lines of code, you can start using this economical and fast
 AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).
@@ -0,0 +1,5 @@
+from .agent import Agent
+from .assistant_agent import AssistantAgent
+from .user_proxy_agent import UserProxyAgent
+
+__all__ = ["Agent", "AssistantAgent", "UserProxyAgent"]
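With this new package `__init__`, both agents can be imported from `flaml.autogen.agent` directly. A minimal usage sketch, assuming flaml is installed with the autogen option and an OpenAI/Azure key is configured as described in the notebook below:

```python
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant")
user = UserProxyAgent("user", human_input_mode="NEVER", max_consecutive_auto_reply=10)
# the assistant receives the task description and the conversation loop starts
assistant.receive("Print 'hello world' with python.", user)
```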
@@ -37,7 +37,8 @@ class Agent:

     def _receive(self, message, sender):
         """Receive a message from another agent."""
-        print("\n****", self.name, "received message from", sender.name, "****\n", flush=True)
+        print("\n", "-" * 80, "\n", flush=True)
+        print(sender.name, "(to", f"{self.name}):", flush=True)
         print(message, flush=True)
         self._conversations[sender.name].append({"content": message, "role": "user"})
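For reference, the new console format can be reproduced outside the class; the names here are illustrative:

```python
# Reproduce the message header now printed by Agent._receive (illustrative names)
sender_name, receiver_name = "user", "assistant"
print("\n", "-" * 80, "\n", flush=True)
print(sender_name, "(to", f"{receiver_name}):", flush=True)
print("Draw a rocket and save to a file named 'rocket.svg'", flush=True)
```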
@@ -0,0 +1,47 @@
+from .agent import Agent
+from flaml.autogen.code_utils import DEFAULT_MODEL
+from flaml import oai
+
+
+class AssistantAgent(Agent):
+    """(Experimental) Assistant agent, able to suggest code blocks."""
+
+    DEFAULT_SYSTEM_MESSAGE = """You are a helpful AI assistant.
+    In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute. You must indicate the script type in the code block.
+    1. When you need to ask the user for some info, use the code to output the info you need, for example, browse or search the web, download/read a file.
+    2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly. Solve the task step by step if you need to.
+    If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
+    If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes.
+    Reply "TERMINATE" in the end when everything is done.
+    """
+
+    DEFAULT_CONFIG = {
+        "model": DEFAULT_MODEL,
+    }
+
+    def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config):
+        """
+        Args:
+            name (str): agent name.
+            system_message (str): system message to be sent to the agent.
+            **config (dict): other configurations allowed in
+                [oai.Completion.create](../oai/Completion#create).
+                These configurations will be used when invoking LLM.
+        """
+        super().__init__(name, system_message)
+        self._config = self.DEFAULT_CONFIG.copy()
+        self._config.update(config)
+        self._sender_dict = {}
+
+    def receive(self, message, sender):
+        if sender.name not in self._sender_dict:
+            self._sender_dict[sender.name] = sender
+            self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
+        super().receive(message, sender)
+        responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
+        response = oai.ChatCompletion.extract_text(responses)[0]
+        self._send(response, sender)
+
+    def reset(self):
+        self._sender_dict.clear()
+        self._conversations.clear()
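A usage sketch for the new class, assuming a `config_list` built with the config utilities further down in this commit; extra keyword arguments are merged into `DEFAULT_CONFIG` and forwarded to `oai.ChatCompletion.create`:

```python
from flaml import oai
from flaml.autogen.agent import AssistantAgent

config_list = oai.config_list_from_models(model_list=["gpt-4"])
# request_timeout, seed and config_list pass through **config into the LLM call
assistant = AssistantAgent("assistant", request_timeout=600, seed=42, config_list=config_list)
```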
@@ -1,41 +0,0 @@
-from .agent import Agent
-from flaml.autogen.code_utils import DEFAULT_MODEL
-from flaml import oai
-
-
-class PythonAgent(Agent):
-    """(Experimental) Suggest code blocks."""
-
-    DEFAULT_SYSTEM_MESSAGE = """You suggest python code (in a python coding block) for a user to execute for a given task. If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Finish the task smartly. Don't suggest shell command. Don't include multiple code blocks in one response. Use 'print' function for the output when relevant. Check the execution result returned by the user.
-    If the result indicates there is an error, fix the error and output the code again.
-    Reply "TERMINATE" in the end when the task is done.
-    """
-
-    DEFAULT_CONFIG = {
-        "model": DEFAULT_MODEL,
-    }
-
-    def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config):
-        """
-        Args:
-            name (str): agent name
-            system_message (str): system message to be sent to the agent
-            config (dict): other configurations.
-        """
-        super().__init__(name, system_message)
-        self._config = self.DEFAULT_CONFIG.copy()
-        self._config.update(config)
-        self._sender_dict = {}
-
-    def receive(self, message, sender):
-        if sender.name not in self._sender_dict:
-            self._sender_dict[sender.name] = sender
-            self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
-        super().receive(message, sender)
-        responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
-        response = oai.ChatCompletion.extract_text(responses)[0]
-        self._send(response, sender)
-
-    def reset(self):
-        self._sender_dict.clear()
-        self._conversations.clear()
@@ -1,5 +1,5 @@
 from .agent import Agent
-from flaml.autogen.code_utils import extract_code, execute_code
+from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
 from collections import defaultdict
@@ -54,36 +54,51 @@ class UserProxyAgent(Agent):
         self._consecutive_auto_reply_counter = defaultdict(int)
         self._use_docker = use_docker

-    def _execute_code(self, code, lang):
+    def _execute_code(self, code_blocks):
         """Execute the code and return the result."""
-        if lang in ["bash", "shell"]:
-            if not code.startswith("python "):
-                return 1, f"please do not suggest bash or shell commands like {code}"
-            file_name = code[len("python ") :]
-            exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker)
-            logs = logs.decode("utf-8")
-        elif lang == "python":
-            if code.startswith("# filename: "):
-                filename = code[11 : code.find("\n")].strip()
-            else:
-                filename = None
-            exitcode, logs = execute_code(code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker)
-            logs = logs.decode("utf-8")
-        else:
-            # TODO: could this happen?
-            exitcode, logs = 1, f"unknown language {lang}"
-            # raise NotImplementedError
-        return exitcode, logs
+        logs_all = ""
+        for code_block in code_blocks:
+            lang, code = code_block
+            if not lang:
+                lang = infer_lang(code)
+            if lang in ["bash", "shell", "sh"]:
+                exitcode, logs, image = execute_code(
+                    code, work_dir=self._work_dir, use_docker=self._use_docker, lang=lang
+                )
+                logs = logs.decode("utf-8")
+            elif lang == "python":
+                if code.startswith("# filename: "):
+                    filename = code[11 : code.find("\n")].strip()
+                else:
+                    filename = None
+                exitcode, logs, image = execute_code(
+                    code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker
+                )
+                logs = logs.decode("utf-8")
+            else:
+                # TODO: could this happen?
+                exitcode, logs, image = 1, f"unknown language {lang}", self._use_docker
+                # raise NotImplementedError
+            self._use_docker = image
+            logs_all += "\n" + logs
+            if exitcode != 0:
+                return exitcode, logs_all
+        return exitcode, logs_all

     def auto_reply(self, message, sender, default_reply=""):
         """Generate an auto reply."""
-        code, lang = extract_code(message)
-        if lang == "unknown":
-            # no code block is found, lang should be "unknown"
+        code_blocks = extract_code(message)
+        if len(code_blocks) == 1 and code_blocks[0][0] == UNKNOWN:
+            # no code block is found, lang should be `UNKNOWN`
             self._send(default_reply, sender)
         else:
             # try to execute the code
-            exitcode, logs = self._execute_code(code, lang)
+            exitcode, logs = self._execute_code(code_blocks)
             exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed"
             self._send(f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}", sender)
@@ -111,8 +126,10 @@ class UserProxyAgent(Agent):
             # this corresponds to the case when self._human_input_mode == "NEVER"
             reply = "exit"
         if reply == "exit" or (self._is_termination_msg(message) and not reply):
+            # reset the consecutive_auto_reply_counter
+            self._consecutive_auto_reply_counter[sender.name] = 0
             return
-        elif reply:
+        if reply:
             # reset the consecutive_auto_reply_counter
             self._consecutive_auto_reply_counter[sender.name] = 0
             self._send(reply, sender)
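A sketch of the new data flow, using only names introduced in this diff: `extract_code` now returns a list of `(lang, code)` tuples, which `_execute_code` iterates over, inferring the language of untagged blocks and stopping at the first failing block.

```python
from flaml.autogen.code_utils import extract_code

message = "```python\nprint('hi')\n```\n\n```sh\necho done\n```"
code_blocks = extract_code(message)
# -> [("python", "print('hi')"), ("sh", "echo done")]
# UserProxyAgent.auto_reply passes this list to _execute_code, which runs
# the blocks in order and concatenates their logs into one reply.
```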
@@ -12,16 +12,36 @@ from flaml.autogen import oai, DEFAULT_MODEL, FAST_MODEL
 # Regular expression for finding a code block
 CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"
 WORKING_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "extensions")
+UNKNOWN = "unknown"


-def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> str:
-    # Use a regular expression to find the code block
-    match = re.search(pattern, text, flags=re.DOTALL)
-    # If a match is found, return the code
-    if match:
-        return match.group(2), match.group(1)
-    # If no code block is found, return the whole text
-    return text, "unknown"
+def infer_lang(code):
+    """Infer the language for the code.
+
+    TODO: make it robust.
+    """
+    if code.startswith("python ") or code.startswith("pip"):
+        return "sh"
+    return "python"
+
+
+def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> List[Tuple[str, str]]:
+    """Extract code from a text.
+
+    Args:
+        text (str): The text to extract code from.
+        pattern (Optional, str): The regular expression pattern for finding the code block.
+
+    Returns:
+        list: A list of tuples, each containing the language and the code.
+    """
+    # Use a regular expression to find all the code blocks
+    match = re.findall(pattern, text, flags=re.DOTALL)
+    # If no code block is found, return the whole text tagged as UNKNOWN
+    return match if match else [(UNKNOWN, text)]


 def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]:
@@ -102,13 +122,22 @@ def timeout_handler(signum, frame):
     raise TimeoutError("Timed out!")


+def _cmd(lang):
+    if lang.startswith("python") or lang in ["bash", "sh"]:
+        return lang
+    if lang == "shell":
+        return "sh"
+    raise NotImplementedError(f"{lang} not recognized in code execution")
+
+
 def execute_code(
     code: Optional[str] = None,
     timeout: Optional[int] = 600,
     filename: Optional[str] = None,
     work_dir: Optional[str] = None,
-    use_docker: Optional[bool] = True,
-) -> Tuple[int, bytes]:
+    use_docker: Optional[Union[List[str], str, bool]] = True,
+    lang: Optional[str] = "python",
+) -> Tuple[int, bytes, str]:
     """Execute code in a docker container.
     This function is not tested on MacOS.
@@ -125,15 +154,19 @@ def execute_code(
             If None, a default working directory will be used.
             The default working directory is the "extensions" directory under
             "xxx/flaml/autogen", where "xxx" is the path to the flaml package.
-        use_docker (Optional, bool): Whether to use a docker container for code execution.
-            If True, the code will be executed in a docker container.
-            If False, the code will be executed in the current environment.
-            Default is True. If the code is executed in the current environment,
+        use_docker (Optional, list, str or bool): The docker image to use for code execution.
+            If a list or a str of image name(s) is provided, the code will be executed in a docker container
+            with the first image successfully pulled.
+            If None, False or empty, the code will be executed in the current environment.
+            Default is True, which will be converted into a list.
+            If the code is executed in the current environment,
             the code must be trusted.
+        lang (Optional, str): The language of the code. Default is "python".

     Returns:
         int: 0 if the code executes successfully.
         bytes: The error message if the code fails to execute; the stdout otherwise.
+        image: The docker image name after container run when docker is used.
     """
     assert code is not None or filename is not None, "Either code or filename must be provided."
@@ -141,7 +174,7 @@ def execute_code(
     if filename is None:
         code_hash = md5(code.encode()).hexdigest()
         # create a file with an automatically generated name
-        filename = f"tmp_code_{code_hash}.py"
+        filename = f"tmp_code_{code_hash}.{'py' if lang.startswith('python') else lang}"
     if work_dir is None:
         work_dir = WORKING_DIR
     filepath = os.path.join(work_dir, filename)
@@ -155,12 +188,13 @@ def execute_code(
     in_docker_container = os.path.exists("/.dockerenv")
     if not use_docker or in_docker_container:
         # already running in a docker container
+        cmd = [sys.executable if lang.startswith("python") else _cmd(lang), filename]
         signal.signal(signal.SIGALRM, timeout_handler)
         try:
             signal.alarm(timeout)
             # run the code in a subprocess in the current docker container in the working directory
             result = subprocess.run(
-                [sys.executable, filename],
+                cmd,
                 cwd=work_dir,
                 capture_output=True,
             )
@@ -168,17 +202,22 @@ def execute_code(
         except TimeoutError:
             if original_filename is None:
                 os.remove(filepath)
-            return 1, "Timeout"
+            return 1, "Timeout", None
         if original_filename is None:
             os.remove(filepath)
-        return result.returncode, result.stderr if result.returncode else result.stdout
+        return result.returncode, result.stderr if result.returncode else result.stdout, None

     import docker
-    from requests.exceptions import ReadTimeout, ConnectionError

     # create a docker client
     client = docker.from_env()
-    image_list = ["python:3-alpine", "python:3", "python:3-windowsservercore"]
+    image_list = (
+        ["python:3-alpine", "python:3", "python:3-windowsservercore"]
+        if use_docker is True
+        else [use_docker]
+        if isinstance(use_docker, str)
+        else use_docker
+    )
     for image in image_list:
         # check if the image exists
         try:
@@ -198,14 +237,15 @@ def execute_code(
     # if sys.platform == "win32":
     #     abs_path = str(abs_path).replace("\\", "/")
     #     abs_path = f"/{abs_path[0].lower()}{abs_path[2:]}"
+    cmd = [
+        "sh",
+        "-c",
+        f"{_cmd(lang)} {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}",
+    ]
     # create a docker container
     container = client.containers.run(
         image,
-        command=[
-            "sh",
-            "-c",
-            f"python {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}",
-        ],
+        command=cmd,
         working_dir="/workspace",
         detach=True,
         # get absolute path to the working directory
@@ -220,7 +260,7 @@ def execute_code(
             container.remove()
             if original_filename is None:
                 os.remove(filepath)
-            return 1, "Timeout"
+            return 1, "Timeout", image
         # try:
         #     container.wait(timeout=timeout)
         # except (ReadTimeout, ConnectionError):
@@ -231,6 +271,8 @@ def execute_code(
    #         return 1, "Timeout"
    # get the container logs
    logs = container.logs().decode("utf-8").rstrip()
+    # commit the image
+    container.commit(repository="python", tag=filename.replace("/", ""))
    # remove the container
    container.remove()
    # check if the code executed successfully
@@ -246,8 +288,8 @@ def execute_code(
     logs = bytes(logs, "utf-8")
     if original_filename is None:
         os.remove(filepath)
-    # return the exit code and logs
-    return exit_code, logs
+    # return the exit code, logs and image
+    return exit_code, logs, f"python:{filename}"


 _GENERATE_ASSERTIONS_CONFIG = {
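A behavior sketch for the updated helpers, based on the signatures in this diff (running the last line executes code locally; the non-docker path uses SIGALRM, so it is POSIX-only):

```python
from flaml.autogen.code_utils import UNKNOWN, extract_code, infer_lang, execute_code

# extract_code returns all fenced blocks; with no block, the whole text is tagged UNKNOWN
assert extract_code("no code here") == [(UNKNOWN, "no code here")]
# infer_lang guesses "sh" for shell-looking one-liners, "python" otherwise
assert infer_lang("pip install flaml") == "sh"
assert infer_lang("print(1)") == "python"

# execute_code now returns a 3-tuple; image is None when docker is not used
exit_code, logs, image = execute_code("print('hello')", use_docker=False)
print(exit_code, logs, image)  # expected: 0 b'hello\n' None
```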
@@ -1,4 +1,16 @@
 from flaml.autogen.oai.completion import Completion, ChatCompletion
-from flaml.autogen.oai.openai_utils import get_config_list, config_list_gpt4_gpt35, config_list_openai_aoai
+from flaml.autogen.oai.openai_utils import (
+    get_config_list,
+    config_list_gpt4_gpt35,
+    config_list_openai_aoai,
+    config_list_from_models,
+)

-__all__ = ["Completion", "ChatCompletion", "get_config_list", "config_list_gpt4_gpt35", "config_list_openai_aoai"]
+__all__ = [
+    "Completion",
+    "ChatCompletion",
+    "get_config_list",
+    "config_list_gpt4_gpt35",
+    "config_list_openai_aoai",
+    "config_list_from_models",
+]
@@ -59,6 +59,7 @@ def config_list_openai_aoai(
     openai_api_key_file: Optional[str] = "key_openai.txt",
     aoai_api_key_file: Optional[str] = "key_aoai.txt",
     aoai_api_base_file: Optional[str] = "base_aoai.txt",
+    exclude: Optional[str] = None,
 ) -> List[Dict]:
     """Get a list of configs for openai + azure openai api calls.
@@ -67,57 +68,103 @@ def config_list_openai_aoai(
         openai_api_key_file (str, optional): The file name of the openai api key.
         aoai_api_key_file (str, optional): The file name of the azure openai api key.
         aoai_api_base_file (str, optional): The file name of the azure openai api base.
+        exclude (str, optional): The api type to exclude, "openai" or "aoai".

     Returns:
         list: A list of configs for openai api calls.
     """
-    if "OPENAI_API_KEY" not in os.environ:
+    if "OPENAI_API_KEY" not in os.environ and exclude != "openai":
         try:
-            os.environ["OPENAI_API_KEY"] = open(f"{key_file_path}/{openai_api_key_file}").read().strip()
+            with open(f"{key_file_path}/{openai_api_key_file}") as key_file:
+                os.environ["OPENAI_API_KEY"] = key_file.read().strip()
         except FileNotFoundError:
             logging.info(
                 "To use OpenAI API, please set OPENAI_API_KEY in os.environ "
                 "or create key_openai.txt in the specified path, or specify the api_key in config_list."
             )
-    if "AZURE_OPENAI_API_KEY" not in os.environ:
+    if "AZURE_OPENAI_API_KEY" not in os.environ and exclude != "aoai":
         try:
-            os.environ["AZURE_OPENAI_API_KEY"] = open(f"{key_file_path}/{aoai_api_key_file}").read().strip()
+            with open(f"{key_file_path}/{aoai_api_key_file}") as key_file:
+                os.environ["AZURE_OPENAI_API_KEY"] = key_file.read().strip()
         except FileNotFoundError:
             logging.info(
                 "To use Azure OpenAI API, please set AZURE_OPENAI_API_KEY in os.environ "
                 "or create key_aoai.txt in the specified path, or specify the api_key in config_list."
             )
-    if "AZURE_OPENAI_API_BASE" not in os.environ:
+    if "AZURE_OPENAI_API_BASE" not in os.environ and exclude != "aoai":
         try:
-            os.environ["AZURE_OPENAI_API_BASE"] = open(f"{key_file_path}/{aoai_api_base_file}").read().strip()
+            with open(f"{key_file_path}/{aoai_api_base_file}") as key_file:
+                os.environ["AZURE_OPENAI_API_BASE"] = key_file.read().strip()
         except FileNotFoundError:
             logging.info(
                 "To use Azure OpenAI API, please set AZURE_OPENAI_API_BASE in os.environ "
                 "or create base_aoai.txt in the specified path, or specify the api_base in config_list."
             )
-    aoai_config = get_config_list(
-        # Assuming Azure OpenAI api keys in os.environ["AZURE_OPENAI_API_KEY"], in separated lines
-        api_keys=os.environ.get("AZURE_OPENAI_API_KEY", "").split("\n"),
-        # Assuming Azure OpenAI api bases in os.environ["AZURE_OPENAI_API_BASE"], in separated lines
-        api_bases=os.environ.get("AZURE_OPENAI_API_BASE", "").split("\n"),
-        api_type="azure",
-        api_version="2023-03-15-preview",  # change if necessary
+    aoai_config = (
+        get_config_list(
+            # Assuming Azure OpenAI api keys in os.environ["AZURE_OPENAI_API_KEY"], in separated lines
+            api_keys=os.environ.get("AZURE_OPENAI_API_KEY", "").split("\n"),
+            # Assuming Azure OpenAI api bases in os.environ["AZURE_OPENAI_API_BASE"], in separated lines
+            api_bases=os.environ.get("AZURE_OPENAI_API_BASE", "").split("\n"),
+            api_type="azure",
+            api_version="2023-03-15-preview",  # change if necessary
+        )
+        if exclude != "aoai"
+        else []
     )
-    openai_config = get_config_list(
-        # Assuming OpenAI API_KEY in os.environ["OPENAI_API_KEY"]
-        api_keys=os.environ.get("OPENAI_API_KEY", "").split("\n"),
-        # "api_type": "open_ai",
-        # "api_base": "https://api.openai.com/v1",
+    openai_config = (
+        get_config_list(
+            # Assuming OpenAI API_KEY in os.environ["OPENAI_API_KEY"]
+            api_keys=os.environ.get("OPENAI_API_KEY", "").split("\n"),
+            # "api_type": "open_ai",
+            # "api_base": "https://api.openai.com/v1",
+        )
+        if exclude != "openai"
+        else []
     )
     config_list = openai_config + aoai_config
     return config_list


+def config_list_from_models(
+    key_file_path: Optional[str] = ".",
+    openai_api_key_file: Optional[str] = "key_openai.txt",
+    aoai_api_key_file: Optional[str] = "key_aoai.txt",
+    aoai_api_base_file: Optional[str] = "base_aoai.txt",
+    exclude: Optional[str] = None,
+    model_list: Optional[list] = None,
+) -> List[Dict]:
+    """Get a list of configs for api calls with models in the model list.
+
+    Args:
+        key_file_path (str, optional): The path to the key files.
+        openai_api_key_file (str, optional): The file name of the openai api key.
+        aoai_api_key_file (str, optional): The file name of the azure openai api key.
+        aoai_api_base_file (str, optional): The file name of the azure openai api base.
+        exclude (str, optional): The api type to exclude, "openai" or "aoai".
+        model_list (list, optional): The model list.
+
+    Returns:
+        list: A list of configs for openai api calls.
+    """
+    config_list = config_list_openai_aoai(
+        key_file_path,
+        openai_api_key_file,
+        aoai_api_key_file,
+        aoai_api_base_file,
+        exclude,
+    )
+    if model_list:
+        config_list = [{**config, "model": model} for config in config_list for model in model_list]
+    return config_list


 def config_list_gpt4_gpt35(
     key_file_path: Optional[str] = ".",
     openai_api_key_file: Optional[str] = "key_openai.txt",
     aoai_api_key_file: Optional[str] = "key_aoai.txt",
     aoai_api_base_file: Optional[str] = "base_aoai.txt",
+    exclude: Optional[str] = None,
 ) -> List[Dict]:
     """Get a list of configs for gpt-4 followed by gpt-3.5 api calls.
@@ -126,17 +173,16 @@ def config_list_gpt4_gpt35(
         openai_api_key_file (str, optional): The file name of the openai api key.
         aoai_api_key_file (str, optional): The file name of the azure openai api key.
         aoai_api_base_file (str, optional): The file name of the azure openai api base.
+        exclude (str, optional): The api type to exclude, "openai" or "aoai".

     Returns:
         list: A list of configs for openai api calls.
     """
-    config_list = config_list_openai_aoai(
+    return config_list_from_models(
         key_file_path,
         openai_api_key_file,
         aoai_api_key_file,
         aoai_api_base_file,
+        exclude,
+        model_list=["gpt-4", "gpt-3.5-turbo"],
     )
-    return [{**config, "model": "gpt-4"} for config in config_list] + [
-        {**config, "model": "gpt-3.5-turbo"} for config in config_list
-    ]
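A usage sketch for the new config helpers, assuming keys are available via the environment or the key files:

```python
from flaml import oai

# Azure-only endpoints, expanded over two models; this is essentially
# what the rewritten config_list_gpt4_gpt35(exclude="openai") now does
config_list = oai.config_list_from_models(
    exclude="openai",
    model_list=["gpt-4", "gpt-3.5-turbo"],
)
```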
@@ -1,5 +1,13 @@
 {
  "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+   ]
+  },
   {
    "attachments": {},
    "cell_type": "markdown",
@@ -9,11 +17,11 @@
    }
   },
   "source": [
-    "# Interactive LLM Agent\n",
+    "# Interactive LLM Agent with Auto Feedback from Code Execution\n",
    "\n",
-    "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks, including coding and math problem-solving.\n",
+    "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n",
    "\n",
-    "In this notebook, we demonstrate how to use `PythonAgent` and `UserProxyAgent` to write code and execute the code. Here `PythonAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for the human user to execute the code written by `PythonAgent`, or automatically execute the code. Depending on the setting of `user_interaction_mode` and `max_consecutive_auto_reply`, the `UserProxyAgent` either solicits feedback from the human user or uses auto-feedback based on the result of code execution. For example, when `user_interaction_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `PythonAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `PythonAgent` directly and return the execution results (success or failure and corresponding outputs) to `PythonAgent`.\n",
+    "In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for the human user to execute the code written by `AssistantAgent`, or automatically execute the code. Depending on the setting of `human_input_mode` and `max_consecutive_auto_reply`, the `UserProxyAgent` either solicits feedback from the human user or uses auto-feedback based on the result of code execution. For example, when `human_input_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `AssistantAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `AssistantAgent` directly and return the execution results (success or failure and corresponding outputs) to `AssistantAgent`.\n",
    "\n",
    "## Requirements\n",
    "\n",
@@ -25,7 +33,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "metadata": {
    "execution": {
     "iopub.execute_input": "2023-02-13T23:40:52.317406Z",
@@ -52,18 +60,18 @@
    "- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n",
    "- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n",
    "\n",
-    "It's OK to have only the OpenAI API key, or only the Azure Open API key + base.\n"
+    "It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from flaml import oai\n",
    "\n",
-    "config_list = oai.config_list_gpt4_gpt35()"
+    "config_list = oai.config_list_from_models(model_list=[\"gpt-4\"])"
   ]
  },
  {
@@ -71,14 +79,46 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Example Task: Create and execute a python script with agents\n",
+    "The config list looks like the following:\n",
+    "```python\n",
+    "config_list = [\n",
+    "    {\n",
+    "        'model': 'gpt-4',\n",
+    "        'api_key': '<your OpenAI API key here>',\n",
+    "    },  # only if OpenAI API key is found\n",
+    "    {\n",
+    "        'model': 'gpt-4',\n",
+    "        'api_key': '<your first Azure OpenAI API key here>',\n",
+    "        'api_base': '<your first Azure OpenAI API base here>',\n",
+    "        'api_type': 'azure',\n",
+    "        'api_version': '2023-03-15-preview',\n",
+    "    },  # only if at least one Azure OpenAI API key is found\n",
+    "    {\n",
+    "        'model': 'gpt-4',\n",
+    "        'api_key': '<your second Azure OpenAI API key here>',\n",
+    "        'api_base': '<your second Azure OpenAI API base here>',\n",
+    "        'api_type': 'azure',\n",
+    "        'api_version': '2023-03-15-preview',\n",
+    "    },  # only if the second Azure OpenAI API key is found\n",
+    "]\n",
+    "```\n",
    "\n",
-    "In the example below, let's see how to use the agents in FLAML to write a python script and execute the script. This process involves constructing a `PythonAgent` to serve as the assistant, along with a `UserProxyAgent` that acts as a proxy for the human user. In this example demonstrated below, when constructing the `UserProxyAgent`, we select the `human_input_mode` to \"NEVER\". This means that the `UserProxyAgent` will not solicit feedback from the human user until the limit defined by `max_consecutive_auto_reply` is reached. For the purpose of this example, we've set this limit to 10."
+    "You can directly override it if the above function returns an empty list, i.e., it doesn't find the keys in the specified locations."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example Task: Write Code to Draw a Plot\n",
+    "\n",
+    "In the example below, let's see how to use the agents in FLAML to write a python script and execute the script. This process involves constructing an `AssistantAgent` to serve as the assistant, along with a `UserProxyAgent` that acts as a proxy for the human user. In this example demonstrated below, when constructing the `UserProxyAgent`, we select the `human_input_mode` to \"NEVER\". This means that the `UserProxyAgent` will not solicit feedback from the human user. It stops replying when the limit defined by `max_consecutive_auto_reply` is reached, or when `is_termination_msg()` returns true for the received message."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
@@ -86,172 +126,84 @@
    "output_type": "stream",
    "text": [
     "\n",
-     "**** coding_agent received message from user ****\n",
+     " -------------------------------------------------------------------------------- \n",
     "\n",
-     "Create and execute a script to plot a rocket without using matplotlib\n",
+     "user (to assistant):\n",
+     "Draw a rocket and save to a file named 'rocket.svg'\n",
     "\n",
-     "**** user received message from coding_agent ****\n",
-     "\n",
-     "Creating a rocket involves using ASCII characters to display it visually. Here's a simple script to get you started:\n",
+     " -------------------------------------------------------------------------------- \n",
     "\n",
+     "assistant (to user):\n",
     "```python\n",
-     "# filename: rocket.py\n",
-     "def plot_rocket():\n",
-     "    rocket = '''\n",
-     "       |\n",
-     "      /_\\\n",
-     "     /^|^\\ \n",
-     "    //| \\\\\n",
-     "   // | \\\\\n",
-     "    '''\n",
-     "    print(rocket)\n",
+     "# filename: draw_rocket.py\n",
+     "import svgwrite\n",
     "\n",
-     "if __name__ == \"__main__\":\n",
-     "    plot_rocket()\n",
+     "def draw_rocket():\n",
+     "    dwg = svgwrite.Drawing('rocket.svg', profile='tiny')\n",
+     "\n",
+     "    rocket_body_color = \"gray\"\n",
+     "    rocket_fire_color = \"red\"\n",
+     "\n",
+     "    # Draw the rocket body\n",
+     "    dwg.add(dwg.rect(insert=(50, 20), size=(50, 100), stroke=rocket_body_color, fill=rocket_body_color))\n",
+     "\n",
+     "    # Draw the rocket top\n",
+     "    dwg.add(dwg.polygon(points=[(75, 0), (50, 20), (100, 20)], stroke=rocket_body_color, fill=rocket_body_color))\n",
+     "\n",
+     "    # Draw the fins\n",
+     "    dwg.add(dwg.polygon(points=[(50, 60), (30, 80), (50, 100)], stroke=rocket_body_color, fill=rocket_body_color))\n",
+     "    dwg.add(dwg.polygon(points=[(100, 60), (120, 80), (100, 100)], stroke=rocket_body_color, fill=rocket_body_color))\n",
+     "\n",
+     "    # Draw the rocket fire/flame\n",
+     "    dwg.add(dwg.polygon(points=[(50, 120), (75, 160), (100, 120)], stroke=rocket_fire_color, fill=rocket_fire_color))\n",
+     "\n",
+     "    dwg.save()\n",
+     "\n",
+     "if __name__ == '__main__':\n",
+     "    draw_rocket()\n",
     "```\n",
     "\n",
-     "Save this code in a file named `rocket.py` and execute the script. After that, let me know the results.\n",
+     "1. Save the above Python code in a file named \"draw_rocket.py\".\n",
+     "2. Execute the script by running `python draw_rocket.py` in your terminal or command prompt.\n",
+     "3. The script will create a file named 'rocket.svg' in the same directory containing the rocket drawing.\n",
     "\n",
-     " >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
+     ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
     "\n",
-     "**** coding_agent received message from user ****\n",
+     " -------------------------------------------------------------------------------- \n",
     "\n",
+     "user (to assistant):\n",
     "exitcode: 0 (execution succeeded)\n",
     "Code output: \n",
-     "       |\n",
-     "      /_ /^|^\\ \n",
-     "    //| \\\n",
-     "   // | \\\n",
-     "    \n",
     "\n",
     "\n",
-     "**** user received message from coding_agent ****\n",
+     " -------------------------------------------------------------------------------- \n",
     "\n",
-     "It seems that there's an issue with the spacing in the rocket drawing. Let's fix that.\n",
+     "assistant (to user):\n",
+     "Great! The script executed successfully and the rocket drawing has been saved to a file named 'rocket.svg'. You can now open the 'rocket.svg' file to view the rocket illustration. \n",
     "\n",
-     "```python\n",
-     "# filename: rocket.py\n",
-     "def plot_rocket():\n",
-     "    rocket = '''\n",
-     "       |\n",
-     "      /_\\\\\n",
-     "     /^|^\\\\\n",
-     "    //|\\\\\\\\\n",
-     "   // | \\\\\\\\\n",
-     "    '''\n",
-     "    print(rocket)\n",
-     "\n",
-     "if __name__ == \"__main__\":\n",
-     "    plot_rocket()\n",
-     "```\n",
-     "\n",
-     "Save this code in the existing `rocket.py` file and execute the script again. After that, let me know the results.\n",
-     "\n",
-     " >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
-     "\n",
-     "**** coding_agent received message from user ****\n",
-     "\n",
-     "exitcode: 0 (execution succeeded)\n",
-     "Code output: \n",
-     "       |\n",
-     "      /_\\\n",
-     "     /^|^\\\n",
-     "    //|\\\\\n",
-     "   // | \\\\\n",
-     "    \n",
-     "\n",
-     "\n",
-     "**** user received message from coding_agent ****\n",
-     "\n",
-     "Great! The rocket is now displayed correctly. If you have any more tasks or modifications, please let me know. Otherwise, I hope you enjoyed the ASCII rocket!\n",
+     "If you have any further questions or need additional assistance, feel free to ask.\n",
     "\n",
     "TERMINATE\n"
    ]
   }
  ],
  "source": [
-   "from flaml.autogen.agent.coding_agent import PythonAgent\n",
-   "from flaml.autogen.agent.user_proxy_agent import UserProxyAgent\n",
+   "from flaml.autogen.agent import AssistantAgent, UserProxyAgent\n",
    "\n",
-   "# create an assistant which is essentially a PythonAgent instance named \"coding_agent\"\n",
-   "assistant = PythonAgent(\"coding_agent\", request_timeout=600, seed=42, config_list=config_list)\n",
+   "# create an AssistantAgent named \"assistant\"\n",
+   "assistant = AssistantAgent(\"assistant\", request_timeout=600, seed=42, config_list=config_list)\n",
    "# create a UserProxyAgent instance named \"user\"\n",
    "user = UserProxyAgent(\n",
    "    \"user\",\n",
    "    human_input_mode=\"NEVER\",\n",
    "    max_consecutive_auto_reply=10,\n",
-    "    is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\"),\n",
+    "    is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\") or x.rstrip().endswith('\"TERMINATE\".'),\n",
+    "    work_dir=\"coding\",\n",
+    "    use_docker=False,  # set to True if you are using docker\n",
    ")\n",
    "# the assistant receives a message from the user, which contains the task description\n",
    "assistant.receive(\n",
-    "    \"\"\"Create and execute a script to plot a rocket without using matplotlib\"\"\",\n",
+    "    \"\"\"Draw a rocket and save to a file named 'rocket.svg'\"\"\",\n",
    "    user,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's see how to use the agents to first write the generated script to a file and then execute the script in two sessions of conversation between the `PythonAgent` and the `UserProxyAgent`."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": 7,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [
|
|
||||||
{
|
|
||||||
"name": "stdout",
|
|
||||||
"output_type": "stream",
|
|
||||||
"text": [
|
|
||||||
"\n",
|
|
||||||
"**** coding_agent received message from user ****\n",
|
|
||||||
"\n",
|
|
||||||
"Create a temp.py file with the following content:\n",
|
|
||||||
"```\n",
|
|
||||||
"print('Hello world!')\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"**** user received message from coding_agent ****\n",
|
|
||||||
"\n",
|
|
||||||
"Here is the code to create the temp.py file with the specified content. Please execute this code:\n",
|
|
||||||
"\n",
|
|
||||||
"```python\n",
|
|
||||||
"with open('temp.py', 'w') as file:\n",
|
|
||||||
" file.write(\"print('Hello world!')\")\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
"After executing this code, you should have a file named temp.py with the content:\n",
|
|
||||||
"\n",
|
|
||||||
"```\n",
|
|
||||||
"print('Hello world!')\n",
|
|
||||||
"```\n",
|
|
||||||
"\n",
|
|
||||||
" >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
|
||||||
"\n",
|
|
||||||
"**** coding_agent received message from user ****\n",
|
|
||||||
"\n",
|
|
||||||
"exitcode: 0 (execution succeeded)\n",
|
|
||||||
"Code output: \n",
|
|
||||||
"\n",
|
|
||||||
"**** user received message from coding_agent ****\n",
|
|
||||||
"\n",
|
|
||||||
"Great! The temp.py file has been created successfully. Now, you can run this file to see the output. If you need any further assistance, feel free to ask.\n",
|
|
||||||
"\n",
|
|
||||||
"TERMINATE\n"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"source": [
|
|
||||||
"# it is suggested to reset the assistant to clear the state if the new task is not related to the previous one.\n",
|
|
||||||
"assistant.reset()\n",
|
|
||||||
"assistant.receive(\n",
|
|
||||||
" \"\"\"Create a temp.py file with the following content:\n",
|
|
||||||
" ```\n",
|
|
||||||
" print('Hello world!')\n",
|
|
||||||
" ```\"\"\",\n",
|
|
||||||
" user,\n",
|
" user,\n",
|
||||||
")"
|
")"
|
||||||
]
|
]
|
||||||
|
@ -263,12 +215,45 @@
|
||||||
"source": [
|
"source": [
|
||||||
"The example above involves code execution. In FLAML, code execution is triggered automatically by the `UserProxyAgent` when it detects an executable code block in a received message and no human user input is provided. This process occurs in a designated working directory, using a Docker container by default. Unless a specific directory is specified, FLAML defaults to the `flaml/autogen/extensions` directory. Users have the option to specify a different working directory by setting the `work_dir` argument when constructing a new instance of the `UserProxyAgent`.\n",
|
"The example above involves code execution. In FLAML, code execution is triggered automatically by the `UserProxyAgent` when it detects an executable code block in a received message and no human user input is provided. This process occurs in a designated working directory, using a Docker container by default. Unless a specific directory is specified, FLAML defaults to the `flaml/autogen/extensions` directory. Users have the option to specify a different working directory by setting the `work_dir` argument when constructing a new instance of the `UserProxyAgent`.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Upon successful execution of the preceding code block, a file named `temp.py` will be created and saved in the default working directory `flaml/autogen/extensions`. Now, let's prompt the assistant to execute the code contained within this file using the following line of code."
|
"Let's display the generated figure."
|
||||||
]
|
]
|
||||||
},
|
},
|
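To make the working-directory and Docker options above concrete, here is a minimal construction sketch; it assumes only the `UserProxyAgent` arguments (`work_dir`, `use_docker`, `human_input_mode`, `max_consecutive_auto_reply`) that appear in the code cells of this diff:

```python
from flaml.autogen.agent import UserProxyAgent

# Minimal sketch: control where auto-detected code blocks are written and executed.
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",        # never prompt a human; always auto-reply
    max_consecutive_auto_reply=10,   # cap on consecutive automatic replies
    work_dir="coding",               # code runs in ./coding instead of the default
    use_docker=False,                # True (the default) executes inside a Docker container
)
```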
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 8,
|
"execution_count": 4,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"image/svg+xml": [
|
||||||
|
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:ev=\"http://www.w3.org/2001/xml-events\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" baseProfile=\"tiny\" height=\"100%\" version=\"1.2\" width=\"100%\"><defs/><rect fill=\"gray\" height=\"100\" stroke=\"gray\" width=\"50\" x=\"50\" y=\"20\"/><polygon fill=\"gray\" points=\"75,0 50,20 100,20\" stroke=\"gray\"/><polygon fill=\"gray\" points=\"50,60 30,80 50,100\" stroke=\"gray\"/><polygon fill=\"gray\" points=\"100,60 120,80 100,100\" stroke=\"gray\"/><polygon fill=\"red\" points=\"50,120 75,160 100,120\" stroke=\"red\"/></svg>"
|
||||||
|
],
|
||||||
|
"text/plain": [
|
||||||
|
"<IPython.core.display.SVG object>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "display_data"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# uncomment the following to render the svg file\n",
|
||||||
|
"# from IPython.display import SVG, display\n",
|
||||||
|
"\n",
|
||||||
|
"# display(SVG(\"coding/rocket.svg\"))"
|
||||||
|
]
|
||||||
|
},
|
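The rendering snippet in the cell above is commented out; once the drawing cell has produced `coding/rocket.svg`, it can be run as-is:

```python
from IPython.display import SVG, display

# Render the SVG file saved by draw_rocket.py.
display(SVG("coding/rocket.svg"))
```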
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Example Task: Use Code to Check Stock Price Change"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 5,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
{
|
{
|
||||||
|
@ -276,42 +261,190 @@
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
"\n",
|
"\n",
|
||||||
"**** coding_agent received message from user ****\n",
|
" -------------------------------------------------------------------------------- \n",
|
||||||
"\n",
|
"\n",
|
||||||
"Execute temp.py\n",
|
"user (to assistant):\n",
|
||||||
|
"What date is today? Which big tech stock has the largest year-to-date gain this year? How much is the gain?\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**** user received message from coding_agent ****\n",
|
" -------------------------------------------------------------------------------- \n",
|
||||||
"\n",
|
"\n",
|
||||||
"To execute temp.py, run the following code:\n",
|
"assistant (to user):\n",
|
||||||
|
"First, let's find out today's date. You can do that by running the following Python code:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"```python\n",
|
"```python\n",
|
||||||
"import os\n",
|
"from datetime import datetime\n",
|
||||||
"\n",
|
"\n",
|
||||||
"os.system('python temp.py')\n",
|
"today = datetime.now()\n",
|
||||||
|
"print(\"Today's date:\", today.strftime(\"%B %d, %Y\"))\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This code imports the os module and then runs the temp.py file. After executing this code, you should see the output:\n",
|
"For the tech stock information, you will need to use an API or web scraping to fetch this data. I will show you how to do it using the `yfinance` library in Python. Before running the code, make sure you have the `yfinance` library installed by executing the following command:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Hello world!\n",
|
"```sh\n",
|
||||||
|
"pip install yfinance\n",
|
||||||
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
" >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
"Now, we can fetch the stock information using the following Python code:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**** coding_agent received message from user ****\n",
|
"```python\n",
|
||||||
|
"import yfinance as yf\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"def get_stock_info(ticker):\n",
|
||||||
|
" stock = yf.Ticker(ticker)\n",
|
||||||
|
" stock_info = stock.history(\"ytd\")\n",
|
||||||
|
" current_price = stock_info[\"Close\"][-1]\n",
|
||||||
|
" start_price = stock_info[\"Close\"][0]\n",
|
||||||
|
" return (current_price - start_price) / start_price * 100\n",
|
||||||
|
"\n",
|
||||||
|
"tech_stocks = {\n",
|
||||||
|
" \"Apple\": \"AAPL\",\n",
|
||||||
|
" \"Microsoft\": \"MSFT\",\n",
|
||||||
|
" \"Amazon\": \"AMZN\",\n",
|
||||||
|
" \"Google\": \"GOOGL\",\n",
|
||||||
|
" \"Facebook\": \"FB\",\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n",
|
||||||
|
"largest_gain = max(ytd_gains, key=ytd_gains.get)\n",
|
||||||
|
"print(f\"{largest_gain} has the largest year-to-date gain with {ytd_gains[largest_gain]:.2f}% gain.\")\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"This script will print out the big tech stock with the largest year-to-date gain and the gain percentage.\n",
|
||||||
|
"\n",
|
||||||
|
">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"user (to assistant):\n",
|
||||||
|
"exitcode: 1 (execution failed)\n",
|
||||||
|
"Code output: \n",
|
||||||
|
"Today's date: June 08, 2023\n",
|
||||||
|
"\n",
|
||||||
|
"Defaulting to user installation because normal site-packages is not writeable\n",
|
||||||
|
"Requirement already satisfied: yfinance in /home/vscode/.local/lib/python3.9/site-packages (0.2.18)\n",
|
||||||
|
"Requirement already satisfied: pandas>=1.3.0 in /usr/local/lib/python3.9/site-packages (from yfinance) (1.5.2)\n",
|
||||||
|
"Requirement already satisfied: numpy>=1.16.5 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.23.5)\n",
|
||||||
|
"Requirement already satisfied: requests>=2.26 in /usr/local/lib/python3.9/site-packages (from yfinance) (2.28.1)\n",
|
||||||
|
"Requirement already satisfied: multitasking>=0.0.7 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (0.0.11)\n",
|
||||||
|
"Requirement already satisfied: lxml>=4.9.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (4.9.2)\n",
|
||||||
|
"Requirement already satisfied: appdirs>=1.4.4 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.4.4)\n",
|
||||||
|
"Requirement already satisfied: pytz>=2022.5 in /usr/local/lib/python3.9/site-packages (from yfinance) (2022.6)\n",
|
||||||
|
"Requirement already satisfied: frozendict>=2.3.4 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (2.3.8)\n",
|
||||||
|
"Requirement already satisfied: cryptography>=3.3.2 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (38.0.4)\n",
|
||||||
|
"Requirement already satisfied: beautifulsoup4>=4.11.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (4.11.1)\n",
|
||||||
|
"Requirement already satisfied: html5lib>=1.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.1)\n",
|
||||||
|
"Requirement already satisfied: soupsieve>1.2 in /home/vscode/.local/lib/python3.9/site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.3.2.post1)\n",
|
||||||
|
"Requirement already satisfied: cffi>=1.12 in /home/vscode/.local/lib/python3.9/site-packages (from cryptography>=3.3.2->yfinance) (1.15.1)\n",
|
||||||
|
"Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.9/site-packages (from html5lib>=1.1->yfinance) (1.16.0)\n",
|
||||||
|
"Requirement already satisfied: webencodings in /home/vscode/.local/lib/python3.9/site-packages (from html5lib>=1.1->yfinance) (0.5.1)\n",
|
||||||
|
"Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.9/site-packages (from pandas>=1.3.0->yfinance) (2.8.2)\n",
|
||||||
|
"Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (2.1.1)\n",
|
||||||
|
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (3.4)\n",
|
||||||
|
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (1.26.13)\n",
|
||||||
|
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (2022.9.24)\n",
|
||||||
|
"Requirement already satisfied: pycparser in /home/vscode/.local/lib/python3.9/site-packages (from cffi>=1.12->cryptography>=3.3.2->yfinance) (2.21)\n",
|
||||||
|
"\n",
|
||||||
|
"Traceback (most recent call last):\n",
|
||||||
|
" File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 18, in <module>\n",
|
||||||
|
" ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n",
|
||||||
|
" File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 18, in <dictcomp>\n",
|
||||||
|
" ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n",
|
||||||
|
" File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 6, in get_stock_info\n",
|
||||||
|
" current_price = stock_info[\"Close\"][-1]\n",
|
||||||
|
" File \"/usr/local/lib/python3.9/site-packages/pandas/core/series.py\", line 978, in __getitem__\n",
|
||||||
|
" return self._values[key]\n",
|
||||||
|
"IndexError: index -1 is out of bounds for axis 0 with size 0\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"assistant (to user):\n",
|
||||||
|
"Apologies for the error. It seems there might be an issue with `yfinance` fetching historical price data. To address the error, let's modify the code to include a try-except block to handle any errors when fetching stock data.\n",
|
||||||
|
"\n",
|
||||||
|
"Here's the updated code:\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"import yfinance as yf\n",
|
||||||
|
"\n",
|
||||||
|
"def get_stock_info(ticker):\n",
|
||||||
|
" try:\n",
|
||||||
|
" stock = yf.Ticker(ticker)\n",
|
||||||
|
" stock_info = stock.history(\"ytd\")\n",
|
||||||
|
" if stock_info.empty:\n",
|
||||||
|
" return None\n",
|
||||||
|
" current_price = stock_info[\"Close\"][-1]\n",
|
||||||
|
" start_price = stock_info[\"Close\"][0]\n",
|
||||||
|
" return (current_price - start_price) / start_price * 100\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" print(f\"Error fetching stock data for {ticker}: {e}\")\n",
|
||||||
|
" return None\n",
|
||||||
|
"\n",
|
||||||
|
"tech_stocks = {\n",
|
||||||
|
" \"Apple\": \"AAPL\",\n",
|
||||||
|
" \"Microsoft\": \"MSFT\",\n",
|
||||||
|
" \"Amazon\": \"AMZN\",\n",
|
||||||
|
" \"Google\": \"GOOGL\",\n",
|
||||||
|
" \"Facebook\": \"FB\",\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n",
|
||||||
|
"ytd_gains = {stock: gain for stock, gain in ytd_gains.items() if gain is not None} # Remove stocks with errors\n",
|
||||||
|
"if ytd_gains:\n",
|
||||||
|
" largest_gain = max(ytd_gains, key=ytd_gains.get)\n",
|
||||||
|
" print(f\"{largest_gain} has the largest year-to-date gain with {ytd_gains[largest_gain]:.2f}% gain.\")\n",
|
||||||
|
"else:\n",
|
||||||
|
" print(\"Unable to retrieve stock information for any tech stocks.\")\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"This code will handle any errors that may occur when fetching stock data and continue to the next one. It will also print out an error message for the stocks that failed to fetch data. Run the modified code and let me know the result.\n",
|
||||||
|
"\n",
|
||||||
|
">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"user (to assistant):\n",
|
||||||
"exitcode: 0 (execution succeeded)\n",
|
"exitcode: 0 (execution succeeded)\n",
|
||||||
"Code output: Hello world!\n",
|
"Code output: \n",
|
||||||
|
"FB: No data found, symbol may be delisted\n",
|
||||||
|
"Apple has the largest year-to-date gain with 42.59% gain.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**** user received message from coding_agent ****\n",
|
" -------------------------------------------------------------------------------- \n",
|
||||||
"\n",
|
"\n",
|
||||||
"I'm glad that the code execution was successful and you got the desired output! If you need any further help or assistance with another task, feel free to ask.\n",
|
"assistant (to user):\n",
|
||||||
|
"Great! The updated code successfully fetched the stock data and determined that Apple has the largest year-to-date gain with 42.59%. Please note that the error message for Facebook stock (FB) indicates that no data was found, which may be due to the stock symbol being delisted or an issue with the `yfinance` library.\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"If you have any more questions or need further assistance, feel free to ask. Otherwise, type \"TERMINATE\" to end this session.\n",
|
||||||
|
"\n",
|
||||||
|
">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"user (to assistant):\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"assistant (to user):\n",
|
||||||
"TERMINATE\n"
|
"TERMINATE\n"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"source": [
|
"source": [
|
||||||
"assistant.receive(\"\"\"Execute temp.py\"\"\", user)"
|
"# it is suggested to reset the assistant to clear the state if the new task is not related to the previous one.\n",
|
||||||
|
"assistant.reset()\n",
|
||||||
|
"assistant.receive(\n",
|
||||||
|
" \"\"\"What date is today? Which big tech stock has the largest year-to-date gain this year? How much is the gain?\"\"\",\n",
|
||||||
|
" user,\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"All the feedback is auto generated."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -331,7 +464,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.9.16"
|
"version": "3.9.15"
|
||||||
},
|
},
|
||||||
"vscode": {
|
"vscode": {
|
||||||
"interpreter": {
|
"interpreter": {
|
||||||
|
|
File diff suppressed because one or more lines are too long
|
@ -0,0 +1,607 @@
|
||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_agent_web_info.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Interactive LLM Agent Dealing with Web Info\n",
|
||||||
|
"\n",
|
||||||
|
"FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to discuss a paper based on its URL. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for a user to execute the code written by `AssistantAgent`. By setting `human_input_mode` properly, the `UserProxyAgent` can also prompt the user for feedback to `AssistantAgent`. For example, when `human_input_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `AssistantAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `AssistantAgent` directly and return the execution results (success or failure and corresponding outputs) to `AssistantAgent`.\n",
|
||||||
|
"\n",
|
||||||
|
"## Requirements\n",
|
||||||
|
"\n",
|
||||||
|
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
|
||||||
|
"```bash\n",
|
||||||
|
"pip install flaml[autogen]\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
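As a minimal sketch of the `human_input_mode` behavior described in the introduction above (assuming the constructor signatures used in the code cells below, and a `config_list` built as in the next section):

```python
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

assistant = AssistantAgent("assistant", config_list=config_list)

# "ALWAYS": prompt the human after every assistant message; an empty reply
# makes the proxy execute any code block in the message and send back the result.
user = UserProxyAgent("user", human_input_mode="ALWAYS", work_dir="web")

# "TERMINATE": only prompt the human when the assistant signals termination;
# otherwise auto-reply (execute code) up to max_consecutive_auto_reply times.
# user = UserProxyAgent("user", human_input_mode="TERMINATE", max_consecutive_auto_reply=10)

assistant.receive("Who should read this paper: https://arxiv.org/abs/2306.01337", user)
```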
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 1,
|
||||||
|
"metadata": {
|
||||||
|
"execution": {
|
||||||
|
"iopub.execute_input": "2023-02-13T23:40:52.317406Z",
|
||||||
|
"iopub.status.busy": "2023-02-13T23:40:52.316561Z",
|
||||||
|
"iopub.status.idle": "2023-02-13T23:40:52.321193Z",
|
||||||
|
"shell.execute_reply": "2023-02-13T23:40:52.320628Z"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# %pip install flaml[autogen]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Set your API Endpoint\n",
|
||||||
|
"\n",
|
||||||
|
"The [`config_list_openai_aoai`](https://microsoft.github.io/FLAML/docs/reference/autogen/oai/openai_utils#config_list_openai_aoai) function tries to create a list of configurations using Azure OpenAI endpoints and OpenAI endpoints. It assumes the api keys and api bases are stored in the corresponding environment variables or local txt files:\n",
|
||||||
|
"\n",
|
||||||
|
"- OpenAI API key: os.environ[\"OPENAI_API_KEY\"] or `openai_api_key_file=\"key_openai.txt\"`.\n",
|
||||||
|
"- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n",
|
||||||
|
"- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n",
|
||||||
|
"\n",
|
||||||
|
"It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n",
|
||||||
|
"\n",
|
||||||
|
"The following code excludes openai endpoints from the config list.\n",
|
||||||
|
"Change to `exclude=\"aoai\"` to exclude Azure OpenAI, or remove the `exclude` argument to include both.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
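A short usage sketch of supplying these endpoints (all key/base values below are placeholders):

```python
import os
from flaml import oai

# Option 1: environment variables (placeholders; substitute real values).
os.environ["AZURE_OPENAI_API_KEY"] = "<your-aoai-key>"
os.environ["AZURE_OPENAI_API_BASE"] = "https://<your-resource>.openai.azure.com"

# Option 2: local txt files such as key_aoai.txt / base_aoai.txt, one entry per
# line, passed via the aoai_api_key_file / aoai_api_base_file arguments described above.
config_list = oai.config_list_openai_aoai(exclude="openai")  # keep only Azure OpenAI endpoints
```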
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 2,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from flaml import oai\n",
|
||||||
|
"\n",
|
||||||
|
"config_list = oai.config_list_openai_aoai(exclude=\"openai\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Construct Agents\n",
|
||||||
|
"\n",
|
||||||
|
"We construct the assistant agent and the user proxy agent. We specify `human_input_mode` as \"TERMINATE\" in the user proxy agent, which will ask for feedback when it receives a \"TERMINATE\" signal from the assistant agent."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 3,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from flaml.autogen.agent import AssistantAgent, UserProxyAgent\n",
|
||||||
|
"\n",
|
||||||
|
"# create an AssistantAgent instance named \"assistant\"\n",
|
||||||
|
"assistant = AssistantAgent(\n",
|
||||||
|
" name=\"assistant\",\n",
|
||||||
|
" request_timeout=600,\n",
|
||||||
|
" seed=42,\n",
|
||||||
|
" config_list=config_list,\n",
|
||||||
|
" model=\"gpt-4-32k\", # make sure the endpoint you use supports the model\n",
|
||||||
|
")\n",
|
||||||
|
"# create a UserProxyAgent instance named \"user\"\n",
|
||||||
|
"user = UserProxyAgent(\n",
|
||||||
|
" name=\"user\",\n",
|
||||||
|
" human_input_mode=\"TERMINATE\",\n",
|
||||||
|
" max_consecutive_auto_reply=10,\n",
|
||||||
|
" is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\"),\n",
|
||||||
|
" work_dir='web',\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"attachments": {},
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Perform a task\n",
|
||||||
|
"\n",
|
||||||
|
"We invoke the `receive()` method of the coding agent to start the conversation. When you run the cell below, you will be prompted to provide feedback after receving a message from the coding agent. If you don't provide any feedback (by pressing Enter directly), the user proxy agent will try to execute the code suggested by the coding agent on behalf of you, or terminate if the coding agent sends a \"TERMINATE\" signal in the end of the message."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 4,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"user (to assistant):\n",
|
||||||
|
"\n",
|
||||||
|
"Who should read this paper: https://arxiv.org/abs/2306.01337\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"assistant (to user):\n",
|
||||||
|
"To determine who should read the paper, I will fetch and analyze the abstract of the paper.\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"import requests\n",
|
||||||
|
"from bs4 import BeautifulSoup\n",
|
||||||
|
"\n",
|
||||||
|
"def get_arxiv_abstract(url):\n",
|
||||||
|
" response = requests.get(url)\n",
|
||||||
|
" soup = BeautifulSoup(response.text, 'html.parser')\n",
|
||||||
|
" abstract = soup.find('blockquote', {'class': 'abstract'}).text.strip()\n",
|
||||||
|
" return abstract.replace(\"Abstract: \", \"\")\n",
|
||||||
|
"\n",
|
||||||
|
"url = \"https://arxiv.org/abs/2306.01337\"\n",
|
||||||
|
"abstract = get_arxiv_abstract(url)\n",
|
||||||
|
"print(abstract)\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Please run this Python code to fetch and display the abstract of the paper. Based on the abstract, we can figure out who should read the paper.\n",
|
||||||
|
"\n",
|
||||||
|
">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"user (to assistant):\n",
|
||||||
|
"exitcode: 0 (execution succeeded)\n",
|
||||||
|
"Code output: \n",
|
||||||
|
" Employing Large Language Models (LLMs) to address mathematical problems is an\n",
|
||||||
|
"intriguing research endeavor, considering the abundance of math problems\n",
|
||||||
|
"expressed in natural language across numerous science and engineering fields.\n",
|
||||||
|
"While several prior works have investigated solving elementary mathematics\n",
|
||||||
|
"using LLMs, this work explores the frontier of using GPT-4 for solving more\n",
|
||||||
|
"complex and challenging math problems. We evaluate various ways of using GPT-4.\n",
|
||||||
|
"Some of them are adapted from existing work, and one is \\MathChat, a\n",
|
||||||
|
"conversational problem-solving framework newly proposed in this work. We\n",
|
||||||
|
"perform the evaluation on difficult high school competition problems from the\n",
|
||||||
|
"MATH dataset, which shows the advantage of the proposed conversational\n",
|
||||||
|
"approach.\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
" -------------------------------------------------------------------------------- \n",
|
||||||
|
"\n",
|
||||||
|
"assistant (to user):\n",
|
||||||
|
"Based on the abstract, the following people may be interested in reading the paper:\n",
|
||||||
|
"\n",
|
||||||
|
"1. Researchers and practitioners working on large language models (LLMs)\n",
|
||||||
|
"2. Artificial intelligence (AI) and natural language processing (NLP) researchers exploring the application of LLMs in solving mathematical problems\n",
|
||||||
|
"3. Educators, mathematicians, and researchers studying advanced mathematical problem-solving techniques\n",
|
||||||
|
"4. Individuals working on conversational AI for math tutoring or educational purposes\n",
|
||||||
|
"5. Anyone interested in the development and improvement of models like GPT-4 for complex problem-solving\n",
|
||||||
|
"\n",
|
||||||
|
"If you belong to any of these categories or have an interest in these topics, you should consider reading the paper.\n",
|
||||||
|
"\n",
|
||||||
|
"TERMINATE\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"# the assistant receives a message from the user, which contains the task description\n",
|
||||||
|
"assistant.receive(\n",
|
||||||
|
" \"\"\"\n",
|
||||||
|
"Who should read this paper: https://arxiv.org/abs/2306.01337\n",
|
||||||
|
"\"\"\",\n",
|
||||||
|
" user\n",
|
||||||
|
")"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.9.15"
|
||||||
|
},
|
||||||
|
"vscode": {
|
||||||
|
"interpreter": {
|
||||||
|
"hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"widgets": {
|
||||||
|
"application/vnd.jupyter.widget-state+json": {
|
||||||
|
"state": {
|
||||||
|
"2d910cfd2d2a4fc49fc30fbbdc5576a7": {
|
||||||
|
"model_module": "@jupyter-widgets/base",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "LayoutModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/base",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "LayoutModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "LayoutView",
|
||||||
|
"align_content": null,
|
||||||
|
"align_items": null,
|
||||||
|
"align_self": null,
|
||||||
|
"border_bottom": null,
|
||||||
|
"border_left": null,
|
||||||
|
"border_right": null,
|
||||||
|
"border_top": null,
|
||||||
|
"bottom": null,
|
||||||
|
"display": null,
|
||||||
|
"flex": null,
|
||||||
|
"flex_flow": null,
|
||||||
|
"grid_area": null,
|
||||||
|
"grid_auto_columns": null,
|
||||||
|
"grid_auto_flow": null,
|
||||||
|
"grid_auto_rows": null,
|
||||||
|
"grid_column": null,
|
||||||
|
"grid_gap": null,
|
||||||
|
"grid_row": null,
|
||||||
|
"grid_template_areas": null,
|
||||||
|
"grid_template_columns": null,
|
||||||
|
"grid_template_rows": null,
|
||||||
|
"height": null,
|
||||||
|
"justify_content": null,
|
||||||
|
"justify_items": null,
|
||||||
|
"left": null,
|
||||||
|
"margin": null,
|
||||||
|
"max_height": null,
|
||||||
|
"max_width": null,
|
||||||
|
"min_height": null,
|
||||||
|
"min_width": null,
|
||||||
|
"object_fit": null,
|
||||||
|
"object_position": null,
|
||||||
|
"order": null,
|
||||||
|
"overflow": null,
|
||||||
|
"padding": null,
|
||||||
|
"right": null,
|
||||||
|
"top": null,
|
||||||
|
"visibility": null,
|
||||||
|
"width": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"454146d0f7224f038689031002906e6f": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "HBoxModel",
|
||||||
|
"state": {
|
||||||
|
"_dom_classes": [],
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "HBoxModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/controls",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "HBoxView",
|
||||||
|
"box_style": "",
|
||||||
|
"children": [
|
||||||
|
"IPY_MODEL_e4ae2b6f5a974fd4bafb6abb9d12ff26",
|
||||||
|
"IPY_MODEL_577e1e3cc4db4942b0883577b3b52755",
|
||||||
|
"IPY_MODEL_b40bdfb1ac1d4cffb7cefcb870c64d45"
|
||||||
|
],
|
||||||
|
"layout": "IPY_MODEL_dc83c7bff2f241309537a8119dfc7555",
|
||||||
|
"tabbable": null,
|
||||||
|
"tooltip": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"577e1e3cc4db4942b0883577b3b52755": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "FloatProgressModel",
|
||||||
|
"state": {
|
||||||
|
"_dom_classes": [],
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "FloatProgressModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/controls",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "ProgressView",
|
||||||
|
"bar_style": "success",
|
||||||
|
"description": "",
|
||||||
|
"description_allow_html": false,
|
||||||
|
"layout": "IPY_MODEL_2d910cfd2d2a4fc49fc30fbbdc5576a7",
|
||||||
|
"max": 1,
|
||||||
|
"min": 0,
|
||||||
|
"orientation": "horizontal",
|
||||||
|
"style": "IPY_MODEL_74a6ba0c3cbc4051be0a83e152fe1e62",
|
||||||
|
"tabbable": null,
|
||||||
|
"tooltip": null,
|
||||||
|
"value": 1
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"6086462a12d54bafa59d3c4566f06cb2": {
|
||||||
|
"model_module": "@jupyter-widgets/base",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "LayoutModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/base",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "LayoutModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "LayoutView",
|
||||||
|
"align_content": null,
|
||||||
|
"align_items": null,
|
||||||
|
"align_self": null,
|
||||||
|
"border_bottom": null,
|
||||||
|
"border_left": null,
|
||||||
|
"border_right": null,
|
||||||
|
"border_top": null,
|
||||||
|
"bottom": null,
|
||||||
|
"display": null,
|
||||||
|
"flex": null,
|
||||||
|
"flex_flow": null,
|
||||||
|
"grid_area": null,
|
||||||
|
"grid_auto_columns": null,
|
||||||
|
"grid_auto_flow": null,
|
||||||
|
"grid_auto_rows": null,
|
||||||
|
"grid_column": null,
|
||||||
|
"grid_gap": null,
|
||||||
|
"grid_row": null,
|
||||||
|
"grid_template_areas": null,
|
||||||
|
"grid_template_columns": null,
|
||||||
|
"grid_template_rows": null,
|
||||||
|
"height": null,
|
||||||
|
"justify_content": null,
|
||||||
|
"justify_items": null,
|
||||||
|
"left": null,
|
||||||
|
"margin": null,
|
||||||
|
"max_height": null,
|
||||||
|
"max_width": null,
|
||||||
|
"min_height": null,
|
||||||
|
"min_width": null,
|
||||||
|
"object_fit": null,
|
||||||
|
"object_position": null,
|
||||||
|
"order": null,
|
||||||
|
"overflow": null,
|
||||||
|
"padding": null,
|
||||||
|
"right": null,
|
||||||
|
"top": null,
|
||||||
|
"visibility": null,
|
||||||
|
"width": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"74a6ba0c3cbc4051be0a83e152fe1e62": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "ProgressStyleModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "ProgressStyleModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "StyleView",
|
||||||
|
"bar_color": null,
|
||||||
|
"description_width": ""
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"7d3f3d9e15894d05a4d188ff4f466554": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "HTMLStyleModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "HTMLStyleModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "StyleView",
|
||||||
|
"background": null,
|
||||||
|
"description_width": "",
|
||||||
|
"font_size": null,
|
||||||
|
"text_color": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"b40bdfb1ac1d4cffb7cefcb870c64d45": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "HTMLModel",
|
||||||
|
"state": {
|
||||||
|
"_dom_classes": [],
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "HTMLModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/controls",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "HTMLView",
|
||||||
|
"description": "",
|
||||||
|
"description_allow_html": false,
|
||||||
|
"layout": "IPY_MODEL_f1355871cc6f4dd4b50d9df5af20e5c8",
|
||||||
|
"placeholder": "",
|
||||||
|
"style": "IPY_MODEL_ca245376fd9f4354af6b2befe4af4466",
|
||||||
|
"tabbable": null,
|
||||||
|
"tooltip": null,
|
||||||
|
"value": " 1/1 [00:00<00:00, 44.69it/s]"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"ca245376fd9f4354af6b2befe4af4466": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "HTMLStyleModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "HTMLStyleModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "StyleView",
|
||||||
|
"background": null,
|
||||||
|
"description_width": "",
|
||||||
|
"font_size": null,
|
||||||
|
"text_color": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"dc83c7bff2f241309537a8119dfc7555": {
|
||||||
|
"model_module": "@jupyter-widgets/base",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "LayoutModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/base",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "LayoutModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "LayoutView",
|
||||||
|
"align_content": null,
|
||||||
|
"align_items": null,
|
||||||
|
"align_self": null,
|
||||||
|
"border_bottom": null,
|
||||||
|
"border_left": null,
|
||||||
|
"border_right": null,
|
||||||
|
"border_top": null,
|
||||||
|
"bottom": null,
|
||||||
|
"display": null,
|
||||||
|
"flex": null,
|
||||||
|
"flex_flow": null,
|
||||||
|
"grid_area": null,
|
||||||
|
"grid_auto_columns": null,
|
||||||
|
"grid_auto_flow": null,
|
||||||
|
"grid_auto_rows": null,
|
||||||
|
"grid_column": null,
|
||||||
|
"grid_gap": null,
|
||||||
|
"grid_row": null,
|
||||||
|
"grid_template_areas": null,
|
||||||
|
"grid_template_columns": null,
|
||||||
|
"grid_template_rows": null,
|
||||||
|
"height": null,
|
||||||
|
"justify_content": null,
|
||||||
|
"justify_items": null,
|
||||||
|
"left": null,
|
||||||
|
"margin": null,
|
||||||
|
"max_height": null,
|
||||||
|
"max_width": null,
|
||||||
|
"min_height": null,
|
||||||
|
"min_width": null,
|
||||||
|
"object_fit": null,
|
||||||
|
"object_position": null,
|
||||||
|
"order": null,
|
||||||
|
"overflow": null,
|
||||||
|
"padding": null,
|
||||||
|
"right": null,
|
||||||
|
"top": null,
|
||||||
|
"visibility": null,
|
||||||
|
"width": null
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"e4ae2b6f5a974fd4bafb6abb9d12ff26": {
|
||||||
|
"model_module": "@jupyter-widgets/controls",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "HTMLModel",
|
||||||
|
"state": {
|
||||||
|
"_dom_classes": [],
|
||||||
|
"_model_module": "@jupyter-widgets/controls",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "HTMLModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/controls",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "HTMLView",
|
||||||
|
"description": "",
|
||||||
|
"description_allow_html": false,
|
||||||
|
"layout": "IPY_MODEL_6086462a12d54bafa59d3c4566f06cb2",
|
||||||
|
"placeholder": "",
|
||||||
|
"style": "IPY_MODEL_7d3f3d9e15894d05a4d188ff4f466554",
|
||||||
|
"tabbable": null,
|
||||||
|
"tooltip": null,
|
||||||
|
"value": "100%"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"f1355871cc6f4dd4b50d9df5af20e5c8": {
|
||||||
|
"model_module": "@jupyter-widgets/base",
|
||||||
|
"model_module_version": "2.0.0",
|
||||||
|
"model_name": "LayoutModel",
|
||||||
|
"state": {
|
||||||
|
"_model_module": "@jupyter-widgets/base",
|
||||||
|
"_model_module_version": "2.0.0",
|
||||||
|
"_model_name": "LayoutModel",
|
||||||
|
"_view_count": null,
|
||||||
|
"_view_module": "@jupyter-widgets/base",
|
||||||
|
"_view_module_version": "2.0.0",
|
||||||
|
"_view_name": "LayoutView",
|
||||||
|
"align_content": null,
|
||||||
|
"align_items": null,
|
||||||
|
"align_self": null,
|
||||||
|
"border_bottom": null,
|
||||||
|
"border_left": null,
|
||||||
|
"border_right": null,
|
||||||
|
"border_top": null,
|
||||||
|
"bottom": null,
|
||||||
|
"display": null,
|
||||||
|
"flex": null,
|
||||||
|
"flex_flow": null,
|
||||||
|
"grid_area": null,
|
||||||
|
"grid_auto_columns": null,
|
||||||
|
"grid_auto_flow": null,
|
||||||
|
"grid_auto_rows": null,
|
||||||
|
"grid_column": null,
|
||||||
|
"grid_gap": null,
|
||||||
|
"grid_row": null,
|
||||||
|
"grid_template_areas": null,
|
||||||
|
"grid_template_columns": null,
|
||||||
|
"grid_template_rows": null,
|
||||||
|
"height": null,
|
||||||
|
"justify_content": null,
|
||||||
|
"justify_items": null,
|
||||||
|
"left": null,
|
||||||
|
"margin": null,
|
||||||
|
"max_height": null,
|
||||||
|
"max_width": null,
|
||||||
|
"min_height": null,
|
||||||
|
"min_width": null,
|
||||||
|
"object_fit": null,
|
||||||
|
"object_position": null,
|
||||||
|
"order": null,
|
||||||
|
"overflow": null,
|
||||||
|
"padding": null,
|
||||||
|
"right": null,
|
||||||
|
"top": null,
|
||||||
|
"visibility": null,
|
||||||
|
"width": null
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"version_major": 2,
|
||||||
|
"version_minor": 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
|
@ -25,7 +25,9 @@
|
||||||
"\n",
|
"\n",
|
||||||
"FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n",
|
"FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook, we tune OpenAI ChatGPT (both GPT-3.5 and GPT-4) models for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning. \n",
|
"In this notebook, we tune OpenAI ChatGPT (both GPT-3.5 and GPT-4) models for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning.\n",
|
||||||
|
"\n",
|
||||||
|
"Related link: [Blogpost](https://microsoft.github.io/FLAML/blog/2023/04/21/LLM-tuning-math) based on this experiment.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Requirements\n",
|
"## Requirements\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
@ -93,7 +95,7 @@
|
||||||
"- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n",
|
"- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n",
|
||||||
"- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n",
|
"- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"It's OK to have only the OpenAI API key, or only the Azure Open API key + base.\n"
|
"It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
|
@ -5,7 +5,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
"<a href=\"https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
|
@ -37,10 +37,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook, we use one real data example (binary classification) to showcase how to use FLAML library.\n",
|
"In this notebook, we use one real data example (binary classification) to showcase how to use FLAML library.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:\n",
|
"FLAML requires `Python>=3.7`. To run this notebook example, please install the following packages."
|
||||||
"```bash\n",
|
|
||||||
"pip install flaml[notebook]==1.1.3\n",
|
|
||||||
"```"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -420,6 +417,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -617,6 +615,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1047,6 +1046,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1323,6 +1323,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1450,6 +1451,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
@ -1457,6 +1459,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
@ -1574,6 +1577,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
@ -1768,6 +1772,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1779,6 +1784,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1792,6 +1798,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -1964,6 +1971,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
|
@ -2184,6 +2192,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
@ -2244,6 +2253,7 @@
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
"attachments": {},
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
|
|
@@ -1,831 +0,0 @@
-{
- "cells": [
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# AutoML with FLAML Library for synapseML models and spark dataframes\n",
-    "\n",
-    "\n",
-    "## 1. Introduction\n",
-    "\n",
-    "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
-    "with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n",
-    "to use and extend, such as adding new learners. FLAML can \n",
-    "- serve as an economical AutoML engine,\n",
-    "- be used as a fast hyperparameter tuning tool, or \n",
-    "- be embedded in self-tuning software that requires low latency & resource in repetitive\n",
-    "   tuning tasks.\n",
-    "\n",
-    "In this notebook, we demonstrate how to use FLAML library to do AutoML for synapseML models and spark dataframes. We also compare the results between FLAML AutoML and default SynapseML. \n",
-    "In this example, we use LightGBM to build a classification model in order to predict bankruptcy.\n",
-    "\n",
-    "Since the dataset is unbalanced, `AUC` is a better metric than `Accuracy`. FLAML (1 min of training) achieved AUC **0.79**, the default SynapseML model only got AUC **0.64**. \n",
-    "\n",
-    "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `synapse` option:\n",
-    "```bash\n",
-    "pip install flaml[synapse] \n",
-    "```\n",
-    " "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# %pip install \"flaml[synapse]\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 2. Load data and preprocess"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ":: loading settings :: url = jar:file:/datadrive/spark/spark33/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "Ivy Default Cache set to: /home/lijiang1/.ivy2/cache\n",
-      "The jars for the packages stored in: /home/lijiang1/.ivy2/jars\n",
-      "com.microsoft.azure#synapseml_2.12 added as a dependency\n",
-      "org.apache.hadoop#hadoop-azure added as a dependency\n",
-      "com.microsoft.azure#azure-storage added as a dependency\n",
-      ":: resolving dependencies :: org.apache.spark#spark-submit-parent-bfb2447b-61c5-4941-bf9b-0548472077eb;1.0\n",
-      "\tconfs: [default]\n",
-      "\tfound com.microsoft.azure#synapseml_2.12;0.10.2 in central\n",
-      "\tfound com.microsoft.azure#synapseml-core_2.12;0.10.2 in central\n",
-      "\tfound org.scalactic#scalactic_2.12;3.2.14 in local-m2-cache\n",
-      "\tfound org.scala-lang#scala-reflect;2.12.15 in central\n",
-      "\tfound io.spray#spray-json_2.12;1.3.5 in central\n",
-      "\tfound com.jcraft#jsch;0.1.54 in central\n",
-      "\tfound org.apache.httpcomponents.client5#httpclient5;5.1.3 in central\n",
-      "\tfound org.apache.httpcomponents.core5#httpcore5;5.1.3 in central\n",
-      "\tfound org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 in central\n",
-      "\tfound org.slf4j#slf4j-api;1.7.25 in local-m2-cache\n",
-      "\tfound commons-codec#commons-codec;1.15 in local-m2-cache\n",
-      "\tfound org.apache.httpcomponents#httpmime;4.5.13 in local-m2-cache\n",
-      "\tfound org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache\n",
-      "\tfound org.apache.httpcomponents#httpcore;4.4.13 in central\n",
-      "\tfound commons-logging#commons-logging;1.2 in central\n",
-      "\tfound com.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 in central\n",
-      "\tfound com.chuusai#shapeless_2.12;2.3.2 in central\n",
-      "\tfound org.typelevel#macro-compat_2.12;1.1.1 in central\n",
-      "\tfound org.apache.spark#spark-avro_2.12;3.2.0 in central\n",
-      "\tfound org.tukaani#xz;1.8 in central\n",
-      "\tfound org.spark-project.spark#unused;1.0.0 in central\n",
-      "\tfound org.testng#testng;6.8.8 in central\n",
-      "\tfound org.beanshell#bsh;2.0b4 in central\n",
-      "\tfound com.beust#jcommander;1.27 in central\n",
-      "\tfound com.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 in central\n",
-      "\tfound com.microsoft.azure#synapseml-opencv_2.12;0.10.2 in central\n",
-      "\tfound org.openpnp#opencv;3.2.0-1 in central\n",
-      "\tfound com.microsoft.azure#onnx-protobuf_2.12;0.9.1 in central\n",
-      "\tfound com.microsoft.cntk#cntk;2.4 in central\n",
-      "\tfound com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 in central\n",
-      "\tfound com.microsoft.azure#synapseml-cognitive_2.12;0.10.2 in central\n",
-      "\tfound com.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 in central\n",
-      "\tfound com.microsoft.azure#synapseml-vw_2.12;0.10.2 in central\n",
-      "\tfound com.github.vowpalwabbit#vw-jni;8.9.1 in central\n",
-      "\tfound com.microsoft.azure#synapseml-lightgbm_2.12;0.10.2 in central\n",
-      "\tfound com.microsoft.ml.lightgbm#lightgbmlib;3.2.110 in central\n",
-      "\tfound org.apache.hadoop#hadoop-azure;3.3.1 in central\n",
-      "\tfound org.apache.hadoop.thirdparty#hadoop-shaded-guava;1.1.1 in local-m2-cache\n",
-      "\tfound org.eclipse.jetty#jetty-util-ajax;9.4.40.v20210413 in central\n",
-      "\tfound org.eclipse.jetty#jetty-util;9.4.40.v20210413 in central\n",
-      "\tfound org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache\n",
-      "\tfound org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache\n",
-      "\tfound org.wildfly.openssl#wildfly-openssl;1.0.7.Final in local-m2-cache\n",
-      "\tfound com.microsoft.azure#azure-storage;8.6.6 in central\n",
-      "\tfound com.fasterxml.jackson.core#jackson-core;2.9.4 in central\n",
-      "\tfound org.apache.commons#commons-lang3;3.4 in local-m2-cache\n",
-      "\tfound com.microsoft.azure#azure-keyvault-core;1.2.4 in central\n",
-      "\tfound com.google.guava#guava;24.1.1-jre in central\n",
-      "\tfound com.google.code.findbugs#jsr305;1.3.9 in central\n",
-      "\tfound org.checkerframework#checker-compat-qual;2.0.0 in central\n",
-      "\tfound com.google.errorprone#error_prone_annotations;2.1.3 in central\n",
-      "\tfound com.google.j2objc#j2objc-annotations;1.1 in central\n",
-      "\tfound org.codehaus.mojo#animal-sniffer-annotations;1.14 in central\n",
-      ":: resolution report :: resolve 992ms :: artifacts dl 77ms\n",
-      "\t:: modules in use:\n",
-      "\tcom.beust#jcommander;1.27 from central in [default]\n",
-      "\tcom.chuusai#shapeless_2.12;2.3.2 from central in [default]\n",
-      "\tcom.fasterxml.jackson.core#jackson-core;2.9.4 from central in [default]\n",
-      "\tcom.github.vowpalwabbit#vw-jni;8.9.1 from central in [default]\n",
-      "\tcom.google.code.findbugs#jsr305;1.3.9 from central in [default]\n",
-      "\tcom.google.errorprone#error_prone_annotations;2.1.3 from central in [default]\n",
-      "\tcom.google.guava#guava;24.1.1-jre from central in [default]\n",
-      "\tcom.google.j2objc#j2objc-annotations;1.1 from central in [default]\n",
-      "\tcom.jcraft#jsch;0.1.54 from central in [default]\n",
-      "\tcom.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 from central in [default]\n",
-      "\tcom.microsoft.azure#azure-keyvault-core;1.2.4 from central in [default]\n",
-      "\tcom.microsoft.azure#azure-storage;8.6.6 from central in [default]\n",
-      "\tcom.microsoft.azure#onnx-protobuf_2.12;0.9.1 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-cognitive_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-core_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-lightgbm_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-opencv_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml-vw_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.azure#synapseml_2.12;0.10.2 from central in [default]\n",
-      "\tcom.microsoft.cntk#cntk;2.4 from central in [default]\n",
-      "\tcom.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 from central in [default]\n",
-      "\tcom.microsoft.ml.lightgbm#lightgbmlib;3.2.110 from central in [default]\n",
-      "\tcom.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 from central in [default]\n",
-      "\tcommons-codec#commons-codec;1.15 from local-m2-cache in [default]\n",
-      "\tcommons-logging#commons-logging;1.2 from central in [default]\n",
-      "\tio.spray#spray-json_2.12;1.3.5 from central in [default]\n",
-      "\torg.apache.commons#commons-lang3;3.4 from local-m2-cache in [default]\n",
-      "\torg.apache.hadoop#hadoop-azure;3.3.1 from central in [default]\n",
-      "\torg.apache.hadoop.thirdparty#hadoop-shaded-guava;1.1.1 from local-m2-cache in [default]\n",
-      "\torg.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]\n",
-      "\torg.apache.httpcomponents#httpcore;4.4.13 from central in [default]\n",
-      "\torg.apache.httpcomponents#httpmime;4.5.13 from local-m2-cache in [default]\n",
-      "\torg.apache.httpcomponents.client5#httpclient5;5.1.3 from central in [default]\n",
-      "\torg.apache.httpcomponents.core5#httpcore5;5.1.3 from central in [default]\n",
-      "\torg.apache.httpcomponents.core5#httpcore5-h2;5.1.3 from central in [default]\n",
-      "\torg.apache.spark#spark-avro_2.12;3.2.0 from central in [default]\n",
-      "\torg.beanshell#bsh;2.0b4 from central in [default]\n",
-      "\torg.checkerframework#checker-compat-qual;2.0.0 from central in [default]\n",
-      "\torg.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]\n",
-      "\torg.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]\n",
-      "\torg.codehaus.mojo#animal-sniffer-annotations;1.14 from central in [default]\n",
-      "\torg.eclipse.jetty#jetty-util;9.4.40.v20210413 from central in [default]\n",
-      "\torg.eclipse.jetty#jetty-util-ajax;9.4.40.v20210413 from central in [default]\n",
-      "\torg.openpnp#opencv;3.2.0-1 from central in [default]\n",
-      "\torg.scala-lang#scala-reflect;2.12.15 from central in [default]\n",
-      "\torg.scalactic#scalactic_2.12;3.2.14 from local-m2-cache in [default]\n",
-      "\torg.slf4j#slf4j-api;1.7.25 from local-m2-cache in [default]\n",
-      "\torg.spark-project.spark#unused;1.0.0 from central in [default]\n",
-      "\torg.testng#testng;6.8.8 from central in [default]\n",
-      "\torg.tukaani#xz;1.8 from central in [default]\n",
-      "\torg.typelevel#macro-compat_2.12;1.1.1 from central in [default]\n",
-      "\torg.wildfly.openssl#wildfly-openssl;1.0.7.Final from local-m2-cache in [default]\n",
-      "\t:: evicted modules:\n",
-      "\tcommons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]\n",
-      "\tcom.microsoft.azure#azure-storage;7.0.1 by [com.microsoft.azure#azure-storage;8.6.6] in [default]\n",
-      "\torg.slf4j#slf4j-api;1.7.12 by [org.slf4j#slf4j-api;1.7.25] in [default]\n",
-      "\torg.apache.commons#commons-lang3;3.8.1 by [org.apache.commons#commons-lang3;3.4] in [default]\n",
-      "\t---------------------------------------------------------------------\n",
-      "\t|                  |            modules            ||   artifacts   |\n",
-      "\t|       conf       | number| search|dwnlded|evicted|| number|dwnlded|\n",
-      "\t---------------------------------------------------------------------\n",
-      "\t|      default     |   57  |   0   |   0   |   4   ||   53  |   0   |\n",
-      "\t---------------------------------------------------------------------\n",
-      ":: retrieving :: org.apache.spark#spark-submit-parent-bfb2447b-61c5-4941-bf9b-0548472077eb\n",
-      "\tconfs: [default]\n",
-      "\t0 artifacts copied, 53 already retrieved (0kB/20ms)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "23/02/28 02:12:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "Setting default log level to \"WARN\".\n",
-      "To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n"
-     ]
-    }
-   ],
-   "source": [
-    "import pyspark\n",
-    "\n",
-    "spark = (\n",
-    "    pyspark.sql.SparkSession.builder.appName(\"MyApp\")\n",
-    "    .config(\n",
-    "        \"spark.jars.packages\",\n",
-    "        f\"com.microsoft.azure:synapseml_2.12:0.10.2,org.apache.hadoop:hadoop-azure:{pyspark.__version__},com.microsoft.azure:azure-storage:8.6.6\",\n",
-    "    )\n",
-    "    .config(\"spark.sql.debug.maxToStringFields\", \"100\")\n",
-    "    .getOrCreate()\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "23/02/28 02:12:32 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties\n",
-      "records read: 6819\n",
-      "Schema: \n",
-      "root\n",
-      " |-- Bankrupt?: integer (nullable = true)\n",
-      " |-- ROA(C) before interest and depreciation before interest: double (nullable = true)\n",
-      " |-- ROA(A) before interest and % after tax: double (nullable = true)\n",
-      " |-- ROA(B) before interest and depreciation after tax: double (nullable = true)\n",
-      " |-- Operating Gross Margin: double (nullable = true)\n",
-      " |-- Realized Sales Gross Margin: double (nullable = true)\n",
-      " |-- Operating Profit Rate: double (nullable = true)\n",
-      " |-- Pre-tax net Interest Rate: double (nullable = true)\n",
-      " |-- After-tax net Interest Rate: double (nullable = true)\n",
-      " |-- Non-industry income and expenditure/revenue: double (nullable = true)\n",
-      " |-- Continuous interest rate (after tax): double (nullable = true)\n",
-      " |-- Operating Expense Rate: double (nullable = true)\n",
-      " |-- Research and development expense rate: double (nullable = true)\n",
-      " |-- Cash flow rate: double (nullable = true)\n",
-      " |-- Interest-bearing debt interest rate: double (nullable = true)\n",
-      " |-- Tax rate (A): double (nullable = true)\n",
-      " |-- Net Value Per Share (B): double (nullable = true)\n",
-      " |-- Net Value Per Share (A): double (nullable = true)\n",
-      " |-- Net Value Per Share (C): double (nullable = true)\n",
-      " |-- Persistent EPS in the Last Four Seasons: double (nullable = true)\n",
-      " |-- Cash Flow Per Share: double (nullable = true)\n",
-      " |-- Revenue Per Share (Yuan ??): double (nullable = true)\n",
-      " |-- Operating Profit Per Share (Yuan ??): double (nullable = true)\n",
-      " |-- Per Share Net profit before tax (Yuan ??): double (nullable = true)\n",
-      " |-- Realized Sales Gross Profit Growth Rate: double (nullable = true)\n",
-      " |-- Operating Profit Growth Rate: double (nullable = true)\n",
-      " |-- After-tax Net Profit Growth Rate: double (nullable = true)\n",
-      " |-- Regular Net Profit Growth Rate: double (nullable = true)\n",
-      " |-- Continuous Net Profit Growth Rate: double (nullable = true)\n",
-      " |-- Total Asset Growth Rate: double (nullable = true)\n",
-      " |-- Net Value Growth Rate: double (nullable = true)\n",
-      " |-- Total Asset Return Growth Rate Ratio: double (nullable = true)\n",
-      " |-- Cash Reinvestment %: double (nullable = true)\n",
-      " |-- Current Ratio: double (nullable = true)\n",
-      " |-- Quick Ratio: double (nullable = true)\n",
-      " |-- Interest Expense Ratio: double (nullable = true)\n",
-      " |-- Total debt/Total net worth: double (nullable = true)\n",
-      " |-- Debt ratio %: double (nullable = true)\n",
-      " |-- Net worth/Assets: double (nullable = true)\n",
-      " |-- Long-term fund suitability ratio (A): double (nullable = true)\n",
-      " |-- Borrowing dependency: double (nullable = true)\n",
-      " |-- Contingent liabilities/Net worth: double (nullable = true)\n",
-      " |-- Operating profit/Paid-in capital: double (nullable = true)\n",
-      " |-- Net profit before tax/Paid-in capital: double (nullable = true)\n",
-      " |-- Inventory and accounts receivable/Net value: double (nullable = true)\n",
-      " |-- Total Asset Turnover: double (nullable = true)\n",
-      " |-- Accounts Receivable Turnover: double (nullable = true)\n",
-      " |-- Average Collection Days: double (nullable = true)\n",
-      " |-- Inventory Turnover Rate (times): double (nullable = true)\n",
-      " |-- Fixed Assets Turnover Frequency: double (nullable = true)\n",
-      " |-- Net Worth Turnover Rate (times): double (nullable = true)\n",
-      " |-- Revenue per person: double (nullable = true)\n",
-      " |-- Operating profit per person: double (nullable = true)\n",
-      " |-- Allocation rate per person: double (nullable = true)\n",
-      " |-- Working Capital to Total Assets: double (nullable = true)\n",
-      " |-- Quick Assets/Total Assets: double (nullable = true)\n",
-      " |-- Current Assets/Total Assets: double (nullable = true)\n",
-      " |-- Cash/Total Assets: double (nullable = true)\n",
-      " |-- Quick Assets/Current Liability: double (nullable = true)\n",
-      " |-- Cash/Current Liability: double (nullable = true)\n",
-      " |-- Current Liability to Assets: double (nullable = true)\n",
-      " |-- Operating Funds to Liability: double (nullable = true)\n",
-      " |-- Inventory/Working Capital: double (nullable = true)\n",
-      " |-- Inventory/Current Liability: double (nullable = true)\n",
-      " |-- Current Liabilities/Liability: double (nullable = true)\n",
-      " |-- Working Capital/Equity: double (nullable = true)\n",
-      " |-- Current Liabilities/Equity: double (nullable = true)\n",
-      " |-- Long-term Liability to Current Assets: double (nullable = true)\n",
-      " |-- Retained Earnings to Total Assets: double (nullable = true)\n",
-      " |-- Total income/Total expense: double (nullable = true)\n",
-      " |-- Total expense/Assets: double (nullable = true)\n",
-      " |-- Current Asset Turnover Rate: double (nullable = true)\n",
-      " |-- Quick Asset Turnover Rate: double (nullable = true)\n",
-      " |-- Working capitcal Turnover Rate: double (nullable = true)\n",
-      " |-- Cash Turnover Rate: double (nullable = true)\n",
-      " |-- Cash Flow to Sales: double (nullable = true)\n",
-      " |-- Fixed Assets to Assets: double (nullable = true)\n",
-      " |-- Current Liability to Liability: double (nullable = true)\n",
-      " |-- Current Liability to Equity: double (nullable = true)\n",
-      " |-- Equity to Long-term Liability: double (nullable = true)\n",
-      " |-- Cash Flow to Total Assets: double (nullable = true)\n",
-      " |-- Cash Flow to Liability: double (nullable = true)\n",
-      " |-- CFO to Assets: double (nullable = true)\n",
-      " |-- Cash Flow to Equity: double (nullable = true)\n",
-      " |-- Current Liability to Current Assets: double (nullable = true)\n",
-      " |-- Liability-Assets Flag: double (nullable = true)\n",
-      " |-- Net Income to Total Assets: double (nullable = true)\n",
-      " |-- Total assets to GNP price: double (nullable = true)\n",
-      " |-- No-credit Interval: double (nullable = true)\n",
-      " |-- Gross Profit to Sales: double (nullable = true)\n",
-      " |-- Net Income to Stockholder's Equity: double (nullable = true)\n",
-      " |-- Liability to Equity: double (nullable = true)\n",
-      " |-- Degree of Financial Leverage (DFL): double (nullable = true)\n",
-      " |-- Interest Coverage Ratio (Interest expense to EBIT): double (nullable = true)\n",
-      " |-- Net Income Flag: double (nullable = true)\n",
-      " |-- Equity to Liability: double (nullable = true)\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "df = (\n",
-    "    spark.read.format(\"csv\")\n",
-    "    .option(\"header\", True)\n",
-    "    .option(\"inferSchema\", True)\n",
-    "    .load(\n",
-    "        \"wasbs://publicwasb@mmlspark.blob.core.windows.net/company_bankruptcy_prediction_data.csv\"\n",
-    "    )\n",
-    ")\n",
-    "# print dataset size\n",
-    "print(\"records read: \" + str(df.count()))\n",
-    "print(\"Schema: \")\n",
-    "df.printSchema()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Split the dataset into train and test"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "train, test = df.randomSplit([0.8, 0.2], seed=41)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Add featurizer to convert features to vector"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from pyspark.ml.feature import VectorAssembler\n",
-    "\n",
-    "feature_cols = df.columns[1:]\n",
-    "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
-    "train_data = featurizer.transform(train)[\"Bankrupt?\", \"features\"]\n",
-    "test_data = featurizer.transform(test)[\"Bankrupt?\", \"features\"]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Default SynapseML LightGBM"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "23/02/28 02:12:42 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.\n",
-      "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "                                                                                \r"
-     ]
-    }
-   ],
-   "source": [
-    "from synapse.ml.lightgbm import LightGBMClassifier\n",
-    "\n",
-    "model = LightGBMClassifier(\n",
-    "    objective=\"binary\", featuresCol=\"features\", labelCol=\"Bankrupt?\", isUnbalance=True\n",
-    ")\n",
-    "\n",
-    "model = model.fit(train_data)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Model Prediction"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "DataFrame[evaluation_type: string, confusion_matrix: matrix, accuracy: double, precision: double, recall: double, AUC: double]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "[Stage 27:>                                                         (0 + 1) / 1]\r"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n",
-      "|evaluation_type|    confusion_matrix|         accuracy|         precision|             recall|               AUC|\n",
-      "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n",
-      "| Classification|1250.0  23.0  \\n3...|0.958997722095672|0.3611111111111111|0.29545454545454547|0.6386934942512319|\n",
-      "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n",
-      "\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "                                                                                \r"
-     ]
-    }
-   ],
-   "source": [
-    "def predict(model):\n",
-    "    from synapse.ml.train import ComputeModelStatistics\n",
-    "\n",
-    "    predictions = model.transform(test_data)\n",
-    "    # predictions.limit(10).show()\n",
-    "    \n",
-    "    metrics = ComputeModelStatistics(\n",
-    "        evaluationMetric=\"classification\",\n",
-    "        labelCol=\"Bankrupt?\",\n",
-    "        scoredLabelsCol=\"prediction\",\n",
-    "    ).transform(predictions)\n",
-    "    display(metrics)\n",
-    "    return metrics\n",
-    "\n",
-    "default_metrics = predict(model)\n",
-    "default_metrics.show()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Run FLAML\n",
-    "In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "''' import AutoML class from flaml package '''\n",
-    "from flaml import AutoML\n",
-    "from flaml.automl.spark.utils import to_pandas_on_spark\n",
-    "\n",
-    "automl = AutoML()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "settings = {\n",
-    "    \"time_budget\": 30,  # total running time in seconds\n",
-    "    \"metric\": 'roc_auc',\n",
-    "    \"estimator_list\": ['lgbm_spark'],  # list of ML learners; we tune lightgbm in this example\n",
-    "    \"task\": 'classification',  # task type\n",
-    "    \"log_file_name\": 'flaml_experiment.log',  # flaml log file\n",
-    "    \"seed\": 41,  # random seed\n",
-    "    \"force_cancel\": True,  # force stop training once time_budget is used up\n",
-    "}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Disable Arrow optimization to omit below warning:\n",
-    "```\n",
-    "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/conversion.py:87: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:\n",
-    "  Unsupported type in conversion to Arrow: VectorUDT\n",
-    "Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.\n",
-    "  warnings.warn(msg)\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "spark.conf.set(\"spark.sql.execution.arrow.pyspark.enabled\", \"false\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>index</th>\n",
-       "      <th>Bankrupt?</th>\n",
-       "      <th>features</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>[0.0828, 0.0693, 0.0884, 0.6468, 0.6468, 0.997...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>[0.1606, 0.1788, 0.1832, 0.5897, 0.5897, 0.998...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>2</td>\n",
-       "      <td>0</td>\n",
-       "      <td>[0.204, 0.2638, 0.2598, 0.4483, 0.4483, 0.9959...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>[0.217, 0.1881, 0.2451, 0.5992, 0.5992, 0.9962...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>4</td>\n",
-       "      <td>0</td>\n",
-       "      <td>[0.2314, 0.1628, 0.2068, 0.6001, 0.6001, 0.998...</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   index  Bankrupt?                                           features\n",
-       "0      0          0  [0.0828, 0.0693, 0.0884, 0.6468, 0.6468, 0.9971, 0.7958, 0.8078, 0.3047, 0.78, 0.0027, 0.0029, 0.428, 0.0, 0.0, 0.1273, 0.1273, 0.1273, 0.1872, 0.3127, 0.0038, 0.062, 0.1482, 0.022, 0.8478, 0.6893, 0.6893, 0.2176, 0.0, 0.0002, 0.2628, 0.291, 0.0039, 0.0025, 0.6306, 0.0137, 0.1776, 0.8224, 0.005, 0.3696, 0.0054, 0.062, 0.1473, 0.3986, 0.1109, 0.0003, 0.0182, 7150000000.0, 0.0003, 0.0302, 0.0025, 0.3763, 0.0009, 0.6971, 0.262, 0.3948, 0.0918, 0.0025, 0.0027, 0.1828, 0.242, 0.2766, 0.0039, 0.984, 0.7264, 0.3382, 0.0, 0.0, 0.0021, 1.0, 3650000000.0, 2500000000.0, 0.5939, 3060000000.0, 0.6714, 0.4836, 0.984, 0.3382, 0.1109, 0.0, 0.3666, 0.0, 0.1653, 0.072, 0.0, 0.0, 0.0, 0.6237, 0.6468, 0.7483, 0.2847, 0.0268, 0.5652, 1.0, 0.0199]\n",
-       "1      1          0  [0.1606, 0.1788, 0.1832, 0.5897, 0.5897, 0.9986, 0.7969, 0.8088, 0.3034, 0.781, 0.0003, 0.0002, 0.4434, 0.0002, 0.0, 0.1341, 0.1341, 0.1341, 0.1637, 0.2935, 0.0215, 0.0575, 0.1295, 0.0222, 0.848, 0.6894, 0.6894, 0.2176, 6700000000.0, 0.0003, 0.2646, 0.1561, 0.0075, 0.0016, 0.6306, 0.0275, 0.2228, 0.7772, 0.0061, 0.3952, 0.0054, 0.0574, 0.1285, 0.4264, 0.2579, 0.0218, 0.0003, 7550000000.0, 0.0029, 0.0569, 0.0184, 0.3689, 0.0009, 0.8013, 0.3721, 0.9357, 0.1842, 0.0028, 0.0042, 0.232, 0.2865, 0.2785, 0.0123, 1.0, 0.7403, 0.3506, 0.0, 0.811, 0.0019, 0.1083, 0.0001, 5310000000.0, 0.5939, 7880000000.0, 0.6715, 0.0499, 1.0, 0.3506, 0.1109, 0.463, 0.4385, 0.1781, 0.2476, 0.0388, 0.0, 0.5917, 4370000000.0, 0.6236, 0.5897, 0.8023, 0.2947, 0.0268, 0.5651, 1.0, 0.0151]\n",
-       "2      2          0  [0.204, 0.2638, 0.2598, 0.4483, 0.4483, 0.9959, 0.7937, 0.8063, 0.3034, 0.7782, 0.0007, 0.0004, 0.4511, 0.0003, 0.0, 0.1387, 0.1387, 0.1387, 0.1546, 0.263, 0.004, 0.0393, 0.0757, 0.0187, 0.8468, 0.6872, 0.6872, 0.2173, 0.0002, 0.0004, 0.2588, 0.1568, 0.0025, 0.0007, 0.6305, 0.04, 0.2419, 0.7581, 0.0048, 0.4073, 0.0054, 0.0394, 0.1165, 0.4142, 0.0315, 0.0009, 0.0074, 5310000000.0, 3030000000.0, 0.0195, 0.002, 0.3723, 0.0124, 0.6252, 0.1282, 0.3562, 0.0377, 0.0008, 0.0008, 0.2515, 0.3097, 0.2767, 0.0046, 1.0, 0.7042, 0.3617, 0.0, 0.8891, 0.0013, 0.0213, 0.0006, 0.0002, 0.5933, 0.0002, 0.6715, 0.5863, 1.0, 0.3617, 0.1109, 0.635, 0.4584, 0.3252, 0.3106, 0.1097, 0.0, 0.6816, 0.0003, 0.6221, 0.4483, 0.8117, 0.3038, 0.0268, 0.5651, 1.0, 0.0136]\n",
-       "3      3          0  [0.217, 0.1881, 0.2451, 0.5992, 0.5992, 0.9962, 0.794, 0.8061, 0.3034, 0.7781, 0.0029, 0.0038, 0.4555, 0.0003, 0.0, 0.1277, 0.1277, 0.1277, 0.1387, 0.271, 0.0049, 0.0319, 0.0091, 0.022, 0.848, 0.6893, 0.6893, 0.2176, 9790000000.0, 0.0011, 0.2629, 0.0, 0.004, 0.004, 0.6305, 0.2222, 0.286, 0.714, 0.0052, 0.6137, 0.0054, 0.0608, 0.1361, 0.407, 0.039, 0.0008, 0.0078, 0.0002, 0.0006, 0.1497, 0.0091, 0.3072, 0.0015, 0.6671, 0.6679, 0.656, 0.6709, 0.004, 0.012, 0.2966, 0.3228, 0.2769, 0.0003, 1.0, 0.6453, 0.523, 0.0, 0.8015, 0.002, 0.112, 0.0008, 0.0008, 0.5937, 0.0022, 0.6723, 0.022, 1.0, 0.523, 0.1109, 0.9353, 0.4857, 0.402, 1.0, 0.0707, 0.0, 0.6196, 0.0011, 0.6236, 0.5992, 0.6346, 0.4359, 0.0268, 0.565, 1.0, 0.0108]\n",
-       "4      4          0  [0.2314, 0.1628, 0.2068, 0.6001, 0.6001, 0.9988, 0.796, 0.8078, 0.3015, 0.7801, 0.0003, 0.0002, 0.458, 0.0005, 0.0, 0.1351, 0.1351, 0.1351, 0.1599, 0.315, 0.0085, 0.088, 0.1271, 0.0223, 0.8481, 0.6894, 0.6894, 0.2176, 3860000000.0, 0.0003, 0.2633, 0.363, 0.011, 0.0072, 0.6306, 0.0214, 0.2081, 0.7919, 0.0053, 0.3832, 0.0123, 0.088, 0.1261, 0.3996, 0.0885, 0.0008, 0.0075, 0.0005, 0.0003, 0.025, 0.0108, 0.3855, 0.0044, 0.8522, 0.8464, 0.8194, 0.0331, 0.0111, 0.0013, 0.1393, 0.3341, 0.277, 0.0003, 0.637, 0.7459, 0.3384, 0.0024, 0.8278, 0.002, 0.184, 0.0003, 0.0003, 0.594, 3320000000.0, 0.6715, 0.1798, 0.637, 0.3384, 0.1171, 0.587, 0.4524, 0.521, 0.2972, 0.0265, 0.0, 0.5269, 0.0003, 0.6241, 0.6001, 0.7985, 0.2903, 0.0268, 0.5651, 1.0, 0.0164]"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df = to_pandas_on_spark(to_pandas_on_spark(train_data).to_spark(index_col=\"index\"))\n",
-    "\n",
-    "df.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[flaml.automl.automl: 02-28 02:12:59] {2922} INFO - task = classification\n",
-      "[flaml.automl.automl: 02-28 02:13:00] {2924} INFO - Data split method: stratified\n",
-      "[flaml.automl.automl: 02-28 02:13:00] {2927} INFO - Evaluation method: cv\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.\n",
-      "  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[flaml.automl.automl: 02-28 02:13:01] {3054} INFO - Minimizing error metric: 1-roc_auc\n",
-      "[flaml.automl.automl: 02-28 02:13:01] {3209} INFO - List of ML learners in AutoML Run: ['lgbm_spark']\n",
-      "[flaml.automl.automl: 02-28 02:13:01] {3539} INFO - iteration 0, current learner lgbm_spark\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.\n",
-      "  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n",
-      "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: If `index_col` is not specified for `to_spark`, the existing index is lost when converting to Spark DataFrame.\n",
-      "  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n",
-      "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.\n",
-      "  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[flaml.automl.automl: 02-28 02:13:48] {3677} INFO - Estimated sufficient time budget=464999s. Estimated necessary time budget=465s.\n",
-      "[flaml.automl.automl: 02-28 02:13:48] {3724} INFO -  at 48.5s,\testimator lgbm_spark's best error=0.0871,\tbest estimator lgbm_spark's best error=0.0871\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: If `index_col` is not specified for `to_spark`, the existing index is lost when converting to Spark DataFrame.\n",
-      "  warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n",
-      "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n",
-      "[flaml.automl.automl: 02-28 02:13:54] {3988} INFO - retrain lgbm_spark for 6.2s\n",
-      "[flaml.automl.automl: 02-28 02:13:54] {3995} INFO - retrained model: LightGBMClassifier_a2177c5be001\n",
-      "[flaml.automl.automl: 02-28 02:13:54] {3239} INFO - fit succeeded\n",
-      "[flaml.automl.automl: 02-28 02:13:54] {3240} INFO - Time taken to find the best model: 48.4579541683197\n"
-     ]
-    }
-   ],
-   "source": [
-    "'''The main flaml automl API'''\n",
-    "automl.fit(dataframe=df, label='Bankrupt?', labelCol=\"Bankrupt?\", isUnbalance=True, **settings)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Best model and metric"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Best hyperparmeter config: {'numIterations': 4, 'numLeaves': 4, 'minDataInLeaf': 20, 'learningRate': 0.09999999999999995, 'log_max_bin': 8, 'featureFraction': 1.0, 'lambdaL1': 0.0009765625, 'lambdaL2': 1.0}\n",
-      "Best roc_auc on validation data: 0.9129\n",
-      "Training duration of best run: 6.237 s\n"
-     ]
-    }
-   ],
-   "source": [
-    "''' retrieve best config'''\n",
-    "print('Best hyperparmeter config:', automl.best_config)\n",
-    "print('Best roc_auc on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
-    "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "DataFrame[evaluation_type: string, confusion_matrix: matrix, accuracy: double, precision: double, recall: double, AUC: double]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n",
-      "|evaluation_type|    confusion_matrix|          accuracy|          precision|            recall|               AUC|\n",
-      "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n",
-      "| Classification|1218.0  55.0  \\n1...|0.9453302961275627|0.32926829268292684|0.6136363636363636|0.7852156680711276|\n",
-      "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "flaml_metrics = predict(automl.model.estimator)\n",
-    "flaml_metrics.show()"
-   ]
-  }
- ],
- "metadata": {
-  "accelerator": "GPU",
-  "colab": {
-   "collapsed_sections": [],
-   "include_colab_link": true,
-   "name": "Copy of automl_nlp.ipynb",
-   "provenance": []
-  },
-  "gpuClass": "standard",
-  "kernelspec": {
-   "display_name": "flaml-dev",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.8"
-  },
-  "vscode": {
-   "interpreter": {
-    "hash": "cbbf4d250a3560c7073bd6e01a7ecfe1c772dc45f2100f74412fcaea735f0880"
-   }
-  },
-  "widgets": {}
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
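The deleted notebook buries its recipe under long dependency and training logs, so here is a condensed sketch of the workflow those cells implement. It assumes `flaml[synapse]` is installed and that `train_data` is the `VectorAssembler` output built in the cells above; everything else only rearranges code that appears in the deleted cells.

```python
# Condensed sketch of the removed notebook's FLAML-on-Spark workflow.
# Assumes `train_data` is the ["Bankrupt?", "features"] Spark DataFrame built above.
from flaml import AutoML
from flaml.automl.spark.utils import to_pandas_on_spark

# FLAML consumes a pandas-on-Spark frame with an explicit index column.
df = to_pandas_on_spark(to_pandas_on_spark(train_data).to_spark(index_col="index"))

settings = {
    "time_budget": 30,                 # total tuning time in seconds
    "metric": "roc_auc",               # AUC suits the unbalanced labels
    "estimator_list": ["lgbm_spark"],  # tune SynapseML LightGBM only
    "task": "classification",
    "seed": 41,
}
automl = AutoML()
# Extra keyword arguments (labelCol, isUnbalance) are passed through to the estimator.
automl.fit(dataframe=df, label="Bankrupt?", labelCol="Bankrupt?", isUnbalance=True, **settings)
print(automl.best_config, 1 - automl.best_loss)
```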
@@ -15,7 +15,9 @@
    "\n",
    "# Use FLAML to Optimize Code Generation Performance\n",
    "\n",
-   "In this notebook, we optimize OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings. \n",
+   "In this notebook, we optimize OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings.\n",
+   "\n",
+   "Related link: [Blogpost](https://microsoft.github.io/FLAML/blog/2023/05/18/GPT-adaptive-humaneval) based on this experiment.\n",
    "\n",
    "## Requirements\n",
    "\n",
setup.py (2 changes)
@@ -43,8 +43,6 @@ setuptools.setup(
     ],
     "notebook": [
         "jupyter",
-        "matplotlib",
-        "openml",
     ],
     "spark": [
         "pyspark>=3.2.0",
@@ -11,10 +11,8 @@ from flaml.autogen.code_utils import (
     generate_assertions,
     implement,
     generate_code,
-    extract_code,
     improve_function,
     improve_code,
-    execute_code,
 )
 from flaml.autogen.math_utils import eval_math_responses, solve_problem

@@ -101,34 +99,6 @@ def test_multi_model():
     print(response)


-@pytest.mark.skipif(
-    sys.platform in ["darwin", "win32"],
-    reason="do not run on MacOS or windows",
-)
-def test_execute_code():
-    try:
-        import docker
-    except ImportError as exc:
-        print(exc)
-        return
-    exitcode, msg = execute_code("print('hello world')", filename="tmp/codetest.py")
-    assert exitcode == 0 and msg == b"hello world\n", msg
-    # read a file
-    print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()"))
-    # create a file
-    print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir=f"{here}/my_tmp"))
-    # execute code in a file
-    print(execute_code(filename="tmp/codetest.py"))
-    # execute code for assertion error
-    exit_code, msg = execute_code("assert 1==2")
-    assert exit_code, msg
-    # execute code which takes a long time
-    exit_code, error = execute_code("import time; time.sleep(2)", timeout=1)
-    assert exit_code and error == "Timeout"
-    exit_code, error = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False)
-    assert exit_code and error == "Timeout"
-
-
 def test_improve():
     try:
         import openai

@@ -187,39 +157,7 @@ def test_nocontext():
         ],
     )
     print(code)
-    # test extract_code from markdown
-    code, _ = extract_code(
-        """
-Example:
-```
-print("hello extract code")
-```
-"""
-    )
-    print(code)
-
-    code, _ = extract_code(
-        """
-Example:
-```python
-def scrape(url):
-    import requests
-    from bs4 import BeautifulSoup
-    response = requests.get(url)
-    soup = BeautifulSoup(response.text, "html.parser")
-    title = soup.find("title").text
-    text = soup.find("div", {"id": "bodyContent"}).text
-    return title, text
-```
-Test:
-```python
-url = "https://en.wikipedia.org/wiki/Web_scraping"
-title, text = scrape(url)
-print(f"Title: {title}")
-print(f"Text: {text}")
-"""
-    )
-    print(code)
     solution, cost = solve_problem("1+1=", config_list=oai.config_list_gpt4_gpt35(KEY_LOC))
     print(solution, cost)

@@ -445,7 +383,6 @@ if __name__ == "__main__":
     # test_filter()
     # test_chatcompletion()
     # test_multi_model()
-    # test_execute_code()
     # test_improve()
     # test_nocontext()
     test_humaneval(1)
@@ -45,7 +45,15 @@ def run_notebook(input_nb, output_nb="executed_openai_notebook.ipynb", save=Fals

 @pytest.mark.skipif(
     skip or not sys.version.startswith("3.10"),
-    reason="do not run openai test if openai is not installed or py!=3.10",
+    reason="do not run if openai is not installed or py!=3.10",
+)
+def test_autogen_agent_auto_feedback_from_code(save=False):
+    run_notebook("autogen_agent_auto_feedback_from_code_execution.ipynb", save=save)
+
+
+@pytest.mark.skipif(
+    skip or not sys.version.startswith("3.10"),
+    reason="do not run if openai is not installed or py!=3.10",
 )
 def test_autogen_openai_completion(save=False):
     run_notebook("autogen_openai_completion.ipynb", save=save)

@@ -53,7 +61,7 @@ def test_autogen_openai_completion(save=False):

 @pytest.mark.skipif(
     skip or not sys.version.startswith("3.11"),
-    reason="do not run openai test if openai is not installed or py!=3.11",
+    reason="do not run if openai is not installed or py!=3.11",
 )
 def test_autogen_chatgpt_gpt4(save=False):
     run_notebook("autogen_chatgpt_gpt4.ipynb", save=save)
@@ -1,39 +1,63 @@
 import os
-from flaml.autogen.code_utils import extract_code
 from flaml import oai
+from flaml.autogen.agent import AssistantAgent, UserProxyAgent

 KEY_LOC = "test/autogen"
 here = os.path.abspath(os.path.dirname(__file__))


-def test_extract_code():
-    print(extract_code("```bash\npython temp.py\n```"))
-
-
-def test_coding_agent(human_input_mode="NEVER", max_consecutive_auto_reply=10):
+def test_gpt35(human_input_mode="NEVER", max_consecutive_auto_reply=5):
     try:
         import openai
     except ImportError:
         return
-    from flaml.autogen.agent.coding_agent import PythonAgent
-    from flaml.autogen.agent.user_proxy_agent import UserProxyAgent
-
+    config_list = oai.config_list_from_models(key_file_path=KEY_LOC, model_list=["gpt-3.5-turbo"])
+    assistant = AssistantAgent(
+        "coding_agent",
+        request_timeout=600,
+        seed=40,
+        max_tokens=1024,
+        config_list=config_list,
+    )
+    user = UserProxyAgent(
+        "user",
+        work_dir=f"{here}/test_agent_scripts",
+        human_input_mode=human_input_mode,
+        is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"),
+        max_consecutive_auto_reply=max_consecutive_auto_reply,
+        use_docker="python:3",
+    )
+    coding_task = "Print hello world to a file called hello.txt"
+    assistant.receive(coding_task, user)
+    # coding_task = "Create a powerpoint with the text hello world in it."
+    # assistant.receive(coding_task, user)
+    assistant.reset()
+    coding_task = "Save a pandas df with 3 rows and 3 columns to disk."
+    assistant.receive(coding_task, user)
+
+
+def test_create_execute_script(human_input_mode="NEVER", max_consecutive_auto_reply=10):
+    try:
+        import openai
+    except ImportError:
+        return
     config_list = oai.config_list_gpt4_gpt35(key_file_path=KEY_LOC)
     conversations = {}
     oai.ChatCompletion.start_logging(conversations)
-    agent = PythonAgent("coding_agent", request_timeout=600, seed=42, config_list=config_list)
+    assistant = AssistantAgent("assistant", request_timeout=600, seed=42, config_list=config_list)
     user = UserProxyAgent(
         "user",
         human_input_mode=human_input_mode,
         max_consecutive_auto_reply=max_consecutive_auto_reply,
         is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"),
     )
-    agent.receive(
+    assistant.receive(
         """Create and execute a script to plot a rocket without using matplotlib""",
         user,
     )
-    agent.reset()
-    agent.receive(
+    assistant.reset()
+    assistant.receive(
         """Create a temp.py file with the following content:
 ```
 print('Hello world!')

@@ -42,7 +66,7 @@ print('Hello world!')
     )
     print(conversations)
     oai.ChatCompletion.start_logging(compact=False)
-    agent.receive("""Execute temp.py""", user)
+    assistant.receive("""Execute temp.py""", user)
     print(oai.ChatCompletion.logged_history)
     oai.ChatCompletion.stop_logging()

@@ -52,8 +76,6 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10):
         import openai
     except ImportError:
         return
-    from flaml.autogen.agent.coding_agent import PythonAgent
-    from flaml.autogen.agent.user_proxy_agent import UserProxyAgent

     config_list = oai.config_list_openai_aoai(key_file_path=KEY_LOC)
     hard_questions = [

@@ -63,7 +85,7 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10):
     ]

     oai.ChatCompletion.start_logging()
-    agent = PythonAgent("coding_agent", temperature=0, config_list=config_list)
+    assistant = AssistantAgent("assistant", temperature=0, config_list=config_list)
     user = UserProxyAgent(
         "user",
         work_dir=f"{here}",

@@ -74,14 +96,14 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10):
         prompt = f.read()
     # agent.receive(prompt.format(question=hard_questions[0]), user)
     # agent.receive(prompt.format(question=hard_questions[1]), user)
-    agent.receive(prompt.format(question=hard_questions[2]), user)
+    assistant.receive(prompt.format(question=hard_questions[2]), user)
     print(oai.ChatCompletion.logged_history)
     oai.ChatCompletion.stop_logging()


 if __name__ == "__main__":
-    # test_extract_code()
-    test_coding_agent(human_input_mode="TERMINATE")
+    test_gpt35()
+    test_create_execute_script(human_input_mode="TERMINATE")
     # when GPT-4, i.e., the DEFAULT_MODEL, is used, conversation in the following test
     # should terminate in 2-3 rounds of interactions (because is_termination_msg should be true after 2-3 rounds)
     # although the max_consecutive_auto_reply is set to 10.
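The closing comments describe two independent stopping conditions for the user proxy's auto-replies. A minimal, illustrative sketch of how they interact; the function and its names are hypothetical, not the agent's actual internals, and only the `TERMINATE` predicate comes from the tests above:

```python
# Hypothetical illustration only: the two stopping conditions discussed above.
def should_stop_auto_reply(message: str, auto_reply_count: int, max_consecutive_auto_reply: int = 10) -> bool:
    # Predicate from the tests: the assistant signals completion with "TERMINATE".
    is_termination_msg = message.rstrip().endswith("TERMINATE")
    # Stop on an explicit TERMINATE, or once the auto-reply budget is exhausted.
    return is_termination_msg or auto_reply_count >= max_consecutive_auto_reply

assert should_stop_auto_reply("All done. TERMINATE", auto_reply_count=2)  # early termination
assert should_stop_auto_reply("still working", auto_reply_count=10)  # budget exhausted
```

With GPT-4 as the DEFAULT_MODEL, the first condition usually fires within 2-3 rounds, so the budget of 10 is rarely reached.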
@ -0,0 +1,85 @@
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
import pytest
|
||||||
|
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
|
||||||
|
|
||||||
|
here = os.path.abspath(os.path.dirname(__file__))
|
||||||
|
|
||||||
|
|
||||||
|
def test_infer_lang():
|
||||||
|
assert infer_lang("print('hello world')") == "python"
|
||||||
|
assert infer_lang("pip install flaml") == "sh"
|
||||||
|
|
||||||
|
|
||||||
|
def test_extract_code():
|
||||||
|
print(extract_code("```bash\npython temp.py\n```"))
|
||||||
|
# test extract_code from markdown
|
||||||
|
codeblocks = extract_code(
|
||||||
|
"""
|
||||||
|
Example:
|
||||||
|
```
|
||||||
|
print("hello extract code")
|
||||||
|
```
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
print(codeblocks)
|
||||||
|
|
||||||
|
codeblocks = extract_code(
|
||||||
|
"""
|
||||||
|
Example:
|
||||||
|
```python
|
||||||
|
def scrape(url):
|
||||||
|
import requests
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
response = requests.get(url)
|
||||||
|
soup = BeautifulSoup(response.text, "html.parser")
|
||||||
|
title = soup.find("title").text
|
||||||
|
text = soup.find("div", {"id": "bodyContent"}).text
|
||||||
|
return title, text
|
||||||
|
```
|
||||||
|
Test:
|
||||||
|
```python
|
||||||
|
url = "https://en.wikipedia.org/wiki/Web_scraping"
|
||||||
|
title, text = scrape(url)
|
||||||
|
print(f"Title: {title}")
|
||||||
|
print(f"Text: {text}")
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
print(codeblocks)
|
||||||
|
codeblocks = extract_code("no code block")
|
||||||
|
assert len(codeblocks) == 1 and codeblocks[0] == (UNKNOWN, "no code block")
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.skipif(
|
||||||
|
sys.platform in ["darwin", "win32"],
|
||||||
|
reason="do not run on MacOS or windows",
|
||||||
|
)
|
||||||
|
def test_execute_code():
|
||||||
|
try:
|
||||||
|
import docker
|
||||||
|
except ImportError as exc:
|
||||||
|
print(exc)
|
||||||
|
return
|
||||||
|
exitcode, msg, image = execute_code("print('hello world')", filename="tmp/codetest.py")
|
||||||
|
assert exitcode == 0 and msg == b"hello world\n", msg
|
||||||
|
# read a file
|
||||||
|
print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()"))
|
||||||
|
# create a file
|
||||||
|
print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir=f"{here}/my_tmp"))
|
||||||
|
# execute code in a file
|
||||||
|
print(execute_code(filename="tmp/codetest.py"))
|
||||||
|
print(execute_code("python tmp/codetest.py", lang="sh"))
|
||||||
|
# execute code for assertion error
|
||||||
|
exit_code, msg, image = execute_code("assert 1==2")
|
||||||
|
assert exit_code, msg
|
||||||
|
# execute code which takes a long time
|
||||||
|
exit_code, error, image = execute_code("import time; time.sleep(2)", timeout=1)
|
||||||
|
assert exit_code and error == "Timeout"
|
||||||
|
exit_code, error, image = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False)
|
||||||
|
assert exit_code and error == "Timeout" and image is None
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
# test_infer_lang()
|
||||||
|
# test_extract_code()
|
||||||
|
test_execute_code()
|
|
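
For readers skimming the new test file, the return conventions it pins down can be restated compactly. The sketch below is inferred from the assertions above, not copied from the library:

```python
# Recap of the return shapes asserted in the test file above.
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code

# extract_code returns a list of (language, code) pairs; input without a
# fenced block comes back as a single (UNKNOWN, original_text) pair.
blocks = extract_code("```python\nprint('hi')\n```")
lang, code = blocks[0]
assert lang == "python"

# execute_code returns (exit_code, output, docker_image): exit code 0 on
# success, the captured output/logs, and a None image when use_docker=False.
exit_code, output, image = execute_code("print('hi')", use_docker=False)
assert exit_code == 0 and image is None
```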

@@ -8,41 +8,47 @@ models, hyperparameters, and other tunable choices of an application.
 ### Main Features
 
-* For foundation models like the GPT serie and AI agents based on them, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost.
-* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
-* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
+* For foundation models like the GPT models, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. FLAML enables users to build and use adaptive AI agents with minimal effort.
+* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., search space and metric), or full customization (arbitrary training/inference/evaluation code).
 * It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
-hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
-and model selection method invented by Microsoft Research, and many followup [research studies](Research).
+hyperparameter optimization](/docs/Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
+and model selection method invented by Microsoft Research, and many followup [research studies](/docs/Research).
 
 ### Quickstart
 
-Install FLAML from pip: `pip install flaml`. Find more options in [Installation](Installation).
+Install FLAML from pip: `pip install flaml`. Find more options in [Installation](/docs/Installation).
 
 There are several ways of using flaml:
 
-#### (New) [Auto Generation](Use-Cases/Auto-Generation)
+#### (New) [Auto Generation](/docs/Use-Cases/Auto-Generation)
 
-For example, you can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
-
-```python
-from flaml import oai
-
-config, analysis = oai.Completion.tune(
-    data=tune_data,
-    metric="success",
-    mode="max",
-    eval_func=eval_func,
-    inference_budget=0.05,
-    optimization_budget=3,
-    num_samples=-1,
-)
-```
-
-The automated experimentation and optimization can help you maximize the utility out of these expensive models.
-A suite of utilities are offered to accelerate the experimentation and application development, such as low-level inference API with caching, templating, filtering, and higher-level components like LLM-based coding and interactive agents.
-
-#### [Task-oriented AutoML](Use-Cases/task-oriented-automl)
+Maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4, including:
+- A drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets.
+```python
+from flaml import oai
+
+# perform tuning
+config, analysis = oai.Completion.tune(
+    data=tune_data,
+    metric="success",
+    mode="max",
+    eval_func=eval_func,
+    inference_budget=0.05,
+    optimization_budget=3,
+    num_samples=-1,
+)
+
+# perform inference for a test instance
+response = oai.Completion.create(context=test_instance, **config)
+```
+- LLM-driven intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code. For example,
+```python
+assistant = AssistantAgent("assistant")
+user = UserProxyAgent("user", human_input_mode="TERMINATE")
+assistant.receive("Draw a rocket and save to a file named 'rocket.svg'")
+```
+
+#### [Task-oriented AutoML](/docs/Use-Cases/task-oriented-automl)
 
 For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
 
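
A reviewer note on the tuning snippet in this hunk: `tune_data` and `eval_func` are used without being defined. A hedged sketch of plausible shapes follows; the field names are placeholders, and the `eval_func(responses, **data)` calling convention follows flaml.autogen's documented evaluation-function interface:

```python
# Hypothetical tuning data: one dict per instance; field names are placeholders.
tune_data = [
    {"problem": "What is 2 + 2?", "solution": "4"},
    {"problem": "What is 3 * 3?", "solution": "9"},
]

# The evaluation function receives the candidate responses plus the fields of
# one data instance and returns a dict of metrics; the "success" key matches
# metric="success" in the tune() call above.
def eval_func(responses, **data):
    return {"success": any(data["solution"] in r for r in responses)}
```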

@@ -52,14 +58,14 @@ automl = AutoML()
 automl.fit(X_train, y_train, task="classification", time_budget=60)
 ```
 
-It automatically tunes the hyperparameters and selects the best model from default learners such as LightGBM, XGBoost, random forest etc. for the specified time budget 60 seconds. [Customizing](Use-Cases/task-oriented-automl#customize-automlfit) the optimization metrics, learners and search spaces etc. is very easy. For example,
+It automatically tunes the hyperparameters and selects the best model from default learners such as LightGBM, XGBoost, random forest etc. for the specified time budget 60 seconds. [Customizing](/docs/Use-Cases/task-oriented-automl#customize-automlfit) the optimization metrics, learners and search spaces etc. is very easy. For example,
 
 ```python
 automl.add_learner("mylgbm", MyLGBMEstimator)
 automl.fit(X_train, y_train, task="classification", metric=custom_metric, estimator_list=["mylgbm"], time_budget=60)
 ```
 
-#### [Tune user-defined function](Use-Cases/Tune-User-Defined-Function)
+#### [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function)
 
 You can run generic hyperparameter tuning for a custom function (machine learning or beyond). For example,
 
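
The `custom_metric` referenced in this hunk is not shown. As a sketch of FLAML's documented custom-metric interface (the exact argument list should be verified against the version in use), the function returns the value to minimize plus a dict of metrics to log:

```python
# Sketch of a custom metric following FLAML's documented interface; verify
# the exact signature against the docs for the installed version.
import time
from sklearn.metrics import log_loss

def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None, *args,
):
    start = time.time()
    y_pred = estimator.predict_proba(X_val)
    pred_time = (time.time() - start) / len(X_val)
    val_loss = log_loss(y_val, y_pred, labels=labels, sample_weight=weight_val)
    # The first return value is minimized; the dict is logged alongside it.
    return val_loss, {"val_loss": val_loss, "pred_time": pred_time}
```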

@@ -99,7 +105,7 @@ analysis = tune.run(
 ```
 Please see this [script](https://github.com/microsoft/FLAML/blob/main/test/tune_example.py) for the complete version of the above example.
 
-#### [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML)
+#### [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML)
 
 FLAML offers a unique, seamless and effortless way to leverage AutoML for the commonly used classifiers and regressors such as LightGBM and XGBoost. For example, if you are using `lightgbm.LGBMClassifier` as your current learner, all you need to do is to replace `from lightgbm import LGBMClassifier` by:
 
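
The `tune.run(` call whose closing fence opens this hunk is elided from the diff view. For orientation, here is a self-contained example of tuning a user-defined function with `flaml.tune`; the objective and search space below are toy placeholders, not the linked script's contents:

```python
from flaml import tune

# Toy objective: a bowl-shaped function of two integer hyperparameters.
def evaluate_config(config):
    score = (config["x"] - 85000) ** 2 + (config["y"] - 8) ** 2
    return {"score": score}

analysis = tune.run(
    evaluate_config,
    config={
        "x": tune.lograndint(lower=1, upper=100000),
        "y": tune.randint(lower=1, upper=100000),
    },
    metric="score",
    mode="min",
    num_samples=-1,   # no sample cap; rely on the time budget instead
    time_budget_s=10,
)
print(analysis.best_config)
```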

@@ -111,11 +117,11 @@ Then, you can use it just like you use the original `LGMBClassifier`. Your other
 
 ### Where to Go Next?
 
-* Understand the use cases for [Auto Generation](Use-Cases/Auto-Generation), [Task-oriented AutoML](Use-Cases/Task-Oriented-Automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
+* Understand the use cases for [Auto Generation](/docs/Use-Cases/Auto-Generation), [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
-* Find code examples under "Examples": from [AutoGen - OpenAI](Examples/AutoGen-OpenAI) to [Tune - PyTorch](Examples/Tune-PyTorch).
+* Find code examples under "Examples": from [AutoGen - OpenAI](/docs/Examples/AutoGen-OpenAI) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
-* Learn about [research](Research) around FLAML and check [blogposts](/blog).
+* Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog).
 * Chat on [Discord](https://discord.gg/Cppx2vSPVP).
 
-If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](Contribute).
+If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/docs/Contribute).
 
 <iframe src="https://ghbtns.com/github-btn.html?user=microsoft&repo=FLAML&type=star&count=true&size=large" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 

@@ -15,47 +15,60 @@ conda install flaml -c conda-forge
 
 ### Optional Dependencies
 
+#### [Auto Generation](Use-Cases/Auto-Generation)
+
+```bash
+pip install "flaml[autogen]"
+```
+
+#### [Task-oriented AutoML](Use-Cases/Task-Oriented-AutoML)
+
+```bash
+pip install "flaml[automl]"
+```
+
+#### Extra learners/models
+
+* openai models
+```bash
+pip install "flaml[openai]"
+```
+* catboost
+```bash
+pip install "flaml[catboost]"
+```
+* vowpal wabbit
+```bash
+pip install "flaml[vw]"
+```
+* time series forecaster: prophet, statsmodels
+```bash
+pip install "flaml[forecast]"
+```
+* huggingface transformers
+```bash
+pip install "flaml[hf]"
+```
+
 #### Notebook
 
 To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook),
 install flaml with the [notebook] option:
 
 ```bash
-pip install flaml[notebook]
+pip install "flaml[notebook]"
 ```
-
-
-#### Extra learners/models
-
-* openai models
-```bash
-pip install flaml[openai]
-```
-* catboost
-```bash
-pip install flaml[catboost]
-```
-* vowpal wabbit
-```bash
-pip install flaml[vw]
-```
-* time series forecaster: prophet, statsmodels
-```bash
-pip install flaml[forecast]
-```
-* huggingface transformers
-```bash
-pip install flaml[hf]
-```
 
 #### Distributed tuning
 
 * ray
 ```bash
-pip install flaml[ray]
+pip install "flaml[ray]"
 ```
 * spark
 > *Spark support is added in v1.1.0*
 ```bash
-pip install flaml[spark]>=1.1.0
+pip install "flaml[spark]>=1.1.0"
 ```
 
 For cloud platforms such as [Azure Synapse](https://azure.microsoft.com/en-us/products/synapse-analytics/), Spark clusters are provided.
 
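
To make the Spark option concrete: after installing `flaml[spark]`, trials can run in parallel at fit time. Below is a sketch assuming the `use_spark`/`n_concurrent_trials` arguments that accompany the v1.1.0 Spark support (verify against the installed version); `X_train`/`y_train` stand for user-provided data:

```python
# Sketch of Spark-parallelized AutoML; assumes flaml[spark]>=1.1.0 and an
# available Spark environment. Argument names per FLAML's Spark support.
from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train,
    y_train,
    task="classification",
    time_budget=60,
    use_spark=True,         # dispatch trials to Spark
    n_concurrent_trials=2,  # number of trials to run in parallel
)
```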

@@ -76,11 +89,11 @@ export PATH=$PATH:$SPARK_HOME/bin
 
 * nni
 ```bash
-pip install flaml[nni]
+pip install "flaml[nni]"
 ```
 * blendsearch
 ```bash
-pip install flaml[blendsearch]
+pip install "flaml[blendsearch]"
 ```
 
 * synapse
 

@@ -82,13 +82,12 @@ For technical details, please check our research publications.
 * [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%).
 
 ```bibtex
-@inproceedings{
-zhang2023targeted,
+@inproceedings{zhang2023targeted,
 title={Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives},
 author={Shaokun Zhang and Feiran Jia and Chi Wang and Qingyun Wu},
 booktitle={International Conference on Learning Representations},
 year={2023},
-url={https://openreview.net/forum?id=0Ij9_q567Ma}
+url={https://openreview.net/forum?id=0Ij9_q567Ma},
 }
 ```
 

@@ -4,8 +4,8 @@
 * Leveraging [`flaml.tune`](../reference/tune/tune) to adapt LLMs to applications, such that:
   - Maximize the utility out of using expensive foundation models.
   - Reduce the inference cost by using cheaper models or configurations which achieve equal or better performance.
-* An enhanced inference API with utilities like API unification, caching, error handling, multi-config inference, context programming etc.
+* An enhanced inference API as a drop-in replacement of `openai.Completion.create` or `openai.ChatCompletion.create` with utilities like API unification, caching, error handling, multi-config inference, context programming etc.
-* Higher-level utility functions like LLM-based coding and interactive agents.
+* Higher-level components like LLM-based intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code.
 
 The package is under active development with more features upcoming.
 
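
Since the rewritten bullet pitches the inference API as a drop-in replacement of `openai.Completion.create`/`openai.ChatCompletion.create`, a minimal usage sketch may help; the model and message are illustrative, and an API key is assumed to be configured:

```python
# Minimal drop-in usage sketch; the call shape mirrors the openai package.
from flaml import oai

response = oai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say this is a test."}],
)
print(response["choices"][0]["message"]["content"])
```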

@@ -32,7 +32,7 @@ There are also complex interactions among subsets of the hyperparameters. For ex
 the temperature and top_p are not recommended to be altered from their default values together because they both control the randomness of the generated text, and changing both at the same time can result in conflicting effects; n and best_of are rarely tuned together because if the application can process multiple outputs, filtering on the server side causes unnecessary information loss; both n and max_tokens will affect the total number of tokens generated, which in turn will affect the cost of the request.
 These interactions and trade-offs make it difficult to manually determine the optimal hyperparameter settings for a given text generation task.
 
-*Do the choices matter? Check this [blog post](/blog/2023/04/21/LLM-tuning-math) for a case study.*
+*Do the choices matter? Check this [blogpost](/blog/2023/04/21/LLM-tuning-math) to find example tuning results about gpt-3.5-turbo and gpt-4.*
 
 
 With `flaml.autogen`, the tuning can be performed with the following information:
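
One way the temperature/top_p caution above can be encoded is as a choice between two sub-spaces, so that any given trial tunes one knob or the other but never both. The sketch below uses `flaml.tune` primitives and illustrates the idea; it is not necessarily the library's built-in default space:

```python
from flaml import tune

# Tune either temperature or top_p in any given trial, never both at once.
search_space = {
    "temperature_or_top_p": tune.choice(
        [
            {"temperature": tune.uniform(0, 1)},
            {"top_p": tune.uniform(0, 1)},
        ]
    ),
    "max_tokens": tune.lograndint(50, 1000),
    "n": tune.randint(1, 100),
}
```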

@@ -190,6 +190,8 @@ response = oai.Completion.create(
 
 The example above will try to use text-ada-001, gpt-3.5-turbo, and text-davinci-003 iteratively, until a valid json string is returned or the last config is used. One can also repeat the same model in the list for multiple times to try one model multiple times for increasing the robustness of the final response.
 
+*Advanced use case: Check this [blogpost](/blog/2023/05/18/GPT-adaptive-humaneval) to find how to improve GPT-4's coding performance from 68% to 90% while reducing the inference cost.*
+
 ### Templating
 
 If the provided prompt or message is a template, it will be automatically materialized with a given context. For example,
 
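
The templating example this hunk's last context line introduces falls outside the diff view. A representative sketch (the prompt and context keys are placeholders): the `{problem}` field in the prompt template is filled from the `context` dict at call time:

```python
from flaml import oai

# "{problem}" in the prompt template is materialized from the context dict.
response = oai.Completion.create(
    context={"problem": "How many positive integers <= 100 are multiples of 3?"},
    prompt="{problem} Solve the problem carefully.",
    model="gpt-3.5-turbo",
)
```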

@@ -10,22 +10,33 @@ const FeatureList = [
       <>
         FLAML finds accurate models or configurations with low computational resources
         for common ML/AI tasks.
-        It frees users from selecting models and hyperparameters for training or inference.
-        {/* It is fast and economical. */}
+        It frees users from selecting models and hyperparameters for training or inference,
+        with smooth customizability.
       </>
     ),
   },
   {
-    title: 'Easy to Customize or Extend',
+    title: 'Adapt Large Language Models to Your Needs',
     Svg: require('../../static/img/extend.svg').default,
     description: (
       <>
-        FLAML is designed easy to extend, such as adding custom learners or metrics.
-        The customization level ranges smoothly from minimal
-        (training data and task type as only input) to full (tuning a user-defined function).
+        By automatically adapting LLMs to applications, FLAML
+        maximizes the benefits of expensive LLMs and reduces monetary cost.
+        FLAML enables users to build and use intelligent adaptive AI agents with minimal effort.
       </>
     ),
   },
+  // {
+  //   title: 'Easy to Customize or Extend',
+  //   Svg: require('../../static/img/extend.svg').default,
+  //   description: (
+  //     <>
+  //       FLAML is designed easy to extend, such as adding custom learners or metrics.
+  //       The customization level ranges smoothly from minimal
+  //       (training data and task type as only input) to full (tuning a user-defined function).
+  //     </>
+  //   ),
+  // },
   {
     title: 'Tune It Fast, Tune It As You Like',
     Svg: require('../../static/img/fast.svg').default,