unify auto_reply; bug fix in UserProxyAgent; reorg agent hierarchy (#1142)

* simplify the initiation of chat

* version update

* include openai

* completion

* load config list from json

* initiate_chat

* oai config list

* oai config list

* config list

* config_list

* raise_error

* retry_time

* raise condition

* oai config list

* catch file not found

* catch openml error

* handle openml error

* handle openml error

* handle openml error

* handle openml error

* handle openml error

* handle openml error

* close #1139

* use property

* termination msg

* AIUserProxyAgent

* smaller dev container

* update notebooks

* match

* document code execution and AIUserProxyAgent

* gpt 3.5 config list

* rate limit

* variable visibility

* remove unnecessary import

* quote

* notebook comments

* remove mathchat from init import

* two users

* import location

* expose config

* return str not tuple

* rate limit

* ipython user proxy

* message

* None result

* rate limit

* rate limit

* rate limit

* rate limit

* make auto_reply a common method for all agents

* abs path

* refactor and doc

* set mathchat_termination

* code format

* modified

* remove import

* code quality

* sender -> messages

* system message

* clean agent hierarchy

* dict check

* invalid oai msg

* return

* openml error

* docstr

---------

Co-authored-by: kevin666aa <yrwu000627@gmail.com>
Chi Wang 2023-07-25 16:46:11 -07:00 committed by GitHub
parent 2406e69496
commit 3e7aac6e8b
34 changed files with 804 additions and 690 deletions

View File

@ -1,5 +1,7 @@
from .agent import Agent
from .generic_agent import GenericAgent
from .assistant_agent import AssistantAgent
from .user_proxy_agent import UserProxyAgent, AIUserProxyAgent
from .user_proxy_agent import UserProxyAgent
__all__ = ["Agent", "AssistantAgent", "UserProxyAgent", "AIUserProxyAgent"]
__all__ = ["Agent", "GenericAgent", "AssistantAgent", "UserProxyAgent"]

View File

@ -1,144 +1,33 @@
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Union
from flaml import oai
from flaml.autogen.code_utils import DEFAULT_MODEL
from typing import Dict, Union
class Agent:
"""(Experimental) An abstract class for AI agent.
An agent can communicate with other agents and perform actions.
Different agents can differ in what actions they perform in the `receive` method.
"""
DEFAULT_CONFIG = {
"model": DEFAULT_MODEL,
}
def __init__(
self,
name: str,
system_message: Optional[str] = "",
is_termination_msg: Optional[Callable[[Dict], bool]] = None,
**config,
):
"""
Args:
name (str): name of the agent
system_message (str): system message to be sent to the agent.
is_termination_msg (function): a function that takes a message in the form of a dictionary
and returns a boolean value indicating if this received message is a termination message.
The dict can contain the following keys: "content", "role", "name", "function_call".
name (str): name of the agent.
"""
# a dictionary of conversations, default value is list
self._oai_conversations = defaultdict(list)
self._name = name
self._system_message = system_message
self._is_termination_msg = (
is_termination_msg if is_termination_msg is not None else (lambda x: x.get("content") == "TERMINATE")
)
self.config = self.DEFAULT_CONFIG.copy()
self.config.update(config)
self._sender_dict = {}
@property
def name(self):
"""Get the name of the agent."""
return self._name
@property
def oai_conversations(self) -> Dict[str, List[Dict]]:
"""a dictionary of conversations from name to list of oai messages"""
return self._oai_conversations
@staticmethod
def _message_to_dict(message: Union[Dict, str]):
"""Convert a message to a dictionary.
The message can be a string or a dictionary. The string will be put in the "content" field of the new dictionary.
"""
if isinstance(message, str):
return {"content": message}
else:
return message
def _append_oai_message(self, message: Union[Dict, str], role, conversation_id):
"""Append a message to the openai conversation.
If the message received is a string, it will be put in the "content" field of the new dictionary.
If the message received is a dictionary but does not have any of the two fields "content" or "function_call",
this message is not a valid openai message and will be ignored.
Args:
message (dict or str): message to be appended to the openai conversation.
role (str): role of the message, can be "assistant" or "function".
conversation_id (str): id of the conversation, should be the name of the recipient or sender.
"""
message = self._message_to_dict(message)
# create openai message to be appended to the openai conversation that can be passed to oai directly.
oai_message = {k: message[k] for k in ("content", "function_call", "name") if k in message}
if "content" not in oai_message and "function_call" not in oai_message:
return
oai_message["role"] = "function" if message.get("role") == "function" else role
self._oai_conversations[conversation_id].append(oai_message)
def send(self, message: Union[Dict, str], recipient):
"""Send a message to another agent."""
# When the agent composes and sends the message, the role of the message is "assistant". (If 'role' exists and is 'function', it will remain unchanged.)
self._append_oai_message(message, "assistant", recipient.name)
recipient.receive(message, self)
def send(self, message: Union[Dict, str], recipient: "Agent"):
"""(Aabstract method) Send a message to another agent."""
def receive(self, message: Union[Dict, str], sender: "Agent"):
"""Receive a message from another agent.
This method is called by the sender.
It needs to be overridden by the subclass to perform follow-up actions.
Args:
message (dict or str): message from the sender. If the type is dict, it may contain the following reserved fields (All fields are optional).
1. "content": content of the message, can be None.
2. "function_call": a dictionary containing the function name and arguments.
3. "role": role of the message, can be "assistant", "user", "function".
This field is only needed to distinguish between "function" or "assistant"/"user".
4. "name": In most cases, this field is not needed. When the role is "function", this field is needed to indicate the function name.
sender: sender of an Agent instance.
"""
if sender.name not in self._sender_dict:
self._sender_dict[sender.name] = sender
self._oai_conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
message = self._message_to_dict(message)
# print the message received
print(sender.name, "(to", f"{self.name}):\n", flush=True)
if message.get("role") == "function":
func_print = f"***** Response from calling function \"{message['name']}\" *****"
print(func_print, flush=True)
print(message["content"], flush=True)
print("*" * len(func_print), flush=True)
else:
if message.get("content") is not None:
print(message["content"], flush=True)
if "function_call" in message:
func_print = f"***** Suggested function Call: {message['function_call'].get('name', '(No function name found)')} *****"
print(func_print, flush=True)
print(
"Arguments: \n",
message["function_call"].get("arguments", "(No arguments found)"),
flush=True,
sep="",
)
print("*" * len(func_print), flush=True)
print("\n", "-" * 80, flush=True, sep="")
# When the agent receives a message, the role of the message is "user". (If 'role' exists and is 'function', it will remain unchanged.)
self._append_oai_message(message, "user", sender.name)
# After the above, perform actions based on the message in a subclass.
"""(Abstract method) Receive a message from another agent."""
def reset(self):
"""Reset the agent."""
self._sender_dict.clear()
self._oai_conversations.clear()
def _ai_reply(self, sender):
response = oai.ChatCompletion.create(messages=self._oai_conversations[sender.name], **self.config)
return oai.ChatCompletion.extract_text_or_function_call(response)[0]
"""(Abstract method) Reset the agent."""

View File

@ -1,9 +1,17 @@
from .agent import Agent
from typing import Dict, Optional, Union
from .generic_agent import GenericAgent
from typing import Callable, Dict, Optional, Union
class AssistantAgent(Agent):
"""(Experimental) Assistant agent, able to suggest code blocks with default system message."""
class AssistantAgent(GenericAgent):
"""(Experimental) Assistant agent, designed to solve a task with LLM.
AssistantAgent is a subclass of GenericAgent configured with a default system message.
The default system message is designed to solve a task with LLM,
including suggesting python code blocks and debugging.
`human_input_mode` defaults to "NEVER"
and `code_execution_config` defaults to False.
This agent doesn't execute code by default, and expects the user to execute the code.
"""
DEFAULT_SYSTEM_MESSAGE = """You are a helpful AI assistant.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute. You must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
@ -15,19 +23,41 @@ class AssistantAgent(Agent):
Reply "TERMINATE" in the end when everything is done.
"""
def __init__(self, name: str, system_message: Optional[str] = DEFAULT_SYSTEM_MESSAGE, **kwargs):
def __init__(
self,
name: str,
system_message: Optional[str] = DEFAULT_SYSTEM_MESSAGE,
oai_config: Optional[Union[Dict, bool]] = None,
is_termination_msg: Optional[Callable[[Dict], bool]] = None,
max_consecutive_auto_reply: Optional[int] = None,
human_input_mode: Optional[str] = "NEVER",
code_execution_config: Optional[Union[Dict, bool]] = False,
**kwargs,
):
"""
Args:
name (str): agent name.
system_message (str): system message to be sent to the agent.
**kwargs (dict): other kwargs allowed in
[Agent](agent#__init__).
system_message (str): system message for the oai inference.
Please override this attribute if you want to reprogram the agent.
oai_config (dict): oai inference configuration.
Please refer to [oai.Completion.create](/docs/reference/autogen/oai/completion#create)
for available options.
is_termination_msg (function): a function that takes a message in the form of a dictionary
and returns a boolean value indicating if this received message is a termination message.
The dict can contain the following keys: "content", "role", "name", "function_call".
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default to None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
**kwargs (dict): Please refer to other kwargs in
[GenericAgent](generic_agent#__init__).
"""
super().__init__(name, system_message, **kwargs)
def receive(self, message: Union[Dict, str], sender):
message = self._message_to_dict(message)
super().receive(message, sender)
if self._is_termination_msg(message):
return
self.send(self._ai_reply(sender), sender)
super().__init__(
name,
system_message,
is_termination_msg,
max_consecutive_auto_reply,
human_input_mode,
code_execution_config=code_execution_config,
oai_config=oai_config,
**kwargs,
)
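A hedged sketch of constructing the updated AssistantAgent, mirroring the notebook changes further below (the seed value is a placeholder; the config list helper and exclude option are taken from the notebook cells):

from flaml import oai
from flaml.autogen.agent import AssistantAgent

# Exclude azure openai endpoints, as in the notebook cells below.
config_list = oai.config_list_openai_aoai(exclude="aoai")
assistant = AssistantAgent(
    name="assistant",
    oai_config={
        "seed": 42,
        "config_list": config_list,
    },
)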

View File

@ -0,0 +1,423 @@
from collections import defaultdict
import json
from typing import Callable, Dict, List, Optional, Union
from flaml import oai
from .agent import Agent
from flaml.autogen.code_utils import DEFAULT_MODEL, UNKNOWN, execute_code, extract_code, infer_lang
class GenericAgent(Agent):
"""(Experimental) An generic agent which can be configured as assistant or user proxy.
For example, AssistantAgent and UserProxyAgent are subclasses of GenericAgent,
configured with different default settings.
"""
DEFAULT_CONFIG = {
"model": DEFAULT_MODEL,
}
MAX_CONSECUTIVE_AUTO_REPLY = 100 # maximum number of consecutive auto replies (subject to future change)
def __init__(
self,
name: str,
system_message: Optional[str] = "You are a helpful AI Assistant.",
is_termination_msg: Optional[Callable[[Dict], bool]] = None,
max_consecutive_auto_reply: Optional[int] = None,
human_input_mode: Optional[str] = "ALWAYS",
function_map: Optional[Dict[str, Callable]] = None,
code_execution_config: Optional[Union[Dict, bool]] = None,
oai_config: Optional[Union[Dict, bool]] = None,
):
"""
Args:
name (str): name of the agent.
system_message (str): system message for the oai inference.
is_termination_msg (function): a function that takes a message in the form of a dictionary
and returns a boolean value indicating if this received message is a termination message.
The dict can contain the following keys: "content", "role", "name", "function_call".
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default to None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
human_input_mode (str): whether to ask for human inputs every time a message is received.
Possible values are "ALWAYS", "TERMINATE", "NEVER".
(1) When "ALWAYS", the agent prompts for human input every time a message is received.
Under this mode, the conversation stops when the human input is "exit",
or when is_termination_msg is True and there is no human input.
(2) When "TERMINATE", the agent only prompts for human input only when a termination message is received or
the number of auto reply reaches the max_consecutive_auto_reply.
(3) When "NEVER", the agent will never prompt for human input. Under this mode, the conversation stops
when the number of auto reply reaches the max_consecutive_auto_reply or when is_termination_msg is True.
function_map (dict[str, callable]): Mapping function names (passed to openai) to callable functions.
code_execution_config (dict or False): config for the code execution.
To disable code execution, set to False. Otherwise, set to a dictionary with the following keys:
- work_dir (Optional, str): The working directory for the code execution.
If None, a default working directory will be used.
The default working directory is the "extensions" directory under
"path_to_flaml/autogen".
- use_docker (Optional, list, str or bool): The docker image to use for code execution.
If a list or a str of image name(s) is provided, the code will be executed in a docker container
with the first image successfully pulled.
If None, False or empty, the code will be executed in the current environment.
Default is True, which will be converted into a list.
If the code is executed in the current environment,
the code must be trusted.
- timeout (Optional, int): The maximum execution time in seconds.
oai_config (dict or False): oai inference configuration.
Please refer to [oai.Completion.create](/docs/reference/autogen/oai/completion#create)
for available options.
To disable oai-based auto reply, set to False.
"""
super().__init__(name)
# a dictionary of conversations, default value is list
self._oai_conversations = defaultdict(list)
self._system_message = system_message
self._oai_system_message = [{"content": self._system_message, "role": "system"}]
self._is_termination_msg = (
is_termination_msg if is_termination_msg is not None else (lambda x: x.get("content") == "TERMINATE")
)
if oai_config is False:
self.oai_config = False
else:
self.oai_config = self.DEFAULT_CONFIG.copy()
if isinstance(oai_config, dict):
self.oai_config.update(oai_config)
self._code_execution_config = {} if code_execution_config is None else code_execution_config
self.human_input_mode = human_input_mode
self.max_consecutive_auto_reply = (
max_consecutive_auto_reply if max_consecutive_auto_reply is not None else self.MAX_CONSECUTIVE_AUTO_REPLY
)
self._consecutive_auto_reply_counter = defaultdict(int)
self._function_map = {} if function_map is None else function_map
@property
def oai_conversations(self) -> Dict[str, List[Dict]]:
"""A dictionary of conversations from name to list of oai messages."""
return self._oai_conversations
@property
def use_docker(self) -> Union[bool, str, None]:
"""Bool value of whether to use docker to execute the code,
or str value of the docker image name to use, or None when code execution is disabled."""
return None if self._code_execution_config is False else self._code_execution_config.get("use_docker")
@staticmethod
def _message_to_dict(message: Union[Dict, str]):
"""Convert a message to a dictionary.
The message can be a string or a dictionary. The string will be put in the "content" field of the new dictionary.
"""
if isinstance(message, str):
return {"content": message}
else:
return message
def _append_oai_message(self, message: Union[Dict, str], role, conversation_id) -> bool:
"""Append a message to the oai conversation.
If the message received is a string, it will be put in the "content" field of the new dictionary.
If the message received is a dictionary but does not have any of the two fields "content" or "function_call",
this message is not a valid oai message and will be ignored.
Args:
message (dict or str): message to be appended to the oai conversation.
role (str): role of the message, can be "assistant" or "function".
conversation_id (str): id of the conversation, should be the name of the recipient or sender.
Returns:
bool: whether the message is appended to the oai conversation.
"""
message = self._message_to_dict(message)
# create oai message to be appended to the oai conversation that can be passed to oai directly.
oai_message = {k: message[k] for k in ("content", "function_call", "name") if k in message}
if "content" not in oai_message and "function_call" not in oai_message:
return False
oai_message["role"] = "function" if message.get("role") == "function" else role
self._oai_conversations[conversation_id].append(oai_message)
return True
def send(self, message: Union[Dict, str], recipient: "Agent"):
"""Send a message to another agent."""
# When the agent composes and sends the message, the role of the message is "assistant". (If 'role' exists and is 'function', it will remain unchanged.)
valid = self._append_oai_message(message, "assistant", recipient.name)
if valid:
recipient.receive(message, self)
def _print_received_message(self, message: Union[Dict, str], sender: "Agent"):
# print the message received
print(sender.name, "(to", f"{self.name}):\n", flush=True)
if message.get("role") == "function":
func_print = f"***** Response from calling function \"{message['name']}\" *****"
print(func_print, flush=True)
print(message["content"], flush=True)
print("*" * len(func_print), flush=True)
else:
if message.get("content") is not None:
print(message["content"], flush=True)
if "function_call" in message:
func_print = f"***** Suggested function Call: {message['function_call'].get('name', '(No function name found)')} *****"
print(func_print, flush=True)
print(
"Arguments: \n",
message["function_call"].get("arguments", "(No arguments found)"),
flush=True,
sep="",
)
print("*" * len(func_print), flush=True)
print("\n", "-" * 80, flush=True, sep="")
def receive(self, message: Union[Dict, str], sender: "Agent"):
"""Receive a message from another agent.
Once a message is received, this function sends a reply to the sender or stops.
The reply can be generated automatically or entered manually by a human.
Args:
message (dict or str): message from the sender. If the type is dict, it may contain the following reserved fields (All fields are optional).
1. "content": content of the message, can be None.
2. "function_call": a dictionary containing the function name and arguments.
3. "role": role of the message, can be "assistant", "user", "function".
This field is only needed to distinguish between "function" or "assistant"/"user".
4. "name": In most cases, this field is not needed. When the role is "function", this field is needed to indicate the function name.
sender: sender of an Agent instance.
"""
message = self._message_to_dict(message)
# When the agent receives a message, the role of the message is "user". (If 'role' exists and is 'function', it will remain unchanged.)
valid = self._append_oai_message(message, "user", sender.name)
if not valid:
return
self._print_received_message(message, sender)
# default reply is empty (i.e., no reply, in this case we will try to generate auto reply)
reply = ""
if self.human_input_mode == "ALWAYS":
reply = self.get_human_input(
"Provide feedback to the sender. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: "
)
elif self._consecutive_auto_reply_counter[
sender.name
] >= self.max_consecutive_auto_reply or self._is_termination_msg(message):
if self.human_input_mode == "TERMINATE":
reply = self.get_human_input(
"Please give feedback to the sender. (Press enter or type 'exit' to stop the conversation): "
)
reply = reply if reply else "exit"
else:
# this corresponds to the case when self._human_input_mode == "NEVER"
reply = "exit"
if reply == "exit" or (self._is_termination_msg(message) and not reply):
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
return
if reply:
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
self.send(reply, sender)
return
self._consecutive_auto_reply_counter[sender.name] += 1
if self.human_input_mode != "NEVER":
print("\n>>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...", flush=True)
self.send(self.auto_reply(self._oai_conversations[sender.name], default_reply=reply), sender)
def reset(self):
"""Reset the agent."""
self._oai_conversations.clear()
self._consecutive_auto_reply_counter.clear()
def _oai_reply(self, messages: List[Dict]) -> Union[str, Dict]:
# TODO: #1143 handle token limit exceeded error
response = oai.ChatCompletion.create(messages=self._oai_system_message + messages, **self.oai_config)
return oai.ChatCompletion.extract_text_or_function_call(response)[0]
def auto_reply(self, messages: List[Dict], default_reply: Union[str, Dict] = "") -> Union[str, Dict]:
"""Reply based on the conversation history.
First, execute function or code and return the result.
AI replies are generated only when no code execution is performed.
Subclasses can override this method to customize the reply.
Args:
messages: a list of messages in the conversation history.
default_reply (str or dict): default reply.
Returns:
str or dict: reply.
"""
message = messages[-1]
if "function_call" in message:
_, func_return = self.execute_function(message["function_call"])
return func_return
if self._code_execution_config is False:
return default_reply if self.oai_config is False else self._oai_reply(messages)
code_blocks = extract_code(message["content"])
if len(code_blocks) == 1 and code_blocks[0][0] == UNKNOWN:
# no code block is found, lang should be `UNKNOWN`
return default_reply if self.oai_config is False else self._oai_reply(messages)
# try to execute the code
exitcode, logs = self.execute_code_blocks(code_blocks)
exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed"
return f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}"
def get_human_input(self, prompt: str) -> str:
"""Get human input.
Override this method to customize the way to get human input.
Args:
prompt (str): prompt for the human input.
Returns:
str: human input.
"""
reply = input(prompt)
return reply
def run_code(self, code, **kwargs):
"""Run the code and return the result.
Override this function to modify the way to run the code.
Args:
code (str): the code to be executed.
**kwargs: other keyword arguments.
Returns:
A tuple of (exitcode, logs, image).
exitcode (int): the exit code of the code execution.
logs (bytes): the logs of the code execution.
image (str or None): the docker image used for the code execution.
"""
return execute_code(code, **kwargs)
def execute_code_blocks(self, code_blocks):
"""Execute the code blocks and return the result."""
logs_all = ""
for code_block in code_blocks:
lang, code = code_block
if not lang:
lang = infer_lang(code)
if lang in ["bash", "shell", "sh"]:
exitcode, logs, image = self.run_code(code, lang=lang, **self._code_execution_config)
logs = logs.decode("utf-8")
elif lang in ["python", "Python"]:
if code.startswith("# filename: "):
filename = code[11 : code.find("\n")].strip()
else:
filename = None
exitcode, logs, image = self.run_code(
code,
filename=filename,
**self._code_execution_config,
)
logs = logs.decode("utf-8")
else:
# In case the language is not supported, we return an error message.
exitcode, logs, image = 1, f"unknown language {lang}", self._code_execution_config["use_docker"]
# raise NotImplementedError
self._code_execution_config["use_docker"] = image
logs_all += "\n" + logs
if exitcode != 0:
return exitcode, logs_all
return exitcode, logs_all
@staticmethod
def _format_json_str(jstr):
"""Remove newlines outside of quotes, and handle JSON escape sequences.
1. this function removes the newline in the query outside of quotes otherwise json.loads(s) will fail.
Ex 1:
"{\n"tool": "python",\n"query": "print('hello')\nprint('world')"\n}" -> "{"tool": "python","query": "print('hello')\nprint('world')"}"
Ex 2:
"{\n \"location\": \"Boston, MA\"\n}" -> "{"location": "Boston, MA"}"
2. this function also handles JSON escape sequences inside quotes,
Ex 1:
'{"args": "a\na\na\ta"}' -> '{"args": "a\\na\\na\\ta"}'
"""
result = []
inside_quotes = False
last_char = " "
for char in jstr:
if last_char != "\\" and char == '"':
inside_quotes = not inside_quotes
last_char = char
if not inside_quotes and char == "\n":
continue
if inside_quotes and char == "\n":
char = "\\n"
if inside_quotes and char == "\t":
char = "\\t"
result.append(char)
return "".join(result)
def execute_function(self, func_call):
"""Execute a function call and return the result.
Override this function to modify the way to execute a function call.
Args:
func_call: a dictionary extracted from openai message at key "function_call" with keys "name" and "arguments".
Returns:
A tuple of (is_exec_success, result_dict).
is_exec_success (boolean): whether the execution is successful.
result_dict: a dictionary with keys "name", "role", and "content". Value of "role" is "function".
"""
func_name = func_call.get("name", "")
func = self._function_map.get(func_name, None)
is_exec_success = False
if func is not None:
# Extract arguments from a json-like string and put it into a dict.
input_string = self._format_json_str(func_call.get("arguments", "{}"))
try:
arguments = json.loads(input_string)
except json.JSONDecodeError as e:
arguments = None
content = f"Error: {e}\n You argument should follow json format."
# Try to execute the function
if arguments:
try:
content = func(**arguments)
is_exec_success = True
except Exception as e:
content = f"Error: {e}"
else:
content = f"Error: Function {func_name} not found."
return is_exec_success, {
"name": func_name,
"role": "function",
"content": str(content),
}
def generate_init_message(self, **context) -> Union[str, Dict]:
"""Generate the initial message for the agent.
Override this function to customize the initial message based on user's request.
If not overridden, "message" needs to be provided in the context.
"""
return context["message"]
def initiate_chat(self, recipient, **context):
"""Initiate a chat with the recipient agent.
`generate_init_message` is called to generate the initial message for the agent.
Args:
recipient: the recipient agent.
**context: any context information.
"message" needs to be provided if the `generate_init_message` method is not overridden.
"""
self.send(self.generate_init_message(**context), recipient)
def register_function(self, function_map: Dict[str, Callable]):
"""Register functions to the agent.
Args:
function_map: a dictionary mapping function names to functions.
"""
self._function_map.update(function_map)
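To make the unified auto_reply flow concrete, here is a hedged end-to-end sketch assembled from the signatures above and the notebook cells below (the working directory and the task message are placeholders):

from flaml import oai
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

config_list = oai.config_list_openai_aoai(exclude="aoai")  # reads API keys from the environment
assistant = AssistantAgent("assistant", oai_config={"config_list": config_list})
user = UserProxyAgent(
    "user",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding"},  # placeholder working directory
)
# generate_init_message() packages the keyword context into the first message; the agents then
# alternate via send/receive, and auto_reply executes any suggested code blocks or function calls.
user.initiate_chat(assistant, message="Write python code to print the first 10 prime numbers.")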

View File

@ -1,9 +1,9 @@
import re
import os
from pydantic import BaseModel, Extra, root_validator
from typing import Any, Callable, Dict, Optional, Union
from typing import Any, Callable, Dict, List, Optional, Union
from time import sleep
from flaml.autogen.agent import UserProxyAgent, Agent
from flaml.autogen.agent import UserProxyAgent
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
from flaml.autogen.math_utils import get_answer
@ -81,7 +81,7 @@ Problem: """,
}
def is_termination_msg_mathchat(message):
def _is_termination_msg_mathchat(message):
"""Check if a message is a termination message."""
if isinstance(message, dict):
message = message.get("content")
@ -96,7 +96,7 @@ def is_termination_msg_mathchat(message):
return not contain_code and get_answer(message) is not None and get_answer(message) != ""
def add_print_to_last_line(s):
def _add_print_to_last_line(s):
"""Add print() to the last line of a string."""
# 1. check if there is already a print statement
if "print(" in s:
@ -115,7 +115,7 @@ def add_print_to_last_line(s):
return "\n".join(lines)
def remove_print(s):
def _remove_print(s):
"""remove all print statements from a string."""
lines = s.splitlines()
lines = [line for line in lines if not line.startswith("print(")]
@ -130,19 +130,16 @@ class MathUserProxyAgent(UserProxyAgent):
def __init__(
self,
name: Optional[str] = "MathChatAgent", # default set to MathChatAgent
system_message: Optional[str] = "",
is_termination_msg: Optional[Callable[[Dict], bool]] = None,
is_termination_msg: Optional[
Callable[[Dict], bool]
] = _is_termination_msg_mathchat, # terminate if \boxed{} in message
human_input_mode: Optional[str] = "NEVER", # Fully automated
function_map: Optional[Dict[str, Callable]] = None,
max_consecutive_auto_reply: Optional[int] = None,
code_execution_config: Optional[Dict] = None,
max_invalid_q_per_step=3, # a parameter needed in MathChat
**config,
**kwargs,
):
"""
Args:
name (str): name of the agent
system_message (str): system message to be sent to the agent
is_termination_msg (function): a function that takes a message in the form of a dictionary and returns a boolean value indicating if this received message is a termination message.
The dict can contain the following keys: "content", "role", "name", "function_call".
human_input_mode (str): whether to ask for human inputs every time a message is received.
@ -152,37 +149,16 @@ class MathUserProxyAgent(UserProxyAgent):
or when is_termination_msg is True and there is no human input.
(2) When "TERMINATE", the agent only prompts for human input only when a termination message is received or
the number of auto reply reaches the max_consecutive_auto_reply.
(3) When "NEVER", the agent will never prompt for human input. Under this mode, the conversation stops
(3) (Default) When "NEVER", the agent will never prompt for human input. Under this mode, the conversation stops
when the number of auto reply reaches the max_consecutive_auto_reply or when is_termination_msg is True.
function_map (dict[str, callable]): Mapping function names (passed to openai) to callable functions.
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default to None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
code_execution_config (dict or False): config for the code execution.
To disable code execution, set to False. Otherwise, set to a dictionary with the following keys:
- work_dir (Optional, str): The working directory for the code execution.
If None, a default working directory will be used.
The default working directory is the "extensions" directory under
"path_to_flaml/autogen".
- use_docker (Optional, list, str or bool): The docker image to use for code execution.
If a list or a str of image name(s) is provided, the code will be executed in a docker container
with the first image successfully pulled.
If None, False or empty, the code will be executed in the current environment.
Default is True, which will be converted into a list.
If the code is executed in the current environment,
the code must be trusted.
max_invalid_q_per_step (int): (ADDED) the maximum number of invalid queries per step.
**config (dict): other configurations.
**kwargs (dict): other kwargs in [UserProxyAgent](user_proxy_agent#__init__).
"""
super().__init__(
name=name,
system_message=system_message,
is_termination_msg=is_termination_msg,
function_map=function_map,
human_input_mode=human_input_mode,
max_consecutive_auto_reply=max_consecutive_auto_reply,
code_execution_config=code_execution_config,
**config,
**kwargs,
)
# fixed var
@ -222,7 +198,7 @@ class MathUserProxyAgent(UserProxyAgent):
return PROMPTS[prompt_type] + problem
def _reset(self):
self._oai_conversations.clear()
super().reset()
self._valid_q_count = 0
self._total_q_count = 0
self._accum_invalid_q_per_step = 0
@ -237,7 +213,7 @@ class MathUserProxyAgent(UserProxyAgent):
"""
# Need to replace all "; " with "\n" to avoid syntax error when adding `print` to the last line
pycode = pycode.replace("; ", "\n").replace(";", "\n")
pycode = self._previous_code + add_print_to_last_line(pycode)
pycode = self._previous_code + _add_print_to_last_line(pycode)
return_code, output, _ = execute_code(pycode, **self._code_execution_config, timeout=5)
is_success = return_code == 0
@ -271,7 +247,7 @@ class MathUserProxyAgent(UserProxyAgent):
if is_success:
# remove print and check if it still works
tmp = self._previous_code + "\n" + remove_print(pycode) + "\n"
tmp = self._previous_code + "\n" + _remove_print(pycode) + "\n"
rcode, _, _ = execute_code(tmp, **self._code_execution_config)
else:
# only add imports and check if it works
@ -286,11 +262,14 @@ class MathUserProxyAgent(UserProxyAgent):
return output, is_success
def execute_one_wolfram_query(self, query: str):
"""
Run one wolfram query and return the output.
return:
output: string with the output of the query
is_success: boolean indicating whether the query was successful
"""Run one wolfram query and return the output.
Args:
query: string of the query.
Returns:
output: string with the output of the query.
is_success: boolean indicating whether the query was successful.
"""
# wolfram query handler
wolfram = WolframAlphaAPIWrapper()
@ -300,9 +279,9 @@ class MathUserProxyAgent(UserProxyAgent):
is_success = False
return output, is_success
def auto_reply(self, sender: "Agent", default_reply: Union[str, Dict] = ""):
def auto_reply(self, messages: List[Dict], default_reply: Union[str, Dict] = "") -> Union[str, Dict]:
"""Generate an auto reply."""
message = self.oai_conversations[sender.name][-1]
message = messages[-1]
message = message.get("content", "")
code_blocks = extract_code(message)
@ -345,7 +324,7 @@ class MathUserProxyAgent(UserProxyAgent):
return reply
# Imported from langchain. Langchain is licensed under MIT License:
# Modified based on langchain. Langchain is licensed under MIT License:
# The MIT License
# Copyright (c) Harrison Chase
@ -385,7 +364,6 @@ def get_from_dict_or_env(data: Dict[str, Any], key: str, env_key: str, default:
)
# Imported from langchain
class WolframAlphaAPIWrapper(BaseModel):
"""Wrapper for Wolfram Alpha.
@ -453,11 +431,11 @@ class WolframAlphaAPIWrapper(BaseModel):
)
assumption = next(res.pods).text
answer = ""
for r in res["pod"]:
if r["@title"] == "Solution":
answer = r["subpod"]["plaintext"]
if r["@title"] == "Results" or r["@title"] == "Solutions":
for i, sub in enumerate(r["subpod"]):
for result in res["pod"]:
if result["@title"] == "Solution":
answer = result["subpod"]["plaintext"]
if result["@title"] == "Results" or result["@title"] == "Solutions":
for i, sub in enumerate(result["subpod"]):
answer += f"ans {i}: " + sub["plaintext"] + "\n"
break
if answer == "":
@ -472,6 +450,5 @@ class WolframAlphaAPIWrapper(BaseModel):
if answer is None or answer == "":
# We don't want to return the assumption alone if answer is empty
return "No good Wolfram Alpha Result was found", is_success
else:
is_success = True
return f"Assumption: {assumption} \nAnswer: {answer}", is_success
is_success = True
return f"Assumption: {assumption} \nAnswer: {answer}", is_success

View File

@ -1,33 +1,35 @@
from .agent import Agent
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
from collections import defaultdict
import json
from .generic_agent import GenericAgent
from typing import Callable, Dict, Optional, Union
class UserProxyAgent(Agent):
"""(Experimental) A proxy agent for the user, that can execute code and provide feedback to the other agents."""
class UserProxyAgent(GenericAgent):
"""(Experimental) A proxy agent for the user, that can execute code and provide feedback to the other agents.
MAX_CONSECUTIVE_AUTO_REPLY = 100 # maximum number of consecutive auto replies (subject to future change)
UserProxyAgent is a subclass of GenericAgent configured with `human_input_mode` set to ALWAYS
and `oai_config` set to False. By default, the agent will prompt for human input every time a message is received.
Code execution is enabled by default. LLM-based auto reply is disabled by default.
"""
def __init__(
self,
name: str,
system_message: Optional[str] = "",
is_termination_msg: Optional[Callable[[Dict], bool]] = None,
max_consecutive_auto_reply: Optional[int] = None,
human_input_mode: Optional[str] = "ALWAYS",
function_map: Optional[Dict[str, Callable]] = None,
max_consecutive_auto_reply: Optional[int] = None,
code_execution_config: Optional[Dict] = None,
**config,
code_execution_config: Optional[Union[Dict, bool]] = None,
oai_config: Optional[Union[Dict, bool]] = False,
system_message: Optional[str] = "",
):
"""
Args:
name (str): name of the agent.
system_message (str): system message for the agent.
is_termination_msg (function): a function that takes a message in the form of a dictionary
and returns a boolean value indicating if this received message is a termination message.
The dict can contain the following keys: "content", "role", "name", "function_call".
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default to None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
human_input_mode (str): whether to ask for human inputs every time a message is received.
Possible values are "ALWAYS", "TERMINATE", "NEVER".
(1) When "ALWAYS", the agent prompts for human input every time a message is received.
@ -38,9 +40,6 @@ class UserProxyAgent(Agent):
(3) When "NEVER", the agent will never prompt for human input. Under this mode, the conversation stops
when the number of auto reply reaches the max_consecutive_auto_reply or when is_termination_msg is True.
function_map (dict[str, callable]): Mapping function names (passed to openai) to callable functions.
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default to None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
code_execution_config (dict or False): config for the code execution.
To disable code execution, set to False. Otherwise, set to a dictionary with the following keys:
- work_dir (Optional, str): The working directory for the code execution.
@ -55,239 +54,20 @@ class UserProxyAgent(Agent):
If the code is executed in the current environment,
the code must be trusted.
- timeout (Optional, int): The maximum execution time in seconds.
**config (dict): other configurations.
oai_config (dict or False): oai inference configuration.
Please refer to [oai.Completion.create](/docs/reference/autogen/oai/completion#create)
for available options.
Defaults to False, which disables oai-based auto reply.
system_message (str): system message for oai inference.
Only used when oai_config is not False. Use it to reprogram the agent.
"""
super().__init__(name, system_message, is_termination_msg)
self._code_execution_config = {} if code_execution_config is None else code_execution_config
self.human_input_mode = human_input_mode
self.max_consecutive_auto_reply = (
max_consecutive_auto_reply if max_consecutive_auto_reply is not None else self.MAX_CONSECUTIVE_AUTO_REPLY
super().__init__(
name,
system_message,
is_termination_msg,
max_consecutive_auto_reply,
human_input_mode,
function_map,
code_execution_config,
oai_config,
)
self._consecutive_auto_reply_counter = defaultdict(int)
self._function_map = {} if function_map is None else function_map
@property
def use_docker(self) -> Union[bool, str, None]:
"""bool value of whether to use docker to execute the code,
or str value of the docker image name to use, or None when code execution is disabled."""
return None if self._code_execution_config is False else self._code_execution_config.get("use_docker")
def _run_code(self, code, **kwargs):
"""Run the code and return the result.
Args:
code (str): the code to be executed.
**kwargs: other keyword arguments.
Returns:
A tuple of (exitcode, logs, image).
exitcode (int): the exit code of the code execution.
logs (bytes): the logs of the code execution.
image (str or None): the docker image used for the code execution.
"""
return execute_code(code, **kwargs)
def execute_code_blocks(self, code_blocks):
"""Execute the code blocks and return the result."""
logs_all = ""
for code_block in code_blocks:
lang, code = code_block
if not lang:
lang = infer_lang(code)
if lang in ["bash", "shell", "sh"]:
exitcode, logs, image = self._run_code(code, lang=lang, **self._code_execution_config)
logs = logs.decode("utf-8")
elif lang in ["python", "Python"]:
if code.startswith("# filename: "):
filename = code[11 : code.find("\n")].strip()
else:
filename = None
exitcode, logs, image = self._run_code(
code,
filename=filename,
**self._code_execution_config,
)
logs = logs.decode("utf-8")
else:
# In case the language is not supported, we return an error message.
exitcode, logs, image = 1, f"unknown language {lang}", self._code_execution_config["use_docker"]
# raise NotImplementedError
self._code_execution_config["use_docker"] = image
logs_all += "\n" + logs
if exitcode != 0:
return exitcode, logs_all
return exitcode, logs_all
@staticmethod
def _format_json_str(jstr):
"""Remove newlines outside of quotes, and handle JSON escape sequences.
1. this function removes the newline in the query outside of quotes otherwise json.loads(s) will fail.
Ex 1:
"{\n"tool": "python",\n"query": "print('hello')\nprint('world')"\n}" -> "{"tool": "python","query": "print('hello')\nprint('world')"}"
Ex 2:
"{\n \"location\": \"Boston, MA\"\n}" -> "{"location": "Boston, MA"}"
2. this function also handles JSON escape sequences inside quotes,
Ex 1:
'{"args": "a\na\na\ta"}' -> '{"args": "a\\na\\na\\ta"}'
"""
result = []
inside_quotes = False
last_char = " "
for char in jstr:
if last_char != "\\" and char == '"':
inside_quotes = not inside_quotes
last_char = char
if not inside_quotes and char == "\n":
continue
if inside_quotes and char == "\n":
char = "\\n"
if inside_quotes and char == "\t":
char = "\\t"
result.append(char)
return "".join(result)
def _execute_function(self, func_call):
"""Execute a function call and return the result.
Args:
func_call: a dictionary extracted from openai message at key "function_call" with keys "name" and "arguments".
Returns:
A tuple of (is_exec_success, result_dict).
is_exec_success (boolean): whether the execution is successful.
result_dict: a dictionary with keys "name", "role", and "content". Value of "role" is "function".
"""
func_name = func_call.get("name", "")
func = self._function_map.get(func_name, None)
is_exec_success = False
if func is not None:
# Extract arguments from a json-like string and put it into a dict.
input_string = self._format_json_str(func_call.get("arguments", "{}"))
try:
arguments = json.loads(input_string)
except json.JSONDecodeError as e:
arguments = None
content = f"Error: {e}\n You argument should follow json format."
# Try to execute the function
if arguments:
try:
content = func(**arguments)
is_exec_success = True
except Exception as e:
content = f"Error: {e}"
else:
content = f"Error: Function {func_name} not found."
return is_exec_success, {
"name": func_name,
"role": "function",
"content": str(content),
}
def auto_reply(self, sender: "Agent", default_reply: Union[str, Dict] = ""):
"""Generate an auto reply."""
message = self.oai_conversations[sender.name][-1]
if "function_call" in message:
_, func_return = self._execute_function(message["function_call"])
return func_return
if self._code_execution_config is False:
return default_reply
code_blocks = extract_code(message["content"])
if len(code_blocks) == 1 and code_blocks[0][0] == UNKNOWN:
# no code block is found, lang should be `UNKNOWN`
return default_reply
# try to execute the code
exitcode, logs = self.execute_code_blocks(code_blocks)
exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed"
return f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}"
def receive(self, message: Union[Dict, str], sender):
"""Receive a message from the sender agent.
Once a message is received, this function sends a reply to the sender or simply stops.
The reply can be generated automatically or entered manually by a human.
"""
message = self._message_to_dict(message)
super().receive(message, sender)
# default reply is empty (i.e., no reply, in this case we will try to generate auto reply)
reply = ""
if self.human_input_mode == "ALWAYS":
reply = input(
"Provide feedback to the sender. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: "
)
elif self._consecutive_auto_reply_counter[
sender.name
] >= self.max_consecutive_auto_reply or self._is_termination_msg(message):
if self.human_input_mode == "TERMINATE":
reply = input(
"Please give feedback to the sender. (Press enter or type 'exit' to stop the conversation): "
)
reply = reply if reply else "exit"
else:
# this corresponds to the case when self._human_input_mode == "NEVER"
reply = "exit"
if reply == "exit" or (self._is_termination_msg(message) and not reply):
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
return
if reply:
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
self.send(reply, sender)
return
self._consecutive_auto_reply_counter[sender.name] += 1
no_human_input = "NO HUMAN INPUT RECEIVED. " if self.human_input_mode != "NEVER" else ""
print(f"\n>>>>>>>> {no_human_input}USING AUTO REPLY FOR THE USER...", flush=True)
self.send(self.auto_reply(sender, default_reply=reply), sender)
def reset(self):
"""Reset the agent."""
super().reset()
self._consecutive_auto_reply_counter.clear()
def generate_init_message(self, **context) -> Union[str, Dict]:
"""Generate the initial message for the agent.
Override this function to customize the initial message based on user's request.
"""
return context["message"]
def initiate_chat(self, recipient, **context):
"""Initiate a chat with the recipient agent.
`generate_init_message` is called to generate the initial message for the agent.
Args:
recipient: the recipient agent.
**context: any context information.
"message" needs to be provided if the `generate_init_message` method is not overridden.
"""
self.send(self.generate_init_message(**context), recipient)
def register_function(self, function_map: Dict[str, Callable]):
"""Register functions to the agent.
Args:
function_map: a dictionary mapping function names to functions.
"""
self._function_map.update(function_map)
class AIUserProxyAgent(UserProxyAgent):
"""(Experimental) A proxy agent for the user, that can execute code and provide feedback to the other agents.
Compared to UserProxyAgent, this agent can also generate AI replies.
Code execution is enabled by default. AI replies are generated only when no code execution is performed.
To disable code execution, set code_execution_config to False.
"""
def auto_reply(self, sender: "Agent", default_reply: Union[str, Dict] = ""):
reply = super().auto_reply(sender, default_reply)
if reply == default_reply:
# try to generate AI reply
reply = self._ai_reply(sender)
return reply
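With AIUserProxyAgent removed, an equivalent setup is presumably a plain UserProxyAgent given an oai_config, so that auto_reply falls back to an LLM reply when no code or function call is executed; a minimal sketch (config values are placeholders):

ai_user = UserProxyAgent(
    "ai_user",
    human_input_mode="NEVER",
    oai_config={"config_list": config_list},  # config_list assumed to be defined; enables the LLM fallback
)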

View File

@ -232,7 +232,7 @@ class AutoML(BaseEstimator):
seed: int or None, default=None | The random seed for hpo.
n_concurrent_trials: [Experimental] int, default=1 | The number of
concurrent trials. When n_concurrent_trials > 1, flaml performs
[parallel tuning](../../Use-Cases/Task-Oriented-AutoML#parallel-tuning)
[parallel tuning](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning)
and installation of ray or spark is required: `pip install flaml[ray]`
or `pip install flaml[spark]`. Please check
[here](https://spark.apache.org/docs/latest/api/python/getting_started/install.html)
@ -277,7 +277,7 @@ class AutoML(BaseEstimator):
the metrics_to_log dictionary returned by a customized metric function.
The customized metric function shall be provided via the `metric` key word
argument of the fit() function or the automl constructor.
Find an example in the 4th constraint type in this [doc](../../Use-Cases/Task-Oriented-AutoML#constraint).
Find an example in the 4th constraint type in this [doc](/docs/Use-Cases/Task-Oriented-AutoML#constraint).
If `pred_time_limit` is provided as one of keyword arguments to fit() function or
the automl constructor, flaml will automatically (and under the hood)
add it as an additional element in the metric_constraints. Essentially 'pred_time_limit'
@ -1368,7 +1368,7 @@ class AutoML(BaseEstimator):
seed: int or None, default=None | The random seed for hpo.
n_concurrent_trials: [Experimental] int, default=1 | The number of
concurrent trials. When n_concurrent_trials > 1, flaml performs
[parallel tuning](../../Use-Cases/Task-Oriented-AutoML#parallel-tuning)
[parallel tuning](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning)
and installation of ray or spark is required: `pip install flaml[ray]`
or `pip install flaml[spark]`. Please check
[here](https://spark.apache.org/docs/latest/api/python/getting_started/install.html)

View File

@ -30,7 +30,7 @@ class TrainingArgumentsForAuto(TrainingArguments):
When the task is sequence labeling/token classification, there are two formats of the labels:
(1) The token labels, i.e., [B-PER, I-PER, B-LOC]; (2) Id labels. For (2), need to pass the label_list (e.g., [B-PER, I-PER, B-LOC])
to convert the Id to token labels when computing the metric with metric_loss_score.
See the example in [a simple token classification example](../../../../Examples/AutoML-NLP#a-simple-token-classification-example).
See the example in [a simple token classification example](/docs/Examples/AutoML-NLP#a-simple-token-classification-example).
"""
task: str = field(default="seq-classification")

View File

@ -369,7 +369,7 @@ def run(
resources_per_trial: A dictionary of the hardware resources to allocate
per trial, e.g., `{'cpu': 1}`. It is only valid when using ray backend
(by setting 'use_ray = True'). It shall be used when you need to do
[parallel tuning](../../Use-Cases/Tune-User-Defined-Function#parallel-tuning).
[parallel tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning).
config_constraints: A list of config constraints to be satisfied.
e.g., ```config_constraints = [(mem_size, '<=', 1024**3)]```

View File

@ -15,7 +15,7 @@
"source": [
"# Using MathChat to Solve Math Problems\n",
"\n",
"MathChat is a convesational framework for math problem solving. In this notebook, we demonstrate how to use MathChat to solve math problems. MathChat uses the `AssistantAgent` and `MathUserProxyAgent`, which is similar to the usage of `AssistantAgent` and `UserProxyAgent` in other notebooks (More details in `autogen_agent_auto_feedback_from_code_execution.ipynb`). Essentially, `MathUserProxyAgent` implements a different auto reply mechanism corresponding to the MathChat prompts. The original implementation and exeperiments of MathChat are in this [branch](https://github.com/kevin666aa/FLAML/tree/gpt_math_solver/flaml/autogen/math), and you can find more details in our paper [An Empirical Study on Challenging Math Problem Solving with GPT-4](https://arxiv.org/abs/2306.01337).\n",
"MathChat is a convesational framework for math problem solving. In this notebook, we demonstrate how to use MathChat to solve math problems. MathChat uses the `AssistantAgent` and `MathUserProxyAgent`, which is similar to the usage of `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Interactive LLM Agent with Auto Feedback from Code Execution](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb)). Essentially, `MathUserProxyAgent` implements a different auto reply mechanism corresponding to the MathChat prompts. The original implementation and exeperiments of MathChat are in this [branch](https://github.com/kevin666aa/FLAML/tree/gpt_math_solver/flaml/autogen/math), and you can find more details in our paper [An Empirical Study on Challenging Math Problem Solving with GPT-4](https://arxiv.org/abs/2306.01337).\n",
"\n",
"## Requirements\n",
"\n",
@ -131,9 +131,11 @@
"assistant = AssistantAgent(\n",
" name=\"assistant\", \n",
" system_message=\"You are a helpful assistant.\",\n",
" request_timeout=600, \n",
" seed=42, \n",
" config_list=config_list,\n",
" oai_config={\n",
" \"request_timeout\": 600,\n",
" \"seed\": 42,\n",
" \"config_list\": config_list,\n",
" }\n",
")\n",
"\n",
"# 2. create the MathUserProxyAgent instance named \"mathproxyagent\"\n",

File diff suppressed because one or more lines are too long

View File

@ -255,9 +255,9 @@
" },\n",
" }\n",
" ],\n",
" \"function_call\": \"auto\",\n",
" \"config_list\": config_list,\n",
"}\n",
"chatbot = AssistantAgent(\"chatbot\", config_list=config_list, **oai_config)\n",
"chatbot = AssistantAgent(\"chatbot\", oai_config=oai_config)\n",
"\n",
"# create a UserProxyAgent instance named \"user\"\n",
"user = UserProxyAgent(\n",
@ -413,14 +413,15 @@
" },\n",
" }\n",
" ],\n",
" \"function_call\": \"auto\",\n",
" \"config_list\": config_list,\n",
"}\n",
"chatbot = AssistantAgent(\"chatbot\", sys_prompt, config_list=config_list, **oai_config)\n",
"chatbot = AssistantAgent(\"chatbot\", system_message=sys_prompt, oai_config=oai_config)\n",
"\n",
"# the key in `function_map` should match the function name in \"functions\" above\n",
"# we register a class instance method directly\n",
"user = UserProxyAgent(\n",
" \"user\",\n",
" max_consecutive_auto_reply=2,\n",
" human_input_mode=\"NEVER\",\n",
" function_map={\"query_wolfram\": MathUserProxyAgent().execute_one_wolfram_query},\n",
")\n",

View File

@ -144,8 +144,10 @@
"# create an AssistantAgent instance named \"assistant\"\n",
"assistant = AssistantAgent(\n",
" name=\"assistant\",\n",
" seed=41,\n",
" config_list=config_list,\n",
" oai_config={\n",
" \"seed\": 41,\n",
" \"config_list\": config_list,\n",
" }\n",
")\n",
"# create a UserProxyAgent instance named \"user\"\n",
"user = UserProxyAgent(\n",

View File

@ -159,29 +159,29 @@
"# create an AssistantAgent instance named \"assistant\"\n",
"assistant = AssistantAgent(\n",
" name=\"assistant\",\n",
" request_timeout=600,\n",
" seed=42,\n",
" # Excluding azure openai endpoints from the config list.\n",
" # Change to `exclude=\"openai\"` to exclude openai endpoints, or remove the `exclude` argument to include both.\n",
" config_list=oai.config_list_openai_aoai(exclude=\"aoai\"),\n",
" model=\"gpt-4-0613\", # make sure the endpoint you use supports the model\n",
" temperature=0,\n",
" functions=[\n",
" {\n",
" \"name\": \"ask_planner\",\n",
" \"description\": \"ask planner to: 1. get a plan, 2. verify the execution result of the plan and potentially suggest new plan.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"message\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"question to ask planner. Make sure the question include enough context, such as the code and the execution result. The planner does not know the conversation between you and the user, unless you share the conversation with the planner.\",\n",
" oai_config={\n",
" \"temperature\": 0,\n",
" \"request_timeout\": 600,\n",
" \"seed\": 42,\n",
" \"model\": \"gpt-4-0613\",\n",
" \"config_list\": oai.config_list_openai_aoai(exclude=\"aoai\"),\n",
" \"functions\": [\n",
" {\n",
" \"name\": \"ask_planner\",\n",
" \"description\": \"ask planner to: 1. get a plan, 2. verify the execution result of the plan and potentially suggest new plan.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"message\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"question to ask planner. Make sure the question include enough context, such as the code and the execution result. The planner does not know the conversation between you and the user, unless you share the conversation with the planner.\",\n",
" },\n",
" },\n",
" \"required\": [\"message\"],\n",
" },\n",
" \"required\": [\"message\"],\n",
" },\n",
" }\n",
" ],\n",
" ],\n",
" }\n",
")\n",
"\n",
"# create a UserProxyAgent instance named \"user\"\n",

View File

@ -162,36 +162,37 @@
"source": [
"assistant_for_student = AssistantAgent(\n",
" name=\"assistant_for_student\",\n",
" request_timeout=600,\n",
" seed=42,\n",
" # Excluding azure openai endpoints from the config list.\n",
" # Change to `exclude=\"openai\"` to exclude openai endpoints, or remove the `exclude` argument to include both.\n",
" config_list=oai.config_list_openai_aoai(exclude=\"aoai\"),\n",
" model=\"gpt-4-0613\", # make sure the endpoint you use supports the model\n",
" temperature=0,\n",
" functions=[\n",
" {\n",
" \"name\": \"ask_expert\",\n",
" \"description\": \"ask expert when you can't solve the problem satisfactorily.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"message\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"question to ask expert. Make sure the question include enough context, such as the code and the execution result. The expert does not know the conversation between you and the user, unless you share the conversation with the expert.\",\n",
" oai_config={\n",
" \"request_timeout\": 600,\n",
" \"seed\": 42,\n",
" # Excluding azure openai endpoints from the config list.\n",
" # Change to `exclude=\"openai\"` to exclude openai endpoints, or remove the `exclude` argument to include both.\n",
" \"config_list\": oai.config_list_openai_aoai(exclude=\"aoai\"),\n",
" \"model\": \"gpt-4-0613\", # make sure the endpoint you use supports the model\n",
" \"temperature\": 0,\n",
" \"functions\": [\n",
" {\n",
" \"name\": \"ask_expert\",\n",
" \"description\": \"ask expert when you can't solve the problem satisfactorily.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"message\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"question to ask expert. Make sure the question include enough context, such as the code and the execution result. The expert does not know the conversation between you and the user, unless you share the conversation with the expert.\",\n",
" },\n",
" },\n",
" \"required\": [\"message\"],\n",
" },\n",
" \"required\": [\"message\"],\n",
" },\n",
" }\n",
" ],\n",
" }\n",
" ],\n",
" }\n",
")\n",
"\n",
"student = UserProxyAgent(\n",
" name=\"student\",\n",
" human_input_mode=\"TERMINATE\",\n",
" max_consecutive_auto_reply=10,\n",
" # is_termination_msg=lambda x: \"content\" in x and x[\"content\"] is not None and x[\"content\"].rstrip().endswith(\"TERMINATE\"),\n",
" code_execution_config={\"work_dir\": \"student\"},\n",
" function_map={\"ask_expert\": ask_expert},\n",
")"

View File

@ -135,11 +135,13 @@
"# create an AssistantAgent instance named \"assistant\"\n",
"assistant = AssistantAgent(\n",
" name=\"assistant\",\n",
" request_timeout=600,\n",
" seed=42,\n",
" config_list=config_list,\n",
" model=\"gpt-4-32k\", # modify if the endpoint you use doesn't support this model\n",
" temperature=0,\n",
" oai_config={\n",
" \"request_timeout\": 600,\n",
" \"seed\": 42,\n",
" \"config_list\": config_list,\n",
" \"model\": \"gpt-4-32k\", # modify if the endpoint you use doesn't support this model\n",
" \"temperature\": 0,\n",
" }\n",
")\n",
"# create a UserProxyAgent instance named \"user\"\n",
"user = UserProxyAgent(\n",

View File

@ -84,7 +84,7 @@
"\n",
"try:\n",
" X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')\n",
"except ServerError:\n",
"except (ServerError, Exception):\n",
" from sklearn.datasets import make_classification\n",
" from sklearn.model_selection import train_test_split\n",
" from pandas import DataFrame\n",

File diff suppressed because one or more lines are too long

View File

@ -1,4 +1,6 @@
import os
import sys
import pytest
from flaml import oai
from flaml.autogen.agent import AssistantAgent, UserProxyAgent
@ -7,6 +9,54 @@ OAI_CONFIG_LIST = "OAI_CONFIG_LIST"
here = os.path.abspath(os.path.dirname(__file__))
@pytest.mark.skipif(
sys.platform in ["darwin", "win32"],
reason="do not run on MacOS or windows",
)
def test_ai_user_proxy_agent():
try:
import openai
except ImportError:
return
conversations = {}
oai.ChatCompletion.start_logging(conversations)
config_list = oai.config_list_from_json(
OAI_CONFIG_LIST,
file_location=KEY_LOC,
)
assistant = AssistantAgent(
"assistant",
system_message="You are a helpful assistant.",
oai_config={
"request_timeout": 600,
"seed": 42,
"config_list": config_list,
},
)
ai_user_proxy = UserProxyAgent(
name="ai_user",
human_input_mode="NEVER",
max_consecutive_auto_reply=2,
code_execution_config=False,
oai_config={
"config_list": config_list,
},
# In the system message the "user" always refers to the other agent.
system_message="You ask a user for help. You check the answer from the user and provide feedback.",
)
assistant.reset()
math_problem = "$x^3=125$. What is x?"
ai_user_proxy.initiate_chat(
assistant,
message=math_problem,
)
print(conversations)
def test_gpt35(human_input_mode="NEVER", max_consecutive_auto_reply=5):
try:
import openai
@ -27,10 +77,12 @@ def test_gpt35(human_input_mode="NEVER", max_consecutive_auto_reply=5):
)
assistant = AssistantAgent(
"coding_agent",
# request_timeout=600,
seed=40,
max_tokens=1024,
config_list=config_list,
oai_config={
# "request_timeout": 600,
"seed": 42,
"config_list": config_list,
"max_tokens": 1024,
},
)
user = UserProxyAgent(
"user",
@ -66,7 +118,14 @@ def test_create_execute_script(human_input_mode="NEVER", max_consecutive_auto_re
config_list = oai.config_list_from_json(OAI_CONFIG_LIST, file_location=KEY_LOC)
conversations = {}
oai.ChatCompletion.start_logging(conversations)
assistant = AssistantAgent("assistant", request_timeout=600, seed=42, config_list=config_list)
assistant = AssistantAgent(
"assistant",
oai_config={
"request_timeout": 600,
"seed": 42,
"config_list": config_list,
},
)
user = UserProxyAgent(
"user",
human_input_mode=human_input_mode,
@ -121,7 +180,7 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10):
return self._prompt.format(question=question)
oai.ChatCompletion.start_logging()
assistant = AssistantAgent("assistant", temperature=0, config_list=config_list)
assistant = AssistantAgent("assistant", oai_config={"temperature": 0, "config_list": config_list})
user = TSPUserProxyAgent(
"user",
code_execution_config={"work_dir": here},

View File

@ -64,7 +64,7 @@ def test_eval_math_responses():
def test_json_extraction():
from flaml.autogen.agent import UserProxyAgent
user = UserProxyAgent(name="test", use_docker=False)
user = UserProxyAgent(name="test", code_execution_config={"use_docker": False})
jstr = '{\n"location": "Boston, MA"\n}'
assert user._format_json_str(jstr) == '{"location": "Boston, MA"}'
@ -88,24 +88,22 @@ def test_execute_function():
# correct execution
correct_args = {"name": "add_num", "arguments": '{ "num_to_be_added": 5 }'}
assert user._execute_function(func_call=correct_args)[1]["content"] == "15"
assert user.execute_function(func_call=correct_args)[1]["content"] == "15"
# function name called is wrong or doesn't exist
wrong_func_name = {"name": "subtract_num", "arguments": '{ "num_to_be_added": 5 }'}
assert "Error: Function" in user._execute_function(func_call=wrong_func_name)[1]["content"]
assert "Error: Function" in user.execute_function(func_call=wrong_func_name)[1]["content"]
# arguments passed is not in correct json format
wrong_json_format = {
"name": "add_num",
"arguments": '{ "num_to_be_added": 5, given_num: 10 }',
} # should be "given_num" with quotes
assert (
"You argument should follow json format." in user._execute_function(func_call=wrong_json_format)[1]["content"]
)
assert "You argument should follow json format." in user.execute_function(func_call=wrong_json_format)[1]["content"]
# function execution error with wrong arguments passed
wrong_args = {"name": "add_num", "arguments": '{ "num_to_be_added": 5, "given_num": 10 }'}
assert "Error: " in user._execute_function(func_call=wrong_args)[1]["content"]
assert "Error: " in user.execute_function(func_call=wrong_args)[1]["content"]
# 2. test calling a class method
class AddNum:
@ -118,8 +116,8 @@ def test_execute_function():
user = UserProxyAgent(name="test", function_map={"add_num": AddNum(given_num=10).add})
func_call = {"name": "add_num", "arguments": '{ "num_to_be_added": 5 }'}
assert user._execute_function(func_call=func_call)[1]["content"] == "15"
assert user._execute_function(func_call=func_call)[1]["content"] == "20"
assert user.execute_function(func_call=func_call)[1]["content"] == "15"
assert user.execute_function(func_call=func_call)[1]["content"] == "20"
if __name__ == "__main__":

View File

@ -1,10 +1,16 @@
def test_agent():
from flaml.autogen.agent import Agent
import sys
from io import StringIO
import pytest
from flaml.autogen.agent import GenericAgent
dummy_agent_1 = Agent(name="dummy_agent_1")
dummy_agent_2 = Agent(name="dummy_agent_2")
def test_generic_agent(monkeypatch):
dummy_agent_1 = GenericAgent(name="dummy_agent_1")
dummy_agent_2 = GenericAgent(name="dummy_agent_2", human_input_mode="TERMINATE")
monkeypatch.setattr(sys, "stdin", StringIO("exit"))
dummy_agent_1.receive("hello", dummy_agent_2) # receive a str
monkeypatch.setattr(sys, "stdin", StringIO("TERMINATE\n\n"))
dummy_agent_1.receive(
{
"content": "hello",
@ -19,10 +25,12 @@ def test_agent():
dummy_agent_1.oai_conversations["dummy_agent_2"]
), "When the message is not an valid openai message, it should not be appended to the oai conversation."
dummy_agent_1.send("hello", dummy_agent_2) # send a str
monkeypatch.setattr(sys, "stdin", StringIO("exit"))
dummy_agent_1.send("TERMINATE", dummy_agent_2) # send a str
monkeypatch.setattr(sys, "stdin", StringIO("exit"))
dummy_agent_1.send(
{
"content": "hello",
"content": "TERMINATE",
},
dummy_agent_2,
) # send a dict
@ -37,4 +45,4 @@ def test_agent():
if __name__ == "__main__":
test_agent()
test_generic_agent(pytest.MonkeyPatch())

View File

@ -1,5 +1,5 @@
from flaml import oai
from flaml.autogen.agent.math_user_proxy_agent import MathUserProxyAgent, remove_print, add_print_to_last_line
from flaml.autogen.agent.math_user_proxy_agent import MathUserProxyAgent, _remove_print, _add_print_to_last_line
import pytest
import sys
@ -32,9 +32,11 @@ def test_math_user_proxy_agent():
assistant = AssistantAgent(
"assistant",
system_message="You are a helpful assistant.",
request_timeout=600,
seed=42,
config_list=config_list,
oai_config={
"request_timeout": 600,
"seed": 42,
"config_list": config_list,
},
)
mathproxyagent = MathUserProxyAgent(name="MathChatAgent", human_input_mode="NEVER")
@ -51,15 +53,15 @@ def test_math_user_proxy_agent():
def test_add_remove_print():
# test add print
code = "a = 4\nb = 5\na,b"
assert add_print_to_last_line(code) == "a = 4\nb = 5\nprint(a,b)"
assert _add_print_to_last_line(code) == "a = 4\nb = 5\nprint(a,b)"
# test remove print
code = """print("hello")\na = 4*5\nprint("wolrld")"""
assert remove_print(code) == "a = 4*5"
assert _remove_print(code) == "a = 4*5"
# test remove print. Only remove prints without indentation
code = "if 4 > 5:\n\tprint('True')"
assert remove_print(code) == code
assert _remove_print(code) == code
@pytest.mark.skipif(

View File

@ -1,55 +0,0 @@
from flaml import oai
from flaml.autogen.agent import AIUserProxyAgent, AssistantAgent
import pytest
import sys
KEY_LOC = "test/autogen"
OAI_CONFIG_LIST = "OAI_CONFIG_LIST"
@pytest.mark.skipif(
sys.platform in ["darwin", "win32"],
reason="do not run on MacOS or windows",
)
def test_ai_user_proxy_agent():
try:
import openai
except ImportError:
return
conversations = {}
oai.ChatCompletion.start_logging(conversations)
config_list = oai.config_list_from_json(
OAI_CONFIG_LIST,
file_location=KEY_LOC,
)
assistant = AssistantAgent(
"assistant",
system_message="You are a helpful assistant.",
request_timeout=600,
seed=42,
config_list=config_list,
)
ai_user_proxy = AIUserProxyAgent(
name="ai_user",
human_input_mode="NEVER",
config_list=config_list,
max_consecutive_auto_reply=2,
code_execution_config=False,
# In the system message the "user" always refers to ther other agent.
system_message="You ask a user for help. You check the answer from the user and provide feedback.",
)
assistant.reset()
math_problem = "$x^3=125$. What is x?"
ai_user_proxy.initiate_chat(
assistant,
message=math_problem,
)
print(conversations)
if __name__ == "__main__":
test_ai_user_proxy_agent()

View File

@ -31,6 +31,7 @@ def test_automl(budget=5, dataset_format="dataframe", hpo_method=None):
urllib3.exceptions.ReadTimeoutError,
SSLError,
ServerError,
Exception,
) as e:
print(e)
return
@ -121,7 +122,7 @@ def test_mlflow():
try:
X_train, X_test, y_train, y_test = load_openml_task(task_id=7592, data_dir="test/")
except (OpenMLServerException, ChunkedEncodingError, SSLError, ServerError) as e:
except (OpenMLServerException, ChunkedEncodingError, SSLError, ServerError, Exception) as e:
print(e)
return
""" import AutoML class from flaml package """

View File

@ -95,7 +95,7 @@ def test_stratified_groupkfold():
try:
X_train, _, y_train, _ = load_openml_dataset(dataset_id=1169, data_dir="test/")
except ServerError:
except (ServerError, Exception):
return
splitter = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)

View File

@ -114,7 +114,7 @@ class TestWarmStart(unittest.TestCase):
try:
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
except (OpenMLServerException, ChunkedEncodingError, SSLError, ServerError):
except (OpenMLServerException, ChunkedEncodingError, SSLError, ServerError, Exception):
from sklearn.datasets import load_wine
X_train, y_train = load_wine(return_X_y=True)

View File

@ -17,7 +17,7 @@ def base_automl(n_concurrent_trials=1, use_ray=False, use_spark=False, verbose=0
try:
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
except ServerError:
except (ServerError, Exception):
from sklearn.datasets import fetch_california_housing
X_train, y_train = fetch_california_housing(return_X_y=True)

View File

@ -36,6 +36,7 @@ def run_automl(budget=3, dataset_format="dataframe", hpo_method=None):
urllib3.exceptions.ReadTimeoutError,
SSLError,
ServerError,
Exception,
) as e:
print(e)
return

View File

@ -356,7 +356,7 @@ class TestAutoVW(unittest.TestCase):
try:
vw_oml_problem_args, vw_online_aml_problem = get_vw_tuning_problem()
except (SSLError, ServerError) as e:
except (SSLError, ServerError, Exception) as e:
print(e)
return
vanilla_vw = pyvw.vw(**vw_oml_problem_args["fixed_hp_config"])
@ -372,7 +372,7 @@ class TestAutoVW(unittest.TestCase):
# basic experiment setting
try:
vw_oml_problem_args, vw_online_aml_problem = get_vw_tuning_problem()
except (SSLError, ServerError) as e:
except (SSLError, ServerError, Exception) as e:
print(e)
return
autovw = AutoVW(
@ -397,7 +397,7 @@ class TestAutoVW(unittest.TestCase):
vw_oml_problem_args, vw_online_aml_problem = get_vw_tuning_problem(
tuning_hp="NamesapceInteraction+LearningRate"
)
except (SSLError, ServerError) as e:
except (SSLError, ServerError, Exception) as e:
print(e)
return

View File

@ -81,8 +81,9 @@ automl.fit(
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)
## Parallel Spark Jobs
You can activate Spark as the parallel backend during parallel tuning in both [AutoML](../Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](../Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).
You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).
Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.
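As a minimal sketch of what this looks like with AutoML (assuming `flaml[spark,blendsearch]` is installed and a Spark environment is available; the dataset and budget are illustrative only):
```python
from flaml import AutoML
from sklearn.datasets import load_iris

X_train, y_train = load_iris(return_X_y=True)
automl = AutoML()
automl.fit(
    X_train,
    y_train,
    task="classification",
    time_budget=30,          # seconds
    use_spark=True,          # dispatch trials to the Spark backend via joblib-spark
    n_concurrent_trials=2,   # number of trials to run in parallel
)
```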

View File

@ -5,7 +5,7 @@
## Features
* An enhanced inference API as a drop-in replacement of `openai.Completion.create` or `openai.ChatCompletion.create`. It allows easy performance tuning and advanced usage patterns, including:
- Leveraging [`flaml.tune`](../reference/tune/tune) to adapt LLMs to applications, to maximize the utility out of using expensive foundation models and reduce the inference cost by using cheaper models or configurations which achieve equal or better performance.
- Leveraging [`flaml.tune`](/docs/reference/tune/tune) to adapt LLMs to applications, to maximize the utility out of using expensive foundation models and reduce the inference cost by using cheaper models or configurations which achieve equal or better performance.
- Utilities like API unification, caching, error handling, multi-config inference, context programming etc.
* A higher-level abstraction of using foundation models: intelligent agents which can perform tasks autonomously or with human feedback. The same abstraction allows both automated feedback and human feedback sent between agents, so that complex tasks can be accomplished, including tasks that require using tools via code.
@ -13,7 +13,7 @@ The package is under active development with more features upcoming.
## Agents (Experimental)
[`flaml.autogen.agent`](../reference/autogen/agent/agent) contains an experimental implementation of interactive agents which can adapt to human or simulated feedback. This subpackage is under active development.
[`flaml.autogen.agent`](/docs/reference/autogen/agent/agent) contains an experimental implementation of interactive agents which can adapt to human or simulated feedback. This subpackage is under active development.
We have designed different classes of Agents that are capable of communicating with each other through the exchange of messages to collaboratively finish a task. An agent can communicate with other agents and perform actions. Different agents can differ in what actions they perform after receiving messages.
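As a rough sketch of how two such agents are wired together (assuming an `OAI_CONFIG_LIST` file with valid endpoints; the file name and the task message are placeholders):
```python
from flaml import oai
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

config_list = oai.config_list_from_json("OAI_CONFIG_LIST")
assistant = AssistantAgent("assistant", oai_config={"seed": 42, "config_list": config_list})
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    code_execution_config={"work_dir": "coding"},
)
# the user proxy sends the task and auto-replies (e.g., by executing code) until termination
user.initiate_chat(assistant, message="What is the 10th Fibonacci number? Write Python code to compute it.")
```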
@ -141,7 +141,7 @@ user.initiate_chat(
## Enhanced Inference
One can use [`flaml.oai.Completion.create`](../reference/autogen/oai/completion#create) to perform inference.
One can use [`flaml.oai.Completion.create`](/docs/reference/autogen/oai/completion#create) to perform inference.
There are a number of benefits of using `flaml.oai.Completion.create` to perform inference.
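For instance, a minimal call looks like the following sketch (the config file name is an assumption; any supported model/endpoint can be listed in it):
```python
from flaml import oai

config_list = oai.config_list_from_json("OAI_CONFIG_LIST")
response = oai.Completion.create(
    config_list=config_list,
    prompt="Write a one-line Python list comprehension that squares the numbers 1 to 5.",
)
print(oai.Completion.extract_text(response)[0])
```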
### Tune Inference Parameters
@ -193,7 +193,7 @@ def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
return {"success": is_equivalent(answer, solution)}
```
[`flaml.autogen.code_utils`](../reference/autogen/code_utils) and [`flaml.autogen.math_utils`](../reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
[`flaml.autogen.code_utils`](/docs/reference/autogen/code_utils) and [`flaml.autogen.math_utils`](/docs/reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
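As a rough usage sketch of the built-in math evaluator — the call shape (a list of candidate responses plus the canonical solution) mirrors the user-defined function above, and the example strings are illustrative:
```python
from flaml.autogen.math_utils import eval_math_responses

# score a single candidate response against a canonical solution (strings are illustrative)
metrics = eval_math_responses(responses=["The answer is \\boxed{5}."], solution="The answer is \\boxed{5}.")
print(metrics)  # a dict of metrics, including success indicators
```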
#### Metric to optimize
@ -222,7 +222,7 @@ The optimization budget refers to the total budget allowed in the tuning process
#### Perform tuning
Now, you can use [`flaml.oai.Completion.tune`](../reference/autogen/oai/completion#tune) for tuning. For example,
Now, you can use [`flaml.oai.Completion.tune`](/docs/reference/autogen/oai/completion#tune) for tuning. For example,
```python
from flaml import oai
@ -239,11 +239,11 @@ config, analysis = oai.Completion.tune(
```
`num_samples` is the number of configurations to sample. -1 means unlimited (until optimization budget is exhausted).
The returned `config` contains the optimized configuration and `analysis` contains an [ExperimentAnalysis](../reference/tune/analysis#experimentanalysis-objects) object for all the tried configurations and results.
The returned `config` contains the optimized configuration and `analysis` contains an [ExperimentAnalysis](/docs/reference/tune/analysis#experimentanalysis-objects) object for all the tried configurations and results.
The tuned config can be used to perform inference.
*Refer to this [page](../Examples/AutoGen-OpenAI) for a full example. Or check the following notebook examples:*
*Refer to this [page](/docs/Examples/AutoGen-OpenAI) for a full example. Or check the following notebook examples:*
* [Optimize for Code Generation](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb)
* [Optimize for Math](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt_gpt4.ipynb)
@ -253,13 +253,13 @@ The tuend config can be used to perform inference.
`flaml.oai.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API.
When chat models are used and `prompt` is given as the input to `flaml.oai.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API.
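As a sketch of that unification (model names are examples; availability depends on your endpoint and API key):
```python
from flaml import oai

prompt = "Summarize the benefit of caching API calls in one sentence."
# the same prompt-style call works for a chat model: the prompt is converted to messages internally
chat_response = oai.Completion.create(prompt=prompt, model="gpt-3.5-turbo")
# and for a non-chat (text completion) model
text_response = oai.Completion.create(prompt=prompt, model="text-davinci-003")
print(oai.Completion.extract_text(chat_response)[0])
print(oai.Completion.extract_text(text_response)[0])
```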
For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI) and [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](../../blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.
For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI) and [FastChat](https://github.com/lm-sys/FastChat), and then use the same API to send a request. See [here](/blog/2023/07/14/Local-LLMs) for examples on how to make inference with local LLMs.
When only working with the chat-based models, `flaml.oai.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages.
### Caching
API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed", using [`set_cache`](../reference/autogen/oai/completion#set_cache) or specifying in `create()`.
API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed", using [`set_cache`](/docs/reference/autogen/oai/completion#set_cache) or specifying in `create()`.
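For example, to start a fresh cached result stream without disabling caching (a sketch; the seed value is arbitrary):
```python
from flaml import oai

oai.Completion.set_cache(seed=123)  # subsequent calls read/write a cache keyed by this seed
response = oai.Completion.create(prompt="What is 2 + 2?", model="gpt-3.5-turbo")
```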
### Error handling
@ -506,25 +506,25 @@ The compact history is more efficient and the individual API call history contai
### Other Utilities
- a [`cost`](../reference/autogen/oai/completion#cost) function to calculate the cost of an API call.
- a [`test`](../reference/autogen/oai/completion#test) function to conveniently evaluate the configuration over test data.
- an [`extract_text_or_function_call`](../reference/autogen/oai/completion#extract_text_or_function_call) function to extract the text or function call from a completion or chat response.
- a [`cost`](/docs/reference/autogen/oai/completion#cost) function to calculate the cost of an API call.
- a [`test`](/docs/reference/autogen/oai/completion#test) function to conveniently evaluate the configuration over test data.
- an [`extract_text_or_function_call`](/docs/reference/autogen/oai/completion#extract_text_or_function_call) function to extract the text or function call from a completion or chat response.
## Utilities for Applications
### Code
[`flaml.autogen.code_utils`](../reference/autogen/code_utils) offers code-related utilities, such as:
- a [`improve_code`](../reference/autogen/code_utils#improve_code) function to improve code for a given objective.
- a [`generate_assertions`](../reference/autogen/code_utils#generate_assertions) function to generate assertion statements from function signature and docstr.
- a [`implement`](../reference/autogen/code_utils#implement) function to implement a function from a definition.
- a [`eval_function_completions`](../reference/autogen/code_utils#eval_function_completions) function to evaluate the success of a function completion task, or select a response from a list of responses using generated assertions.
[`flaml.autogen.code_utils`](/docs/reference/autogen/code_utils) offers code-related utilities, such as:
- a [`improve_code`](/docs/reference/autogen/code_utils#improve_code) function to improve code for a given objective.
- a [`generate_assertions`](/docs/reference/autogen/code_utils#generate_assertions) function to generate assertion statements from function signature and docstr.
- a [`implement`](/docs/reference/autogen/code_utils#implement) function to implement a function from a definition.
- a [`eval_function_completions`](/docs/reference/autogen/code_utils#eval_function_completions) function to evaluate the success of a function completion task, or select a response from a list of responses using generated assertions.
### Math
[`flaml.autogen.math_utils`](../reference/autogen/math_utils) offers utilities for math problems, such as:
- a [eval_math_responses](../reference/autogen/math_utils#eval_math_responses) function to select a response using voting, and check if the final answer is correct if the canonical solution is provided.
[`flaml.autogen.math_utils`](/docs/reference/autogen/math_utils) offers utilities for math problems, such as:
- a [eval_math_responses](/docs/reference/autogen/math_utils#eval_math_responses) function to select a response using voting, and check if the final answer is correct if the canonical solution is provided.
*Interested in the research that leads to this package? Please check the following papers.*

View File

@ -2,7 +2,7 @@
## Overview
[`flaml.AutoML`](../reference/automl/automl#automl-objects) is a class for task-oriented AutoML. It can be used as a scikit-learn style estimator with the standard `fit` and `predict` functions. The minimal inputs from users are the training data and the task type.
[`flaml.AutoML`](/docs/reference/automl/automl#automl-objects) is a class for task-oriented AutoML. It can be used as a scikit-learn style estimator with the standard `fit` and `predict` functions. The minimal inputs from users are the training data and the task type.
* Training data:
- numpy array. When the input data are stored in numpy array, they are passed to `fit()` as `X_train` and `y_train`.
@ -135,7 +135,7 @@ The estimator list can contain one or more estimator names, each corresponding t
#### Guidelines on tuning a custom estimator
To tune a custom estimator that is not built-in, you need to:
1. Build a custom estimator by inheriting [`flaml.model.BaseEstimator`](../reference/automl/model#baseestimator-objects) or a derived class.
1. Build a custom estimator by inheriting [`flaml.model.BaseEstimator`](/docs/reference/automl/model#baseestimator-objects) or a derived class.
For example, if you have a estimator class with scikit-learn style `fit()` and `predict()` functions, you only need to set `self.estimator_class` to be that class in your constructor.
```python
@ -177,7 +177,7 @@ class MyRegularizedGreedyForest(SKLearnEstimator):
return space
```
In the constructor, we set `self.estimator_class` as `RGFClassifier` or `RGFRegressor` according to the task type. If the estimator you want to tune does not have a scikit-learn style `fit()` and `predict()` API, you can override the `fit()` and `predict()` function of `flaml.model.BaseEstimator`, like [XGBoostEstimator](../reference/automl/model#xgboostestimator-objects). Importantly, we also add the `task="binary"` parameter in the signature of `__init__` so that it doesn't get grouped together with the `**config` kwargs that determines the parameters with which the underlying estimator (`self.estimator_class`) is constructed. If your estimator doesn't use one of the parameters that it is passed, for example some regressors in `scikit-learn` don't use the `n_jobs` parameter, it is enough to add `n_jobs=None` to the signature so that it is ignored by the `**config` dict.
In the constructor, we set `self.estimator_class` as `RGFClassifier` or `RGFRegressor` according to the task type. If the estimator you want to tune does not have a scikit-learn style `fit()` and `predict()` API, you can override the `fit()` and `predict()` function of `flaml.model.BaseEstimator`, like [XGBoostEstimator](/docs/reference/automl/model#xgboostestimator-objects). Importantly, we also add the `task="binary"` parameter in the signature of `__init__` so that it doesn't get grouped together with the `**config` kwargs that determines the parameters with which the underlying estimator (`self.estimator_class`) is constructed. If your estimator doesn't use one of the parameters that it is passed, for example some regressors in `scikit-learn` don't use the `n_jobs` parameter, it is enough to add `n_jobs=None` to the signature so that it is ignored by the `**config` dict.
2. Give the custom estimator a name and add it in AutoML. E.g.,
@ -198,7 +198,7 @@ This registers the `MyRegularizedGreedyForest` class in AutoML, with the name "r
Each estimator class, built-in or not, must have a `search_space` function. In the `search_space` function, we return a dictionary about the hyperparameters, the keys of which are the names of the hyperparameters to tune, and each value is a set of detailed search configurations about the corresponding hyperparameters represented in a dictionary. A search configuration dictionary includes the following fields:
* `domain`, which specifies the possible values of the hyperparameter and their distribution. Please refer to [more details about the search space domain](Tune-User-Defined-Function#more-details-about-the-search-space-domain).
* `init_value` (optional), which specifies the initial value of the hyperparameter.
* `low_cost_init_value`(optional), which specifies the value of the hyperparameter that is associated with low computation cost. See [cost related hyperparameters](Tune-User-Defined-Function#cost-related-hyperparameters) or [FAQ](../FAQ#about-low_cost_partial_config-in-tune) for more details.
* `low_cost_init_value`(optional), which specifies the value of the hyperparameter that is associated with low computation cost. See [cost related hyperparameters](Tune-User-Defined-Function#cost-related-hyperparameters) or [FAQ](/docs/FAQ#about-low_cost_partial_config-in-tune) for more details.
In the example above, we tune four hyperparameters, three integers and one float. They all follow a log-uniform distribution. "max_leaf" and "n_iter" have "low_cost_init_value" specified as their values heavily influence the training cost.
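Put together, a `search_space` that follows these fields might look like the sketch below; the estimator class and the hyperparameter names/ranges are illustrative, not the built-in ones:
```python
from flaml import tune
from flaml.model import SKLearnEstimator

class MyEstimator(SKLearnEstimator):
    @classmethod
    def search_space(cls, data_size, task):
        return {
            "n_estimators": {
                "domain": tune.lograndint(lower=4, upper=512),  # log-uniform integer
                "init_value": 4,
                "low_cost_init_value": 4,  # few trees -> cheap initial trials
            },
            "learning_rate": {
                "domain": tune.loguniform(lower=0.01, upper=1.0),
                "init_value": 0.1,
            },
        }
```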
@ -246,7 +246,7 @@ We override the `search_space` function to tune two hyperparameters only, "n_est
##### A shortcut to override the search space
One can use the `custom_hp` argument in [`AutoML.fit()`](../reference/automl/automl#fit) to override the search space for an existing estimator quickly. For example, if you would like to temporarily change the search range of "n_estimators" of xgboost, disable searching "max_leaves" in random forest, and add "subsample" in the search space of lightgbm, you can set:
One can use the `custom_hp` argument in [`AutoML.fit()`](/docs/reference/automl/automl#fit) to override the search space for an existing estimator quickly. For example, if you would like to temporarily change the search range of "n_estimators" of xgboost, disable searching "max_leaves" in random forest, and add "subsample" in the search space of lightgbm, you can set:
```python
custom_hp = {
@ -414,13 +414,13 @@ To do parallel tuning with Spark, install the `spark` and `blendsearch` options:
pip install flaml[spark,blendsearch]>=1.1.0
```
For more details about installing Spark, please refer to [Installation](../Installation#distributed-tuning).
For more details about installing Spark, please refer to [Installation](/docs/Installation#distributed-tuning).
An example of using Spark for parallel tuning is:
```python
automl.fit(X_train, y_train, n_concurrent_trials=4, use_spark=True)
```
Details about parallel tuning with Spark could be found [here](../Examples/Integrate%20-%20Spark#parallel-spark-jobs). For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`. Also, GPU training is not supported yet when use_spark is True.
Details about parallel tuning with Spark could be found [here](/docs/Examples/Integrate%20-%20Spark#parallel-spark-jobs). For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`. Also, GPU training is not supported yet when use_spark is True.
#### **Guidelines on parallel vs sequential tuning**
@ -527,7 +527,7 @@ print(automl.model)
# <flaml.model.LGBMEstimator object at 0x7f9b502c4550>
```
[`flaml.model.LGBMEstimator`](../reference/automl/model#lgbmestimator-objects) is a wrapper class for LightGBM models. To access the underlying model, use the `estimator` property of the `flaml.model.LGBMEstimator` instance.
[`flaml.model.LGBMEstimator`](/docs/reference/automl/model#lgbmestimator-objects) is a wrapper class for LightGBM models. To access the underlying model, use the `estimator` property of the `flaml.model.LGBMEstimator` instance.
```python
print(automl.model.estimator)

View File

@ -1,6 +1,6 @@
# Tune User Defined Function
[`flaml.tune`](../reference/tune/tune) is a module for economical hyperparameter tuning. It is used internally by `flaml.AutoML`. It can also be used to directly tune a user-defined function (UDF), which is not limited to machine learning model training. You can use `flaml.tune` instead of `flaml.AutoML` if one of the following is true:
[`flaml.tune`](/docs/reference/tune/tune) is a module for economical hyperparameter tuning. It is used internally by `flaml.AutoML`. It can also be used to directly tune a user-defined function (UDF), which is not limited to machine learning model training. You can use `flaml.tune` instead of `flaml.AutoML` if one of the following is true:
1. Your machine learning task is not one of the built-in tasks from `flaml.AutoML`.
1. Your input cannot be represented as X_train + y_train or dataframe + label.
@ -27,7 +27,7 @@ The first step is to specify your tuning objective.
To do it, you should first specify your evaluation procedure (e.g., perform a machine learning model training and validation) with respect to the hyperparameters in a user-defined function `evaluation_function`.
The function requires a hyperparameter configuration as input, and can simply return a metric value in a scalar or return a dictionary of metric name and metric value pairs.
In the following code, we define an evaluation function with respect to two hyperparameters named `x` and `y` according to $obj := (x-85000)^2 - x/y$. Note that we use this toy example here for more accessible demonstration purposes. In real use cases, the evaluation function usually cannot be written in this closed form, but instead involves a black-box and expensive evaluation procedure. Please check out [Tune HuggingFace](../Examples/Tune-HuggingFace), [Tune PyTorch](../Examples/Tune-PyTorch) and [Tune LightGBM](../Getting-Started#tune-user-defined-function) for real examples of tuning tasks.
In the following code, we define an evaluation function with respect to two hyperparameters named `x` and `y` according to $obj := (x-85000)^2 - x/y$. Note that we use this toy example here for more accessible demonstration purposes. In real use cases, the evaluation function usually cannot be written in this closed form, but instead involves a black-box and expensive evaluation procedure. Please check out [Tune HuggingFace](/docs/Examples/Tune-HuggingFace), [Tune PyTorch](/docs/Examples/Tune-PyTorch) and [Tune LightGBM](/docs/Getting-Started#tune-user-defined-function) for real examples of tuning tasks.
```python
import time
@ -220,7 +220,7 @@ Optionally, you can provide a list of config constraints to be satisfied through
### Put together
After the aforementioned key steps, one is ready to perform a tuning task by calling [`flaml.tune.run()`](../reference/tune/tune#run). Below is a quick sequential tuning example using the pre-defined search space `config_search_space` and a minimization (`mode='min'`) objective for the `score` metric evaluated in `evaluate_config`, using the default search algorithm in flaml. The time budget is 10 seconds (`time_budget_s=10`).
After the aforementioned key steps, one is ready to perform a tuning task by calling [`flaml.tune.run()`](/docs/reference/tune/tune#run). Below is a quick sequential tuning example using the pre-defined search space `config_search_space` and a minimization (`mode='min'`) objective for the `score` metric evaluated in `evaluate_config`, using the default search algorithm in flaml. The time budget is 10 seconds (`time_budget_s=10`).
```python
# require: pip install flaml[blendsearch]
analysis = tune.run(
@ -236,7 +236,7 @@ analysis = tune.run(
### Result analysis
Once the tuning process finishes, it returns an [ExperimentAnalysis](../reference/tune/analysis) object, which provides methods to analyze the tuning.
Once the tuning process finishes, it returns an [ExperimentAnalysis](/docs/reference/tune/analysis) object, which provides methods to analyze the tuning.
In the following code example, we retrieve the best configuration found during the tuning, and retrieve the best trial's result from the returned `analysis`.
@ -293,7 +293,7 @@ Related arguments:
- `use_spark`: A boolean of whether to use spark as the backend.
- `resources_per_trial`: A dictionary of the hardware resources to allocate per trial, e.g., `{'cpu': 1}`. Only valid when using ray backend.
Details about parallel tuning with Spark could be found [here](../Examples/Integrate%20-%20Spark#parallel-spark-jobs).
Details about parallel tuning with Spark could be found [here](/docs/Examples/Integrate%20-%20Spark#parallel-spark-jobs).
You can perform parallel tuning by specifying `use_ray=True` (requiring flaml[ray] option installed) or `use_spark=True`
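A minimal sketch combining the earlier toy objective with the Spark backend (requires the `flaml[spark,blendsearch]` install mentioned above; the budget and search ranges are illustrative):
```python
from flaml import tune

def evaluate_config(config):
    # toy objective from the earlier example: (x-85000)^2 - x/y
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]
    return {"score": score}

analysis = tune.run(
    evaluate_config,
    config={
        "x": tune.lograndint(lower=1, upper=100000),
        "y": tune.randint(lower=1, upper=100000),
    },
    metric="score",
    mode="min",
    time_budget_s=10,
    use_spark=True,
    n_concurrent_trials=2,
)
print(analysis.best_config)
```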

View File

@ -1,6 +1,6 @@
# Zero Shot AutoML
`flaml.default` is a package for zero-shot AutoML, or "no-tuning" AutoML. It uses [`flaml.AutoML`](../reference/automl/automl#automl-objects) and [`flaml.default.portfolio`](../reference/default/portfolio) to mine good hyperparameter configurations across different datasets offline, and recommend data-dependent default configurations at runtime without expensive tuning.
`flaml.default` is a package for zero-shot AutoML, or "no-tuning" AutoML. It uses [`flaml.AutoML`](/docs/reference/automl/automl#automl-objects) and [`flaml.default.portfolio`](/docs/reference/default/portfolio) to mine good hyperparameter configurations across different datasets offline, and recommend data-dependent default configurations at runtime without expensive tuning.
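A quick sketch of what using a flamlized learner looks like (assuming `lightgbm` is installed; the dataset is only for illustration):
```python
from flaml.default import LGBMClassifier
from sklearn.datasets import load_iris

X_train, y_train = load_iris(return_X_y=True)
clf = LGBMClassifier()        # no hyperparameters specified
clf.fit(X_train, y_train)     # data-dependent defaults are chosen here, no tuning
print(clf.predict(X_train[:5]))
```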
Zero-shot AutoML has several benefits:
* The computation cost is just training one model. No tuning is involved.
@ -236,7 +236,7 @@ Change "binary" into "multiclass" or "regression", or your own types in your "re
You have now effectively built your own zero-shot AutoML solution. Congratulations!
Optionally, you can "flamlize" a learner using [`flaml.default.flamlize_estimator`](../reference/default/estimator#flamlize_estimator) for easy dissemination. For example,
Optionally, you can "flamlize" a learner using [`flaml.default.flamlize_estimator`](/docs/reference/default/estimator#flamlize_estimator) for easy dissemination. For example,
```python
import sklearn.ensemble as ensemble