From 5387a0a60786726412545dd3c56b9671dd1b5b71 Mon Sep 17 00:00:00 2001 From: Chi Wang Date: Fri, 9 Jun 2023 11:40:04 -0700 Subject: [PATCH] Agent notebook example with human feedback; Support shell command and multiple code blocks; Improve the system message for assistant agent; Improve utility functions for config lists; reuse docker image (#1056) * add agent notebook and documentation * fix bug * set flush to True when printing msg in agent * add a math problem in agent notebook * remove * header * improve notebook doc * notebook update * improve notebook example * improve doc * agent notebook example with user feedback * log * log * improve notebook doc * improve print * doc * human_input_mode * human_input_mode str * indent * indent * Update flaml/autogen/agent/user_proxy_agent.py Co-authored-by: Chi Wang * shell command and multiple code blocks * Update notebook/autogen_agent.ipynb Co-authored-by: Chi Wang * Update notebook/autogen_agent.ipynb Co-authored-by: Chi Wang * Update notebook/autogen_agent.ipynb Co-authored-by: Chi Wang * coding agent * math notebook * renaming and doc format * typo * infer lang * sh * docker * docker * reset consecutive autoreply counter * fix explanation * paper talk * human feedback * web info * rename test * config list explanation * link to blogpost * installation * homepage features * features * features * rename agent * remove notebook * notebook test * docker command * notebook update * lang -> cmd * notebook * make it work for gpt-3.5 * return full log * quote * docker * docker * docker * docker * docker * docker image list * notebook * notebook * use_docker * use_docker * use_docker * doc * agent * doc * abs path * pandas * docker * reuse docker image * context window * news * print format * pyspark version in py3.8 * pyspark in py3.8 * pyspark and ray * quote * pyspark * pyspark * pyspark --------- Co-authored-by: Qingyun Wu --- .github/workflows/python-package.yml | 15 +- README.md | 60 +- flaml/autogen/agent/__init__.py | 5 + flaml/autogen/agent/agent.py | 3 +- flaml/autogen/agent/assistant_agent.py | 47 + flaml/autogen/agent/coding_agent.py | 41 - flaml/autogen/agent/user_proxy_agent.py | 65 +- flaml/autogen/code_utils.py | 94 +- flaml/autogen/oai/__init__.py | 16 +- flaml/autogen/oai/openai_utils.py | 92 +- ...nt_auto_feedback_from_code_execution.ipynb | 457 ++++++---- notebook/autogen_agent_human_feedback.ipynb | 776 ++++++++++++++++ notebook/autogen_agent_web_info.ipynb | 607 +++++++++++++ notebook/autogen_chatgpt_gpt4.ipynb | 6 +- notebook/autogen_openai_completion.ipynb | 2 +- notebook/automl_flight_delays.ipynb | 18 +- notebook/automl_synapseML.ipynb | 831 ------------------ notebook/research/autogen_code.ipynb | 4 +- setup.py | 2 - test/autogen/oai/test_completion.py | 63 -- test/autogen/oai/test_notebook.py | 12 +- ...{test_agent.py => test_assistant_agent.py} | 60 +- test/autogen/test_code.py | 85 ++ website/docs/Getting-Started.md | 68 +- website/docs/Installation.md | 67 +- website/docs/Research.md | 13 +- website/docs/Use-Cases/Auto-Generation.md | 8 +- website/src/components/HomepageFeatures.js | 23 +- 28 files changed, 2228 insertions(+), 1312 deletions(-) create mode 100644 flaml/autogen/agent/assistant_agent.py delete mode 100644 flaml/autogen/agent/coding_agent.py create mode 100644 notebook/autogen_agent_human_feedback.ipynb create mode 100644 notebook/autogen_agent_web_info.ipynb delete mode 100644 notebook/automl_synapseML.ipynb rename test/autogen/{test_agent.py => test_assistant_agent.py} (57%) create mode 100644 
test/autogen/test_code.py diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml index fb1799528c..28787287fc 100644 --- a/.github/workflows/python-package.yml +++ b/.github/workflows/python-package.yml @@ -49,22 +49,21 @@ jobs: export CFLAGS="$CFLAGS -I/usr/local/opt/libomp/include" export CXXFLAGS="$CXXFLAGS -I/usr/local/opt/libomp/include" export LDFLAGS="$LDFLAGS -Wl,-rpath,/usr/local/opt/libomp/lib -L/usr/local/opt/libomp/lib -lomp" - - name: On Linux + python 3.8, install pyspark 3.2.3 - if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.8' - run: | - python -m pip install --upgrade pip wheel - pip install pyspark==3.2.3 - name: Install packages and dependencies run: | python -m pip install --upgrade pip wheel pip install -e . python -c "import flaml" pip install -e .[test] + - name: On Ubuntu python 3.8, install pyspark 3.2.3 + if: matrix.python-version == '3.8' && matrix.os == 'ubuntu-latest' + run: | + pip install pyspark==3.2.3 pip list | grep "pyspark" - name: If linux, install ray 2 if: matrix.os == 'ubuntu-latest' run: | - pip install ray[tune] + pip install "ray[tune]<2.5.0" - name: If mac, install ray if: matrix.os == 'macOS-latest' run: | @@ -77,8 +76,8 @@ jobs: if: matrix.python-version != '3.10' run: | pip install -e .[vw] - - name: Uninstall pyspark on python 3.9 - if: matrix.python-version == '3.9' + - name: Uninstall pyspark on python 3.8 or 3.9 for windows + if: matrix.python-version == '3.8' || matrix.python-version == '3.9' && matrix.os == 'windows-2019' run: | # Uninstall pyspark to test env without pyspark pip uninstall -y pyspark diff --git a/README.md b/README.md index ad79a29004..e645841bd9 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,9 @@

-:fire: v1.2.0 is released with support for ChatGPT and GPT-4. +:fire: FLAML is highlighted in OpenAI's [cookbook](https://github.com/openai/openai-cookbook#related-resources-from-around-the-web) +:fire: [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) is released with support for ChatGPT and GPT-4, based on [Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference](https://arxiv.org/abs/2303.04673). +:fire: FLAML supports AutoML and Hyperparameter Tuning features in [Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) private preview. Sign up for these features at: https://aka.ms/fabric/data-science/sign-up. ## What is FLAML @@ -22,10 +24,9 @@ FLAML is a lightweight Python library for efficient automation of machine learning and AI operations, including selection of models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations). -* For foundation models like the GPT series and AI agents based on them, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. -* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. -* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). -* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective +* For foundation models like the GPT models, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. FLAML enables users to build and use adaptive AI agents with minimal effort. +* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., search space and metric), or full customization (arbitrary training/inference/evaluation code). +* It supports fast and economical automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm) and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research). @@ -42,13 +43,14 @@ FLAML requires **Python version >= 3.7**. It can be installed from pip: pip install flaml ``` -To run the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook), -install flaml with the [notebook] option: - +Minimal dependencies are installed without extra options. You can install extra options based on the feature you need. 
For example, use the following to install the dependencies needed by the [`autogen`](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package. ```bash -pip install flaml[notebook] +pip install "flaml[autogen]" ``` +Find more options in [Installation](Installation). +Each of the [`notebook examples`](https://github.com/microsoft/FLAML/tree/main/notebook) may require a specific option to be installed. + ### .NET Use the following guides to get started with FLAML in .NET: @@ -59,25 +61,31 @@ Use the following guides to get started with FLAML in .NET: ## Quickstart -* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets. +* (New) The [autogen](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) package can help you maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4, including: + - A drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalities like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets. + ```python + from flaml import oai -```python -from flaml import oai - -config, analysis = oai.Completion.tune( - data=tune_data, - metric="success", - mode="max", - eval_func=eval_func, - inference_budget=0.05, - optimization_budget=3, - num_samples=-1, -) -``` - -The automated experimentation and optimization can help you maximize the utility out of these expensive models. -A suite of utilities are offered to accelerate the experimentation and application development, such as low-level inference API with caching, templating, filtering, and higher-level components like LLM-based coding and interactive agents. + # perform tuning + config, analysis = oai.Completion.tune( + data=tune_data, + metric="success", + mode="max", + eval_func=eval_func, + inference_budget=0.05, + optimization_budget=3, + num_samples=-1, + ) + # perform inference for a test instance + response = oai.Completion.create(context=test_instance, **config) + ``` + - LLM-driven intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code. + ```python + assistant = AssistantAgent("assistant") + user = UserProxyAgent("user", human_input_mode="TERMINATE") + assistant.receive("Draw a rocket and save to a file named 'rocket.svg'") + ``` * With three lines of code, you can start using this economical and fast AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML). 
diff --git a/flaml/autogen/agent/__init__.py b/flaml/autogen/agent/__init__.py index e69de29bb2..0e2c02683e 100644 --- a/flaml/autogen/agent/__init__.py +++ b/flaml/autogen/agent/__init__.py @@ -0,0 +1,5 @@ +from .agent import Agent +from .assistant_agent import AssistantAgent +from .user_proxy_agent import UserProxyAgent + +__all__ = ["Agent", "AssistantAgent", "UserProxyAgent"] diff --git a/flaml/autogen/agent/agent.py b/flaml/autogen/agent/agent.py index b81ca35df7..8dbf223de6 100644 --- a/flaml/autogen/agent/agent.py +++ b/flaml/autogen/agent/agent.py @@ -37,7 +37,8 @@ class Agent: def _receive(self, message, sender): """Receive a message from another agent.""" - print("\n****", self.name, "received message from", sender.name, "****\n", flush=True) + print("\n", "-" * 80, "\n", flush=True) + print(sender.name, "(to", f"{self.name}):", flush=True) print(message, flush=True) self._conversations[sender.name].append({"content": message, "role": "user"}) diff --git a/flaml/autogen/agent/assistant_agent.py b/flaml/autogen/agent/assistant_agent.py new file mode 100644 index 0000000000..09e0ae0761 --- /dev/null +++ b/flaml/autogen/agent/assistant_agent.py @@ -0,0 +1,47 @@ +from .agent import Agent +from flaml.autogen.code_utils import DEFAULT_MODEL +from flaml import oai + + +class AssistantAgent(Agent): + """(Experimental) Assistant agent, able to suggest code blocks.""" + + DEFAULT_SYSTEM_MESSAGE = """You are a helpful AI assistant. + In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute. You must indicate the script type in the code block. + 1. When you need to ask the user for some info, use the code to output the info you need, for example, browse or search the web, download/read a file. + 2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly. Solve the task step by step if you need to. + If you want the user to save the code in a file before executing it, put # filename: inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user. + If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. + Reply "TERMINATE" in the end when everything is done. + """ + + DEFAULT_CONFIG = { + "model": DEFAULT_MODEL, + } + + def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config): + """ + Args: + name (str): agent name. + system_message (str): system message to be sent to the agent. + **config (dict): other configurations allowed in + [oai.Completion.create](../oai/Completion#create). + These configurations will be used when invoking LLM. 
+ """ + super().__init__(name, system_message) + self._config = self.DEFAULT_CONFIG.copy() + self._config.update(config) + self._sender_dict = {} + + def receive(self, message, sender): + if sender.name not in self._sender_dict: + self._sender_dict[sender.name] = sender + self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}] + super().receive(message, sender) + responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config) + response = oai.ChatCompletion.extract_text(responses)[0] + self._send(response, sender) + + def reset(self): + self._sender_dict.clear() + self._conversations.clear() diff --git a/flaml/autogen/agent/coding_agent.py b/flaml/autogen/agent/coding_agent.py deleted file mode 100644 index f7c16f8253..0000000000 --- a/flaml/autogen/agent/coding_agent.py +++ /dev/null @@ -1,41 +0,0 @@ -from .agent import Agent -from flaml.autogen.code_utils import DEFAULT_MODEL -from flaml import oai - - -class PythonAgent(Agent): - """(Experimental) Suggest code blocks.""" - - DEFAULT_SYSTEM_MESSAGE = """You suggest python code (in a python coding block) for a user to execute for a given task. If you want the user to save the code in a file before executing it, put # filename: inside the code block as the first line. Finish the task smartly. Don't suggest shell command. Don't include multiple code blocks in one response. Use 'print' function for the output when relevant. Check the execution result returned by the user. - If the result indicates there is an error, fix the error and output the code again. - Reply "TERMINATE" in the end when the task is done. - """ - - DEFAULT_CONFIG = { - "model": DEFAULT_MODEL, - } - - def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config): - """ - Args: - name (str): agent name - system_message (str): system message to be sent to the agent - config (dict): other configurations. 
- """ - super().__init__(name, system_message) - self._config = self.DEFAULT_CONFIG.copy() - self._config.update(config) - self._sender_dict = {} - - def receive(self, message, sender): - if sender.name not in self._sender_dict: - self._sender_dict[sender.name] = sender - self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}] - super().receive(message, sender) - responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config) - response = oai.ChatCompletion.extract_text(responses)[0] - self._send(response, sender) - - def reset(self): - self._sender_dict.clear() - self._conversations.clear() diff --git a/flaml/autogen/agent/user_proxy_agent.py b/flaml/autogen/agent/user_proxy_agent.py index f1975dcd34..dbc986064a 100644 --- a/flaml/autogen/agent/user_proxy_agent.py +++ b/flaml/autogen/agent/user_proxy_agent.py @@ -1,5 +1,5 @@ from .agent import Agent -from flaml.autogen.code_utils import extract_code, execute_code +from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang from collections import defaultdict @@ -54,36 +54,51 @@ class UserProxyAgent(Agent): self._consecutive_auto_reply_counter = defaultdict(int) self._use_docker = use_docker - def _execute_code(self, code, lang): + def _execute_code(self, code_blocks): """Execute the code and return the result.""" - if lang in ["bash", "shell"]: - if not code.startswith("python "): - return 1, f"please do not suggest bash or shell commands like {code}" - file_name = code[len("python ") :] - exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker) - logs = logs.decode("utf-8") - elif lang == "python": - if code.startswith("# filename: "): - filename = code[11 : code.find("\n")].strip() + logs_all = "" + for code_block in code_blocks: + lang, code = code_block + if not lang: + lang = infer_lang(code) + if lang in ["bash", "shell", "sh"]: + # if code.startswith("python "): + # # return 1, f"please do not suggest bash or shell commands like {code}" + # file_name = code[len("python ") :] + # exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker) + # else: + exitcode, logs, image = execute_code( + code, work_dir=self._work_dir, use_docker=self._use_docker, lang=lang + ) + logs = logs.decode("utf-8") + elif lang == "python": + if code.startswith("# filename: "): + filename = code[11 : code.find("\n")].strip() + else: + filename = None + exitcode, logs, image = execute_code( + code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker + ) + logs = logs.decode("utf-8") else: - filename = None - exitcode, logs = execute_code(code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker) - logs = logs.decode("utf-8") - else: - # TODO: could this happen? - exitcode, logs = 1, f"unknown language {lang}" - # raise NotImplementedError - return exitcode, logs + # TODO: could this happen? 
+ exitcode, logs, image = 1, f"unknown language {lang}" + # raise NotImplementedError + self._use_docker = image + logs_all += "\n" + logs + if exitcode != 0: + return exitcode, logs_all + return exitcode, logs_all def auto_reply(self, message, sender, default_reply=""): """Generate an auto reply.""" - code, lang = extract_code(message) - if lang == "unknown": - # no code block is found, lang should be "unknown" + code_blocks = extract_code(message) + if len(code_blocks) == 1 and code_blocks[0][0] == UNKNOWN: + # no code block is found, lang should be `UNKNOWN`` self._send(default_reply, sender) else: # try to execute the code - exitcode, logs = self._execute_code(code, lang) + exitcode, logs = self._execute_code(code_blocks) exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed" self._send(f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}", sender) @@ -111,8 +126,10 @@ class UserProxyAgent(Agent): # this corresponds to the case when self._human_input_mode == "NEVER" reply = "exit" if reply == "exit" or (self._is_termination_msg(message) and not reply): + # reset the consecutive_auto_reply_counter + self._consecutive_auto_reply_counter[sender.name] = 0 return - elif reply: + if reply: # reset the consecutive_auto_reply_counter self._consecutive_auto_reply_counter[sender.name] = 0 self._send(reply, sender) diff --git a/flaml/autogen/code_utils.py b/flaml/autogen/code_utils.py index 8264ef2edd..be52b2e5b3 100644 --- a/flaml/autogen/code_utils.py +++ b/flaml/autogen/code_utils.py @@ -12,16 +12,36 @@ from flaml.autogen import oai, DEFAULT_MODEL, FAST_MODEL # Regular expression for finding a code block CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```" WORKING_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "extensions") +UNKNOWN = "unknown" -def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> str: - # Use a regular expression to find the code block - match = re.search(pattern, text, flags=re.DOTALL) +def infer_lang(code): + """infer the language for the code. + TODO: make it robust. + """ + if code.startswith("python ") or code.startswith("pip"): + return "sh" + return "python" + + +def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> List[Tuple[str, str]]: + """Extract code from a text. + + Args: + text (str): The text to extract code from. + pattern (Optional, str): The regular expression pattern for finding the code block. + + Returns: + list: A list of tuples, each containing the language and the code. 
+ """ + # Use a regular expression to find all the code blocks + match = re.findall(pattern, text, flags=re.DOTALL) + # match = re.search(pattern, text, flags=re.DOTALL) # If a match is found, return the code - if match: - return match.group(2), match.group(1) + # if match: + # return match.group(2), match.group(1) # If no code block is found, return the whole text - return text, "unknown" + return match if match else [(UNKNOWN, text)] def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]: @@ -102,13 +122,22 @@ def timeout_handler(signum, frame): raise TimeoutError("Timed out!") +def _cmd(lang): + if lang.startswith("python") or lang in ["bash", "sh"]: + return lang + if lang == "shell": + return "sh" + raise NotImplementedError(f"{lang} not recognized in code execution") + + def execute_code( code: Optional[str] = None, timeout: Optional[int] = 600, filename: Optional[str] = None, work_dir: Optional[str] = None, - use_docker: Optional[bool] = True, -) -> Tuple[int, bytes]: + use_docker: Optional[Union[List[str], str, bool]] = True, + lang: Optional[str] = "python", +) -> Tuple[int, bytes, str]: """Execute code in a docker container. This function is not tested on MacOS. @@ -125,15 +154,19 @@ def execute_code( If None, a default working directory will be used. The default working directory is the "extensions" directory under "xxx/flaml/autogen", where "xxx" is the path to the flaml package. - use_docker (Optional, bool): Whether to use a docker container for code execution. - If True, the code will be executed in a docker container. - If False, the code will be executed in the current environment. - Default is True. If the code is executed in the current environment, + use_docker (Optional, list, str or bool): The docker image to use for code execution. + If a list or a str of image name(s) is provided, the code will be executed in a docker container + with the first image successfully pulled. + If None, False or empty, the code will be executed in the current environment. + Default is True, which will be converted into a list. + If the code is executed in the current environment, the code must be trusted. + lang (Optional, str): The language of the code. Default is "python". Returns: int: 0 if the code executes successfully. bytes: The error message if the code fails to execute; the stdout otherwise. + image: The docker image name after container run when docker is used. """ assert code is not None or filename is not None, "Either code or filename must be provided." 
@@ -141,7 +174,7 @@ def execute_code( if filename is None: code_hash = md5(code.encode()).hexdigest() # create a file with a automatically generated name - filename = f"tmp_code_{code_hash}.py" + filename = f"tmp_code_{code_hash}.{'py' if lang.startswith('python') else lang}" if work_dir is None: work_dir = WORKING_DIR filepath = os.path.join(work_dir, filename) @@ -155,12 +188,13 @@ def execute_code( in_docker_container = os.path.exists("/.dockerenv") if not use_docker or in_docker_container: # already running in a docker container + cmd = [sys.executable if lang.startswith("python") else _cmd(lang), filename] signal.signal(signal.SIGALRM, timeout_handler) try: signal.alarm(timeout) # run the code in a subprocess in the current docker container in the working directory result = subprocess.run( - [sys.executable, filename], + cmd, cwd=work_dir, capture_output=True, ) @@ -168,17 +202,22 @@ def execute_code( except TimeoutError: if original_filename is None: os.remove(filepath) - return 1, "Timeout" + return 1, "Timeout", None if original_filename is None: os.remove(filepath) - return result.returncode, result.stderr if result.returncode else result.stdout + return result.returncode, result.stderr if result.returncode else result.stdout, None import docker - from requests.exceptions import ReadTimeout, ConnectionError # create a docker client client = docker.from_env() - image_list = ["python:3-alpine", "python:3", "python:3-windowsservercore"] + image_list = ( + ["python:3-alpine", "python:3", "python:3-windowsservercore"] + if use_docker is True + else [use_docker] + if isinstance(use_docker, str) + else use_docker + ) for image in image_list: # check if the image exists try: @@ -198,14 +237,15 @@ def execute_code( # if sys.platform == "win32": # abs_path = str(abs_path).replace("\\", "/") # abs_path = f"/{abs_path[0].lower()}{abs_path[2:]}" + cmd = [ + "sh", + "-c", + f"{_cmd(lang)} {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}", + ] # create a docker container container = client.containers.run( image, - command=[ - "sh", - "-c", - f"python {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}", - ], + command=cmd, working_dir="/workspace", detach=True, # get absolute path to the working directory @@ -220,7 +260,7 @@ def execute_code( container.remove() if original_filename is None: os.remove(filepath) - return 1, "Timeout" + return 1, "Timeout", image # try: # container.wait(timeout=timeout) # except (ReadTimeout, ConnectionError): @@ -231,6 +271,8 @@ def execute_code( # return 1, "Timeout" # get the container logs logs = container.logs().decode("utf-8").rstrip() + # commit the image + container.commit(repository="python", tag=filename.replace("/", "")) # remove the container container.remove() # check if the code executed successfully @@ -246,8 +288,8 @@ def execute_code( logs = bytes(logs, "utf-8") if original_filename is None: os.remove(filepath) - # return the exit code and logs - return exit_code, logs + # return the exit code, logs and image + return exit_code, logs, f"python:{filename}" _GENERATE_ASSERTIONS_CONFIG = { diff --git a/flaml/autogen/oai/__init__.py b/flaml/autogen/oai/__init__.py index 5457351b1b..819d408cc2 100644 --- a/flaml/autogen/oai/__init__.py +++ b/flaml/autogen/oai/__init__.py @@ -1,4 +1,16 @@ from flaml.autogen.oai.completion import Completion, ChatCompletion -from flaml.autogen.oai.openai_utils import get_config_list, config_list_gpt4_gpt35, config_list_openai_aoai +from 
flaml.autogen.oai.openai_utils import ( + get_config_list, + config_list_gpt4_gpt35, + config_list_openai_aoai, + config_list_from_models, +) -__all__ = ["Completion", "ChatCompletion", "get_config_list", "config_list_gpt4_gpt35", "config_list_openai_aoai"] +__all__ = [ + "Completion", + "ChatCompletion", + "get_config_list", + "config_list_gpt4_gpt35", + "config_list_openai_aoai", + "config_list_from_models", +] diff --git a/flaml/autogen/oai/openai_utils.py b/flaml/autogen/oai/openai_utils.py index 44bc0ad615..dafe9d90eb 100644 --- a/flaml/autogen/oai/openai_utils.py +++ b/flaml/autogen/oai/openai_utils.py @@ -59,6 +59,7 @@ def config_list_openai_aoai( openai_api_key_file: Optional[str] = "key_openai.txt", aoai_api_key_file: Optional[str] = "key_aoai.txt", aoai_api_base_file: Optional[str] = "base_aoai.txt", + exclude: Optional[str] = None, ) -> List[Dict]: """Get a list of configs for openai + azure openai api calls. @@ -67,57 +68,103 @@ def config_list_openai_aoai( openai_api_key_file (str, optional): The file name of the openai api key. aoai_api_key_file (str, optional): The file name of the azure openai api key. aoai_api_base_file (str, optional): The file name of the azure openai api base. + exclude (str, optional): The api type to exclude, "openai" or "aoai". Returns: list: A list of configs for openai api calls. """ - if "OPENAI_API_KEY" not in os.environ: + if "OPENAI_API_KEY" not in os.environ and exclude != "openai": try: - os.environ["OPENAI_API_KEY"] = open(f"{key_file_path}/{openai_api_key_file}").read().strip() + with open(f"{key_file_path}/{openai_api_key_file}") as key_file: + os.environ["OPENAI_API_KEY"] = key_file.read().strip() except FileNotFoundError: logging.info( "To use OpenAI API, please set OPENAI_API_KEY in os.environ " "or create key_openai.txt in the specified path, or specify the api_key in config_list." ) - if "AZURE_OPENAI_API_KEY" not in os.environ: + if "AZURE_OPENAI_API_KEY" not in os.environ and exclude != "aoai": try: - os.environ["AZURE_OPENAI_API_KEY"] = open(f"{key_file_path}/{aoai_api_key_file}").read().strip() + with open(f"{key_file_path}/{aoai_api_key_file}") as key_file: + os.environ["AZURE_OPENAI_API_KEY"] = key_file.read().strip() except FileNotFoundError: logging.info( "To use Azure OpenAI API, please set AZURE_OPENAI_API_KEY in os.environ " "or create key_aoai.txt in the specified path, or specify the api_key in config_list." ) - if "AZURE_OPENAI_API_BASE" not in os.environ: + if "AZURE_OPENAI_API_BASE" not in os.environ and exclude != "aoai": try: - os.environ["AZURE_OPENAI_API_BASE"] = open(f"{key_file_path}/{aoai_api_base_file}").read().strip() + with open(f"{key_file_path}/{aoai_api_base_file}") as key_file: + os.environ["AZURE_OPENAI_API_BASE"] = key_file.read().strip() except FileNotFoundError: logging.info( "To use Azure OpenAI API, please set AZURE_OPENAI_API_BASE in os.environ " "or create base_aoai.txt in the specified path, or specify the api_base in config_list." 
) - aoai_config = get_config_list( - # Assuming Azure OpenAI api keys in os.environ["AZURE_OPENAI_API_KEY"], in separated lines - api_keys=os.environ.get("AZURE_OPENAI_API_KEY", "").split("\n"), - # Assuming Azure OpenAI api bases in os.environ["AZURE_OPENAI_API_BASE"], in separated lines - api_bases=os.environ.get("AZURE_OPENAI_API_BASE", "").split("\n"), - api_type="azure", - api_version="2023-03-15-preview", # change if necessary + aoai_config = ( + get_config_list( + # Assuming Azure OpenAI api keys in os.environ["AZURE_OPENAI_API_KEY"], in separated lines + api_keys=os.environ.get("AZURE_OPENAI_API_KEY", "").split("\n"), + # Assuming Azure OpenAI api bases in os.environ["AZURE_OPENAI_API_BASE"], in separated lines + api_bases=os.environ.get("AZURE_OPENAI_API_BASE", "").split("\n"), + api_type="azure", + api_version="2023-03-15-preview", # change if necessary + ) + if exclude != "aoai" + else [] ) - openai_config = get_config_list( - # Assuming OpenAI API_KEY in os.environ["OPENAI_API_KEY"] - api_keys=os.environ.get("OPENAI_API_KEY", "").split("\n"), - # "api_type": "open_ai", - # "api_base": "https://api.openai.com/v1", + openai_config = ( + get_config_list( + # Assuming OpenAI API_KEY in os.environ["OPENAI_API_KEY"] + api_keys=os.environ.get("OPENAI_API_KEY", "").split("\n"), + # "api_type": "open_ai", + # "api_base": "https://api.openai.com/v1", + ) + if exclude != "openai" + else [] ) config_list = openai_config + aoai_config return config_list +def config_list_from_models( + key_file_path: Optional[str] = ".", + openai_api_key_file: Optional[str] = "key_openai.txt", + aoai_api_key_file: Optional[str] = "key_aoai.txt", + aoai_api_base_file: Optional[str] = "base_aoai.txt", + exclude: Optional[str] = None, + model_list: Optional[list] = None, +) -> List[Dict]: + """Get a list of configs for api calls with models in the model list. + + Args: + key_file_path (str, optional): The path to the key files. + openai_api_key_file (str, optional): The file name of the openai api key. + aoai_api_key_file (str, optional): The file name of the azure openai api key. + aoai_api_base_file (str, optional): The file name of the azure openai api base. + exclude (str, optional): The api type to exclude, "openai" or "aoai". + model_list (list, optional): The model list. + + Returns: + list: A list of configs for openai api calls. + """ + config_list = config_list_openai_aoai( + key_file_path, + openai_api_key_file, + aoai_api_key_file, + aoai_api_base_file, + exclude, + ) + if model_list: + config_list = [{**config, "model": model} for config in config_list for model in model_list] + return config_list + + def config_list_gpt4_gpt35( key_file_path: Optional[str] = ".", openai_api_key_file: Optional[str] = "key_openai.txt", aoai_api_key_file: Optional[str] = "key_aoai.txt", aoai_api_base_file: Optional[str] = "base_aoai.txt", + exclude: Optional[str] = None, ) -> List[Dict]: """Get a list of configs for gpt-4 followed by gpt-3.5 api calls. @@ -126,17 +173,16 @@ def config_list_gpt4_gpt35( openai_api_key_file (str, optional): The file name of the openai api key. aoai_api_key_file (str, optional): The file name of the azure openai api key. aoai_api_base_file (str, optional): The file name of the azure openai api base. + exclude (str, optional): The api type to exclude, "openai" or "aoai". Returns: list: A list of configs for openai api calls. 
""" - - config_list = config_list_openai_aoai( + return config_list_from_models( key_file_path, openai_api_key_file, aoai_api_key_file, aoai_api_base_file, + exclude, + model_list=["gpt-4", "gpt-3.5-turbo"], ) - return [{**config, "model": "gpt-4"} for config in config_list] + [ - {**config, "model": "gpt-3.5-turbo"} for config in config_list - ] diff --git a/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb b/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb index ad817c0f2c..686e28d7ef 100644 --- a/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb +++ b/notebook/autogen_agent_auto_feedback_from_code_execution.ipynb @@ -1,5 +1,13 @@ { "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -9,11 +17,11 @@ } }, "source": [ - "# Interactive LLM Agent\n", + "# Interactive LLM Agent with Auto Feedback from Code Execution\n", "\n", - "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks, including coding and math problem-solving.\n", + "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n", "\n", - "In this notebook, we demonstrate how to use `PythonAgent` and `UserProxyAgent` to write code and execute the code. Here `PythonAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for the human user to execute the code written by `PythonAgent`, or automatically execute the code. Depending on the setting of `user_interaction_mode` and `max_consecutive_auto_reply`, the `UserProxyAgent` either solicits feedback from the human user or uses auto-feedback based on the result of code execution. For example, when `user_interaction_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `PythonAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `PythonAgent` directly and return the execution results (success or failure and corresponding outputs) to `PythonAgent`.\n", + "In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for the human user to execute the code written by `AssistantAgent`, or automatically execute the code. Depending on the setting of `human_input_mode` and `max_consecutive_auto_reply`, the `UserProxyAgent` either solicits feedback from the human user or uses auto-feedback based on the result of code execution. For example, when `human_input_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `AssistantAgent` without doing any additional steps. 
When no user feedback is provided, the `UserProxyAgent` will execute the code written by `AssistantAgent` directly and return the execution results (success or failure and corresponding outputs) to `AssistantAgent`.\n", "\n", "## Requirements\n", "\n", @@ -25,7 +33,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-02-13T23:40:52.317406Z", @@ -52,18 +60,18 @@ "- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n", "- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n", "\n", - "It's OK to have only the OpenAI API key, or only the Azure Open API key + base.\n" + "It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from flaml import oai\n", "\n", - "config_list = oai.config_list_gpt4_gpt35()" + "config_list = oai.config_list_from_models(model_list=[\"gpt-4\"])" ] }, { @@ -71,14 +79,46 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Example Task: Create and execute a python script with agents\n", + "The config list looks like the following:\n", + "```python\n", + "config_list = [\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " }, # only if OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if the at least one Azure OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if the second Azure OpenAI API key is found\n", + "]\n", + "```\n", "\n", - "In the example below, let's see how to use the agents in FLAML to write a python script and execute the script. This process involves constructing a `PythonAgent` to serve as the assistant, along with a `UserProxyAgent` that acts as a proxy for the human user. In this example demonstrated below, when constructing the `UserProxyAgent`, we select the `human_input_mode` to \"NEVER\". This means that the `UserProxyAgent` will not solicit feedback from the human user until the limit defined by `max_consecutive_auto_reply` is reached. For the purpose of this example, we've set this limit to 10." + "You can directly override it if the above function returns an empty list, i.e., it doesn't find the keys in the specified locations." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example Task: Write Code to Draw a Plot\n", + "\n", + "In the example below, let's see how to use the agents in FLAML to write a python script and execute the script. This process involves constructing a `AssistantAgent` to serve as the assistant, along with a `UserProxyAgent` that acts as a proxy for the human user. In this example demonstrated below, when constructing the `UserProxyAgent`, we select the `human_input_mode` to \"NEVER\". This means that the `UserProxyAgent` will not solicit feedback from the human user. It stops replying when the limit defined by `max_consecutive_auto_reply` is reached, or when `is_termination_msg()` returns true for the received message." 
] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -86,172 +126,84 @@ "output_type": "stream", "text": [ "\n", - "**** coding_agent received message from user ****\n", + " -------------------------------------------------------------------------------- \n", "\n", - "Create and execute a script to plot a rocket without using matplotlib\n", + "user (to assistant):\n", + "Draw a rocket and save to a file named 'rocket.svg'\n", "\n", - "**** user received message from coding_agent ****\n", - "\n", - "Creating a rocket involves using ASCII characters to display it visually. Here's a simple script to get you started:\n", + " -------------------------------------------------------------------------------- \n", "\n", + "assistant (to user):\n", "```python\n", - "# filename: rocket.py\n", - "def plot_rocket():\n", - " rocket = '''\n", - " |\n", - " /_\\\n", - " /^|^\\ \n", - " //| \\\\\n", - " // | \\\\\n", - " '''\n", - " print(rocket)\n", + "# filename: draw_rocket.py\n", + "import svgwrite\n", "\n", - "if __name__ == \"__main__\":\n", - " plot_rocket()\n", + "def draw_rocket():\n", + " dwg = svgwrite.Drawing('rocket.svg', profile='tiny')\n", + "\n", + " rocket_body_color = \"gray\"\n", + " rocket_fire_color = \"red\"\n", + "\n", + " # Draw the rocket body\n", + " dwg.add(dwg.rect(insert=(50, 20), size=(50, 100), stroke=rocket_body_color, fill=rocket_body_color))\n", + " \n", + " # Draw the rocket top\n", + " dwg.add(dwg.polygon(points=[(75, 0), (50, 20), (100, 20)], stroke=rocket_body_color, fill=rocket_body_color))\n", + " \n", + " # Draw the fins\n", + " dwg.add(dwg.polygon(points=[(50, 60), (30, 80), (50, 100)], stroke=rocket_body_color, fill=rocket_body_color))\n", + " dwg.add(dwg.polygon(points=[(100, 60), (120, 80), (100, 100)], stroke=rocket_body_color, fill=rocket_body_color))\n", + " \n", + " # Draw the rocket fire/flame\n", + " dwg.add(dwg.polygon(points=[(50, 120), (75, 160), (100, 120)], stroke=rocket_fire_color, fill=rocket_fire_color))\n", + "\n", + " dwg.save()\n", + "\n", + "if __name__ == '__main__':\n", + " draw_rocket()\n", "```\n", "\n", - "Save this code in a file named `rocket.py` and execute the script. After that, let me know the results.\n", + "1. Save the above Python code in a file named \"draw_rocket.py\".\n", + "2. Execute the script by running `python draw_rocket.py` in your terminal or command prompt.\n", + "3. The script will create a file named 'rocket.svg' in the same directory containing the rocket drawing.\n", "\n", - " >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", "\n", - "**** coding_agent received message from user ****\n", + " -------------------------------------------------------------------------------- \n", "\n", + "user (to assistant):\n", "exitcode: 0 (execution succeeded)\n", "Code output: \n", - " |\n", - " /_ /^|^\\ \n", - " //| \\\n", - " // | \\\n", - " \n", "\n", "\n", - "**** user received message from coding_agent ****\n", + " -------------------------------------------------------------------------------- \n", "\n", - "It seems that there's an issue with the spacing in the rocket drawing. Let's fix that.\n", + "assistant (to user):\n", + "Great! The script executed successfully and the rocket drawing has been saved to a file named 'rocket.svg'. You can now open the 'rocket.svg' file to view the rocket illustration. 
\n", "\n", - "```python\n", - "# filename: rocket.py\n", - "def plot_rocket():\n", - " rocket = '''\n", - " |\n", - " /_\\\\\n", - " /^|^\\\\\n", - " //|\\\\\\\\\n", - " // | \\\\\\\\\n", - " '''\n", - " print(rocket)\n", - "\n", - "if __name__ == \"__main__\":\n", - " plot_rocket()\n", - "```\n", - "\n", - "Save this code in the existing `rocket.py` file and execute the script again. After that, let me know the results.\n", - "\n", - " >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", - "\n", - "**** coding_agent received message from user ****\n", - "\n", - "exitcode: 0 (execution succeeded)\n", - "Code output: \n", - " |\n", - " /_\\\n", - " /^|^\\\n", - " //|\\\\\n", - " // | \\\\\n", - " \n", - "\n", - "\n", - "**** user received message from coding_agent ****\n", - "\n", - "Great! The rocket is now displayed correctly. If you have any more tasks or modifications, please let me know. Otherwise, I hope you enjoyed the ASCII rocket!\n", + "If you have any further questions or need additional assistance, feel free to ask.\n", "\n", "TERMINATE\n" ] } ], "source": [ - "from flaml.autogen.agent.coding_agent import PythonAgent\n", - "from flaml.autogen.agent.user_proxy_agent import UserProxyAgent\n", + "from flaml.autogen.agent import AssistantAgent, UserProxyAgent\n", "\n", - "# create an assistant which is essentially a PythonAgent instance named \"coding_agent\"\n", - "assistant = PythonAgent(\"coding_agent\", request_timeout=600, seed=42, config_list=config_list)\n", + "# create an AssistantAgent named \"assistant\"\n", + "assistant = AssistantAgent(\"assistant\", request_timeout=600, seed=42, config_list=config_list)\n", "# create a UserProxyAgent instance named \"user\"\n", "user = UserProxyAgent(\n", " \"user\",\n", " human_input_mode=\"NEVER\",\n", " max_consecutive_auto_reply=10,\n", - " is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\"),\n", + " is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\") or x.rstrip().endswith('\"TERMINATE\".'),\n", + " work_dir=\"coding\",\n", + " use_docker=False, # set to True if you are using docker\n", ")\n", "# the assistant receives a message from the user, which contains the task description\n", "assistant.receive(\n", - " \"\"\"Create and execute a script to plot a rocket without using matplotlib\"\"\",\n", - " user,\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, let's see how to use the agents to first write the generated script to a file and then execute the script in two sessions of conversation between the `PythonAgent` and the `UserProxyAgent`." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "**** coding_agent received message from user ****\n", - "\n", - "Create a temp.py file with the following content:\n", - "```\n", - "print('Hello world!')\n", - "```\n", - "\n", - "**** user received message from coding_agent ****\n", - "\n", - "Here is the code to create the temp.py file with the specified content. Please execute this code:\n", - "\n", - "```python\n", - "with open('temp.py', 'w') as file:\n", - " file.write(\"print('Hello world!')\")\n", - "```\n", - "\n", - "After executing this code, you should have a file named temp.py with the content:\n", - "\n", - "```\n", - "print('Hello world!')\n", - "```\n", - "\n", - " >>>>>>>> NO HUMAN INPUT RECEIVED. 
USING AUTO REPLY FOR THE USER...\n", - "\n", - "**** coding_agent received message from user ****\n", - "\n", - "exitcode: 0 (execution succeeded)\n", - "Code output: \n", - "\n", - "**** user received message from coding_agent ****\n", - "\n", - "Great! The temp.py file has been created successfully. Now, you can run this file to see the output. If you need any further assistance, feel free to ask.\n", - "\n", - "TERMINATE\n" - ] - } - ], - "source": [ - "# it is suggested to reset the assistant to clear the state if the new task is not related to the previous one.\n", - "assistant.reset()\n", - "assistant.receive(\n", - " \"\"\"Create a temp.py file with the following content:\n", - " ```\n", - " print('Hello world!')\n", - " ```\"\"\",\n", + " \"\"\"Draw a rocket and save to a file named 'rocket.svg'\"\"\",\n", " user,\n", ")" ] @@ -263,12 +215,45 @@ "source": [ "The example above involves code execution. In FLAML, code execution is triggered automatically by the `UserProxyAgent` when it detects an executable code block in a received message and no human user input is provided. This process occurs in a designated working directory, using a Docker container by default. Unless a specific directory is specified, FLAML defaults to the `flaml/autogen/extensions` directory. Users have the option to specify a different working directory by setting the `work_dir` argument when constructing a new instance of the `UserProxyAgent`.\n", "\n", - "Upon successful execution of the preceding code block, a file named `temp.py` will be created and saved in the default working directory `flaml/autogen/extensions`. Now, let's prompt the assistant to execute the code contained within this file using the following line of code." + "Let's display the generated figure." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# uncomment the following to render the svg file\n", + "# from IPython.display import SVG, display\n", + "\n", + "# display(SVG(\"coding/rocket.svg\"))" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example Task: Use Code to Check Stock Price Change" + ] + }, + { + "cell_type": "code", + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -276,42 +261,190 @@ "output_type": "stream", "text": [ "\n", - "**** coding_agent received message from user ****\n", + " -------------------------------------------------------------------------------- \n", "\n", - "Execute temp.py\n", + "user (to assistant):\n", + "What date is today? Which big tech stock has the largest year-to-date gain this year? How much is the gain?\n", "\n", - "**** user received message from coding_agent ****\n", + " -------------------------------------------------------------------------------- \n", "\n", - "To execute temp.py, run the following code:\n", + "assistant (to user):\n", + "First, let's find out today's date. You can do that by running the following Python code:\n", "\n", "```python\n", - "import os\n", + "from datetime import datetime\n", "\n", - "os.system('python temp.py')\n", + "today = datetime.now()\n", + "print(\"Today's date:\", today.strftime(\"%B %d, %Y\"))\n", "```\n", "\n", - "This code imports the os module and then runs the temp.py file. 
After executing this code, you should see the output:\n", + "For the tech stock information, you will need to use an API or web scraping to fetch this data. I will show you how to do it using the `yfinance` library in Python. Before running the code, make sure you have the `yfinance` library installed by executing the following command:\n", "\n", - "Hello world!\n", + "```sh\n", + "pip install yfinance\n", + "```\n", "\n", - " >>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "Now, we can fetch the stock information using the following Python code:\n", "\n", - "**** coding_agent received message from user ****\n", + "```python\n", + "import yfinance as yf\n", "\n", + "def get_stock_info(ticker):\n", + " stock = yf.Ticker(ticker)\n", + " stock_info = stock.history(\"ytd\")\n", + " current_price = stock_info[\"Close\"][-1]\n", + " start_price = stock_info[\"Close\"][0]\n", + " return (current_price - start_price) / start_price * 100\n", + "\n", + "tech_stocks = {\n", + " \"Apple\": \"AAPL\",\n", + " \"Microsoft\": \"MSFT\",\n", + " \"Amazon\": \"AMZN\",\n", + " \"Google\": \"GOOGL\",\n", + " \"Facebook\": \"FB\",\n", + "}\n", + "\n", + "ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n", + "largest_gain = max(ytd_gains, key=ytd_gains.get)\n", + "print(f\"{largest_gain} has the largest year-to-date gain with {ytd_gains[largest_gain]:.2f}% gain.\")\n", + "```\n", + "\n", + "This script will print out the big tech stock with the largest year-to-date gain and the gain percentage.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "exitcode: 1 (execution failed)\n", + "Code output: \n", + "Today's date: June 08, 2023\n", + "\n", + "Defaulting to user installation because normal site-packages is not writeable\n", + "Requirement already satisfied: yfinance in /home/vscode/.local/lib/python3.9/site-packages (0.2.18)\n", + "Requirement already satisfied: pandas>=1.3.0 in /usr/local/lib/python3.9/site-packages (from yfinance) (1.5.2)\n", + "Requirement already satisfied: numpy>=1.16.5 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.23.5)\n", + "Requirement already satisfied: requests>=2.26 in /usr/local/lib/python3.9/site-packages (from yfinance) (2.28.1)\n", + "Requirement already satisfied: multitasking>=0.0.7 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (0.0.11)\n", + "Requirement already satisfied: lxml>=4.9.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (4.9.2)\n", + "Requirement already satisfied: appdirs>=1.4.4 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.4.4)\n", + "Requirement already satisfied: pytz>=2022.5 in /usr/local/lib/python3.9/site-packages (from yfinance) (2022.6)\n", + "Requirement already satisfied: frozendict>=2.3.4 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (2.3.8)\n", + "Requirement already satisfied: cryptography>=3.3.2 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (38.0.4)\n", + "Requirement already satisfied: beautifulsoup4>=4.11.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (4.11.1)\n", + "Requirement already satisfied: html5lib>=1.1 in /home/vscode/.local/lib/python3.9/site-packages (from yfinance) (1.1)\n", + "Requirement already satisfied: soupsieve>1.2 in 
/home/vscode/.local/lib/python3.9/site-packages (from beautifulsoup4>=4.11.1->yfinance) (2.3.2.post1)\n", + "Requirement already satisfied: cffi>=1.12 in /home/vscode/.local/lib/python3.9/site-packages (from cryptography>=3.3.2->yfinance) (1.15.1)\n", + "Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.9/site-packages (from html5lib>=1.1->yfinance) (1.16.0)\n", + "Requirement already satisfied: webencodings in /home/vscode/.local/lib/python3.9/site-packages (from html5lib>=1.1->yfinance) (0.5.1)\n", + "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.9/site-packages (from pandas>=1.3.0->yfinance) (2.8.2)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (2.1.1)\n", + "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (3.4)\n", + "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (1.26.13)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/site-packages (from requests>=2.26->yfinance) (2022.9.24)\n", + "Requirement already satisfied: pycparser in /home/vscode/.local/lib/python3.9/site-packages (from cffi>=1.12->cryptography>=3.3.2->yfinance) (2.21)\n", + "\n", + "Traceback (most recent call last):\n", + " File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 18, in \n", + " ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n", + " File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 18, in \n", + " ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n", + " File \"/workspaces/FLAML/notebook/coding/tmp_code_74e4297091e1a4a01622501c25dfb9db.py\", line 6, in get_stock_info\n", + " current_price = stock_info[\"Close\"][-1]\n", + " File \"/usr/local/lib/python3.9/site-packages/pandas/core/series.py\", line 978, in __getitem__\n", + " return self._values[key]\n", + "IndexError: index -1 is out of bounds for axis 0 with size 0\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "Apologies for the error. It seems there might be an issue with `yfinance` fetching historical price data. 
To address the error, let's modify the code to include a try-except block to handle any errors when fetching stock data.\n", + "\n", + "Here's the updated code:\n", + "\n", + "```python\n", + "import yfinance as yf\n", + "\n", + "def get_stock_info(ticker):\n", + " try:\n", + " stock = yf.Ticker(ticker)\n", + " stock_info = stock.history(\"ytd\")\n", + " if stock_info.empty:\n", + " return None\n", + " current_price = stock_info[\"Close\"][-1]\n", + " start_price = stock_info[\"Close\"][0]\n", + " return (current_price - start_price) / start_price * 100\n", + " except Exception as e:\n", + " print(f\"Error fetching stock data for {ticker}: {e}\")\n", + " return None\n", + "\n", + "tech_stocks = {\n", + " \"Apple\": \"AAPL\",\n", + " \"Microsoft\": \"MSFT\",\n", + " \"Amazon\": \"AMZN\",\n", + " \"Google\": \"GOOGL\",\n", + " \"Facebook\": \"FB\",\n", + "}\n", + "\n", + "ytd_gains = {stock: get_stock_info(ticker) for stock, ticker in tech_stocks.items()}\n", + "ytd_gains = {stock: gain for stock, gain in ytd_gains.items() if gain is not None} # Remove stocks with errors\n", + "if ytd_gains:\n", + " largest_gain = max(ytd_gains, key=ytd_gains.get)\n", + " print(f\"{largest_gain} has the largest year-to-date gain with {ytd_gains[largest_gain]:.2f}% gain.\")\n", + "else:\n", + " print(\"Unable to retrieve stock information for any tech stocks.\")\n", + "```\n", + "\n", + "This code will handle any errors that may occur when fetching stock data and continue to the next one. It will also print out an error message for the stocks that failed to fetch data. Run the modified code and let me know the result.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", "exitcode: 0 (execution succeeded)\n", - "Code output: Hello world!\n", + "Code output: \n", + "FB: No data found, symbol may be delisted\n", + "Apple has the largest year-to-date gain with 42.59% gain.\n", "\n", "\n", - "**** user received message from coding_agent ****\n", + " -------------------------------------------------------------------------------- \n", "\n", - "I'm glad that the code execution was successful and you got the desired output! If you need any further help or assistance with another task, feel free to ask.\n", + "assistant (to user):\n", + "Great! The updated code successfully fetched the stock data and determined that Apple has the largest year-to-date gain with 42.59%. Please note that the error message for Facebook stock (FB) indicates that no data was found, which may be due to the stock symbol being delisted or an issue with the `yfinance` library.\n", "\n", + "If you have any more questions or need further assistance, feel free to ask. Otherwise, type \"TERMINATE\" to end this session.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", "TERMINATE\n" ] } ], "source": [ - "assistant.receive(\"\"\"Execute temp.py\"\"\", user)" + "# it is suggested to reset the assistant to clear the state if the new task is not related to the previous one.\n", + "assistant.reset()\n", + "assistant.receive(\n", + " \"\"\"What date is today? 
Which big tech stock has the largest year-to-date gain this year? How much is the gain?\"\"\",\n", + " user,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All the feedback is auto generated." ] } ], @@ -331,7 +464,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.9.15" }, "vscode": { "interpreter": { diff --git a/notebook/autogen_agent_human_feedback.ipynb b/notebook/autogen_agent_human_feedback.ipynb new file mode 100644 index 0000000000..be702636ce --- /dev/null +++ b/notebook/autogen_agent_human_feedback.ipynb @@ -0,0 +1,776 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Interactive LLM Agent with Human Feedback\n", + "\n", + "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n", + "\n", + "In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to solve a challenging math problem with human feedback. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for a user to execute the code written by `AssistantAgent`. By setting `human_input_mode` properly, the `UserProxyAgent` can also prompt the user for feedback to `AssistantAgent`. For example, when `human_input_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `AssistantAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `AssistantAgent` directly and return the execution results (success or failure and corresponding outputs) to `AssistantAgent`.\n", + "\n", + "## Requirements\n", + "\n", + "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen] option:\n", + "```bash\n", + "pip install flaml[autogen]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2023-02-13T23:40:52.317406Z", + "iopub.status.busy": "2023-02-13T23:40:52.316561Z", + "iopub.status.idle": "2023-02-13T23:40:52.321193Z", + "shell.execute_reply": "2023-02-13T23:40:52.320628Z" + } + }, + "outputs": [], + "source": [ + "# %pip install flaml[autogen]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set your API Endpoint\n", + "\n", + "The [`config_list_gpt4_gpt35`](https://microsoft.github.io/FLAML/docs/reference/autogen/oai/openai_utils#config_list_gpt4_gpt35) function tries to create a list of gpt-4 and gpt-3.5 configurations using Azure OpenAI endpoints and OpenAI endpoints. It assumes the api keys and api bases are stored in the corresponding environment variables or local txt files:\n", + "\n", + "- OpenAI API key: os.environ[\"OPENAI_API_KEY\"] or `openai_api_key_file=\"key_openai.txt\"`.\n", + "- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. 
Multiple keys can be stored, one per line.\n", + "- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n", + "\n", + "It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from flaml import oai\n", + "\n", + "config_list = oai.config_list_gpt4_gpt35()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The config list looks like the following:\n", + "```python\n", + "config_list = [\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " }, # only if OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if at least one Azure OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-4',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if the second Azure OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-3.5-turbo',\n", + " 'api_key': '',\n", + " }, # only if OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-3.5-turbo',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if at least one Azure OpenAI API key is found\n", + " {\n", + " 'model': 'gpt-3.5-turbo',\n", + " 'api_key': '',\n", + " 'api_base': '',\n", + " 'api_type': 'azure',\n", + " 'api_version': '2023-03-15-preview',\n", + " }, # only if the second Azure OpenAI API key is found\n", + "]\n", + "```\n", + "\n", + "You can directly override it if the above function returns an empty list, i.e., it doesn't find the keys in the specified locations." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct Agents\n", + "\n", + "We construct the assistant agent and the user proxy agent." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from flaml.autogen.agent import AssistantAgent, UserProxyAgent\n", + "\n", + "# create an AssistantAgent instance named \"assistant\"\n", + "assistant = AssistantAgent(name=\"assistant\", request_timeout=600, seed=42, config_list=config_list)\n", + "# create a UserProxyAgent instance named \"user\"\n", + "user = UserProxyAgent(\n", + " name=\"user\",\n", + " human_input_mode=\"ALWAYS\",\n", + " is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\"),\n", + ")\n", + "\n", + "# the purpose of the following line is to log the conversation history\n", + "oai.ChatCompletion.start_logging()\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Perform a task\n", + "\n", + "We invoke the `receive()` method of the assistant agent to start the conversation. When you run the cell below, you will be prompted to provide feedback after receiving a message from the assistant agent. If you don't provide any feedback (by pressing Enter directly), the user proxy agent will try to execute the code suggested by the assistant agent on behalf of you, or terminate if the assistant agent sends a \"TERMINATE\" signal in the end of the message."
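The behavior just described can be summarized with a short sketch. This is a simplified illustration only, not the actual `UserProxyAgent` implementation; the `get_human_input` and `execute_code` callables are hypothetical placeholders for the agent's internal steps.

```python
def reply_to_assistant(message, get_human_input, execute_code):
    """Simplified sketch of the feedback handling described above (not the real implementation)."""
    feedback = get_human_input()  # empty string when the user presses Enter without typing
    if feedback:
        return feedback  # human feedback is passed through to the assistant unchanged
    if message.rstrip().endswith("TERMINATE"):
        return None  # no feedback and a TERMINATE signal: the conversation ends
    return execute_code(message)  # otherwise run the suggested code and report the result


# toy usage with dummy callables: no human feedback is given, so the code-execution path is taken
print(reply_to_assistant("print('hi')", lambda: "", lambda msg: "exitcode: 0 (execution succeeded)"))
```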
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "\n", + " Find $a + b + c$, given that $x+y \\neq -1$ and \n", + " \\begin{align}\n", + "\tax + by + c & = x + 7,\\\n", + "\ta + bx + cy & = 2x + 6y,\\\n", + "\tay + b + cx & = 4x + y.\n", + "\t\\end{align}.\n", + " \n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "To find the values of $a$, $b$, and $c$, we can use the system of linear equations and solve them using a Python script. Let's use the numpy library to solve the system of linear equations.\n", + "\n", + "```python\n", + "import numpy as np\n", + "\n", + "# Define the coefficients matrix (A) and the constant matrix (B)\n", + "A = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])\n", + "B = np.array([7, 6, 4])\n", + "\n", + "# Solve the system of linear equations using numpy\n", + "variables = np.linalg.solve(A, B)\n", + "a, b, c = variables\n", + "\n", + "# Print the values of a, b, and c\n", + "print(\"a =\", a, \"b =\", b, \"c =\", c)\n", + "\n", + "# Calculate and print the sum of a, b, and c\n", + "sum_abc = a + b + c\n", + "print(\"a + b + c =\", sum_abc)\n", + "```\n", + "\n", + "Please run this Python code to calculate the values of $a$, $b$, $c$, and the sum of $a + b + c$.\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "Are you assuming x=y=1?\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "My apologies for the misunderstanding. I mistakenly assumed x = y = 1 in the previous code, which should not have been the case. I will provide the corrected Python code to solve the given system of linear equations without any assumptions.\n", + "\n", + "```python\n", + "import sympy as sp\n", + "\n", + "a, b, c, x, y = sp.symbols('a b c x y')\n", + "\n", + "eq1 = a * x + b * y + c - (x + 7)\n", + "eq2 = a + b * x + c * y - (2 * x + 6 * y)\n", + "eq3 = a * y + b + c * x - (4 * x + y)\n", + "\n", + "# Solve the system of linear equations using sympy\n", + "solutions = sp.solve([eq1, eq2, eq3], (a, b, c))\n", + "\n", + "# Display the solutions for a, b, and c\n", + "print(\"a =\", solutions[a], \"b =\", solutions[b], \"c =\", solutions[c])\n", + "\n", + "# Calculate and print the sum of a, b, and c\n", + "sum_abc = solutions[a] + solutions[b] + solutions[c]\n", + "print(\"a + b + c =\", sum_abc)\n", + "```\n", + "\n", + "Please run this Python code to calculate the values of $a$, $b$, $c$, and the sum of $a + b + c$ without any incorrect assumptions.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. 
USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "exitcode: 0 (execution succeeded)\n", + "Code output: \n", + "a = (x**2 - 3*x*y + 2*x + y**2 - y)/(x**2 - x*y - x + y**2 - y + 1) b = (2*x**2 - 3*x + y)/(x**2 - x*y - x + y**2 - y + 1) c = (4*x**2 - 4*x*y - 6*x + 6*y**2 - 7*y + 7)/(x**2 - x*y - x + y**2 - y + 1)\n", + "a + b + c = (2*x**2 - 3*x + y)/(x**2 - x*y - x + y**2 - y + 1) + (x**2 - 3*x*y + 2*x + y**2 - y)/(x**2 - x*y - x + y**2 - y + 1) + (4*x**2 - 4*x*y - 6*x + 6*y**2 - 7*y + 7)/(x**2 - x*y - x + y**2 - y + 1)\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "The solutions for $a$, $b$, $c$, and the sum of $a + b + c$ are expressed in terms of $x$ and $y$. Here are the solutions:\n", + "\n", + "- $a = \\frac{x^2 - 3xy + 2x + y^2 - y}{x^2 - xy - x + y^2 - y + 1}$\n", + "- $b = \\frac{2x^2 - 3x + y}{x^2 - xy - x + y^2 - y + 1}$\n", + "- $c = \\frac{4x^2 - 4xy - 6x + 6y^2 - 7y + 7}{x^2 - xy - x + y^2 - y + 1}$\n", + "\n", + "And the sum of $a + b + c$ is:\n", + "\n", + "- $a + b + c = \\frac{2x^2 - 3x + y}{x^2 - xy - x + y^2 - y + 1} + \\frac{x^2 - 3xy + 2x + y^2 - y}{x^2 - xy - x + y^2 - y + 1} + \\frac{4x^2 - 4xy - 6x + 6y^2 - 7y + 7}{x^2 - xy - x + y^2 - y + 1}$\n", + "\n", + "TERMINATE\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "can you simplify the answer?\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "I apologize for the confusion in my previous response. I will simplify the sum of $a + b + c$ for you. Here is the updated Python code with the simplified result:\n", + "\n", + "```python\n", + "import sympy as sp\n", + "\n", + "a, b, c, x, y = sp.symbols('a b c x y')\n", + "\n", + "eq1 = a * x + b * y + c - (x + 7)\n", + "eq2 = a + b * x + c * y - (2 * x + 6 * y)\n", + "eq3 = a * y + b + c * x - (4 * x + y)\n", + "\n", + "# Solve the system of linear equations using sympy\n", + "solutions = sp.solve([eq1, eq2, eq3], (a, b, c))\n", + "\n", + "# Calculate the sum of a, b, and c\n", + "sum_abc = solutions[a] + solutions[b] + solutions[c]\n", + "\n", + "# Simplify the sum of a, b, and c\n", + "simplified_sum_abc = sp.simplify(sum_abc)\n", + "\n", + "# Print the simplified sum of a, b, and c\n", + "print(\"Simplified sum of a + b + c =\", simplified_sum_abc)\n", + "```\n", + "\n", + "Please run this Python code to obtain the simplified sum of $a + b + c$.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "exitcode: 0 (execution succeeded)\n", + "Code output: \n", + "Simplified sum of a + b + c = 7\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "Great! 
The simplified sum of $a + b + c$ is 7.\n", + "\n", + "TERMINATE\n" + ] + } + ], + "source": [ + "math_problem_to_solve = \"\"\"\n", + " Find $a + b + c$, given that $x+y \\\\neq -1$ and \n", + " \\\\begin{align}\n", + "\tax + by + c & = x + 7,\\\\\n", + "\ta + bx + cy & = 2x + 6y,\\\\\n", + "\tay + b + cx & = 4x + y.\n", + "\t\\\\end{align}.\n", + " \"\"\"\n", + "\n", + "# the assistant receives a message from the user, which contains the task description\n", + "assistant.receive(math_problem_to_solve, user)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Analyze the conversation\n", + "\n", + "The human user can provide feedback at each step. When the human user didn't provide feedback, the code was executed. The executed results and error messages are returned to the assistant, and the assistant was able to modify the code to rectify. In the end, the task is complete and a \"TERMINATE\" signal is sent from the assistant. The user skipped feedback in the end and the conversation is finished.\n", + "\n", + "After the conversation is finished, we can save the log of the conversation between the two agents. The log can be accessed from `oai.ChatCompletion.logged_history`." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'[{\"content\": \"You are a helpful AI assistant.\\\\n In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute. You must indicate the script type in the code block.\\\\n 1. When you need to ask the user for some info, use the code to output the info you need, for example, browse or search the web, download/read a file.\\\\n 2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly. Solve the task step by step if you need to.\\\\n If you want the user to save the code in a file before executing it, put # filename: inside the code block as the first line. Don\\'t include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use \\'print\\' function for the output when relevant. Check the execution result returned by the user.\\\\n If the result indicates there is an error, fix the error and output the code again. Suggeset the full code instead of partial code or code changes.\\\\n Reply \\\\\"TERMINATE\\\\\" in the end when everything is done.\\\\n \", \"role\": \"system\"}, {\"content\": \"\\\\n Find $a + b + c$, given that $x+y \\\\\\\\neq -1$ and \\\\n \\\\\\\\begin{align}\\\\n\\\\tax + by + c & = x + 7,\\\\\\\\\\\\n\\\\ta + bx + cy & = 2x + 6y,\\\\\\\\\\\\n\\\\tay + b + cx & = 4x + y.\\\\n\\\\t\\\\\\\\end{align}.\\\\n \", \"role\": \"user\"}, {\"content\": \"To find the values of $a$, $b$, and $c$, we can use the system of linear equations and solve them using a Python script. 
Let\\'s use the numpy library to solve the system of linear equations.\\\\n\\\\n```python\\\\nimport numpy as np\\\\n\\\\n# Define the coefficients matrix (A) and the constant matrix (B)\\\\nA = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])\\\\nB = np.array([7, 6, 4])\\\\n\\\\n# Solve the system of linear equations using numpy\\\\nvariables = np.linalg.solve(A, B)\\\\na, b, c = variables\\\\n\\\\n# Print the values of a, b, and c\\\\nprint(\\\\\"a =\\\\\", a, \\\\\"b =\\\\\", b, \\\\\"c =\\\\\", c)\\\\n\\\\n# Calculate and print the sum of a, b, and c\\\\nsum_abc = a + b + c\\\\nprint(\\\\\"a + b + c =\\\\\", sum_abc)\\\\n```\\\\n\\\\nPlease run this Python code to calculate the values of $a$, $b$, $c$, and the sum of $a + b + c$.\", \"role\": \"assistant\"}, {\"content\": \"Are you assuming x=y=1?\", \"role\": \"user\"}, {\"content\": \"My apologies for the misunderstanding. I mistakenly assumed x = y = 1 in the previous code, which should not have been the case. I will provide the corrected Python code to solve the given system of linear equations without any assumptions.\\\\n\\\\n```python\\\\nimport sympy as sp\\\\n\\\\na, b, c, x, y = sp.symbols(\\'a b c x y\\')\\\\n\\\\neq1 = a * x + b * y + c - (x + 7)\\\\neq2 = a + b * x + c * y - (2 * x + 6 * y)\\\\neq3 = a * y + b + c * x - (4 * x + y)\\\\n\\\\n# Solve the system of linear equations using sympy\\\\nsolutions = sp.solve([eq1, eq2, eq3], (a, b, c))\\\\n\\\\n# Display the solutions for a, b, and c\\\\nprint(\\\\\"a =\\\\\", solutions[a], \\\\\"b =\\\\\", solutions[b], \\\\\"c =\\\\\", solutions[c])\\\\n\\\\n# Calculate and print the sum of a, b, and c\\\\nsum_abc = solutions[a] + solutions[b] + solutions[c]\\\\nprint(\\\\\"a + b + c =\\\\\", sum_abc)\\\\n```\\\\n\\\\nPlease run this Python code to calculate the values of $a$, $b$, $c$, and the sum of $a + b + c$ without any incorrect assumptions.\", \"role\": \"assistant\"}, {\"content\": \"exitcode: 0 (execution succeeded)\\\\nCode output: \\\\na = (x**2 - 3*x*y + 2*x + y**2 - y)/(x**2 - x*y - x + y**2 - y + 1) b = (2*x**2 - 3*x + y)/(x**2 - x*y - x + y**2 - y + 1) c = (4*x**2 - 4*x*y - 6*x + 6*y**2 - 7*y + 7)/(x**2 - x*y - x + y**2 - y + 1)\\\\na + b + c = (2*x**2 - 3*x + y)/(x**2 - x*y - x + y**2 - y + 1) + (x**2 - 3*x*y + 2*x + y**2 - y)/(x**2 - x*y - x + y**2 - y + 1) + (4*x**2 - 4*x*y - 6*x + 6*y**2 - 7*y + 7)/(x**2 - x*y - x + y**2 - y + 1)\\\\n\", \"role\": \"user\"}, {\"content\": \"The solutions for $a$, $b$, $c$, and the sum of $a + b + c$ are expressed in terms of $x$ and $y$. Here are the solutions:\\\\n\\\\n- $a = \\\\\\\\frac{x^2 - 3xy + 2x + y^2 - y}{x^2 - xy - x + y^2 - y + 1}$\\\\n- $b = \\\\\\\\frac{2x^2 - 3x + y}{x^2 - xy - x + y^2 - y + 1}$\\\\n- $c = \\\\\\\\frac{4x^2 - 4xy - 6x + 6y^2 - 7y + 7}{x^2 - xy - x + y^2 - y + 1}$\\\\n\\\\nAnd the sum of $a + b + c$ is:\\\\n\\\\n- $a + b + c = \\\\\\\\frac{2x^2 - 3x + y}{x^2 - xy - x + y^2 - y + 1} + \\\\\\\\frac{x^2 - 3xy + 2x + y^2 - y}{x^2 - xy - x + y^2 - y + 1} + \\\\\\\\frac{4x^2 - 4xy - 6x + 6y^2 - 7y + 7}{x^2 - xy - x + y^2 - y + 1}$\\\\n\\\\nTERMINATE\", \"role\": \"assistant\"}, {\"content\": \"can you simplify the answer?\", \"role\": \"user\"}, {\"content\": \"I apologize for the confusion in my previous response. I will simplify the sum of $a + b + c$ for you. 
Here is the updated Python code with the simplified result:\\\\n\\\\n```python\\\\nimport sympy as sp\\\\n\\\\na, b, c, x, y = sp.symbols(\\'a b c x y\\')\\\\n\\\\neq1 = a * x + b * y + c - (x + 7)\\\\neq2 = a + b * x + c * y - (2 * x + 6 * y)\\\\neq3 = a * y + b + c * x - (4 * x + y)\\\\n\\\\n# Solve the system of linear equations using sympy\\\\nsolutions = sp.solve([eq1, eq2, eq3], (a, b, c))\\\\n\\\\n# Calculate the sum of a, b, and c\\\\nsum_abc = solutions[a] + solutions[b] + solutions[c]\\\\n\\\\n# Simplify the sum of a, b, and c\\\\nsimplified_sum_abc = sp.simplify(sum_abc)\\\\n\\\\n# Print the simplified sum of a, b, and c\\\\nprint(\\\\\"Simplified sum of a + b + c =\\\\\", simplified_sum_abc)\\\\n```\\\\n\\\\nPlease run this Python code to obtain the simplified sum of $a + b + c$.\", \"role\": \"assistant\"}, {\"content\": \"exitcode: 0 (execution succeeded)\\\\nCode output: \\\\nSimplified sum of a + b + c = 7\\\\n\", \"role\": \"user\"}, {\"content\": \"Great! The simplified sum of $a + b + c$ is 7.\\\\n\\\\nTERMINATE\", \"role\": \"assistant\"}]': {'created_at': [0, 1, 2, 3, 4], 'cost': [0.02448, 0.03471, 0.0528, 0.05976000000000001, 0.054060000000000004]}}\n" + ] + } + ], + "source": [ + "print(oai.ChatCompletion.logged_history)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "json.dump(oai.ChatCompletion.logged_history, open(\"conversations.json\", \"w\"), indent=2)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + }, + "vscode": { + "interpreter": { + "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1" + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "2d910cfd2d2a4fc49fc30fbbdc5576a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "454146d0f7224f038689031002906e6f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + 
"_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e4ae2b6f5a974fd4bafb6abb9d12ff26", + "IPY_MODEL_577e1e3cc4db4942b0883577b3b52755", + "IPY_MODEL_b40bdfb1ac1d4cffb7cefcb870c64d45" + ], + "layout": "IPY_MODEL_dc83c7bff2f241309537a8119dfc7555", + "tabbable": null, + "tooltip": null + } + }, + "577e1e3cc4db4942b0883577b3b52755": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2d910cfd2d2a4fc49fc30fbbdc5576a7", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_74a6ba0c3cbc4051be0a83e152fe1e62", + "tabbable": null, + "tooltip": null, + "value": 1 + } + }, + "6086462a12d54bafa59d3c4566f06cb2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "74a6ba0c3cbc4051be0a83e152fe1e62": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7d3f3d9e15894d05a4d188ff4f466554": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + 
"text_color": null + } + }, + "b40bdfb1ac1d4cffb7cefcb870c64d45": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f1355871cc6f4dd4b50d9df5af20e5c8", + "placeholder": "​", + "style": "IPY_MODEL_ca245376fd9f4354af6b2befe4af4466", + "tabbable": null, + "tooltip": null, + "value": " 1/1 [00:00<00:00, 44.69it/s]" + } + }, + "ca245376fd9f4354af6b2befe4af4466": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "dc83c7bff2f241309537a8119dfc7555": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e4ae2b6f5a974fd4bafb6abb9d12ff26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_6086462a12d54bafa59d3c4566f06cb2", + "placeholder": "​", + "style": "IPY_MODEL_7d3f3d9e15894d05a4d188ff4f466554", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "f1355871cc6f4dd4b50d9df5af20e5c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": 
"@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + } + }, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebook/autogen_agent_web_info.ipynb b/notebook/autogen_agent_web_info.ipynb new file mode 100644 index 0000000000..2b11cd09c8 --- /dev/null +++ b/notebook/autogen_agent_web_info.ipynb @@ -0,0 +1,607 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Interactive LLM Agent Dealing with Web Info\n", + "\n", + "FLAML offers an experimental feature of interactive LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n", + "\n", + "In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to discuss a paper based on its URL. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for a user to execute the code written by `AssistantAgent`. By setting `human_input_mode` properly, the `UserProxyAgent` can also prompt the user for feedback to `AssistantAgent`. For example, when `human_input_mode` is set to \"ALWAYS\", the `UserProxyAgent` will always prompt the user for feedback. When user feedback is provided, the `UserProxyAgent` will directly pass the feedback to `AssistantAgent` without doing any additional steps. When no user feedback is provided, the `UserProxyAgent` will execute the code written by `AssistantAgent` directly and return the execution results (success or failure and corresponding outputs) to `AssistantAgent`.\n", + "\n", + "## Requirements\n", + "\n", + "FLAML requires `Python>=3.7`. 
To run this notebook example, please install flaml with the [autogen] option:\n", + "```bash\n", + "pip install flaml[autogen]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2023-02-13T23:40:52.317406Z", + "iopub.status.busy": "2023-02-13T23:40:52.316561Z", + "iopub.status.idle": "2023-02-13T23:40:52.321193Z", + "shell.execute_reply": "2023-02-13T23:40:52.320628Z" + } + }, + "outputs": [], + "source": [ + "# %pip install flaml[autogen]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set your API Endpoint\n", + "\n", + "The [`config_list_openai_aoai`](https://microsoft.github.io/FLAML/docs/reference/autogen/oai/openai_utils#config_list_openai_aoai) function tries to create a list of configurations using Azure OpenAI endpoints and OpenAI endpoints. It assumes the api keys and api bases are stored in the corresponding environment variables or local txt files:\n", + "\n", + "- OpenAI API key: os.environ[\"OPENAI_API_KEY\"] or `openai_api_key_file=\"key_openai.txt\"`.\n", + "- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n", + "- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n", + "\n", + "It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n", + "\n", + "The following code excludes openai endpoints from the config list.\n", + "Change to `exclude=\"aoai\"` to exclude Azure OpenAI, or remove the `exclude` argument to include both.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from flaml import oai\n", + "\n", + "config_list = oai.config_list_openai_aoai(exclude=\"openai\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct Agents\n", + "\n", + "We construct the assistant agent and the user proxy agent. We specify `human_input_mode` as \"TERMINATE\" in the user proxy agent, which will ask for feedback when it receives a \"TERMINATE\" signal from the assistant agent." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from flaml.autogen.agent import AssistantAgent, UserProxyAgent\n", + "\n", + "# create an AssistantAgent instance named \"assistant\"\n", + "assistant = AssistantAgent(\n", + " name=\"assistant\",\n", + " request_timeout=600,\n", + " seed=42,\n", + " config_list=config_list,\n", + " model=\"gpt-4-32k\", # make sure the endpoint you use supports the model\n", + ")\n", + "# create a UserProxyAgent instance named \"user\"\n", + "user = UserProxyAgent(\n", + " name=\"user\",\n", + " human_input_mode=\"TERMINATE\",\n", + " max_consecutive_auto_reply=10,\n", + " is_termination_msg=lambda x: x.rstrip().endswith(\"TERMINATE\"),\n", + " work_dir='web',\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Perform a task\n", + "\n", + "We invoke the `receive()` method of the coding agent to start the conversation. When you run the cell below, you will be prompted to provide feedback after receiving a message from the coding agent. 
If you don't provide any feedback (by pressing Enter directly), the user proxy agent will try to execute the code suggested by the coding agent on behalf of you, or terminate if the coding agent sends a \"TERMINATE\" signal in the end of the message." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "\n", + "Who should read this paper: https://arxiv.org/abs/2306.01337\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "To determine who should read the paper, I will fetch and analyze the abstract of the paper.\n", + "\n", + "```python\n", + "import requests\n", + "from bs4 import BeautifulSoup\n", + "\n", + "def get_arxiv_abstract(url):\n", + " response = requests.get(url)\n", + " soup = BeautifulSoup(response.text, 'html.parser')\n", + " abstract = soup.find('blockquote', {'class': 'abstract'}).text.strip()\n", + " return abstract.replace(\"Abstract: \", \"\")\n", + "\n", + "url = \"https://arxiv.org/abs/2306.01337\"\n", + "abstract = get_arxiv_abstract(url)\n", + "print(abstract)\n", + "```\n", + "\n", + "Please run this Python code to fetch and display the abstract of the paper. Based on the abstract, we can figure out who should read the paper.\n", + "\n", + ">>>>>>>> NO HUMAN INPUT RECEIVED. USING AUTO REPLY FOR THE USER...\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "user (to assistant):\n", + "exitcode: 0 (execution succeeded)\n", + "Code output: \n", + " Employing Large Language Models (LLMs) to address mathematical problems is an\n", + "intriguing research endeavor, considering the abundance of math problems\n", + "expressed in natural language across numerous science and engineering fields.\n", + "While several prior works have investigated solving elementary mathematics\n", + "using LLMs, this work explores the frontier of using GPT-4 for solving more\n", + "complex and challenging math problems. We evaluate various ways of using GPT-4.\n", + "Some of them are adapted from existing work, and one is \\MathChat, a\n", + "conversational problem-solving framework newly proposed in this work. We\n", + "perform the evaluation on difficult high school competition problems from the\n", + "MATH dataset, which shows the advantage of the proposed conversational\n", + "approach.\n", + "\n", + "\n", + " -------------------------------------------------------------------------------- \n", + "\n", + "assistant (to user):\n", + "Based on the abstract, the following people may be interested in reading the paper:\n", + "\n", + "1. Researchers and practitioners working on large language models (LLMs)\n", + "2. Artificial intelligence (AI) and natural language processing (NLP) researchers exploring the application of LLMs in solving mathematical problems\n", + "3. Educators, mathematicians, and researchers studying advanced mathematical problem-solving techniques\n", + "4. Individuals working on conversational AI for math tutoring or educational purposes\n", + "5. 
Anyone interested in the development and improvement of models like GPT-4 for complex problem-solving\n", + "\n", + "If you belong to any of these categories or have an interest in these topics, you should consider reading the paper.\n", + "\n", + "TERMINATE\n" + ] + } + ], + "source": [ + "# the assistant receives a message from the user, which contains the task description\n", + "assistant.receive(\n", + " \"\"\"\n", + "Who should read this paper: https://arxiv.org/abs/2306.01337\n", + "\"\"\",\n", + " user\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + }, + "vscode": { + "interpreter": { + "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1" + } + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": { + "2d910cfd2d2a4fc49fc30fbbdc5576a7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "454146d0f7224f038689031002906e6f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_e4ae2b6f5a974fd4bafb6abb9d12ff26", + "IPY_MODEL_577e1e3cc4db4942b0883577b3b52755", + "IPY_MODEL_b40bdfb1ac1d4cffb7cefcb870c64d45" + ], + "layout": "IPY_MODEL_dc83c7bff2f241309537a8119dfc7555", + "tabbable": null, + "tooltip": null + } + }, + "577e1e3cc4db4942b0883577b3b52755": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "ProgressView", 
+ "bar_style": "success", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_2d910cfd2d2a4fc49fc30fbbdc5576a7", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_74a6ba0c3cbc4051be0a83e152fe1e62", + "tabbable": null, + "tooltip": null, + "value": 1 + } + }, + "6086462a12d54bafa59d3c4566f06cb2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "74a6ba0c3cbc4051be0a83e152fe1e62": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7d3f3d9e15894d05a4d188ff4f466554": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "b40bdfb1ac1d4cffb7cefcb870c64d45": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_f1355871cc6f4dd4b50d9df5af20e5c8", + "placeholder": "​", + "style": "IPY_MODEL_ca245376fd9f4354af6b2befe4af4466", + "tabbable": null, + "tooltip": null, + "value": " 1/1 [00:00<00:00, 44.69it/s]" + } + }, + "ca245376fd9f4354af6b2befe4af4466": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": 
"2.0.0", + "_model_name": "HTMLStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "StyleView", + "background": null, + "description_width": "", + "font_size": null, + "text_color": null + } + }, + "dc83c7bff2f241309537a8119dfc7555": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e4ae2b6f5a974fd4bafb6abb9d12ff26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "2.0.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "2.0.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "2.0.0", + "_view_name": "HTMLView", + "description": "", + "description_allow_html": false, + "layout": "IPY_MODEL_6086462a12d54bafa59d3c4566f06cb2", + "placeholder": "​", + "style": "IPY_MODEL_7d3f3d9e15894d05a4d188ff4f466554", + "tabbable": null, + "tooltip": null, + "value": "100%" + } + }, + "f1355871cc6f4dd4b50d9df5af20e5c8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "2.0.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "2.0.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "2.0.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border_bottom": null, + "border_left": null, + "border_right": null, + "border_top": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + } + }, + "version_major": 2, + 
"version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebook/autogen_chatgpt_gpt4.ipynb b/notebook/autogen_chatgpt_gpt4.ipynb index a8ca9917a6..9a080406ef 100644 --- a/notebook/autogen_chatgpt_gpt4.ipynb +++ b/notebook/autogen_chatgpt_gpt4.ipynb @@ -25,7 +25,9 @@ "\n", "FLAML offers a cost-effective hyperparameter optimization technique [EcoOptiGen](https://arxiv.org/abs/2303.04673) for tuning Large Language Models. Our study finds that tuning hyperparameters can significantly improve the utility of LLMs.\n", "\n", - "In this notebook, we tune OpenAI ChatGPT (both GPT-3.5 and GPT-4) models for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning. \n", + "In this notebook, we tune OpenAI ChatGPT (both GPT-3.5 and GPT-4) models for math problem solving. We use [the MATH benchmark](https://crfm.stanford.edu/helm/latest/?group=math_chain_of_thought) for measuring mathematical problem solving on competition math problems with chain-of-thoughts style reasoning.\n", + "\n", + "Related link: [Blogpost](https://microsoft.github.io/FLAML/blog/2023/04/21/LLM-tuning-math) based on this experiment.\n", "\n", "## Requirements\n", "\n", @@ -93,7 +95,7 @@ "- Azure OpenAI API key: os.environ[\"AZURE_OPENAI_API_KEY\"] or `aoai_api_key_file=\"key_aoai.txt\"`. Multiple keys can be stored, one per line.\n", "- Azure OpenAI API base: os.environ[\"AZURE_OPENAI_API_BASE\"] or `aoai_api_base_file=\"base_aoai.txt\"`. Multiple bases can be stored, one per line.\n", "\n", - "It's OK to have only the OpenAI API key, or only the Azure Open API key + base.\n" + "It's OK to have only the OpenAI API key, or only the Azure OpenAI API key + base.\n" ] }, { diff --git a/notebook/autogen_openai_completion.ipynb b/notebook/autogen_openai_completion.ipynb index 780c17a014..88388336b0 100644 --- a/notebook/autogen_openai_completion.ipynb +++ b/notebook/autogen_openai_completion.ipynb @@ -5,7 +5,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Open" + "\"Open" ] }, { diff --git a/notebook/automl_flight_delays.ipynb b/notebook/automl_flight_delays.ipynb index 05b5222d06..2edd20abb0 100644 --- a/notebook/automl_flight_delays.ipynb +++ b/notebook/automl_flight_delays.ipynb @@ -37,10 +37,7 @@ "\n", "In this notebook, we use one real data example (binary classification) to showcase how to use FLAML library.\n", "\n", - "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `notebook` option:\n", - "```bash\n", - "pip install flaml[notebook]==1.1.3\n", - "```" + "FLAML requires `Python>=3.7`. To run this notebook example, please install the following packages." 
] }, { @@ -420,6 +417,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -617,6 +615,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1047,6 +1046,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1323,6 +1323,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1450,6 +1451,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -1457,6 +1459,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -1574,6 +1577,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -1768,6 +1772,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1779,6 +1784,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1792,6 +1798,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -1964,6 +1971,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { @@ -2184,6 +2192,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -2244,6 +2253,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ diff --git a/notebook/automl_synapseML.ipynb b/notebook/automl_synapseML.ipynb deleted file mode 100644 index f6b550c45f..0000000000 --- a/notebook/automl_synapseML.ipynb +++ /dev/null @@ -1,831 +0,0 @@ -{ - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# AutoML with FLAML Library for synapseML models and spark dataframes\n", - "\n", - "\n", - "## 1. Introduction\n", - "\n", - "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n", - "with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n", - "to use and extend, such as adding new learners. FLAML can \n", - "- serve as an economical AutoML engine,\n", - "- be used as a fast hyperparameter tuning tool, or \n", - "- be embedded in self-tuning software that requires low latency & resource in repetitive\n", - " tuning tasks.\n", - "\n", - "In this notebook, we demonstrate how to use FLAML library to do AutoML for synapseML models and spark dataframes. We also compare the results between FLAML AutoML and default SynapseML. \n", - "In this example, we use LightGBM to build a classification model in order to predict bankruptcy.\n", - "\n", - "Since the dataset is unbalanced, `AUC` is a better metric than `Accuracy`. FLAML (1 min of training) achieved AUC **0.79**, the default SynapseML model only got AUC **0.64**. \n", - "\n", - "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `synapse` option:\n", - "```bash\n", - "pip install flaml[synapse] \n", - "```\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# %pip install \"flaml[synapse]\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2. 
Load data and preprocess" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - ":: loading settings :: url = jar:file:/datadrive/spark/spark33/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Ivy Default Cache set to: /home/lijiang1/.ivy2/cache\n", - "The jars for the packages stored in: /home/lijiang1/.ivy2/jars\n", - "com.microsoft.azure#synapseml_2.12 added as a dependency\n", - "org.apache.hadoop#hadoop-azure added as a dependency\n", - "com.microsoft.azure#azure-storage added as a dependency\n", - ":: resolving dependencies :: org.apache.spark#spark-submit-parent-bfb2447b-61c5-4941-bf9b-0548472077eb;1.0\n", - "\tconfs: [default]\n", - "\tfound com.microsoft.azure#synapseml_2.12;0.10.2 in central\n", - "\tfound com.microsoft.azure#synapseml-core_2.12;0.10.2 in central\n", - "\tfound org.scalactic#scalactic_2.12;3.2.14 in local-m2-cache\n", - "\tfound org.scala-lang#scala-reflect;2.12.15 in central\n", - "\tfound io.spray#spray-json_2.12;1.3.5 in central\n", - "\tfound com.jcraft#jsch;0.1.54 in central\n", - "\tfound org.apache.httpcomponents.client5#httpclient5;5.1.3 in central\n", - "\tfound org.apache.httpcomponents.core5#httpcore5;5.1.3 in central\n", - "\tfound org.apache.httpcomponents.core5#httpcore5-h2;5.1.3 in central\n", - "\tfound org.slf4j#slf4j-api;1.7.25 in local-m2-cache\n", - "\tfound commons-codec#commons-codec;1.15 in local-m2-cache\n", - "\tfound org.apache.httpcomponents#httpmime;4.5.13 in local-m2-cache\n", - "\tfound org.apache.httpcomponents#httpclient;4.5.13 in local-m2-cache\n", - "\tfound org.apache.httpcomponents#httpcore;4.4.13 in central\n", - "\tfound commons-logging#commons-logging;1.2 in central\n", - "\tfound com.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 in central\n", - "\tfound com.chuusai#shapeless_2.12;2.3.2 in central\n", - "\tfound org.typelevel#macro-compat_2.12;1.1.1 in central\n", - "\tfound org.apache.spark#spark-avro_2.12;3.2.0 in central\n", - "\tfound org.tukaani#xz;1.8 in central\n", - "\tfound org.spark-project.spark#unused;1.0.0 in central\n", - "\tfound org.testng#testng;6.8.8 in central\n", - "\tfound org.beanshell#bsh;2.0b4 in central\n", - "\tfound com.beust#jcommander;1.27 in central\n", - "\tfound com.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 in central\n", - "\tfound com.microsoft.azure#synapseml-opencv_2.12;0.10.2 in central\n", - "\tfound org.openpnp#opencv;3.2.0-1 in central\n", - "\tfound com.microsoft.azure#onnx-protobuf_2.12;0.9.1 in central\n", - "\tfound com.microsoft.cntk#cntk;2.4 in central\n", - "\tfound com.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 in central\n", - "\tfound com.microsoft.azure#synapseml-cognitive_2.12;0.10.2 in central\n", - "\tfound com.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 in central\n", - "\tfound com.microsoft.azure#synapseml-vw_2.12;0.10.2 in central\n", - "\tfound com.github.vowpalwabbit#vw-jni;8.9.1 in central\n", - "\tfound com.microsoft.azure#synapseml-lightgbm_2.12;0.10.2 in central\n", - "\tfound com.microsoft.ml.lightgbm#lightgbmlib;3.2.110 in central\n", - "\tfound org.apache.hadoop#hadoop-azure;3.3.1 in central\n", - "\tfound org.apache.hadoop.thirdparty#hadoop-shaded-guava;1.1.1 in local-m2-cache\n", - "\tfound org.eclipse.jetty#jetty-util-ajax;9.4.40.v20210413 in central\n", - "\tfound org.eclipse.jetty#jetty-util;9.4.40.v20210413 in 
central\n", - "\tfound org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache\n", - "\tfound org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache\n", - "\tfound org.wildfly.openssl#wildfly-openssl;1.0.7.Final in local-m2-cache\n", - "\tfound com.microsoft.azure#azure-storage;8.6.6 in central\n", - "\tfound com.fasterxml.jackson.core#jackson-core;2.9.4 in central\n", - "\tfound org.apache.commons#commons-lang3;3.4 in local-m2-cache\n", - "\tfound com.microsoft.azure#azure-keyvault-core;1.2.4 in central\n", - "\tfound com.google.guava#guava;24.1.1-jre in central\n", - "\tfound com.google.code.findbugs#jsr305;1.3.9 in central\n", - "\tfound org.checkerframework#checker-compat-qual;2.0.0 in central\n", - "\tfound com.google.errorprone#error_prone_annotations;2.1.3 in central\n", - "\tfound com.google.j2objc#j2objc-annotations;1.1 in central\n", - "\tfound org.codehaus.mojo#animal-sniffer-annotations;1.14 in central\n", - ":: resolution report :: resolve 992ms :: artifacts dl 77ms\n", - "\t:: modules in use:\n", - "\tcom.beust#jcommander;1.27 from central in [default]\n", - "\tcom.chuusai#shapeless_2.12;2.3.2 from central in [default]\n", - "\tcom.fasterxml.jackson.core#jackson-core;2.9.4 from central in [default]\n", - "\tcom.github.vowpalwabbit#vw-jni;8.9.1 from central in [default]\n", - "\tcom.google.code.findbugs#jsr305;1.3.9 from central in [default]\n", - "\tcom.google.errorprone#error_prone_annotations;2.1.3 from central in [default]\n", - "\tcom.google.guava#guava;24.1.1-jre from central in [default]\n", - "\tcom.google.j2objc#j2objc-annotations;1.1 from central in [default]\n", - "\tcom.jcraft#jsch;0.1.54 from central in [default]\n", - "\tcom.linkedin.isolation-forest#isolation-forest_3.2.0_2.12;2.0.8 from central in [default]\n", - "\tcom.microsoft.azure#azure-keyvault-core;1.2.4 from central in [default]\n", - "\tcom.microsoft.azure#azure-storage;8.6.6 from central in [default]\n", - "\tcom.microsoft.azure#onnx-protobuf_2.12;0.9.1 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-cognitive_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-core_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-deep-learning_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-lightgbm_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-opencv_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml-vw_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.azure#synapseml_2.12;0.10.2 from central in [default]\n", - "\tcom.microsoft.cntk#cntk;2.4 from central in [default]\n", - "\tcom.microsoft.cognitiveservices.speech#client-jar-sdk;1.14.0 from central in [default]\n", - "\tcom.microsoft.ml.lightgbm#lightgbmlib;3.2.110 from central in [default]\n", - "\tcom.microsoft.onnxruntime#onnxruntime_gpu;1.8.1 from central in [default]\n", - "\tcommons-codec#commons-codec;1.15 from local-m2-cache in [default]\n", - "\tcommons-logging#commons-logging;1.2 from central in [default]\n", - "\tio.spray#spray-json_2.12;1.3.5 from central in [default]\n", - "\torg.apache.commons#commons-lang3;3.4 from local-m2-cache in [default]\n", - "\torg.apache.hadoop#hadoop-azure;3.3.1 from central in [default]\n", - "\torg.apache.hadoop.thirdparty#hadoop-shaded-guava;1.1.1 from local-m2-cache in [default]\n", - "\torg.apache.httpcomponents#httpclient;4.5.13 from local-m2-cache in [default]\n", - "\torg.apache.httpcomponents#httpcore;4.4.13 from central in 
[default]\n", - "\torg.apache.httpcomponents#httpmime;4.5.13 from local-m2-cache in [default]\n", - "\torg.apache.httpcomponents.client5#httpclient5;5.1.3 from central in [default]\n", - "\torg.apache.httpcomponents.core5#httpcore5;5.1.3 from central in [default]\n", - "\torg.apache.httpcomponents.core5#httpcore5-h2;5.1.3 from central in [default]\n", - "\torg.apache.spark#spark-avro_2.12;3.2.0 from central in [default]\n", - "\torg.beanshell#bsh;2.0b4 from central in [default]\n", - "\torg.checkerframework#checker-compat-qual;2.0.0 from central in [default]\n", - "\torg.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]\n", - "\torg.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]\n", - "\torg.codehaus.mojo#animal-sniffer-annotations;1.14 from central in [default]\n", - "\torg.eclipse.jetty#jetty-util;9.4.40.v20210413 from central in [default]\n", - "\torg.eclipse.jetty#jetty-util-ajax;9.4.40.v20210413 from central in [default]\n", - "\torg.openpnp#opencv;3.2.0-1 from central in [default]\n", - "\torg.scala-lang#scala-reflect;2.12.15 from central in [default]\n", - "\torg.scalactic#scalactic_2.12;3.2.14 from local-m2-cache in [default]\n", - "\torg.slf4j#slf4j-api;1.7.25 from local-m2-cache in [default]\n", - "\torg.spark-project.spark#unused;1.0.0 from central in [default]\n", - "\torg.testng#testng;6.8.8 from central in [default]\n", - "\torg.tukaani#xz;1.8 from central in [default]\n", - "\torg.typelevel#macro-compat_2.12;1.1.1 from central in [default]\n", - "\torg.wildfly.openssl#wildfly-openssl;1.0.7.Final from local-m2-cache in [default]\n", - "\t:: evicted modules:\n", - "\tcommons-codec#commons-codec;1.11 by [commons-codec#commons-codec;1.15] in [default]\n", - "\tcom.microsoft.azure#azure-storage;7.0.1 by [com.microsoft.azure#azure-storage;8.6.6] in [default]\n", - "\torg.slf4j#slf4j-api;1.7.12 by [org.slf4j#slf4j-api;1.7.25] in [default]\n", - "\torg.apache.commons#commons-lang3;3.8.1 by [org.apache.commons#commons-lang3;3.4] in [default]\n", - "\t---------------------------------------------------------------------\n", - "\t| | modules || artifacts |\n", - "\t| conf | number| search|dwnlded|evicted|| number|dwnlded|\n", - "\t---------------------------------------------------------------------\n", - "\t| default | 57 | 0 | 0 | 4 || 53 | 0 |\n", - "\t---------------------------------------------------------------------\n", - ":: retrieving :: org.apache.spark#spark-submit-parent-bfb2447b-61c5-4941-bf9b-0548472077eb\n", - "\tconfs: [default]\n", - "\t0 artifacts copied, 53 already retrieved (0kB/20ms)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "23/02/28 02:12:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Setting default log level to \"WARN\".\n", - "To adjust logging level use sc.setLogLevel(newLevel). 
For SparkR, use setLogLevel(newLevel).\n" - ] - } - ], - "source": [ - "import pyspark\n", - "\n", - "spark = (\n", - " pyspark.sql.SparkSession.builder.appName(\"MyApp\")\n", - " .config(\n", - " \"spark.jars.packages\",\n", - " f\"com.microsoft.azure:synapseml_2.12:0.10.2,org.apache.hadoop:hadoop-azure:{pyspark.__version__},com.microsoft.azure:azure-storage:8.6.6\",\n", - " )\n", - " .config(\"spark.sql.debug.maxToStringFields\", \"100\")\n", - " .getOrCreate()\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "23/02/28 02:12:32 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties\n", - "records read: 6819\n", - "Schema: \n", - "root\n", - " |-- Bankrupt?: integer (nullable = true)\n", - " |-- ROA(C) before interest and depreciation before interest: double (nullable = true)\n", - " |-- ROA(A) before interest and % after tax: double (nullable = true)\n", - " |-- ROA(B) before interest and depreciation after tax: double (nullable = true)\n", - " |-- Operating Gross Margin: double (nullable = true)\n", - " |-- Realized Sales Gross Margin: double (nullable = true)\n", - " |-- Operating Profit Rate: double (nullable = true)\n", - " |-- Pre-tax net Interest Rate: double (nullable = true)\n", - " |-- After-tax net Interest Rate: double (nullable = true)\n", - " |-- Non-industry income and expenditure/revenue: double (nullable = true)\n", - " |-- Continuous interest rate (after tax): double (nullable = true)\n", - " |-- Operating Expense Rate: double (nullable = true)\n", - " |-- Research and development expense rate: double (nullable = true)\n", - " |-- Cash flow rate: double (nullable = true)\n", - " |-- Interest-bearing debt interest rate: double (nullable = true)\n", - " |-- Tax rate (A): double (nullable = true)\n", - " |-- Net Value Per Share (B): double (nullable = true)\n", - " |-- Net Value Per Share (A): double (nullable = true)\n", - " |-- Net Value Per Share (C): double (nullable = true)\n", - " |-- Persistent EPS in the Last Four Seasons: double (nullable = true)\n", - " |-- Cash Flow Per Share: double (nullable = true)\n", - " |-- Revenue Per Share (Yuan ??): double (nullable = true)\n", - " |-- Operating Profit Per Share (Yuan ??): double (nullable = true)\n", - " |-- Per Share Net profit before tax (Yuan ??): double (nullable = true)\n", - " |-- Realized Sales Gross Profit Growth Rate: double (nullable = true)\n", - " |-- Operating Profit Growth Rate: double (nullable = true)\n", - " |-- After-tax Net Profit Growth Rate: double (nullable = true)\n", - " |-- Regular Net Profit Growth Rate: double (nullable = true)\n", - " |-- Continuous Net Profit Growth Rate: double (nullable = true)\n", - " |-- Total Asset Growth Rate: double (nullable = true)\n", - " |-- Net Value Growth Rate: double (nullable = true)\n", - " |-- Total Asset Return Growth Rate Ratio: double (nullable = true)\n", - " |-- Cash Reinvestment %: double (nullable = true)\n", - " |-- Current Ratio: double (nullable = true)\n", - " |-- Quick Ratio: double (nullable = true)\n", - " |-- Interest Expense Ratio: double (nullable = true)\n", - " |-- Total debt/Total net worth: double (nullable = true)\n", - " |-- Debt ratio %: double (nullable = true)\n", - " |-- Net worth/Assets: double (nullable = true)\n", - " |-- Long-term fund suitability ratio (A): double (nullable = true)\n", - " |-- Borrowing dependency: double (nullable = 
true)\n", - " |-- Contingent liabilities/Net worth: double (nullable = true)\n", - " |-- Operating profit/Paid-in capital: double (nullable = true)\n", - " |-- Net profit before tax/Paid-in capital: double (nullable = true)\n", - " |-- Inventory and accounts receivable/Net value: double (nullable = true)\n", - " |-- Total Asset Turnover: double (nullable = true)\n", - " |-- Accounts Receivable Turnover: double (nullable = true)\n", - " |-- Average Collection Days: double (nullable = true)\n", - " |-- Inventory Turnover Rate (times): double (nullable = true)\n", - " |-- Fixed Assets Turnover Frequency: double (nullable = true)\n", - " |-- Net Worth Turnover Rate (times): double (nullable = true)\n", - " |-- Revenue per person: double (nullable = true)\n", - " |-- Operating profit per person: double (nullable = true)\n", - " |-- Allocation rate per person: double (nullable = true)\n", - " |-- Working Capital to Total Assets: double (nullable = true)\n", - " |-- Quick Assets/Total Assets: double (nullable = true)\n", - " |-- Current Assets/Total Assets: double (nullable = true)\n", - " |-- Cash/Total Assets: double (nullable = true)\n", - " |-- Quick Assets/Current Liability: double (nullable = true)\n", - " |-- Cash/Current Liability: double (nullable = true)\n", - " |-- Current Liability to Assets: double (nullable = true)\n", - " |-- Operating Funds to Liability: double (nullable = true)\n", - " |-- Inventory/Working Capital: double (nullable = true)\n", - " |-- Inventory/Current Liability: double (nullable = true)\n", - " |-- Current Liabilities/Liability: double (nullable = true)\n", - " |-- Working Capital/Equity: double (nullable = true)\n", - " |-- Current Liabilities/Equity: double (nullable = true)\n", - " |-- Long-term Liability to Current Assets: double (nullable = true)\n", - " |-- Retained Earnings to Total Assets: double (nullable = true)\n", - " |-- Total income/Total expense: double (nullable = true)\n", - " |-- Total expense/Assets: double (nullable = true)\n", - " |-- Current Asset Turnover Rate: double (nullable = true)\n", - " |-- Quick Asset Turnover Rate: double (nullable = true)\n", - " |-- Working capitcal Turnover Rate: double (nullable = true)\n", - " |-- Cash Turnover Rate: double (nullable = true)\n", - " |-- Cash Flow to Sales: double (nullable = true)\n", - " |-- Fixed Assets to Assets: double (nullable = true)\n", - " |-- Current Liability to Liability: double (nullable = true)\n", - " |-- Current Liability to Equity: double (nullable = true)\n", - " |-- Equity to Long-term Liability: double (nullable = true)\n", - " |-- Cash Flow to Total Assets: double (nullable = true)\n", - " |-- Cash Flow to Liability: double (nullable = true)\n", - " |-- CFO to Assets: double (nullable = true)\n", - " |-- Cash Flow to Equity: double (nullable = true)\n", - " |-- Current Liability to Current Assets: double (nullable = true)\n", - " |-- Liability-Assets Flag: double (nullable = true)\n", - " |-- Net Income to Total Assets: double (nullable = true)\n", - " |-- Total assets to GNP price: double (nullable = true)\n", - " |-- No-credit Interval: double (nullable = true)\n", - " |-- Gross Profit to Sales: double (nullable = true)\n", - " |-- Net Income to Stockholder's Equity: double (nullable = true)\n", - " |-- Liability to Equity: double (nullable = true)\n", - " |-- Degree of Financial Leverage (DFL): double (nullable = true)\n", - " |-- Interest Coverage Ratio (Interest expense to EBIT): double (nullable = true)\n", - " |-- Net Income Flag: double (nullable = true)\n", - " 
|-- Equity to Liability: double (nullable = true)\n", - "\n" - ] - } - ], - "source": [ - "df = (\n", - " spark.read.format(\"csv\")\n", - " .option(\"header\", True)\n", - " .option(\"inferSchema\", True)\n", - " .load(\n", - " \"wasbs://publicwasb@mmlspark.blob.core.windows.net/company_bankruptcy_prediction_data.csv\"\n", - " )\n", - ")\n", - "# print dataset size\n", - "print(\"records read: \" + str(df.count()))\n", - "print(\"Schema: \")\n", - "df.printSchema()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Split the dataset into train and test" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "train, test = df.randomSplit([0.8, 0.2], seed=41)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Add featurizer to convert features to vector" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "from pyspark.ml.feature import VectorAssembler\n", - "\n", - "feature_cols = df.columns[1:]\n", - "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n", - "train_data = featurizer.transform(train)[\"Bankrupt?\", \"features\"]\n", - "test_data = featurizer.transform(test)[\"Bankrupt?\", \"features\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Default SynapseML LightGBM" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "23/02/28 02:12:42 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.\n", - "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " \r" - ] - } - ], - "source": [ - "from synapse.ml.lightgbm import LightGBMClassifier\n", - "\n", - "model = LightGBMClassifier(\n", - " objective=\"binary\", featuresCol=\"features\", labelCol=\"Bankrupt?\", isUnbalance=True\n", - ")\n", - "\n", - "model = model.fit(train_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Model Prediction" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "DataFrame[evaluation_type: string, confusion_matrix: matrix, accuracy: double, precision: double, recall: double, AUC: double]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[Stage 27:> (0 + 1) / 1]\r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n", - "|evaluation_type| confusion_matrix| accuracy| precision| recall| AUC|\n", - "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n", - "| Classification|1250.0 23.0 \\n3...|0.958997722095672|0.3611111111111111|0.29545454545454547|0.6386934942512319|\n", - "+---------------+--------------------+-----------------+------------------+-------------------+------------------+\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " \r" - ] - } - ], - "source": [ - "def predict(model):\n", - " from synapse.ml.train import ComputeModelStatistics\n", 
- "\n", - " predictions = model.transform(test_data)\n", - " # predictions.limit(10).show()\n", - " \n", - " metrics = ComputeModelStatistics(\n", - " evaluationMetric=\"classification\",\n", - " labelCol=\"Bankrupt?\",\n", - " scoredLabelsCol=\"prediction\",\n", - " ).transform(predictions)\n", - " display(metrics)\n", - " return metrics\n", - "\n", - "default_metrics = predict(model)\n", - "default_metrics.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run FLAML\n", - "In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. " - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "''' import AutoML class from flaml package '''\n", - "from flaml import AutoML\n", - "from flaml.automl.spark.utils import to_pandas_on_spark\n", - "\n", - "automl = AutoML()" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "settings = {\n", - " \"time_budget\": 30, # total running time in seconds\n", - " \"metric\": 'roc_auc',\n", - " \"estimator_list\": ['lgbm_spark'], # list of ML learners; we tune lightgbm in this example\n", - " \"task\": 'classification', # task type\n", - " \"log_file_name\": 'flaml_experiment.log', # flaml log file\n", - " \"seed\": 41, # random seed\n", - " \"force_cancel\": True, # force stop training once time_budget is used up\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Disable Arrow optimization to omit below warning:\n", - "```\n", - "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/conversion.py:87: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:\n", - " Unsupported type in conversion to Arrow: VectorUDT\n", - "Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.\n", - " warnings.warn(msg)\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "spark.conf.set(\"spark.sql.execution.arrow.pyspark.enabled\", \"false\")" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
[HTML table rendering of df.head() lost to tag stripping; the same five rows appear in the text/plain output below]
" - ], - "text/plain": [ - " index Bankrupt? features\n", - "0 0 0 [0.0828, 0.0693, 0.0884, 0.6468, 0.6468, 0.9971, 0.7958, 0.8078, 0.3047, 0.78, 0.0027, 0.0029, 0.428, 0.0, 0.0, 0.1273, 0.1273, 0.1273, 0.1872, 0.3127, 0.0038, 0.062, 0.1482, 0.022, 0.8478, 0.6893, 0.6893, 0.2176, 0.0, 0.0002, 0.2628, 0.291, 0.0039, 0.0025, 0.6306, 0.0137, 0.1776, 0.8224, 0.005, 0.3696, 0.0054, 0.062, 0.1473, 0.3986, 0.1109, 0.0003, 0.0182, 7150000000.0, 0.0003, 0.0302, 0.0025, 0.3763, 0.0009, 0.6971, 0.262, 0.3948, 0.0918, 0.0025, 0.0027, 0.1828, 0.242, 0.2766, 0.0039, 0.984, 0.7264, 0.3382, 0.0, 0.0, 0.0021, 1.0, 3650000000.0, 2500000000.0, 0.5939, 3060000000.0, 0.6714, 0.4836, 0.984, 0.3382, 0.1109, 0.0, 0.3666, 0.0, 0.1653, 0.072, 0.0, 0.0, 0.0, 0.6237, 0.6468, 0.7483, 0.2847, 0.0268, 0.5652, 1.0, 0.0199]\n", - "1 1 0 [0.1606, 0.1788, 0.1832, 0.5897, 0.5897, 0.9986, 0.7969, 0.8088, 0.3034, 0.781, 0.0003, 0.0002, 0.4434, 0.0002, 0.0, 0.1341, 0.1341, 0.1341, 0.1637, 0.2935, 0.0215, 0.0575, 0.1295, 0.0222, 0.848, 0.6894, 0.6894, 0.2176, 6700000000.0, 0.0003, 0.2646, 0.1561, 0.0075, 0.0016, 0.6306, 0.0275, 0.2228, 0.7772, 0.0061, 0.3952, 0.0054, 0.0574, 0.1285, 0.4264, 0.2579, 0.0218, 0.0003, 7550000000.0, 0.0029, 0.0569, 0.0184, 0.3689, 0.0009, 0.8013, 0.3721, 0.9357, 0.1842, 0.0028, 0.0042, 0.232, 0.2865, 0.2785, 0.0123, 1.0, 0.7403, 0.3506, 0.0, 0.811, 0.0019, 0.1083, 0.0001, 5310000000.0, 0.5939, 7880000000.0, 0.6715, 0.0499, 1.0, 0.3506, 0.1109, 0.463, 0.4385, 0.1781, 0.2476, 0.0388, 0.0, 0.5917, 4370000000.0, 0.6236, 0.5897, 0.8023, 0.2947, 0.0268, 0.5651, 1.0, 0.0151]\n", - "2 2 0 [0.204, 0.2638, 0.2598, 0.4483, 0.4483, 0.9959, 0.7937, 0.8063, 0.3034, 0.7782, 0.0007, 0.0004, 0.4511, 0.0003, 0.0, 0.1387, 0.1387, 0.1387, 0.1546, 0.263, 0.004, 0.0393, 0.0757, 0.0187, 0.8468, 0.6872, 0.6872, 0.2173, 0.0002, 0.0004, 0.2588, 0.1568, 0.0025, 0.0007, 0.6305, 0.04, 0.2419, 0.7581, 0.0048, 0.4073, 0.0054, 0.0394, 0.1165, 0.4142, 0.0315, 0.0009, 0.0074, 5310000000.0, 3030000000.0, 0.0195, 0.002, 0.3723, 0.0124, 0.6252, 0.1282, 0.3562, 0.0377, 0.0008, 0.0008, 0.2515, 0.3097, 0.2767, 0.0046, 1.0, 0.7042, 0.3617, 0.0, 0.8891, 0.0013, 0.0213, 0.0006, 0.0002, 0.5933, 0.0002, 0.6715, 0.5863, 1.0, 0.3617, 0.1109, 0.635, 0.4584, 0.3252, 0.3106, 0.1097, 0.0, 0.6816, 0.0003, 0.6221, 0.4483, 0.8117, 0.3038, 0.0268, 0.5651, 1.0, 0.0136]\n", - "3 3 0 [0.217, 0.1881, 0.2451, 0.5992, 0.5992, 0.9962, 0.794, 0.8061, 0.3034, 0.7781, 0.0029, 0.0038, 0.4555, 0.0003, 0.0, 0.1277, 0.1277, 0.1277, 0.1387, 0.271, 0.0049, 0.0319, 0.0091, 0.022, 0.848, 0.6893, 0.6893, 0.2176, 9790000000.0, 0.0011, 0.2629, 0.0, 0.004, 0.004, 0.6305, 0.2222, 0.286, 0.714, 0.0052, 0.6137, 0.0054, 0.0608, 0.1361, 0.407, 0.039, 0.0008, 0.0078, 0.0002, 0.0006, 0.1497, 0.0091, 0.3072, 0.0015, 0.6671, 0.6679, 0.656, 0.6709, 0.004, 0.012, 0.2966, 0.3228, 0.2769, 0.0003, 1.0, 0.6453, 0.523, 0.0, 0.8015, 0.002, 0.112, 0.0008, 0.0008, 0.5937, 0.0022, 0.6723, 0.022, 1.0, 0.523, 0.1109, 0.9353, 0.4857, 0.402, 1.0, 0.0707, 0.0, 0.6196, 0.0011, 0.6236, 0.5992, 0.6346, 0.4359, 0.0268, 0.565, 1.0, 0.0108]\n", - "4 4 0 [0.2314, 0.1628, 0.2068, 0.6001, 0.6001, 0.9988, 0.796, 0.8078, 0.3015, 0.7801, 0.0003, 0.0002, 0.458, 0.0005, 0.0, 0.1351, 0.1351, 0.1351, 0.1599, 0.315, 0.0085, 0.088, 0.1271, 0.0223, 0.8481, 0.6894, 0.6894, 0.2176, 3860000000.0, 0.0003, 0.2633, 0.363, 0.011, 0.0072, 0.6306, 0.0214, 0.2081, 0.7919, 0.0053, 0.3832, 0.0123, 0.088, 0.1261, 0.3996, 0.0885, 0.0008, 0.0075, 0.0005, 0.0003, 0.025, 0.0108, 0.3855, 0.0044, 0.8522, 0.8464, 0.8194, 0.0331, 0.0111, 
0.0013, 0.1393, 0.3341, 0.277, 0.0003, 0.637, 0.7459, 0.3384, 0.0024, 0.8278, 0.002, 0.184, 0.0003, 0.0003, 0.594, 3320000000.0, 0.6715, 0.1798, 0.637, 0.3384, 0.1171, 0.587, 0.4524, 0.521, 0.2972, 0.0265, 0.0, 0.5269, 0.0003, 0.6241, 0.6001, 0.7985, 0.2903, 0.0268, 0.5651, 1.0, 0.0164]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = to_pandas_on_spark(to_pandas_on_spark(train_data).to_spark(index_col=\"index\"))\n", - "\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[flaml.automl.automl: 02-28 02:12:59] {2922} INFO - task = classification\n", - "[flaml.automl.automl: 02-28 02:13:00] {2924} INFO - Data split method: stratified\n", - "[flaml.automl.automl: 02-28 02:13:00] {2927} INFO - Evaluation method: cv\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.\n", - " warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[flaml.automl.automl: 02-28 02:13:01] {3054} INFO - Minimizing error metric: 1-roc_auc\n", - "[flaml.automl.automl: 02-28 02:13:01] {3209} INFO - List of ML learners in AutoML Run: ['lgbm_spark']\n", - "[flaml.automl.automl: 02-28 02:13:01] {3539} INFO - iteration 0, current learner lgbm_spark\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.\n", - " warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n", - "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: If `index_col` is not specified for `to_spark`, the existing index is lost when converting to Spark DataFrame.\n", - " warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n", - "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.\n", - " warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[flaml.automl.automl: 02-28 02:13:48] {3677} INFO - Estimated sufficient time budget=464999s. 
Estimated necessary time budget=465s.\n", - "[flaml.automl.automl: 02-28 02:13:48] {3724} INFO - at 48.5s,\testimator lgbm_spark's best error=0.0871,\tbest estimator lgbm_spark's best error=0.0871\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/datadrive/spark/spark33/python/pyspark/pandas/utils.py:975: PandasAPIOnSparkAdviceWarning: If `index_col` is not specified for `to_spark`, the existing index is lost when converting to Spark DataFrame.\n", - " warnings.warn(message, PandasAPIOnSparkAdviceWarning)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n", - "[LightGBM] [Warning] Find whitespaces in feature_names, replace with underlines\n", - "[flaml.automl.automl: 02-28 02:13:54] {3988} INFO - retrain lgbm_spark for 6.2s\n", - "[flaml.automl.automl: 02-28 02:13:54] {3995} INFO - retrained model: LightGBMClassifier_a2177c5be001\n", - "[flaml.automl.automl: 02-28 02:13:54] {3239} INFO - fit succeeded\n", - "[flaml.automl.automl: 02-28 02:13:54] {3240} INFO - Time taken to find the best model: 48.4579541683197\n" - ] - } - ], - "source": [ - "'''The main flaml automl API'''\n", - "automl.fit(dataframe=df, label='Bankrupt?', labelCol=\"Bankrupt?\", isUnbalance=True, **settings)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Best model and metric" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Best hyperparmeter config: {'numIterations': 4, 'numLeaves': 4, 'minDataInLeaf': 20, 'learningRate': 0.09999999999999995, 'log_max_bin': 8, 'featureFraction': 1.0, 'lambdaL1': 0.0009765625, 'lambdaL2': 1.0}\n", - "Best roc_auc on validation data: 0.9129\n", - "Training duration of best run: 6.237 s\n" - ] - } - ], - "source": [ - "''' retrieve best config'''\n", - "print('Best hyperparmeter config:', automl.best_config)\n", - "print('Best roc_auc on validation data: {0:.4g}'.format(1-automl.best_loss))\n", - "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "DataFrame[evaluation_type: string, confusion_matrix: matrix, accuracy: double, precision: double, recall: double, AUC: double]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n", - "|evaluation_type| confusion_matrix| accuracy| precision| recall| AUC|\n", - "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n", - "| Classification|1218.0 55.0 \\n1...|0.9453302961275627|0.32926829268292684|0.6136363636363636|0.7852156680711276|\n", - "+---------------+--------------------+------------------+-------------------+------------------+------------------+\n", - "\n" - ] - } - ], - "source": [ - "flaml_metrics = predict(automl.model.estimator)\n", - "flaml_metrics.show()" - ] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "collapsed_sections": [], - "include_colab_link": true, - "name": "Copy of automl_nlp.ipynb", - "provenance": [] - }, - "gpuClass": "standard", - "kernelspec": { - "display_name": "flaml-dev", - "language": "python", 
- "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.8" - }, - "vscode": { - "interpreter": { - "hash": "cbbf4d250a3560c7073bd6e01a7ecfe1c772dc45f2100f74412fcaea735f0880" - } - }, - "widgets": {} - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/notebook/research/autogen_code.ipynb b/notebook/research/autogen_code.ipynb index 8b396b3bc7..4fb0dfd8b0 100644 --- a/notebook/research/autogen_code.ipynb +++ b/notebook/research/autogen_code.ipynb @@ -15,7 +15,9 @@ "\n", "# Use FLAML to Optimize Code Generation Performance\n", "\n", - "In this notebook, we optimize OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings. \n", + "In this notebook, we optimize OpenAI models for code generation. We use [the HumanEval benchmark](https://huggingface.co/datasets/openai_humaneval) released by OpenAI for synthesizing programs from docstrings.\n", + "\n", + "Related link: [Blogpost](https://microsoft.github.io/FLAML/blog/2023/05/18/GPT-adaptive-humaneval) based on this experiment.\n", "\n", "## Requirements\n", "\n", diff --git a/setup.py b/setup.py index f248758852..656b17c710 100644 --- a/setup.py +++ b/setup.py @@ -43,8 +43,6 @@ setuptools.setup( ], "notebook": [ "jupyter", - "matplotlib", - "openml", ], "spark": [ "pyspark>=3.2.0", diff --git a/test/autogen/oai/test_completion.py b/test/autogen/oai/test_completion.py index 80b453615e..fd5f295773 100644 --- a/test/autogen/oai/test_completion.py +++ b/test/autogen/oai/test_completion.py @@ -11,10 +11,8 @@ from flaml.autogen.code_utils import ( generate_assertions, implement, generate_code, - extract_code, improve_function, improve_code, - execute_code, ) from flaml.autogen.math_utils import eval_math_responses, solve_problem @@ -101,34 +99,6 @@ def test_multi_model(): print(response) -@pytest.mark.skipif( - sys.platform in ["darwin", "win32"], - reason="do not run on MacOS or windows", -) -def test_execute_code(): - try: - import docker - except ImportError as exc: - print(exc) - return - exitcode, msg = execute_code("print('hello world')", filename="tmp/codetest.py") - assert exitcode == 0 and msg == b"hello world\n", msg - # read a file - print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()")) - # create a file - print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir=f"{here}/my_tmp")) - # execute code in a file - print(execute_code(filename="tmp/codetest.py")) - # execute code for assertion error - exit_code, msg = execute_code("assert 1==2") - assert exit_code, msg - # execute code which takes a long time - exit_code, error = execute_code("import time; time.sleep(2)", timeout=1) - assert exit_code and error == "Timeout" - exit_code, error = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False) - assert exit_code and error == "Timeout" - - def test_improve(): try: import openai @@ -187,39 +157,7 @@ def test_nocontext(): ], ) print(code) - # test extract_code from markdown - code, _ = extract_code( - """ -Example: -``` -print("hello extract code") -``` -""" - ) - print(code) - code, _ = extract_code( - """ -Example: -```python -def scrape(url): - import requests - from bs4 import BeautifulSoup - response = requests.get(url) - soup = 
BeautifulSoup(response.text, "html.parser") - title = soup.find("title").text - text = soup.find("div", {"id": "bodyContent"}).text - return title, text -``` -Test: -```python -url = "https://en.wikipedia.org/wiki/Web_scraping" -title, text = scrape(url) -print(f"Title: {title}") -print(f"Text: {text}") -""" - ) - print(code) solution, cost = solve_problem("1+1=", config_list=oai.config_list_gpt4_gpt35(KEY_LOC)) print(solution, cost) @@ -445,7 +383,6 @@ if __name__ == "__main__": # test_filter() # test_chatcompletion() # test_multi_model() - # test_execute_code() # test_improve() # test_nocontext() test_humaneval(1) diff --git a/test/autogen/oai/test_notebook.py b/test/autogen/oai/test_notebook.py index 8c76a6eef0..773520f624 100644 --- a/test/autogen/oai/test_notebook.py +++ b/test/autogen/oai/test_notebook.py @@ -45,7 +45,15 @@ def run_notebook(input_nb, output_nb="executed_openai_notebook.ipynb", save=Fals @pytest.mark.skipif( skip or not sys.version.startswith("3.10"), - reason="do not run openai test if openai is not installed or py!=3.10", + reason="do not run if openai is not installed or py!=3.10", +) +def test_autogen_agent_auto_feedback_from_code(save=False): + run_notebook("autogen_agent_auto_feedback_from_code_execution.ipynb", save=save) + + +@pytest.mark.skipif( + skip or not sys.version.startswith("3.10"), + reason="do not run if openai is not installed or py!=3.10", ) def test_autogen_openai_completion(save=False): run_notebook("autogen_openai_completion.ipynb", save=save) @@ -53,7 +61,7 @@ def test_autogen_openai_completion(save=False): @pytest.mark.skipif( skip or not sys.version.startswith("3.11"), - reason="do not run openai test if openai is not installed or py!=3.11", + reason="do not run if openai is not installed or py!=3.11", ) def test_autogen_chatgpt_gpt4(save=False): run_notebook("autogen_chatgpt_gpt4.ipynb", save=save) diff --git a/test/autogen/test_agent.py b/test/autogen/test_assistant_agent.py similarity index 57% rename from test/autogen/test_agent.py rename to test/autogen/test_assistant_agent.py index 9695d62e7e..c5230930b3 100644 --- a/test/autogen/test_agent.py +++ b/test/autogen/test_assistant_agent.py @@ -1,39 +1,63 @@ import os -from flaml.autogen.code_utils import extract_code from flaml import oai +from flaml.autogen.agent import AssistantAgent, UserProxyAgent KEY_LOC = "test/autogen" here = os.path.abspath(os.path.dirname(__file__)) -def test_extract_code(): - print(extract_code("```bash\npython temp.py\n```")) - - -def test_coding_agent(human_input_mode="NEVER", max_consecutive_auto_reply=10): +def test_gpt35(human_input_mode="NEVER", max_consecutive_auto_reply=5): + try: + import openai + except ImportError: + return + config_list = oai.config_list_from_models(key_file_path=KEY_LOC, model_list=["gpt-3.5-turbo"]) + assistant = AssistantAgent( + "coding_agent", + request_timeout=600, + seed=40, + max_tokens=1024, + config_list=config_list, + ) + user = UserProxyAgent( + "user", + work_dir=f"{here}/test_agent_scripts", + human_input_mode=human_input_mode, + is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"), + max_consecutive_auto_reply=max_consecutive_auto_reply, + use_docker="python:3", + ) + coding_task = "Print hello world to a file called hello.txt" + assistant.receive(coding_task, user) + # coding_task = "Create a powerpoint with the text hello world in it." + # assistant.receive(coding_task, user) + assistant.reset() + coding_task = "Save a pandas df with 3 rows and 3 columns to disk." 
+ assistant.receive(coding_task, user) + + +def test_create_execute_script(human_input_mode="NEVER", max_consecutive_auto_reply=10): try: import openai except ImportError: return - from flaml.autogen.agent.coding_agent import PythonAgent - from flaml.autogen.agent.user_proxy_agent import UserProxyAgent config_list = oai.config_list_gpt4_gpt35(key_file_path=KEY_LOC) conversations = {} oai.ChatCompletion.start_logging(conversations) - agent = PythonAgent("coding_agent", request_timeout=600, seed=42, config_list=config_list) + assistant = AssistantAgent("assistant", request_timeout=600, seed=42, config_list=config_list) user = UserProxyAgent( "user", human_input_mode=human_input_mode, max_consecutive_auto_reply=max_consecutive_auto_reply, is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"), ) - agent.receive( + assistant.receive( """Create and execute a script to plot a rocket without using matplotlib""", user, ) - agent.reset() - agent.receive( + assistant.reset() + assistant.receive( """Create a temp.py file with the following content: ``` print('Hello world!') @@ -42,7 +66,7 @@ print('Hello world!') ) print(conversations) oai.ChatCompletion.start_logging(compact=False) - agent.receive("""Execute temp.py""", user) + assistant.receive("""Execute temp.py""", user) print(oai.ChatCompletion.logged_history) oai.ChatCompletion.stop_logging() @@ -52,8 +76,6 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10): import openai except ImportError: return - from flaml.autogen.agent.coding_agent import PythonAgent - from flaml.autogen.agent.user_proxy_agent import UserProxyAgent config_list = oai.config_list_openai_aoai(key_file_path=KEY_LOC) hard_questions = [ @@ -63,7 +85,7 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10): ] oai.ChatCompletion.start_logging() - agent = PythonAgent("coding_agent", temperature=0, config_list=config_list) + assistant = AssistantAgent("assistant", temperature=0, config_list=config_list) user = UserProxyAgent( "user", work_dir=f"{here}", @@ -74,14 +96,14 @@ def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10): prompt = f.read() # agent.receive(prompt.format(question=hard_questions[0]), user) # agent.receive(prompt.format(question=hard_questions[1]), user) - agent.receive(prompt.format(question=hard_questions[2]), user) + assistant.receive(prompt.format(question=hard_questions[2]), user) print(oai.ChatCompletion.logged_history) oai.ChatCompletion.stop_logging() if __name__ == "__main__": - # test_extract_code() - test_coding_agent(human_input_mode="TERMINATE") + test_gpt35() + test_create_execute_script(human_input_mode="TERMINATE") # when GPT-4, i.e., the DEFAULT_MODEL, is used, conversation in the following test # should terminate in 2-3 rounds of interactions (because is_termination_msg should be true after 2-3 rounds) # although the max_consecutive_auto_reply is set to 10. 
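For illustration, a minimal sketch (not part of this diff) of the `AssistantAgent`/`UserProxyAgent` loop that the tests above exercise, assuming the `flaml.autogen.agent` API introduced by this change; the key file location, `work_dir`, and most keyword values are placeholders rather than values taken from the repository, and the inline comments describe the behaviour as suggested by the tests.

```python
from flaml import oai
from flaml.autogen.agent import AssistantAgent, UserProxyAgent

# Placeholder: a directory containing the OpenAI key file(s) expected by the config helpers.
config_list = oai.config_list_from_models(key_file_path=".", model_list=["gpt-3.5-turbo"])

assistant = AssistantAgent("assistant", config_list=config_list)
user = UserProxyAgent(
    "user",
    work_dir="coding",  # generated code blocks are written and executed here
    human_input_mode="TERMINATE",  # auto-reply with execution results, ask the human before stopping
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"),
)

# The assistant proposes code; the user proxy runs it and feeds the output back
# until a TERMINATE message (or the auto-reply limit) is reached.
assistant.receive("Print hello world to a file called hello.txt", user)
```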
diff --git a/test/autogen/test_code.py b/test/autogen/test_code.py new file mode 100644 index 0000000000..e3becba135 --- /dev/null +++ b/test/autogen/test_code.py @@ -0,0 +1,85 @@ +import sys +import os +import pytest +from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang + +here = os.path.abspath(os.path.dirname(__file__)) + + +def test_infer_lang(): + assert infer_lang("print('hello world')") == "python" + assert infer_lang("pip install flaml") == "sh" + + +def test_extract_code(): + print(extract_code("```bash\npython temp.py\n```")) + # test extract_code from markdown + codeblocks = extract_code( + """ +Example: +``` +print("hello extract code") +``` +""" + ) + print(codeblocks) + + codeblocks = extract_code( + """ +Example: +```python +def scrape(url): + import requests + from bs4 import BeautifulSoup + response = requests.get(url) + soup = BeautifulSoup(response.text, "html.parser") + title = soup.find("title").text + text = soup.find("div", {"id": "bodyContent"}).text + return title, text +``` +Test: +```python +url = "https://en.wikipedia.org/wiki/Web_scraping" +title, text = scrape(url) +print(f"Title: {title}") +print(f"Text: {text}") +""" + ) + print(codeblocks) + codeblocks = extract_code("no code block") + assert len(codeblocks) == 1 and codeblocks[0] == (UNKNOWN, "no code block") + + +@pytest.mark.skipif( + sys.platform in ["darwin", "win32"], + reason="do not run on MacOS or windows", +) +def test_execute_code(): + try: + import docker + except ImportError as exc: + print(exc) + return + exitcode, msg, image = execute_code("print('hello world')", filename="tmp/codetest.py") + assert exitcode == 0 and msg == b"hello world\n", msg + # read a file + print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()")) + # create a file + print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir=f"{here}/my_tmp")) + # execute code in a file + print(execute_code(filename="tmp/codetest.py")) + print(execute_code("python tmp/codetest.py", lang="sh")) + # execute code for assertion error + exit_code, msg, image = execute_code("assert 1==2") + assert exit_code, msg + # execute code which takes a long time + exit_code, error, image = execute_code("import time; time.sleep(2)", timeout=1) + assert exit_code and error == "Timeout" + exit_code, error, image = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False) + assert exit_code and error == "Timeout" and image is None + + +if __name__ == "__main__": + # test_infer_lang() + # test_extract_code() + test_execute_code() diff --git a/website/docs/Getting-Started.md b/website/docs/Getting-Started.md index 622b00cacb..2ab63b47b6 100644 --- a/website/docs/Getting-Started.md +++ b/website/docs/Getting-Started.md @@ -8,41 +8,47 @@ models, hyperparameters, and other tunable choices of an application. ### Main Features -* For foundation models like the GPT serie and AI agents based on them, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. -* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. -* It is easy to customize or extend. 
Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). Users can customize only when and what they need to, and leave the rest to the library. +* For foundation models like the GPT models, it automates the experimentation and optimization of their performance to maximize the effectiveness for applications and minimize the inference cost. FLAML enables users to build and use adaptive AI agents with minimal effort. +* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., search space and metric), or full customization (arbitrary training/inference/evaluation code). * It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective -hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm) -and model selection method invented by Microsoft Research, and many followup [research studies](Research). +hyperparameter optimization](/docs/Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm) +and model selection method invented by Microsoft Research, and many followup [research studies](/docs/Research). ### Quickstart -Install FLAML from pip: `pip install flaml`. Find more options in [Installation](Installation). +Install FLAML from pip: `pip install flaml`. Find more options in [Installation](/docs/Installation). There are several ways of using flaml: -#### (New) [Auto Generation](Use-Cases/Auto-Generation) +#### (New) [Auto Generation](/docs/Use-Cases/Auto-Generation) -For example, you can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets. +Maximize the utility out of the expensive LLMs such as ChatGPT and GPT-4, including: + - A drop-in replacement of `openai.Completion` or `openai.ChatCompletion` with powerful functionalites like tuning, caching, templating, filtering. For example, you can optimize generations by LLM with your own tuning data, success metrics and budgets. + ```python + from flaml import oai -```python -from flaml import oai + # perform tuning + config, analysis = oai.Completion.tune( + data=tune_data, + metric="success", + mode="max", + eval_func=eval_func, + inference_budget=0.05, + optimization_budget=3, + num_samples=-1, + ) -config, analysis = oai.Completion.tune( - data=tune_data, - metric="success", - mode="max", - eval_func=eval_func, - inference_budget=0.05, - optimization_budget=3, - num_samples=-1, -) -``` + # perform inference for a test instance + response = oai.Completion.create(context=test_instance, **config) + ``` + - LLM-driven intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code. 
For example, + ```python + assistant = AssistantAgent("assistant") + user = UserProxyAgent("user", human_input_mode="TERMINATE") + assistant.receive("Draw a rocket and save to a file named 'rocket.svg'") + ``` -The automated experimentation and optimization can help you maximize the utility out of these expensive models. -A suite of utilities are offered to accelerate the experimentation and application development, such as low-level inference API with caching, templating, filtering, and higher-level components like LLM-based coding and interactive agents. - -#### [Task-oriented AutoML](Use-Cases/task-oriented-automl) +#### [Task-oriented AutoML](/docs/Use-Cases/task-oriented-automl) For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator. @@ -52,14 +58,14 @@ automl = AutoML() automl.fit(X_train, y_train, task="classification", time_budget=60) ``` -It automatically tunes the hyperparameters and selects the best model from default learners such as LightGBM, XGBoost, random forest etc. for the specified time budget 60 seconds. [Customizing](Use-Cases/task-oriented-automl#customize-automlfit) the optimization metrics, learners and search spaces etc. is very easy. For example, +It automatically tunes the hyperparameters and selects the best model from default learners such as LightGBM, XGBoost, random forest etc. for the specified time budget 60 seconds. [Customizing](/docs/Use-Cases/task-oriented-automl#customize-automlfit) the optimization metrics, learners and search spaces etc. is very easy. For example, ```python automl.add_learner("mylgbm", MyLGBMEstimator) automl.fit(X_train, y_train, task="classification", metric=custom_metric, estimator_list=["mylgbm"], time_budget=60) ``` -#### [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) +#### [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) You can run generic hyperparameter tuning for a custom function (machine learning or beyond). For example, @@ -99,7 +105,7 @@ analysis = tune.run( ``` Please see this [script](https://github.com/microsoft/FLAML/blob/main/test/tune_example.py) for the complete version of the above example. -#### [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML) +#### [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML) FLAML offers a unique, seamless and effortless way to leverage AutoML for the commonly used classifiers and regressors such as LightGBM and XGBoost. For example, if you are using `lightgbm.LGBMClassifier` as your current learner, all you need to do is to replace `from lightgbm import LGBMClassifier` by: @@ -111,11 +117,11 @@ Then, you can use it just like you use the original `LGMBClassifier`. Your other ### Where to Go Next? -* Understand the use cases for [Auto Generation](Use-Cases/Auto-Generation), [Task-oriented AutoML](Use-Cases/Task-Oriented-Automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML). -* Find code examples under "Examples": from [AutoGen - OpenAI](Examples/AutoGen-OpenAI) to [Tune - PyTorch](Examples/Tune-PyTorch). -* Learn about [research](Research) around FLAML and check [blogposts](/blog). +* Understand the use cases for [Auto Generation](/docs/Use-Cases/Auto-Generation), [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML). 
+* Find code examples under "Examples": from [AutoGen - OpenAI](/docs/Examples/AutoGen-OpenAI) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch). +* Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog). * Chat on [Discord](https://discord.gg/Cppx2vSPVP). -If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](Contribute). +If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/docs/Contribute). diff --git a/website/docs/Installation.md b/website/docs/Installation.md index 92285a7034..31145f649d 100644 --- a/website/docs/Installation.md +++ b/website/docs/Installation.md @@ -15,47 +15,60 @@ conda install flaml -c conda-forge ### Optional Dependencies +#### [Auto Generation](Use-Cases/Auto-Generation) + +```bash +pip install "flaml[autogen]" +``` + +#### [Task-oriented AutoML](Use-Cases/Task-Oriented-AutoML) + +```bash +pip install "flaml[automl]" +``` + +#### Extra learners/models + +* openai models +```bash +pip install "flaml[openai]" +``` +* catboost +```bash +pip install "flaml[catboost]" +``` +* vowpal wabbit +```bash +pip install "flaml[vw]" +``` +* time series forecaster: prophet, statsmodels +```bash +pip install "flaml[forecast]" +``` +* huggingface transformers +```bash +pip install "flaml[hf]" +``` + #### Notebook To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook), install flaml with the [notebook] option: ```bash -pip install flaml[notebook] -``` - -#### Extra learners/models -* openai models -```bash -pip install flaml[openai] -``` -* catboost -```bash -pip install flaml[catboost] -``` -* vowpal wabbit -```bash -pip install flaml[vw] -``` -* time series forecaster: prophet, statsmodels -```bash -pip install flaml[forecast] -``` -* huggingface transformers -```bash -pip install flaml[hf] +pip install "flaml[notebook]" ``` #### Distributed tuning * ray ```bash -pip install flaml[ray] +pip install "flaml[ray]" ``` * spark > *Spark support is added in v1.1.0* ```bash -pip install flaml[spark]>=1.1.0 +pip install "flaml[spark]>=1.1.0" ``` For cloud platforms such as [Azure Synapse](https://azure.microsoft.com/en-us/products/synapse-analytics/), Spark clusters are provided. @@ -76,11 +89,11 @@ export PATH=$PATH:$SPARK_HOME/bin * nni ```bash -pip install flaml[nni] +pip install "flaml[nni]" ``` * blendsearch ```bash -pip install flaml[blendsearch] +pip install "flaml[blendsearch]" ``` * synapse diff --git a/website/docs/Research.md b/website/docs/Research.md index 7c1d9ceb49..a510d41fad 100644 --- a/website/docs/Research.md +++ b/website/docs/Research.md @@ -82,13 +82,12 @@ For technical details, please check our research publications. * [Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives](https://openreview.net/forum?id=0Ij9_q567Ma). Shaokun Zhang, Feiran Jia, Chi Wang, Qingyun Wu. ICLR 2023 (notable-top-5%). 
```bibtex -@inproceedings{ -zhang2023targeted, -title={Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives}, -author={Shaokun Zhang and Feiran Jia and Chi Wang and Qingyun Wu}, -booktitle={International Conference on Learning Representations}, -year={2023}, -url={https://openreview.net/forum?id=0Ij9_q567Ma} +@inproceedings{zhang2023targeted, + title={Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives}, + author={Shaokun Zhang and Feiran Jia and Chi Wang and Qingyun Wu}, + booktitle={International Conference on Learning Representations}, + year={2023}, + url={https://openreview.net/forum?id=0Ij9_q567Ma}, } ``` diff --git a/website/docs/Use-Cases/Auto-Generation.md b/website/docs/Use-Cases/Auto-Generation.md index 57f191b238..3aad9242e9 100644 --- a/website/docs/Use-Cases/Auto-Generation.md +++ b/website/docs/Use-Cases/Auto-Generation.md @@ -4,8 +4,8 @@ * Leveraging [`flaml.tune`](../reference/tune/tune) to adapt LLMs to applications, such that: - Maximize the utility out of using expensive foundation models. - Reduce the inference cost by using cheaper models or configurations which achieve equal or better performance. -* An enhanced inference API with utilities like API unification, caching, error handling, multi-config inference, context programming etc. -* Higher-level utility functions like LLM-based coding and interactive agents. +* An enhanced inference API as a drop-in replacement of `openai.Completion.create` or `openai.ChatCompletion.create` with utilities like API unification, caching, error handling, multi-config inference, context programming etc. +* Higher-level components like LLM-based intelligent agents which can perform tasks autonomously or with human feedback, including tasks that require using tools via code. The package is under active development with more features upcoming. @@ -32,7 +32,7 @@ There are also complex interactions among subsets of the hyperparameters. For ex the temperature and top_p are not recommended to be altered from their default values together because they both control the randomness of the generated text, and changing both at the same time can result in conflicting effects; n and best_of are rarely tuned together because if the application can process multiple outputs, filtering on the server side causes unnecessary information loss; both n and max_tokens will affect the total number of tokens generated, which in turn will affect the cost of the request. These interactions and trade-offs make it difficult to manually determine the optimal hyperparameter settings for a given text generation task. -*Do the choices matter? Check this [blog post](/blog/2023/04/21/LLM-tuning-math) for a case study.* +*Do the choices matter? Check this [blogpost](/blog/2023/04/21/LLM-tuning-math) to find example tuning results about gpt-3.5-turbo and gpt-4.* With `flaml.autogen`, the tuning can be performed with the following information: @@ -190,6 +190,8 @@ response = oai.Completion.create( The example above will try to use text-ada-001, gpt-3.5-turbo, and text-davinci-003 iteratively, until a valid json string is returned or the last config is used. One can also repeat the same model in the list for multiple times to try one model multiple times for increasing the robustness of the final response. 
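To make the config-list fallback described above concrete, here is a minimal sketch that is not part of the original example: the model names, the placeholder API keys, the deliberately repeated entry, and the prompt are illustrative assumptions, and the JSON validity check used in the example above is omitted for brevity.

```python
from flaml import oai

# Illustrative config list: each configuration is tried in order until a
# response passes the application's validity check (e.g., parses as JSON)
# or the last configuration is used. Repeating a model gives it extra
# attempts, which can make the final response more robust.
config_list = [
    {"model": "text-ada-001", "api_key": "<openai-api-key>"},
    {"model": "gpt-3.5-turbo", "api_key": "<openai-api-key>"},
    {"model": "text-davinci-003", "api_key": "<openai-api-key>"},
    {"model": "text-davinci-003", "api_key": "<openai-api-key>"},  # repeated on purpose
]

response = oai.Completion.create(
    config_list=config_list,
    prompt="Return a JSON object with keys 'name' and 'score'.",
)
```

Ordering cheaper configurations first keeps the average cost low, while the later, stronger configurations act as a fallback when the earlier ones fail the check.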
+*Advanced use case: Check this [blogpost](/blog/2023/05/18/GPT-adaptive-humaneval) to find how to improve GPT-4's coding performance from 68% to 90% while reducing the inference cost.* + ### Templating If the provided prompt or message is a template, it will be automatically materialized with a given context. For example, diff --git a/website/src/components/HomepageFeatures.js b/website/src/components/HomepageFeatures.js index 7d21256812..56f7cd4619 100644 --- a/website/src/components/HomepageFeatures.js +++ b/website/src/components/HomepageFeatures.js @@ -10,22 +10,33 @@ const FeatureList = [ <> FLAML finds accurate models or configurations with low computational resources for common ML/AI tasks. - It frees users from selecting models and hyperparameters for training or inference. - {/* It is fast and economical. */} + It frees users from selecting models and hyperparameters for training or inference, + with smooth customizability. ), }, { - title: 'Easy to Customize or Extend', + title: 'Adapt Large Language Models to Your Needs', Svg: require('../../static/img/extend.svg').default, description: ( <> - FLAML is designed easy to extend, such as adding custom learners or metrics. - The customization level ranges smoothly from minimal -(training data and task type as only input) to full (tuning a user-defined function). + By automatically adapting LLMs to applications, FLAML + maximizes the benefits of expensive LLMs and reduce monetary cost. + FLAML enables users to build and use intelligent adaptive AI agents with minimal effort. ), }, +// { +// title: 'Easy to Customize or Extend', +// Svg: require('../../static/img/extend.svg').default, +// description: ( +// <> +// FLAML is designed easy to extend, such as adding custom learners or metrics. +// The customization level ranges smoothly from minimal +// (training data and task type as only input) to full (tuning a user-defined function). +// +// ), +// }, { title: 'Tune It Fast, Tune It As You Like', Svg: require('../../static/img/fast.svg').default,
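Returning to the Templating note in the Auto-Generation.md hunk above, whose concrete example falls outside the diff context: below is a minimal sketch of how a templated prompt could be materialized with a given context. The prompt text, the `problem` field, and the model choice are illustrative assumptions, not taken from the original document.

```python
from flaml import oai

# Illustrative template: "{problem}" is filled in from `context` when the
# request is made. Assumes the OpenAI API key is already configured, e.g.,
# via the OPENAI_API_KEY environment variable.
response = oai.Completion.create(
    context={"problem": "How many positive integers below 100 are divisible by 3 or 5?"},
    prompt="Solve the following problem and explain each step:\n{problem}",
    model="gpt-3.5-turbo",
)
```

Keeping the template separate from the context makes it easy to reuse one prompt across many inputs, which is also what enables tuning and caching over a fixed prompt structure.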