From 067fa88b6793edbefd7ccfe1bceed06b1643ea69 Mon Sep 17 00:00:00 2001
From: Li Jiang
Date: Tue, 11 Jun 2024 21:04:37 +0800
Subject: [PATCH] Update notebook (#2886)

---
 notebook/agentchat_microsoft_fabric.ipynb | 3452 +++++++++++++++++----
 1 file changed, 2862 insertions(+), 590 deletions(-)

diff --git a/notebook/agentchat_microsoft_fabric.ipynb b/notebook/agentchat_microsoft_fabric.ipynb
index 58e7ddd202..e4c2a7119c 100644
--- a/notebook/agentchat_microsoft_fabric.ipynb
+++ b/notebook/agentchat_microsoft_fabric.ipynb
@@ -2,23 +2,32 @@
 "cells": [
 {
 "cell_type": "markdown",
- "id": "be5a8d87",
- "metadata": {},
+ "id": "0",
+ "metadata": {
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
 "source": [
- "# Use AutoGen in Microsoft Fabric\n",
+ "## Use AutoGen in Microsoft Fabric\n",
 "\n",
- "AutoGen offers conversable LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n",
+ "[AutoGen](https://github.com/microsoft/autogen) offers conversable LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n",
 "Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).\n",
 "\n",
- "[Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. Its pre-built AI models include GPT-x models such as `gpt-4-turbo`, `gpt-4`, `gpt-4-8k`, `gpt-4-32k`, `gpt-35-turbo`, `gpt-35-turbo-16k` and `gpt-35-turbo-instruct`, etc. It's important to note that the Azure Open AI service is not supported on trial SKUs and only paid SKUs (F64 or higher, or P1 or higher) are supported. Azure Open AI is being enabled in stages, with access for all users expected by March 2024.\n",
+ "[Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. Its pre-built AI models include GPT-x models such as `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-4-8k`, `gpt-4-32k`, `gpt-35-turbo`, `gpt-35-turbo-16k`, and `gpt-35-turbo-instruct`. Note that the Azure OpenAI service is not supported on trial SKUs; only paid SKUs (F64 or higher, or P1 or higher) are supported.\n",
 "\n",
- "In this notebook, we demonstrate how to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code. Here `AssistantAgent` is an LLM-based agent that can write Python code (in a Python coding block) for a user to execute for a given task. `UserProxyAgent` is an agent which serves as a proxy for the human user to execute the code written by `AssistantAgent`, or automatically execute the code. Depending on the setting of `human_input_mode` and `max_consecutive_auto_reply`, the `UserProxyAgent` either solicits feedback from the human user or returns auto-feedback based on the result of code execution (success or failure and corresponding outputs) to `AssistantAgent`. 
`AssistantAgent` will debug the code and suggest new code if the result contains error. The two agents keep communicating to each other until the task is done.\n", + "In this notebook, we demonstrate several examples:\n", + "- 1. How to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code.\n", + "- 2. How to use `RetrieveAssistantAgent` and `RetrieveUserProxyAgent` to do Retrieval Augmented Generation (RAG) for QA and Code Generation.\n", + "- 3. How to use `MultimodalConversableAgent` to chat with images.\n", "\n", - "## Requirements\n", + "### Requirements\n", "\n", "AutoGen requires `Python>=3.8`. To run this notebook example, please install:\n", "```bash\n", - "pip install \"pyautogen\"\n", + "pip install \"pyautogen[retrievechat,lmm]>=0.2.28\"\n", "```\n", "\n", "Also, this notebook depends on Microsoft Fabric pre-built LLM endpoints. Running it elsewhere may encounter errors." @@ -26,7 +35,7 @@ }, { "cell_type": "markdown", - "id": "34ce050c-134a-4787-9655-73d9bd7afb6b", + "id": "1", "metadata": { "nteract": { "transient": { @@ -35,155 +44,93 @@ } }, "source": [ - "## AutoGen version < 0.2.0\n", + "### Install AutoGen" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install \"pyautogen[retrievechat,lmm]>=0.2.28\" -q" + ] + }, + { + "cell_type": "markdown", + "id": "3", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "### Set up config_list and llm_config" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-06-07T15:24:20.5752101Z", + "execution_start_time": "2024-06-07T15:24:03.7868628Z", + "livy_statement_state": "available", + "parent_msg_id": "bf8925aa-a2a2-4686-9388-3ec1eb12c5d7", + "queued_time": "2024-06-07T15:23:08.5880731Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 9, + "statement_ids": [ + 9 + ] + }, + "text/plain": [ + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 9, Finished, Available)" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import types\n", "\n", - "For AutoGen version < 0.2.0, the Azure OpenAI endpoint is pre-configured." 
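Because `%pip install` can restart the PySpark kernel (as the "PySpark kernel has been restarted" warnings in the install logs show), it is worth confirming the installed version before going further. A minimal sanity check, not part of the original notebook, assuming only that `pyautogen` exposes `autogen.__version__`:

```python
# Illustrative sanity check: confirm the installed AutoGen version
# satisfies the ">=0.2.28" pin used in the install cell above.
import autogen

print(autogen.__version__)
assert tuple(map(int, autogen.__version__.split(".")[:3])) >= (0, 2, 28)
```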
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "6a6b4a95-5766-442d-9de5-b7fc1fb3d140", - "metadata": { - "jupyter": { - "outputs_hidden": false, - "source_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - } - }, - "outputs": [ - { - "data": { - "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:07:36.8889779Z", - "execution_start_time": "2023-12-11T05:07:36.8886587Z", - "livy_statement_state": "available", - "parent_msg_id": "4aa7c4ee-8126-4206-8a8b-b38491ff16dc", - "queued_time": "2023-12-11T05:07:11.6799575Z", - "session_id": null, - "session_start_time": null, - "spark_pool": null, - "state": "finished", - "statement_id": -1 - }, - "text/plain": [ - "StatementMeta(, , -1, Finished, Available)" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": {}, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting pyautogen<0.2.0\n", - " Downloading pyautogen-0.1.14-py3-none-any.whl (88 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m88.8/88.8 kB\u001b[0m \u001b[31m6.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: diskcache in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen<0.2.0) (5.6.3)\n", - "Requirement already satisfied: flaml in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen<0.2.0) (2.1.1.dev2)\n", - "Requirement already satisfied: openai<1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen<0.2.0) (0.27.8)\n", - "Collecting python-dotenv (from pyautogen<0.2.0)\n", - " Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)\n", - "Requirement already satisfied: termcolor in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen<0.2.0) (2.3.0)\n", - "Requirement already satisfied: requests>=2.20 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai<1->pyautogen<0.2.0) (2.31.0)\n", - "Requirement already satisfied: tqdm in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai<1->pyautogen<0.2.0) (4.66.1)\n", - "Requirement already satisfied: aiohttp in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai<1->pyautogen<0.2.0) (3.8.6)\n", - "Requirement already satisfied: NumPy>=1.17.0rc1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from flaml->pyautogen<0.2.0) (1.24.3)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.20->openai<1->pyautogen<0.2.0) (3.3.1)\n", - "Requirement already satisfied: idna<4,>=2.5 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.20->openai<1->pyautogen<0.2.0) (3.4)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.20->openai<1->pyautogen<0.2.0) (1.26.17)\n", - "Requirement already satisfied: certifi>=2017.4.17 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.20->openai<1->pyautogen<0.2.0) (2023.7.22)\n", - 
"Requirement already satisfied: attrs>=17.3.0 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (23.1.0)\n", - "Requirement already satisfied: multidict<7.0,>=4.5 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (6.0.4)\n", - "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (4.0.3)\n", - "Requirement already satisfied: yarl<2.0,>=1.0 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (1.9.2)\n", - "Requirement already satisfied: frozenlist>=1.1.1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (1.4.0)\n", - "Requirement already satisfied: aiosignal>=1.1.2 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from aiohttp->openai<1->pyautogen<0.2.0) (1.3.1)\n", - "Installing collected packages: python-dotenv, pyautogen\n", - "Successfully installed pyautogen-0.1.14 python-dotenv-1.0.0\n", - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.1.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - }, - { - "data": {}, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Warning: PySpark kernel has been restarted to use updated packages.\n", - "\n" - ] - } - ], - "source": [ - "%pip install \"pyautogen<0.2.0\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "448f26d0-d1f7-4b2a-8dab-035ff2abbedc", - "metadata": { - "jupyter": { - "outputs_hidden": false, - "source_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - } - }, - "outputs": [ - { - "data": { - "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:18:00.2585542Z", - "execution_start_time": "2023-12-11T05:17:59.8269627Z", - "livy_statement_state": "available", - "parent_msg_id": "0c686a15-8b9c-4479-ac26-2cca81b21cf3", - "queued_time": "2023-12-11T05:17:59.3165049Z", - "session_id": "865e72a4-f70b-46cf-8421-9f25745bd9bd", - "session_start_time": null, - "spark_pool": null, - "state": "finished", - "statement_id": 27 - }, - "text/plain": [ - "StatementMeta(, 865e72a4-f70b-46cf-8421-9f25745bd9bd, 27, Finished, Available)" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from synapse.ml.mlflow import get_mlflow_env_config\n", + "import httpx\n", + "from synapse.ml.fabric.credentials import get_openai_httpx_sync_client\n", "\n", "import autogen\n", "\n", - "# Choose different models\n", + "http_client = get_openai_httpx_sync_client()\n", + "http_client.__deepcopy__ = types.MethodType(\n", + " lambda self, memo: self, http_client\n", + ") # https://microsoft.github.io/autogen/docs/topics/llm_configuration#adding-http-client-in-llm_config-for-proxy\n", + "\n", "config_list = [\n", " {\n", - " \"model\": 
\"gpt-4-turbo\",\n", + " \"model\": \"gpt-4o\",\n", + " \"http_client\": http_client,\n", + " \"api_type\": \"azure\",\n", + " \"api_version\": \"2024-02-01\",\n", " },\n", "]\n", "\n", @@ -194,10 +141,25 @@ "}" ] }, + { + "cell_type": "markdown", + "id": "5", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "### Example 1\n", + "How to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code." + ] + }, { "cell_type": "code", "execution_count": null, - "id": "793b6eb1-f8af-4b98-809d-21fd53f7de41", + "id": "6", "metadata": { "jupyter": { "outputs_hidden": false, @@ -213,19 +175,22 @@ { "data": { "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:18:21.8907776Z", - "execution_start_time": "2023-12-11T05:18:01.7118817Z", + "execution_finish_time": "2024-06-07T15:25:04.5390713Z", + "execution_start_time": "2024-06-07T15:24:21.6208975Z", "livy_statement_state": "available", - "parent_msg_id": "a3a03b66-c113-4b91-872f-213880814fbd", - "queued_time": "2023-12-11T05:18:01.293131Z", - "session_id": "865e72a4-f70b-46cf-8421-9f25745bd9bd", + "parent_msg_id": "93157ebd-4f6e-4ad6-b089-5b40edea3787", + "queued_time": "2024-06-07T15:23:08.5886561Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", - "statement_id": 28 + "statement_id": 10, + "statement_ids": [ + 10 + ] }, "text/plain": [ - "StatementMeta(, 865e72a4-f70b-46cf-8421-9f25745bd9bd, 28, Finished, Available)" + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 10, Finished, Available)" ] }, "metadata": {}, @@ -244,34 +209,46 @@ "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to user_proxy):\n", "\n", - "To determine who should read the paper titled \"Learning to Prompt for Continual Learning\" available on arXiv, we need to first understand the abstract and the topics covered in the paper. I will fetch the abstract from the provided URL and analyze its content to suggest the target audience.\n", + "To determine who should read the paper titled \"https://arxiv.org/abs/2308.08155\", we need to extract and analyze the abstract and other relevant information from the paper. 
This will help us understand the content and target audience of the paper.\n", + "\n", + "Let's write a Python script to fetch and print the abstract and other relevant details from the arXiv page.\n", "\n", "```python\n", - "# filename: fetch_arxiv_abstract.py\n", + "# filename: fetch_arxiv_paper_info.py\n", + "\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", - "# Function to get the abstract of the paper from arXiv\n", - "def get_arxiv_abstract(url):\n", + "def fetch_arxiv_paper_info(url):\n", " response = requests.get(url)\n", " if response.status_code == 200:\n", " soup = BeautifulSoup(response.content, 'html.parser')\n", - " abstract_text = soup.find('blockquote', class_='abstract').text\n", - " # Clean up the abstract text\n", - " abstract_text = abstract_text.replace('Abstract: ', '').strip()\n", - " return abstract_text\n", + " \n", + " # Extract the title\n", + " title = soup.find('h1', class_='title').text.replace('Title:', '').strip()\n", + " \n", + " # Extract the authors\n", + " authors = soup.find('div', class_='authors').text.replace('Authors:', '').strip()\n", + " \n", + " # Extract the abstract\n", + " abstract = soup.find('blockquote', class_='abstract').text.replace('Abstract:', '').strip()\n", + " \n", + " # Extract the subjects\n", + " subjects = soup.find('span', class_='primary-subject').text.strip()\n", + " \n", + " print(f\"Title: {title}\\n\")\n", + " print(f\"Authors: {authors}\\n\")\n", + " print(f\"Abstract: {abstract}\\n\")\n", + " print(f\"Subjects: {subjects}\\n\")\n", " else:\n", - " return \"Error: Unable to fetch the abstract from arXiv.\"\n", + " print(\"Failed to fetch the paper information.\")\n", "\n", - "# URL of the paper\n", - "paper_url = 'https://arxiv.org/abs/2308.08155'\n", - "\n", - "# Get the abstract of the paper\n", - "abstract = get_arxiv_abstract(paper_url)\n", - "print(abstract)\n", + "# URL of the arXiv paper\n", + "url = \"https://arxiv.org/abs/2308.08155\"\n", + "fetch_arxiv_paper_info(url)\n", "```\n", "\n", - "Please run the above Python script to fetch the abstract of the paper. Once we have the abstract, I will analyze it to suggest the appropriate audience.\n", + "Please save the code in a file named `fetch_arxiv_paper_info.py` and execute it. This script will fetch and print the title, authors, abstract, and subjects of the paper, which will help us determine the target audience.\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[31m\n", @@ -280,31 +257,41 @@ "\n", "exitcode: 0 (execution succeeded)\n", "Code output: \n", - "Abstract:AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. 
Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.\n", + "Title: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\n", + "\n", + "Authors: Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang\n", + "\n", + "Abstract: AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.\n", + "\n", + "Subjects: Artificial Intelligence (cs.AI)\n", + "\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to user_proxy):\n", "\n", - "Based on the abstract provided, the paper titled \"AutoGen: An Open-Source Framework for Building LLM Applications with Conversable Agents\" seems to be focused on a framework that enables developers to create applications using large language models (LLMs) with agents that can interact through conversation to accomplish tasks.\n", + "Based on the extracted information, here is a summary of who should read the paper titled \"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\":\n", "\n", - "The target audience for this paper would likely include:\n", + "### Title:\n", + "**AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation**\n", "\n", - "1. **Software Developers and Engineers** who are interested in building applications that leverage large language models and conversational agents.\n", + "### Authors:\n", + "Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang\n", "\n", - "2. **Researchers in Artificial Intelligence and Machine Learning** who are working on natural language processing, conversational AI, and the integration of human inputs with AI agents.\n", + "### Abstract:\n", + "AutoGen is an open-source framework that allows developers to build LLM (Large Language Model) applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. 
Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.\n", "\n", - "3. **Product Managers and Technical Leads** who are looking to understand how conversational AI can be applied to various domains such as mathematics, coding, question answering, operations research, online decision-making, and entertainment.\n", + "### Subjects:\n", + "**Artificial Intelligence (cs.AI)**\n", "\n", - "4. **Educators and Students** in computer science and related fields who are interested in the latest developments in AI frameworks and applications.\n", + "### Target Audience:\n", + "1. **AI Researchers and Practitioners**: Those who are working in the field of artificial intelligence, especially those focusing on large language models (LLMs) and multi-agent systems.\n", + "2. **Developers and Engineers**: Software developers and engineers interested in building applications using LLMs and multi-agent frameworks.\n", + "3. **Academics and Students**: Academics and students studying AI, machine learning, and related fields who are interested in the latest frameworks and methodologies for building LLM applications.\n", + "4. **Industry Professionals**: Professionals in industries such as technology, operations research, and entertainment who are looking to leverage AI and LLMs for various applications.\n", + "5. **Open-Source Community**: Contributors and users of open-source AI frameworks who are interested in new tools and frameworks for developing AI applications.\n", "\n", - "5. **Innovators and Entrepreneurs** in the tech industry who are exploring new ways to incorporate AI into their products and services.\n", - "\n", - "6. 
**AI Enthusiasts and Hobbyists** who have a keen interest in the practical applications of large language models and conversational interfaces.\n", - "\n", - "The paper would be particularly relevant for those who are looking to understand or utilize the AutoGen framework to build complex applications that require the capabilities of LLMs.\n", - "\n", - "If you are part of or know someone who belongs to these groups, this paper would be a valuable read.\n", + "This paper is particularly relevant for those interested in the practical applications and infrastructure for building complex AI systems using conversational agents.\n", "\n", "TERMINATE\n", "\n", @@ -335,7 +322,7 @@ ")\n", "\n", "# the assistant receives a message from the user, which contains the task description\n", - "user_proxy.initiate_chat(\n", + "chat_result = user_proxy.initiate_chat(\n", " assistant,\n", " message=\"\"\"\n", "Who should read this paper: https://arxiv.org/abs/2308.08155\n", @@ -343,9 +330,62 @@ ")" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "7", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-06-07T15:26:14.0364536Z", + "execution_start_time": "2024-06-07T15:26:13.6931272Z", + "livy_statement_state": "available", + "parent_msg_id": "50747d08-5234-4212-9d18-ea3133cfb35e", + "queued_time": "2024-06-07T15:26:12.4397897Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 13, + "statement_ids": [ + 13 + ] + }, + "text/plain": [ + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 13, Finished, Available)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cost for the chat:\n", + "{'usage_including_cached_inference': {'total_cost': 0.02107, 'gpt-4o-2024-05-13': {'cost': 0.02107, 'prompt_tokens': 1616, 'completion_tokens': 866, 'total_tokens': 2482}}, 'usage_excluding_cached_inference': {'total_cost': 0.02107, 'gpt-4o-2024-05-13': {'cost': 0.02107, 'prompt_tokens': 1616, 'completion_tokens': 866, 'total_tokens': 2482}}}\n" + ] + } + ], + "source": [ + "print(f\"Cost for the chat:\\n{chat_result.cost}\")" + ] + }, { "cell_type": "markdown", - "id": "a958cf54-23e8-46e8-be78-782c1a17bc82", + "id": "8", "metadata": { "nteract": { "transient": { @@ -354,15 +394,16 @@ } }, "source": [ - "## AutoGen version >= 0.2.0\n", + "### Example 2\n", + "How to use `RetrieveAssistantAgent` and `RetrieveUserProxyAgent` to do Retrieval Augmented Generation (RAG) for QA and Code Generation.\n", "\n", - "For AutoGen version >= 0.2.0, we need to set up an API endpoint because the version of the openai-python package is different from the pre-configured version." + "Check out this [blog](https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat) for more details." 
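One more note on Example 1 before moving to the RAG setup: `initiate_chat` returned a `ChatResult`, and `cost` (printed above) is only one of its fields. A short sketch of the others, assuming the `ChatResult` shape of pyautogen 0.2.x:

```python
# Illustrative: other fields on the ChatResult captured in Example 1.
print(chat_result.summary)            # short summary of the conversation
print(len(chat_result.chat_history))  # number of messages exchanged
print(chat_result.human_input)        # human inputs collected (empty with human_input_mode="NEVER")
```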
] }, { "cell_type": "code", "execution_count": null, - "id": "83867b85-6fb2-4ca1-8859-206f0b854b24", + "id": "9", "metadata": { "jupyter": { "outputs_hidden": false, @@ -378,193 +419,22 @@ { "data": { "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:23:56.8983159Z", - "execution_start_time": "2023-12-11T05:23:56.8981286Z", + "execution_finish_time": "2024-06-07T15:26:26.4217205Z", + "execution_start_time": "2024-06-07T15:26:26.0872609Z", "livy_statement_state": "available", - "parent_msg_id": "cb272a67-8c4b-4e7f-8dfe-153b85d6b7fd", - "queued_time": "2023-12-11T05:23:43.2251661Z", - "session_id": null, + "parent_msg_id": "2d2b3ee3-300e-4959-b68c-c95843c42eb7", + "queued_time": "2024-06-07T15:26:25.1160753Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", - "statement_id": -1 + "statement_id": 14, + "statement_ids": [ + 14 + ] }, "text/plain": [ - "StatementMeta(, , -1, Finished, Available)" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": {}, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Collecting pyautogen>=0.2.0\n", - " Downloading pyautogen-0.2.2-py3-none-any.whl (124 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m124.0/124.0 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: diskcache in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen>=0.2.0) (5.6.3)\n", - "Requirement already satisfied: flaml in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen>=0.2.0) (2.1.1.dev2)\n", - "Collecting openai~=1.3 (from pyautogen>=0.2.0)\n", - " Downloading openai-1.3.8-py3-none-any.whl (221 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m221.5/221.5 kB\u001b[0m \u001b[31m37.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: python-dotenv in /nfs4/pyenv-b962c9b1-be7a-4052-b362-e359a86c2a98/lib/python3.10/site-packages (from pyautogen>=0.2.0) (1.0.0)\n", - "Requirement already satisfied: termcolor in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen>=0.2.0) (2.3.0)\n", - "Requirement already satisfied: tiktoken in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from pyautogen>=0.2.0) (0.5.1)\n", - "Requirement already satisfied: anyio<5,>=3.5.0 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai~=1.3->pyautogen>=0.2.0) (3.7.1)\n", - "Collecting distro<2,>=1.7.0 (from openai~=1.3->pyautogen>=0.2.0)\n", - " Downloading distro-1.8.0-py3-none-any.whl (20 kB)\n", - "Collecting httpx<1,>=0.23.0 (from openai~=1.3->pyautogen>=0.2.0)\n", - " Downloading httpx-0.25.2-py3-none-any.whl (74 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.0/75.0 kB\u001b[0m \u001b[31m40.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: pydantic<3,>=1.9.0 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai~=1.3->pyautogen>=0.2.0) (1.10.9)\n", - "Requirement already satisfied: sniffio in 
/home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai~=1.3->pyautogen>=0.2.0) (1.3.0)\n", - "Requirement already satisfied: tqdm>4 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai~=1.3->pyautogen>=0.2.0) (4.66.1)\n", - "Requirement already satisfied: typing-extensions<5,>=4.5 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from openai~=1.3->pyautogen>=0.2.0) (4.5.0)\n", - "Requirement already satisfied: NumPy>=1.17.0rc1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from flaml->pyautogen>=0.2.0) (1.24.3)\n", - "Requirement already satisfied: regex>=2022.1.18 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from tiktoken->pyautogen>=0.2.0) (2023.8.8)\n", - "Requirement already satisfied: requests>=2.26.0 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from tiktoken->pyautogen>=0.2.0) (2.31.0)\n", - "Requirement already satisfied: idna>=2.8 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from anyio<5,>=3.5.0->openai~=1.3->pyautogen>=0.2.0) (3.4)\n", - "Requirement already satisfied: exceptiongroup in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from anyio<5,>=3.5.0->openai~=1.3->pyautogen>=0.2.0) (1.1.3)\n", - "Requirement already satisfied: certifi in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from httpx<1,>=0.23.0->openai~=1.3->pyautogen>=0.2.0) (2023.7.22)\n", - "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai~=1.3->pyautogen>=0.2.0)\n", - " Downloading httpcore-1.0.2-py3-none-any.whl (76 kB)\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.9/76.9 kB\u001b[0m \u001b[31m39.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25hRequirement already satisfied: h11<0.15,>=0.13 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai~=1.3->pyautogen>=0.2.0) (0.14.0)\n", - "Requirement already satisfied: charset-normalizer<4,>=2 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->pyautogen>=0.2.0) (3.3.1)\n", - "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->pyautogen>=0.2.0) (1.26.17)\n", - "Installing collected packages: httpcore, distro, httpx, openai, pyautogen\n", - " Attempting uninstall: openai\n", - " Found existing installation: openai 0.27.8\n", - " Not uninstalling openai at /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages, outside environment /nfs4/pyenv-b962c9b1-be7a-4052-b362-e359a86c2a98\n", - " Can't uninstall 'openai'. 
No files were found to uninstall.\n", - " Attempting uninstall: pyautogen\n", - " Found existing installation: pyautogen 0.1.14\n", - " Uninstalling pyautogen-0.1.14:\n", - " Successfully uninstalled pyautogen-0.1.14\n", - "Successfully installed distro-1.8.0 httpcore-1.0.2 httpx-0.25.2 openai-1.3.8 pyautogen-0.2.2\n", - "\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.1.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n", - "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n", - "Note: you may need to restart the kernel to use updated packages.\n" - ] - }, - { - "data": {}, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Warning: PySpark kernel has been restarted to use updated packages.\n", - "\n" - ] - } - ], - "source": [ - "%pip install \"pyautogen>=0.2.0\"" - ] - }, - { - "cell_type": "markdown", - "id": "c485fcab", - "metadata": {}, - "source": [ - "## Set your API endpoint" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "13005ac5-7f2a-4ba6-85b9-d45671093be2", - "metadata": { - "jupyter": { - "outputs_hidden": false, - "source_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - } - }, - "outputs": [ - { - "data": { - "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:27:12.0400654Z", - "execution_start_time": "2023-12-11T05:27:10.9380797Z", - "livy_statement_state": "available", - "parent_msg_id": "8429d912-c8af-41c2-bfde-697adb0bbf46", - "queued_time": "2023-12-11T05:27:10.4608238Z", - "session_id": "865e72a4-f70b-46cf-8421-9f25745bd9bd", - "session_start_time": null, - "spark_pool": null, - "state": "finished", - "statement_id": 36 - }, - "text/plain": [ - "StatementMeta(, 865e72a4-f70b-46cf-8421-9f25745bd9bd, 36, Finished, Available)" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2023-12-11:05:27:11,251 WARNING [synapse_mlflow_utils.py:244] To save or load Apache Spark model files, please attach a Lakehouse.\n" - ] - } - ], - "source": [ - "mlflow_env_configs = get_mlflow_env_config()\n", - "access_token = mlflow_env_configs.driver_aad_token\n", - "prebuilt_AI_base_url = mlflow_env_configs.workload_endpoint + \"cognitive/openai/\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "1470b833-9cf2-4735-a28d-57d30714f562", - "metadata": { - "jupyter": { - "outputs_hidden": false, - "source_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - } - }, - "outputs": [ - { - "data": { - "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:27:12.9516846Z", - "execution_start_time": "2023-12-11T05:27:12.5600767Z", - "livy_statement_state": "available", - "parent_msg_id": "7512dc56-5ad2-46eb-a0f7-3a62d15e7385", - "queued_time": "2023-12-11T05:27:11.574982Z", - "session_id": "865e72a4-f70b-46cf-8421-9f25745bd9bd", - "session_start_time": null, - "spark_pool": null, - "state": "finished", - "statement_id": 37 - }, - "text/plain": [ - "StatementMeta(, 865e72a4-f70b-46cf-8421-9f25745bd9bd, 37, Finished, Available)" + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 14, Finished, Available)" ] }, 
"metadata": {}, @@ -572,216 +442,24 @@ } ], "source": [ - "config_list = [\n", - " {\n", - " \"model\": \"gpt-4-turbo\",\n", - " \"api_key\": access_token,\n", - " \"base_url\": prebuilt_AI_base_url,\n", - " \"api_type\": \"azure\",\n", - " \"api_version\": \"2024-02-01\",\n", - " },\n", - "]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "951c0d05-1d58-4b42-88ea-7303c1da88aa", - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.livy.statement-meta+json": { - "execution_finish_time": "2023-12-11T05:28:09.3148816Z", - "execution_start_time": "2023-12-11T05:27:37.4931459Z", - "livy_statement_state": "available", - "parent_msg_id": "4c9275dc-25d3-4204-8641-fc8ed22b7d54", - "queued_time": "2023-12-11T05:27:37.0516131Z", - "session_id": "865e72a4-f70b-46cf-8421-9f25745bd9bd", - "session_start_time": null, - "spark_pool": null, - "state": "finished", - "statement_id": 38 - }, - "text/plain": [ - "StatementMeta(, 865e72a4-f70b-46cf-8421-9f25745bd9bd, 38, Finished, Available)" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[33muser_proxy\u001b[0m (to assistant):\n", - "\n", - "What date is today? Compare the year-to-date gain for META and TESLA.\n", - "\n", - "--------------------------------------------------------------------------------\n", - "\u001b[33massistant\u001b[0m (to user_proxy):\n", - "\n", - "To get the current date, we can write a simple Python script to print out today's date using the `datetime` module. Then, to compare the year-to-date (YTD) gain for META (Meta Platforms Inc.) and TESLA (Tesla, Inc.), we need to retrieve the stock prices from the beginning of the current year and the most recent closing price for both companies and calculate the percentage change.\n", - "\n", - "Here's the plan to solve the task step by step:\n", - "1. Write and execute a Python script to get today's date.\n", - "2. Use a Python script to retrieve the opening stock price for both Meta Platforms Inc. (META) and Tesla, Inc. (TSLA) as of the first trading day of the current year.\n", - "3. Retrieve the most recent closing stock price for both companies.\n", - "4. Calculate the percentage change from the opening price to the latest closing price for both stocks.\n", - "5. Compare the YTD gains and display the result.\n", - "\n", - "First, let's start with step 1 by getting today's date:\n", - "\n", - "```python\n", - "# filename: get_current_date.py\n", - "import datetime\n", - "\n", - "def get_current_date():\n", - " # Get today's date\n", - " return datetime.date.today()\n", - "\n", - "# Print the current date\n", - "print(f\"Today's date is: {get_current_date()}\")\n", - "```\n", - "\n", - "Please execute the above script to get today's date. 
After that, we will proceed to the next steps of retrieving stock prices and comparing YTD gains.\n", - "\n", - "--------------------------------------------------------------------------------\n", - "\u001b[31m\n", - ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n", - "\u001b[33muser_proxy\u001b[0m (to assistant):\n", - "\n", - "exitcode: 0 (execution succeeded)\n", - "Code output: \n", - "Today's date is: 2023-12-11\n", - "\n", - "\n", - "--------------------------------------------------------------------------------\n", - "\u001b[33massistant\u001b[0m (to user_proxy):\n", - "\n", - "It seems there might be a typo in your output since today cannot be December 11, 2023, considering the knowledge cutoff date is in early 2023. However, I will proceed assuming today's date is correctly given as December 11, 2023.\n", - "\n", - "To move forward with the next steps, I will utilize Python code to do the following:\n", - "- Fetch the historical stock data for META and TESLA.\n", - "- Extract the relevant opening prices at the start of the current year and the latest available closing prices.\n", - "- Calculate the YTD gains for both stocks.\n", - "\n", - "This will require accessing financial data through an API such as Yahoo Finance. We'll use the `yfinance` library to fetch the stock data. This library must be installed in your Python environment. If it's not already installed, please install it by executing `pip install yfinance` before running the following script.\n", - "\n", - "Let's fetch the stock data and calculate the YTD gains:\n", - "\n", - "```python\n", - "# filename: compare_ytd_gains.py\n", - "import yfinance as yf\n", - "from datetime import datetime\n", - "\n", - "# Function to calculate the YTD gain of a stock\n", - "def calculate_ytd_gain(ticker):\n", - " # Get data from the start of the year to the current date\n", - " start_of_year = datetime(datetime.now().year, 1, 1)\n", - " current_date = datetime.now().strftime('%Y-%m-%d')\n", - " data = yf.download(ticker, start=start_of_year.strftime('%Y-%m-%d'), end=current_date)\n", - "\n", - " # Ensure we have data to compute the gain\n", - " if data.empty:\n", - " return None\n", - "\n", - " # Get the first available opening price of the year and the most recent available closing price\n", - " opening_price = data['Open'].iloc[0]\n", - " closing_price = data['Close'].iloc[-1]\n", - "\n", - " # Calculate YTD gain and return it\n", - " ytd_gain = ((closing_price - opening_price) / opening_price) * 100\n", - " return ytd_gain\n", - "\n", - "# Get the YTD gains\n", - "meta_ytd_gain = calculate_ytd_gain('META')\n", - "tesla_ytd_gain = calculate_ytd_gain('TSLA')\n", - "\n", - "# Output the YTD gains\n", - "print(f\"Year-to-Date gain for Meta Platforms Inc. (META): {meta_ytd_gain:.2f}%\")\n", - "print(f\"Year-to-Date gain for Tesla, Inc. (TSLA): {tesla_ytd_gain:.2f}%\")\n", - "\n", - "# Compare the YTD gains\n", - "if meta_ytd_gain is not None and tesla_ytd_gain is not None:\n", - " if meta_ytd_gain > tesla_ytd_gain:\n", - " print(\"META has a higher YTD gain than TESLA.\")\n", - " elif meta_ytd_gain < tesla_ytd_gain:\n", - " print(\"TESLA has a higher YTD gain than META.\")\n", - " else:\n", - " print(\"META and TESLA have the same YTD gain.\")\n", - "else:\n", - " print(\"Unable to calculate YTD gains, possibly due to missing data.\")\n", - "```\n", - "\n", - "Please execute the above code to compare the Year-to-Date gains of META and TESLA. 
Remember, the actual output will depend on the stock prices on the dates fetched. If there are difficulties or errors encountered when executing the code, please inform me so that we can troubleshoot accordingly.\n", - "\n", - "--------------------------------------------------------------------------------\n", - "\u001b[31m\n", - ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n", - "\u001b[33muser_proxy\u001b[0m (to assistant):\n", - "\n", - "exitcode: 0 (execution succeeded)\n", - "Code output: \n", - "\n", - "[*********************100%%**********************] 1 of 1 completed\n", - "\n", - "[*********************100%%**********************] 1 of 1 completed\n", - "Year-to-Date gain for Meta Platforms Inc. (META): 170.92%\n", - "Year-to-Date gain for Tesla, Inc. (TSLA): 105.82%\n", - "META has a higher YTD gain than TESLA.\n", - "\n", - "\n", - "--------------------------------------------------------------------------------\n", - "\u001b[33massistant\u001b[0m (to user_proxy):\n", - "\n", - "The executed code has successfully calculated the Year-to-Date (YTD) gains for Meta Platforms Inc. (META) and Tesla, Inc. (TSLA). According to the output you provided:\n", - "\n", - "- META has a YTD gain of 170.92%.\n", - "- TESLA has a YTD gain of 105.82%.\n", - "\n", - "Based on these results, Meta Platforms Inc. (META) has a higher YTD gain compared to Tesla, Inc. (TSLA) as of the current date.\n", - "\n", - "Please let me know if you need assistance with any other queries.\n", - "\n", - "TERMINATE\n", - "\n", - "--------------------------------------------------------------------------------\n" - ] - } - ], - "source": [ - "# create an AssistantAgent named \"assistant\"\n", - "assistant = autogen.AssistantAgent(\n", - " name=\"assistant\",\n", - " llm_config={\n", - " # \"cache_seed\": 42, # seed for caching and reproducibility\n", - " \"config_list\": config_list, # a list of OpenAI API configurations\n", - " # \"temperature\": 0, # temperature for sampling\n", - " }, # configuration for autogen's enhanced inference API which is compatible with OpenAI API\n", - ")\n", - "# create a UserProxyAgent instance named \"user_proxy\"\n", - "user_proxy = autogen.UserProxyAgent(\n", - " name=\"user_proxy\",\n", - " human_input_mode=\"NEVER\",\n", - " max_consecutive_auto_reply=10,\n", - " is_termination_msg=lambda x: x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE\"),\n", - " code_execution_config={\n", - " \"work_dir\": \"coding\",\n", - " \"use_docker\": False, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n", - " },\n", - ")\n", - "# the assistant receives a message from the user_proxy, which contains the task description\n", - "user_proxy.initiate_chat(\n", - " assistant,\n", - " message=\"\"\"What date is today? 
Compare the year-to-date gain for META and TESLA.\"\"\",\n", + "import tempfile\n", + "\n", + "from autogen.coding import LocalCommandLineCodeExecutor\n", + "\n", + "# Create a temporary directory to store the code files.\n", + "temp_dir = tempfile.TemporaryDirectory()\n", + "\n", + "# Create a local command line code executor.\n", + "code_executor = LocalCommandLineCodeExecutor(\n", + " timeout=40, # Timeout for each code execution in seconds.\n", + " work_dir=temp_dir.name, # Use the temporary directory to store the code files.\n", ")" ] }, { "cell_type": "code", "execution_count": null, - "id": "1006fec8-87c6-43cd-a857-4ecd37fbfa86", + "id": "10", "metadata": { "jupyter": { "outputs_hidden": false, @@ -794,7 +472,2608 @@ } }, "outputs": [], - "source": [] + "source": [ + "from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent\n", + "from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent\n", + "\n", + "# 1. create an RetrieveAssistantAgent instance named \"assistant\"\n", + "assistant = RetrieveAssistantAgent(\n", + " name=\"assistant\",\n", + " system_message=\"You are a helpful assistant.\",\n", + " llm_config=llm_config,\n", + ")\n", + "\n", + "# 2. create the RetrieveUserProxyAgent instance named \"ragproxyagent\"\n", + "ragproxyagent = RetrieveUserProxyAgent(\n", + " name=\"ragproxyagent\",\n", + " human_input_mode=\"NEVER\",\n", + " max_consecutive_auto_reply=5,\n", + " retrieve_config={\n", + " \"docs_path\": [\n", + " \"https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview\",\n", + " \"https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations\",\n", + " ],\n", + " \"chunk_token_size\": 2000,\n", + " \"model\": config_list[0][\"model\"],\n", + " \"vector_db\": \"chroma\", # to use the deprecated `client` parameter, set to None and uncomment the line above\n", + " \"overwrite\": True, # set to True if you want to overwrite an existing collection\n", + " },\n", + " code_execution_config={\"executor\": code_executor}, # Use the local command line code executor.\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "11", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "#### 2.1 let's ask a question \"List all the Components of Microsoft Fabric\".\n", + "\n", + "The answer from **ChatGPT with gpt-4o** at June 7th, 2024 is as below:\n", + "```\n", + "Microsoft Fabric is a comprehensive data platform that integrates various services and tools for data management, analytics, and collaboration. As of the latest information available, Microsoft Fabric includes the following components:\n", + "\n", + "Data Integration:\n", + "\n", + "Azure Data Factory: For creating, scheduling, and orchestrating data workflows.\n", + "Power Query: A data transformation and data preparation tool.\n", + "Data Engineering:\n", + "\n", + "Azure Synapse Analytics: For big data and data warehousing solutions, including Synapse SQL, Spark, and Data Explorer.\n", + "Data Science:\n", + "\n", + "Azure Machine Learning: For building, training, and deploying machine learning models.\n", + "Azure Databricks: For collaborative big data and AI solutions.\n", + "Data Warehousing:\n", + "\n", + "...\n", + "```\n", + "\n", + "While the answer from AutoGen RAG agent with gpt-4o is as below:\n", + "```\n", + "The components of Microsoft Fabric are:\n", + "\n", + "1. Power BI\n", + "2. Data Factory\n", + "3. Data Activator\n", + "4. 
Industry Solutions\n",
 "5. Real-Time Intelligence\n",
 "6. Synapse Data Engineering\n",
 "7. Synapse Data Science\n",
 "8. Synapse Data Warehouse\n",
 "\n",
 "Sources: [Microsoft Fabric Overview](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview)\n",
 "```\n",
 "\n",
 "The AutoGen RAG agent's answer matches the official documentation exactly, while ChatGPT made a few mistakes; it even listed Azure Databricks, which is not a Fabric component."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "id": "12",
 "metadata": {
 "jupyter": {
 "outputs_hidden": false,
 "source_hidden": false
 },
 "nteract": {
 "transient": {
 "deleting": false
 }
 }
 },
 "outputs": [
 {
 "data": {
 "application/vnd.livy.statement-meta+json": {
 "execution_finish_time": "2024-06-07T15:27:29.0170714Z",
 "execution_start_time": "2024-06-07T15:27:14.1923093Z",
 "livy_statement_state": "available",
 "parent_msg_id": "47d2a7c5-affb-44c5-9fef-a01d3026c638",
 "queued_time": "2024-06-07T15:26:25.4548817Z",
 "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2",
 "session_start_time": null,
 "spark_pool": null,
 "state": "finished",
 "statement_id": 16,
 "statement_ids": [
 16
 ]
 },
 "text/plain": [
 "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 16, Finished, Available)"
 ]
 },
 "metadata": {},
 "output_type": "display_data"
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
 "Trying to create collection.\n",
 "Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2\n",
 "VectorDB returns doc_ids: [['f7c9052b', '621d4a0b']]\n",
 "\u001b[32mAdding content of doc f7c9052b to context.\u001b[0m\n",
 "\u001b[33mragproxyagent\u001b[0m (to assistant):\n",
 "\n",
 "You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the\n",
 "context provided by the user. You should follow the following steps to answer a question:\n",
 "Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or\n",
 "a question answering task.\n",
 "Step 2, you reply based on the intent.\n",
 "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
 "If user's intent is code generation, you must obey the following rules:\n",
 "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
 "Rule 2. 
You must follow the formats below to write your code:\n", + "```language\n", + "# your code\n", + "```\n", + "\n", + "If user's intent is question answering, you must give as short an answer as possible.\n", + "\n", + "User's question is: List all the Components of Microsoft Fabric\n", + "\n", + "Context is: # What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn\n", + "\n", + "What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn\n", + "\n", + "[Skip to main content](#main)\n", + "\n", + "This browser is no longer supported.\n", + "\n", + "Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.\n", + "\n", + "[Download Microsoft Edge](https://go.microsoft.com/fwlink/p/?LinkID=2092881 ) \n", + "[More info about Internet Explorer and Microsoft Edge](https://learn.microsoft.com/en-us/lifecycle/faq/internet-explorer-microsoft-edge) \n", + "\n", + "Table of contents \n", + "\n", + "Exit focus mode\n", + "\n", + "Read in English\n", + "\n", + "Save\n", + "\n", + "Table of contents\n", + "\n", + "Read in English\n", + "\n", + "Save\n", + "\n", + "Add to Plan\n", + "\n", + "[Edit](https://github.com/MicrosoftDocs/fabric-docs/blob/main/docs/get-started/microsoft-fabric-overview.md \"Edit This Document\")\n", + "\n", + "---\n", + "\n", + "#### Share via\n", + "\n", + "Facebook\n", + "x.com\n", + "LinkedIn\n", + "Email\n", + "\n", + "---\n", + "\n", + "Print\n", + "\n", + "Table of contents\n", + "\n", + "What is Microsoft Fabric?\n", + "=========================\n", + "\n", + "* Article\n", + "* 05/21/2024\n", + "* 15 contributors\n", + "\n", + "Feedback\n", + "\n", + "In this article\n", + "---------------\n", + "\n", + "Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. It offers a comprehensive suite of services including Data Engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases.\n", + "\n", + "With Fabric, you don't need to assemble different services from multiple vendors. Instead, it offers a seamlessly integrated, user-friendly platform that simplifies your analytics requirements. Operating on a Software as a Service (SaaS) model, Fabric brings simplicity and integration to your solutions.\n", + "\n", + "Microsoft Fabric integrates separate components into a cohesive stack. Instead of relying on different databases or data warehouses, you can centralize data storage with OneLake. AI capabilities are seamlessly embedded within Fabric, eliminating the need for manual integration. With Fabric, you can easily transition your raw data into actionable insights for business users.\n", + "\n", + "Unification with SaaS foundation\n", + "--------------------------------\n", + "\n", + "Microsoft Fabric is built on a foundation of Software as a Service (SaaS). It combines both new and existing components from Power BI, Azure Synapse Analytics, Azure Data Factory, and more services into a unified environment. 
These components are then tailored into customized user experiences.\n", + "\n", + "[![Diagram of the software as a service foundation beneath the different experiences of Fabric.](media/microsoft-fabric-overview/fabric-architecture.png)](media/microsoft-fabric-overview/fabric-architecture.png#lightbox)\n", + "\n", + "Fabric integrates workloads such as Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Intelligence, Industry solutions, and Power BI into a shared SaaS foundation. Each of these experiences is tailored for distinct user roles like data engineers, scientists, or warehousing professionals, and they serve a specific task. The entire Fabric stack has AI integration and it accelerates the data journey. These workloads work together seemlessly and provide the following advantages:\n", + "\n", + "* Access to an extensive range of deeply integrated analytics in the industry.\n", + "* Shared experiences across experiences that are familiar and easy to learn.\n", + "* Easy access to, and readily reuse all assets.\n", + "* Unified data lake storage that preserves data in its original location while using your preferred analytics tools.\n", + "* Centralized administration and governance across all experiences.\n", + "\n", + "Fabric seamlessly integrates data and services, enabling unified management, governance, and discovery. It ensures security for items, data, and row-level access. You can centrally configure core enterprise capabilities. Permissions are automatically applied across all the underlying services. Additionally, data sensitivity labels inherit automatically across the items in the suite. Governance is powered by Purview which is built into Fabric.\n", + "\n", + "Fabric allows creators to concentrate on producing their best work, freeing them from the need to integrate, manage, or even understand the underlying infrastructure.\n", + "\n", + "Components of Microsoft Fabric\n", + "------------------------------\n", + "\n", + "Fabric offers a comprehensive set of analytics experiences designed to work together seamlessly. The platform tailors each of these experiences to a specific persona and a specific task:\n", + "\n", + "![Screenshot of the Fabric menu of experiences.](media/microsoft-fabric-overview/workload-menu.png)\n", + "\n", + "* **Power BI** - Power BI lets you easily connect to your data sources, visualize and discover what's important, and share that with anyone or everyone you want. This integrated experience allows business owners to access all data in Fabric quickly and intuitively and to make better decisions with data. For more information, see [What is Power BI?](/en-us/power-bi/fundamentals/power-bi-overview)\n", + "* **Data Factory** - Data Factory provides a modern data integration experience to ingest, prepare, and transform data from a rich set of data sources. It incorporates the simplicity of Power Query, and you can use more than 200 native connectors to connect to data sources on-premises and in the cloud. For more information, see [What is Data Factory in Microsoft Fabric?](../data-factory/data-factory-overview)\n", + "* **Data Activator** - Data Activator is a no-code experience in Fabric that allows you to specify actions, such as email notifications and Power Automate workflows, to launch when Data Activator detects specific patterns or conditions in your changing data. 
It monitors data in Power BI reports and eventstreams; when the data hits certain thresholds or matches other patterns, it automatically takes the appropriate action. For more information, see [What is Data Activator?](../data-activator/data-activator-introduction)\n", + "* **Industry Solutions** - Fabric provides industry-specific data solutions that address unique industry needs and challenges, and include data management, analytics, and decision-making. For more information, see [Industry Solutions in Microsoft Fabric](/en-us/industry/industry-data-solutions-fabric).\n", + "* **Real-Time Intelligence** - Real-time Intelligence is an end-to-end solution for event-driven scenarios, streaming data, and data logs. It enables the extraction of insights, visualization, and action on data in motion by handling data ingestion, transformation, storage, analytics, visualization, tracking, AI, and real-time actions. The [Real-Time hub](#real-time-hub---the-unification-of-data-streams) in Real-Time Intelligence provides a wide variety of no-code connectors, converging into a catalog of organizational data that is protected, governed, and integrated across Fabric. For more information, see [What is Real-Time Intelligence in Fabric?](../real-time-intelligence/overview).\n", + "* **Synapse Data Engineering** - Synapse Data Engineering provides a Spark platform with great authoring experiences. It enables you to create, manage, and optimize infrastructures for collecting, storing, processing, and analyzing vast data volumes. Fabric Spark's integration with Data Factory allows you to schedule and orchestrate notebooks and Spark jobs. For more information, see [What is Data engineering in Microsoft Fabric?](../data-engineering/data-engineering-overview)\n", + "* **Synapse Data Science** - Synapse Data Science enables you to build, deploy, and operationalize machine learning models from Fabric. It integrates with Azure Machine Learning to provide built-in experiment tracking and model registry. Data scientists can enrich organizational data with predictions and business analysts can integrate those predictions into their BI reports, allowing a shift from descriptive to predictive insights. For more information, see [What is Data science in Microsoft Fabric?](../data-science/data-science-overview)\n", + "* **Synapse Data Warehouse** - Synapse Data Warehouse provides industry leading SQL performance and scale. It separates compute from storage, enabling independent scaling of both components. Additionally, it natively stores data in the open Delta Lake format. For more information, see [What is data warehousing in Microsoft Fabric?](../data-warehouse/data-warehousing)\n", + "\n", + "Microsoft Fabric enables organizations and individuals to turn large and complex data repositories into actionable workloads and analytics, and is an implementation of data mesh architecture. For more information, see [What is a data mesh?](/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/architectures/what-is-data-mesh)\n", + "\n", + "OneLake: The unification of lakehouses\n", + "--------------------------------------\n", + "\n", + "The Microsoft Fabric platform unifies the OneLake and lakehouse architecture across an enterprise.\n", + "\n", + "### OneLake\n", + "\n", + "A data lake is the foundation on which all the Fabric workloads are built. Microsoft Fabric Lake is also known as [OneLake](../onelake/onelake-overview). 
OneLake is built into the Fabric platform and provides a unified location to store all organizational data where the workloads operate.\n", + "\n", + "OneLake is built on ADLS (Azure Data Lake Storage) Gen2. It provides a single SaaS experience and a tenant-wide store for data that serves both professional and citizen developers. OneLake simplifies Fabric experiences by eliminating the need for you to understand infrastructure concepts such as resource groups, RBAC (Role-Based Access Control), Azure Resource Manager, redundancy, or regions. You don't need an Azure account to use Fabric.\n", + "\n", + "OneLake eliminates data silos, which individual developers often create when they provision and configure their own isolated storage accounts. Instead, OneLake provides a single, unified storage system for all developers. It ensures easy data discovery, sharing, and uniform enforcement of policy and security settings. For more information, see [What is OneLake?](../onelake/onelake-overview)\n", + "\n", + "### OneLake and lakehouse data hierarchy\n", + "\n", + "OneLake is hierarchical in nature to simplify management across your organization. Microsoft Fabric includes OneLake and there's no requirement for any up-front provisioning. There's only one OneLake per tenant and it provides a single-pane-of-glass file-system namespace that spans across users, regions, and clouds. OneLake organizes data into manageable containers for easy handling.\n", + "\n", + "The tenant maps to the root of OneLake and is at the top level of the hierarchy. You can create any number of workspaces, which you can think of as folders, within a tenant.\n", + "\n", + "The following image shows how Fabric stores data in various items within OneLake. As shown, you can create multiple workspaces within a tenant, and create multiple lakehouses within each workspace. A lakehouse is a collection of files, folders, and tables that represents a database over a data lake. To learn more, see [What is a lakehouse?](../data-engineering/lakehouse-overview).\n", + "\n", + "![Diagram of the hierarchy of items like lakehouses and semantic models within a workspace within a tenant.](media/microsoft-fabric-overview/hierarchy-within-tenant.png)\n", + "\n", + "Every developer and business unit in the tenant can easily create their own workspaces in OneLake. They can ingest data into their own lakehouses, then start processing, analyzing, and collaborating on the data, just like OneDrive in Microsoft Office.\n", + "\n", + "All the Microsoft Fabric compute experiences are prewired to OneLake, just like the Office applications are prewired to use the organizational OneDrive. The experiences such as Data Engineering, Data Warehouse, Data Factory, Power BI, and Real-Time Intelligence use OneLake as their native store. They don't need any extra configuration.\n", + "\n", + "[![Diagram of different Fabric experiences all accessing the same OneLake data storage.](media/microsoft-fabric-overview/onelake-architecture.png)](media/microsoft-fabric-overview/onelake-architecture.png#lightbox)\n", + "\n", + "OneLake allows instant mounting of your existing Platform as a Service (PaaS) storage accounts into OneLake with the [Shortcut](../onelake/onelake-shortcuts) feature. You don't need to migrate or move any of your existing data. Using shortcuts, you can access the data stored in your Azure Data Lake Storage.\n", + "\n", + "Shortcuts also allow you to easily share data between users and applications without moving or duplicating information. 
You can create shortcuts to other storage systems, allowing you to compose and analyze data across clouds with transparent, intelligent caching that reduces egress costs and brings data closer to compute.\n", +    "\n", +    "Real-Time hub - the unification of data streams\n", +    "-----------------------------------------------\n", +    "\n", +    "The Real-Time hub is a foundational location for data in motion.\n", +    "\n", +    "The Real-Time hub provides a unified SaaS experience and tenant-wide logical place for all data-in-motion. The Real-Time hub lists all data in motion from all sources that customers can discover, ingest, manage, and consume and react upon, and contains both [streams](../real-time-intelligence/event-streams/overview) and [KQL database](../real-time-intelligence/create-database) tables. Streams includes [**Data streams**](../real-time-intelligence/event-streams/create-manage-an-eventstream), **Microsoft sources** (for example, [Azure Event Hubs](../real-time-hub/add-source-azure-event-hubs), [Azure IoT Hub](../real-time-hub/add-source-azure-iot-hub), [Azure SQL DB Change Data Capture (CDC)](../real-time-hub/add-source-azure-sql-database-cdc), [Azure Cosmos DB CDC](../real-time-hub/add-source-azure-cosmos-db-cdc), and [PostgreSQL DB CDC](../real-time-hub/add-source-postgresql-database-cdc)), and [**Fabric events**](../real-time-intelligence/event-streams/add-source-fabric-workspace) (Fabric system events and external system events brought in from Azure, Microsoft 365, or other clouds).\n", +    "\n", +    "The Real-Time hub enables users to easily discover, ingest, manage, and consume data-in-motion from a wide variety of sources so that they can collaborate and develop streaming applications within one place. For more information, see [What is the Real-Time hub?](../real-time-hub/real-time-hub-overview)\n", +    "\n", +    "Fabric solutions for ISVs\n", +    "-------------------------\n", +    "\n", +    "If you're an Independent Software Vendor (ISV) looking to integrate your solutions with Microsoft Fabric, you can use one of the following paths based on your desired level of integration:\n", +    "\n", +    "* **Interop** - Integrate your solution with the OneLake Foundation and establish basic connections and interoperability with Fabric.\n", +    "* **Develop on Fabric** - Build your solution on top of the Fabric platform or seamlessly embed Fabric's functionalities into your existing applications.
You can easily use Fabric capabilities with this option.\n", +    "* **Build a Fabric workload** - Create customized workloads and experiences in Fabric tailoring your offerings to maximize their impact within the Fabric ecosystem.\n", +    "\n", +    "For more information, see the [Fabric ISV partner ecosystem](../cicd/partners/partner-integration).\n", +    "\n", +    "Related content\n", +    "---------------\n", +    "\n", +    "* [Microsoft Fabric terminology](fabric-terminology)\n", +    "* [Create a workspace](create-workspaces)\n", +    "* [Navigate to your items from Microsoft Fabric Home page](fabric-home)\n", +    "* [End-to-end tutorials in Microsoft Fabric](end-to-end-tutorials)\n", +    "\n", +    "\n", +    "The source of the context is: ['https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview']\n", +    "\n", +    "If you can answer the question, in the end of your answer, add the source of the context in the format of `Sources: source1, source2, ...`.\n", +    "\n", +    "\n", +    "--------------------------------------------------------------------------------\n", +    "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", +    "\n", +    "The components of Microsoft Fabric are:\n", +    "\n", +    "1. Power BI\n", +    "2. Data Factory\n", +    "3. Data Activator\n", +    "4.
Industry Solutions\n", + "5. Real-Time Intelligence\n", + "6. Synapse Data Engineering\n", + "7. Synapse Data Science\n", + "8. Synapse Data Warehouse\n", + "\n", + "Sources: https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview\n", + "\n", + "--------------------------------------------------------------------------------\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024-06-07 15:27:15,139 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.\u001b[0m\n", + "2024-06-07 15:27:15,142 - autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. Will use the VectorDB's embedding function to generate the content embedding.\u001b[0m\n" + ] + } + ], + "source": [ + "assistant.reset()\n", + "problem = \"List all the Components of Microsoft Fabric\"\n", + "chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=problem)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-06-07T15:27:30.3621271Z", + "execution_start_time": "2024-06-07T15:27:30.0131748Z", + "livy_statement_state": "available", + "parent_msg_id": "d9d3c442-0b5b-4eee-a34d-187119f9b420", + "queued_time": "2024-06-07T15:26:25.6902567Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 17, + "statement_ids": [ + 17 + ] + }, + "text/plain": [ + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 17, Finished, Available)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cost for the chat:\n", + "{'usage_including_cached_inference': {'total_cost': 0.019565000000000003, 'gpt-4o-2024-05-13': {'cost': 0.019565000000000003, 'prompt_tokens': 3688, 'completion_tokens': 75, 'total_tokens': 3763}}, 'usage_excluding_cached_inference': {'total_cost': 0.019565000000000003, 'gpt-4o-2024-05-13': {'cost': 0.019565000000000003, 'prompt_tokens': 3688, 'completion_tokens': 75, 'total_tokens': 3763}}}\n" + ] + } + ], + "source": [ + "print(f\"Cost for the chat:\\n{chat_result.cost}\")" + ] + }, + { + "cell_type": "markdown", + "id": "14", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "#### 2.2 let's ask it to generate AutoML code for us\n", + "\n", + "The question is \"Train a regression model, set time budget to 12s, plot the time line plot after training.\".\n", + "\n", + "ChatGPT's answer is as below:\n", + "\n", + "[It showed a figure]\n", + "\n", + "The timeline plot above shows the elapsed time during the training of a linear regression model. The red dashed line indicates the 12-second time budget. 
The model was trained iteratively, and the plot demonstrates that the training process was monitored to ensure it stayed within the specified time budget.\n", +     "```\n", +     "import time\n", +     "import numpy as np\n", +     "import matplotlib.pyplot as plt\n", +     "from sklearn.datasets import make_regression\n", +     "from sklearn.model_selection import train_test_split\n", +     "from sklearn.linear_model import LinearRegression\n", +     "\n", +     "# Create a synthetic regression dataset\n", +     "X, y = make_regression(n_samples=1000, n_features=20, noise=0.1)\n", +     "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", +     "\n", +     "# Initialize the model\n", +     "model = LinearRegression()\n", +     "\n", +     "# Record the start time\n", +     "start_time = time.time()\n", +     "\n", +     "# Train the model and record intermediate times\n", +     "times = []\n", +     "time_budget = 12  # in seconds\n", +     "\n", +     "for _ in range(100):\n", +     "    model.fit(X_train, y_train)\n", +     "    current_time = time.time()\n", +     "    elapsed_time = current_time - start_time\n", +     "    times.append(elapsed_time)\n", +     "    if elapsed_time > time_budget:\n", +     "        break\n", +     "\n", +     "# Plot the timeline\n", +     "plt.figure(figsize=(10, 5))\n", +     "plt.plot(times, label='Training time')\n", +     "plt.axhline(y=time_budget, color='r', linestyle='--', label='Time Budget (12s)')\n", +     "plt.xlabel('Iteration')\n", +     "plt.ylabel('Elapsed Time (s)')\n", +     "plt.title('Training Time Line Plot')\n", +     "plt.legend()\n", +     "plt.grid(True)\n", +     "plt.show()\n", +     "```\n", +     "\n", +     "It's not what I need, as ChatGPT has no context of the [AutoML](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations) solution in Fabric Data Science.\n", +     "\n", +     "The AutoGen RAG agent's answer is much better and ready for deployment. It retrieved the document related to the question, generated code based on that document, ran the code automatically, fixed the errors based on the execution output, and finally arrived at correct code." +    ] +   }, +   { +    "cell_type": "code", +    "execution_count": null, +    "id": "15", +    "metadata": { +     "jupyter": { +      "outputs_hidden": false, +      "source_hidden": false +     }, +     "nteract": { +      "transient": { +       "deleting": false +      } +     } +    }, +    "outputs": [ +     { +      "data": { +       "application/vnd.livy.statement-meta+json": { +        "execution_finish_time": "2024-06-07T15:28:21.4439921Z", +        "execution_start_time": "2024-06-07T15:27:31.3321982Z", +        "livy_statement_state": "available", +        "parent_msg_id": "19420cb8-2f86-495b-8f20-5349cb41d940", +        "queued_time": "2024-06-07T15:26:25.8861394Z", +        "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", +        "session_start_time": null, +        "spark_pool": null, +        "state": "finished", +        "statement_id": 18, +        "statement_ids": [ +         18 +        ] +       }, +       "text/plain": [ +        "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 18, Finished, Available)" +       ] +      }, +      "metadata": {}, +      "output_type": "display_data" +     }, +     { +      "name": "stdout", +      "output_type": "stream", +      "text": [ +       "Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2\n", +       "VectorDB returns doc_ids:  [['621d4a0b', 'f7c9052b']]\n", +       "\u001b[32mAdding content of doc 621d4a0b to context.\u001b[0m\n", +       "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", +       "\n", +       "You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the\n", +       "context provided by the user.
You should follow the following steps to answer a question:\n", +       "Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or\n", +       "a question answering task.\n", +       "Step 2, you reply based on the intent.\n", +       "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n", +       "If user's intent is code generation, you must obey the following rules:\n", +       "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n", +       "Rule 2. You must follow the formats below to write your code:\n", +       "```language\n", +       "# your code\n", +       "```\n", +       "\n", +       "If user's intent is question answering, you must give as short an answer as possible.\n", +       "\n", +       "User's question is: Train a regression model, set time budget to 12s, plot the time line plot after training.\n", +       "\n", +       "Context is: # Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn\n", +       "\n", +       "Training visualizations (preview)\n", +       "=================================\n", +       "\n", +       "A hyperparameter trial or AutoML trial searches for the optimal parameters for a machine learning model. Each trial consists of multiple runs, where each run evaluates a specific parameter combination. Users can monitor these runs using ML experiment items in Fabric.\n", +       "\n", +       "The `flaml.visualization` module offers functions to plot and compare the runs in FLAML. Users can use Plotly to interact with their AutoML experiment plots. To use these functions, users need to input their optimized `flaml.AutoML` or `flaml.tune.tune.ExperimentAnalysis` object.\n", +       "\n", +       "This article teaches you how to use the `flaml.visualization` module to analyze and explore your AutoML trial results. You can follow the same steps for your hyperparameter trial as well.\n", +       "\n", +       "Important\n", +       "\n", +       "This feature is in [preview](../get-started/preview).\n", +       "\n", +       "Create an AutoML trial\n", +       "----------------------\n", +       "\n", +       "AutoML offers a suite of automated processes that can identify the best machine learning pipeline for your dataset, making the entire modeling process more straightforward and often more accurate.
In essence, it saves you the trouble of hand-tuning different models and hyperparameters.\n", + "\n", + "In the code cell below, we will:\n", + "\n", + "1. Load the Iris dataset.\n", + "2. Split the data into training and test sets.\n", + "3. Initiate an AutoML trial to fit our training data.\n", + "4. Explore the results of our AutoML trial with the visualizations from `flaml.visualization`.\n", + "\n", + "```\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import train_test_split\n", + "from flaml import AutoML\n", + "\n", + "# Load the Iris data and split it into train and test sets\n", + "x, y = load_iris(return_X_y=True, as_frame=True)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", + "\n", + "# Create an AutoML instance and set the parameters\n", + "automl = AutoML()\n", + "automl_settings = {\n", + " \"time_budget\": 10, # Time limit in seconds\n", + " \"task\": \"classification\", # Type of machine learning task\n", + " \"log_file_name\": \"aml_iris.log\", # Name of the log file\n", + " \"metric\": \"accuracy\", # Evaluation metric\n", + " \"log_type\": \"all\", # Level of logging\n", + "}\n", + "# Fit the AutoML instance on the training data\n", + "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", + "\n", + "```\n", + "\n", + "Visualize the experiment results\n", + "--------------------------------\n", + "\n", + "Once you run an AutoML trial, you need to visualize the outcomes to analyze how well the models performed and how they behaved. In this part of our documentation, we show you how to use the built-in utilities in the FLAML library for this purpose.\n", + "\n", + "### Import visualization module\n", + "\n", + "To access these visualization utilities, we run the following import command:\n", + "\n", + "```\n", + "import flaml.visualization as fviz\n", + "\n", + "```\n", + "\n", + "### Optimization history\n", + "\n", + "An optimization history plot typically has the number of trials/iterations on the x-axis and a performance metric (like accuracy, RMSE, etc.) on the y-axis. As the number of trials increases, you would see a line or scatter plot indicating the performance of each trial.\n", + "\n", + "```\n", + "fig = fviz.plot_optimization_history(automl)\n", + "# or\n", + "fig = fviz.plot(automl, \"optimization_history\")\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of optimization history plot.](media/model-training/optimization-history.png)](media/model-training/optimization-history.png#lightbox)\n", + "\n", + "### Feature importance\n", + "\n", + "A feature importance plot is a powerful visualization tool that allows you to understand the significance of different input features in determining the predictions of a model.\n", + "\n", + "```\n", + "fig = fviz.plot_feature_importance(automl)\n", + "# or\n", + "fig = fviz.plot(automl, \"feature_importance\")\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of feature importance plot.](media/model-training/feature-importance.png)](media/model-training/feature-importance.png#lightbox)\n", + "\n", + "### Parallel coordinate plot\n", + "\n", + "A parallel coordinate plot is a visualization tool that represents multi-dimensional data by drawing multiple vertical lines (axes) corresponding to variables or hyperparameters, with data points plotted as connected lines across these axes. 
In the context of an AutoML or tuning experiment, it's instrumental in visualizing and analyzing the performance of different hyperparameter combinations. By tracing the paths of high-performing configurations, one can discern patterns or trends in hyperparameter choices and their interactions. This plot aids in understanding which combinations lead to optimal performance, pinpointing potential areas for further exploration, and identifying any trade-offs between different hyperparameters.\n", + "\n", + "This utility takes the following other arguments:\n", + "\n", + "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", + "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", + "\n", + "```\n", + "fig = fviz.plot_parallel_coordinate(automl, learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", + "# or\n", + "fig = fviz.plot(automl, \"parallel_coordinate\", learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of parallel coordinate plot.](media/model-training/parallel-coordinate-plot.png)](media/model-training/parallel-coordinate-plot.png#lightbox)\n", + "\n", + "### Contour plot\n", + "\n", + "A contour plot visualizes three-dimensional data in two dimensions, where the x and y axes represent two hyperparameters, and the contour lines or filled contours depict levels of a performance metric (for example, accuracy or loss). In the context of an AutoML or tuning experiment, a contour plot is beneficial for understanding the relationship between two hyperparameters and their combined effect on model performance.\n", + "\n", + "By examining the density and positioning of the contour lines, one can identify regions of hyperparameter space where performance is optimized, ascertain potential trade-offs between hyperparameters, and gain insights into their interactions. This visualization helps refine the search space and tuning process.\n", + "\n", + "This utility also takes the following arguments:\n", + "\n", + "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", + "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", + "\n", + "```\n", + "fig = fviz.plot_contour(automl, learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", + "# or\n", + "fig = fviz.plot(automl, \"contour\", learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of contour plot.](media/model-training/contour-plot.png)](media/model-training/contour-plot.png#lightbox)\n", + "\n", + "### Empirical distribution function\n", + "\n", + "An empirical distribution function (EDF) plot, often visualized as a step function, represents the cumulative probability of data points being less than or equal to a particular value. 
Within an AutoML or tuning experiment, an EDF plot can be employed to visualize the distribution of model performances across different hyperparameter configurations.\n", + "\n", + "By observing the steepness or flatness of the curve at various points, one can understand the concentration of good or poor model performances, respectively. This visualization offers insights into the overall efficacy of the tuning process, highlighting whether most of the attempted configurations are yielding satisfactory results or if only a few configurations stand out.\n", + "\n", + "Note\n", + "\n", + "For AutoML experiments, multiple models will be applied during training. The trials of each learner are represented as an optimization series.\n", + "For hyperparameter tuning experiments, there will be only a single learner that is evaluated. However, you can provide additional tuning experiments to see the trends across each learner.\n", + "\n", + "```\n", + "fig = fviz.plot_edf(automl)\n", + "# or\n", + "fig = fviz.plot(automl, \"edf\")\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of the empirical distribution function plot.](media/model-training/empirical-distribution-function-plot.png)](media/model-training/empirical-distribution-function-plot.png#lightbox)\n", + "\n", + "### Timeline plot\n", + "\n", + "A timeline plot, often represented as a Gantt chart or a sequence of bars, visualizes the start, duration, and completion of tasks over time. In the context of an AutoML or tuning experiment, a timeline plot can showcase the progression of various model evaluations and their respective durations, plotted against time. By observing this plot, users can grasp the efficiency of the search process, identify any potential bottlenecks or idle periods, and understand the temporal dynamics of different hyperparameter evaluations.\n", + "\n", + "```\n", + "fig = fviz.plot_timeline(automl)\n", + "# or\n", + "fig = fviz.plot(automl, \"timeline\")\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of timeline plot.](media/model-training/timeline-plot.png)](media/model-training/timeline-plot.png#lightbox)\n", + "\n", + "### Slice plot\n", + "\n", + "Plot the parameter relationship as slice plot in a study.\n", + "\n", + "This utility also takes the following arguments:\n", + "\n", + "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", + "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", + "\n", + "```\n", + "fig = fviz.plot_slice(automl, learner=\"sgd\")\n", + "# or\n", + "fig = fviz.plot(automl, \"slice\", learner=\"sgd\")\n", + "fig.show()\n", + "\n", + "```\n", + "\n", + "Here is the resulting plot:\n", + "\n", + "[![Graph of slice plot.](media/model-training/slice-plot.png)](media/model-training/slice-plot.png#lightbox)\n", + "\n", + "### Hyperparameter importance\n", + "\n", + "A hyperparameter importance plot visually ranks hyperparameters based on their influence on model performance in an AutoML or tuning experiment. Displayed typically as a bar chart, it quantifies the impact of each hyperparameter on the target metric. 
By examining this plot, practitioners can discern which hyperparameters are pivotal in determining model outcomes and which ones have minimal effect.\n", +       "\n", +       "This utility also takes the following arguments:\n", +       "\n", +       "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", +       "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", +       "\n", +       "```\n", +       "fig = fviz.plot_param_importance(automl, learner=\"sgd\")\n", +       "# or\n", +       "fig = fviz.plot(automl, \"param_importance\", learner=\"sgd\")\n", +       "fig.show()\n", +       "\n", +       "```\n", +       "\n", +       "Here is the resulting plot:\n", +       "\n", +       "[![Graph of hyperparameter importance plot.](media/model-training/hyperparameter-importance-plot.png)](media/model-training/hyperparameter-importance-plot.png#lightbox)\n", +       "\n", +       "Related content\n", +       "---------------\n", +       "\n", +       "* [Tune a SynapseML Spark LightGBM model](how-to-tune-lightgbm-flaml)\n", +       "\n", +       "\n", +       "The source of the context is: ['https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations']\n", +       "\n", +       "If you can answer the question, in the end of your answer, add the source of the context in the format of `Sources: source1, source2, ...`.\n", +       "\n", +       "\n", +       "--------------------------------------------------------------------------------\n", +       "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", +       "\n", +       "Step 1: The user's intent is a code generation task to train a regression model with a time budget of 12 seconds and plot the timeline plot after training.\n", +       "\n", +       "Step 2: Here is the code to achieve this:\n", +       "\n", +       "```python\n", +       "from sklearn.datasets import load_boston\n", +       "from sklearn.model_selection import train_test_split\n", +       "from flaml import AutoML\n", +       "import flaml.visualization as fviz\n", +       "\n", +       "# Load the Boston housing data and split it into train and test sets\n", +       "x, y = load_boston(return_X_y=True)\n", +       "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", +       "\n", +       "# Create an AutoML instance and set the parameters\n", +       "automl = AutoML()\n", +       "automl_settings = {\n", +       "    \"time_budget\": 12,  # Time limit in seconds\n", +       "    \"task\": \"regression\",  # Type of machine learning task\n", +       "    \"log_file_name\": \"aml_boston.log\",  # Name of the log file\n", +       "    \"metric\": \"rmse\",  # Evaluation metric\n", +       "    \"log_type\": \"all\",  # Level of logging\n", +       "}\n", +       "\n", +       "# Fit the AutoML instance on the training data\n", +       "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", +       "\n", +       "# Plot the timeline
plot\n", + "fig = fviz.plot_timeline(automl)\n", + "fig.show()\n", + "```\n", + "\n", + "Sources: [Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations)\n", + "\n", + "--------------------------------------------------------------------------------\n", + "\u001b[31m\n", + ">>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...\u001b[0m\n", + "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", + "\n", + "exitcode: 1 (execution failed)\n", + "Code output: Traceback (most recent call last):\n", + " File \"/tmp/tmp41070gi5/tmp_code_4463932bbc95a1921034eb428e7ded0c.py\", line 1, in \n", + " from sklearn.datasets import load_boston\n", + " File \"/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/datasets/__init__.py\", line 157, in __getattr__\n", + " raise ImportError(msg)\n", + "ImportError: \n", + "`load_boston` has been removed from scikit-learn since version 1.2.\n", + "\n", + "The Boston housing prices dataset has an ethical problem: as\n", + "investigated in [1], the authors of this dataset engineered a\n", + "non-invertible variable \"B\" assuming that racial self-segregation had a\n", + "positive impact on house prices [2]. Furthermore the goal of the\n", + "research that led to the creation of this dataset was to study the\n", + "impact of air quality but it did not give adequate demonstration of the\n", + "validity of this assumption.\n", + "\n", + "The scikit-learn maintainers therefore strongly discourage the use of\n", + "this dataset unless the purpose of the code is to study and educate\n", + "about ethical issues in data science and machine learning.\n", + "\n", + "In this special case, you can fetch the dataset from the original\n", + "source::\n", + "\n", + " import pandas as pd\n", + " import numpy as np\n", + "\n", + " data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n", + " raw_df = pd.read_csv(data_url, sep=\"\\s+\", skiprows=22, header=None)\n", + " data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n", + " target = raw_df.values[1::2, 2]\n", + "\n", + "Alternative datasets include the California housing dataset and the\n", + "Ames housing dataset. You can load the datasets as follows::\n", + "\n", + " from sklearn.datasets import fetch_california_housing\n", + " housing = fetch_california_housing()\n", + "\n", + "for the California housing dataset and::\n", + "\n", + " from sklearn.datasets import fetch_openml\n", + " housing = fetch_openml(name=\"house_prices\", as_frame=True)\n", + "\n", + "for the Ames housing dataset.\n", + "\n", + "[1] M Carlisle.\n", + "\"Racist data destruction?\"\n", + "\n", + "\n", + "[2] Harrison Jr, David, and Daniel L. Rubinfeld.\n", + "\"Hedonic housing prices and the demand for clean air.\"\n", + "Journal of environmental economics and management 5.1 (1978): 81-102.\n", + "\n", + "\n", + "\n", + "\n", + "--------------------------------------------------------------------------------\n", + "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", + "\n", + "The `load_boston` dataset has been removed from scikit-learn due to ethical concerns. 
Let's use the California housing dataset instead.\n", + "\n", + "Here is the updated code:\n", + "\n", + "```python\n", + "from sklearn.datasets import fetch_california_housing\n", + "from sklearn.model_selection import train_test_split\n", + "from flaml import AutoML\n", + "import flaml.visualization as fviz\n", + "\n", + "# Load the California housing data and split it into train and test sets\n", + "housing = fetch_california_housing()\n", + "x, y = housing.data, housing.target\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", + "\n", + "# Create an AutoML instance and set the parameters\n", + "automl = AutoML()\n", + "automl_settings = {\n", + " \"time_budget\": 12, # Time limit in seconds\n", + " \"task\": \"regression\", # Type of machine learning task\n", + " \"log_file_name\": \"aml_california.log\", # Name of the log file\n", + " \"metric\": \"rmse\", # Evaluation metric\n", + " \"log_type\": \"all\", # Level of logging\n", + "}\n", + "\n", + "# Fit the AutoML instance on the training data\n", + "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", + "\n", + "# Plot the timeline plot\n", + "fig = fviz.plot_timeline(automl)\n", + "fig.show()\n", + "```\n", + "\n", + "Sources: [Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations)\n", + "\n", + "--------------------------------------------------------------------------------\n", + "\u001b[31m\n", + ">>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...\u001b[0m\n", + "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", + "\n", + "exitcode: 0 (execution succeeded)\n", + "Code output: Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", + "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", + "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", + "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", + "[flaml.automl.logger: 06-07 15:28:07] {1767} INFO - task = regression\n", + "[flaml.automl.logger: 06-07 15:28:07] {1778} INFO - Evaluation method: holdout\n", + "[flaml.automl.logger: 06-07 15:28:07] {1881} INFO - Minimizing error metric: rmse\n", + "[flaml.automl.logger: 06-07 15:28:09] {1999} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'catboost']\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 0, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2444} INFO - Estimated sufficient time budget=3982s. 
Estimated necessary time budget=34s.\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 1, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 2, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.8172,\tbest estimator lgbm's best error=0.8172\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 3, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.6288,\tbest estimator lgbm's best error=0.6288\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 4, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6288,\tbest estimator lgbm's best error=0.6288\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 5, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 6, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 7, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 8, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.5627,\tbest estimator lgbm's best error=0.5627\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 9, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.5627,\tbest estimator lgbm's best error=0.5627\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 10, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.1s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", + "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 11, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.3s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 12, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.3s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 13, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.4s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 14, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.6s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + 
"[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 15, current learner sgd\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.6s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 16, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.0s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 17, current learner sgd\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.0s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 18, current learner sgd\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.1s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 19, current learner sgd\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.1s,\testimator sgd's best error=1.1067,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 20, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.2s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 21, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.5s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 22, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.6s,\testimator xgboost's best error=1.3843,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 23, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=1.3843,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 24, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.9469,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 25, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 26, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 27, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 28, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6203,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 29, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 
6.8s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 30, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 31, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator xgboost's best error=0.6053,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 32, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator xgboost's best error=0.5953,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 33, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 34, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 35, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 36, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 37, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.5s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 38, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.5s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 39, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.6s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 40, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.6s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 41, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.7s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 42, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.8s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 43, current learner extra_tree\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator extra_tree's best error=0.8723,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 
15:28:12] {2309} INFO - iteration 44, current learner sgd\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator sgd's best error=1.1055,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 45, current learner extra_tree\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator extra_tree's best error=0.7612,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 46, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.1s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 47, current learner extra_tree\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.3s,\testimator extra_tree's best error=0.7612,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 48, current learner rf\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.4s,\testimator rf's best error=0.8142,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 49, current learner rf\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.5s,\testimator rf's best error=0.6937,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 50, current learner rf\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.6s,\testimator rf's best error=0.6937,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 51, current learner extra_tree\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.6s,\testimator extra_tree's best error=0.7209,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 52, current learner rf\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.8s,\testimator rf's best error=0.6425,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 53, current learner rf\n", + "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 9.0s,\testimator rf's best error=0.6055,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 54, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.2s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 55, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.4s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 56, current learner xgboost\n", + "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.5s,\testimator xgboost's best error=0.5187,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 57, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.8s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", + "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 58, current learner lgbm\n", + "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO - at 10.2s,\testimator lgbm's best 
error=0.4794,\tbest estimator lgbm's best error=0.4794\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 59, current learner rf\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO -  at 10.5s,\testimator rf's best error=0.6055,\tbest estimator lgbm's best error=0.4794\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 60, current learner lgbm\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO -  at 10.7s,\testimator lgbm's best error=0.4794,\tbest estimator lgbm's best error=0.4794\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 61, current learner rf\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO -  at 11.0s,\testimator rf's best error=0.5968,\tbest estimator lgbm's best error=0.4794\n",
+      "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 62, current learner lgbm\n",
+      "[flaml.automl.logger: 06-07 15:28:16] {2493} INFO -  at 12.1s,\testimator lgbm's best error=0.4794,\tbest estimator lgbm's best error=0.4794\n",
+      "[flaml.automl.logger: 06-07 15:28:17] {2736} INFO - retrain lgbm for 0.5s\n",
+      "[flaml.automl.logger: 06-07 15:28:17] {2739} INFO - retrained model: LGBMRegressor(colsample_bytree=0.591579264701285,\n",
+      "              learning_rate=0.0715412842452619, max_bin=511,\n",
+      "              min_child_samples=2, n_estimators=1, n_jobs=-1, num_leaves=168,\n",
+      "              reg_alpha=0.01435520144866301, reg_lambda=0.006874802748054268,\n",
+      "              verbose=-1)\n",
+      "[flaml.automl.logger: 06-07 15:28:17] {2740} INFO - Auto Feature Engineering pipeline: None\n",
+      "[flaml.automl.logger: 06-07 15:28:17] {2035} INFO - fit succeeded\n",
+      "[flaml.automl.logger: 06-07 15:28:17] {2036} INFO - Time taken to find the best model: 10.24332308769226\n",
+      "\n",
+      "\n",
+      "--------------------------------------------------------------------------------\n",
+      "\u001b[33massistant\u001b[0m (to ragproxyagent):\n",
+      "\n",
+      "TERMINATE\n",
+      "\n",
+      "--------------------------------------------------------------------------------\n"
+     ]
+    }
+   ],
+   "source": [
+    "assistant.reset()\n",
+    "problem = \"Train a regression model, set time budget to 12s, plot the time line plot after training.\"\n",
+    "\n",
+    "chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=problem)"
+   ]
+  },
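+  {
+   "cell_type": "markdown",
+   "id": "15a",
+   "metadata": {
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "source": [
+    "Besides printing the conversation, `initiate_chat` returns a `ChatResult` object that can be inspected programmatically. The cell below is a minimal sketch of doing so; it assumes the `chat_result` from the cell above is still in scope."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "15b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Minimal sketch: inspect the ChatResult returned by initiate_chat.\n",
+    "# Assumes `chat_result` from the previous cell is still in scope.\n",
+    "print(f\"Messages exchanged: {len(chat_result.chat_history)}\")\n",
+    "print(f\"Chat summary:\\n{chat_result.summary}\")"
+   ]
+  },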
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "16",
+   "metadata": {
+    "jupyter": {
+     "outputs_hidden": false,
+     "source_hidden": false
+    },
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.livy.statement-meta+json": {
+       "execution_finish_time": "2024-06-07T15:28:22.7924281Z",
+       "execution_start_time": "2024-06-07T15:28:22.4431692Z",
+       "livy_statement_state": "available",
+       "parent_msg_id": "8c89a821-45eb-47f0-8608-11ac711f02e9",
+       "queued_time": "2024-06-07T15:26:26.0620587Z",
+       "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2",
+       "session_start_time": null,
+       "spark_pool": null,
+       "state": "finished",
+       "statement_id": 19,
+       "statement_ids": [
+        19
+       ]
+      },
+      "text/plain": [
+       "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 19, Finished, Available)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Cost for the chat:\n",
+      "{'usage_including_cached_inference': {'total_cost': 0.04863, 'gpt-4o-2024-05-13': {'cost': 0.04863, 'prompt_tokens': 7737, 'completion_tokens': 663, 'total_tokens': 8400}}, 'usage_excluding_cached_inference': {'total_cost': 0.04863, 'gpt-4o-2024-05-13': {'cost': 0.04863, 'prompt_tokens': 7737, 'completion_tokens': 663, 'total_tokens': 8400}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f\"Cost for the chat:\\n{chat_result.cost}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "17",
+   "metadata": {
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "source": [
+    "Below is the code generated by the AutoGen RAG agent. It is not a copy of the code in the retrieved document, since we asked for a different task and a different time budget, but the RAG agent adapted the retrieved example to the new request very well."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "18",
+   "metadata": {
+    "jupyter": {
+     "outputs_hidden": false,
+     "source_hidden": false
+    },
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.livy.statement-meta+json": {
+       "execution_finish_time": "2024-06-07T15:28:56.954585Z",
+       "execution_start_time": "2024-06-07T15:28:23.7618029Z",
+       "livy_statement_state": "available",
+       "parent_msg_id": "ced1bbe3-3ab3-421a-a8a9-6eb151a3a7d3",
+       "queued_time": "2024-06-07T15:26:26.2444398Z",
+       "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2",
+       "session_start_time": null,
+       "spark_pool": null,
+       "state": "finished",
+       "statement_id": 20,
+       "statement_ids": [
+        20
+       ]
+      },
+      "text/plain": [
+       "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 20, Finished, Available)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[flaml.automl.logger: 06-07 15:28:28] {1767} INFO - task = regression\n",
+      "[flaml.automl.logger: 06-07 15:28:28] {1778} INFO - Evaluation method: holdout\n",
+      "[flaml.automl.logger: 06-07 15:28:28] {1881} INFO - Minimizing error metric: rmse\n",
+      "[flaml.automl.logger: 06-07 15:28:28] {1999} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'catboost']\n",
+      "[flaml.automl.logger: 06-07 15:28:28] {2309} INFO - iteration 0, current learner lgbm\n",
+      "[flaml.automl.logger: 06-07 15:28:28] {2444} INFO - Estimated sufficient time budget=145s.
Estimated necessary time budget=1s.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.\n", + " warnings.warn(\"Setuptools is replacing distutils.\")\n", + "2024/06/07 15:28:47 WARNING mlflow.utils.requirements_utils: The following packages were not found in the public PyPI package index as of 2024-02-29; if these packages are not present in the public PyPI index, you must install them manually before loading your model: {'synapseml-internal', 'synapseml-mlflow'}\n" + ] + }, + { + "data": { + "application/vnd.mlflow.run-widget+json": { + "data": { + "metrics": { + "best_validation_loss": 0.9510965242768078, + "iter_counter": 0, + "rmse": 0.9510965242768078, + "trial_time": 0.012721061706542969, + "validation_loss": 0.9510965242768078, + "wall_clock_time": 4.973712205886841 + }, + "params": { + "colsample_bytree": "1.0", + "learner": "lgbm", + "learning_rate": "0.09999999999999995", + "log_max_bin": "8", + "min_child_samples": "20", + "n_estimators": "4", + "num_leaves": "4", + "reg_alpha": "0.0009765625", + "reg_lambda": "1.0", + "sample_size": "14860" + }, + "tags": { + "flaml.best_run": "False", + "flaml.estimator_class": "LGBMEstimator", + "flaml.estimator_name": "lgbm", + "flaml.iteration_number": "0", + "flaml.learner": "lgbm", + "flaml.log_type": "r_autolog", + "flaml.meric": "rmse", + "flaml.run_source": "flaml-automl", + "flaml.sample_size": "14860", + "flaml.version": "2.1.2.post1", + "mlflow.rootRunId": "da4aff39-ef24-4953-ab30-f9adc0c843bd", + "mlflow.runName": "careful_stomach_bzw71tb4", + "mlflow.user": "0e0e6551-b66b-41f3-bc82-bd86e0d203dc", + "synapseml.experiment.artifactId": "2ba08dad-7edc-4af2-b41b-5802fb6180c2", + "synapseml.experimentName": "autogen", + "synapseml.livy.id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", + "synapseml.notebook.artifactId": "72c91c1d-9cbf-4ca5-8180-2e318bb7d1d5", + "synapseml.user.id": "8abb9091-0a62-4ecd-bf6a-e49dbbf94431", + "synapseml.user.name": "Li Jiang" + } + }, + "info": { + "artifact_uri": "sds://onelakedxt.pbidedicated.windows.net/a9c17701-dbed-452d-91ee-ffeef4d6674f/2ba08dad-7edc-4af2-b41b-5802fb6180c2/da4aff39-ef24-4953-ab30-f9adc0c843bd/artifacts", + "end_time": 1717774129, + "experiment_id": "9d1ec9c8-d313-40a4-9ed8-b9bf496195ae", + "lifecycle_stage": "active", + "run_id": "da4aff39-ef24-4953-ab30-f9adc0c843bd", + "run_name": "", + "run_uuid": "da4aff39-ef24-4953-ab30-f9adc0c843bd", + "start_time": 1717774109, + "status": "FINISHED", + "user_id": "9ec1a2ed-32f8-4061-910f-25871321251b" + }, + "inputs": { + "dataset_inputs": [] + } + } + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[flaml.automl.logger: 06-07 15:28:53] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", + "[flaml.automl.logger: 06-07 15:28:54] {2736} INFO - retrain lgbm for 0.0s\n", + "[flaml.automl.logger: 06-07 15:28:54] {2739} INFO - retrained model: LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1,\n", + " n_jobs=-1, num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0,\n", + " verbose=-1)\n", + "[flaml.automl.logger: 06-07 15:28:54] {2740} INFO - Auto Feature Engineering pipeline: None\n", + "[flaml.automl.logger: 06-07 15:28:54] {2742} INFO - Best MLflow run name: \n", + "[flaml.automl.logger: 06-07 15:28:54] {2743} INFO - 
Best MLflow run id: da4aff39-ef24-4953-ab30-f9adc0c843bd\n",
+      "[flaml.automl.logger: 06-07 15:28:54] {2035} INFO - fit succeeded\n",
+      "[flaml.automl.logger: 06-07 15:28:54] {2036} INFO - Time taken to find the best model: 4.973712205886841\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       " \n",
+       " "
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.plotly.v1+json": {
+       "config": {
+        "plotlyServerURL": "https://plot.ly"
+       },
+       "data": [
+        {
+         "base": [
+          4.960991144180298
+         ],
+         "name": "lgbm",
+         "orientation": "h",
+         "type": "bar",
+         "x": [
+          0.012721061706542969
+         ],
+         "y": [
+          0
+         ]
+        }
+       ],
+       "layout": {
+        "title": {
+         "text": "Timeline Plot"
+        },
+        "xaxis": {
+         "title": {
+          "text": "Time (s)"
+         }
+        },
+        "yaxis": {
+         "title": {
+          "text": "Trial"
+         }
+        }
+       }
+      },
+      "text/html": [
+       " "
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import flaml.visualization as fviz\n",
+    "from flaml import AutoML\n",
+    "from sklearn.datasets import fetch_california_housing\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "# Load the California housing data and split it into train and test sets\n",
+    "housing = fetch_california_housing()\n",
+    "x, y = housing.data, housing.target\n",
+    "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n",
+    "\n",
+    "# Create an AutoML instance and set the parameters\n",
+    "automl = AutoML()\n",
+    "automl_settings = {\n",
+    "    \"time_budget\": 12,  # Time limit in seconds\n",
+    "    \"task\": \"regression\",  # Type of machine learning task\n",
+    "    \"log_file_name\": \"aml_california.log\",  # Name of the log file\n",
+    "    \"metric\": \"rmse\",  # Evaluation metric\n",
+    "    \"log_type\": \"all\",  # Level of logging\n",
+    "}\n",
+    "\n",
+    "# Fit the AutoML instance on the training data\n",
+    "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n",
+    "\n",
+    "# Plot the timeline plot\n",
+    "fig = fviz.plot_timeline(automl)\n",
+    "fig.show()"
+   ]
+  },
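+  {
+   "cell_type": "markdown",
+   "id": "18a",
+   "metadata": {
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "source": [
+    "Since the generated script already holds out a test set, a natural follow-up is to check how the tuned model generalizes. The cell below is a minimal sketch of such an evaluation; it assumes the `automl`, `x_test` and `y_test` variables from the cell above are still in scope."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "18b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.metrics import mean_squared_error, r2_score\n",
+    "\n",
+    "# Minimal sketch: evaluate the tuned model on the held-out test set.\n",
+    "# Assumes `automl`, `x_test` and `y_test` from the previous cell.\n",
+    "y_pred = automl.predict(x_test)\n",
+    "print(f\"Best learner: {automl.best_estimator}\")\n",
+    "print(f\"Test RMSE: {mean_squared_error(y_test, y_pred, squared=False):.4f}\")\n",
+    "print(f\"Test R^2: {r2_score(y_test, y_pred):.4f}\")"
+   ]
+  },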
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import flaml.visualization as fviz\n", + "from flaml import AutoML\n", + "from sklearn.datasets import fetch_california_housing\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# Load the California housing data and split it into train and test sets\n", + "housing = fetch_california_housing()\n", + "x, y = housing.data, housing.target\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", + "\n", + "# Create an AutoML instance and set the parameters\n", + "automl = AutoML()\n", + "automl_settings = {\n", + " \"time_budget\": 12, # Time limit in seconds\n", + " \"task\": \"regression\", # Type of machine learning task\n", + " \"log_file_name\": \"aml_california.log\", # Name of the log file\n", + " \"metric\": \"rmse\", # Evaluation metric\n", + " \"log_type\": \"all\", # Level of logging\n", + "}\n", + "\n", + "# Fit the AutoML instance on the training data\n", + "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", + "\n", + "# Plot the timeline plot\n", + "fig = fviz.plot_timeline(automl)\n", + "fig.show()" + ] + }, + { + "cell_type": "markdown", + "id": "19", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "### Example 3\n", + "How to use `MultimodalConversableAgent` to chat with images.\n", + "\n", + "Check out this [blog](https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent) for more details." + ] + }, + { + "cell_type": "markdown", + "id": "20", + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "We'll ask a question about below image:![image-alt-text](https://th.bing.com/th/id/R.422068ce8af4e15b0634fe2540adea7a?rik=y4OcXBE%2fqutDOw&pid=ImgRaw&r=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21", + "metadata": { + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [ + { + "data": { + "application/vnd.livy.statement-meta+json": { + "execution_finish_time": "2024-06-07T15:29:04.6027047Z", + "execution_start_time": "2024-06-07T15:28:57.9532564Z", + "livy_statement_state": "available", + "parent_msg_id": "71bfdcee-445d-4564-b423-61d9a6378939", + "queued_time": "2024-06-07T15:26:26.4400435Z", + "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", + "session_start_time": null, + "spark_pool": null, + "state": "finished", + "statement_id": 21, + "statement_ids": [ + 21 + ] + }, + "text/plain": [ + "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 21, Finished, Available)" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mUser_proxy\u001b[0m (to image-explainer):\n", + "\n", + "What's the breed of this dog?\n", + ".\n", + "\n", + "--------------------------------------------------------------------------------\n", + "\u001b[31m\n", + ">>>>>>>> USING AUTO REPLY...\u001b[0m\n", + "\u001b[33mimage-explainer\u001b[0m (to User_proxy):\n", + "\n", + "The dog in the image appears to be a Poodle or a Poodle mix, such as a Labradoodle or a Goldendoodle, based on its curly coat and overall appearance.\n", + "\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "from autogen.agentchat.contrib.multimodal_conversable_agent import 
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "22",
+   "metadata": {
+    "jupyter": {
+     "outputs_hidden": false,
+     "source_hidden": false
+    },
+    "nteract": {
+     "transient": {
+      "deleting": false
+     }
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.livy.statement-meta+json": {
+       "execution_finish_time": "2024-06-07T15:29:05.9669658Z",
+       "execution_start_time": "2024-06-07T15:29:05.613333Z",
+       "livy_statement_state": "available",
+       "parent_msg_id": "af81a0c7-9ee8-4da4-aa6e-dcd735209961",
+       "queued_time": "2024-06-07T15:26:26.7741139Z",
+       "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2",
+       "session_start_time": null,
+       "spark_pool": null,
+       "state": "finished",
+       "statement_id": 22,
+       "statement_ids": [
+        22
+       ]
+      },
+      "text/plain": [
+       "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 22, Finished, Available)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Cost for the chat:\n",
+      "{'usage_including_cached_inference': {'total_cost': 0.0053950000000000005, 'gpt-4o-2024-05-13': {'cost': 0.0053950000000000005, 'prompt_tokens': 965, 'completion_tokens': 38, 'total_tokens': 1003}}, 'usage_excluding_cached_inference': {'total_cost': 0.0053950000000000005, 'gpt-4o-2024-05-13': {'cost': 0.0053950000000000005, 'prompt_tokens': 965, 'completion_tokens': 38, 'total_tokens': 1003}}}\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(f\"Cost for the chat:\\n{chat_result.cost}\")"
+   ]
+  }
+ ],
+ "metadata": {
@@ -802,24 +3081,17 @@
   "name": "synapse_pyspark"
  },
  "kernelspec": {
-  "display_name": "Synapse PySpark",
-  "language": "Python",
+  "display_name": "synapse_pyspark",
   "name": "synapse_pyspark"
  },
  "language_info": {
   "name": "python"
  },
- "notebook_environment": {},
 "nteract": {
  "version": "nteract-front-end@1.0.0"
 },
- "save_output": true,
 "spark_compute": {
- "compute_id": "/trident/default",
- "session_options": {
-  "conf": {},
-  "enableDebugMode": false
- }
+ "compute_id": "/trident/default"
 }
 },
 "nbformat": 4,