mirror of https://github.com/microsoft/autogen.git
879 lines
54 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"# Automatically Build Multi-agent System from Agent Library\n",
|
|
"\n",
|
|
"By: [Linxin Song](https://linxins97.github.io/), [Jieyu Zhang](https://jieyuz2.github.io/)\n",
|
|
"\n",
|
"In this notebook, we introduce a new feature for AutoBuild, `build_from_library`, which helps users build an automatic task-solving process powered by a multi-agent system from a pre-defined agent library.\n",
|
"Specifically, in `build_from_library`, we prompt an LLM to select useful agents from a pre-defined agent library and to generate configurations for those agents so that they can solve the user's task in a group chat."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "6264276d39875995"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ec78dda8e3826d8a",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": [
|
|
"## Requirement\n",
|
|
"\n",
|
"AutoBuild requires `pyautogen[autobuild]`, which can be installed with the following command:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "e8e9ae50658be975",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install pyautogen[autobuild]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Preparation and useful tools\n",
|
"We need to specify `config_file_or_env` and `llm_config`, which hold the backbone LLM configurations."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "176c200804af63f3"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"id": "2505f029423b21ab",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-26T16:58:02.762702600Z",
|
|
"start_time": "2023-12-26T16:58:02.472073Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import json\n",
|
|
"\n",
|
|
"import autogen\n",
|
|
"from autogen.agentchat.contrib.agent_builder import AgentBuilder\n",
|
|
"\n",
|
|
"config_file_or_env = \"OAI_CONFIG_LIST\" # modify path\n",
|
|
"llm_config = {\"temperature\": 0}\n",
|
|
"config_list = autogen.config_list_from_json(config_file_or_env, filter_dict={\"model\": [\"gpt-4-1106-preview\", \"gpt-4\"]})\n",
|
|
"\n",
|
|
"\n",
|
|
"def start_task(execution_task: str, agent_list: list):\n",
|
"    group_chat = autogen.GroupChat(agents=agent_list, messages=[], max_round=12)\n",
"    manager = autogen.GroupChatManager(groupchat=group_chat, llm_config={\"config_list\": config_list, **llm_config})\n",
"    agent_list[0].initiate_chat(manager, message=execution_task)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Example for generating an agent library\n",
|
"Here, we show an example of generating an agent library from a pre-defined list of agent names by prompting `gpt-4`. You can also prepare a handcrafted library yourself.\n",
|
|
"\n",
|
"A library contains each agent's name and profile. The profile is a brief description of the agent's characteristics. Because all agents' names and profiles are passed to `gpt-4`, which chooses the best agents for the task, each profile should be concise yet informative. The system message of each selected agent is then completed from its name and short profile, as in the previous `build`.\n",
|
|
"\n",
|
"First, we define a prompt template and a list of agent names:"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "5fb3db8885dd6ee6"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "68315f6ec912c58a",
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:39:03.317527600Z",
|
|
"start_time": "2023-12-23T07:39:03.276859600Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"AGENT_SYS_MSG_PROMPT = \"\"\"Considering the following position:\n",
|
|
"\n",
|
|
"POSITION: {position}\n",
|
|
"\n",
|
"What requirements should this position satisfy?\n",
|
|
"\n",
|
|
"Hint:\n",
|
|
"# Your answer should be in one sentence.\n",
|
|
"# Your answer should be natural, starting from \"As a ...\".\n",
|
|
"# People with the above position need to complete a task given by a leader or colleague.\n",
|
|
"# People will work in a group chat, solving tasks with other people with different jobs.\n",
|
|
"# The modified requirement should not contain the code interpreter skill.\n",
|
|
"# Coding skill is limited to Python.\n",
|
|
"\"\"\"\n",
|
|
"\n",
|
"position_list = [\n",
"    \"Environmental_Scientist\",\n",
"    \"Astronomer\",\n",
"    \"Software_Developer\",\n",
"    \"Data_Analyst\",\n",
"    \"Journalist\",\n",
"    \"Teacher\",\n",
"    \"Lawyer\",\n",
"    \"Programmer\",\n",
"    \"Accountant\",\n",
"    \"Mathematician\",\n",
"    \"Physicist\",\n",
"    \"Biologist\",\n",
"    \"Chemist\",\n",
"    \"Statistician\",\n",
"    \"IT_Specialist\",\n",
"    \"Cybersecurity_Expert\",\n",
"    \"Artificial_Intelligence_Engineer\",\n",
"    \"Financial_Analyst\",\n",
"]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"Then we can prompt a `gpt-4` model to generate each agent's profile:"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "72b8e7d9d334a5c2"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"outputs": [],
|
|
"source": [
|
|
"build_manager = autogen.OpenAIWrapper(config_list=config_list)\n",
|
|
"sys_msg_list = []\n",
|
|
"\n",
|
"# Ask the LLM for a one-sentence profile of each position.\n",
"for pos in position_list:\n",
"    resp_agent_sys_msg = (\n",
"        build_manager.create(\n",
"            messages=[\n",
"                {\n",
"                    \"role\": \"user\",\n",
"                    # The template only contains the {position} placeholder.\n",
"                    \"content\": AGENT_SYS_MSG_PROMPT.format(position=pos),\n",
"                }\n",
"            ]\n",
"        )\n",
"        .choices[0]\n",
"        .message.content\n",
"    )\n",
"    sys_msg_list.append({\"name\": pos, \"profile\": resp_agent_sys_msg})"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:40:01.703372Z",
|
|
"start_time": "2023-12-23T07:39:04.472589200Z"
|
|
}
|
|
},
|
|
"id": "8fbfef9268fc5191"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"The generated profile will have the following format:"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "9e26c6db4befacc5"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": "[{'name': 'Environmental_Scientist',\n 'profile': 'As an Environmental Scientist, the candidate should possess a strong background in environmental science, demonstrate the ability to effectively collaborate with a diverse team in a group chat to solve tasks, and have proficiency in Python for data analysis, without the need for code interpretation skills.'},\n {'name': 'Astronomer',\n 'profile': 'As an astronomer required to work collaboratively in a group chat setting, the candidate must possess strong proficiency in Python for data analysis and research purposes, alongside the ability to efficiently complete tasks assigned by leadership or colleagues without the need for code interpretation skills.'},\n {'name': 'Software_Developer',\n 'profile': 'As a Software Developer for this position, you must be able to work collaboratively in a group chat environment to complete tasks assigned by a leader or colleague, primarily using Python programming expertise, excluding the need for code interpretation skills.'},\n {'name': 'Data_Analyst',\n 'profile': 'As a Data Analyst for this position, you must be adept at analyzing data using Python, completing tasks assigned by leaders or colleagues, and collaboratively solving problems in a group chat setting with professionals of various roles.'},\n {'name': 'Journalist',\n 'profile': 'As a journalist in this position, you must possess strong collaboration and communication abilities to efficiently complete tasks assigned by leaders or colleagues within a group chat environment, without the need for code interpretation skills, although a basic understanding of Python is preferred.'},\n {'name': 'Teacher',\n 'profile': \"As a teacher, you need to possess a bachelor's degree in education or a related field, have a valid teaching certificate, be able to complete assignments provided by supervisors or colleagues, work collaboratively in group chats with professionals from various fields, and have a basic 
understanding of Python for educational purposes, excluding the need to interpret code.\"},\n {'name': 'Lawyer',\n 'profile': 'As a lawyer in this position, you must possess a Juris Doctor degree, be licensed to practice law, have strong analytical and communication skills, be able to complete tasks assigned by leaders or colleagues, and collaborate effectively in group chat environments with professionals across various disciplines, while having a basic understanding of Python for task-related purposes, excluding code interpretation.'},\n {'name': 'Programmer',\n 'profile': 'As a Programmer for this position, you should be proficient in Python, able to effectively collaborate and solve problems within a group chat environment, and complete tasks assigned by leaders or colleagues without requiring expertise in code interpretation.'},\n {'name': 'Accountant',\n 'profile': 'As an accountant in this position, one should possess a strong proficiency in accounting principles, the ability to effectively collaborate within team environments, such as group chats, to solve tasks, and have a basic understanding of Python for limited coding tasks, all while being able to follow directives from leaders and colleagues.'},\n {'name': 'Mathematician',\n 'profile': 'As a mathematician in this position, you should possess an advanced degree in mathematics, excel at collaborating and communicating within a group chat to solve complex tasks alongside professionals from various disciplines, and have proficiency in Python for any required computational work.'},\n {'name': 'Physicist',\n 'profile': \"As a physicist for this position, one must hold a strong foundation in physics principles, possess a minimum of a master's degree in physics or related fields, demonstrate proficiency in Python for task-specific computations, be willing to collaborate and solve problems within a multidisciplinary group chat, and not be required to interpret code from languages other than Python.\"},\n 
{'name': 'Biologist',\n 'profile': 'As a biologist for this position, one must hold a degree in biology or a related field, have proficiency in Python for data analysis, be able to complete tasks assigned by leaders or colleagues, and collaborate effectively in a group chat with professionals from various disciplines.'},\n {'name': 'Chemist',\n 'profile': 'As a chemist, one should possess a degree in chemistry or a related field, have strong analytical skills, work collaboratively within a team setting to complete tasks assigned by supervisors or peers, and have a basic proficiency in Python for any necessary data analysis.'},\n {'name': 'Statistician',\n 'profile': 'As a Statistician, the applicant should possess a strong background in statistics or mathematics, proficiency in Python for data analysis, the ability to work collaboratively in a team setting through group chats, and readiness to tackle and solve tasks delegated by supervisors or peers.'},\n {'name': 'IT_Specialist',\n 'profile': 'As an IT Specialist, you should possess strong problem-solving skills, be able to effectively collaborate within a team setting through group chats, complete tasks assigned by leaders or colleagues, and have proficiency in Python programming, excluding the need for code interpretation expertise.'},\n {'name': 'Cybersecurity_Expert',\n 'profile': 'As a Cybersecurity Expert, you must have the ability to collaborate in a group chat, completing tasks assigned by leaders or peers, and possess proficiency in Python, albeit without the need for code interpretation skills.'},\n {'name': 'Artificial_Intelligence_Engineer',\n 'profile': 'As an Artificial Intelligence Engineer, you should be adept in Python, able to fulfill tasks assigned by leaders or colleagues, and capable of collaboratively solving problems in a group chat with diverse professionals.'},\n {'name': 'Financial_Analyst',\n 'profile': 'As a Financial Analyst, one must possess strong analytical and problem-solving 
abilities, be proficient in Python for data analysis, have excellent communication skills to collaborate effectively in group chats, and be capable of completing assignments delegated by leaders or colleagues.'}]"
|
|
},
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"sys_msg_list"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:40:01.712399300Z",
|
|
"start_time": "2023-12-23T07:40:01.707400200Z"
|
|
}
|
|
},
|
|
"id": "8ede1d7088eb183d"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
"We can save the generated agents' information to a JSON file."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "256dd32b03a7a172"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"outputs": [],
|
|
"source": [
|
"# Use a context manager so the file is flushed and closed after writing.\n",
"with open(\"./agent_library_example.json\", \"w\") as f:\n",
"    json.dump(sys_msg_list, f, indent=4)"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:40:01.750855900Z",
|
|
"start_time": "2023-12-23T07:40:01.710399600Z"
|
|
}
|
|
},
|
|
"id": "53111125938845cf"
|
|
},
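A handcrafted library only needs to follow the same name/profile layout as the generated one. The sketch below is illustrative: the two entries and the file name `handcrafted_library.json` are made up for this example, and the `validate_library` helper is hypothetical rather than part of AutoGen.

```python
import json
import os
import tempfile

# Hypothetical hand-written entries; each mirrors the {"name", "profile"} layout used above.
handcrafted_library = [
    {
        "name": "Data_Analyst",
        "profile": "As a Data Analyst, you analyze data with Python and solve tasks with colleagues in a group chat.",
    },
    {
        "name": "Technical_Writer",
        "profile": "As a Technical Writer, you turn the group's findings into clear, well-structured summaries.",
    },
]


def validate_library(entries):
    """Check that every entry is a dict with non-empty string 'name' and 'profile' fields."""
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError(f"Library entry {entry!r} must be a dict")
        for key in ("name", "profile"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"Library entry {entry!r} needs a non-empty '{key}' string")
    return True


validate_library(handcrafted_library)

# Save to a temporary directory so the example does not overwrite the generated library.
library_path = os.path.join(tempfile.mkdtemp(), "handcrafted_library.json")
with open(library_path, "w") as f:
    json.dump(handcrafted_library, f, indent=4)

# Read the file back to confirm that it round-trips.
with open(library_path) as f:
    reloaded = json.load(f)
print([entry["name"] for entry in reloaded])
```

Validating before saving catches a missing or empty profile early, before the library is handed to the builder.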
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Build agents from library (by LLM)\n",
|
"Here, we show how to build agents from the generated library. As with the previous `build`, we need to specify a `building_task` so that the build manager knows which agents to select from the library.\n",
|
|
"\n",
|
"We also need to specify `library_path_or_json`, which can be either a path to the library file or a JSON string containing the agents' configurations. Here, we use the previously saved file as the library path."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "cfd883b79a3bd932"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"outputs": [],
|
|
"source": [
|
|
"library_path_or_json = \"./agent_library_example.json\"\n",
|
|
"building_task = \"Find a paper on arxiv by programming, and analyze its application in some domain. For example, find a recent paper about gpt-4 on arxiv and find its potential applications in software.\""
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:40:01.752918500Z",
|
|
"start_time": "2023-12-23T07:40:01.735461Z"
|
|
}
|
|
},
|
|
"id": "8963a8709c8e92e2"
|
|
},
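To make the "path or JSON string" option concrete, here is a minimal sketch of how such an argument could be resolved. The `load_library` helper is hypothetical and written only for illustration; it is an assumption about one reasonable approach, and AgentBuilder's own handling may differ.

```python
import json
import os


def load_library(library_path_or_json):
    """Return the list of agent entries from either a file path or a JSON string.

    Illustrative helper only; AgentBuilder's own handling may differ.
    """
    if os.path.isfile(library_path_or_json):
        with open(library_path_or_json) as f:
            return json.load(f)
    # Not an existing file, so treat the argument as a JSON document.
    return json.loads(library_path_or_json)


# Passing the library inline as a JSON string instead of a path.
inline_library = json.dumps(
    [{"name": "Programmer", "profile": "As a Programmer, you write Python to solve tasks in a group chat."}]
)
agents = load_library(inline_library)
print(agents[0]["name"])
```

An inline JSON string is convenient when the library is assembled in memory and never written to disk.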
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
"Then, we can call `build_from_library` on the `AgentBuilder` to generate a list of agents from the library and let them complete the user's `execution_task` in a group chat."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "72656a8d0c1a9b12"
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Looking for suitable agents in ./agent_library_example.json...\n",
|
|
"['Software_Developer', 'Programmer', 'Artificial_Intelligence_Engineer'] are selected.\n",
|
|
"Preparing configuration for Software_Developer...\n",
|
|
"Preparing configuration for Programmer...\n",
|
|
"Preparing configuration for Artificial_Intelligence_Engineer...\n",
|
|
"Creating agent Software_Developer with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Programmer with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Artificial_Intelligence_Engineer with backbone gpt-4-1106-preview...\n",
|
|
"Adding user console proxy...\n",
|
|
"\u001b[33mUser_console_and_Python_code_interpreter\u001b[0m (to chat_manager):\n",
|
|
"Find a recent paper about explainable AI on arxiv and find its potential applications in medical.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mArtificial_Intelligence_Engineer\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"To find a recent paper about explainable AI on arXiv and explore its potential applications in the medical field, I will perform the following steps:\n",
|
|
"\n",
|
|
"1. Write a Python script to interact with the arXiv API to search for recent papers related to explainable AI.\n",
|
|
"2. Analyze the abstract and content of the retrieved paper to understand its implications and potential applications in the medical domain.\n",
|
|
"3. Discuss the findings with the team.\n",
|
|
"\n",
|
|
"Let's start with step 1. Below is a Python script that uses the `arxiv` library to search for papers related to explainable AI. If you don't have the `arxiv` library installed, you can install it using `pip install arxiv`.\n",
|
|
"\n",
|
|
"```python\n",
|
|
"import arxiv\n",
|
|
"\n",
|
|
"# Define the search query and parameters\n",
|
|
"search_query = 'cat:cs.AI AND ti:explainable'\n",
|
|
"max_results = 5\n",
|
|
"sort_by = arxiv.SortCriterion.SubmittedDate\n",
|
|
"\n",
|
|
"# Search for papers on arXiv\n",
|
|
"search = arxiv.Search(\n",
|
|
" query=search_query,\n",
|
|
" max_results=max_results,\n",
|
|
" sort_by=sort_by,\n",
|
|
" sort_order=arxiv.SortOrder.Descending\n",
|
|
")\n",
|
|
"\n",
|
|
"# Fetch the results\n",
|
|
"papers = list(search.results())\n",
|
|
"\n",
|
|
"# Print out the title and summary of the most recent paper\n",
|
|
"if papers:\n",
|
|
" recent_paper = papers[0]\n",
|
|
" print(f\"Title: {recent_paper.title}\\n\")\n",
|
|
" print(f\"Authors: {', '.join(author.name for author in recent_paper.authors)}\\n\")\n",
|
|
" print(f\"Abstract: {recent_paper.summary}\\n\")\n",
|
|
" print(f\"Published: {recent_paper.published}\\n\")\n",
|
|
" print(f\"Link: {recent_paper.entry_id}\\n\")\n",
|
|
"else:\n",
|
|
" print(\"No papers found on the topic of explainable AI.\")\n",
|
|
"```\n",
|
|
"\n",
|
|
"Please note that this script is meant to be run in a Python environment where you have the necessary permissions and capabilities to install and use external libraries. If you are ready to proceed, you can run this script in your Python environment to retrieve the most recent papers on explainable AI from arXiv.\n",
|
|
"\n",
|
|
"Once we have the paper, we can move on to step 2 and analyze its content for potential medical applications. Since I cannot execute Python code directly, you would need to run the script on your local machine or development environment. After running the script, you can share the paper's title and abstract here, and we can discuss its potential applications in the medical field.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[31m\n",
|
|
">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n",
|
|
"\u001b[33mUser_console_and_Python_code_interpreter\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"exitcode: 0 (execution succeeded)\n",
|
|
"Code output: \n",
|
|
"Title: Transparency and Privacy: The Role of Explainable AI and Federated Learning in Financial Fraud Detection\n",
|
|
"\n",
|
|
"Authors: Tomisin Awosika, Raj Mani Shukla, Bernardi Pranggono\n",
|
|
"\n",
|
|
"Abstract: Fraudulent transactions and how to detect them remain a significant problem\n",
|
|
"for financial institutions around the world. The need for advanced fraud\n",
|
|
"detection systems to safeguard assets and maintain customer trust is paramount\n",
|
|
"for financial institutions, but some factors make the development of effective\n",
|
|
"and efficient fraud detection systems a challenge. One of such factors is the\n",
|
|
"fact that fraudulent transactions are rare and that many transaction datasets\n",
|
|
"are imbalanced; that is, there are fewer significant samples of fraudulent\n",
|
|
"transactions than legitimate ones. This data imbalance can affect the\n",
|
|
"performance or reliability of the fraud detection model. Moreover, due to the\n",
|
|
"data privacy laws that all financial institutions are subject to follow,\n",
|
|
"sharing customer data to facilitate a higher-performing centralized model is\n",
|
|
"impossible. Furthermore, the fraud detection technique should be transparent so\n",
|
|
"that it does not affect the user experience. Hence, this research introduces a\n",
|
|
"novel approach using Federated Learning (FL) and Explainable AI (XAI) to\n",
|
|
"address these challenges. FL enables financial institutions to collaboratively\n",
|
|
"train a model to detect fraudulent transactions without directly sharing\n",
|
|
"customer data, thereby preserving data privacy and confidentiality. Meanwhile,\n",
|
|
"the integration of XAI ensures that the predictions made by the model can be\n",
|
|
"understood and interpreted by human experts, adding a layer of transparency and\n",
|
|
"trust to the system. Experimental results, based on realistic transaction\n",
|
|
"datasets, reveal that the FL-based fraud detection system consistently\n",
|
|
"demonstrates high performance metrics. This study grounds FL's potential as an\n",
|
|
"effective and privacy-preserving tool in the fight against fraud.\n",
|
|
"\n",
|
|
"Published: 2023-12-20 18:26:59+00:00\n",
|
|
"\n",
|
|
"Link: http://arxiv.org/abs/2312.13334v1\n",
|
|
"\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mArtificial_Intelligence_Engineer\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"The paper you've found, titled \"Transparency and Privacy: The Role of Explainable AI and Federated Learning in Financial Fraud Detection,\" discusses the use of Explainable AI (XAI) and Federated Learning (FL) in the context of financial fraud detection. While the paper's primary focus is on the financial industry, the concepts of XAI and FL can be applied to the medical field as well.\n",
|
|
"\n",
|
|
"Potential applications of XAI and FL in the medical field include:\n",
|
|
"\n",
|
|
"1. **Patient Data Privacy**: Similar to financial institutions, healthcare providers must adhere to strict privacy regulations like HIPAA in the United States. FL can enable different healthcare institutions to collaboratively train machine learning models on patient data without sharing the data itself, thus preserving patient privacy.\n",
|
|
"\n",
|
|
"2. **Disease Diagnosis**: XAI can help in developing transparent AI systems that assist doctors in diagnosing diseases by providing interpretable predictions. This transparency is crucial for gaining the trust of medical professionals and patients when AI is used to support decision-making in diagnoses.\n",
|
|
"\n",
|
|
"3. **Personalized Medicine**: By using FL, medical researchers can develop more generalized and robust models for personalized medicine, as they can learn from a wide range of patient data across different institutions without compromising privacy.\n",
|
|
"\n",
|
|
"4. **Fraud Detection in Healthcare**: The approach discussed in the paper can be adapted to detect fraudulent activities within healthcare, such as insurance fraud or prescription fraud, by training models across various healthcare providers.\n",
|
|
"\n",
|
|
"5. **Clinical Trial Research**: FL can facilitate the analysis of clinical trial data from multiple sources, enhancing the development of new drugs and treatments while maintaining the confidentiality of trial participants.\n",
|
|
"\n",
|
|
"6. **Predictive Analytics**: XAI can improve predictive analytics in healthcare by providing insights into the risk factors and predictors of patient outcomes, making it easier for clinicians to understand and trust the predictions made by AI models.\n",
|
|
"\n",
|
|
"7. **Medical Imaging**: In medical imaging, XAI can help radiologists and other specialists understand the reasoning behind AI-generated insights, which can be critical for early detection and treatment planning.\n",
|
|
"\n",
|
|
"The integration of XAI ensures that the AI's decision-making process in these applications is transparent, which is essential for clinical acceptance. Meanwhile, FL addresses the challenge of leveraging large-scale, diverse datasets while respecting privacy concerns, which is particularly relevant in the medical field due to the sensitive nature of health data.\n",
|
|
"\n",
|
|
"To discuss these findings with your team, you can highlight the parallels between the financial and medical domains in terms of data privacy and the need for transparency in AI systems. The paper's approach can be a starting point for developing similar systems in healthcare that benefit from the privacy-preserving and explainable nature of the technologies discussed.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mUser_console_and_Python_code_interpreter\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mSoftware_Developer\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"TERMINATE\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"All agents have been cleared.\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
"new_builder = AgentBuilder(\n",
"    config_file_or_env=config_file_or_env, builder_model=\"gpt-4-1106-preview\", agent_model=\"gpt-4-1106-preview\"\n",
")\n",
"agent_list, _ = new_builder.build_from_library(building_task, library_path_or_json, llm_config)\n",
"start_task(\n",
"    execution_task=\"Find a recent paper about explainable AI on arxiv and find its potential applications in medical.\",\n",
"    agent_list=agent_list,\n",
")\n",
"new_builder.clear_all_agents()"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-23T07:46:02.075542200Z",
|
|
"start_time": "2023-12-23T07:43:55.489042900Z"
|
|
}
|
|
},
|
|
"id": "5c669b76b2c9b750"
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"source": [
|
|
"## Build agents from library (by profile-task similarity)\n",
|
"We also support using embedding similarity to select agents. You can use a [Sentence-Transformers model](https://www.sbert.net/docs/pretrained_models.html) as an embedding extractor, and AgentBuilder will select the agents whose profiles are most similar to the building task by comparing their embeddings. This reduces LLM usage but may select agents less accurately."
|
|
],
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"id": "c7a10e6fa00a5a0d"
|
|
},
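The idea of profile-task similarity can be sketched without any model at all. In the example below, a toy bag-of-words vector stands in for a real Sentence-Transformers embedding so that the code runs with the standard library only. The helpers `embed`, `cosine_similarity`, and `select_agents` and the three-entry library are hypothetical; this is not AgentBuilder's implementation, only an illustration of ranking profiles by cosine similarity to the task.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_agents(task, library, top_k=2):
    """Rank library entries by profile-task similarity and keep the top_k names."""
    task_vec = embed(task)
    ranked = sorted(
        library,
        key=lambda entry: cosine_similarity(task_vec, embed(entry["profile"])),
        reverse=True,
    )
    return [entry["name"] for entry in ranked[:top_k]]


# Hypothetical mini library in the same name/profile layout as the generated one.
library = [
    {"name": "Programmer", "profile": "Writes Python programs to find and process papers on arxiv."},
    {"name": "Lawyer", "profile": "Reviews contracts and gives legal advice."},
    {"name": "Data_Analyst", "profile": "Uses Python to analyze data and report findings."},
]
task = "Find a paper on arxiv by programming in Python and analyze it."
print(select_agents(task, library, top_k=2))
```

Because the ranking depends only on vector overlap, a profile that shares wording with the task can outrank a more suitable agent, which is one way this approach can be less accurate than LLM-based selection.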
|
|
{
|
|
"cell_type": "code",
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Looking for suitable agents in ./agent_library_example.json...\n",
|
|
"['Programmer', 'Mathematician', 'Software_Developer', 'Physicist', 'Data_Analyst'] are selected.\n",
|
|
"Preparing configuration for Programmer...\n",
|
|
"Preparing configuration for Mathematician...\n",
|
|
"Preparing configuration for Software_Developer...\n",
|
|
"Preparing configuration for Physicist...\n",
|
|
"Preparing configuration for Data_Analyst...\n",
|
|
"Creating agent Programmer with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Mathematician with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Software_Developer with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Physicist with backbone gpt-4-1106-preview...\n",
|
|
"Creating agent Data_Analyst with backbone gpt-4-1106-preview...\n",
|
|
"Adding user console proxy...\n",
|
|
"\u001b[33mUser_console_and_Python_code_interpreter\u001b[0m (to chat_manager):\n",
|
|
"Find a recent paper about gpt-4 on arxiv and find its potential applications in software.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mProgrammer\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"To find a recent paper about GPT-4 on arXiv and analyze its potential applications in software, I would typically write a Python script that uses the arXiv API to search for papers related to GPT-4. However, as an AI, I can't execute scripts or make live API calls. Instead, I can guide you through the process of writing such a script.\n",
|
|
"\n",
|
|
"Here's a Python script template that you can use to search for papers on arXiv:\n",
|
|
"\n",
|
|
"```python\n",
|
|
"import requests\n",
|
|
"import feedparser\n",
|
|
"\n",
|
|
"# Define the search parameters\n",
|
|
"base_url = 'http://export.arxiv.org/api/query?'\n",
|
|
"search_query = 'all:gpt-4' # Search for GPT-4 in all fields\n",
|
|
"start = 0 # Start at the first result\n",
|
|
"max_results = 10 # Maximum number of results\n",
|
|
"\n",
|
|
"# Construct the query with the search parameters\n",
|
|
"query = f'search_query={search_query}&start={start}&max_results={max_results}'\n",
|
|
"url = base_url + query\n",
|
|
"\n",
|
|
"# Perform the GET request\n",
|
|
"response = requests.get(url)\n",
|
|
"\n",
|
|
"# Parse the response using feedparser\n",
|
|
"feed = feedparser.parse(response.content)\n",
|
|
"\n",
|
|
"# Loop through the entries, and print out information\n",
|
|
"for entry in feed.entries:\n",
|
|
" print('Title:', entry.title)\n",
|
|
" print('Authors:', ', '.join(author.name for author in entry.authors))\n",
|
|
" print('Abstract:', entry.summary)\n",
|
|
" print('arXiv ID:', entry.id.split('/abs/')[-1])\n",
|
|
" print('Publication Date:', entry.published)\n",
|
|
" print('Link:', entry.link)\n",
|
|
" print('\\n')\n",
|
|
"\n",
|
|
"# Note: To analyze the potential applications in software, you would need to\n",
|
|
"# read the abstracts and possibly the full papers to determine their relevance\n",
|
|
"# and applications. This part cannot be fully automated and requires human expertise.\n",
|
|
"```\n",
|
|
"\n",
|
|
"To run this script, you'll need Python installed on your machine along with the `requests` and `feedparser` libraries, which you can install using pip:\n",
|
|
"\n",
|
|
"```bash\n",
|
|
"pip install requests feedparser\n",
|
|
"```\n",
|
|
"\n",
|
|
"After running the script, you'll get a list of recent papers related to GPT-4. You would then need to manually read through the abstracts and potentially the full papers to understand their potential applications in software.\n",
|
|
"\n",
|
|
"Remember, the actual applications will depend on the content of the papers, which might include but are not limited to natural language processing, content generation, automation, decision support systems, and more.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[31m\n",
|
|
">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n",
|
|
"\u001b[31m\n",
|
|
">>>>>>>> EXECUTING CODE BLOCK 1 (inferred language is bash)...\u001b[0m\n",
|
|
"\u001b[33mUser_console_and_Python_code_interpreter\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"exitcode: 0 (execution succeeded)\n",
|
|
"Code output: \n",
|
|
"Title: Can LLMs like GPT-4 outperform traditional AI tools in dementia\n",
|
|
" diagnosis? Maybe, but not today\n",
|
|
"Authors: Zhuo Wang, Rongzhen Li, Bowen Dong, Jie Wang, Xiuxing Li, Ning Liu, Chenhui Mao, Wei Zhang, Liling Dong, Jing Gao, Jianyong Wang\n",
|
|
"Abstract: Recent investigations show that large language models (LLMs), specifically\n",
|
|
"GPT-4, not only have remarkable capabilities in common Natural Language\n",
|
|
"Processing (NLP) tasks but also exhibit human-level performance on various\n",
|
|
"professional and academic benchmarks. However, whether GPT-4 can be directly\n",
|
|
"used in practical applications and replace traditional artificial intelligence\n",
|
|
"(AI) tools in specialized domains requires further experimental validation. In\n",
|
|
"this paper, we explore the potential of LLMs such as GPT-4 to outperform\n",
|
|
"traditional AI tools in dementia diagnosis. Comprehensive comparisons between\n",
|
|
"GPT-4 and traditional AI tools are conducted to examine their diagnostic\n",
|
|
"accuracy in a clinical setting. Experimental results on two real clinical\n",
|
|
"datasets show that, although LLMs like GPT-4 demonstrate potential for future\n",
|
|
"advancements in dementia diagnosis, they currently do not surpass the\n",
|
|
"performance of traditional AI tools. The interpretability and faithfulness of\n",
|
|
"GPT-4 are also evaluated by comparison with real doctors. We discuss the\n",
|
|
"limitations of GPT-4 in its current state and propose future research\n",
|
|
"directions to enhance GPT-4 in dementia diagnosis.\n",
|
|
"arXiv ID: 2306.01499v1\n",
|
|
"Publication Date: 2023-06-02T12:47:45Z\n",
|
|
"Link: http://arxiv.org/abs/2306.01499v1\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: GPT-4 Can't Reason\n",
|
|
"Authors: Konstantine Arkoudas\n",
|
|
"Abstract: GPT-4 was released in March 2023 to wide acclaim, marking a very substantial\n",
|
|
"improvement across the board over GPT-3.5 (OpenAI's previously best model,\n",
|
|
"which had powered the initial release of ChatGPT). However, despite the\n",
|
|
"genuinely impressive improvement, there are good reasons to be highly skeptical\n",
|
|
"of GPT-4's ability to reason. This position paper discusses the nature of\n",
|
|
"reasoning; criticizes the current formulation of reasoning problems in the NLP\n",
|
|
"community, as well as the way in which LLM reasoning performance is currently\n",
|
|
"evaluated; introduces a small collection of 21 diverse reasoning problems; and\n",
|
|
"performs a detailed qualitative evaluation of GPT-4's performance on those\n",
|
|
"problems. Based on this analysis, the paper concludes that, despite its\n",
|
|
"occasional flashes of analytical brilliance, GPT-4 at present is utterly\n",
|
|
"incapable of reasoning.\n",
|
|
"arXiv ID: 2308.03762v2\n",
|
|
"Publication Date: 2023-07-21T17:04:25Z\n",
|
|
"Link: http://arxiv.org/abs/2308.03762v2\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4\n",
|
|
"Authors: Hanmeng Liu, Ruoxi Ning, Zhiyang Teng, Jian Liu, Qiji Zhou, Yue Zhang\n",
|
|
"Abstract: Harnessing logical reasoning ability is a comprehensive natural language\n",
|
|
"understanding endeavor. With the release of Generative Pretrained Transformer 4\n",
|
|
"(GPT-4), highlighted as \"advanced\" at reasoning tasks, we are eager to learn\n",
|
|
"the GPT-4 performance on various logical reasoning tasks. This report analyses\n",
|
|
"multiple logical reasoning datasets, with popular benchmarks like LogiQA and\n",
|
|
"ReClor, and newly-released datasets like AR-LSAT. We test the multi-choice\n",
|
|
"reading comprehension and natural language inference tasks with benchmarks\n",
|
|
"requiring logical reasoning. We further construct a logical reasoning\n",
|
|
"out-of-distribution dataset to investigate the robustness of ChatGPT and GPT-4.\n",
|
|
"We also make a performance comparison between ChatGPT and GPT-4. Experiment\n",
|
|
"results show that ChatGPT performs significantly better than the RoBERTa\n",
|
|
"fine-tuning method on most logical reasoning benchmarks. With early access to\n",
|
|
"the GPT-4 API we are able to conduct intense experiments on the GPT-4 model.\n",
|
|
"The results show GPT-4 yields even higher performance on most logical reasoning\n",
|
|
"datasets. Among benchmarks, ChatGPT and GPT-4 do relatively well on well-known\n",
|
|
"datasets like LogiQA and ReClor. However, the performance drops significantly\n",
|
|
"when handling newly released and out-of-distribution datasets. Logical\n",
|
|
"reasoning remains challenging for ChatGPT and GPT-4, especially on\n",
|
|
"out-of-distribution and natural language inference datasets. We release the\n",
|
|
"prompt-style logical reasoning datasets as a benchmark suite and name it\n",
|
|
"LogiEval.\n",
|
|
"arXiv ID: 2304.03439v3\n",
|
|
"Publication Date: 2023-04-07T01:37:45Z\n",
|
|
"Link: http://arxiv.org/abs/2304.03439v3\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: How is ChatGPT's behavior changing over time?\n",
|
|
"Authors: Lingjiao Chen, Matei Zaharia, James Zou\n",
|
|
"Abstract: GPT-3.5 and GPT-4 are the two most widely used large language model (LLM)\n",
|
|
"services. However, when and how these models are updated over time is opaque.\n",
|
|
"Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on\n",
|
|
"several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3)\n",
|
|
"opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating\n",
|
|
"code, 6) US Medical License tests, and 7) visual reasoning. We find that the\n",
|
|
"performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.\n",
|
|
"For example, GPT-4 (March 2023) was reasonable at identifying prime vs.\n",
|
|
"composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same\n",
|
|
"questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity\n",
|
|
"to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in\n",
|
|
"June than in March in this task. GPT-4 became less willing to answer sensitive\n",
|
|
"questions and opinion survey questions in June than in March. GPT-4 performed\n",
|
|
"better at multi-hop questions in June than in March, while GPT-3.5's\n",
|
|
"performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting\n",
|
|
"mistakes in code generation in June than in March. We provide evidence that\n",
|
|
"GPT-4's ability to follow user instructions has decreased over time, which is\n",
|
|
"one common factor behind the many behavior drifts. Overall, our findings show\n",
|
|
"that the behavior of the \"same\" LLM service can change substantially in a\n",
|
|
"relatively short amount of time, highlighting the need for continuous\n",
|
|
"monitoring of LLMs.\n",
|
|
"arXiv ID: 2307.09009v3\n",
|
|
"Publication Date: 2023-07-18T06:56:08Z\n",
|
|
"Link: http://arxiv.org/abs/2307.09009v3\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Gpt-4: A Review on Advancements and Opportunities in Natural Language\n",
|
|
" Processing\n",
|
|
"Authors: Jawid Ahmad Baktash, Mursal Dawodi\n",
|
|
"Abstract: Generative Pre-trained Transformer 4 (GPT-4) is the fourth-generation\n",
|
|
"language model in the GPT series, developed by OpenAI, which promises\n",
|
|
"significant advancements in the field of natural language processing (NLP). In\n",
|
|
"this research article, we have discussed the features of GPT-4, its potential\n",
|
|
"applications, and the challenges that it might face. We have also compared\n",
|
|
"GPT-4 with its predecessor, GPT-3. GPT-4 has a larger model size (more than one\n",
|
|
"trillion), better multilingual capabilities, improved contextual understanding,\n",
|
|
"and reasoning capabilities than GPT-3. Some of the potential applications of\n",
|
|
"GPT-4 include chatbots, personal assistants, language translation, text\n",
|
|
"summarization, and question-answering. However, GPT-4 poses several challenges\n",
|
|
"and limitations such as computational requirements, data requirements, and\n",
|
|
"ethical concerns.\n",
|
|
"arXiv ID: 2305.03195v1\n",
|
|
"Publication Date: 2023-05-04T22:46:43Z\n",
|
|
"Link: http://arxiv.org/abs/2305.03195v1\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Is GPT-4 a Good Data Analyst?\n",
|
|
"Authors: Liying Cheng, Xingxuan Li, Lidong Bing\n",
|
|
"Abstract: As large language models (LLMs) have demonstrated their powerful capabilities\n",
|
|
"in plenty of domains and tasks, including context understanding, code\n",
|
|
"generation, language generation, data storytelling, etc., many data analysts\n",
|
|
"may raise concerns if their jobs will be replaced by artificial intelligence\n",
|
|
"(AI). This controversial topic has drawn great attention in public. However, we\n",
|
|
"are still at a stage of divergent opinions without any definitive conclusion.\n",
|
|
"Motivated by this, we raise the research question of \"is GPT-4 a good data\n",
|
|
"analyst?\" in this work and aim to answer it by conducting head-to-head\n",
|
|
"comparative studies. In detail, we regard GPT-4 as a data analyst to perform\n",
|
|
"end-to-end data analysis with databases from a wide range of domains. We\n",
|
|
"propose a framework to tackle the problems by carefully designing the prompts\n",
|
|
"for GPT-4 to conduct experiments. We also design several task-specific\n",
|
|
"evaluation metrics to systematically compare the performance between several\n",
|
|
"professional human data analysts and GPT-4. Experimental results show that\n",
|
|
"GPT-4 can achieve comparable performance to humans. We also provide in-depth\n",
|
|
"discussions about our results to shed light on further studies before reaching\n",
|
|
"the conclusion that GPT-4 can replace data analysts.\n",
|
|
"arXiv ID: 2305.15038v2\n",
|
|
"Publication Date: 2023-05-24T11:26:59Z\n",
|
|
"Link: http://arxiv.org/abs/2305.15038v2\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Graph Neural Architecture Search with GPT-4\n",
|
|
"Authors: Haishuai Wang, Yang Gao, Xin Zheng, Peng Zhang, Hongyang Chen, Jiajun Bu\n",
|
|
"Abstract: Graph Neural Architecture Search (GNAS) has shown promising results in\n",
|
|
"automatically designing graph neural networks. However, GNAS still requires\n",
|
|
"intensive human labor with rich domain knowledge to design the search space and\n",
|
|
"search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new\n",
|
|
"GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The\n",
|
|
"basic idea of our method is to design a new class of prompts for GPT-4 to guide\n",
|
|
"GPT-4 toward the generative task of graph neural architectures. The prompts\n",
|
|
"consist of descriptions of the search space, search strategy, and search\n",
|
|
"feedback of GNAS. By iteratively running GPT-4 with the prompts, GPT4GNAS\n",
|
|
"generates more accurate graph neural networks with fast convergence.\n",
|
|
"Experimental results show that embedding GPT-4 into GNAS outperforms the\n",
|
|
"state-of-the-art GNAS methods.\n",
|
|
"arXiv ID: 2310.01436v1\n",
|
|
"Publication Date: 2023-09-30T08:05:59Z\n",
|
|
"Link: http://arxiv.org/abs/2310.01436v1\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with\n",
|
|
" Code-based Self-Verification\n",
|
|
"Authors: Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li\n",
|
|
"Abstract: Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has\n",
|
|
"brought significant advancements in addressing math reasoning problems. In\n",
|
|
"particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter,\n",
|
|
"shows remarkable performance on challenging math datasets. In this paper, we\n",
|
|
"explore the effect of code on enhancing LLMs' reasoning capability by\n",
|
|
"introducing different constraints on the \\textit{Code Usage Frequency} of GPT-4\n",
|
|
"Code Interpreter. We found that its success can be largely attributed to its\n",
|
|
"powerful skills in generating and executing code, evaluating the output of code\n",
|
|
"execution, and rectifying its solution when receiving unreasonable outputs.\n",
|
|
"Based on this insight, we propose a novel and effective prompting method,\n",
|
|
"explicit \\uline{c}ode-based \\uline{s}elf-\\uline{v}erification~(CSV), to further\n",
|
|
"boost the mathematical reasoning potential of GPT-4 Code Interpreter. This\n",
|
|
"method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to\n",
|
|
"use code to self-verify its answers. In instances where the verification state\n",
|
|
"registers as ``False'', the model shall automatically amend its solution,\n",
|
|
"analogous to our approach of rectifying errors during a mathematics\n",
|
|
"examination. Furthermore, we recognize that the states of the verification\n",
|
|
"result indicate the confidence of a solution, which can improve the\n",
|
|
"effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we\n",
|
|
"achieve an impressive zero-shot accuracy on MATH dataset \\textbf{(53.9\\% $\\to$\n",
|
|
"84.3\\%)}.\n",
|
|
"arXiv ID: 2308.07921v1\n",
|
|
"Publication Date: 2023-08-15T17:58:45Z\n",
|
|
"Link: http://arxiv.org/abs/2308.07921v1\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?\n",
|
|
"Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme\n",
|
|
"Abstract: The authors explain where OpenAI got the tax law example in its livestream\n",
|
|
"demonstration of GPT-4, why GPT-4 got the wrong answer, and how it fails to\n",
|
|
"reliably calculate taxes.\n",
|
|
"arXiv ID: 2309.09992v1\n",
|
|
"Publication Date: 2023-09-15T20:00:27Z\n",
|
|
"Link: http://arxiv.org/abs/2309.09992v1\n",
|
|
"\n",
|
|
"\n",
|
|
"Title: Large Language Models' Understanding of Math: Source Criticism and\n",
|
|
" Extrapolation\n",
|
|
"Authors: Roozbeh Yousefzadeh, Xuenan Cao\n",
|
|
"Abstract: It has been suggested that large language models such as GPT-4 have acquired\n",
|
|
"some form of understanding beyond the correlations among the words in text\n",
|
|
"including some understanding of mathematics as well. Here, we perform a\n",
|
|
"critical inquiry into this claim by evaluating the mathematical understanding\n",
|
|
"of the GPT-4 model. Considering that GPT-4's training set is a secret, it is\n",
|
|
"not straightforward to evaluate whether the model's correct answers are based\n",
|
|
"on a mathematical understanding or based on replication of proofs that the\n",
|
|
"model has seen before. We specifically craft mathematical questions which their\n",
|
|
"formal proofs are not readily available on the web, proofs that are more likely\n",
|
|
"not seen by the GPT-4. We see that GPT-4 is unable to solve those problems\n",
|
|
"despite their simplicity. It is hard to find scientific evidence suggesting\n",
|
|
"that GPT-4 has acquired an understanding of even basic mathematical concepts. A\n",
|
|
"straightforward way to find failure modes of GPT-4 in theorem proving is to\n",
|
|
"craft questions where their formal proofs are not available on the web. Our\n",
|
|
"finding suggests that GPT-4's ability is to reproduce, rephrase, and polish the\n",
|
|
"mathematical proofs that it has seen before, and not in grasping mathematical\n",
|
|
"concepts. We also see that GPT-4's ability to prove mathematical theorems is\n",
|
|
"continuously expanding over time despite the claim that it is a fixed model. We\n",
|
|
"suggest that the task of proving mathematical theorems in formal language is\n",
|
|
"comparable to the methods used in search engines such as Google while\n",
|
|
"predicting the next word in a sentence may be a misguided approach, a recipe\n",
|
|
"that often leads to excessive extrapolation and eventual failures. Prompting\n",
|
|
"the GPT-4 over and over may benefit the GPT-4 and the OpenAI, but we question\n",
|
|
"whether it is valuable for machine learning or for theorem proving.\n",
|
|
"arXiv ID: 2311.07618v1\n",
|
|
"Publication Date: 2023-11-12T07:52:32Z\n",
|
|
"Link: http://arxiv.org/abs/2311.07618v1\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"Requirement already satisfied: requests in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (2.31.0)\n",
|
|
"Requirement already satisfied: feedparser in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (6.0.10)\n",
|
|
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests) (3.3.2)\n",
|
|
"Requirement already satisfied: idna<4,>=2.5 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests) (3.6)\n",
|
|
"Requirement already satisfied: urllib3<3,>=1.21.1 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests) (1.26.18)\n",
|
|
"Requirement already satisfied: certifi>=2017.4.17 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests) (2023.11.17)\n",
|
|
"Requirement already satisfied: sgmllib3k in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from feedparser) (1.0.0)\n",
|
|
"\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mSoftware_Developer\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"It appears that the code has been executed and the output lists several papers related to GPT-4 from arXiv. Let's analyze the potential applications in software based on the abstracts provided:\n",
|
|
"\n",
|
|
"1. **Dementia Diagnosis**: The first paper discusses the potential of GPT-4 in dementia diagnosis, comparing it with traditional AI tools. While GPT-4 shows promise, it currently does not outperform traditional methods. This suggests potential applications in healthcare software for diagnostic assistance.\n",
|
|
"\n",
|
|
"2. **Reasoning Ability**: The second paper criticizes GPT-4's reasoning ability, indicating that while it has improved over its predecessors, it still lacks true reasoning capabilities. This suggests that applications requiring deep reasoning, such as complex decision-making systems, may still be out of reach.\n",
|
|
"\n",
|
|
"3. **Logical Reasoning**: The third paper evaluates GPT-4's logical reasoning ability and finds that it performs well on known datasets but struggles with out-of-distribution and natural language inference datasets. This implies potential applications in enhancing logical reasoning in software systems, with a focus on improving robustness.\n",
|
|
"\n",
|
|
"4. **Behavior Over Time**: The fourth paper examines how GPT-3.5 and GPT-4's behavior changes over time, which is crucial for applications that require stability and predictability, such as automated monitoring systems.\n",
|
|
"\n",
|
|
"5. **Advancements in NLP**: The fifth paper reviews GPT-4's advancements and opportunities in NLP, suggesting applications in chatbots, personal assistants, language translation, text summarization, and question-answering systems.\n",
|
|
"\n",
|
|
"6. **Data Analysis**: The sixth paper explores whether GPT-4 can replace human data analysts. The results show that GPT-4 can perform comparably to humans, indicating potential applications in data analysis software tools.\n",
|
|
"\n",
|
|
"7. **Graph Neural Architecture Search**: The seventh paper introduces a method to use GPT-4 for graph neural architecture search, which could be applied in software for designing more accurate graph neural networks.\n",
|
|
"\n",
|
|
"8. **Math Word Problems**: The eighth paper discusses using GPT-4 for solving math word problems with self-verification, suggesting applications in educational software and tools that require mathematical problem-solving capabilities.\n",
|
|
"\n",
|
|
"9. **Tax Calculation**: The ninth paper questions GPT-4's ability to handle tax calculations, which is relevant for financial software that requires accurate and reliable tax computation.\n",
|
|
"\n",
|
|
"10. **Mathematical Understanding**: The tenth paper critically evaluates GPT-4's mathematical understanding, indicating that while it can reproduce known proofs, it struggles with novel problems. This suggests that while GPT-4 can assist in mathematical software, it may not yet be suitable for generating new mathematical insights.\n",
|
|
"\n",
|
|
"In summary, the potential applications of GPT-4 in software are vast, ranging from healthcare diagnostics to educational tools, data analysis, and NLP applications. However, limitations in reasoning, robustness, and novel problem-solving must be considered when integrating GPT-4 into software solutions.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mMathematician\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"TERMINATE\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"All agents have been cleared.\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Create a fresh builder; the builder and the generated agents use the same backbone model.\n",
"new_builder = AgentBuilder(\n",
|
|
" config_file_or_env=config_file_or_env, builder_model=\"gpt-4-1106-preview\", agent_model=\"gpt-4-1106-preview\"\n",
|
|
")\n",
|
|
"# With embedding_model set, library agents are selected by embedding similarity to the building task.\n",
"agent_list, _ = new_builder.build_from_library(\n",
|
|
" building_task, library_path_or_json, llm_config, embedding_model=\"all-mpnet-base-v2\"\n",
|
|
")\n",
|
|
"# Run the execution task with the agents selected from the library.\n",
"start_task(\n",
|
|
" execution_task=\"Find a recent paper about gpt-4 on arxiv and find its potential applications in software.\",\n",
|
|
" agent_list=agent_list,\n",
|
|
")\n",
|
|
"# Clear the agents built for this run once the task has finished.\n",
"new_builder.clear_all_agents()"
|
|
],
|
|
"metadata": {
|
|
"collapsed": false,
|
|
"ExecuteTime": {
|
|
"end_time": "2023-12-26T17:01:29.333975100Z",
|
|
"start_time": "2023-12-26T16:58:11.070813500Z"
|
|
}
|
|
},
|
|
"id": "521dc5f961efde59",
|
|
"execution_count": 3
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.11"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|