mirror of https://github.com/microsoft/autogen.git
546 lines
33 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# StateFlow: Build Workflows through State-Oriented Actions\n",
|
|
"\n",
|
"AutoGen offers conversable agents powered by LLMs, tools, or humans, which can perform tasks collectively via automated chat. In this notebook, we show how to use a group chat to build workflows with AutoGen agents from a state-oriented perspective.\n",
|
|
"\n",
|
|
"\n",
|
|
"````{=mdx}\n",
|
|
":::info Requirements\n",
|
|
"Install `pyautogen`:\n",
|
|
"```bash\n",
|
|
"pip install pyautogen\n",
|
|
"```\n",
|
|
"\n",
|
|
"For more information, please refer to the [installation guide](/docs/installation/).\n",
|
|
":::\n",
|
|
"````"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Set your API Endpoint\n",
|
|
"\n",
|
"The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import autogen\n",
|
|
"\n",
|
|
"config_list = autogen.config_list_from_json(\n",
|
|
" \"OAI_CONFIG_LIST\",\n",
|
|
" filter_dict={\n",
|
|
" \"tags\": [\"gpt-4\", \"gpt-4-32k\"],\n",
|
|
" },\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"````{=mdx}\n",
|
|
":::tip\n",
|
|
"Learn more about configuring LLMs for agents [here](/docs/topics/llm_configuration).\n",
|
|
":::\n",
|
|
"````\n",
|
|
"\n",
|
|
"## A workflow for research\n",
|
|
"\n",
|
|
"<figure>\n",
|
|
" <img src=\"../website/blog/2024-02-29-StateFlow/img/sf_example_1.png\" width=\"700\"\n",
|
|
" alt=\"SF_Example_1\">\n",
|
|
" </img>\n",
|
|
"</figure>\n",
|
|
"\n",
|
|
"We define the following agents:\n",
|
|
"- Initializer: Start the workflow by sending a task.\n",
|
|
"- Coder: Retrieve papers from the internet by writing code.\n",
|
|
"- Executor: Execute the code.\n",
|
|
"- Scientist: Read the papers and write a summary.\n",
|
|
"\n",
|
|
"\n",
|
"In the figure, we define a simple research workflow with four states: Init, Retrieve, Research, and End. Within each state, we call different agents to perform the tasks.\n",
|
|
"- Init: We use the initializer to start the workflow.\n",
|
|
"- Retrieve: We will first call the coder to write code and then call the executor to execute the code.\n",
|
|
"- Research: We will call the scientist to read the papers and write a summary.\n",
|
|
"- End: We will end the workflow.\n",
|
|
"\n",
|
"Customizing the speaker selection method lets us implement the state-oriented workflow by defining the transitions between agents."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import tempfile\n",
|
|
"\n",
|
|
"from autogen.coding import LocalCommandLineCodeExecutor\n",
|
|
"\n",
|
|
"temp_dir = tempfile.TemporaryDirectory()\n",
|
|
"executor = LocalCommandLineCodeExecutor(\n",
|
|
" timeout=10, # Timeout for each code execution in seconds.\n",
|
|
" work_dir=temp_dir.name, # Use the temporary directory to store the code files.\n",
|
|
")\n",
|
|
"\n",
|
|
"gpt4_config = {\n",
|
|
" \"cache_seed\": False, # change the cache_seed for different trials\n",
|
|
" \"temperature\": 0,\n",
|
|
" \"config_list\": config_list,\n",
|
|
" \"timeout\": 120,\n",
|
|
"}\n",
|
|
"\n",
|
|
"initializer = autogen.UserProxyAgent(\n",
|
|
" name=\"Init\",\n",
|
|
" code_execution_config=False,\n",
|
|
")\n",
|
|
"\n",
|
|
"\n",
|
|
"coder = autogen.AssistantAgent(\n",
|
|
" name=\"Retrieve_Action_1\",\n",
|
|
" llm_config=gpt4_config,\n",
|
|
" system_message=\"\"\"You are the Coder. Given a topic, write code to retrieve related papers from the arXiv API, print their title, authors, abstract, and link.\n",
|
|
"You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.\n",
|
|
"Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.\n",
|
|
"If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.\n",
|
|
"\"\"\",\n",
|
|
")\n",
|
|
"executor = autogen.UserProxyAgent(\n",
|
|
" name=\"Retrieve_Action_2\",\n",
|
|
" system_message=\"Executor. Execute the code written by the Coder and report the result.\",\n",
|
|
" human_input_mode=\"NEVER\",\n",
|
|
" code_execution_config={\"executor\": executor},\n",
|
|
")\n",
|
|
"scientist = autogen.AssistantAgent(\n",
|
|
" name=\"Research_Action_1\",\n",
|
|
" llm_config=gpt4_config,\n",
|
|
" system_message=\"\"\"You are the Scientist. Please categorize papers after seeing their abstracts printed and create a markdown table with Domain, Title, Authors, Summary and Link\"\"\",\n",
|
|
")\n",
|
|
"\n",
|
|
"\n",
|
|
"def state_transition(last_speaker, groupchat):\n",
"    messages = groupchat.messages\n",
"\n",
"    if last_speaker is initializer:\n",
"        # init -> retrieve\n",
"        return coder\n",
"    elif last_speaker is coder:\n",
"        # retrieve: action 1 -> action 2\n",
"        return executor\n",
"    elif last_speaker is executor:\n",
"        # The executor's reply begins with \"exitcode: <n> (...)\", so match the prefix rather than the whole message.\n",
"        if messages[-1][\"content\"].startswith(\"exitcode: 1\"):\n",
"            # retrieve --(execution failed)--> retrieve\n",
"            return coder\n",
"        else:\n",
"            # retrieve --(execution success)--> research\n",
"            return scientist\n",
"    elif last_speaker is scientist:\n",
"        # research -> end\n",
"        return None\n",
|
|
"\n",
|
|
"\n",
|
|
"groupchat = autogen.GroupChat(\n",
|
|
" agents=[initializer, coder, executor, scientist],\n",
|
|
" messages=[],\n",
|
|
" max_round=20,\n",
|
|
" speaker_selection_method=state_transition,\n",
|
|
")\n",
|
|
"manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_config)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"\u001b[33mInit\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"Topic: LLM applications papers from last week. Requirement: 5 - 10 papers from different domains.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mRetrieve_Action_1\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"To retrieve related papers from the arXiv API, we can use Python with the `requests` library to send a query to the API and parse the response. Below is a Python script that searches for papers related to \"LLM applications\" (Large Language Models applications) from the last week, across different domains, and prints out the required information for 5 to 10 papers.\n",
|
|
"\n",
|
|
"```python\n",
|
|
"import requests\n",
|
|
"from datetime import datetime, timedelta\n",
|
|
"import feedparser\n",
|
|
"\n",
|
|
"# Define the base URL for the arXiv API\n",
|
|
"ARXIV_API_URL = \"http://export.arxiv.org/api/query?\"\n",
|
|
"\n",
|
|
"# Define the search parameters\n",
|
|
"search_query = \"all:\\\"LLM applications\\\"\"\n",
|
|
"start = 0\n",
|
|
"max_results = 10\n",
|
|
"sort_by = \"submittedDate\"\n",
|
|
"sort_order = \"descending\"\n",
|
|
"\n",
|
|
"# Calculate the date one week ago from today\n",
|
|
"one_week_ago = (datetime.now() - timedelta(days=7)).strftime('%Y-%m-%dT%H:%M:%SZ')\n",
|
|
"\n",
|
|
"# Construct the query\n",
|
|
"query = f\"search_query={search_query}&start={start}&max_results={max_results}&sortBy={sort_by}&sortOrder={sort_order}&submittedDateRange={one_week_ago}-\"\n",
|
|
"\n",
|
|
"# Send the request to the arXiv API\n",
|
|
"response = requests.get(ARXIV_API_URL + query)\n",
|
|
"\n",
|
|
"# Parse the response using feedparser\n",
|
|
"feed = feedparser.parse(response.content)\n",
|
|
"\n",
|
|
"# Print the title, authors, abstract, and link of each paper\n",
|
|
"for entry in feed.entries:\n",
"    print(\"Title:\", entry.title)\n",
"    print(\"Authors:\", ', '.join(author.name for author in entry.authors))\n",
"    print(\"Abstract:\", entry.summary)\n",
"    print(\"Link:\", entry.link)\n",
"    print(\"\\n---\\n\")\n",
|
|
"```\n",
|
|
"\n",
|
|
"This script will print the title, authors, abstract, and link for each paper related to \"LLM applications\" that was submitted in the last week, up to a maximum of 10 papers. If you want to ensure that the papers are from different domains, you might need to manually check the categories of the papers or refine the search query to target specific domains.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[31m\n",
|
|
">>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...\u001b[0m\n",
|
|
"\u001b[33mRetrieve_Action_2\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"exitcode: 0 (execution succeeded)\n",
|
|
"Code output: Title: Adapting LLMs for Efficient Context Processing through Soft Prompt\n",
|
|
" Compression\n",
|
|
"Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd\n",
|
|
"Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a\n",
|
|
"transformative epoch in natural language processing, fostering unprecedented\n",
|
|
"proficiency in text generation, comprehension, and contextual scrutiny.\n",
|
|
"Nevertheless, effectively handling extensive contexts, crucial for myriad\n",
|
|
"applications, poses a formidable obstacle owing to the intrinsic constraints of\n",
|
|
"the models' context window sizes and the computational burdens entailed by\n",
|
|
"their operations. This investigation presents an innovative framework that\n",
|
|
"strategically tailors LLMs for streamlined context processing by harnessing the\n",
|
|
"synergies among natural language summarization, soft prompt compression, and\n",
|
|
"augmented utility preservation mechanisms. Our methodology, dubbed\n",
|
|
"SoftPromptComp, amalgamates natural language prompts extracted from\n",
|
|
"summarization methodologies with dynamically generated soft prompts to forge a\n",
|
|
"concise yet semantically robust depiction of protracted contexts. This\n",
|
|
"depiction undergoes further refinement via a weighting mechanism optimizing\n",
|
|
"information retention and utility for subsequent tasks. We substantiate that\n",
|
|
"our framework markedly diminishes computational overhead and enhances LLMs'\n",
|
|
"efficacy across various benchmarks, while upholding or even augmenting the\n",
|
|
"caliber of the produced content. By amalgamating soft prompt compression with\n",
|
|
"sophisticated summarization, SoftPromptComp confronts the dual challenges of\n",
|
|
"managing lengthy contexts and ensuring model scalability. Our findings point\n",
|
|
"towards a propitious trajectory for augmenting LLMs' applicability and\n",
|
|
"efficiency, rendering them more versatile and pragmatic for real-world\n",
|
|
"applications. This research enriches the ongoing discourse on optimizing\n",
|
|
"language models, providing insights into the potency of soft prompts and\n",
|
|
"summarization techniques as pivotal instruments for the forthcoming generation\n",
|
|
"of NLP solutions.\n",
|
|
"Link: http://arxiv.org/abs/2404.04997v1\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Explainable Traffic Flow Prediction with Large Language Models\n",
|
|
"Authors: Xusen Guo, Qiming Zhang, Mingxing Peng, Meixin Zhu, Hao, Yang\n",
|
|
"Abstract: Traffic flow prediction is crucial for urban planning, transportation\n",
|
|
"management, and infrastructure development. However, achieving both accuracy\n",
|
|
"and interpretability in prediction models remains challenging due to the\n",
|
|
"complexity of traffic data and the inherent opacity of deep learning\n",
|
|
"methodologies. In this paper, we propose a novel approach, Traffic Flow\n",
|
|
"Prediction LLM (TF-LLM), which leverages large language models (LLMs) to\n",
|
|
"generate interpretable traffic flow predictions. By transferring multi-modal\n",
|
|
"traffic data into natural language descriptions, TF-LLM captures complex\n",
|
|
"spatial-temporal patterns and external factors such as weather conditions,\n",
|
|
"Points of Interest (PoIs), date, and holidays. We fine-tune the LLM framework\n",
|
|
"using language-based instructions to align with spatial-temporal traffic flow\n",
|
|
"data. Our comprehensive multi-modal traffic flow dataset (CATraffic) in\n",
|
|
"California enables the evaluation of TF-LLM against state-of-the-art deep\n",
|
|
"learning baselines. Results demonstrate TF-LLM's competitive accuracy while\n",
|
|
"providing intuitive and interpretable predictions. We discuss the\n",
|
|
"spatial-temporal and input dependencies for explainable future flow\n",
|
|
"forecasting, showcasing TF-LLM's potential for diverse city prediction tasks.\n",
|
|
"This paper contributes to advancing explainable traffic prediction models and\n",
|
|
"lays a foundation for future exploration of LLM applications in transportation.\n",
|
|
"Link: http://arxiv.org/abs/2404.02937v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Designing Child-Centric AI Learning Environments: Insights from\n",
|
|
" LLM-Enhanced Creative Project-Based Learning\n",
|
|
"Authors: Siyu Zha, Yuehan Qiao, Qingyu Hu, Zhongsheng Li, Jiangtao Gong, Yingqing Xu\n",
|
|
"Abstract: Project-based learning (PBL) is an instructional method that is very helpful\n",
|
|
"in nurturing students' creativity, but it requires significant time and energy\n",
|
|
"from both students and teachers. Large language models (LLMs) have been proven\n",
|
|
"to assist in creative tasks, yet much controversy exists regarding their role\n",
|
|
"in fostering creativity. This paper explores the potential of LLMs in PBL\n",
|
|
"settings, with a special focus on fostering creativity. We began with an\n",
|
|
"exploratory study involving 12 middle school students and identified five\n",
|
|
"design considerations for LLM applications in PBL. Building on this, we\n",
|
|
"developed an LLM-empowered, 48-hour PBL program and conducted an instructional\n",
|
|
"experiment with 31 middle school students. Our results indicated that LLMs can\n",
|
|
"enhance every stage of PBL. Additionally, we also discovered ambivalent\n",
|
|
"perspectives among students and mentors toward LLM usage. Furthermore, we\n",
|
|
"explored the challenge and design implications of integrating LLMs into PBL and\n",
|
|
"reflected on the program. By bridging AI advancements into educational\n",
|
|
"practice, our work aims to inspire further discourse and investigation into\n",
|
|
"harnessing AI's potential in child-centric educational settings.\n",
|
|
"Link: http://arxiv.org/abs/2403.16159v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: The opportunities and risks of large language models in mental health\n",
|
|
"Authors: Hannah R. Lawrence, Renee A. Schneider, Susan B. Rubin, Maja J. Mataric, Daniel J. McDuff, Megan Jones Bell\n",
|
|
"Abstract: Global rates of mental health concerns are rising and there is increasing\n",
|
|
"realization that existing models of mental healthcare will not adequately\n",
|
|
"expand to meet the demand. With the emergence of large language models (LLMs)\n",
|
|
"has come great optimism regarding their promise to create novel, large-scale\n",
|
|
"solutions to support mental health. Despite their nascence, LLMs have already\n",
|
|
"been applied to mental health-related tasks. In this review, we summarize the\n",
|
|
"extant literature on efforts to use LLMs to provide mental health education,\n",
|
|
"assessment, and intervention and highlight key opportunities for positive\n",
|
|
"impact in each area. We then highlight risks associated with LLMs application\n",
|
|
"to mental health and encourage adoption of strategies to mitigate these risks.\n",
|
|
"The urgent need for mental health support must be balanced with responsible\n",
|
|
"development, testing, and deployment of mental health LLMs. Especially critical\n",
|
|
"is ensuring that mental health LLMs are fine-tuned for mental health, enhance\n",
|
|
"mental health equity, adhere to ethical standards, and that people, including\n",
|
|
"those with lived experience with mental health concerns, are involved in all\n",
|
|
"stages from development through deployment. Prioritizing these efforts will\n",
|
|
"minimize potential harms to mental health and maximize the likelihood that LLMs\n",
|
|
"will positively impact mental health globally.\n",
|
|
"Link: http://arxiv.org/abs/2403.14814v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Large Language Models for Blockchain Security: A Systematic Literature\n",
|
|
" Review\n",
|
|
"Authors: Zheyuan He, Zihao Li, Sen Yang\n",
|
|
"Abstract: Large Language Models (LLMs) have emerged as powerful tools in various\n",
|
|
"domains involving blockchain security (BS). Several recent studies are\n",
|
|
"exploring LLMs applied to BS. However, there remains a gap in our understanding\n",
|
|
"regarding the full scope of applications, impacts, and potential constraints of\n",
|
|
"LLMs on blockchain security. To fill this gap, we conduct a literature review\n",
|
|
"on LLM4BS.\n",
|
|
" As the first review of LLM's application on blockchain security, our study\n",
|
|
"aims to comprehensively analyze existing research and elucidate how LLMs\n",
|
|
"contribute to enhancing the security of blockchain systems. Through a thorough\n",
|
|
"examination of scholarly works, we delve into the integration of LLMs into\n",
|
|
"various aspects of blockchain security. We explore the mechanisms through which\n",
|
|
"LLMs can bolster blockchain security, including their applications in smart\n",
|
|
"contract auditing, identity verification, anomaly detection, vulnerable repair,\n",
|
|
"and so on. Furthermore, we critically assess the challenges and limitations\n",
|
|
"associated with leveraging LLMs for blockchain security, considering factors\n",
|
|
"such as scalability, privacy concerns, and adversarial attacks. Our review\n",
|
|
"sheds light on the opportunities and potential risks inherent in this\n",
|
|
"convergence, providing valuable insights for researchers, practitioners, and\n",
|
|
"policymakers alike.\n",
|
|
"Link: http://arxiv.org/abs/2403.14280v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Do Large Language Model Understand Multi-Intent Spoken Language ?\n",
|
|
"Authors: Shangjian Yin, Peijie Huang, Yuhong Xu, Haojing Huang, Jiatian Chen\n",
|
|
"Abstract: This study marks a significant advancement by harnessing Large Language\n",
|
|
"Models (LLMs) for multi-intent spoken language understanding (SLU), proposing a\n",
|
|
"unique methodology that capitalizes on the generative power of LLMs within an\n",
|
|
"SLU context. Our innovative technique reconfigures entity slots specifically\n",
|
|
"for LLM application in multi-intent SLU environments and introduces the concept\n",
|
|
"of Sub-Intent Instruction (SII), enhancing the dissection and interpretation of\n",
|
|
"intricate, multi-intent communication within varied domains. The resultant\n",
|
|
"datasets, dubbed LM-MixATIS and LM-MixSNIPS, are crafted from pre-existing\n",
|
|
"benchmarks. Our research illustrates that LLMs can match and potentially excel\n",
|
|
"beyond the capabilities of current state-of-the-art multi-intent SLU models. It\n",
|
|
"further explores LLM efficacy across various intent configurations and dataset\n",
|
|
"proportions. Moreover, we introduce two pioneering metrics, Entity Slot\n",
|
|
"Accuracy (ESA) and Combined Semantic Accuracy (CSA), to provide an in-depth\n",
|
|
"analysis of LLM proficiency in this complex field.\n",
|
|
"Link: http://arxiv.org/abs/2403.04481v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Breaking the Language Barrier: Can Direct Inference Outperform\n",
|
|
" Pre-Translation in Multilingual LLM Applications?\n",
|
|
"Authors: Yotam Intrator, Matan Halfon, Roman Goldenberg, Reut Tsarfaty, Matan Eyal, Ehud Rivlin, Yossi Matias, Natalia Aizenberg\n",
|
|
"Abstract: Large language models hold significant promise in multilingual applications.\n",
|
|
"However, inherent biases stemming from predominantly English-centric\n",
|
|
"pre-training have led to the widespread practice of pre-translation, i.e.,\n",
|
|
"translating non-English inputs to English before inference, leading to\n",
|
|
"complexity and information loss. This study re-evaluates the need for\n",
|
|
"pre-translation in the context of PaLM2 models (Anil et al., 2023), which have\n",
|
|
"been established as highly performant in multilingual tasks. We offer a\n",
|
|
"comprehensive investigation across 108 languages and 6 diverse benchmarks,\n",
|
|
"including open-end generative tasks, which were excluded from previous similar\n",
|
|
"studies. Our findings challenge the pre-translation paradigm established in\n",
|
|
"prior research, highlighting the advantages of direct inference in PaLM2.\n",
|
|
"Specifically, PaLM2-L consistently outperforms pre-translation in 94 out of 108\n",
|
|
"languages. These findings pave the way for more efficient and effective\n",
|
|
"multilingual applications, alleviating the limitations associated with\n",
|
|
"pre-translation and unlocking linguistic authenticity.\n",
|
|
"Link: http://arxiv.org/abs/2403.04792v1\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: SciAssess: Benchmarking LLM Proficiency in Scientific Literature\n",
|
|
" Analysis\n",
|
|
"Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Yuqi Yin, Yaqi Li, Linfeng Zhang, Guolin Ke\n",
|
|
"Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized\n",
|
|
"natural language understanding and generation, igniting a surge of interest in\n",
|
|
"leveraging these technologies in the field of scientific literature analysis.\n",
|
|
"Existing benchmarks, however, inadequately evaluate the proficiency of LLMs in\n",
|
|
"scientific literature analysis, especially in scenarios involving complex\n",
|
|
"comprehension and multimodal data. In response, we introduced SciAssess, a\n",
|
|
"benchmark tailored for the in-depth analysis of scientific literature, crafted\n",
|
|
"to provide a thorough assessment of LLMs' efficacy. SciAssess focuses on\n",
|
|
"evaluating LLMs' abilities in memorization, comprehension, and analysis within\n",
|
|
"the context of scientific literature analysis. It includes representative tasks\n",
|
|
"from diverse scientific fields, such as general chemistry, organic materials,\n",
|
|
"and alloy materials. And rigorous quality control measures ensure its\n",
|
|
"reliability in terms of correctness, anonymization, and copyright compliance.\n",
|
|
"SciAssess evaluates leading LLMs, including GPT-4, GPT-3.5, and Gemini,\n",
|
|
"identifying their strengths and aspects for improvement and supporting the\n",
|
|
"ongoing development of LLM applications in scientific literature analysis.\n",
|
|
"SciAssess and its resources are made available at https://sci-assess.github.io,\n",
|
|
"offering a valuable tool for advancing LLM capabilities in scientific\n",
|
|
"literature analysis.\n",
|
|
"Link: http://arxiv.org/abs/2403.01976v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: Differentially Private Synthetic Data via Foundation Model APIs 2: Text\n",
|
|
"Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin\n",
|
|
"Abstract: Text data has become extremely valuable due to the emergence of machine\n",
|
|
"learning algorithms that learn from it. A lot of high-quality text data\n",
|
|
"generated in the real world is private and therefore cannot be shared or used\n",
|
|
"freely due to privacy concerns. Generating synthetic replicas of private text\n",
|
|
"data with a formal privacy guarantee, i.e., differential privacy (DP), offers a\n",
|
|
"promising and scalable solution. However, existing methods necessitate DP\n",
|
|
"finetuning of large language models (LLMs) on private data to generate DP\n",
|
|
"synthetic data. This approach is not viable for proprietary LLMs (e.g.,\n",
|
|
"GPT-3.5) and also demands considerable computational resources for open-source\n",
|
|
"LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE)\n",
|
|
"algorithm to generate DP synthetic images with only API access to diffusion\n",
|
|
"models. In this work, we propose an augmented PE algorithm, named Aug-PE, that\n",
|
|
"applies to the complex setting of text. We use API access to an LLM and\n",
|
|
"generate DP synthetic text without any model training. We conduct comprehensive\n",
|
|
"experiments on three benchmark datasets. Our results demonstrate that Aug-PE\n",
|
|
"produces DP synthetic text that yields competitive utility with the SOTA DP\n",
|
|
"finetuning baselines. This underscores the feasibility of relying solely on API\n",
|
|
"access of LLMs to produce high-quality DP synthetic texts, thereby facilitating\n",
|
|
"more accessible routes to privacy-preserving LLM applications. Our code and\n",
|
|
"data are available at https://github.com/AI-secure/aug-pe.\n",
|
|
"Link: http://arxiv.org/abs/2403.01749v1\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"Title: SERVAL: Synergy Learning between Vertical Models and LLMs towards\n",
|
|
" Oracle-Level Zero-shot Medical Prediction\n",
|
|
"Authors: Jiahuan Yan, Jintai Chen, Chaowen Hu, Bo Zheng, Yaojun Hu, Jimeng Sun, Jian Wu\n",
|
|
"Abstract: Recent development of large language models (LLMs) has exhibited impressive\n",
|
|
"zero-shot proficiency on generic and common sense questions. However, LLMs'\n",
|
|
"application on domain-specific vertical questions still lags behind, primarily\n",
|
|
"due to the humiliation problems and deficiencies in vertical knowledge.\n",
|
|
"Furthermore, the vertical data annotation process often requires\n",
|
|
"labor-intensive expert involvement, thereby presenting an additional challenge\n",
|
|
"in enhancing the model's vertical capabilities. In this paper, we propose\n",
|
|
"SERVAL, a synergy learning pipeline designed for unsupervised development of\n",
|
|
"vertical capabilities in both LLMs and small models by mutual enhancement.\n",
|
|
"Specifically, SERVAL utilizes the LLM's zero-shot outputs as annotations,\n",
|
|
"leveraging its confidence to teach a robust vertical model from scratch.\n",
|
|
"Reversely, the trained vertical model guides the LLM fine-tuning to enhance its\n",
|
|
"zero-shot capability, progressively improving both models through an iterative\n",
|
|
"process. In medical domain, known for complex vertical knowledge and costly\n",
|
|
"annotations, comprehensive experiments show that, without access to any gold\n",
|
|
"labels, SERVAL with the synergy learning of OpenAI GPT-3.5 and a simple model\n",
|
|
"attains fully-supervised competitive performance across ten widely used medical\n",
|
|
"datasets. These datasets represent vertically specialized medical diagnostic\n",
|
|
"scenarios (e.g., diabetes, heart diseases, COVID-19), highlighting the\n",
|
|
"potential of SERVAL in refining the vertical capabilities of LLMs and training\n",
|
|
"vertical models from scratch, all achieved without the need for annotations.\n",
|
|
"Link: http://arxiv.org/abs/2403.01570v2\n",
|
|
"\n",
|
|
"---\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n",
|
|
"\u001b[33mResearch_Action_1\u001b[0m (to chat_manager):\n",
|
|
"\n",
|
|
"Based on the provided code output, here is a markdown table categorizing the papers by domain, along with their titles, authors, summaries, and links:\n",
|
|
"\n",
|
|
"| Domain | Title | Authors | Summary | Link |\n",
|
|
"|--------|-------|---------|---------|------|\n",
|
|
"| Natural Language Processing | Adapting LLMs for Efficient Context Processing through Soft Prompt Compression | Cangqing Wang, et al. | The paper presents a framework for efficient context processing in LLMs using natural language summarization and soft prompt compression. | [Link](http://arxiv.org/abs/2404.04997v1) |\n",
|
|
"| Transportation | Explainable Traffic Flow Prediction with Large Language Models | Xusen Guo, et al. | This paper introduces a novel approach for interpretable traffic flow predictions using LLMs, which captures complex spatial-temporal patterns. | [Link](http://arxiv.org/abs/2404.02937v2) |\n",
|
|
"| Education | Designing Child-Centric AI Learning Environments: Insights from LLM-Enhanced Creative Project-Based Learning | Siyu Zha, et al. | The study explores the potential of LLMs in enhancing project-based learning (PBL) and fostering creativity in educational settings. | [Link](http://arxiv.org/abs/2403.16159v2) |\n",
|
|
"| Mental Health | The opportunities and risks of large language models in mental health | Hannah R. Lawrence, et al. | This review summarizes the literature on LLMs in mental health education, assessment, and intervention, highlighting opportunities and risks. | [Link](http://arxiv.org/abs/2403.14814v2) |\n",
|
|
"| Blockchain Security | Large Language Models for Blockchain Security: A Systematic Literature Review | Zheyuan He, et al. | The paper reviews the application of LLMs in blockchain security, discussing their impact and potential limitations. | [Link](http://arxiv.org/abs/2403.14280v2) |\n",
|
|
"| Spoken Language Understanding | Do Large Language Model Understand Multi-Intent Spoken Language? | Shangjian Yin, et al. | The study investigates LLMs' capabilities in multi-intent spoken language understanding and proposes new methodologies and metrics. | [Link](http://arxiv.org/abs/2403.04481v2) |\n",
|
|
"| Multilingualism | Breaking the Language Barrier: Can Direct Inference Outperform Pre-Translation in Multilingual LLM Applications? | Yotam Intrator, et al. | The paper challenges the pre-translation paradigm in multilingual LLM applications, showing the advantages of direct inference. | [Link](http://arxiv.org/abs/2403.04792v1) |\n",
|
|
"| Scientific Literature | SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis | Hengxing Cai, et al. | Introduces SciAssess, a benchmark for evaluating LLMs' abilities in scientific literature analysis across various scientific fields. | [Link](http://arxiv.org/abs/2403.01976v2) |\n",
|
|
"| Privacy & Security | Differentially Private Synthetic Data via Foundation Model APIs 2: Text | Chulin Xie, et al. | The paper proposes a method to generate differentially private synthetic text data using API access to LLMs without model training. | [Link](http://arxiv.org/abs/2403.01749v1) |\n",
|
|
"| Medical Diagnostics | SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction | Jiahuan Yan, et al. | SERVAL is a synergy learning pipeline that enhances the vertical capabilities of LLMs and trains vertical models without annotations in the medical domain. | [Link](http://arxiv.org/abs/2403.01570v2) |\n",
|
|
"\n",
|
|
"Please note that the domains have been inferred from the summaries and titles of the papers and may not perfectly reflect the authors' intended categorization.\n",
|
|
"\n",
|
|
"--------------------------------------------------------------------------------\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"chat_result = initializer.initiate_chat(\n",
|
|
" manager, message=\"Topic: LLM applications papers from last week. Requirement: 5 - 10 papers from different domains.\"\n",
|
|
")"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"front_matter": {
|
|
"description": "StateFlow: Build Workflows through State-Oriented Actions",
|
|
"tags": [
|
|
"orchestration",
|
|
"group chat",
|
|
"research"
|
|
]
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "flaml",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.18"
|
|
},
|
|
"orig_nbformat": 4
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|