summary of recent updates (#1850)

* share updates

* updates

* fix url

* address comments

* address comments

---------

Co-authored-by: Qingyun Wu <qingyun0327@gmail.com>
Chi Wang 2024-03-04 19:38:30 -08:00 committed by GitHub
parent 0a79512ebd
commit f3289cb987
9 changed files with 209 additions and 1 deletions

@ -2,7 +2,7 @@
[![Build](https://github.com/microsoft/autogen/actions/workflows/python-package.yml/badge.svg)](https://github.com/microsoft/autogen/actions/workflows/python-package.yml)
![Python Version](https://img.shields.io/badge/3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
[![Downloads](https://static.pepy.tech/badge/pyautogen/week)](https://pepy.tech/project/pyautogen)
[![Discord](https://img.shields.io/discord/1153072414184452236?logo=discord&style=flat)](https://discord.gg/pAbnFJrkgZ)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40pyautogen)](https://twitter.com/pyautogen)
@ -12,6 +12,7 @@
<img src="https://github.com/microsoft/autogen/blob/main/website/static/img/flaml.svg" width=200>
<br>
</p> -->
:fire: Mar 1: the first AutoGen multi-agent experiment on the challenging [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark achieved No. 1 accuracy across all three levels.
:fire: Jan 30: AutoGen is highlighted by Peter Lee in Microsoft Research Forum [Keynote](https://t.co/nUBSjPDjqD).

@ -0,0 +1,3 @@
gaia.png filter=lfs diff=lfs merge=lfs -text
dalle_gpt4v.png filter=lfs diff=lfs merge=lfs -text
teach.png filter=lfs diff=lfs merge=lfs -text

(Five binary image files added, not shown; sizes: 482 KiB, 3.6 MiB, 534 KiB, 147 KiB, 900 KiB.)

@ -0,0 +1,178 @@
---
title: What's New in AutoGen?
authors: sonichi
tags: [news, summary, roadmap]
---
![autogen is loved](img/love.png)
**TL;DR**

- **AutoGen has received tremendous interest and recognition.**
- **AutoGen has many exciting new features and ongoing research.**

Five months have passed since the initial spinoff of AutoGen from [FLAML](https://github.com/microsoft/FLAML). What have we learned since then? What are the milestones achieved? What's next?
## Background
AutoGen was motivated by two big questions:
- What are future AI applications like?
- How do we empower every developer to build them?
Last year, I worked with colleagues and collaborators from Penn State University and the University of Washington on a new multi-agent framework to enable the next generation of applications powered by large language models.
We have been building AutoGen as a programming framework for agentic AI, much like PyTorch is for deep learning.
We developed AutoGen in an open source project [FLAML](https://github.com/microsoft/FLAML): a fast library for AutoML and tuning. After a few studies like [EcoOptiGen](https://arxiv.org/abs/2303.04673v1) and [MathChat](https://arxiv.org/abs/2306.01337), in August, we published a [technical report](https://arxiv.org/abs/2308.08155v1) about the multi-agent framework.
In October, we moved AutoGen from FLAML to a standalone repo on GitHub, and published an [updated technical report](https://arxiv.org/abs/2308.08155).
## Feedback
Since then, we've received new feedback every day, from everywhere. Users have expressed strong recognition of the new levels of capability enabled by AutoGen. For example, there are many comments like the following on X (Twitter) or YouTube.
> Autogen gave me the same a-ha moment that I haven't felt since trying out GPT-3 for the first time.
> I have never been this surprised since ChatGPT.
Many users have a deep understanding of the value along different dimensions, such as modularity, flexibility, and simplicity.
> The same reason autogen is significant is the same reason OOP is a good idea. Autogen packages up all that complexity into an agent I can create in one line, or modify with another.
<!--
I had lots of ideas I wanted to implement, but it needed a framework like this
and I am just not the guy to make such a robust and intelligent framework.
-->
Over time, more and more users have shared their experiences of using or contributing to AutoGen.
> In our Data Science department Autogen is helping us develop a production ready
multi-agents framework.
>> Sam Khalil, VP Data Insights & FounData, Novo Nordisk
> When I built an interactive learning tool for students, I looked for a tool that
could streamline the logistics but also give enough flexibility so I could use
customized tools. AutoGen has both. It simplified the work. Thanks to Chi and his
team for sharing such a wonderful tool with the community.
>> Yongsheng Lian, Professor at the University of Louisville, Mechanical Engineering
> Exciting news: the latest AutoGen release now features my contribution…
This experience has been a wonderful blend of learning and contributing,
demonstrating the dynamic and collaborative spirit of the tech community.
>> Davor Runje, Cofounder @ airt / President of the board @ CISEx
> With the support of a grant through the Data Intensive Studies Center at Tufts
University, our group is hoping to solve some of the challenges students face when
transitioning from undergraduate to graduate-level courses, particularly in Tufts'
Doctor of Physical Therapy program in the School of Medicine. We're experimenting
with Autogen to create tailored assessments, individualized study guides, and focused
tutoring. This approach has led to significantly better results than those we
achieved using standard chatbots. With the help of Chi and his group at Microsoft,
our current experiments include using multiple agents in sequential chat, teachable
agents, and round-robin style debate formats. These methods have proven more
effective in generating assessments and feedback compared to other large language
models (LLMs) we've explored. I've also used OpenAI Assistant agents through Autogen
in my Primary Care class to facilitate student engagement in patient interviews
through digital simulations. The agent retrieved information from a real patient
featured in a published case study, allowing students to practice their interview
skills with realistic information.
>> Benjamin D Stern, MS, DPT, Assistant Professor, Doctor of Physical Therapy Program,
Tufts University School of Medicine
> Autogen has been a game changer for how we analyze companies and products! Through
collaborative discourse between AI Agents we are able to shave days off our research
and analysis process.
>> Justin Trugman, Cofounder & Head of Technology at BetterFutureLabs
These are just a small fraction of the examples. We have seen strong interest from enterprise customers in pretty much every vertical industry: Accounting, Airlines, Biotech, Consulting, Consumer Packaged Goods, Electronics, Entertainment, Finance, Fintech, Government, Healthcare, Manufacturing, Metals, Pharmacy, Research, Retail, Social Media, Software, Supply Chain, Technology, Telecom…
AutoGen is used or contributed to by companies, organizations, and universities from A to Z, all over the world. We have seen hundreds of example applications. Some organizations use AutoGen as the backbone of their agent platforms. Others use AutoGen for diverse scenarios, ranging from research and investment to novel and creative applications of multiple agents.
## Milestones
AutoGen has a large and active community of developers, researchers and AI practitioners.
- 22K+ stars on [GitHub](https://aka.ms/autogen-gh), 3K+ forks
- 14K+ members on [Discord](https://aka.ms/autogen-dc)
- 100K+ downloads per month
- 3M+ views on YouTube (400+ community-generated videos)
- 100+ citations on [Google Scholar](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=IiSNwnAAAAAJ&citation_for_view=IiSNwnAAAAAJ:zCpYd49hD24C)
I am so amazed by their creativity and passion.
I also appreciate the recognition and awards AutoGen has received, such as:
- Selected by [TheSequence: My Five Favorite AI Papers of 2023](https://thesequence.substack.com/p/my-five-favorite-ai-papers-of-2023)
- Top trending repo on GitHub in Oct'23
- Selected into [Open100: Top 100 Open Source achievements](https://www.benchcouncil.org/evaluation/opencs/annual.html) only 35 days after spinoff
On March 1, the initial AutoGen multi-agent experiment on the challenging [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark achieved No. 1 accuracy by a big leap, across all three levels.
![gaia](img/gaia.png)
That result shows the big potential of using AutoGen to solve complex tasks.
And it's just the beginning of the community's effort to answer a few hard open questions.
## Open Questions
In the [AutoGen technical report](https://arxiv.org/abs/2308.08155), we laid out a number of challenging research questions:
1. How to design optimal multi-agent workflows?
1. How to create highly capable agents?
1. How to enable scale, safety and human agency?
The community has been working hard to address them in several dimensions:
- Evaluation. Convenient and insightful evaluation is the foundation of making solid progress.
- Interface. An intuitive, expressive and standardized interface is the prerequisite of fast experimentation and optimization.
- Optimization. Both the multi-agent interaction design (e.g., decomposition) and the individual agent capability need to be optimized to satisfy specific application needs.
- Integration. Integration with new technologies is an effective way to enhance agent capability.
- Learning/Teaching. Agentic learning and teaching are intuitive approaches for agents to optimize their performance, enable human agency and enhance safety.
## New Features & Ongoing Research
### Evaluation
We are working on agent-based evaluation tools and benchmarking tools. For example:
- [AgentEval](/blog/2023/11/20/AgentEval). Our [research](https://arxiv.org/abs/2402.09015) finds that LLM agents built with AutoGen can be used to automatically identify evaluation criteria and assess the performance from task descriptions and execution logs. It is demonstrated as a [notebook example](https://github.com/microsoft/autogen/blob/main/notebook/agenteval_cq_math.ipynb). Feedback and help are welcome for building it into the library.
- [AutoGenBench](/blog/2024/01/25/AutoGenBench). AutoGenBench is a command-line tool for downloading, configuring, and running agentic benchmarks, and for reporting results. It is designed to allow repetition, isolation, and instrumentation, leveraging the new [runtime logging](/docs/notebooks/agentchat_logging) feature.
These tools have been used for improving the AutoGen library as well as applications. For example, the new state-of-the-art performance achieved by a multi-agent solution to the [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark has benefited from these evaluation tools.
### Interface
We are making rapid progress in further improving the interface to make it even easier to build agent applications. For example:
- [AutoBuild](/blog/2023/11/26/Agent-AutoBuild). AutoBuild is ongoing research to automatically create or select a group of agents for a given task and objective. If successful, it will greatly reduce the effort required from users or developers adopting multi-agent technology. It also paves the way for agentic decomposition to handle complex tasks. It is available as an experimental feature and demonstrated in two modes: free-form [creation](https://github.com/microsoft/autogen/blob/main/notebook/autobuild_basic.ipynb) and [selection](https://github.com/microsoft/autogen/blob/main/notebook/autobuild_agent_library.ipynb) from a library.
- [AutoGen Studio](/blog/2023/12/01/AutoGenStudio). AutoGen Studio is a no-code UI for fast experimentation with multi-agent conversations. It lowers the barrier to entry for the AutoGen technology. Models, agents, and workflows can all be configured without writing code, and chatting with multiple agents in a playground is immediately available after configuration. Although only a subset of `pyautogen` features are available in this sample app, it demonstrates a promising experience. It has generated tremendous excitement in the community.
- Conversation Programming+. The [AutoGen paper](https://arxiv.org/abs/2308.08155) introduced a key concept of *Conversation Programming*, which can be used to program diverse conversation patterns such as 1-1 chat, group chat, hierarchical chat, nested chat, etc. While we offered dynamic group chat as an example of high-level orchestration, it made other patterns relatively less discoverable. Therefore, we have added more convenient conversation programming features that enable easier definition of other types of complex workflows, such as [finite state machine based group chat](/blog/2024/02/11/FSM-GroupChat), [sequential chats](/docs/notebooks/agentchats_sequential_chats), and [nested chats](/docs/notebooks/agentchat_nestedchat). Many users have found them useful in implementing specific patterns, which have always been possible but are now more obvious with the added features. I will write another blog post for a deep dive.
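To illustrate the idea behind the finite state machine based group chat, here is a minimal, self-contained sketch. The agent names and transition table are hypothetical, and this is illustrative code rather than the actual AutoGen API: an explicit transition graph constrains which agent may speak next.

```python
# Illustrative sketch of FSM-constrained speaker selection (not the AutoGen API).
# Hypothetical agent names; the transition graph defines allowed next speakers.
from typing import Dict, List

ALLOWED_TRANSITIONS: Dict[str, List[str]] = {
    "planner": ["engineer"],
    "engineer": ["executor"],
    "executor": ["planner", "engineer"],
}

def next_speaker_candidates(current: str) -> List[str]:
    """Return the agents allowed to speak after `current`."""
    return ALLOWED_TRANSITIONS.get(current, [])

# A run of the chat is then a walk through the graph.
speaker = "planner"
transcript = [speaker]
for _ in range(3):
    # A real orchestrator would pick among candidates (e.g., via an LLM);
    # here we simply take the first one.
    speaker = next_speaker_candidates(speaker)[0]
    transcript.append(speaker)

print(transcript)  # ['planner', 'engineer', 'executor', 'planner']
```

The constraint keeps the conversation on a predefined workflow while still allowing branching where the graph permits it.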
### Learning/Optimization/Teaching
The features in this category allow agents to remember teachings from users or other agents long term, or improve over iterations. For example:
- [AgentOptimizer](/blog/2023/12/23/AgentOptimizer). This [research](https://arxiv.org/abs/2402.11359) presents an approach to training LLM agents without modifying the underlying model. As a case study, the technique optimizes a set of Python functions for agents to use in solving a set of training tasks. It is planned to be made available as an experimental feature.
- [EcoAssistant](/blog/2023/11/09/EcoAssistant). This [research](https://arxiv.org/abs/2310.03046) presents a multi-agent teaching approach for agents with different capabilities powered by different LLMs. For example, a GPT-4 agent can teach a GPT-3.5 agent by demonstration. With this approach, one needs only 1/3 to 1/2 of GPT-4's cost while getting a 10-20% higher success rate than GPT-4 on coding-based QA. No finetuning is needed; all you need is a GPT-4 endpoint and a GPT-3.5-turbo endpoint. Help is appreciated to offer this technique as a feature in the AutoGen library.
- [Teachability](/blog/2023/10/26/TeachableAgent). Every LLM agent in AutoGen can be made teachable, i.e., able to remember facts, preferences, skills, etc. from interactions with other agents. For example, a user behind a user proxy agent can teach an assistant agent instructions for solving a difficult math problem. After being taught once, the problem-solving rate of the assistant agent can improve dramatically (e.g., 37% -> 95% for gpt-4-0613).
![teach](img/teach.png)
This feature works for GPTAssistantAgent (using OpenAI's assistant API) and group chat as well. One interesting use case of teachability + FSM group chat: [teaching resilience](https://www.linkedin.com/pulse/combatting-ai-naivete-teaching-resilience-emotional-leah-bonser-jdhrc).
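The core idea of teachability can be sketched as a long-term memo store that saves teachings and retrieves the relevant ones for later tasks. This toy version (not AutoGen's implementation, which uses vector embeddings for retrieval) matches memos by overlapping words:

```python
# Toy sketch of the teachability idea: persist teachings as memos and
# recall the relevant ones for a new task. Real implementations retrieve
# by embedding similarity; word overlap is used here for illustration only.
from typing import List

class MemoStore:
    def __init__(self) -> None:
        self.memos: List[str] = []

    def remember(self, teaching: str) -> None:
        """Save a teaching for later recall."""
        self.memos.append(teaching)

    def recall(self, task: str) -> List[str]:
        """Return memos that share at least one word with the task."""
        words = set(task.lower().split())
        return [m for m in self.memos if words & set(m.lower().split())]

store = MemoStore()
store.remember("for quadratic equations, complete the square")
store.remember("prefer vectorized numpy over loops")

print(store.recall("solve this quadratic problem"))
# ['for quadratic equations, complete the square']
```

The recalled memos would be injected into the agent's context before it attempts the new task, which is what lets a single teaching session keep paying off.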
### Integration
The extensible design of AutoGen makes it easy to integrate with new technologies. For example:
- [Custom models and clients](/blog/2024/01/26/Custom-Models) can be used as backends of an agent, such as Huggingface models and inference APIs.
- [OpenAI assistant](/blog/2023/11/13/OAI-assistants) can be used as the backend of an agent (GPTAssistantAgent). It would be nice to reimplement it as a custom client to increase compatibility with ConversableAgent.
- [Multimodality](/blog/2023/11/06/LMM-Agent). LMM models like GPT-4V can be used to provide vision to an agent, and accomplish interesting multimodal tasks by conversing with other agents, including advanced image analysis, figure generation, and automatic iterative improvement in image generation.
![multimodal](img/dalle_gpt4v.png)
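The pluggable-backend idea behind custom models and clients can be sketched as follows. This is a hypothetical minimal interface, not AutoGen's actual client protocol: any backend that implements a small `create` method can serve as an agent's model.

```python
# Hypothetical sketch of a pluggable model client (not AutoGen's protocol).
# An agent delegates response generation to whatever client it is given,
# so a Hugging Face model, an inference API, or a local model can all plug in.
from typing import Protocol

class ModelClient(Protocol):
    def create(self, prompt: str) -> str: ...

class EchoClient:
    """Toy stand-in for a real model backend."""
    def create(self, prompt: str) -> str:
        return f"echo: {prompt}"

def agent_reply(client: ModelClient, message: str) -> str:
    """The agent is agnostic to which backend produces the reply."""
    return client.create(message)

print(agent_reply(EchoClient(), "hello"))  # echo: hello
```

Because the agent depends only on the interface, swapping backends requires no change to the agent code itself.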
The above covers only a subset of the new features and roadmap. There are many other interesting new features, integration examples, and sample apps:
- new features like stateful code execution, [tool decorators](/docs/Use-Cases/agent_chat#tool-calling), [long context handling](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_capability_long_context_handling.ipynb), [web agents](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_surfer.ipynb).
- integration examples like using [guidance](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_guidance.ipynb) to generate structured response.
- sample apps like [AutoAnny](/blog/2024/02/02/AutoAnny).
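The tool decorators mentioned above follow a familiar pattern that can be sketched in plain Python. This is a hedged illustration of the general idea, not AutoGen's actual decorator API: a decorator registers a function, together with its name and docstring, so an agent runtime can later look it up and invoke it as a tool.

```python
# Illustrative sketch of a decorator-based tool registry (not AutoGen's API).
# Registering a function makes it discoverable and callable by name.
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable] = {}

def tool(func: Callable) -> Callable:
    """Register `func` as a callable tool under its own name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# A runtime would dispatch a model-proposed call like this:
result = TOOL_REGISTRY["add"](2, 3)
print(result)  # 5
```

In a real agent framework, the registered name and docstring would also be surfaced to the LLM so it knows which tools exist and how to call them.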
## Call for Help
I appreciate the huge support from the more than 14K members of the Discord community.
Despite all the exciting progress, there are tons of open problems, issues, and feature requests waiting to be solved.
We need more help to tackle the challenging problems and accelerate development.
You're all welcome to join our community and define the future of AI agents together.
*Do you find this update helpful? Would you like to join forces? Please join our [Discord](https://discord.gg/pAbnFJrkgZ) server for discussion.*
![contributors](img/contributors.png)

@ -47,3 +47,29 @@ For technical details, please check our technical report and research publicatio
booktitle={ArXiv preprint arXiv:2310.03046},
}
```
* [Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications](https://arxiv.org/abs/2402.09015). Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke. ArXiv preprint arXiv:2402.09015 (2024).
```bibtex
@misc{Kiseleva2024agenteval,
title={Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications},
author={Negar Arabzadeh and Julia Kiseleva and Qingyun Wu and Chi Wang and Ahmed Awadallah and Victor Dibia and Adam Fourney and Charles Clarke},
year={2024},
eprint={2402.09015},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
* [Training Language Model Agents without Modifying Language Models](https://arxiv.org/abs/2402.11359). Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu. ArXiv preprint arXiv:2402.11359 (2024).
```bibtex
@misc{zhang2024agentoptimizer,
title={Training Language Model Agents without Modifying Language Models},
author={Shaokun Zhang and Jieyu Zhang and Jiale Liu and Linxin Song and Chi Wang and Ranjay Krishna and Qingyun Wu},
year={2024},
eprint={2402.11359},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```