summary of recent updates (#1850)

* share updates

* updates

* fix url

* address comments

* address comments

---------

Co-authored-by: Qingyun Wu <qingyun0327@gmail.com>
Chi Wang 2024-03-04 19:38:30 -08:00 committed by GitHub
parent 0a79512ebd
commit f3289cb987
9 changed files with 209 additions and 1 deletions

@ -2,7 +2,7 @@
[![Build](https://github.com/microsoft/autogen/actions/workflows/python-package.yml/badge.svg)](https://github.com/microsoft/autogen/actions/workflows/python-package.yml)
![Python Version](https://img.shields.io/badge/3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)
[![Downloads](https://static.pepy.tech/badge/pyautogen/week)](https://pepy.tech/project/pyautogen)
[![Discord](https://img.shields.io/discord/1153072414184452236?logo=discord&style=flat)](https://discord.gg/pAbnFJrkgZ)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Follow%20%40pyautogen)](https://twitter.com/pyautogen)
@ -12,6 +12,7 @@
<img src="https://github.com/microsoft/autogen/blob/main/website/static/img/flaml.svg" width=200>
<br>
</p> -->
:fire: Mar 1: the first AutoGen multi-agent experiment on the challenging [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark achieved No. 1 accuracy across all three levels.
:fire: Jan 30: AutoGen is highlighted by Peter Lee in Microsoft Research Forum [Keynote](https://t.co/nUBSjPDjqD).

@ -0,0 +1,3 @@
gaia.png filter=lfs diff=lfs merge=lfs -text
dalle_gpt4v.png filter=lfs diff=lfs merge=lfs -text
teach.png filter=lfs diff=lfs merge=lfs -text

(Five binary image files added, not shown; sizes: 482 KiB, 3.6 MiB, 534 KiB, 147 KiB, 900 KiB.)

@ -0,0 +1,178 @@
---
title: What's New in AutoGen?
authors: sonichi
tags: [news, summary, roadmap]
---
![autogen is loved](img/love.png)
**TL;DR**

- **AutoGen has received tremendous interest and recognition.**
- **AutoGen has many exciting new features and ongoing research.**

Five months have passed since the initial spinoff of AutoGen from [FLAML](https://github.com/microsoft/FLAML). What have we learned since then? What are the milestones achieved? What's next?
## Background
AutoGen was motivated by two big questions:
- What are future AI applications like?
- How do we empower every developer to build them?
Last year, I worked with colleagues and collaborators from Penn State University and the University of Washington on a new multi-agent framework to enable the next generation of applications powered by large language models.
We have been building AutoGen as a programming framework for agentic AI, much like PyTorch is for deep learning.
We developed AutoGen in an open source project [FLAML](https://github.com/microsoft/FLAML): a fast library for AutoML and tuning. After a few studies like [EcoOptiGen](https://arxiv.org/abs/2303.04673v1) and [MathChat](https://arxiv.org/abs/2306.01337), in August, we published a [technical report](https://arxiv.org/abs/2308.08155v1) about the multi-agent framework.
In October, we moved AutoGen from FLAML to a standalone repo on GitHub, and published an [updated technical report](https://arxiv.org/abs/2308.08155).
## Feedback
Since then, we've received new feedback every day, from everywhere. Users have expressed strong recognition of the new levels of capability enabled by AutoGen. For example, there are many comments like the following on X (Twitter) or YouTube.
> Autogen gave me the same a-ha moment that I haven't felt since trying out GPT-3 for the first time.
> I have never been this surprised since ChatGPT.
Many users have a deep understanding of the value along different dimensions, such as modularity, flexibility, and simplicity.
> The same reason autogen is significant is the same reason OOP is a good idea. Autogen packages up all that complexity into an agent I can create in one line, or modify with another.
<!--
I had lots of ideas I wanted to implement, but it needed a framework like this
and I am just not the guy to make such a robust and intelligent framework.
-->
Over time, more and more users have shared their experiences of using or contributing to AutoGen.
> In our Data Science department Autogen is helping us develop a production ready
multi-agents framework.
>> Sam Khalil, VP Data Insights & FounData, Novo Nordisk
> When I built an interactive learning tool for students, I looked for a tool that
could streamline the logistics but also give enough flexibility so I could use
customized tools. AutoGen has both. It simplified the work. Thanks to Chi and his
team for sharing such a wonderful tool with the community.
>> Yongsheng Lian, Professor at the University of Louisville, Mechanical Engineering
> Exciting news: the latest AutoGen release now features my contribution…
This experience has been a wonderful blend of learning and contributing,
demonstrating the dynamic and collaborative spirit of the tech community.
>> Davor Runje, Cofounder @ airt / President of the board @ CISEx
> With the support of a grant through the Data Intensive Studies Center at Tufts
University, our group is hoping to solve some of the challenges students face when
transitioning from undergraduate to graduate-level courses, particularly in Tufts'
Doctor of Physical Therapy program in the School of Medicine. We're experimenting
with Autogen to create tailored assessments, individualized study guides, and focused
tutoring. This approach has led to significantly better results than those we
achieved using standard chatbots. With the help of Chi and his group at Microsoft,
our current experiments include using multiple agents in sequential chat, teachable
agents, and round-robin style debate formats. These methods have proven more
effective in generating assessments and feedback compared to other large language
models (LLMs) we've explored. I've also used OpenAI Assistant agents through Autogen
in my Primary Care class to facilitate student engagement in patient interviews
through digital simulations. The agent retrieved information from a real patient
featured in a published case study, allowing students to practice their interview
skills with realistic information.
>> Benjamin D Stern, MS, DPT, Assistant Professor, Doctor of Physical Therapy Program,
Tufts University School of Medicine
> Autogen has been a game changer for how we analyze companies and products! Through
collaborative discourse between AI Agents we are able to shave days off our research
and analysis process.
>> Justin Trugman, Cofounder & Head of Technology at BetterFutureLabs
These are just a small fraction of the examples. We have seen strong interest from enterprise customers in pretty much every vertical industry: Accounting, Airlines, Biotech, Consulting, Consumer Packaged Goods, Electronics, Entertainment, Finance, Fintech, Government, Healthcare, Manufacturing, Metals, Pharmacy, Research, Retail, Social Media, Software, Supply Chain, Technology, Telecom…
AutoGen is used or contributed to by companies, organizations, and universities from A to Z, all over the world. We have seen hundreds of example applications. Some organizations use AutoGen as the backbone of their agent platforms. Others use AutoGen for diverse scenarios, ranging from research and investment to novel and creative applications of multiple agents.
## Milestones
AutoGen has a large and active community of developers, researchers and AI practitioners.
- 22K+ stars on [GitHub](https://aka.ms/autogen-gh), 3K+ forks
- 14K+ members on [Discord](https://aka.ms/autogen-dc)
- 100K+ downloads per month
- 3M+ views on YouTube (400+ community-generated videos)
- 100+ citations on [Google Scholar](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=IiSNwnAAAAAJ&citation_for_view=IiSNwnAAAAAJ:zCpYd49hD24C)
I am so amazed by their creativity and passion.
I also appreciate the recognition and awards AutoGen has received, such as:
- Selected by [TheSequence: My Five Favorite AI Papers of 2023](https://thesequence.substack.com/p/my-five-favorite-ai-papers-of-2023)
- Top trending repo on GitHub in Oct'23
- Selected into [Open100: Top 100 Open Source achievements](https://www.benchcouncil.org/evaluation/opencs/annual.html) only 35 days after spinoff
On March 1, the initial AutoGen multi-agent experiment on the challenging [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark achieved No. 1 accuracy by a big leap, across all three levels.
![gaia](img/gaia.png)
That result shows the big potential of using AutoGen to solve complex tasks.
And it's just the beginning of the community's effort to answer a few hard open questions.
## Open Questions
In the [AutoGen technical report](https://arxiv.org/abs/2308.08155), we laid out a number of challenging research questions:
1. How to design optimal multi-agent workflows?
1. How to create highly capable agents?
1. How to enable scale, safety and human agency?
The community has been working hard to address them in several dimensions:
- Evaluation. Convenient and insightful evaluation is the foundation of making solid progress.
- Interface. An intuitive, expressive and standardized interface is the prerequisite of fast experimentation and optimization.
- Optimization. Both the multi-agent interaction design (e.g., decomposition) and the individual agent capability need to be optimized to satisfy specific application needs.
- Integration. Integration with new technologies is an effective way to enhance agent capability.
- Learning/Teaching. Agentic learning and teaching are intuitive approaches for agents to optimize their performance, enable human agency and enhance safety.
## New Features & Ongoing Research
### Evaluation
We are working on agent-based evaluation tools and benchmarking tools. For example:
- [AgentEval](/blog/2023/11/20/AgentEval). Our [research](https://arxiv.org/abs/2402.09015) finds that LLM agents built with AutoGen can be used to automatically identify evaluation criteria and assess the performance from task descriptions and execution logs. It is demonstrated as a [notebook example](https://github.com/microsoft/autogen/blob/main/notebook/agenteval_cq_math.ipynb). Feedback and help are welcome for building it into the library.
- [AutoGenBench](/blog/2024/01/25/AutoGenBench). AutoGenBench is a command-line tool for downloading, configuring, and running agentic benchmarks, and for reporting results. It is designed to allow repetition, isolation, and instrumentation, leveraging the new [runtime logging](/docs/notebooks/agentchat_logging) feature.
These tools have been used for improving the AutoGen library as well as applications. For example, the new state-of-the-art performance achieved by a multi-agent solution to the [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) benchmark has benefited from these evaluation tools.
### Interface
We are making rapid progress in further improving the interface to make it even easier to build agent applications. For example:
- [AutoBuild](/blog/2023/11/26/Agent-AutoBuild). AutoBuild is ongoing research to automatically create or select a group of agents for a given task and objective. If successful, it will greatly reduce the effort required from users or developers adopting multi-agent technology. It also paves the way for agentic decomposition to handle complex tasks. It is available as an experimental feature and demonstrated in two modes: free-form [creation](https://github.com/microsoft/autogen/blob/main/notebook/autobuild_basic.ipynb) and [selection](https://github.com/microsoft/autogen/blob/main/notebook/autobuild_agent_library.ipynb) from a library.
- [AutoGen Studio](/blog/2023/12/01/AutoGenStudio). AutoGen Studio is a no-code UI for fast experimentation with multi-agent conversations. It lowers the barrier to entry for the AutoGen technology. Models, agents, and workflows can all be configured without writing code, and chatting with multiple agents in a playground is immediately available after configuration. Although only a subset of `pyautogen` features are available in this sample app, it demonstrates a promising experience. It has generated tremendous excitement in the community.
- Conversation Programming+. The [AutoGen paper](https://arxiv.org/abs/2308.08155) introduced a key concept of *Conversation Programming*, which can be used to program diverse conversation patterns such as 1-1 chat, group chat, hierarchical chat, nested chat, etc. While we offered dynamic group chat as an example of high-level orchestration, it made other patterns relatively less discoverable. Therefore, we have added more convenient conversation programming features that enable easier definition of other types of complex workflows, such as [finite state machine based group chat](/blog/2024/02/11/FSM-GroupChat), [sequential chats](/docs/notebooks/agentchats_sequential_chats), and [nested chats](/docs/notebooks/agentchat_nestedchat). Many users have found them useful in implementing specific patterns, which have always been possible but are now more obvious with the added features. I will write another blog post for a deep dive.
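To illustrate the idea behind the finite state machine based group chat, here is a minimal, self-contained sketch. The agent names and transition table are hypothetical, and this is illustrative code rather than the actual AutoGen API: an explicit transition graph constrains which agent may speak next.

```python
# Illustrative sketch of FSM-constrained speaker selection (not the AutoGen API).
# Hypothetical agent names; the transition graph defines allowed next speakers.
from typing import Dict, List

ALLOWED_TRANSITIONS: Dict[str, List[str]] = {
    "planner": ["engineer"],
    "engineer": ["executor"],
    "executor": ["planner", "engineer"],
}

def next_speaker_candidates(current: str) -> List[str]:
    """Return the agents allowed to speak after `current`."""
    return ALLOWED_TRANSITIONS.get(current, [])

# A run of the chat is then a walk through the graph.
speaker = "planner"
transcript = [speaker]
for _ in range(3):
    # A real orchestrator would pick among candidates (e.g., via an LLM);
    # here we simply take the first one.
    speaker = next_speaker_candidates(speaker)[0]
    transcript.append(speaker)

print(transcript)  # ['planner', 'engineer', 'executor', 'planner']
```

The constraint keeps the conversation on a predefined workflow while still allowing branching where the graph permits it.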
### Learning/Optimization/Teaching
The features in this category allow agents to remember teachings from users or other agents long term, or improve over iterations. For example:
- [AgentOptimizer](/blog/2023/12/23/AgentOptimizer). This [research](https://arxiv.org/abs/2402.11359) presents an approach to training LLM agents without modifying the underlying model. As a case study, the technique optimizes a set of Python functions for agents to use in solving a set of training tasks. It is planned to be made available as an experimental feature.
- [EcoAssistant](/blog/2023/11/09/EcoAssistant). This [research](https://arxiv.org/abs/2310.03046) presents a multi-agent teaching approach for agents with different capabilities powered by different LLMs. For example, a GPT-4 agent can teach a GPT-3.5 agent by demonstration. With this approach, one needs only 1/3 to 1/2 of GPT-4's cost while getting a 10-20% higher success rate than GPT-4 on coding-based QA. No finetuning is needed; all you need is a GPT-4 endpoint and a GPT-3.5-turbo endpoint. Help is appreciated to offer this technique as a feature in the AutoGen library.
- [Teachability](/blog/2023/10/26/TeachableAgent). Every LLM agent in AutoGen can be made teachable, i.e., able to remember facts, preferences, skills, etc. from interactions with other agents. For example, a user behind a user proxy agent can teach an assistant agent instructions for solving a difficult math problem. After being taught once, the problem-solving rate of the assistant agent can improve dramatically (e.g., 37% -> 95% for gpt-4-0613).
![teach](img/teach.png)
This feature works for GPTAssistantAgent (using OpenAI's assistant API) and group chat as well. One interesting use case of teachability + FSM group chat: [teaching resilience](https://www.linkedin.com/pulse/combatting-ai-naivete-teaching-resilience-emotional-leah-bonser-jdhrc).
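The core idea of teachability can be sketched as a long-term memo store that saves teachings and retrieves the relevant ones for later tasks. This toy version (not AutoGen's implementation, which uses vector embeddings for retrieval) matches memos by overlapping words:

```python
# Toy sketch of the teachability idea: persist teachings as memos and
# recall the relevant ones for a new task. Real implementations retrieve
# by embedding similarity; word overlap is used here for illustration only.
from typing import List

class MemoStore:
    def __init__(self) -> None:
        self.memos: List[str] = []

    def remember(self, teaching: str) -> None:
        """Save a teaching for later recall."""
        self.memos.append(teaching)

    def recall(self, task: str) -> List[str]:
        """Return memos that share at least one word with the task."""
        words = set(task.lower().split())
        return [m for m in self.memos if words & set(m.lower().split())]

store = MemoStore()
store.remember("for quadratic equations, complete the square")
store.remember("prefer vectorized numpy over loops")

print(store.recall("solve this quadratic problem"))
# ['for quadratic equations, complete the square']
```

The recalled memos would be injected into the agent's context before it attempts the new task, which is what lets a single teaching session keep paying off.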
### Integration
The extensible design of AutoGen makes it easy to integrate with new technologies. For example:
- [Custom models and clients](/blog/2024/01/26/Custom-Models) can be used as backends of an agent, such as Huggingface models and inference APIs.
- [OpenAI assistant](/blog/2023/11/13/OAI-assistants) can be used as the backend of an agent (GPTAssistantAgent). It would be nice to reimplement it as a custom client to increase compatibility with ConversableAgent.
- [Multimodality](/blog/2023/11/06/LMM-Agent). LMM models like GPT-4V can be used to provide vision to an agent, and accomplish interesting multimodal tasks by conversing with other agents, including advanced image analysis, figure generation, and automatic iterative improvement in image generation.
![multimodal](img/dalle_gpt4v.png)
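The pluggable-backend idea behind custom models and clients can be sketched as follows. This is a hypothetical minimal interface, not AutoGen's actual client protocol: any backend that implements a small `create` method can serve as an agent's model.

```python
# Hypothetical sketch of a pluggable model client (not AutoGen's protocol).
# An agent delegates response generation to whatever client it is given,
# so a Hugging Face model, an inference API, or a local model can all plug in.
from typing import Protocol

class ModelClient(Protocol):
    def create(self, prompt: str) -> str: ...

class EchoClient:
    """Toy stand-in for a real model backend."""
    def create(self, prompt: str) -> str:
        return f"echo: {prompt}"

def agent_reply(client: ModelClient, message: str) -> str:
    """The agent is agnostic to which backend produces the reply."""
    return client.create(message)

print(agent_reply(EchoClient(), "hello"))  # echo: hello
```

Because the agent depends only on the interface, swapping backends requires no change to the agent code itself.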
The above covers only a subset of the new features and roadmap. There are many other interesting new features, integration examples, and sample apps:
- new features like stateful code execution, [tool decorators](/docs/Use-Cases/agent_chat#tool-calling), [long context handling](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_capability_long_context_handling.ipynb), [web agents](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_surfer.ipynb).
- integration examples like using [guidance](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_guidance.ipynb) to generate structured response.
- sample apps like [AutoAnny](/blog/2024/02/02/AutoAnny).
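The tool decorators mentioned above follow a familiar pattern that can be sketched in plain Python. This is a hedged illustration of the general idea, not AutoGen's actual decorator API: a decorator registers a function, together with its name and docstring, so an agent runtime can later look it up and invoke it as a tool.

```python
# Illustrative sketch of a decorator-based tool registry (not AutoGen's API).
# Registering a function makes it discoverable and callable by name.
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable] = {}

def tool(func: Callable) -> Callable:
    """Register `func` as a callable tool under its own name."""
    TOOL_REGISTRY[func.__name__] = func
    return func

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# A runtime would dispatch a model-proposed call like this:
result = TOOL_REGISTRY["add"](2, 3)
print(result)  # 5
```

In a real agent framework, the registered name and docstring would also be surfaced to the LLM so it knows which tools exist and how to call them.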
## Call for Help
I appreciate the huge support from the more than 14K members of the Discord community.
Despite all the exciting progress, there are tons of open problems, issues, and feature requests waiting to be solved.
We need more help to tackle the challenging problems and accelerate development.
You're all welcome to join our community and define the future of AI agents together.
*Do you find this update helpful? Would you like to join forces? Please join our [Discord](https://discord.gg/pAbnFJrkgZ) server for discussion.*
![contributors](img/contributors.png)

@ -47,3 +47,29 @@ For technical details, please check our technical report and research publicatio
booktitle={ArXiv preprint arXiv:2310.03046},
}
```
* [Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications](https://arxiv.org/abs/2402.09015). Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke. ArXiv preprint arXiv:2402.09015 (2024).
```bibtex
@misc{Kiseleva2024agenteval,
title={Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications},
author={Negar Arabzadeh and Julia Kiseleva and Qingyun Wu and Chi Wang and Ahmed Awadallah and Victor Dibia and Adam Fourney and Charles Clarke},
year={2024},
eprint={2402.09015},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
* [Training Language Model Agents without Modifying Language Models](https://arxiv.org/abs/2402.11359). Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu. ArXiv preprint arXiv:2402.11359 (2024).
```bibtex
@misc{zhang2024agentoptimizer,
title={Training Language Model Agents without Modifying Language Models},
author={Shaokun Zhang and Jieyu Zhang and Jiale Liu and Linxin Song and Chi Wang and Ranjay Krishna and Qingyun Wu},
year={2024},
eprint={2402.11359},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```