mirror of https://github.com/microsoft/autogen.git
Update TRANSPARENCY_FAQS.md (#590)
Modified Transparency wording per Olga's advice.
parent d6dce9ebb1
commit 207330577f
@@ -28,7 +28,7 @@ AutoGen is a generic infrastructure that can be used in multiple scenarios. The
 While AutoGen automates LLM workflows, decisions about how to use specific LLM outputs should always have a human in the loop. For example, you should not use AutoGen to automatically post LLM generated content to social media.
 
 ## How was AutoGen evaluated? What metrics are used to measure performance?
-- We tested the new release of AutoGen TeamOne on a test of 40 cross-domain prompt injection attacks and all returned the expected results with no signs of jailbreak.
+- We performed testing for Responsible AI harm e.g., cross-domain prompt injection and all tests returned the expected results with no signs of jailbreak.
 - AutoGen was evaluated on six applications to illustrate its potential in simplifying the development of high-performance multi-agent applications. These applications are selected based on their real-world relevance, problem difficulty and problem-solving capabilities enabled by AutoGen, and innovative potential. These applications involve using AutoGen to solve math problems, question answering, decision making in text world environments, supply chain optimization, etc. For each of these domains AutoGen was evaluated on various success-based metrics (i.e., how often the AutoGen based implementation solved the task). And, in some cases, AutoGen based approach was also evaluated on implementation efficiency (e.g., to track reductions in developer effort to build). More details can be found at: https://aka.ms/autogen-pdf.
 - We evaluated [a team of AutoGen agents](https://github.com/microsoft/autogen/tree/gaia_multiagent_v01_march_1st/samples/tools/autogenbench/scenarios/GAIA/Templates/Orchestrator) on the [GAIA benchmark](https://arxiv.org/abs/2311.12983), and got [SOTA results](https://huggingface.co/spaces/gaia-benchmark/leaderboard) as of March 1, 2024.
 