Adding Benchmarks back into agbench and updates to agbench (#3711)

2024-10-11 18:46:18 -04:00 · 2024-10-11 18:46:18 -04:00 · 373adc9a34
parent 10b006b9ba
commit 373adc9a34
10 changed files with 478 additions and 144 deletions
--- a/python/packages/agbench/.gitignore
+++ b/python/packages/agbench/.gitignore
@ -1,3 +1,3 @@
 scenarios/*/Downloads
 scenarios/*/Tasks
-*/Results
+*/Results
--- a/python/packages/agbench/CONTRIBUTING.md
+++ b/python/packages/agbench/CONTRIBUTING.md
@ -6,12 +6,11 @@ As part of the broader AutoGen project, AutoGenBench welcomes community contribu
 We ask that all contributions to AutoGenBench adhere to the following:

 - Follow AutoGen's broader [contribution guidelines](https://microsoft.github.io/autogen/docs/Contribute)
- All AutoGenBench benchmarks should live in a subfolder of `/samples/tools/autogenbench/scenarios` alongside `HumanEval`, `GAIA`, etc.
+- All AutoGenBench benchmarks should live in a subfolder of `/benchmarks` alongside `HumanEval`, `GAIA`, etc.
 - Benchmark scenarios should include a detailed README.md, in the root of their folder, describing the benchmark and providing citations where warranted.
 - Benchmark data (tasks, ground truth, etc.) should be downloaded from their original sources rather than hosted in the AutoGen repository (unless the benchmark is original, and the repository *is* the original source)
    - You can use the `Scripts/init_tasks.py` file to automate this download.
- Basic scoring should be compatible with the `autogenbench tabulate` command (e.g., by outputting logs compatible with the default tabulation mechanism, or by providing a `Scripts/custom_tabulate.py` file)
- If you wish your benchmark to be compatible with the `autogenbench clone` command, include a `MANIFEST.json` file in the root of your folder.
+- Basic scoring should be compatible with the `agbench tabulate` command (e.g., by outputting logs compatible with the default tabulation mechanism, or by providing a `Scripts/custom_tabulate.py` file)

 These requirements are further detailed below, but if you simply copy the `HumanEval` folder, you will already be off to a great start.

@ -62,16 +61,16 @@ For example:

 In this example, the string `__MODEL__` will be replaced in the file `scenarios.py`, while the string `__PROMPT__` will be replaced in the `prompt.txt` file.

-The `template` field can also take on a list value, but this usage is considered advanced and is not described here. See the `autogenbench/run_cmd.py` code, or the `GAIA` benchmark tasks files for additional information about this option.
+The `template` field can also take on a list value, but this usage is considered advanced and is not described here. See the `agbench/run_cmd.py` code, or the `GAIA` benchmark tasks files for additional information about this option.


 ## Task Instance Expansion Algorithm

-Once the tasks have been defined, as per above, they must be "instantiated" before they can be run. This instantiation happens automatically when the user issues the `autogenbench run` command and involves creating a local folder to share with Docker. Each instance and repetition gets its own folder along the path: `./results/[scenario]/[task_id]/[instance_id]`. For the sake of brevity we will refer to this folder as the `DEST_FOLDER`.
+Once the tasks have been defined, as per above, they must be "instantiated" before they can be run. This instantiation happens automatically when the user issues the `agbench run` command and involves creating a local folder to share with Docker. Each instance and repetition gets its own folder along the path: `./results/[scenario]/[task_id]/[instance_id]`. For the sake of brevity we will refer to this folder as the `DEST_FOLDER`.

 The algorithm for populating the `DEST_FOLDER` is as follows:

-1. Pre-populate DEST_FOLDER with all the basic starter files for running a scenario (found in `autogenbench/template`).
+1. Pre-populate DEST_FOLDER with all the basic starter files for running a scenario (found in `agbench/template`).
 2. Recursively copy the template folder specified in the JSONL line to DEST_FOLDER (if the JSON `template` attribute points to a folder). If the JSON's `template` attribute instead points to a file, copy the file, but rename it to `scenario.py`.
 3. Apply any string replacements, as outlined in the prior section.
 4. Write a run.sh file to DEST_FOLDER that will be executed by Docker when it is loaded. The `run.sh` is described below.
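Steps 1 to 3 of the algorithm above can be sketched roughly as follows. This is an illustration only, not the code in `agbench/run_cmd.py`: the function name `expand_task`, the demo file names, and the layout of the `substitutions` mapping (file name, then placeholder, then value) are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def expand_task(task: dict, starter_dir: Path, dest_folder: Path) -> None:
    """Populate dest_folder following steps 1-3 (hypothetical helper)."""
    # Step 1: pre-populate with the basic starter files.
    shutil.copytree(starter_dir, dest_folder, dirs_exist_ok=True)

    # Step 2: copy the template folder, or a single file renamed to scenario.py.
    template = Path(task["template"])
    if template.is_dir():
        shutil.copytree(template, dest_folder, dirs_exist_ok=True)
    else:
        shutil.copyfile(template, dest_folder / "scenario.py")

    # Step 3: apply string replacements, one file at a time.
    # Assumed layout: {file name: {placeholder: replacement}}.
    for filename, replacements in task.get("substitutions", {}).items():
        target = dest_folder / filename
        text = target.read_text()
        for placeholder, value in replacements.items():
            text = text.replace(placeholder, value)
        target.write_text(text)


# Small demo in a throwaway directory (all names here are made up).
root = Path(tempfile.mkdtemp())
starter = root / "starter"
starter.mkdir()
(starter / "starter.txt").write_text("starter file\n")

template_dir = root / "Templates" / "TwoAgents"
template_dir.mkdir(parents=True)
(template_dir / "prompt.txt").write_text("__PROMPT__")

demo_task = {
    "id": "demo_task",
    "template": str(template_dir),
    "substitutions": {"prompt.txt": {"__PROMPT__": "Say hello."}},
}
dest = root / "results" / "demo" / "demo_task" / "0"
expand_task(demo_task, starter, dest)
```

Step 4 (writing `run.sh`) is left out because its contents are generated by the tool itself.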
@ -139,9 +138,8 @@ echo RUN.SH COMPLETE !#!#
 Be warned that this listing is provided here for illustration purposes, and may vary over time. The source of truth are the `run.sh` files found in the ``./results/[taskset]/[task_id]/[instance_id]`` folders.


-## Integrating with the `tabulate` and `clone` commands.
-
-The above details are sufficient for defining and running tasks, but if you wish to support the `autogenbench tabulate` and `autogenbench clone` commands, a few additional steps are required.
+## Integrating with the `tabulate` command
+The above details are sufficient for defining and running tasks, but if you wish to support the `agbench tabulate` command, a few additional steps are required.

 ### Tabulations

@ -154,35 +152,10 @@ Should you provide a custom tabulation script, please implement `--help` and `-h
 The `scenarios/GAIA/Scripts/custom_tabulate.py` is a great example of custom tabulation. It also shows how you can reuse some components of the default tabulator to speed up development.


-### Cloning

-If you wish your benchmark to be available via the `autogenbench clone` command, you will need to take three additional steps:
-
-#### Manifest
-First, provide a `MANIFEST.json` file in the root of your benchmark. An example is provided below, from which you can see the schema:
-
-```json
-{
-    "files": {
-        "Templates/TwoAgents/prompt.txt": "Templates/TwoAgents/prompt.txt",
-        "Templates/TwoAgents/coding/my_tests.py": "Templates/TwoAgents/coding/my_tests.py",
-        "Templates/TwoAgents/scenario.py": "Templates/TwoAgents/scenario.py",
-        "README.md": "README.md",
-	"Scripts/init_tasks.py": "Scripts/init_tasks.py",
-	"Scripts/custom_tabulate.py": "Scripts/custom_tabulate.py"
-    }
-}
-```
-
-The keys of the `files` dictionary are local paths, relative to your benchmark's root directory. The values are relative paths in the AutoGen GitHub repository (relative to the folder where the MANIFEST.json file is located). In most cases, the keys and values will be identical.
-
-#### SCENARIOS dictionary
-Second, you must add an entry to the `scenarios` dictionary in `autogen/samples/tools/autogenbench/scenarios/MANIFEST.json`.
-
-#### Scripts/init_tasks.py
-Finally, you should provide an `Scripts/init_tasks.py` file, in your benchmark folder, and include a `main()` method therein. This method will be loaded and called automatically by `autogenbench clone` after all manifest files have been downloaded.
+## Scripts/init_tasks.py
+Finally, you should provide a `Scripts/init_tasks.py` file in your benchmark folder, and include a `main()` function in it.

 This `init_tasks.py` script is a great place to download benchmarks from their original sources and convert them to the JSONL format required by AutoGenBench:
 - See `HumanEval/Scripts/init_tasks.py` for an example of how to expand a benchmark from an original GitHub repository.
 - See `GAIA/Scripts/init_tasks.py` for an example of how to expand a benchmark from `Hugging Face Hub`.
- See `MATH/SCripts/init_tasks.py` for an example of how to expand a benchmark from an author-hosted website.
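For orientation, a skeleton of such an `init_tasks.py` might look like the sketch below. It is a guess at the general shape rather than a copy of any existing benchmark's script: the helper `write_task_files`, the record keys other than `template`, and the sample problem are all hypothetical, and the download step is deliberately omitted.

```python
import json
from pathlib import Path


def write_task_files(problems: list, templates_dir: Path, tasks_dir: Path) -> list:
    """Write one JSONL file per template folder, one line per problem."""
    tasks_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for template in sorted(p for p in templates_dir.iterdir() if p.is_dir()):
        out_path = tasks_dir / f"{template.name}.jsonl"
        with open(out_path, "w") as fh:
            for problem in problems:
                record = {
                    "id": problem["id"],
                    "template": str(template),
                    # Assumed layout: {file name: {placeholder: replacement}}.
                    "substitutions": {"prompt.txt": {"__PROMPT__": problem["prompt"]}},
                }
                fh.write(json.dumps(record) + "\n")
        written.append(out_path)
    return written


def main() -> None:
    # A real script would download the benchmark from its original source here.
    problems = [{"id": "example_0", "prompt": "Write a function that adds two numbers."}]
    templates_dir = Path("Templates")
    if not templates_dir.is_dir():
        print("No Templates folder found; nothing to do.")
        return
    write_task_files(problems, templates_dir, Path("Tasks"))


if __name__ == "__main__":
    main()
```

Keeping the file-writing logic in its own function makes it easy to exercise without network access.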
--- a/python/packages/agbench/README.md
+++ b/python/packages/agbench/README.md
@ -1,29 +1,35 @@
 # AutoGenBench

-AutoGenBench is a tool for repeatedly running a set of pre-defined AutoGen tasks in a setting with tightly-controlled initial conditions. With each run, AutoGenBench will start from a blank slate. The agents being evaluated will need to work out what code needs to be written, and what libraries or dependencies to install, to solve tasks. The results of each run are logged, and can be ingested by analysis or metrics scripts (such as `autogenbench tabulate`). By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.
+AutoGenBench (agbench) is a tool for repeatedly running a set of pre-defined AutoGen tasks in a setting with tightly-controlled initial conditions. With each run, AutoGenBench will start from a blank slate. The agents being evaluated will need to work out what code needs to be written, and what libraries or dependencies to install, to solve tasks. The results of each run are logged, and can be ingested by analysis or metrics scripts (such as `agbench tabulate`). By default, all runs are conducted in freshly-initialized docker containers, providing the recommended level of consistency and safety.

 AutoGenBench works with all AutoGen 0.1.*, and 0.2.* versions.

 ## Technical Specifications

-If you are already an AutoGenBench pro, and want the full technical specifications, please review the [contributor's guide](CONTRIBUTING.md).
-
+If you are already an AutoGenBench pro, and want the full technical specifications, please review the [contributor's guide](CONTRIBUTING.md).

 ## Docker Requirement
+
 AutoGenBench also requires Docker (Desktop or Engine). **It will not run in GitHub codespaces**, unless you opt for native execution (which is strongly discouraged). To install Docker Desktop see [https://www.docker.com/products/docker-desktop/](https://www.docker.com/products/docker-desktop/).

+If you are working in WSL, you can follow the instructions below to set up your environment:
+
+1. Install Docker Desktop and restart when prompted. Then open Docker Desktop and, under Settings > Resources > WSL Integration, enable integration with additional distros (e.g., Ubuntu).
+2. Clone autogen and export `AUTOGEN_REPO_BASE`. This environment variable enables the Docker containers to use the correct version of the agents.
+    ```bash
+    git clone git@github.com:microsoft/autogen.git
+    export AUTOGEN_REPO_BASE=<path_to_autogen>
+    ```
+
 ## Installation and Setup

-**To get the most out of AutoGenBench, the `autogenbench` package should be installed**. At present, the easiest way to do this is to install it via `pip`:
+[Deprecated currently] **To get the most out of AutoGenBench, the `agbench` package should be installed**. At present, the easiest way to do this is to install it via `pip`.
+
+
+If you would prefer working from source code (e.g., for development, or to utilize an alternate branch), simply clone the [AutoGen](https://github.com/microsoft/autogen) repository, then install `agbench` via:

 ```
-pip install autogenbench
-```
-
-If you would prefer working from source code (e.g., for development, or to utilize an alternate branch), simply clone the [AutoGen](https://github.com/microsoft/autogen) repository, then install `autogenbench` via:
-
-```
-pip install -e autogen/samples/tools/autogenbench
+pip install -e autogen/python/packages/agbench
 ```

 After installation, you must configure your API keys. As with other AutoGen applications, AutoGenBench will look for the OpenAI keys in the OAI_CONFIG_LIST file in the current working directory, or the OAI_CONFIG_LIST environment variable. This behavior can be overridden using a command-line parameter described later.
@ -36,7 +42,6 @@ export OAI_CONFIG_LIST=$(cat ./OAI_CONFIG_LIST)

 If an OAI_CONFIG_LIST is *not* provided (by means of file or environment variable), AutoGenBench will use the OPENAI_API_KEY environment variable instead.

-
 For some benchmark scenarios, additional keys may be required (e.g., keys for the Bing Search API). These can be added to an `ENV.json` file in the current working folder. An example `ENV.json` file is provided below:

 ```
@ -46,75 +51,106 @@ For some benchmark scenarios, additional keys may be required (e.g., keys for th
 ```

 ## A Typical Session
+
 Once AutoGenBench and necessary keys are installed, a typical session will look as follows:

+
+
+Navigate to HumanEval
+
+```bash
+cd autogen/python/packages/agbench/benchmarks/HumanEval
 ```
-autogenbench clone HumanEval
-cd HumanEval
-autogenbench run Tasks/r_human_eval_two_agents.jsonl
-autogenbench tabulate results/r_human_eval_two_agents
+**Note:** The following instructions are specific to the HumanEval benchmark. For other benchmarks, please refer to the README in the respective benchmark folder, e.g., [AssistantBench](benchmarks/AssistantBench/README.md).
+
+
+Create a file called ENV.json with the following contents (required if you're using MagenticOne). If you are using Azure:
+
+```json
+{
+    "CHAT_COMPLETION_KWARGS_JSON": "{}",
+    "CHAT_COMPLETION_PROVIDER": "azure"
+}
+```
+
+You can also use the OpenAI client by replacing the last two entries in the ENV.json file with:
+
+- `CHAT_COMPLETION_PROVIDER='openai'`
+- `CHAT_COMPLETION_KWARGS_JSON` with the following JSON structure:
+
+```json
+{
+  "api_key": "REPLACE_WITH_YOUR_API",
+  "model": "REPLACE_WITH_YOUR_MODEL"
+}
+```
+
+Now initialize the tasks.
+
+```bash
+python Scripts/init_tasks.py
+```
+
+Note: This will attempt to download HumanEval.
+
+
+Once the script completes, you should now see a folder in your current directory called `Tasks` that contains one JSONL file per template in `Templates`.
+
+Now to run a specific subset of HumanEval use:
+
+```bash
+agbench run Tasks/human_eval_MagenticOne.jsonl
+```
+
+You should see the command line print the raw logs that show the agents in action. To see a summary of the results (e.g., task completion rates), run the following in a new terminal:
+
+```bash
+agbench tabulate Results/human_eval_MagenticOne
 ```

 Where:
- `autogenbench clone HumanEval` downloads and expands the HumanEval benchmark scenario.
- `autogenbench run Tasks/r_human_eval_two_agents.jsonl` runs the tasks defined in `Tasks/r_human_eval_two_agents.jsonl`
- `autogenbench tablue results/r_human_eval_two_agents` tabulates the results of the run
+
+- `agbench run Tasks/human_eval_MagenticOne.jsonl` runs the tasks defined in `Tasks/human_eval_MagenticOne.jsonl`
+- `agbench tabulate Results/human_eval_MagenticOne` tabulates the results of the run

 Each of these commands has extensive in-line help via:

- `autogenbench --help`
- `autogenbench clone --help`
- `autogenbench run --help`
- `autogenbench tabulate --help`
+- `agbench --help`
+- `agbench run --help`
+- `agbench tabulate --help`
+- `agbench remove_missing --help`

-**NOTE:** If you are running `autogenbench` from within the repository, you don’t need to run `autogenbench clone`. Instead, navigate to the appropriate scenario folder (e.g., `scenarios/HumanEval`) and run the `Scripts/init_tasks.py` file.
+**NOTE:** If you are running `agbench` from within the repository, you need to navigate to the appropriate benchmark folder (e.g., `benchmarks/HumanEval`) and run the `Scripts/init_tasks.py` file.

 More details of each command are provided in the sections that follow.

-## Cloning Benchmarks
-To clone an existing benchmark, simply run:
-```
-autogenbench clone [BENCHMARK]
-```
-
-For example,
-
-```
-autogenbench clone HumanEval
-```
-
-To see which existing benchmarks are available to clone, run:
-
-```
-autogenbench clone --list
-```
-
-> Note: You might need to log in to HuggingFace to access certain datasets like GAIA. To do this, run `huggingface-cli login` in your terminal and follow the prompts.

 ## Running AutoGenBench

 To run a benchmark (which executes the tasks, but does not compute metrics), simply execute:
+
 ```
 cd [BENCHMARK]
-autogenbench run Tasks
+agbench run Tasks/*.jsonl
 ```

 For example,
+
 ```
 cd HumanEval
-autogenbench run Tasks
+agbench run Tasks/human_eval_MagenticOne.jsonl
 ```

 The default is to run each task once. To run each scenario 10 times, use:

 ```
-autogenbench run --repeat 10 Tasks
+agbench run --repeat 10 Tasks/human_eval_MagenticOne.jsonl
 ```

-The `autogenbench` command-line tool allows a number of command-line arguments to control various parameters of execution. Type ``autogenbench -h`` to explore these options:
+The `agbench` command-line tool allows a number of command-line arguments to control various parameters of execution. Type ``agbench -h`` to explore these options:

 ```
-'autogenbench run' will run the specified autogen scenarios for a given number of repetitions and record all logs and trace information. When running in a Docker environment (default), each run will begin from a common, tightly controlled, environment. The resultant logs can then be further processed by other scripts to produce metrics.
+'agbench run' will run the specified autogen scenarios for a given number of repetitions and record all logs and trace information. When running in a Docker environment (default), each run will begin from a common, tightly controlled, environment. The resultant logs can then be further processed by other scripts to produce metrics.

 positional arguments:
  scenario      The JSONL scenario file to run. If a directory is specified,
@ -140,7 +176,7 @@ options:
                        The requirements file to pip install before running the scenario.
  -d DOCKER_IMAGE, --docker-image DOCKER_IMAGE
                        The Docker image to use when running scenarios. Can not be used together with --native. (default:
-                        'autogenbench:default', which will be created if not present)
+                        'agbench:default', which will be created if not present)
  --native              Run the scenarios natively rather than in docker. NOTE: This is not advisable, and should be done
                        with great caution.
 ```
@ -171,4 +207,4 @@ Within each folder, you will find the following files:

 ## Contributing or Defining New Tasks or Benchmarks

-If you would like to develop -- or even contribute -- your own tasks or benchmarks, please review the [contributor's guide](CONTRIBUTING.md) for complete technical details.
+If you would like to develop -- or even contribute -- your own tasks or benchmarks, please review the [contributor's guide](CONTRIBUTING.md) for complete technical details.
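In the README's Azure example, `CHAT_COMPLETION_KWARGS_JSON` holds a string (`"{}"`), not a nested object. Assuming the same holds for the OpenAI settings, the keyword arguments need to be JSON-encoded before they are placed in `ENV.json`. The sketch below shows one way to generate such a file's contents; the helper name `build_env` is hypothetical, and whether agbench requires the value to be a string in every case is an assumption drawn only from that example.

```python
import json


def build_env(provider: str, completion_kwargs: dict) -> dict:
    """Return ENV.json contents with the kwargs stored as a JSON-encoded string."""
    return {
        "CHAT_COMPLETION_PROVIDER": provider,
        # Assumption: the value is a string containing JSON, as in the Azure example.
        "CHAT_COMPLETION_KWARGS_JSON": json.dumps(completion_kwargs),
    }


env = build_env(
    "openai",
    {"api_key": "REPLACE_WITH_YOUR_API", "model": "REPLACE_WITH_YOUR_MODEL"},
)
env_text = json.dumps(env, indent=4)
print(env_text)  # Paste or write this into ENV.json
```

Generating the file this way avoids hand-escaping the inner quotation marks.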
--- a/python/packages/agbench/pyproject.toml
+++ b/python/packages/agbench/pyproject.toml
@ -24,7 +24,8 @@ dependencies = [
    "huggingface_hub",
    "tabulate",
    "azure-identity",
-    "pandas"
+    "pandas",
+    "scipy"
 ]

 [tool.uv]
--- a/python/packages/agbench/src/agbench/cli.py
+++ b/python/packages/agbench/src/agbench/cli.py
@ -3,6 +3,7 @@ from typing import Callable, List, Optional, Sequence

 from typing_extensions import TypedDict

+from .remove_missing_cmd import remove_missing_cli
 from .run_cmd import run_cli
 from .tabulate_cmd import tabulate_cli
 from .version import __version__
@ -32,6 +33,11 @@ def main(args: Optional[List[str]] = None) -> None:
            "description": "tabulate the results of a previous run",
            "function": tabulate_cli,
        },
+        {
+            "command": "remove_missing",
+            "description": "remove folders with missing results",
+            "function": remove_missing_cli,
+        },
        {
            "command": "--version",
            "description": f"print the version of {invocation_cmd}",
@ -68,12 +74,7 @@ usage: {invocation_cmd} COMMAND ARGS

 {invocation_cmd} is a tool for running and managing AutoGen benchmark scenarios. A typical session might resemble:

-    {invocation_cmd} clone HumanEval
-    cd HumanEval
-    {invocation_cmd} run Tasks/human_eval_two_agents_gpt4.jsonl
-
-which will download the HumanEval benchmark, expand it, and then run the benchmark once with the `human_eval_two_agents_gpt4` configuration.
-
+\
 Available COMMANDs include:

 {commands_details}
@ -81,7 +82,6 @@ Available COMMANDs include:
 Additionally, you can use the --help option with any command for further command-specific instructions. E.g.,

    {invocation_cmd} run --help
-    {invocation_cmd} clone --help

 """.strip()

--- a/python/packages/agbench/src/agbench/remove_missing_cmd.py
+++ b/python/packages/agbench/src/agbench/remove_missing_cmd.py
@ -0,0 +1,123 @@
+import argparse
+import os
+import shutil
+import sys
+from typing import Sequence
+
+
+def default_scorer(instance_dir: str) -> bool:
+    """
+    returns True if the instance_dir has the expected ending pattern in the console_log.txt file
+    """
+    console_log = os.path.join(instance_dir, "console_log.txt")
+    if os.path.isfile(console_log):
+        with open(console_log, "rt") as fh:
+            content = fh.read()
+            # Check for the expected completion markers with plain substring tests
+            has_final_answer = "FINAL ANSWER:" in content
+            has_scenario_complete = "SCENARIO.PY COMPLETE !#!#" in content
+            has_run_complete = "RUN.SH COMPLETE !#!#" in content
+            # An "Error code" in the last 10 lines marks the run as failed
+            last_10_lines = content.splitlines()[-10:]
+            last_10_lines_joined = "\n".join(last_10_lines)
+            has_error_in_last_10_lines = "Error code" in last_10_lines_joined
+            has_all = has_final_answer and has_scenario_complete and has_run_complete and not has_error_in_last_10_lines
+            if not has_all:
+                print(content)
+            return has_all
+    return False
+
+
+def delete_folders_with_missing_results(runlogs_path: str, noconfirm: bool = False) -> None:
+    deleted_folders = 0
+
+    for task_id in os.listdir(runlogs_path):
+        task_path = os.path.join(runlogs_path, task_id)
+
+        if not os.path.isdir(task_path):
+            continue
+
+        instance = 0
+        has_missing_results = False
+
+        while True:
+            instance_dir = os.path.join(task_path, str(instance))
+            if not os.path.isdir(instance_dir):
+                if instance == 0:
+                    print(f"Empty folder: {task_path}")
+                    has_missing_results = True
+                break
+            if not default_scorer(instance_dir):
+                has_missing_results = True
+                break
+
+            instance += 1
+        if has_missing_results:
+            if not noconfirm:
+                print(f"Missing Results in : {task_path}")
+                user_confirmation = input("Press 1 to delete, anything else to skip...")
+                if user_confirmation == "1":
+                    shutil.rmtree(task_path)
+                    print(f"Deleted folder: {task_path}")
+                    deleted_folders += 1
+                else:
+                    print(f"Skipping folder: {task_path}")
+            else:
+                shutil.rmtree(task_path)
+                print(f"Deleted folder: {task_path}")
+                deleted_folders += 1
+
+    print(f"Total folders deleted: {deleted_folders}")
+
+
+def remove_missing_cli(args: Sequence[str]) -> None:
+    invocation_cmd = args[0]
+    args = args[1:]
+    # The runlogs path is taken from the parsed arguments below, so a missing argument produces a usage error.
+
+    parser = argparse.ArgumentParser(
+        prog=invocation_cmd,
+        description=f"{invocation_cmd} will remove folders with missing results.",
+    )
+
+    parser.add_argument(
+        "runlogs",
+        help="The path where the run's logs are stored.",
+    )
+    parser.add_argument(
+        "-c",
+        "--noconfirm",
+        action="store_true",
+        help="Disable confirmation prompt before deleting folders.",
+    )
+
+    parsed_args = parser.parse_args(args)
+    print(parsed_args)
+    if not os.path.isdir(parsed_args.runlogs):
+        print(f"Error: '{parsed_args.runlogs}' is not a valid directory.")
+        print("Usage: agbench remove_missing <path_to_runlogs>")
+
+        sys.exit(1)
+    if not parsed_args.noconfirm:
+        input(
+            "Did you modify the default_scorer function to match the expected ending pattern? Press Enter to continue..."
+        )
+
+    delete_folders_with_missing_results(parsed_args.runlogs, parsed_args.noconfirm)
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print("Usage: python remove_missing_cmd.py <path_to_runlogs> [-c]")
+        sys.exit(1)
+
+    runlogs_path = sys.argv[1]
+    noconfirm = False
+    if len(sys.argv) == 3 and sys.argv[2] == "-c":
+        noconfirm = True
+    if not os.path.isdir(runlogs_path):
+        print(f"Error: '{runlogs_path}' is not a valid directory.")
+        sys.exit(1)
+    input("Did you modify the default_scorer function to match the expected ending pattern? Press Enter to continue...")
+
+    delete_folders_with_missing_results(runlogs_path, noconfirm)
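The completeness test in `default_scorer` can be exercised without any files by factoring it into a function that takes the log text directly. The sketch below mirrors the markers used in the diff; the function name `looks_complete` and the sample logs are made up for illustration.

```python
def looks_complete(content: str, tail_lines: int = 10) -> bool:
    """Return True if the log text has all completion markers and no late error."""
    markers = (
        "FINAL ANSWER:",
        "SCENARIO.PY COMPLETE !#!#",
        "RUN.SH COMPLETE !#!#",
    )
    has_markers = all(marker in content for marker in markers)
    # Only the tail of the log is inspected for API errors.
    tail = "\n".join(content.splitlines()[-tail_lines:])
    return has_markers and "Error code" not in tail


# Hypothetical log snippets for a quick check.
good_log = "FINAL ANSWER: 42\nSCENARIO.PY COMPLETE !#!#\nRUN.SH COMPLETE !#!#\n"
late_error_log = good_log + "Error code: 429\n"
truncated_log = "FINAL ANSWER: 42\n"
```

Benchmarks whose scenarios do not print `FINAL ANSWER:` would need a different marker set, which is presumably why the CLI asks whether `default_scorer` has been modified.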
--- a/python/packages/agbench/src/agbench/run_cmd.py
+++ b/python/packages/agbench/src/agbench/run_cmd.py
@ -10,7 +10,8 @@ import subprocess
 import sys
 import time
 import traceback
-from typing import Callable, Dict, List, Mapping, Optional, Sequence, Tuple, Union, cast
+from multiprocessing import Pool
+from typing import Any, Callable, Dict, List, Mapping, Optional, Sequence, Tuple, Union, cast

 import docker
 from azure.core.exceptions import ClientAuthenticationError
@ -244,7 +245,9 @@ def expand_scenario(scenario_dir: str, scenario: ScenarioInstance, output_dir: s
                fh.write(line)


-def get_scenario_env(token_provider: Optional[Callable[[], str]], env_file: str = DEFAULT_ENV_FILE) -> Dict[str, str]:
+def get_scenario_env(
+    token_provider: Optional[Callable[[], str]] = None, env_file: str = DEFAULT_ENV_FILE
+) -> Dict[str, str]:
    """
    Return a dictionary of environment variables needed to run a scenario.

@ -269,7 +272,12 @@ def get_scenario_env(token_provider: Optional[Callable[[], str]], env_file: str
    azure_openai_ad_token = os.environ.get("AZURE_OPENAI_AD_TOKEN")
    if not azure_openai_ad_token and token_provider:
        azure_openai_ad_token = token_provider()
-
+    if not azure_openai_ad_token:
+        azure_token_provider = get_azure_token_provider()
+        if azure_token_provider:
+            azure_openai_ad_token = azure_token_provider()
+        else:
+            logging.warning("No Azure AD token provider found. Azure AD token not set.")
    if azure_openai_ad_token is not None and len(azure_openai_ad_token.strip()) > 0:
        env["AZURE_OPENAI_AD_TOKEN"] = azure_openai_ad_token

@ -305,7 +313,7 @@ def run_scenario_natively(work_dir: str, env: Mapping[str, str], timeout: int =
        f.write(
            f"""#
 echo RUN.SH STARTING !#!#
-export AGNEXT_TESTBED_SETTING="Native"
+export AUTOGEN_TESTBED_SETTING="Native"
 echo "agbench version: {__version__}" > timestamp.txt

 # Create and activate the virtual environment
@ -425,7 +433,7 @@ def run_scenario_in_docker(
        f.write(
            f"""#
 echo RUN.SH STARTING !#!#
-export AGNEXT_TESTBED_SETTING="Docker"
+export AUTOGEN_TESTBED_SETTING="Docker"

 umask 000
 echo "agbench version: {__version__}" > timestamp.txt
@ -477,20 +485,20 @@ echo RUN.SH COMPLETE !#!#
    # Figure out what folders to mount
    volumes = {str(pathlib.Path(work_dir).absolute()): {"bind": "/workspace", "mode": "rw"}}

-    # Add the autogen_core repo if we can find it
-    agnext_repo_base = os.environ.get("AGNEXT_REPO_BASE")
-    if agnext_repo_base is None:
-        agnext_repo_base = find_agnext_repo(os.getcwd())
-    elif not os.path.isdir(agnext_repo_base):
-        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), agnext_repo_base)
+    # Add the autogen repo if we can find it
+    autogen_repo_base = os.environ.get("AUTOGEN_REPO_BASE")
+    if autogen_repo_base is None:
+        autogen_repo_base = find_autogen_repo(os.getcwd())
+    elif not os.path.isdir(autogen_repo_base):
+        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), autogen_repo_base)

-    if agnext_repo_base is None:
+    if autogen_repo_base is None:
        raise ValueError(
-            "Could not find AutoGen repo base. Please set the environment variable AGNEXT_REPO_BASE to the correct value."
+            "Could not find AutoGen repo base. Please set the environment variable AUTOGEN_REPO_BASE to the correct value."
        )

-    agnext_repo_base = os.path.join(agnext_repo_base, "python")
-    volumes[str(pathlib.Path(agnext_repo_base).absolute())] = {"bind": "/autogen_core", "mode": "rw"}
+    autogen_repo_base = os.path.join(autogen_repo_base, "python")
+    volumes[str(pathlib.Path(autogen_repo_base).absolute())] = {"bind": "/autogen_python", "mode": "rw"}

    print("Mounting:")
    for k in volumes:
@ -583,7 +591,7 @@ def build_default_docker_image(docker_client: docker.DockerClient, image_tag: st
            sys.stdout.write(segment["stream"])


-def find_agnext_repo(path: str) -> Optional[str]:
+def find_autogen_repo(path: str) -> Optional[str]:
    """
    Utility for identifying if the path is a subdirectory of the autogen_core repo.

@ -611,6 +619,135 @@ def find_agnext_repo(path: str) -> Optional[str]:
    return None


+def split_jsonl(file_path: str, num_parts: int) -> List[List[Dict[str, Any]]]:
+    """
+    Split a JSONL file into num_parts approximately equal parts.
+    """
+    with open(file_path, "r") as f:
+        data = [json.loads(line) for line in f]
+
+    random.shuffle(data)  # Shuffle the data for better distribution
+    chunk_size = max(1, -(-len(data) // num_parts))  # Ceiling division: never a zero step, at most num_parts chunks
+    return [data[i : i + chunk_size] for i in range(0, len(data), chunk_size)]
+
+
+def mkdir_p(path: str) -> None:
+    """
+    Create a directory if it doesn't exist, handling race conditions.
+    """
+    try:
+        os.makedirs(path, exist_ok=True)
+    except OSError as exc:
+        if exc.errno != errno.EEXIST:
+            raise
+
+
+def run_scenarios_subset(
+    scenario_name: str,
+    scenarios: List[Dict[str, Any]],
+    n_repeats: int,
+    is_native: bool,
+    docker_image: Optional[str] = None,
+    results_dir: str = "Results",
+    subsample: Union[None, int, float] = None,
+) -> None:
+    """
+    Run a subset of agbench scenarios a given number of times.
+    """
+    for instance in scenarios:
+        # Create a folder to store the results
+        # Results base
+
+        mkdir_p(results_dir)
+
+        # Results for the scenario
+
+        results_scenario = os.path.join(results_dir, scenario_name)
+        mkdir_p(results_scenario)
+
+        # Results for the instance
+        results_instance = os.path.join(results_scenario, instance["id"])
+        mkdir_p(results_instance)
+
+        # Results for the repeats
+        for i in range(0, n_repeats):
+            results_repetition = os.path.join(results_instance, str(i))
+
+            # Skip it if it already exists
+            if os.path.isdir(results_repetition):
+                print(f"Found folder {results_repetition} ... Skipping.")
+                continue
+            print(f"Running scenario {results_repetition}")
+
+            # Expand the scenario
+            expand_scenario(".", instance, results_repetition)  # type: ignore
+
+            # Prepare the environment (keys/values that need to be added)
+            env = get_scenario_env()
+
+            # Run the scenario
+            if is_native:
+                run_scenario_natively(results_repetition, env)
+            else:
+                run_scenario_in_docker(
+                    results_repetition,
+                    env,
+                    docker_image=docker_image,
+                )
+
+
+def run_parallel(args: argparse.Namespace) -> None:
+    """
+    Run scenarios in parallel.
+    """
+    # Read and split the JSONL file
+    scenarios = split_jsonl(args.scenario, args.parallel)
+    scenario_name_parts = os.path.basename(args.scenario).split(".")
+    scenario_name_parts.pop()
+    scenario_name = ".".join(scenario_name_parts)
+
+    # Create a pool of worker processes
+    with Pool(processes=args.parallel) as pool:
+        # Prepare arguments for each worker
+        worker_args = [
+            (
+                scenario_name,
+                scenario_subset,
+                args.repeat,
+                args.native,
+                args.docker_image,
+                "Results",
+                args.subsample,
+            )
+            for scenario_subset in scenarios
+        ]
+
+        # Run scenarios in parallel
+        pool.starmap(run_scenarios_subset, worker_args)
+
+
+def get_azure_token_provider() -> Optional[Callable[[], str]]:
+    """
+    Get the Azure bearer token generator if a token wasn't provided and there's any evidence of using Azure.
+    """
+    if not os.environ.get("AZURE_OPENAI_AD_TOKEN") and os.path.isdir(pathlib.Path("~/.azure").expanduser()):
+        logging.disable(logging.CRITICAL)
+        try:
+            azure_token_provider = get_bearer_token_provider(
+                DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
+            )
+            azure_token_provider()  # Call it once to warm it up, and make sure it doesn't throw an error
+            print("Found Azure token provider.")
+            return azure_token_provider
+        except ClientAuthenticationError:
+            error_message = traceback.format_exc()
+            print(
+                f"Azure token provider failed loading. Try using 'az login --use-device-code':\n{error_message}\n\nContinuing without Azure token provider..."
+            )
+        finally:
+            # Restore logging on every path, including the early return on success
+            logging.disable(logging.NOTSET)
+    return None
+
+
 def run_cli(args: Sequence[str]) -> None:
    invocation_cmd = args[0]
    args = args[1:]
@ -639,6 +776,15 @@ def run_cli(args: Sequence[str]) -> None:
        help='Run on a subsample of the tasks in the JSONL file(s). If a decimal value is specified, then run on the given proportion of tasks in each file. For example "0.7" would run on 70%% of tasks, and "1.0" would run on 100%% of tasks. If an integer value is specified, then randomly select *that* number of tasks from each specified JSONL file. For example "7" would run 7 tasks, while "1" would run only 1 task from each specified JSONL file. (default: 1.0; which is 100%%)',
        default=None,
    )
+
+    parser.add_argument(
+        "-p",
+        "--parallel",
+        type=int,
+        help="The number of parallel processes to run (default: 1).",
+        default=1,
+    )
+
    parser.add_argument(
        "-d",
        "--docker-image",
@ -656,6 +802,10 @@ def run_cli(args: Sequence[str]) -> None:

    parsed_args = parser.parse_args(args)

+    # Don't allow --parallel and --subsample on the same command (not currently supported)
+    if parsed_args.parallel > 1 and parsed_args.subsample is not None:
+        sys.exit("The options --parallel and --subsample can not be used together currently. Exiting.")
+
    # Don't allow both --docker-image and --native on the same command
    if parsed_args.docker_image is not None and parsed_args.native:
        sys.exit("The options --native and --docker-image can not be used together. Exiting.")
@ -701,29 +851,17 @@ def run_cli(args: Sequence[str]) -> None:
                )

    # Get the Azure bearer token generator if a token wasn't provided and there's any evidence of using Azure
-    azure_token_provider = None
-    if not os.environ.get("AZURE_OPENAI_AD_TOKEN") and os.path.isdir(pathlib.Path("~/.azure").expanduser()):
-        logging.disable(logging.CRITICAL)
-        try:
-            azure_token_provider = get_bearer_token_provider(
-                DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
-            )
-            azure_token_provider()  # Call it once to warm it up, and make sure it doesn't throw an error
-            print("Found Azure token provider.")
-        except ClientAuthenticationError:
-            error_message = traceback.format_exc()
-            azure_token_provider = None
-            print(
-                f"Azure token provider failed loading. Try using 'az login --use-device-code':\n{error_message}\n\nContinuing without Azure token provider..."
-            )
-        logging.disable(logging.NOTSET)
+    azure_token_provider = get_azure_token_provider()

    # Run the scenario
-    run_scenarios(
-        scenario=parsed_args.scenario,
-        n_repeats=parsed_args.repeat,
-        is_native=True if parsed_args.native else False,
-        token_provider=azure_token_provider,
-        docker_image=parsed_args.docker_image,
-        subsample=subsample,
-    )
+    if parsed_args.parallel > 1:
+        run_parallel(parsed_args)
+    else:
+        run_scenarios(
+            scenario=parsed_args.scenario,
+            n_repeats=parsed_args.repeat,
+            is_native=True if parsed_args.native else False,
+            token_provider=azure_token_provider,
+            docker_image=parsed_args.docker_image,
+            subsample=subsample,
+        )
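
Note on the parallel path above: the following is a minimal sketch of the fan-out pattern that `run_parallel` relies on, not the code in this commit. The helper names (`split_round_robin`, `build_worker_args`, `run_subset`, `run_all`) are hypothetical, and the round-robin split is an assumption, since the body of `split_jsonl` is not shown here. The point it illustrates is that each task id maps to its own results folder, so workers handling disjoint subsets do not write to the same path.

```python
from multiprocessing import Pool
from typing import Any, Dict, List, Tuple

Task = Dict[str, Any]


def split_round_robin(tasks: List[Task], n_parts: int) -> List[List[Task]]:
    """Deal tasks into at most n_parts non-empty subsets (assumed strategy)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Task]] = [[] for _ in range(n_parts)]
    for index, task in enumerate(tasks):
        parts[index % n_parts].append(task)
    # Drop empty subsets so no worker is started with nothing to do
    return [part for part in parts if part]


def build_worker_args(
    name: str, subsets: List[List[Task]], n_repeats: int
) -> List[Tuple[str, List[Task], int]]:
    """One argument tuple per subset, in the shape Pool.starmap expects."""
    return [(name, subset, n_repeats) for subset in subsets]


def run_subset(name: str, subset: List[Task], n_repeats: int) -> List[str]:
    """Hypothetical worker: report the result folders it would populate."""
    return [
        f"Results/{name}/{task['id']}/{i}"
        for task in subset
        for i in range(n_repeats)
    ]


def run_all(name: str, tasks: List[Task], n_parallel: int, n_repeats: int) -> List[str]:
    """Split the tasks, run each subset in its own process, merge the results."""
    subsets = split_round_robin(tasks, n_parallel)
    with Pool(processes=n_parallel) as pool:
        per_worker = pool.starmap(run_subset, build_worker_args(name, subsets, n_repeats))
    return sorted(path for paths in per_worker for path in paths)
```

On platforms that start worker processes with `spawn`, `run_all` should be called from under an `if __name__ == "__main__":` guard.
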
--- a/python/packages/agbench/src/agbench/tabulate_cmd.py
+++ b/python/packages/agbench/src/agbench/tabulate_cmd.py
@ -183,7 +183,12 @@ def default_tabulate(

        footer_row = ["Failures"]
        for i in range(0, max_instances):
-            footer_row.append(_count_equals(False, i))
+            # Count results that are present (a tuple, not missing) but whose score is not 1
+            failures = 0
+            for row in all_results:
+                if isinstance(row[i + 1], tuple):
+                    failures += row[i + 1][0] != 1
+            footer_row.append(failures)
        footer.append(footer_row)

        footer_row = ["Missing"]
@ -196,6 +201,21 @@ def default_tabulate(
            footer_row.append(footer[0][i + 1] + footer[1][i + 1] + footer[2][i + 1])
        footer.append(footer_row)

+        footer_row = ["Average Success Rate"]
+        for i in range(0, max_instances):
+            footer_row.append(_count_equals(True, i) / (footer[0][i + 1] + footer[1][i + 1] + footer[2][i + 1]))
+        footer.append(footer_row)
+
+        footer_row = ["Average Score"]
+        for i in range(0, max_instances):
+            avg_score_trial = 0.0
+            for row in all_results:
+                if isinstance(row[i + 1], tuple):
+                    avg_score_trial += row[i + 1][0]
+            avg_score_trial = avg_score_trial / len(all_results)
+            footer_row.append(avg_score_trial)
+        footer.append(footer_row)
+
        table = deepcopy(all_results)
        for row in table:
            for trial in range(0, max_instances):
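
Note on the new footer rows: the sketch below restates the per-trial arithmetic in one place, under stated assumptions. `summarize_trial` is a hypothetical helper, not part of `tabulate_cmd.py`. It assumes each row holds the task id in column 0 followed by one cell per trial, that a cell is either a tuple whose first element is the score or a non-tuple value for a missing result, and that a score equal to 1 counts as a success (the behavior of `_count_equals` is not shown in this hunk). As in the diff, the average score divides by the number of rows, so missing results pull the average down.

```python
from typing import Dict, Sequence


def summarize_trial(rows: Sequence[Sequence[object]], trial: int) -> Dict[str, float]:
    """Compute footer-style statistics for one trial column."""
    successes = 0
    failures = 0
    missing = 0
    total_score = 0.0

    for row in rows:
        cell = row[trial + 1]  # column 0 holds the task id
        if isinstance(cell, tuple):
            score = cell[0]
            total_score += score
            if score == 1:
                successes += 1
            else:
                failures += 1  # present, but not a full success
        else:
            missing += 1  # no result recorded for this trial

    total = successes + failures + missing
    return {
        "successes": successes,
        "failures": failures,
        "missing": missing,
        "total": total,
        # Guard against an empty table instead of raising ZeroDivisionError
        "success_rate": successes / total if total else 0.0,
        "average_score": total_score / len(rows) if rows else 0.0,
    }
```
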
--- a/python/packages/autogen-magentic-one/readme.md
+++ b/python/packages/autogen-magentic-one/readme.md
@ -168,21 +168,19 @@ In addition, developers can also handle and process logs generated from the Auto

 You can install the Magentic-One package using pip and then run the example code to see how the agents work together to accomplish a task.

-
 1. Clone the code.
+
 ```bash
-# clone autogen_core
-cd python/teams/autogen-magentic-one
+git clone -b staging https://github.com/microsoft/autogen.git
+cd autogen/python/packages/autogen-magentic-one
 pip install -e .
 ```

 2. Configure the environment variables for the chat completion client. See instructions below.
-
-
-2. Now you can run the example code to see how the agents work together to accomplish a task.
+3. Now you can run the example code to see how the agents work together to accomplish a task.

 ```bash
-python examples/example.py
+python examples/example_websurfer.py
 ```


@ -230,4 +228,4 @@ Some functionalities, such as using web-search requires an API key for Bing.
 You can set it using:
 ```bash
 export BING_API_KEY=xxxxxxx
-```
+```
--- a/python/uv.lock
+++ b/python/uv.lock
@ -71,6 +71,7 @@ dependencies = [
    { name = "huggingface-hub" },
    { name = "openai" },
    { name = "pandas" },
+    { name = "scipy" },
    { name = "tabulate" },
 ]

@ -87,6 +88,7 @@ requires-dist = [
    { name = "huggingface-hub" },
    { name = "openai" },
    { name = "pandas" },
+    { name = "scipy" },
    { name = "tabulate" },
 ]

@ -4094,6 +4096,49 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/fe/f1/3db1590be946c14d86ac0cc8422e5808500903592b7ca09a097e425b1dba/ruff-0.4.8-py3-none-win_arm64.whl", hash = "sha256:14019a06dbe29b608f6b7cbcec300e3170a8d86efaddb7b23405cb7f7dcaf780", size = 7944828 },
 ]

+[[package]]
+name = "scipy"
+version = "1.14.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "numpy" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/62/11/4d44a1f274e002784e4dbdb81e0ea96d2de2d1045b2132d5af62cc31fd28/scipy-1.14.1.tar.gz", hash = "sha256:5a275584e726026a5699459aa72f828a610821006228e841b94275c4a7c08417", size = 58620554 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/64/68/3bc0cfaf64ff507d82b1e5d5b64521df4c8bf7e22bc0b897827cbee9872c/scipy-1.14.1-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:b28d2ca4add7ac16ae8bb6632a3c86e4b9e4d52d3e34267f6e1b0c1f8d87e389", size = 39069598 },
+    { url = "https://files.pythonhosted.org/packages/43/a5/8d02f9c372790326ad405d94f04d4339482ec082455b9e6e288f7100513b/scipy-1.14.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:d0d2821003174de06b69e58cef2316a6622b60ee613121199cb2852a873f8cf3", size = 29879676 },
+    { url = "https://files.pythonhosted.org/packages/07/42/0e0bea9666fcbf2cb6ea0205db42c81b1f34d7b729ba251010edf9c80ebd/scipy-1.14.1-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:8bddf15838ba768bb5f5083c1ea012d64c9a444e16192762bd858f1e126196d0", size = 23088696 },
+    { url = "https://files.pythonhosted.org/packages/15/47/298ab6fef5ebf31b426560e978b8b8548421d4ed0bf99263e1eb44532306/scipy-1.14.1-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:97c5dddd5932bd2a1a31c927ba5e1463a53b87ca96b5c9bdf5dfd6096e27efc3", size = 25470699 },
+    { url = "https://files.pythonhosted.org/packages/d8/df/cdb6be5274bc694c4c22862ac3438cb04f360ed9df0aecee02ce0b798380/scipy-1.14.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2ff0a7e01e422c15739ecd64432743cf7aae2b03f3084288f399affcefe5222d", size = 35606631 },
+    { url = "https://files.pythonhosted.org/packages/47/78/b0c2c23880dd1e99e938ad49ccfb011ae353758a2dc5ed7ee59baff684c3/scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8e32dced201274bf96899e6491d9ba3e9a5f6b336708656466ad0522d8528f69", size = 41178528 },
+    { url = "https://files.pythonhosted.org/packages/5d/aa/994b45c34b897637b853ec04334afa55a85650a0d11dacfa67232260fb0a/scipy-1.14.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8426251ad1e4ad903a4514712d2fa8fdd5382c978010d1c6f5f37ef286a713ad", size = 42784535 },
+    { url = "https://files.pythonhosted.org/packages/e7/1c/8daa6df17a945cb1a2a1e3bae3c49643f7b3b94017ff01a4787064f03f84/scipy-1.14.1-cp310-cp310-win_amd64.whl", hash = "sha256:a49f6ed96f83966f576b33a44257d869756df6cf1ef4934f59dd58b25e0327e5", size = 44772117 },
+    { url = "https://files.pythonhosted.org/packages/b2/ab/070ccfabe870d9f105b04aee1e2860520460ef7ca0213172abfe871463b9/scipy-1.14.1-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:2da0469a4ef0ecd3693761acbdc20f2fdeafb69e6819cc081308cc978153c675", size = 39076999 },
+    { url = "https://files.pythonhosted.org/packages/a7/c5/02ac82f9bb8f70818099df7e86c3ad28dae64e1347b421d8e3adf26acab6/scipy-1.14.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:c0ee987efa6737242745f347835da2cc5bb9f1b42996a4d97d5c7ff7928cb6f2", size = 29894570 },
+    { url = "https://files.pythonhosted.org/packages/ed/05/7f03e680cc5249c4f96c9e4e845acde08eb1aee5bc216eff8a089baa4ddb/scipy-1.14.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3a1b111fac6baec1c1d92f27e76511c9e7218f1695d61b59e05e0fe04dc59617", size = 23103567 },
+    { url = "https://files.pythonhosted.org/packages/5e/fc/9f1413bef53171f379d786aabc104d4abeea48ee84c553a3e3d8c9f96a9c/scipy-1.14.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:8475230e55549ab3f207bff11ebfc91c805dc3463ef62eda3ccf593254524ce8", size = 25499102 },
+    { url = "https://files.pythonhosted.org/packages/c2/4b/b44bee3c2ddc316b0159b3d87a3d467ef8d7edfd525e6f7364a62cd87d90/scipy-1.14.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:278266012eb69f4a720827bdd2dc54b2271c97d84255b2faaa8f161a158c3b37", size = 35586346 },
+    { url = "https://files.pythonhosted.org/packages/93/6b/701776d4bd6bdd9b629c387b5140f006185bd8ddea16788a44434376b98f/scipy-1.14.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fef8c87f8abfb884dac04e97824b61299880c43f4ce675dd2cbeadd3c9b466d2", size = 41165244 },
+    { url = "https://files.pythonhosted.org/packages/06/57/e6aa6f55729a8f245d8a6984f2855696c5992113a5dc789065020f8be753/scipy-1.14.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b05d43735bb2f07d689f56f7b474788a13ed8adc484a85aa65c0fd931cf9ccd2", size = 42817917 },
+    { url = "https://files.pythonhosted.org/packages/ea/c2/5ecadc5fcccefaece775feadcd795060adf5c3b29a883bff0e678cfe89af/scipy-1.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:716e389b694c4bb564b4fc0c51bc84d381735e0d39d3f26ec1af2556ec6aad94", size = 44781033 },
+    { url = "https://files.pythonhosted.org/packages/c0/04/2bdacc8ac6387b15db6faa40295f8bd25eccf33f1f13e68a72dc3c60a99e/scipy-1.14.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:631f07b3734d34aced009aaf6fedfd0eb3498a97e581c3b1e5f14a04164a456d", size = 39128781 },
+    { url = "https://files.pythonhosted.org/packages/c8/53/35b4d41f5fd42f5781dbd0dd6c05d35ba8aa75c84ecddc7d44756cd8da2e/scipy-1.14.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:af29a935803cc707ab2ed7791c44288a682f9c8107bc00f0eccc4f92c08d6e07", size = 29939542 },
+    { url = "https://files.pythonhosted.org/packages/66/67/6ef192e0e4d77b20cc33a01e743b00bc9e68fb83b88e06e636d2619a8767/scipy-1.14.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:2843f2d527d9eebec9a43e6b406fb7266f3af25a751aa91d62ff416f54170bc5", size = 23148375 },
+    { url = "https://files.pythonhosted.org/packages/f6/32/3a6dedd51d68eb7b8e7dc7947d5d841bcb699f1bf4463639554986f4d782/scipy-1.14.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:eb58ca0abd96911932f688528977858681a59d61a7ce908ffd355957f7025cfc", size = 25578573 },
+    { url = "https://files.pythonhosted.org/packages/f0/5a/efa92a58dc3a2898705f1dc9dbaf390ca7d4fba26d6ab8cfffb0c72f656f/scipy-1.14.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:30ac8812c1d2aab7131a79ba62933a2a76f582d5dbbc695192453dae67ad6310", size = 35319299 },
+    { url = "https://files.pythonhosted.org/packages/8e/ee/8a26858ca517e9c64f84b4c7734b89bda8e63bec85c3d2f432d225bb1886/scipy-1.14.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8f9ea80f2e65bdaa0b7627fb00cbeb2daf163caa015e59b7516395fe3bd1e066", size = 40849331 },
+    { url = "https://files.pythonhosted.org/packages/a5/cd/06f72bc9187840f1c99e1a8750aad4216fc7dfdd7df46e6280add14b4822/scipy-1.14.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:edaf02b82cd7639db00dbff629995ef185c8df4c3ffa71a5562a595765a06ce1", size = 42544049 },
+    { url = "https://files.pythonhosted.org/packages/aa/7d/43ab67228ef98c6b5dd42ab386eae2d7877036970a0d7e3dd3eb47a0d530/scipy-1.14.1-cp312-cp312-win_amd64.whl", hash = "sha256:2ff38e22128e6c03ff73b6bb0f85f897d2362f8c052e3b8ad00532198fbdae3f", size = 44521212 },
+    { url = "https://files.pythonhosted.org/packages/50/ef/ac98346db016ff18a6ad7626a35808f37074d25796fd0234c2bb0ed1e054/scipy-1.14.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1729560c906963fc8389f6aac023739ff3983e727b1a4d87696b7bf108316a79", size = 39091068 },
+    { url = "https://files.pythonhosted.org/packages/b9/cc/70948fe9f393b911b4251e96b55bbdeaa8cca41f37c26fd1df0232933b9e/scipy-1.14.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:4079b90df244709e675cdc8b93bfd8a395d59af40b72e339c2287c91860deb8e", size = 29875417 },
+    { url = "https://files.pythonhosted.org/packages/3b/2e/35f549b7d231c1c9f9639f9ef49b815d816bf54dd050da5da1c11517a218/scipy-1.14.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:e0cf28db0f24a38b2a0ca33a85a54852586e43cf6fd876365c86e0657cfe7d73", size = 23084508 },
+    { url = "https://files.pythonhosted.org/packages/3f/d6/b028e3f3e59fae61fb8c0f450db732c43dd1d836223a589a8be9f6377203/scipy-1.14.1-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:0c2f95de3b04e26f5f3ad5bb05e74ba7f68b837133a4492414b3afd79dfe540e", size = 25503364 },
+    { url = "https://files.pythonhosted.org/packages/a7/2f/6c142b352ac15967744d62b165537a965e95d557085db4beab2a11f7943b/scipy-1.14.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b99722ea48b7ea25e8e015e8341ae74624f72e5f21fc2abd45f3a93266de4c5d", size = 35292639 },
+    { url = "https://files.pythonhosted.org/packages/56/46/2449e6e51e0d7c3575f289f6acb7f828938eaab8874dbccfeb0cd2b71a27/scipy-1.14.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:5149e3fd2d686e42144a093b206aef01932a0059c2a33ddfa67f5f035bdfe13e", size = 40798288 },
+    { url = "https://files.pythonhosted.org/packages/32/cd/9d86f7ed7f4497c9fd3e39f8918dd93d9f647ba80d7e34e4946c0c2d1a7c/scipy-1.14.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:e4f5a7c49323533f9103d4dacf4e4f07078f360743dec7f7596949149efeec06", size = 42524647 },
+    { url = "https://files.pythonhosted.org/packages/f5/1b/6ee032251bf4cdb0cc50059374e86a9f076308c1512b61c4e003e241efb7/scipy-1.14.1-cp313-cp313-win_amd64.whl", hash = "sha256:baff393942b550823bfce952bb62270ee17504d02a1801d7fd0719534dfb9c84", size = 44469524 },
+]
+
 [[package]]
 name = "selenium"
 version = "4.24.0"