WebSurfer Updated (Selenium, Playwright, and support for many filetypes) (#1929)

* Feat/headless browser (retargeted) (#1832)

* Add headless browser to the WebSurferAgent, closes #1481

* replace soup.get_text() with markdownify.MarkdownConverter().convert_soup(soup)

* import HeadlessChromeBrowser

* implicitly wait for 10s

* increase max. wait time to 99s

* fix: trim trailing whitespace

* test: fix headless tests

* better Bing query search

* docs: add example 3 for headless option

---------

Co-authored-by: Vijay Ramesh <vijay@regrello.com>

* Handle missing Selenium package.

* Added browser_chat.py example to simplify testing.

* Based browser on mdconvert. (#1847)

* Based browser on mdconvert.

* Updated web_surfer.

* Renamed HeadlessChromeBrowser to SeleniumChromeBrowser

* Added an initial POC with Playwright.

* Separated Bing search into its own utility module.

* Simple browser now uses Bing tools.

* Updated Playwright browser to inherit from SimpleTextBrowser

* Got Selenium working too.

* Renamed classes and files for consistency.

* Added more instructions.

* Initial work to support other search providers.

* Added some basic behavior when the BING_API_KEY is missing.

* Cleaned up some search results.

* Moved to using the requests.Session object. Moved Bing SERP parsing to mdconvert to be more broadly useful.

* Added backward compatibility to WebSurferAgent

* Selenium and Playwright now grab the whole DOM, not just the body, allowing the converters access to metadata.

* Fixed printing of page titles in Playwright.

* Moved installation of WebSurfer dependencies to contrib-tests.yml

* Fixing pre-commit issues.

* Reverting conversable_agent, which should not have been changed in prior commit.

* Added RequestMarkdownBrowser tests.

* Fixed a bug with Bing search, and added search test cases.

* Added tests for Bing search.

* Added tests for md_convert

* Added test files.

* Added missing pptx.

* Added more tests for WebSurfer coverage.

* Fixed guard on requests_markdown_browser test.

* Updated test coverage for mdconvert.

* Fix browser_utils tests.

* Removed image test from browser, since exiftool isn't installed on test machine.

* Disable Selenium GPU and sandbox to ensure it runs headless in Docker.

* Added an option for Bing API results to be interleaved (as Bing specifies) or presented in a categorized list (Web, News, Videos, etc.).

* Print more details when requests exceptions are thrown.

* Added additional documentation to markdown_search

* Added documentation to the selenium_markdown_browser.

* Added documentation to playwright_markdown_browser.py

* Added documentation to requests_markdown_browser

* Added documentation to mdconvert.py

* Updated agentchat_surfer notebook.

* Update .github/workflows/contrib-tests.yml

Co-authored-by: Davor Runje <davor@airt.ai>

* Merge main. Resolve conflicts.

* Resolve pre-commit checks.

* Removed offending LFS file.

* Re-added offending LFS file.

* Fixed browser_utils tests.

* Fixed style errors.

---------

Co-authored-by: Asapanna Rakesh <45640029+INF800@users.noreply.github.com>
Co-authored-by: Vijay Ramesh <vijay@regrello.com>
Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
Co-authored-by: Davor Runje <davor@airt.ai>
afourney 2024-09-25 15:17:42 -07:00 committed by GitHub
parent 2e1f788293
commit 0d5163b78a
26 changed files with 5456 additions and 592 deletions


@@ -134,6 +134,9 @@ jobs:
- name: Install packages and dependencies for RetrieveChat
run: |
pip install -e .[retrievechat]
- name: Install packages and dependencies for WebSurfer and browser_utils
run: |
pip install -e .[test,websurfer]
- name: Set AUTOGEN_USE_DOCKER based on OS
shell: bash
run: |
@@ -275,7 +278,7 @@ jobs:
fi
- name: Coverage
run: |
pytest test/test_browser_utils.py test/agentchat/contrib/test_web_surfer.py --skip-openai
pytest test/browser_utils test/agentchat/contrib/test_web_surfer.py --skip-openai
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:


@@ -46,7 +46,8 @@ repos:
website/docs/tutorial/code-executors.ipynb |
website/docs/topics/code-execution/custom-executor.ipynb |
website/docs/topics/non-openai-models/cloud-gemini.ipynb |
notebook/.*
notebook/.* |
test/browser_utils/test_files/.*
)$
# See https://jaredkhan.com/blog/mypy-pre-commit
- repo: local


@@ -1,15 +1,13 @@
import copy
import json
import logging
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union
from typing_extensions import Annotated
from ... import Agent, AssistantAgent, ConversableAgent, GroupChat, GroupChatManager, OpenAIWrapper, UserProxyAgent
from ...browser_utils import SimpleTextBrowser
from ...browser_utils import AbstractMarkdownBrowser, BingMarkdownSearch, RequestsMarkdownBrowser
from ...code_utils import content_str
from ...oai.openai_utils import filter_config
from ...token_count_utils import count_token, get_max_token_limit
@@ -20,12 +18,9 @@ logger = logging.getLogger(__name__)
class WebSurferAgent(ConversableAgent):
"""(In preview) An agent that acts as a basic web surfer that can search the web and visit web pages."""
DEFAULT_PROMPT = (
"You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find. Today's date is "
+ datetime.now().date().isoformat()
)
DEFAULT_PROMPT = "You are a helpful AI assistant with access to a web browser (via the provided functions). In fact, YOU ARE THE ONLY MEMBER OF YOUR PARTY WITH ACCESS TO A WEB BROWSER, so please help out where you can by performing web searches, navigating pages, and reporting what you find."
DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, answer questions from pages, and or generate summaries."
DEFAULT_DESCRIPTION = "A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, navigate to Wikipedia, download files, etc. Once on a desired page, ask them to answer questions by reading the page, generate summaries, find specific words or phrases on the page (ctrl+f), or even just scroll up or down in the viewport."
def __init__(
self,
@@ -40,7 +35,8 @@ class WebSurferAgent(ConversableAgent):
llm_config: Optional[Union[Dict, Literal[False]]] = None,
summarizer_llm_config: Optional[Union[Dict, Literal[False]]] = None,
default_auto_reply: Optional[Union[str, Dict, None]] = "",
browser_config: Optional[Union[Dict, None]] = None,
browser_config: Optional[Union[Dict, None]] = None, # Deprecated
browser: Optional[Union[AbstractMarkdownBrowser, None]] = None,
**kwargs,
):
super().__init__(
@@ -60,11 +56,39 @@
self._create_summarizer_client(summarizer_llm_config, llm_config)
# Create the browser
self.browser = SimpleTextBrowser(**(browser_config if browser_config else {}))
if browser_config is not None:
if browser is not None:
raise ValueError(
"WebSurferAgent cannot accept both a 'browser_config' (deprecated) parameter and 'browser' parameter at the same time. Use only one or the other."
)
inner_llm_config = copy.deepcopy(llm_config)
# Print a warning
logger.warning(
"Warning: the parameter 'browser_config' in WebSurferAgent.__init__() is deprecated. Use 'browser' instead."
)
# Update the settings to the new format
_bconfig = {}
_bconfig.update(browser_config)
if "bing_api_key" in _bconfig:
_bconfig["search_engine"] = BingMarkdownSearch(
bing_api_key=_bconfig["bing_api_key"], interleave_results=False
)
del _bconfig["bing_api_key"]
else:
_bconfig["search_engine"] = BingMarkdownSearch()
if "request_kwargs" in _bconfig:
_bconfig["requests_get_kwargs"] = _bconfig["request_kwargs"]
del _bconfig["request_kwargs"]
self.browser = RequestsMarkdownBrowser(**_bconfig)
else:
self.browser = browser
# Set up the inner monologue
inner_llm_config = copy.deepcopy(llm_config)
self._assistant = AssistantAgent(
self.name + "_inner_assistant",
system_message=system_message, # type: ignore[arg-type]
@@ -130,6 +154,7 @@
total_pages = len(self.browser.viewport_pages)
header += f"Viewport position: Showing page {current_page+1} of {total_pages}.\n"
return (header, self.browser.viewport)
@self._user_proxy.register_for_execution()
@@ -138,7 +163,7 @@
description="Perform an INFORMATIONAL web search query then return the search results.",
)
def _informational_search(query: Annotated[str, "The informational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content
@@ -148,9 +173,9 @@
description="Perform a NAVIGATIONAL web search query then immediately navigate to the top result. Useful, for example, to navigate to a particular Wikipedia article or other known destination. Equivalent to Google's \"I'm Feeling Lucky\" button.",
)
def _navigational_search(query: Annotated[str, "The navigational web search query to perform."]) -> str:
self.browser.visit_page(f"bing: {query}")
self.browser.visit_page(f"search: {query}")
# Extract the first linl
# Extract the first link
m = re.search(r"\[.*?\]\((http.*?)\)", self.browser.page_content)
if m:
self.browser.visit_page(m.group(1))
@@ -168,6 +193,15 @@
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="download_file", description="Download a file at a given URL and, if possible, return its text."
)
def _download_file(url: Annotated[str, "The relative or absolute url of the file to be downloaded."]) -> str:
self.browser.visit_page(url)
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="page_up",
@@ -188,14 +222,51 @@
header, content = _browser_state()
return header.strip() + "\n=======================\n" + content
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_on_page_ctrl_f",
description="Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F.",
)
def _find_on_page_ctrl_f(
search_string: Annotated[
str, "The string to search for on the page. This search string supports wildcards like '*'"
]
) -> str:
find_result = self.browser.find_on_page(search_string)
header, content = _browser_state()
if find_result is None:
return (
header.strip()
+ "\n=======================\nThe search string '"
+ search_string
+ "' was not found on this page."
)
else:
return header.strip() + "\n=======================\n" + content
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="find_next",
description="Scroll the viewport to next occurrence of the search string.",
)
def _find_next() -> str:
find_result = self.browser.find_next()
header, content = _browser_state()
if find_result is None:
return header.strip() + "\n=======================\nThe search string was not found on this page."
else:
return header.strip() + "\n=======================\n" + content
if self.summarization_client is not None:
@self._user_proxy.register_for_execution()
@self._assistant.register_for_llm(
name="answer_from_page",
name="read_page_and_answer",
description="Uses AI to read the page and directly answer a given question based on the content.",
)
def _answer_from_page(
def _read_page_and_answer(
question: Annotated[Optional[str], "The question to directly answer."],
url: Annotated[Optional[str], "[Optional] The url of the page. (Defaults to the current page)"] = None,
) -> str:
@@ -256,7 +327,7 @@ class WebSurferAgent(ConversableAgent):
Optional[str], "[Optional] The url of the page to summarize. (Defaults to current page)"
] = None,
) -> str:
return _answer_from_page(url=url, question=None)
return _read_page_and_answer(url=url, question=None)
def generate_surfer_reply(
self,

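To make the deprecation path above concrete, a minimal migration sketch (the `llm_config` values and Bing key are placeholders you supply; the construction mirrors the notebook change later in this commit):

```python
# Hedged sketch: values below are placeholders, not part of the commit.
from autogen.agentchat.contrib.web_surfer import WebSurferAgent
from autogen.browser_utils import BingMarkdownSearch, RequestsMarkdownBrowser

llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}  # placeholder
bing_api_key = "YOUR_BING_API_KEY"  # placeholder; the BING_API_KEY env var also works

# Before (deprecated, still accepted with a warning):
#   WebSurferAgent("web_surfer", llm_config=llm_config,
#                  browser_config={"viewport_size": 4096, "bing_api_key": bing_api_key})

# After: construct the browser explicitly and hand it to the agent.
browser = RequestsMarkdownBrowser(
    viewport_size=4096,
    search_engine=BingMarkdownSearch(bing_api_key=bing_api_key),
)
web_surfer = WebSurferAgent("web_surfer", llm_config=llm_config, browser=browser)
```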

@@ -1,285 +0,0 @@
import io
import json
import mimetypes
import os
import re
import uuid
from typing import Any, Dict, List, Optional, Tuple, Union
from urllib.parse import urljoin, urlparse
import markdownify
import requests
from bs4 import BeautifulSoup
# Optional PDF support
IS_PDF_CAPABLE = False
try:
import pdfminer
import pdfminer.high_level
IS_PDF_CAPABLE = True
except ModuleNotFoundError:
pass
# Other optional dependencies
try:
import pathvalidate
except ModuleNotFoundError:
pass
class SimpleTextBrowser:
"""(In preview) An extremely simple text-based web browser comparable to Lynx. Suitable for Agentic use."""
def __init__(
self,
start_page: Optional[str] = None,
viewport_size: Optional[int] = 1024 * 8,
downloads_folder: Optional[Union[str, None]] = None,
bing_base_url: str = "https://api.bing.microsoft.com/v7.0/search",
bing_api_key: Optional[Union[str, None]] = None,
request_kwargs: Optional[Union[Dict[str, Any], None]] = None,
):
self.start_page: str = start_page if start_page else "about:blank"
self.viewport_size = viewport_size # Applies only to the standard uri types
self.downloads_folder = downloads_folder
self.history: List[str] = list()
self.page_title: Optional[str] = None
self.viewport_current_page = 0
self.viewport_pages: List[Tuple[int, int]] = list()
self.set_address(self.start_page)
self.bing_base_url = bing_base_url
self.bing_api_key = bing_api_key
self.request_kwargs = request_kwargs
self._page_content = ""
@property
def address(self) -> str:
"""Return the address of the current page."""
return self.history[-1]
def set_address(self, uri_or_path: str) -> None:
self.history.append(uri_or_path)
# Handle special URIs
if uri_or_path == "about:blank":
self._set_page_content("")
elif uri_or_path.startswith("bing:"):
self._bing_search(uri_or_path[len("bing:") :].strip())
else:
if not uri_or_path.startswith("http:") and not uri_or_path.startswith("https:"):
uri_or_path = urljoin(self.address, uri_or_path)
self.history[-1] = uri_or_path # Update the address with the fully-qualified path
self._fetch_page(uri_or_path)
self.viewport_current_page = 0
@property
def viewport(self) -> str:
"""Return the content of the current viewport."""
bounds = self.viewport_pages[self.viewport_current_page]
return self.page_content[bounds[0] : bounds[1]]
@property
def page_content(self) -> str:
"""Return the full contents of the current page."""
return self._page_content
def _set_page_content(self, content: str) -> None:
"""Sets the text content of the current page."""
self._page_content = content
self._split_pages()
if self.viewport_current_page >= len(self.viewport_pages):
self.viewport_current_page = len(self.viewport_pages) - 1
def page_down(self) -> None:
self.viewport_current_page = min(self.viewport_current_page + 1, len(self.viewport_pages) - 1)
def page_up(self) -> None:
self.viewport_current_page = max(self.viewport_current_page - 1, 0)
def visit_page(self, path_or_uri: str) -> str:
"""Update the address, visit the page, and return the content of the viewport."""
self.set_address(path_or_uri)
return self.viewport
def _split_pages(self) -> None:
# Split only regular pages
if not self.address.startswith("http:") and not self.address.startswith("https:"):
self.viewport_pages = [(0, len(self._page_content))]
return
# Handle empty pages
if len(self._page_content) == 0:
self.viewport_pages = [(0, 0)]
return
# Break the viewport into pages
self.viewport_pages = []
start_idx = 0
while start_idx < len(self._page_content):
end_idx = min(start_idx + self.viewport_size, len(self._page_content)) # type: ignore[operator]
# Adjust to end on a space
while end_idx < len(self._page_content) and self._page_content[end_idx - 1] not in [" ", "\t", "\r", "\n"]:
end_idx += 1
self.viewport_pages.append((start_idx, end_idx))
start_idx = end_idx
def _bing_api_call(self, query: str) -> Dict[str, Dict[str, List[Dict[str, Union[str, Dict[str, str]]]]]]:
# Make sure the key was set
if self.bing_api_key is None:
raise ValueError("Missing Bing API key.")
# Prepare the request parameters
request_kwargs = self.request_kwargs.copy() if self.request_kwargs is not None else {}
if "headers" not in request_kwargs:
request_kwargs["headers"] = {}
request_kwargs["headers"]["Ocp-Apim-Subscription-Key"] = self.bing_api_key
if "params" not in request_kwargs:
request_kwargs["params"] = {}
request_kwargs["params"]["q"] = query
request_kwargs["params"]["textDecorations"] = False
request_kwargs["params"]["textFormat"] = "raw"
request_kwargs["stream"] = False
# Make the request
response = requests.get(self.bing_base_url, **request_kwargs)
response.raise_for_status()
results = response.json()
return results # type: ignore[no-any-return]
def _bing_search(self, query: str) -> None:
results = self._bing_api_call(query)
web_snippets: List[str] = list()
idx = 0
for page in results["webPages"]["value"]:
idx += 1
web_snippets.append(f"{idx}. [{page['name']}]({page['url']})\n{page['snippet']}")
if "deepLinks" in page:
for dl in page["deepLinks"]:
idx += 1
web_snippets.append(
f"{idx}. [{dl['name']}]({dl['url']})\n{dl['snippet'] if 'snippet' in dl else ''}" # type: ignore[index]
)
news_snippets = list()
if "news" in results:
for page in results["news"]["value"]:
idx += 1
news_snippets.append(f"{idx}. [{page['name']}]({page['url']})\n{page['description']}")
self.page_title = f"{query} - Search"
content = (
f"A Bing search for '{query}' found {len(web_snippets) + len(news_snippets)} results:\n\n## Web Results\n"
+ "\n\n".join(web_snippets)
)
if len(news_snippets) > 0:
content += "\n\n## News Results:\n" + "\n\n".join(news_snippets)
self._set_page_content(content)
def _fetch_page(self, url: str) -> None:
try:
# Prepare the request parameters
request_kwargs = self.request_kwargs.copy() if self.request_kwargs is not None else {}
request_kwargs["stream"] = True
# Send a HTTP request to the URL
response = requests.get(url, **request_kwargs)
response.raise_for_status()
# If the HTTP request returns a status code 200, proceed
if response.status_code == 200:
content_type = response.headers.get("content-type", "")
for ct in ["text/html", "text/plain", "application/pdf"]:
if ct in content_type.lower():
content_type = ct
break
if content_type == "text/html":
# Get the content of the response
html = ""
for chunk in response.iter_content(chunk_size=512, decode_unicode=True):
html += chunk
soup = BeautifulSoup(html, "html.parser")
# Remove javascript and style blocks
for script in soup(["script", "style"]):
script.extract()
# Convert to markdown -- Wikipedia gets special attention to get a clean version of the page
if url.startswith("https://en.wikipedia.org/"):
body_elm = soup.find("div", {"id": "mw-content-text"})
title_elm = soup.find("span", {"class": "mw-page-title-main"})
if body_elm:
# What's the title
main_title = soup.title.string
if title_elm and len(title_elm) > 0:
main_title = title_elm.string
webpage_text = (
"# " + main_title + "\n\n" + markdownify.MarkdownConverter().convert_soup(body_elm)
)
else:
webpage_text = markdownify.MarkdownConverter().convert_soup(soup)
else:
webpage_text = markdownify.MarkdownConverter().convert_soup(soup)
# Convert newlines
webpage_text = re.sub(r"\r\n", "\n", webpage_text)
# Remove excessive blank lines
self.page_title = soup.title.string
self._set_page_content(re.sub(r"\n{2,}", "\n\n", webpage_text).strip())
elif content_type == "text/plain":
# Get the content of the response
plain_text = ""
for chunk in response.iter_content(chunk_size=512, decode_unicode=True):
plain_text += chunk
self.page_title = None
self._set_page_content(plain_text)
elif IS_PDF_CAPABLE and content_type == "application/pdf":
pdf_data = io.BytesIO(response.raw.read())
self.page_title = None
self._set_page_content(pdfminer.high_level.extract_text(pdf_data))
elif self.downloads_folder is not None:
# Try producing a safe filename
fname = None
try:
fname = pathvalidate.sanitize_filename(os.path.basename(urlparse(url).path)).strip()
except NameError:
pass
# No suitable name, so make one
if fname is None:
extension = mimetypes.guess_extension(content_type)
if extension is None:
extension = ".download"
fname = str(uuid.uuid4()) + extension
# Open a file for writing
download_path = os.path.abspath(os.path.join(self.downloads_folder, fname))
with open(download_path, "wb") as fh:
for chunk in response.iter_content(chunk_size=512):
fh.write(chunk)
# Return a page describing what just happened
self.page_title = "Download complete."
self._set_page_content(f"Downloaded '{url}' to '{download_path}'.")
else:
self.page_title = f"Error - Unsupported Content-Type '{content_type}'"
self._set_page_content(self.page_title)
else:
self.page_title = "Error"
self._set_page_content("Failed to retrieve " + url)
except requests.exceptions.RequestException as e:
self.page_title = "Error"
self._set_page_content(str(e))


@@ -0,0 +1,19 @@
from .abstract_markdown_browser import AbstractMarkdownBrowser
from .markdown_search import AbstractMarkdownSearch, BingMarkdownSearch
from .mdconvert import DocumentConverterResult, FileConversionException, MarkdownConverter, UnsupportedFormatException
from .playwright_markdown_browser import PlaywrightMarkdownBrowser
from .requests_markdown_browser import RequestsMarkdownBrowser
from .selenium_markdown_browser import SeleniumMarkdownBrowser
__all__ = (
"AbstractMarkdownBrowser",
"RequestsMarkdownBrowser",
"SeleniumMarkdownBrowser",
"PlaywrightMarkdownBrowser",
"AbstractMarkdownSearch",
"BingMarkdownSearch",
"MarkdownConverter",
"UnsupportedFormatException",
"FileConversionException",
"DocumentConverterResult",
)


@@ -0,0 +1,64 @@
from abc import ABC, abstractmethod
from typing import Dict, Optional, Union
class AbstractMarkdownBrowser(ABC):
"""
An abstract class for a Markdown web browser.
All MarkdownBrowsers work by:
(1) fetching a web page by URL (via requests, Selenium, Playwright, etc.)
(2) converting the page's HTML or DOM to Markdown
(3) operating on the Markdown
Such browsers are simple, and suitable for read-only agentic use.
They cannot be used to interact with complex web applications.
"""
@abstractmethod
def __init__(self):
pass
@property
@abstractmethod
def address(self) -> str:
pass
@abstractmethod
def set_address(self, uri_or_path):
pass
@property
@abstractmethod
def viewport(self) -> str:
pass
@property
@abstractmethod
def page_content(self) -> str:
pass
@abstractmethod
def page_down(self):
pass
@abstractmethod
def page_up(self):
pass
@abstractmethod
def visit_page(self, path_or_uri):
pass
@abstractmethod
def open_local_file(self, local_path):
pass
@abstractmethod
def find_on_page(self, query: str):
pass
@abstractmethod
def find_next(self):
pass

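As a sketch of how code can stay agnostic to the backend behind this contract (the helper function, URL, and query below are illustrative, not part of the commit):

```python
from autogen.browser_utils import AbstractMarkdownBrowser, RequestsMarkdownBrowser

def first_match(browser: AbstractMarkdownBrowser, url: str, query: str) -> str:
    """Visit `url`, scroll to the first viewport matching `query`, and return it."""
    browser.visit_page(url)
    if browser.find_on_page(query) is None:
        return f"'{query}' was not found on {browser.address}"
    return browser.viewport

# Any concrete backend (Requests, Selenium, Playwright) satisfies the same interface.
browser = RequestsMarkdownBrowser(viewport_size=1024 * 8)
print(first_match(browser, "https://en.wikipedia.org/wiki/Microsoft", "research"))
```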

@@ -0,0 +1,290 @@
# ruff: noqa: E722
import json
import logging
import os
import re
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple, Union
from urllib.parse import parse_qs, quote, quote_plus, unquote, urlparse, urlunparse
import requests
from bs4 import BeautifulSoup
from .mdconvert import MarkdownConverter
logger = logging.getLogger(__name__)
class AbstractMarkdownSearch(ABC):
"""
An abstract class for providing search capabilities to a Markdown browser.
"""
@abstractmethod
def __init__(self):
pass
@abstractmethod
def search(self, query) -> str:
pass
class BingMarkdownSearch(AbstractMarkdownSearch):
"""
Provides Bing web search capabilities to Markdown browsers.
"""
def __init__(self, bing_api_key: str = None, interleave_results: bool = True):
"""
Perform a Bing web search, and return the results formatted in Markdown.
Args:
bing_api_key: key for the Bing search API. If omitted, an attempt is made to read the key from the BING_API_KEY environment variable. If no key is found, BingMarkdownSearch will print a warning, and will fall back to visiting and scraping the live Bing results page. Scraping is objectively worse than using the API, and thus is not recommended.
interleave_results: When using the Bing API, results are returned based on category (web, news, videos, etc.), along with instructions for how they should be interleaved on the page. When `interleave_results` is set to True, these interleaving instructions are followed, and a single results list is returned by BingMarkdownSearch. When `interleave_results` is set to False, results are separated by category, and no interleaving is done.
"""
super().__init__()
self._mdconvert = MarkdownConverter()
self._interleave_results = interleave_results
if bing_api_key is None or bing_api_key.strip() == "":
self._bing_api_key = os.environ.get("BING_API_KEY")
else:
self._bing_api_key = bing_api_key
if self._bing_api_key is None:
if not self._interleave_results:
raise ValueError(
"No Bing API key was provided. This is incompatible with setting `interleave_results` to False. Please provide a key, or set `interleave_results` to True."
)
logger.warning(
"Warning: No Bing API key provided. BingMarkdownSearch will submit an HTTP request to the Bing landing page, but results may be missing or low quality. To resolve this warning, provide a Bing API key by setting the BING_API_KEY environment variable, or using the 'bing_api_key' parameter in by BingMarkdownSearch's constructor. Bing API keys can be obtained via https://www.microsoft.com/en-us/bing/apis/bing-web-search-api\n"
)
def search(self, query: str):
"""Search Bing and return the results formatted in Markdown. If a Bing API key is available, the API is used to perform the search. If no API key is available, the search is performed by submitting an HTTPs GET request directly to Bing. Searches performed with the API are much higher quality, and are more reliable.
Args:
query: The search query to issue
Returns:
A Markdown rendering of the search results.
"""
if self._bing_api_key is None:
return self._fallback_search(query)
else:
return self._api_search(query)
def _api_search(self, query: str):
"""Search Bing using the API, and return the results formatted in Markdown.
Args:
query: The search query to issue
Returns:
A Markdown rendering of the search results.
"""
results = self._bing_api_call(query)
snippets = dict()
def _processFacts(elm):
facts = list()
for e in elm:
k = e["label"]["text"]
v = " ".join(item["text"] for item in e["items"])
facts.append(f"{k}: {v}")
return "\n".join(facts)
# Web pages
# __POS__ is a placeholder for the final ranking position, added at the end
web_snippets = list()
if "webPages" in results:
for page in results["webPages"]["value"]:
snippet = f"__POS__. {self._markdown_link(page['name'], page['url'])}\n{page['snippet']}"
if "richFacts" in page:
snippet += "\n" + _processFacts(page["richFacts"])
if "mentions" in page:
snippet += "\nMentions: " + ", ".join(e["name"] for e in page["mentions"])
if page["id"] not in snippets:
snippets[page["id"]] = list()
snippets[page["id"]].append(snippet)
web_snippets.append(snippet)
if "deepLinks" in page:
for dl in page["deepLinks"]:
deep_snippet = f"__POS__. {self._markdown_link(dl['name'], dl['url'])}\n{dl['snippet'] if 'snippet' in dl else ''}"
snippets[page["id"]].append(deep_snippet)
web_snippets.append(deep_snippet)
# News results
news_snippets = list()
if "news" in results:
for page in results["news"]["value"]:
snippet = (
f"__POS__. {self._markdown_link(page['name'], page['url'])}\n{page.get('description', '')}".strip()
)
if "datePublished" in page:
snippet += "\nDate published: " + page["datePublished"].split("T")[0]
if "richFacts" in page:
snippet += "\n" + _processFacts(page["richFacts"])
if "mentions" in page:
snippet += "\nMentions: " + ", ".join(e["name"] for e in page["mentions"])
news_snippets.append(snippet)
if len(news_snippets) > 0:
snippets[results["news"]["id"]] = news_snippets
# Videos
video_snippets = list()
if "videos" in results:
for page in results["videos"]["value"]:
if not page["contentUrl"].startswith("https://www.youtube.com/watch?v="):
continue
snippet = f"__POS__. {self._markdown_link(page['name'], page['contentUrl'])}\n{page.get('description', '')}".strip()
if "datePublished" in page:
snippet += "\nDate published: " + page["datePublished"].split("T")[0]
if "richFacts" in page:
snippet += "\n" + _processFacts(page["richFacts"])
if "mentions" in page:
snippet += "\nMentions: " + ", ".join(e["name"] for e in page["mentions"])
video_snippets.append(snippet)
if len(video_snippets) > 0:
snippets[results["videos"]["id"]] = video_snippets
# Related searches
related_searches = ""
if "relatedSearches" in results:
related_searches = "## Related Searches:\n"
for s in results["relatedSearches"]["value"]:
related_searches += "- " + s["text"] + "\n"
snippets[results["relatedSearches"]["id"]] = [related_searches.strip()]
idx = 0
content = ""
if self._interleave_results:
# Interleaved
for item in results["rankingResponse"]["mainline"]["items"]:
_id = item["value"]["id"]
if _id in snippets:
for s in snippets[_id]:
if "__POS__" in s:
idx += 1
content += s.replace("__POS__", str(idx)) + "\n\n"
else:
content += s + "\n\n"
else:
# Categorized
if len(web_snippets) > 0:
content += "## Web Results\n\n"
for s in web_snippets:
if "__POS__" in s:
idx += 1
content += s.replace("__POS__", str(idx)) + "\n\n"
else:
content += s + "\n\n"
if len(news_snippets) > 0:
content += "## News Results\n\n"
for s in news_snippets:
if "__POS__" in s:
idx += 1
content += s.replace("__POS__", str(idx)) + "\n\n"
else:
content += s + "\n\n"
if len(video_snippets) > 0:
content += "## Video Results\n\n"
for s in video_snippets:
if "__POS__" in s:
idx += 1
content += s.replace("__POS__", str(idx)) + "\n\n"
else:
content += s + "\n\n"
if len(related_searches) > 0:
content += related_searches
return f"## A Bing search for '{query}' found {idx} results:\n\n" + content.strip()
def _bing_api_call(self, query: str):
"""Make a Bing API call, and return a Python representation of the JSON response."
Args:
query: The search query to issue
Returns:
A Python representation of the Bing API's JSON response (as parsed by `json.loads()`).
"""
# Make sure the key was set
if not self._bing_api_key:
raise ValueError("Missing Bing API key.")
# Prepare the request parameters
request_kwargs = {}
request_kwargs["headers"] = {}
request_kwargs["headers"]["Ocp-Apim-Subscription-Key"] = self._bing_api_key
request_kwargs["params"] = {}
request_kwargs["params"]["q"] = query
request_kwargs["params"]["textDecorations"] = False
request_kwargs["params"]["textFormat"] = "raw"
request_kwargs["stream"] = False
# Make the request
response = requests.get("https://api.bing.microsoft.com/v7.0/search", **request_kwargs)
response.raise_for_status()
results = response.json()
return results
def _fallback_search(self, query: str):
"""When no Bing API key is provided, we issue a simple HTTPs GET call to the Bing landing page and convert it to Markdown.
Args:
query: The search query to issue
Returns:
The Bing search results page, converted to Markdown.
"""
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0"
headers = {"User-Agent": user_agent}
url = f"https://www.bing.com/search?q={quote_plus(query)}&FORM=QBLH"
response = requests.get(url, headers=headers)
response.raise_for_status()
return self._mdconvert.convert_response(response).text_content
def _markdown_link(self, anchor: str, href: str):
"""Create a Markdown hyperlink, escaping the URLs as appropriate.
Args:
anchor: The anchor text of the hyperlink
href: The href destination of the hyperlink
Returns:
A correctly-formatted Markdown hyperlink
"""
try:
parsed_url = urlparse(href)
# URLs provided by Bing are sometimes only partially quoted, leaving in characters
# that conflict with Markdown. We unquote the URL, and then re-quote it more completely
href = urlunparse(parsed_url._replace(path=quote(unquote(parsed_url.path))))
anchor = re.sub(r"[\[\]]", " ", anchor)
return f"[{anchor}]({href})"
except ValueError: # It's not clear if this ever gets thrown
return f"[{anchor}]({href})"

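A usage sketch of the two modes described in the docstrings above (the environment-variable lookup and queries are illustrative):

```python
import os
from autogen.browser_utils import BingMarkdownSearch

# With an API key, results can be kept in categorized sections (Web, News, Videos)
# by disabling interleaving; interleave_results=True (the default) follows Bing's
# rankingResponse ordering instead.
searcher = BingMarkdownSearch(bing_api_key=os.environ["BING_API_KEY"], interleave_results=False)
print(searcher.search("Microsoft AutoGen"))

# With no key at all, interleave_results must remain True, and search() falls back
# to scraping the live Bing results page (lower quality, per the warning above).
fallback = BingMarkdownSearch()
print(fallback.search("Microsoft AutoGen"))
```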
File diff suppressed because it is too large.


@@ -0,0 +1,113 @@
import io
import os
from typing import Any, Dict, Optional, Union
from urllib.parse import parse_qs, quote_plus, unquote, urljoin, urlparse
from .requests_markdown_browser import RequestsMarkdownBrowser
# Check if Playwright dependencies are installed
IS_PLAYWRIGHT_ENABLED = False
try:
from playwright._impl._errors import TimeoutError
from playwright.sync_api import sync_playwright
IS_PLAYWRIGHT_ENABLED = True
except ModuleNotFoundError:
pass
class PlaywrightMarkdownBrowser(RequestsMarkdownBrowser):
"""
(In preview) A Playwright and Chromium powered Markdown web browser.
PlaywrightMarkdownBrowser extends RequestsMarkdownBrowser, and replaces only the functionality of `visit_page(url)`.
"""
def __init__(self, launch_args: Dict[str, Any] = {}, **kwargs):
"""
Instantiate a new PlaywrightMarkdownBrowser.
Arguments:
launch_args: Arguments passed to `playwright.chromium.launch`. See Playwright documentation for more details.
**kwargs: PlaywrightMarkdownBrowser passes these arguments to the RequestsMarkdownBrowser superclass. See RequestsMarkdownBrowser documentation for more details.
"""
super().__init__(**kwargs)
self._playwright = None
self._browser = None
self._page = None
# Raise an error if Playwright isn't available
if not IS_PLAYWRIGHT_ENABLED:
raise ModuleNotFoundError(
"No module named 'playwright'. Playwright can be installed via 'pip install playwright' or 'conda install playwright' depending on your environment.\n\nOnce installed, you must also install a browser via 'playwright install --with-deps chromium'"
)
# Create the playwright instance
self._playwright = sync_playwright().start()
self._browser = self._playwright.chromium.launch(**launch_args)
# Browser context
self._page = self._browser.new_page()
self.set_address(self.start_page)
def __del__(self):
"""
Close the Playwright session and browser when garbage-collected. Garbage collection may not always occur, or may happen at a later time. Call `close()` explicitly if you wish to free up resources used by Playwright or Chromium.
"""
self.close()
def close(self):
"""
Close the Playwright session and browser used by Playwright. The session cannot be reopened without instantiating a new PlaywrightMarkdownBrowser instance.
"""
if self._browser is not None:
self._browser.close()
self._browser = None
if self._playwright is not None:
self._playwright.stop()
self._playwright = None
def _fetch_page(self, url) -> None:
"""
Fetch a page. If the page is a regular HTTP page, use Playwright to gather the HTML. If the page is a download, or a local file, rely on superclass behavior.
"""
if url.startswith("file://"):
super()._fetch_page(url)
else:
try:
# Regular webpage
self._page.goto(url)
return self._process_page(url, self._page)
except Exception as e:
# Downloaded file
if self.downloads_folder and "net::ERR_ABORTED" in str(e):
with self._page.expect_download() as download_info:
try:
self._page.goto(url)
except Exception as e:
if "net::ERR_ABORTED" in str(e):
pass
else:
raise e
download = download_info.value
fname = os.path.join(self.downloads_folder, download.suggested_filename)
download.save_as(fname)
self._process_download(url, fname)
else:
raise e
def _process_page(self, url, page):
"""
Playwright fetched a regular HTTP page. Gather the document HTML, and convert it to Markdown.
"""
html = page.evaluate("document.documentElement.outerHTML;")
res = self._markdown_converter.convert_stream(io.StringIO(html), file_extension=".html", url=url)
self.page_title = page.title()
self._set_page_content(res.text_content)
def _process_download(self, url, path):
"""
Playwright downloaded a file. Convert it to Markdown.
"""
res = self._markdown_converter.convert_local(path, url=url)
self.page_title = res.title
self._set_page_content(res.text_content)

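A short usage sketch under assumed values (the URL and downloads path are illustrative); `launch_args` is forwarded to `playwright.chromium.launch`, and `close()` frees Chromium without waiting for garbage collection:

```python
from autogen.browser_utils import PlaywrightMarkdownBrowser

browser = PlaywrightMarkdownBrowser(
    launch_args={"headless": True},  # forwarded to playwright.chromium.launch
    downloads_folder="./downloads",  # illustrative; enables the download branch of _fetch_page
)
try:
    print(browser.visit_page("https://microsoft.github.io/autogen/"))
finally:
    browser.close()  # free Chromium explicitly rather than relying on __del__
```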

@@ -0,0 +1,426 @@
# ruff: noqa: E722
import datetime
import html
import io
import json
import mimetypes
import os
import pathlib
import re
import time
import traceback
import uuid
from typing import Any, Dict, List, Optional, Tuple, Union
from urllib.parse import parse_qs, unquote, urljoin, urlparse
from urllib.request import url2pathname
import pathvalidate
import requests
from .abstract_markdown_browser import AbstractMarkdownBrowser
from .markdown_search import AbstractMarkdownSearch, BingMarkdownSearch
from .mdconvert import FileConversionException, MarkdownConverter, UnsupportedFormatException
class RequestsMarkdownBrowser(AbstractMarkdownBrowser):
"""
(In preview) An extremely simple Python requests-powered Markdown web browser.
This browser cannot run JavaScript, compute CSS, etc. It simply fetches the HTML document, and converts it to Markdown.
See AbstractMarkdownBrowser for more details.
"""
def __init__(
self,
start_page: Optional[str] = None,
viewport_size: Optional[int] = 1024 * 8,
downloads_folder: Optional[Union[str, None]] = None,
search_engine: Optional[Union[AbstractMarkdownSearch, None]] = None,
markdown_converter: Optional[Union[MarkdownConverter, None]] = None,
requests_session: Optional[Union[requests.Session, None]] = None,
requests_get_kwargs: Optional[Union[Dict[str, Any], None]] = None,
):
"""
Instantiate a new RequestsMarkdownBrowser.
Arguments:
start_page: The page on which the browser starts (default: "about:blank")
viewport_size: Approximately how many *characters* fit in the viewport. Viewport dimensions are adjusted dynamically to avoid cutting off words (default: 8192).
downloads_folder: Path to where downloads are saved. If None, downloads are disabled. (default: None)
search_engine: An instance of MarkdownSearch, which handles web searches performed by this browser (default: a new `BingMarkdownSearch()` with default parameters)
markdown_converter: An instance of a MarkdownConverter used to convert HTML pages and downloads to Markdown (default: a new `MarkdownConverter()` with default parameters)
requests_session: The session from which to issue requests (default: a new `requests.Session()` instance with default parameters)
requests_get_kwargs: Extra parameters passed to every `.get()` call made to requests.
"""
self.start_page: str = start_page if start_page else "about:blank"
self.viewport_size = viewport_size # Applies only to the standard uri types
self.downloads_folder = downloads_folder
self.history: List[Tuple[str, float]] = list()
self.page_title: Optional[str] = None
self.viewport_current_page = 0
self.viewport_pages: List[Tuple[int, int]] = list()
self.set_address(self.start_page)
self._page_content: str = ""
if search_engine is None:
self._search_engine = BingMarkdownSearch()
else:
self._search_engine = search_engine
if markdown_converter is None:
self._markdown_converter = MarkdownConverter()
else:
self._markdown_converter = markdown_converter
if requests_session is None:
self._requests_session = requests.Session()
else:
self._requests_session = requests_session
if requests_get_kwargs is None:
self._requests_get_kwargs = {}
else:
self._requests_get_kwargs = requests_get_kwargs
self._find_on_page_query: Union[str, None] = None
self._find_on_page_last_result: Union[int, None] = None # Location of the last result
@property
def address(self) -> str:
"""Return the address of the current page."""
return self.history[-1][0]
def set_address(self, uri_or_path: str) -> None:
"""Sets the address of the current page.
This will result in the page being fetched via the underlying requests session.
Arguments:
uri_or_path: The fully-qualified URI to fetch, or the path to fetch from the current location. If the URI protocol is `search:`, the remainder of the URI is interpreted as a search query, and a web search is performed. If the URI protocol is `file://`, the remainder of the URI is interpreted as a local absolute file path.
"""
# TODO: Handle anchors
self.history.append((uri_or_path, time.time()))
# Handle special URIs
if uri_or_path == "about:blank":
self._set_page_content("")
elif uri_or_path.startswith("search:"):
query = uri_or_path[len("search:") :].strip()
results = self._search_engine.search(query)
self.page_title = f"{query} - Search"
self._set_page_content(results, split_pages=False)
else:
if (
not uri_or_path.startswith("http:")
and not uri_or_path.startswith("https:")
and not uri_or_path.startswith("file:")
):
if len(self.history) > 1:
prior_address = self.history[-2][0]
uri_or_path = urljoin(prior_address, uri_or_path)
# Update the address with the fully-qualified path
self.history[-1] = (uri_or_path, self.history[-1][1])
self._fetch_page(uri_or_path)
self.viewport_current_page = 0
self._find_on_page_query = None
self._find_on_page_last_result = None
@property
def viewport(self) -> str:
"""Return the content of the current viewport."""
bounds = self.viewport_pages[self.viewport_current_page]
return self.page_content[bounds[0] : bounds[1]]
@property
def page_content(self) -> str:
"""Return the full contents of the current page."""
return self._page_content
def _set_page_content(self, content: str, split_pages=True) -> None:
"""Sets the text content of the current page."""
self._page_content = content
if split_pages:
self._split_pages()
else:
self.viewport_pages = [(0, len(self._page_content))]
if self.viewport_current_page >= len(self.viewport_pages):
self.viewport_current_page = len(self.viewport_pages) - 1
def page_down(self) -> None:
"""Move the viewport down one page, if possible."""
self.viewport_current_page = min(self.viewport_current_page + 1, len(self.viewport_pages) - 1)
def page_up(self) -> None:
"""Move the viewport up one page, if possible."""
self.viewport_current_page = max(self.viewport_current_page - 1, 0)
def find_on_page(self, query: str) -> Union[str, None]:
"""Searches for the query from the current viewport forward, looping back to the start if necessary."""
# Did we get here via a previous find_on_page search with the same query?
# If so, map to find_next
if query == self._find_on_page_query and self.viewport_current_page == self._find_on_page_last_result:
return self.find_next()
# OK, it's a new search; start from the current viewport
self._find_on_page_query = query
viewport_match = self._find_next_viewport(query, self.viewport_current_page)
if viewport_match is None:
self._find_on_page_last_result = None
return None
else:
self.viewport_current_page = viewport_match
self._find_on_page_last_result = viewport_match
return self.viewport
def find_next(self) -> Union[str, None]:
"""Scroll to the next viewport that matches the query"""
if self._find_on_page_query is None:
return None
starting_viewport = self._find_on_page_last_result
if starting_viewport is None:
starting_viewport = 0
else:
starting_viewport += 1
if starting_viewport >= len(self.viewport_pages):
starting_viewport = 0
viewport_match = self._find_next_viewport(self._find_on_page_query, starting_viewport)
if viewport_match is None:
self._find_on_page_last_result = None
return None
else:
self.viewport_current_page = viewport_match
self._find_on_page_last_result = viewport_match
return self.viewport
def _find_next_viewport(self, query: str, starting_viewport: int) -> Union[int, None]:
"""Search for matches between the starting viewport looping when reaching the end."""
if query is None:
return None
# Normalize the query, and convert to a regular expression
nquery = re.sub(r"\*", "__STAR__", query)
nquery = " " + (" ".join(re.split(r"\W+", nquery))).strip() + " "
nquery = nquery.replace(" __STAR__ ", "__STAR__ ") # Merge isolated stars with prior word
nquery = nquery.replace("__STAR__", ".*").lower()
if nquery.strip() == "":
return None
idxs = list()
idxs.extend(range(starting_viewport, len(self.viewport_pages)))
idxs.extend(range(0, starting_viewport))
for i in idxs:
bounds = self.viewport_pages[i]
content = self.page_content[bounds[0] : bounds[1]]
# TODO: Remove markdown links and images
ncontent = " " + (" ".join(re.split(r"\W+", content))).strip().lower() + " "
if re.search(nquery, ncontent):
return i
return None
def visit_page(self, path_or_uri: str) -> str:
"""Update the address, visit the page, and return the content of the viewport."""
self.set_address(path_or_uri)
return self.viewport
def open_local_file(self, local_path: str) -> str:
"""Convert a local file path to a file:/// URI, update the address, visit the page, and return the contents of the viewport."""
full_path = os.path.abspath(os.path.expanduser(local_path))
self.set_address(pathlib.Path(full_path).as_uri())
return self.viewport
def _split_pages(self) -> None:
"""Split the page contents into pages that are approximately the viewport size. Small deviations are permitted to ensure words are not broken."""
# Handle empty pages
if len(self._page_content) == 0:
self.viewport_pages = [(0, 0)]
return
# Break the viewport into pages
self.viewport_pages = []
start_idx = 0
while start_idx < len(self._page_content):
end_idx = min(start_idx + self.viewport_size, len(self._page_content)) # type: ignore[operator]
# Adjust to end on a space
while end_idx < len(self._page_content) and self._page_content[end_idx - 1] not in [" ", "\t", "\r", "\n"]:
end_idx += 1
self.viewport_pages.append((start_idx, end_idx))
start_idx = end_idx
def _fetch_page(
self, url: str, session: requests.Session = None, requests_get_kwargs: Dict[str, Any] = None
) -> None:
"""Fetch a page using the requests library. Then convert it to Markdown, and set `page_content` (which splits the content into pages as necessary.
Arguments:
url: The fully-qualified URL to fetch.
session: Used to override the session used for this request. If None, use `self._requests_session` as usual.
requests_get_kwargs: Extra arguments passed to `requests.Session.get`.
"""
download_path = ""
response = None
try:
if url.startswith("file://"):
download_path = os.path.normcase(os.path.normpath(unquote(url[7:])))
if os.path.isdir(download_path):
res = self._markdown_converter.convert_stream(
io.StringIO(self._fetch_local_dir(download_path)), file_extension=".html"
)
self.page_title = res.title
self._set_page_content(
res.text_content, split_pages=False
) # Like search results, don't split directory listings
else:
res = self._markdown_converter.convert_local(download_path)
self.page_title = res.title
self._set_page_content(res.text_content)
else:
# Send an HTTP request to the URL
if session is None:
session = self._requests_session
_get_kwargs = {}
_get_kwargs.update(self._requests_get_kwargs)
if requests_get_kwargs is not None:
_get_kwargs.update(requests_get_kwargs)
_get_kwargs["stream"] = True
response = session.get(url, **_get_kwargs)
response.raise_for_status()
# If the HTTP request was successful
content_type = response.headers.get("content-type", "")
# Text or HTML
if "text/" in content_type.lower():
res = self._markdown_converter.convert_response(response)
self.page_title = res.title
self._set_page_content(res.text_content)
# A download
else:
# Try producing a safe filename
fname = None
download_path = None
try:
fname = pathvalidate.sanitize_filename(os.path.basename(urlparse(url).path)).strip()
download_path = os.path.abspath(os.path.join(self.downloads_folder, fname))
suffix = 0
while os.path.exists(download_path) and suffix < 1000:
suffix += 1
base, ext = os.path.splitext(fname)
new_fname = f"{base}__{suffix}{ext}"
download_path = os.path.abspath(os.path.join(self.downloads_folder, new_fname))
except NameError:
pass
# No suitable name, so make one
if fname is None:
extension = mimetypes.guess_extension(content_type)
if extension is None:
extension = ".download"
fname = str(uuid.uuid4()) + extension
download_path = os.path.abspath(os.path.join(self.downloads_folder, fname))
# Open a file for writing
with open(download_path, "wb") as fh:
for chunk in response.iter_content(chunk_size=512):
fh.write(chunk)
# Render it
local_uri = pathlib.Path(download_path).as_uri()
self.set_address(local_uri)
except UnsupportedFormatException:
self.page_title = ("Download complete.",)
self._set_page_content(f"# Download complete\n\nSaved file to '{download_path}'")
except FileConversionException:
self.page_title = ("Download complete.",)
self._set_page_content(f"# Download complete\n\nSaved file to '{download_path}'")
except FileNotFoundError:
self.page_title = "Error 404"
self._set_page_content(f"## Error 404\n\nFile not found: {download_path}")
except requests.exceptions.RequestException:
if response is None:
self.page_title = "Request Exception"
self._set_page_content("## Unhandled Request Exception:\n\n" + traceback.format_exc())
else:
self.page_title = f"Error {response.status_code}"
# If the error was rendered in HTML we might as well render it
content_type = response.headers.get("content-type", "")
if content_type is not None and "text/html" in content_type.lower():
res = self._markdown_converter.convert(response)
self.page_title = f"Error {response.status_code}"
self._set_page_content(f"## Error {response.status_code}\n\n{res.text_content}")
else:
text = ""
for chunk in response.iter_content(chunk_size=512, decode_unicode=True):
text += chunk
self.page_title = f"Error {response.status_code}"
self._set_page_content(f"## Error {response.status_code}\n\n{text}")
def _fetch_local_dir(self, local_path: str) -> str:
"""Render a local directory listing in HTML to assist with local file browsing via the "file://" protocol.
Though rendered in HTML, later parts of the pipeline will convert the listing to Markdown.
Arguments:
local_path: A path to the local directory whose contents are to be listed.
Returns:
A directory listing, rendered in HTML.
"""
pardir = os.path.normpath(os.path.join(local_path, os.pardir))
pardir_uri = pathlib.Path(pardir).as_uri()
listing = f"""
<!DOCTYPE html>
<html>
<head>
<title>Index of {html.escape(local_path)}</title>
</head>
<body>
<h1>Index of {html.escape(local_path)}</h1>
<a href="{html.escape(pardir_uri, quote=True)}">.. (parent directory)</a>
<table>
<tr>
<th>Name</th><th>Size</th><th>Date modified</th>
</tr>
"""
for entry in os.listdir(local_path):
full_path = os.path.normpath(os.path.join(local_path, entry))
full_path_uri = pathlib.Path(full_path).as_uri()
size = ""
mtime = datetime.datetime.fromtimestamp(os.path.getmtime(full_path)).strftime("%Y-%m-%d %H:%M")
if os.path.isdir(full_path):
entry = entry + os.path.sep
else:
size = str(os.path.getsize(full_path))
listing += (
"<tr>\n"
+ f'<td><a href="{html.escape(full_path_uri, quote=True)}">{html.escape(entry)}</a></td>'
+ f"<td>{html.escape(size)}</td>"
+ f"<td>{html.escape(mtime)}</td>"
+ "</tr>"
)
listing += """
</table>
</body>
</html>
"""
return listing

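A sketch exercising the viewport and find machinery defined above (the file path and queries are illustrative, and opening a PDF assumes the relevant mdconvert dependencies are installed):

```python
from autogen.browser_utils import RequestsMarkdownBrowser

browser = RequestsMarkdownBrowser(viewport_size=1024 * 8)

# Local files and directories are opened via file:// URIs (path is illustrative).
browser.open_local_file("~/Downloads/report.pdf")

# find_on_page supports '*' wildcards and wraps past the end of the document.
if browser.find_on_page("revenue * growth") is not None:
    print(browser.viewport)  # the first matching viewport
    browser.find_next()      # advance to the next match, if any

browser.page_down()  # plain paging moves the viewport one page at a time
```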

@@ -0,0 +1,80 @@
import io
import os
from typing import Dict, Optional, Union
from urllib.parse import parse_qs, quote_plus, unquote, urljoin, urlparse
from .requests_markdown_browser import RequestsMarkdownBrowser
# Check if Selenium dependencies are installed
IS_SELENIUM_ENABLED = False
try:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
IS_SELENIUM_ENABLED = True
except ModuleNotFoundError:
pass
class SeleniumMarkdownBrowser(RequestsMarkdownBrowser):
"""
(In preview) A Selenium and Chromium powered Markdown web browser.
SeleniumMarkdownBrowser extends RequestsMarkdownBrowser, and replaces only the functionality of `visit_page(url)`.
"""
def __init__(self, **kwargs):
"""
Instantiate a new SeleniumMarkdownBrowser.
Arguments:
**kwargs: SeleniumMarkdownBrowser passes all arguments to the RequestsMarkdownBrowser superclass. See RequestsMarkdownBrowser documentation for more details.
"""
super().__init__(**kwargs)
self._webdriver = None
# Raise an error if Selenium isn't available
if not IS_SELENIUM_ENABLED:
raise ModuleNotFoundError(
"No module named 'selenium'. Selenium can be installed via 'pip install selenium' or 'conda install selenium' depending on your environment."
)
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
self._webdriver = webdriver.Chrome(options=chrome_options)
self._webdriver.implicitly_wait(99)
self._webdriver.get(self.start_page)
def __del__(self):
"""
Close the Selenium session when garbage-collected. Garbage collection may not always occur, or may happen at a later time. Call `close()` explicitly if you wish to free up resources used by Selenium or Chromium.
"""
self.close()
def close(self):
"""
Close the Selenium session used by this instance. The session cannot be reopened without instantiating a new SeleniumMarkdownBrowser instance.
"""
if self._webdriver is not None:
self._webdriver.quit()
self._webdriver = None
def _fetch_page(self, url) -> None:
"""
Fetch a page. If the page is a regular HTTP page, use Selenium to gather the HTML. If the page is a download, or a local file, rely on superclass behavior.
"""
if url.startswith("file://"):
super()._fetch_page(url)
else:
self._webdriver.get(url)
html = self._webdriver.execute_script("return document.documentElement.outerHTML;")
if not html: # Nothing... it's probably a download
super()._fetch_page(url)
else:
self.page_title = self._webdriver.execute_script("return document.title;")
res = self._markdown_converter.convert_stream(io.StringIO(html), file_extension=".html", url=url)
self._set_page_content(res.text_content)

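Usage mirrors the Playwright variant; a minimal sketch with an illustrative URL and downloads path:

```python
from autogen.browser_utils import SeleniumMarkdownBrowser

browser = SeleniumMarkdownBrowser(downloads_folder="./downloads")  # path is illustrative
try:
    print(browser.visit_page("https://microsoft.github.io/autogen/"))
finally:
    browser.close()  # quit the headless Chrome driver explicitly
```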

@@ -129,12 +129,19 @@
"outputs": [],
"source": [
"from autogen.agentchat.contrib.web_surfer import WebSurferAgent # noqa: E402\n",
"from autogen.browser_utils import BingMarkdownSearch, RequestsMarkdownBrowser # noqa: E402\n",
"\n",
"browser = RequestsMarkdownBrowser(\n",
" downloads_folder=os.getcwd(),\n",
" search_engine=BingMarkdownSearch(bing_api_key=bing_api_key),\n",
")\n",
"\n",
"web_surfer = WebSurferAgent(\n",
" \"web_surfer\",\n",
" llm_config=llm_config,\n",
" summarizer_llm_config=summarizer_llm_config,\n",
" browser_config={\"viewport_size\": 4096, \"bing_api_key\": bing_api_key},\n",
" is_termination_msg=lambda x: x.get(\"content\", \"\").find(\"TERMINATE\") >= 0,\n",
" browser=browser,\n",
")\n",
"\n",
"user_proxy = autogen.UserProxyAgent(\n",
@@ -179,42 +186,87 @@
">>>>>>>> EXECUTING FUNCTION informational_web_search...\u001b[0m\n",
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"Address: bing: Microsoft AutoGen\n",
"Address: search: Microsoft AutoGen\n",
"Title: Microsoft AutoGen - Search\n",
"Viewport position: Showing page 1 of 1.\n",
"=======================\n",
"A Bing search for 'Microsoft AutoGen' found 10 results:\n",
"## A Bing search for 'Microsoft AutoGen' found 19 results:\n",
"\n",
"## Web Results\n",
"1. [AutoGen: Enabling next-generation large language model applications](https://www.microsoft.com/en-us/research/blog/autogen-enabling-next-generation-large-language-model-applications/)\n",
"AutoGen is a Python package that simplifies the orchestration, optimization, and automation of large language model applications. It enables customizable and conversable agents that integrate with humans, tools, and other agents to solve tasks using GPT-4 and other advanced LLMs. Learn how to use AutoGen for code-based question answering, supply-chain optimization, conversational chess, and more.\n",
"AutoGen enables complex LLM-based workflows using multi-agent conversations. (Left) AutoGen agents are customizable and can be based on LLMs, tools, humans, and even a combination of them. (Top-right) Agents can converse to solve tasks. (Bottom-right) The framework supports many additional complex conversation patterns.\n",
"\n",
"2. [GitHub - microsoft/autogen: Enable Next-Gen Large Language Model ...](https://github.com/microsoft/autogen)\n",
"AutoGen is a Python library that enables the development of large language model applications using multiple agents that can converse with each other to solve tasks. It supports various conversation patterns, enhanced LLM inference, and customizable and conversable agents based on OpenAI models.\n",
"2. [AutoGen - Microsoft Research](https://www.microsoft.com/en-us/research/project/autogen/)\n",
"Related projects. AutoGen is an open-source, community-driven project under active development (as a spinoff from FLAML, a fast library for automated machine learning and tuning), which encourages contributions from individuals of all backgrounds.Many Microsoft Research collaborators have made great contributions to this project, including academic contributors like Pennsylvania State ...\n",
"\n",
"3. [Getting Started | AutoGen](https://microsoft.github.io/autogen/docs/Getting-Started/)\n",
"AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. Main Features\n",
"3. [AutoGen: Downloads - Microsoft Research](https://www.microsoft.com/en-us/research/project/autogen/downloads/)\n",
"Enable Next-Gen Large Language Model Applications. AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They…. AutoGen allows developers to build LLM applications ...\n",
"\n",
"4. [AutoGen | AutoGen - microsoft.github.io](https://microsoft.github.io/autogen/)\n",
"AutoGen is a tool that enables next-gen large language model applications by providing a high-level abstraction for building diverse and enhanced LLM workflows. It offers a collection of working systems for various domains and complexities, as well as enhanced LLM inference and optimization APIs.\n",
"4. [Microsoft Semantic Kernel and AutoGen: Open Source Frameworks for AI ...](https://techcommunity.microsoft.com/t5/educator-developer-blog/microsoft-semantic-kernel-and-autogen-open-source-frameworks-for/ba-p/4051305)\n",
"Microsoft AutoGen is designed for integrating and controlling multiple LLMs. Its a research project that shows the potential of using multiple agents together. AutoGen allows for the creation of diverse teams of agents, each with their own specialized skills or goals. These agents can chat with each other, facilitating greater diversity in ...\n",
"\n",
"5. [AutoGen - Microsoft Research](https://www.microsoft.com/en-us/research/project/autogen/)\n",
"AutoGen is an open-source library for building next-generation LLM applications with multiple agents, teachability and personalization. It supports agents that can be backed by various LLM configurations, code generation and execution, and human proxy agent integration.\n",
"5. [GitHub - microsoft/autogen: A programming framework for agentic AI ...](https://github.com/microsoft/autogen)\n",
"microsoft.github.io/autogen/ Topics chat chatbot gpt chat-application agent-based-framework agent-oriented-programming gpt-4 chatgpt llmops gpt-35-turbo llm-agent llm-inference agentic llm-framework agentic-agi\n",
"\n",
"6. [Installation | AutoGen](https://microsoft.github.io/autogen/docs/Installation/)\n",
"Installation Setup Virtual Environment When not using a docker container, we recommend using a virtual environment to install AutoGen. This will ensure that the dependencies for AutoGen are isolated from the rest of your system. Option 1: venv You can create a virtual environment with venv as below: python3 -m venv pyautogen\n",
"6. [Getting Started | AutoGen - microsoft.github.io](https://microsoft.github.io/autogen/docs/Getting-Started/)\n",
"Getting Started. AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.\n",
"\n",
"7. [AutoGen: Downloads - Microsoft Research](https://www.microsoft.com/en-us/research/project/autogen/downloads/)\n",
"AutoGen allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks.\n",
"7. [AutoGen | AutoGen - microsoft.github.io](https://microsoft.github.io/autogen/)\n",
"AutoGen provides multi-agent conversation framework as a high-level abstraction. With this framework, one can conveniently build LLM workflows. Easily Build Diverse Applications. AutoGen offers a collection of working systems spanning a wide range of applications from various domains and complexities.\n",
"\n",
"8. [Multi-agent Conversation Framework | AutoGen - microsoft.github.io](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat/)\n",
"AutoGen offers a unified multi-agent conversation framework as a high-level abstraction of using foundation models. It features capable, customizable and conversable agents which integrate LLMs, tools, and humans via automated agent chat.\n",
"8. [AutoGen Studio: Interactively Explore Multi-Agent Workflows](https://microsoft.github.io/autogen/blog/2023/12/01/AutoGenStudio/)\n",
"To help you rapidly prototype multi-agent solutions for your tasks, we are introducing AutoGen Studio, an interface powered by AutoGen. It allows you to: Declaratively define and modify agents and multi-agent workflows through a point and click, drag and drop interface (e.g., you can select the parameters of two agents that will communicate to ...\n",
"\n",
"9. [[2308.08155] AutoGen: Enabling Next-Gen LLM Applications via Multi ...](https://arxiv.org/abs/2308.08155)\n",
"AutoGen is an open-source framework that allows developers to create and customize agents that can converse with each other to perform tasks using various types of language models (LLMs). The framework supports natural language and code-based conversation patterns, and is effective for diverse applications such as mathematics, coding, question answering, and more.\n",
"9. [AutoGen | Getting Started A-Z Install & Run | Easy | Microsoft](https://www.youtube.com/watch?v=UxtJsIDTFZo)\n",
"Getting Started with AutoGen in 10 mins. AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human ...\n",
"Date published: 2023-10-04\n",
"\n",
"10. [How to setup and use the new Microsoft AutoGen AI agent](https://www.geeky-gadgets.com/microsoft-autogen/)\n",
"Learn how to use AutoGen, a tool that simplifies the automation and optimization of complex language model applications using multiple agents that can converse with each other. AutoGen supports diverse conversation patterns, human participation, and the tuning of expensive LLMs like ChatGPT and GPT-4.\n",
"10. [AutoGen Tutorial 🚀 Create Custom AI Agents EASILY (Incredible)](https://www.youtube.com/watch?v=vU2S6dVf79M)\n",
"In this video, I show you how to use AutoGen, which allows anyone to use multi-agent LLMs to power their applications. First, I give an overview of what AutoGen is, and then I show you how to use it with two examples. Currently, AutoGen works with OpenAI's API, but they are already working on adding local models natively, and you can already do ...\n",
"Date published: 2023-10-03\n",
"\n",
"11. [Microsoft Autogen Studio 2 - How to run an army of agents](https://www.youtube.com/watch?v=lRu_-yFY-4M)\n",
"In this video, well explore how to use and leverage Autogen Studio 2 by creating two agents: one to extract YouTube comments and another to transform those insights into fresh video content ideas. 🔧 Installation is a breeze with just two commands. 👁️‍🗨️ The 'Build' tab's intuitive interface to use different LLMs 🧠 'Skills ...\n",
"Date published: 2024-02-02\n",
"\n",
"12. [AutoGen Studio 2.0 Advanced Tutorial | Build multi-agent GenAI Application!!](https://www.youtube.com/watch?v=MUhRP8QCb9A)\n",
"In this tutorial we will be covering AutoGen 2.0 from Microsoft which is an open-source library, offers a high-level abstraction for multi-agent conversation frameworks, facilitating next-generation LLM applications with collaborative, teachable, and personalized features to enhance productivity. In this tutorial we will be installing AutoGen ...\n",
"Date published: 2024-02-02\n",
"\n",
"13. [AutoGen Tutorial: Create GODLY Custom AI Agents EASILY (Installation Tutorial)](https://www.youtube.com/watch?v=ijYDTDR4f8k)\n",
"In this video, we delve into the revolutionary world of AutoGen, a sophisticated framework designed to simplify and streamline the management of workflows involving large language models (LLMs). These workflows are intricate, demanding, and require expertise to design, implement, and optimize effectively. As developers explore complex ...\n",
"Date published: 2023-10-12\n",
"\n",
"14. [AutoGen Tutorial 🤖 Create Collaborating AI Agent teams](https://www.youtube.com/watch?v=0GyJ3FLHR1o)\n",
"You can now use AutoGen to create multiple AI agents to work together to complete a task that you defined. Let's take a look at what Autogen is, and then I will show you how to quickly start using it. In this video, we will go through the examples of task solving with code generation, execution and debugging and setting up a group chat of more ...\n",
"Date published: 2023-10-30\n",
"\n",
"15. [Autogen - Microsoft's best AI Agent framework that is controllable?](https://www.youtube.com/watch?v=Bq-0ClZttc8)\n",
"Microsoft just announced a multi agent framework called Autogen, which solved a few problems of existing agent frameworks; Lets dive in 🔗 Links - Follow me on twitter: https://twitter.com/jasonzhou1993 - Join my AI email list: https://www.ai-jason.com/ - My discord: https://discord.gg/eZXprSaCDE - Github repo & blog: https://ai-jason ...\n",
"Date published: 2023-10-03\n",
"\n",
"16. [Microsoft AUTOGEN STUDIO 2.0 HUGE UPDATE - Create Custom AI Agents | Microsoft AI](https://www.youtube.com/watch?v=kj8nVBI_oiM)\n",
"Learn everything about Microsofts revolutionary Autogen Studio 2.0 - an intuitive graphical interface enabling anyone to build coordinated AI solutions without coding! See how pre-built skills are visually combined into flexible teams handling complex goals through specialization and conversation. From travel planning to content creation ...\n",
"Date published: 2024-01-17\n",
"\n",
"17. [Microsoft's Autogen 2 - Create Custom AI Agents](https://www.youtube.com/watch?v=_LGUXoNuwOo)\n",
"💬 Access GPT-4 ,Claude-2 and more - chat.forefront.ai/?ref=theaigrid 🎤 Use the best AI Voice Creator - elevenlabs.io/?from=partnerscott3908 Join Our Weekly Newsletter - https://mailchi.mp/6cff54ad7e2e/theaigrid 🐤 Follow us on Twitter https://twitter.com/TheAiGrid 🌐 Checkout Our website - https://theaigrid.com/ https://microsoft ...\n",
"Date published: 2024-01-14\n",
"\n",
"18. [arXiv:2308.08155v2 cs.AI 3 Oct 2023](https://arxiv.org/pdf/2308.08155.pdf)\n",
"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Qingyun Wu †, Gagan Bansal , Jieyu Zhang±, Yiran Wu , Beibin Li Erkang Zhu , Li Jiang , Xiaoyun Zhang , Shaokun Zhang†, Jiale Liu∓ Ahmed Awadallah , Ryen W. White , Doug Burger , Chi Wang1 Microsoft Research, †Pennsylvania State University ±University of Washington,∓Xidian University\n",
"\n",
"19. [Releases · microsoft/autogen · GitHub](https://github.com/microsoft/autogen/releases)\n",
"Notebook. New feature in code execution: Support user defined functions in local CLI executor - similar functionality to the \"skills\" in AutoGen Studio. New agent capability: Vision Capability for ConversableAgents allows them to \"see\" images. New IOStream protocol and support for web sockets!\n",
"\n",
"## Related Searches:\n",
"- microsoft autogen download\n",
"- microsoft autogen examples\n",
"- autogenai website\n",
"- is microsoft autogen free\n",
"- autogen install\n",
"- how to install microsoft autogen\n",
"- autogen microsoft tutorial\n",
"- autogen openai\n",
"\n",
"--------------------------------------------------------------------------------\n"
]
@@ -225,7 +277,7 @@
"Search the web for information about Microsoft AutoGen\n",
"\"\"\"\n",
"\n",
"user_proxy.initiate_chat(web_surfer, message=task1)"
"user_proxy.initiate_chat(web_surfer, message=task1);"
]
},
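Each initiate_chat call in the updated notebook now ends with a semicolon, which in Jupyter suppresses echoing the returned value. A sketch of capturing that value instead — the ChatResult fields named here are assumptions from autogen's chat API, not shown in this diff:

chat_result = user_proxy.initiate_chat(web_surfer, message=task1)
print(chat_result.summary)            # assumed ChatResult field
print(len(chat_result.chat_history))  # assumed ChatResult field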
{
@@ -248,7 +300,7 @@
">>>>>>>> EXECUTING FUNCTION summarize_page...\u001b[0m\n",
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"AutoGen is a Python package and framework developed by Microsoft that simplifies the orchestration, optimization, and automation of large language model (LLM) applications. It enables the development of customizable and conversable agents that can solve tasks using advanced LLMs like GPT-4. AutoGen supports various conversation patterns, enhanced LLM inference, and seamless integration with humans, tools, and other agents. It offers a high-level abstraction for building diverse and enhanced LLM workflows and provides a collection of working systems for different domains and complexities. AutoGen is open-source and supports natural language and code-based conversation patterns for applications such as question answering, coding, mathematics, and more.\n",
"Microsoft AutoGen is a framework that enables the development of large language model (LLM) applications using multiple agents that can converse with each other to solve tasks. It allows for the creation of diverse teams of agents with specialized skills or goals, facilitating greater diversity in workflows. AutoGen agents are customizable, conversable, and seamlessly allow human participation. The framework supports various conversation patterns and offers a collection of working systems spanning a wide range of applications. AutoGen Studio is an interface powered by AutoGen that allows for the rapid prototyping of multi-agent solutions through a point and click, drag and drop interface. There are also tutorials and videos available to help users get started with AutoGen.\n",
"\n",
"--------------------------------------------------------------------------------\n"
]
@@ -256,7 +308,7 @@
],
"source": [
"task2 = \"Summarize these results\"\n",
"user_proxy.initiate_chat(web_surfer, message=task2, clear_history=False)"
"user_proxy.initiate_chat(web_surfer, message=task2, clear_history=False);"
]
},
{
@@ -276,59 +328,183 @@
"\u001b[31m\n",
">>>>>>>> USING AUTO REPLY...\u001b[0m\n",
"\u001b[35m\n",
">>>>>>>> EXECUTING FUNCTION navigational_web_search...\u001b[0m\n",
">>>>>>>> EXECUTING FUNCTION visit_page...\u001b[0m\n",
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"Address: https://microsoft.github.io/autogen/docs/Getting-Started/\n",
"Title: Getting Started | AutoGen\n",
"Viewport position: Showing page 1 of 2.\n",
"Viewport position: Showing page 1 of 1.\n",
"=======================\n",
"Getting Started | AutoGen\n",
"\n",
"[Skip to main content](#)[![AutoGen](/autogen/img/ag.svg)![AutoGen](/autogen/img/ag.svg)**AutoGen**](/autogen/)[Docs](/autogen/docs/Getting-Started)[SDK](/autogen/docs/reference/agentchat/conversable_agent)[Blog](/autogen/blog)[FAQ](/autogen/docs/FAQ)[Examples](/autogen/docs/Examples)Resources* [Ecosystem](/autogen/docs/Ecosystem)\n",
"* [Gallery](/autogen/docs/Gallery)\n",
"[GitHub](https://github.com/microsoft/autogen)🌜🌞`ctrl``K`* [Getting Started](/autogen/docs/Getting-Started)\n",
"* [Installation](/autogen/docs/Installation)\n",
"* [Use Cases](#)\n",
"[Skip to main content](#__docusaurus_skipToContent_fallback)What's new in AutoGen? Read [this blog](/autogen/blog/2024/03/03/AutoGen-Update) for an overview of updates[![AutoGen](/autogen/img/ag.svg)![AutoGen](/autogen/img/ag.svg)**AutoGen**](/autogen/)Docs* [Getting Started](/autogen/docs/Getting-Started)\n",
"* [Installation](/autogen/docs/installation/)\n",
"* [Tutorial](/autogen/docs/tutorial/introduction)\n",
"* [User Guide](/autogen/docs/topics)\n",
"* [API Reference](/autogen/docs/reference/agentchat/conversable_agent)\n",
"* [FAQs](/autogen/docs/FAQ)\n",
"* [Ecosystem](/autogen/docs/ecosystem)\n",
"* [Contribute](/autogen/docs/Contribute)\n",
"* [Research](/autogen/docs/Research)\n",
"Examples* [Examples by Category](/autogen/docs/Examples)\n",
"* [Examples by Notebook](/autogen/docs/notebooks)\n",
"* [Application Gallery](/autogen/docs/Gallery)\n",
"Other Languages* [Dotnet](https://microsoft.github.io/autogen-for-net/)\n",
"[Blog](/autogen/blog)[GitHub](https://github.com/microsoft/autogen)[Discord](https://aka.ms/autogen-dc)[Twitter](https://twitter.com/pyautogen)`ctrl``K`* [Getting Started](/autogen/docs/Getting-Started)\n",
"* [Installation](/autogen/docs/installation/)\n",
"* [Tutorial](/autogen/docs/tutorial)\n",
"\t+ [Introduction](/autogen/docs/tutorial/introduction)\n",
"\t+ [Chat Termination](/autogen/docs/tutorial/chat-termination)\n",
"\t+ [Human in the Loop](/autogen/docs/tutorial/human-in-the-loop)\n",
"\t+ [Code Executors](/autogen/docs/tutorial/code-executors)\n",
"\t+ [Tool Use](/autogen/docs/tutorial/tool-use)\n",
"\t+ [Conversation Patterns](/autogen/docs/tutorial/conversation-patterns)\n",
"\t+ [What Next?](/autogen/docs/tutorial/what-next)\n",
"* [Use Cases](/autogen/docs/Use-Cases/agent_chat)\n",
"* [User Guide](/autogen/docs/topics)\n",
"\t+ [Code Execution](/autogen/docs/topics/code-execution/cli-code-executor)\n",
"\t+ [Using Non-OpenAI Models](/autogen/docs/topics/non-openai-models/about-using-nonopenai-models)\n",
"\t+ [LLM Caching](/autogen/docs/topics/llm-caching)\n",
"\t+ [LLM Configuration](/autogen/docs/topics/llm_configuration)\n",
"\t+ [Prompting and Reasoning](/autogen/docs/topics/prompting-and-reasoning/react)\n",
"\t+ [Retrieval Augmentation](/autogen/docs/topics/retrieval_augmentation)\n",
"\t+ [Task Decomposition](/autogen/docs/topics/task_decomposition)\n",
"* [API Reference](/autogen/docs/reference/agentchat/conversable_agent)\n",
"* [FAQs](/autogen/docs/FAQ)\n",
"* [Ecosystem](/autogen/docs/ecosystem)\n",
"* [Contributing](/autogen/docs/Contribute)\n",
"* [Research](/autogen/docs/Research)\n",
"On this pageGetting Started\n",
"===============\n",
"* [Migration Guide](/autogen/docs/Migration-Guide)\n",
"*\n",
"* Getting Started\n",
"On this page\n",
"# Getting Started\n",
"\n",
"AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.\n",
"AutoGen is a framework that enables development of LLM applications using\n",
"multiple agents that can converse with each other to solve tasks. AutoGen agents\n",
"are customizable, conversable, and seamlessly allow human participation. They\n",
"can operate in various modes that employ combinations of LLMs, human inputs, and\n",
"tools.\n",
"\n",
"![AutoGen Overview](/autogen/assets/images/autogen_agentchat-250ca64b77b87e70d34766a080bf6ba8.png)\n",
"\n",
"### Main Features[](#main-features \"Direct link to heading\")\n",
"### Main Features[](#main-features \"Direct link to Main Features\")\n",
"\n",
"* AutoGen enables building next-gen LLM applications based on [multi-agent conversations](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat) with minimal effort. It simplifies the orchestration, automation, and optimization of a complex LLM workflow. It maximizes the performance of LLM models and overcomes their weaknesses.\n",
"* It supports [diverse conversation patterns](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#supporting-diverse-conversation-patterns) for complex workflows. With customizable and conversable agents, developers can use AutoGen to build a wide range of conversation patterns concerning conversation autonomy,\n",
"the number of agents, and agent conversation topology.\n",
"* It provides a collection of working systems with different complexities. These systems span a [wide range of applications](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat#diverse-applications-implemented-with-autogen) from various domains and complexities. This demonstrates how AutoGen can easily support diverse conversation patterns.\n",
"* AutoGen provides [enhanced LLM inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference#api-unification). It offers utilities like API unification and caching, and advanced usage patterns, such as error handling, multi-config inference, context programming, etc.\n",
"* AutoGen enables building next-gen LLM applications based on [multi-agent\n",
"conversations](/autogen/docs/Use-Cases/agent_chat) with minimal effort. It simplifies\n",
"the orchestration, automation, and optimization of a complex LLM workflow. It\n",
"maximizes the performance of LLM models and overcomes their weaknesses.\n",
"* It supports [diverse conversation\n",
"patterns](/autogen/docs/Use-Cases/agent_chat#supporting-diverse-conversation-patterns)\n",
"for complex workflows. With customizable and conversable agents, developers can\n",
"use AutoGen to build a wide range of conversation patterns concerning\n",
"conversation autonomy, the number of agents, and agent conversation topology.\n",
"* It provides a collection of working systems with different complexities. These\n",
"systems span a [wide range of\n",
"applications](/autogen/docs/Use-Cases/agent_chat#diverse-applications-implemented-with-autogen)\n",
"from various domains and complexities. This demonstrates how AutoGen can\n",
"easily support diverse conversation patterns.\n",
"\n",
"AutoGen is powered by collaborative [research studies](/autogen/docs/Research) from Microsoft, Penn State University, and University of Washington.\n",
"AutoGen is powered by collaborative [research studies](/autogen/docs/Research) from\n",
"Microsoft, Penn State University, and University of Washington.\n",
"\n",
"### Quickstart[](#quickstart \"Direct link to heading\")\n",
"### Quickstart[](#quickstart \"Direct link to Quickstart\")\n",
"\n",
"Install from pip: `pip install pyautogen`. Find more options in [Installation](/autogen/docs/Installation).\n",
"For [code execution](/autogen/docs/FAQ#code-execution), we strongly recommend installing the python docker package, and using docker.\n",
"```\n",
"pip install pyautogen\n",
"\n",
"#### Multi-Agent Conversation Framework[](#multi-agent-conversation-framework \"Direct link to heading\")\n",
"```\n",
"\n",
"* No code execution\n",
"* Local execution\n",
"* Docker execution\n",
"\n",
"```\n",
"from autogen import AssistantAgent, UserProxyAgent\n",
"\n",
"llm_config = {\"model\": \"gpt-4\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}\n",
"assistant = AssistantAgent(\"assistant\", llm_config=llm_config)\n",
"user_proxy = UserProxyAgent(\"user_proxy\", code_execution_config=False)\n",
"\n",
"# Start the chat\n",
"user_proxy.initiate_chat(\n",
" assistant,\n",
" message=\"Tell me a joke about NVDA and TESLA stock prices.\",\n",
")\n",
"\n",
"```\n",
"warningWhen asked, be sure to check the generated code before continuing to ensure it is safe to run.\n",
"\n",
"```\n",
"import autogen\n",
"from autogen import AssistantAgent, UserProxyAgent\n",
"\n",
"llm_config = {\"model\": \"gpt-4\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}\n",
"assistant = AssistantAgent(\"assistant\", llm_config=llm_config)\n",
"\n",
"user_proxy = UserProxyAgent(\n",
" \"user_proxy\", code_execution_config={\"executor\": autogen.coding.LocalCommandLineCodeExecutor(work_dir=\"coding\")}\n",
")\n",
"\n",
"# Start the chat\n",
"user_proxy.initiate_chat(\n",
" assistant,\n",
" message=\"Plot a chart of NVDA and TESLA stock price change YTD.\",\n",
")\n",
"\n",
"```\n",
"\n",
"```\n",
"import autogen\n",
"from autogen import AssistantAgent, UserProxyAgent\n",
"\n",
"llm_config = {\"model\": \"gpt-4\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}\n",
"\n",
"with autogen.coding.DockerCommandLineCodeExecutor(work_dir=\"coding\") as code_executor:\n",
" assistant = AssistantAgent(\"assistant\", llm_config=llm_config)\n",
" user_proxy = UserProxyAgent(\n",
" \"user_proxy\", code_execution_config={\"executor\": code_executor}\n",
" )\n",
"\n",
" # Start the chat\n",
" user_proxy.initiate_chat(\n",
" assistant,\n",
" message=\"Plot a chart of NVDA and TESLA stock price change YTD. Save the plot to a file called plot.png\",\n",
" )\n",
"\n",
"```\n",
"Open `coding/plot.png` to see the generated plot.\n",
"\n",
"tipLearn more about configuring LLMs for agents [here](/autogen/docs/topics/llm_configuration).\n",
"\n",
"#### Multi-Agent Conversation Framework[](#multi-agent-conversation-framework \"Direct link to Multi-Agent Conversation Framework\")\n",
"\n",
"Autogen enables the next-gen LLM applications with a generic multi-agent conversation framework. It offers customizable and conversable agents which integrate LLMs, tools, and humans.\n",
"By automating chat among multiple capable agents, one can easily make them collectively perform tasks autonomously or with human feedback, including tasks that require using tools via code. For [example](https://github.com/microsoft/autogen/blob/main/test/twoagent.py),\n",
"\n",
"```\n",
"from autogen import AssistantAgent, UserProxyAgent, config\\_list\\_from\\_json \n",
" \n",
"# Load LLM inference endpoints from an env variable or a file \n",
"# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints \n",
"# and OAI\\_CONFIG\\_LIST\\_sample.json \n",
"config\\_list = config\\_list\\_from\\_json(env\\_or\\_file=\"OAI\\_CONFIG\\_LIST\") \n",
"assistant = AssistantAgent(\"assistant\", llm\\_config={\"config\\_list\": config\\_list}) \n",
"user\\_proxy = UserProxyAgent(\"user\\_proxy\", code\\_execution\\_config={\"work\\_dir\": \"coding\"}) \n",
"user\\_proxy.initiate\\_chat(assistant, \n",
"The figure below shows an example conversation flow with AutoGen.\n",
"\n",
"![Agent Chat Example](/autogen/assets/images/chat_example-da70a7420ebc817ef9826fa4b1e80951.png)\n",
"\n",
"### Where to Go Next?[](#where-to-go-next \"Direct link to Where to Go Next?\")\n",
"\n",
"* Go through the [tutorial](/autogen/docs/tutorial/introduction) to learn more about the core concepts in AutoGen\n",
"* Read the examples and guides in the [notebooks section](/autogen/docs/notebooks)\n",
"* Understand the use cases for [multi-agent conversation](/autogen/docs/Use-Cases/agent_chat) and [enhanced LLM inference](/autogen/docs/Use-Cases/enhanced_inference)\n",
"* Read the [API](/autogen/docs/reference/agentchat/conversable_agent/) docs\n",
"* Learn about [research](/autogen/docs/Research) around AutoGen\n",
"* Chat on [Discord](https://aka.ms/autogen-dc)\n",
"* Follow on [Twitter](https://twitter.com/pyautogen)\n",
"* See our [roadmaps](https://aka.ms/autogen-roadmap)\n",
"\n",
"If you like our project, please give it a [star](https://github.com/microsoft/autogen/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/autogen/docs/Contribute).\n",
"\n",
"[Edit this page](https://github.com/microsoft/autogen/edit/main/website/docs/Getting-Started.mdx)[NextInstallation](/autogen/docs/installation/)* [Main Features](#main-features)\n",
"* [Quickstart](#quickstart)\n",
"* [Where to Go Next?](#where-to-go-next)\n",
"Community* [Discord](https://aka.ms/autogen-dc)\n",
"* [Twitter](https://twitter.com/pyautogen)\n",
"Copyright © 2024 AutoGen Authors | [Privacy and Cookies](https://go.microsoft.com/fwlink/?LinkId=521839)\n",
"\n",
"\n",
"--------------------------------------------------------------------------------\n"
]
@@ -336,7 +512,7 @@
],
"source": [
"task3 = \"Click the 'Getting Started' result\"\n",
"user_proxy.initiate_chat(web_surfer, message=task3, clear_history=False)"
"user_proxy.initiate_chat(web_surfer, message=task3, clear_history=False);"
]
},
{
@@ -370,74 +546,47 @@
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"Address: https://en.wikipedia.org/wiki/Microsoft\n",
"Title: Microsoft - Wikipedia\n",
"Viewport position: Showing page 1 of 64.\n",
"Title: Microsoft\n",
"Viewport position: Showing page 1 of 34.\n",
"=======================\n",
"# Microsoft\n",
"\n",
"American multinational technology corporation\n",
"\n",
"Microsoft Corporation| [A square divided into four sub-squares, colored red-orange, green, yellow and blue (clockwise), with the company name appearing to its right](/wiki/File:Microsoft_logo_(2012).svg) |\n",
"| Building 92 on the [Microsoft Redmond campus](/wiki/Microsoft_Redmond_campus \"Microsoft Redmond campus\") |\n",
"| Type | [Public](/wiki/Public_company \"Public company\") |\n",
"| [Traded as](/wiki/Ticker_symbol \"Ticker symbol\") | * [Nasdaq](/wiki/Nasdaq \"Nasdaq\"): [MSFT](https://www.nasdaq.com/market-activity/stocks/msft)\n",
"* [Nasdaq-100](/wiki/Nasdaq-100 \"Nasdaq-100\") component\n",
"* [DJIA](/wiki/Dow_Jones_Industrial_Average \"Dow Jones Industrial Average\") component\n",
"* [S&P 100](/wiki/S%26P_100 \"S&P 100\") component\n",
"* [S&P 500](/wiki/S%26P_500 \"S&P 500\") component\n",
" |\n",
"Microsoft Corporation\n",
"| [A square divided into four sub-squares, colored red-orange, green, yellow and blue (clockwise), with the company name appearing to its right](/wiki/File%3AMicrosoft_logo_%282012%29.svg) | |\n",
"| --- | --- |\n",
"| Aerial view of the [Microsoft Redmond campus](/wiki/Microsoft_Redmond_campus \"Microsoft Redmond campus\") | |\n",
"| Company type | [Public](/wiki/Public_company \"Public company\") |\n",
"| [Traded as](/wiki/Ticker_symbol \"Ticker symbol\") | * [Nasdaq](/wiki/Nasdaq \"Nasdaq\"): [MSFT](https://www.nasdaq.com/market-activity/stocks/msft) * [Nasdaq-100](/wiki/Nasdaq-100 \"Nasdaq-100\") component * [DJIA](/wiki/Dow_Jones_Industrial_Average \"Dow Jones Industrial Average\") component * [S&P 100](/wiki/S%26P_100 \"S&P 100\") component * [S&P 500](/wiki/S%26P_500 \"S&P 500\") component |\n",
"| [ISIN](/wiki/International_Securities_Identification_Number \"International Securities Identification Number\") | [US5949181045](https://isin.toolforge.org/?language=en&isin=US5949181045) |\n",
"| Industry | [Information technology](/wiki/Information_technology \"Information technology\") |\n",
"| Founded | April 4, 1975; 48 years ago (1975-04-04) in [Albuquerque, New Mexico](/wiki/Albuquerque,_New_Mexico \"Albuquerque, New Mexico\"), U.S. |\n",
"| Founders | * [Bill Gates](/wiki/Bill_Gates \"Bill Gates\")\n",
"* [Paul Allen](/wiki/Paul_Allen \"Paul Allen\")\n",
" |\n",
"| Headquarters | [One Microsoft Way](/wiki/Microsoft_campus \"Microsoft campus\")[Redmond, Washington](/wiki/Redmond,_Washington \"Redmond, Washington\"), U.S. |\n",
"| Founded | April 4, 1975; 48 years ago (1975-04-04) in [Albuquerque, New Mexico](/wiki/Albuquerque%2C_New_Mexico \"Albuquerque, New Mexico\"), U.S. |\n",
"| Founders | * [Bill Gates](/wiki/Bill_Gates \"Bill Gates\") * [Paul Allen](/wiki/Paul_Allen \"Paul Allen\") |\n",
"| Headquarters | [One Microsoft Way](/wiki/One_Microsoft_Way \"One Microsoft Way\"), [Redmond, Washington](/wiki/Redmond%2C_Washington \"Redmond, Washington\"), U.S. |\n",
"| Area served | Worldwide |\n",
"| Key people | * [Satya Nadella](/wiki/Satya_Nadella \"Satya Nadella\")([Chairman](/wiki/Chairman \"Chairman\") & [CEO](/wiki/Chief_executive_officer \"Chief executive officer\"))\n",
"* [Brad Smith](/wiki/Brad_Smith_(American_lawyer) \"Brad Smith (American lawyer)\")([Vice Chairman](/wiki/Vice-Chairman \"Vice-Chairman\") & [President](/wiki/President_(corporate_title) \"President (corporate title)\"))\n",
"* [Bill Gates](/wiki/Bill_Gates \"Bill Gates\")([technical adviser](/wiki/Adviser \"Adviser\"))\n",
" |\n",
"| Products | * [Software development](/wiki/Software_development \"Software development\")\n",
"* [Computer hardware](/wiki/Computer_hardware \"Computer hardware\")\n",
"* [Consumer electronics](/wiki/Consumer_electronics \"Consumer electronics\")\n",
"* [Social networking service](/wiki/Social_networking_service \"Social networking service\")\n",
"* [Cloud computing](/wiki/Cloud_computing \"Cloud computing\")\n",
"* [Video games](/wiki/Video_game_industry \"Video game industry\")\n",
"* [Internet](/wiki/Internet \"Internet\")\n",
"* [Corporate venture capital](/wiki/Corporate_venture_capital \"Corporate venture capital\")\n",
" |\n",
"| Brands | \n",
"* [Windows](/wiki/Microsoft_Windows \"Microsoft Windows\")\n",
"* [Microsoft 365](/wiki/Microsoft_365 \"Microsoft 365\")\n",
"* [Skype](/wiki/Skype \"Skype\")\n",
"* [Visual Studio](/wiki/Visual_Studio \"Visual Studio\")\n",
"* [Xbox](/wiki/Xbox \"Xbox\")\n",
"* [Dynamics](/wiki/Microsoft_Dynamics_365 \"Microsoft Dynamics 365\")\n",
"* [Surface](/wiki/Microsoft_Surface \"Microsoft Surface\")\n",
"\n",
" |\n",
"| Services | \n",
"* [Edge](/wiki/Microsoft_Edge \"Microsoft Edge\")\n",
"* [Azure](/wiki/Microsoft_Azure \"Microsoft Azure\")\n",
"* [Bing](/wiki/Microsoft_Bing \"Microsoft Bing\")\n",
"* [LinkedIn](/wiki/LinkedIn \"LinkedIn\")\n",
"* [Yammer](/wiki/Yammer \"Yammer\")\n",
"* [Microsoft 365](/wiki/Microsoft_365 \"Microsoft 365\")\n",
"* [OneDrive](/wiki/OneDrive \"OneDrive\")\n",
"* [Outlook](/wiki/Microsoft_Outlook \"Microsoft Outlook\")\n",
"* [GitHub](/wiki/GitHub \"GitHub\")\n",
"* [Microsoft Store](/wiki/Microsoft_Store_(digital) \"Microsoft Store (digital)\")\n",
"* [Windows Update](/wiki/Windows_Update \"Windows Update\")\n",
"* [Xbox Game Pass](/wiki/Xbox_Game_Pass \"Xbox Game Pass\")\n",
"* [Xbox network](/wiki/Xbox_network \"Xbox network\")\n",
"\n",
" |\n",
"| Key people | * [Satya Nadella](/wiki/Satya_Nadella \"Satya Nadella\")([Chairman](/wiki/Chairman \"Chairman\") & [CEO](/wiki/Chief_executive_officer \"Chief executive officer\")) * [Brad Smith](/wiki/Brad_Smith_%28American_lawyer%29 \"Brad Smith (American lawyer)\")([Vice Chairman](/wiki/Vice-Chairman \"Vice-Chairman\") & [President](/wiki/President_%28corporate_title%29 \"President (corporate title)\")) * Bill Gates([technical adviser](/wiki/Adviser \"Adviser\")) |\n",
"| Products | * [Software development](/wiki/Software_development \"Software development\") * [Computer hardware](/wiki/Computer_hardware \"Computer hardware\") * [Consumer electronics](/wiki/Consumer_electronics \"Consumer electronics\") * [Social networking service](/wiki/Social_networking_service \"Social networking service\") * [Cloud computing](/wiki/Cloud_computing \"Cloud computing\") * [Video games](/wiki/Video_game_industry \"Video game industry\") * [Internet](/wiki/Internet \"Internet\") * [Corporate venture capital](/wiki/Corporate_venture_capital \"Corporate venture capital\") |\n",
"| Brands | * [Windows](/wiki/Microsoft_Windows \"Microsoft Windows\") * [Microsoft 365](/wiki/Microsoft_365 \"Microsoft 365\") * [Skype](/wiki/Skype \"Skype\") * [Visual Studio](/wiki/Visual_Studio \"Visual Studio\") * [Xbox](/wiki/Xbox \"Xbox\") * [Dynamics](/wiki/Microsoft_Dynamics_365 \"Microsoft Dynamics 365\") * [Surface](/wiki/Microsoft_Surface \"Microsoft Surface\") |\n",
"| Services | * [Edge](/wiki/Microsoft_Edge \"Microsoft Edge\") * [Azure](/wiki/Microsoft_Azure \"Microsoft Azure\") * [Bing](/wiki/Microsoft_Bing \"Microsoft Bing\") * [LinkedIn](/wiki/LinkedIn \"LinkedIn\") * [Yammer](/wiki/Yammer \"Yammer\") * [Microsoft 365](/wiki/Microsoft_365 \"Microsoft 365\") * [OneDrive](/wiki/OneDrive \"OneDrive\") * [Outlook](/wiki/Microsoft_Outlook \"Microsoft Outlook\") * [GitHub](/wiki/GitHub \"GitHub\") * [Microsoft Store](/wiki/Microsoft_Store_%28digital%29 \"Microsoft Store (digital)\") * [Windows Update](/wiki/Windows_Update \"Windows Update\") * [Xbox Game Pass](/wiki/Xbox_Game_Pass \"Xbox Game Pass\") * [Xbox network](/wiki/Xbox_network \"Xbox network\") |\n",
"| Revenue | Increase [US$](/wiki/United_States_dollar \"United States dollar\")211.9 billion (2023) |\n",
"| [Operating income](/wiki/Earnings_before_interest_and_taxes \"Earnings before interest and taxes\") | Increase US$88.5 billion (2023) |\n",
"| [Net income](/wiki/Net_income \"Net income\") | Increase US$73.4 billion (2023) |\n",
"| [Total assets](/wiki/Asset \"Asset\") | Increase US$411.9 billion (2023) |\n",
"| [Total equity](/wiki/Equity_(finance) \"Equity \n",
"| [Total equity](/wiki/Equity_%28finance%29 \"Equity (finance)\") | Increase US$206.2 billion (2023) |\n",
"| Number of employees | 221,000 (2023) |\n",
"| [Divisions](/wiki/Division_%28business%29 \"Division (business)\") | * [Microsoft Engineering Groups](/wiki/Microsoft_engineering_groups \"Microsoft engineering groups\") * [Microsoft Digital Crimes Unit](/wiki/Microsoft_Digital_Crimes_Unit \"Microsoft Digital Crimes Unit\") * [Microsoft Press](/wiki/Microsoft_Press \"Microsoft Press\") * [Microsoft Gaming](/wiki/Microsoft_Gaming \"Microsoft Gaming\") * Microsoft AI |\n",
"| [Subsidiaries](/wiki/Subsidiary \"Subsidiary\") | * [Microsoft Japan](/wiki/Microsoft_Japan \"Microsoft Japan\") * [Microsoft India](/wiki/Microsoft_India \"Microsoft India\") * [Microsoft Egypt](/wiki/Microsoft_Egypt \"Microsoft Egypt\") * [GitHub](/wiki/GitHub \"GitHub\") * [LinkedIn](/wiki/LinkedIn \"LinkedIn\") * [Metaswitch](/wiki/Metaswitch \"Metaswitch\") * [Nuance Communications](/wiki/Nuance_Communications \"Nuance Communications\") * [RiskIQ](/wiki/RiskIQ \"RiskIQ\") * [Skype Technologies](/wiki/Skype_Technologies \"Skype Technologies\") * [Xamarin](/wiki/Xamarin \"Xamarin\") * [Xandr](/wiki/Xandr \"Xandr\") |\n",
"| | |\n",
"| [ASN](/wiki/Autonomous_System_Number \"Autonomous System Number\") | * [8075](https://bgp.tools/as/8075) |\n",
"| | |\n",
"| Website | [microsoft.com](https://www.microsoft.com/) |\n",
"| **Footnotes / references**Financials as of June 30, 2023[[update]](https://en.wikipedia.org/w/index.php?title=Microsoft&action=edit)[[1]](#cite_note-1) | |\n",
"\n",
"| | [Bill Gates in 2023](/wiki/File%3ABill_Gates_2017_%28cropped%29.jpg) | This article is part of a series about [Bill Gates](/wiki/Bill_Gates \"Bill Gates\") | | --- | --- | |\n",
"| --- | --- | --- |\n",
"| * [Awards and honors](/wiki/Bill_Gates#Recognition \"Bill Gates\") * [Philanthropy](/wiki/Bill_Gates#Philanthropy \"Bill Gates\") * [Political positions](/wiki/Bill_Gates#Political_positions \"Bill Gates\") * [Public image](/wiki/Bill_Gates#Public_image \"Bill Gates\") * [Residence](/wiki/Bill_Gates%27s_house \"Bill Gates's house\") --- Companies* [Traf-O-Data](/wiki/Traf-O-Data \"Traf-O-Data\") * Microsoft ([criticism](/wiki/Criticism_of_Microsoft \"Criticism of Microsoft\")) * [BEN](/wiki/Branded_Entertainment_Network \"Branded Entertainment Network\") * [Cascade Investment](/wiki/Cascade_Investment \"Cascade Investment\") * [TerraPower](/wiki/TerraPower \"TerraPower\") * [Gates Ventures](/wiki/Gates_Ventures \"Gates Ventures\") --- Charitable organizations* [Bill & Melinda Gates Foundation](/wiki/Bill_%26_Melinda_Gates_Foundation \"Bill & Melinda Gates Foundation\") * [Match for Africa](/wiki/Match_for_Africa \"Match for Africa\") * [The Giving Pledge](/wiki/The_Giving_Pledge \"The Giving Pledge\") * [OER Project](/wiki/OER_Project \"OER Project\") * [Breakthrough Energy](/wiki/Breakthrough_Energy \"Breakthrough Energy\") * [Mission Innovation](/wiki/Mission_Innovation \"Mission Innovation\") --- Writings* \"[An Open Letter to Hobbyists](/wiki/An_Open_Letter_to_Hobbyists \"An Open Letter to Hobbyists\")\" * *[The Road Ahead](/wiki/The_Road_Ahead_%28Gates_book%29 \"The Road Ahead (Gates book)\")* * *[Business @ the Speed of Thought](/wiki/Business_%40_the_Speed_of_Thought \"Business @ the Speed of Thought\")* * *[How to Avoid a Climate Disaster](/wiki/How_to_Avoid_a_Climate_Disaster \"How to Avoid a Climate Disaster\")* * *[How to Prevent the Next Pandemic](/wiki/How_to_Prevent_the_Next_Pandemic \"How to Prevent the Next Pandemic\")* --- Related* [Bill Gates' flower fly](/wiki/Bill_Gates%27_flower_fly \"Bill Gates' flower fly\") * [Codex Leicester](/wiki/Codex_Leicester \"Codex Leicester\") * *[Lost on the Grand Banks](/wiki/Lost_on_the_Grand_Banks \"Lost on the Grand Banks\")* * [History of Microsoft](/wiki/History_of_Microsoft \"History of Microsoft\") * [Timeline of Microsoft](/wiki/Timeline_of_Microsoft \"Timeline of Microsoft\") * [Paul Allen](/wiki/Paul_Allen \"Paul Allen\") --- |\n",
"| * [v](/wiki/Template%3ABill_Gates_series \"Template:Bill Gates series\") * [t](/wiki/Template_talk%3ABill_Gates_series \"Template talk:Bill Gates series\") * [e](/wiki/Special%3AEditPage/Template%3ABill_Gates_series \"Special:EditPage/Template:Bill \n",
"\n",
"--------------------------------------------------------------------------------\n"
]
@@ -445,7 +594,7 @@
],
"source": [
"task4 = \"\"\"Find Microsoft's Wikipedia page.\"\"\"\n",
"user_proxy.initiate_chat(web_surfer, message=task4, clear_history=False)"
"user_proxy.initiate_chat(web_surfer, message=task4, clear_history=False);"
]
},
{
@@ -469,98 +618,40 @@
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"Address: https://en.wikipedia.org/wiki/Microsoft\n",
"Title: Microsoft - Wikipedia\n",
"Viewport position: Showing page 2 of 64.\n",
"Title: Microsoft\n",
"Viewport position: Showing page 2 of 34.\n",
"=======================\n",
"(finance)\") | Increase US$206.2 billion (2023) |\n",
"| Number of employees | 238,000 (2023) |\n",
"| [Divisions](/wiki/Division_(business) \"Division (business)\") | \n",
"* [Microsoft Engineering Groups](/wiki/Microsoft_engineering_groups \"Microsoft engineering groups\")\n",
"* [Microsoft Digital Crimes Unit](/wiki/Microsoft_Digital_Crimes_Unit \"Microsoft Digital Crimes Unit\")\n",
"* [Microsoft Press](/wiki/Microsoft_Press \"Microsoft Press\")\n",
"* [Microsoft Japan](/wiki/Microsoft_Japan \"Microsoft Japan\")\n",
"* [Microsoft Gaming](/wiki/Microsoft_Gaming \"Microsoft Gaming\")\n",
"Gates series\") |\n",
"\n",
" |\n",
"| [Subsidiaries](/wiki/Subsidiary \"Subsidiary\") | \n",
"* [GitHub](/wiki/GitHub \"GitHub\")\n",
"* [LinkedIn](/wiki/LinkedIn \"LinkedIn\")\n",
"* [Metaswitch](/wiki/Metaswitch \"Metaswitch\")\n",
"* [Nuance Communications](/wiki/Nuance_Communications \"Nuance Communications\")\n",
"* [RiskIQ](/wiki/RiskIQ \"RiskIQ\")\n",
"* [Skype Technologies](/wiki/Skype_Technologies \"Skype Technologies\")\n",
"* [OpenAI](/wiki/OpenAI \"OpenAI\") (49%)[[1]](#cite_note-1)\n",
"* [Xamarin](/wiki/Xamarin \"Xamarin\")\n",
"* [Xandr](/wiki/Xandr \"Xandr\")\n",
"**Microsoft Corporation** is an American [multinational corporation](/wiki/Multinational_corporation \"Multinational corporation\") and [technology company](/wiki/Technology_company \"Technology company\") headquartered in [Redmond, Washington](/wiki/Redmond%2C_Washington \"Redmond, Washington\").[[2]](#cite_note-2) Microsoft's best-known [software products](/wiki/List_of_Microsoft_software \"List of Microsoft software\") are the [Windows](/wiki/Microsoft_Windows \"Microsoft Windows\") line of [operating systems](/wiki/List_of_Microsoft_operating_systems \"List of Microsoft operating systems\"), the [Microsoft 365](/wiki/Microsoft_365 \"Microsoft 365\") suite of productivity applications, and the [Edge](/wiki/Microsoft_Edge \"Microsoft Edge\") web browser. Its flagship [hardware products](/wiki/List_of_Microsoft_hardware \"List of Microsoft hardware\") are the [Xbox](/wiki/Xbox \"Xbox\") video game consoles and the [Microsoft Surface](/wiki/Microsoft_Surface \"Microsoft Surface\") lineup of [touchscreen](/wiki/Touchscreen \"Touchscreen\") personal computers. Microsoft ranked No. 14 in the 2022 [Fortune 500](/wiki/Fortune_500 \"Fortune 500\") rankings of the largest United States corporations by total revenue;[[3]](#cite_note-3) and it was the world's [largest software maker](/wiki/List_of_the_largest_software_companies \"List of the largest software companies\") by revenue in 2022 according to [Forbes Global 2000](/wiki/Forbes_Global_2000 \"Forbes Global 2000\"). It is considered one of the [Big Five](/wiki/Big_Tech \"Big Tech\") American [information technology](/wiki/Information_technology \"Information technology\") companies, alongside [Alphabet](/wiki/Alphabet_Inc. \"Alphabet Inc.\") (parent company of [Google](/wiki/Google \"Google\")), [Amazon](/wiki/Amazon_%28company%29 \"Amazon (company)\"), [Apple](/wiki/Apple_Inc. \"Apple Inc.\"), and [Meta](/wiki/Meta_Platforms \"Meta Platforms\") (parent company of [Facebook](/wiki/Facebook \"Facebook\")).\n",
"\n",
" |\n",
"| |\n",
"| [ASN](/wiki/Autonomous_System_Number \"Autonomous System Number\") | * [8075](https://bgp.tools/as/8075)\n",
" |\n",
"| |\n",
"| Website | [microsoft.com](https://www.microsoft.com/) |\n",
"| **Footnotes / references**Financials as of June 30, 2023[[update]](https://en.wikipedia.org/w/index.php?title=Microsoft&action=edit)[[2]](#cite_note-2) |\n",
"Microsoft was founded by [Bill Gates](/wiki/Bill_Gates \"Bill Gates\") and [Paul Allen](/wiki/Paul_Allen \"Paul Allen\") on April 4, 1975, to develop and sell [BASIC interpreters](/wiki/BASIC_interpreter \"BASIC interpreter\") for the [Altair 8800](/wiki/Altair_8800 \"Altair 8800\"). It rose to dominate the personal computer operating system market with [MS-DOS](/wiki/MS-DOS \"MS-DOS\") in the mid-1980s, followed by Windows. The company's 1986 [initial public offering](/wiki/Initial_public_offering \"Initial public offering\") (IPO) and subsequent rise in its share price created three billionaires and an estimated 12,000 millionaires among Microsoft employees. Since the 1990s, it has increasingly diversified from the operating system market and has made several [corporate acquisitions](/wiki/List_of_mergers_and_acquisitions_by_Microsoft \"List of mergers and acquisitions by Microsoft\"), the largest being the [acquisition](/wiki/Acquisition_of_Activision_Blizzard_by_Microsoft \"Acquisition of Activision Blizzard by Microsoft\") of [Activision Blizzard](/wiki/Activision_Blizzard \"Activision Blizzard\") for $68.7 billion in October 2023,[[4]](#cite_note-4) followed by its acquisition of [LinkedIn](/wiki/LinkedIn \"LinkedIn\") for $26.2 billion in December 2016,[[5]](#cite_note-5) and its acquisition of [Skype Technologies](/wiki/Skype_Technologies \"Skype Technologies\") for $8.5 billion in May 2011.[[6]](#cite_note-6)\n",
"\n",
"| | | |\n",
"| --- | --- | --- |\n",
"| \n",
"As of 2015[[update]](https://en.wikipedia.org/w/index.php?title=Microsoft&action=edit), Microsoft is market-dominant in the [IBM PC compatible](/wiki/IBM_PC_compatible \"IBM PC compatible\") operating system market and the office software suite market, although it has lost the majority of the overall operating system market to [Android](/wiki/Android_%28operating_system%29 \"Android (operating system)\").[[7]](#cite_note-7) The company also produces a wide range of other consumer and enterprise software for desktops, laptops, tabs, gadgets, and servers, including [Internet search](/wiki/Web_search_engine \"Web search engine\") (with [Bing](/wiki/Microsoft_Bing \"Microsoft Bing\")), the digital services market (through [MSN](/wiki/MSN \"MSN\")), [mixed reality](/wiki/Mixed_reality \"Mixed reality\") ([HoloLens](/wiki/Microsoft_HoloLens \"Microsoft HoloLens\")), cloud computing ([Azure](/wiki/Microsoft_Azure \"Microsoft Azure\")), and software development ([Visual Studio](/wiki/Microsoft_Visual_Studio \"Microsoft Visual Studio\")).\n",
"\n",
"| | |\n",
"| --- | --- |\n",
"| [Bill Gates in 2023](/wiki/File:Bill_Gates_2017_(cropped).jpg) | This article is part of a series about\n",
"[Bill Gates](/wiki/Bill_Gates \"Bill Gates\") |\n",
"[Steve Ballmer](/wiki/Steve_Ballmer \"Steve Ballmer\") replaced Gates as CEO in 2000 and later envisioned a \"devices and services\" strategy.[[8]](#cite_note-8) This unfolded with Microsoft acquiring [Danger Inc.](/wiki/Danger_Inc. \"Danger Inc.\") in 2008,[[9]](#cite_note-9) entering the personal computer production market for the first time in June 2012 with the launch of the Microsoft Surface line of [tablet computers](/wiki/Tablet_computer \"Tablet computer\"), and later forming [Microsoft Mobile](/wiki/Microsoft_Mobile \"Microsoft Mobile\") through the acquisition of [Nokia](/wiki/Nokia \"Nokia\")'s devices and services division. Since [Satya Nadella](/wiki/Satya_Nadella \"Satya Nadella\") took over as CEO in 2014, the company has scaled back on hardware and instead focused on [cloud computing](/wiki/Cloud_computing \"Cloud computing\"), a move that helped the company's [shares](/wiki/Share_%28finance%29 \"Share (finance)\") reach their highest value since December 1999.[[10]](#cite_note-10)[[11]](#cite_note-11) Under Nadella's direction, the company has also heavily expanded its gaming business to support the Xbox brand, establishing the [Microsoft Gaming](/wiki/Microsoft_Gaming \"Microsoft Gaming\") division in 2022, dedicated to operating Xbox in addition to its three subsidiaries ([publishers](/wiki/Video_game_publisher \"Video game publisher\")). Microsoft Gaming is the third-largest gaming company in the world by revenue as of 2023.[[12]](#cite_note-12)\n",
"\n",
" |\n",
"| * [Awards and honors](/wiki/Bill_Gates#Recognition \"Bill Gates\")\n",
"* [Philanthropy](/wiki/Bill_Gates#Philanthropy \"Bill Gates\")\n",
"* [Political positions](/wiki/Bill_Gates#Political_positions \"Bill Gates\")\n",
"* [Public image](/wiki/Bill_Gates#Public_image \"Bill Gates\")\n",
"* [Residence](/wiki/Bill_Gates%27s_house \"Bill Gates's house\")\n",
"Earlier dethroned by Apple in 2010, and in 2018, Microsoft reclaimed[*[when?](/wiki/Wikipedia%3AManual_of_Style/Dates_and_numbers#Chronological_items \"Wikipedia:Manual of Style/Dates and numbers\")*] its position as the most valuable publicly traded company in the world.[[13]](#cite_note-13) In April 2019, Microsoft reached a trillion-dollar [market cap](/wiki/Market_capitalization \"Market capitalization\"), becoming the third U.S. public company to be [valued at over $1 trillion](/wiki/Trillion-dollar_company \"Trillion-dollar company\") after Apple and Amazon, respectively. As of 2023[[update]](https://en.wikipedia.org/w/index.php?title=Microsoft&action=edit), Microsoft has the [third-highest](/wiki/List_of_most_valuable_brands \"List of most valuable brands\") global [brand valuation](/wiki/Brand_valuation \"Brand valuation\").\n",
"\n",
"---\n",
"Microsoft [has been criticized](/wiki/Criticism_of_Microsoft \"Criticism of Microsoft\") for its monopolistic practices and the company's software has been criticized for problems with [ease of use](/wiki/Ease_of_use \"Ease of use\"), [robustness](/wiki/Robustness_%28computer_science%29 \"Robustness (computer science)\"), and [security](/wiki/Computer_security \"Computer security\").\n",
"\n",
"Companies* [Traf-O-Data](/wiki/Traf-O-Data \"Traf-O-Data\")\n",
"* Microsoft ([criticism](/wiki/Criticism_of_Microsoft \"Criticism of Microsoft\"))\n",
"* [BEN](/wiki/Branded_Entertainment_Network \"Branded Entertainment Network\")\n",
"* [Cascade Investment](/wiki/Cascade_Investment \"Cascade Investment\")\n",
"* [TerraPower](/wiki/TerraPower \"TerraPower\")\n",
"* [Gates Ventures](/wiki/Gates_Ventures \"Gates Ventures\")\n",
"## History\n",
"\n",
"---\n",
"Main article: [History of Microsoft](/wiki/History_of_Microsoft \"History of Microsoft\")\n",
"For a chronological guide, see [Timeline of Microsoft](/wiki/Timeline_of_Microsoft \"Timeline of Microsoft\").\n",
"\n",
"Charitable organizations* [Bill & Melinda Gates Foundation](/wiki/Bill_%26_Melinda_Gates_Foundation \"Bill & Melinda Gates Foundation\")\n",
"* [Match for Africa](/wiki/Match_for_Africa \"Match for Africa\")\n",
"* [The Giving Pledge](/wiki/The_Giving_Pledge \"The Giving Pledge\")\n",
"* [OER Project](/wiki/OER_Project \"OER Project\")\n",
"* [Breakthrough Energy](/wiki/Breakthrough_Energy \"Breakthrough Energy\")\n",
"* [Mission Innovation](/wiki/Mission_Innovation \"Mission Innovation\")\n",
"### 19721985: Founding\n",
"\n",
"---\n",
"[![](//upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Altair_8800_and_Model_33_ASR_Teletype_.jpg/256px-Altair_8800_and_Model_33_ASR_Teletype_.jpg)](/wiki/File%3AAltair_8800_and_Model_33_ASR_Teletype_.jpg)\n",
"\n",
"Writings* \"[An Open Letter to Hobbyists](/wiki/An_Open_Letter_to_Hobbyists \"An Open Letter to Hobbyists\")\"\n",
"* *[The Road Ahead](/wiki/The_Road_Ahead_(Gates_book) \"The Road Ahead (Gates book)\")*\n",
"* *[Business @ the Speed of Thought](/wiki/Business_@_the_Speed_of_Thought \"Business @ the Speed of Thought\")*\n",
"* *[How to Avoid a Climate Disaster](/wiki/How_to_Avoid_a_Climate_Disaster \"How to Avoid a Climate Disaster\")*\n",
"* *[How to Prevent the Next Pandemic](/wiki/How_to_Prevent_the_Next_Pandemic \"How to Prevent the Next Pandemic\")*\n",
"An Altair 8800 computer (left) with the popular Model 33 ASR Teletype as terminal, paper tape reader, and paper tape punch\n",
"\n",
"---\n",
"[![](//upload.wikimedia.org/wikipedia/en/thumb/4/4f/1981BillPaul.jpg/220px-1981BillPaul.jpg)](/wiki/File%3A1981BillPaul.jpg)\n",
"\n",
"Related* [Bill Gates' flower fly](/wiki/Bill_Gates%27_flower_fly \"Bill Gates' flower fly\")\n",
"* [Codex Leicester](/wiki/Codex_Leicester \"Codex Leicester\")\n",
"* *[Lost on the Grand Banks](/wiki/Lost_on_the_Grand_Banks \"Lost on the Grand Banks\")*\n",
"* [History of Microsoft](/wiki/History_of_Microsoft \"History of Microsoft\")\n",
"* [Timeline of Microsoft](/wiki/Timeline_of_Microsoft \"Timeline of Microsoft\")\n",
"* [Paul Allen](/wiki/Paul_Allen \"Paul Allen\")\n",
"[Paul Allen](/wiki/Paul_Allen \"Paul Allen\") and [Bill Gates](/wiki/Bill_Gates \"Bill Gates\") on October 19, 1981, after signing a pivotal contract with [IBM](/wiki/IBM \"IBM\")[[14]](#cite_note-Allan_2001-14):228\n",
"\n",
"---\n",
"[![](//upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Bill_Gates_and_Paul_Allen_Business_Cards.jpg/220px-Bill_Gates_and_Paul_Allen_Business_Cards.jpg)](/wiki/File%3ABill_Gates_and_Paul_Allen_Business_Cards.jpg)\n",
"\n",
" |\n",
"| * [v](/wiki/Template:Bill_Gates_series \"Template:Bill Gates series\")\n",
"* [t](/wiki/Template_talk:Bill_Gates_series \"Template talk:Bill Gates series\")\n",
"* [e](/wiki/Special:EditPage/Template:Bill_Gates_series \"Special:EditPage/Template:Bill Gates series\")\n",
" |\n",
"\n",
"**Microsoft Corporation** is an American multinational [technology corporation](/wiki/Technology_company \n",
"\n",
"--------------------------------------------------------------------------------\n"
]
@@ -568,7 +659,7 @@
],
"source": [
"task5 = \"\"\"Scroll down.\"\"\"\n",
"user_proxy.initiate_chat(web_surfer, message=task5, clear_history=False)"
"user_proxy.initiate_chat(web_surfer, message=task5, clear_history=False);"
]
},
{
@@ -582,24 +673,24 @@
"text": [
"\u001b[33muser_proxy\u001b[0m (to web_surfer):\n",
"\n",
"Where was the first office location, and when did they move to Redmond?\n",
"Read the page and answer: Where was the first office location, and when did they move to Redmond?\n",
"\n",
"--------------------------------------------------------------------------------\n",
"\u001b[31m\n",
">>>>>>>> USING AUTO REPLY...\u001b[0m\n",
"\u001b[35m\n",
">>>>>>>> EXECUTING FUNCTION answer_from_page...\u001b[0m\n",
">>>>>>>> EXECUTING FUNCTION read_page_and_answer...\u001b[0m\n",
"\u001b[33mweb_surfer\u001b[0m (to user_proxy):\n",
"\n",
"Microsoft's first office location was in Albuquerque, New Mexico, where it was founded on April 4, 1975. However, Microsoft later moved its headquarters to Redmond, Washington in January 1979. Since then, Redmond has been the main office location for Microsoft.\n",
"Microsoft Corporation, an American multinational technology company, was founded on April 4, 1975, in Albuquerque, New Mexico, by Bill Gates and Paul Allen. The company's first office location was in Albuquerque, but they later moved their headquarters to Redmond, Washington. The move to Redmond occurred in January 1979. Since then, Microsoft has become a major player in the technology industry, developing and selling software products such as the Windows operating system, Microsoft Office suite, and Xbox video game consoles. They have also expanded into cloud computing with Microsoft Azure and have made acquisitions such as Nokia's mobile unit and LinkedIn.\n",
"\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"task6 = \"\"\"Where was the first office location, and when did they move to Redmond?\"\"\"\n",
"user_proxy.initiate_chat(web_surfer, message=task6, clear_history=False)"
"task6 = \"\"\"Read the page and answer: Where was the first office location, and when did they move to Redmond?\"\"\"\n",
"user_proxy.initiate_chat(web_surfer, message=task6, clear_history=False);"
]
}
],

samples/browser_chat.py (new file, 45 lines)

@ -0,0 +1,45 @@
import os

from autogen import UserProxyAgent, config_list_from_json
from autogen.agentchat.contrib.web_surfer import WebSurferAgent
from autogen.browser_utils import (
    BingMarkdownSearch,
    PlaywrightMarkdownBrowser,
    RequestsMarkdownBrowser,
    SeleniumMarkdownBrowser,
)


def main():
    # Load LLM inference endpoints from an env variable or a file
    # See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
    # and OAI_CONFIG_LIST_sample.
    # For example, if you have created a OAI_CONFIG_LIST file in the current working directory, that file will be used.
    config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

    browser = RequestsMarkdownBrowser(
        # PlaywrightMarkdownBrowser(
        viewport_size=1024 * 3,
        downloads_folder=os.getcwd(),
        search_engine=BingMarkdownSearch(bing_api_key=os.environ["BING_API_KEY"]),
        # launch_args={"channel": "msedge", "headless": False},
    )

    web_surfer = WebSurferAgent(
        "web_surfer",
        llm_config={"config_list": config_list},
        summarizer_llm_config={"config_list": config_list},
        is_termination_msg=lambda x: x.get("content", "").rstrip().find("TERMINATE") >= 0,
        code_execution_config=False,
        browser=browser,
    )

    # Create the agent that represents the user in the conversation.
    user_proxy = UserProxyAgent("user", code_execution_config=False)

    # Let the assistant start the conversation. It will end when the user types exit.
    web_surfer.initiate_chat(user_proxy, message="How can I help you today?")


if __name__ == "__main__":
    main()
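The commented-out lines in the browser construction above hint at the Playwright alternative. A minimal sketch of that variant, assuming the same constructor interface (the launch_args values simply mirror the commented hints in this sample):

# Hypothetical swap-in: drive a real browser via Playwright instead of
# fetching pages with requests; downstream usage is unchanged.
browser = PlaywrightMarkdownBrowser(
    viewport_size=1024 * 3,
    downloads_folder=os.getcwd(),
    search_engine=BingMarkdownSearch(bing_api_key=os.environ["BING_API_KEY"]),
    launch_args={"channel": "msedge", "headless": False},  # assumption: taken from the commented hint
)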

View File

@@ -81,7 +81,20 @@ extra_require = {
"graph": ["networkx", "matplotlib"],
"gemini": ["google-generativeai>=0.5,<1", "google-cloud-aiplatform", "google-auth", "pillow", "pydantic"],
"together": ["together>=1.2"],
"websurfer": ["beautifulsoup4", "markdownify", "pdfminer.six", "pathvalidate"],
"websurfer": [
"beautifulsoup4",
"markdownify",
"pathvalidate",
# for mdconvert
"puremagic", # File identification
"binaryornot", # More file identification
"pdfminer.six", # Pdf
"mammoth", # Docx
"python-pptx", # Ppts
"pandas", # Xlsx
"openpyxl",
"youtube_transcript_api==0.6.0", # Transcription
],
"redis": ["redis"],
"cosmosdb": ["azure-cosmos>=4.2.0"],
"websockets": ["websockets>=12.0,<13"],

13
test/agentchat/contrib/test_web_surfer.py Executable file → Normal file
View File

@@ -81,16 +81,9 @@ def test_web_surfer() -> None:
response = function_map["page_down"]()
assert f"Viewport position: Showing page {total_pages} of {total_pages}." in response
# Test web search -- we don't have a key in this case, so we expect it to raise an error (but it means the code path is correct)
with pytest.raises(ValueError, match="Missing Bing API key."):
response = function_map["informational_web_search"](BING_QUERY)
with pytest.raises(ValueError, match="Missing Bing API key."):
response = function_map["navigational_web_search"](BING_QUERY)
# Test Q&A and summarization -- we don't have a key so we expect it to fail (but it means the code path is correct)
with pytest.raises(IndexError):
response = function_map["answer_from_page"]("When was it founded?")
response = function_map["read_page_and_answer"]("When was it founded?")
with pytest.raises(IndexError):
response = function_map["summarize_page"]()
@@ -155,7 +148,7 @@ def test_web_surfer_bing() -> None:
"config_list": [
{
"model": "gpt-3.5-turbo-16k",
"api_key": "sk-PLACEHOLDER_KEY",
"api_key": MOCK_OPEN_AI_API_KEY,
}
]
},
@@ -167,7 +160,7 @@ def test_web_surfer_bing() -> None:
# Test informational queries
response = function_map["informational_web_search"](BING_QUERY)
assert f"Address: bing: {BING_QUERY}" in response
assert f"Address: search: {BING_QUERY}" in response
assert f"Title: {BING_QUERY} - Search" in response
assert "Viewport position: Showing page 1 of 1." in response
assert f"A Bing search for '{BING_QUERY}' found " in response

View File

@@ -0,0 +1,49 @@
#!/usr/bin/env python3 -m pytest

import os

import pytest

try:
    from autogen.browser_utils import BingMarkdownSearch
except ImportError:
    skip_all = True
else:
    skip_all = False

bing_api_key = None
if "BING_API_KEY" in os.environ:
    bing_api_key = os.environ["BING_API_KEY"]
    del os.environ["BING_API_KEY"]
skip_api = bing_api_key is None

BING_QUERY = "Microsoft wikipedia"
BING_STRING = f"A Bing search for '{BING_QUERY}' found"
BING_EXPECTED_RESULT = "https://en.wikipedia.org/wiki/Microsoft"


@pytest.mark.skipif(
    skip_all,
    reason="do not run if dependency is not installed",
)
def test_bing_markdown_search():
    search_engine = BingMarkdownSearch()
    results = search_engine.search(BING_QUERY)
    assert BING_STRING in results
    assert BING_EXPECTED_RESULT in results


@pytest.mark.skipif(
    skip_api,
    reason="skipping tests that require a Bing API key",
)
def test_bing_markdown_search_api():
    search_engine = BingMarkdownSearch(bing_api_key=bing_api_key)
    results = search_engine.search(BING_QUERY)
    assert BING_STRING in results
    assert BING_EXPECTED_RESULT in results


if __name__ == "__main__":
    """Runs this file's tests from the command line."""
    test_bing_markdown_search()
    test_bing_markdown_search_api()
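The module-level environment handling above is deliberate: the key is saved into bing_api_key and then deleted from os.environ, so test_bing_markdown_search exercises the keyless fallback while test_bing_markdown_search_api passes the saved key back explicitly. A sketch of the same idea written a bit more defensively (the restore step is an addition of this sketch, not something the test file does):

import os

# Stash the real key (if any) and scrub the environment so nothing under test
# can read BING_API_KEY implicitly.
saved_key = os.environ.pop("BING_API_KEY", None)
try:
    pass  # run keyless assertions here
finally:
    # Restore the key afterwards so later tests in the same process still see it.
    if saved_key is not None:
        os.environ["BING_API_KEY"] = saved_key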

Binary file not shown.

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9390b34525fd044df69265e022a06346abb6d203b14cbc9b2473c080c680e82e
size 474288

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,175 @@
#!/usr/bin/env python3 -m pytest

import io
import os
import shutil

import pytest
import requests

try:
    from autogen.browser_utils import FileConversionException, MarkdownConverter, UnsupportedFormatException
except ImportError:
    skip_all = True
else:
    skip_all = False

skip_exiftool = shutil.which("exiftool") is None

TEST_FILES_DIR = os.path.join(os.path.dirname(__file__), "test_files")

JPG_TEST_EXIFTOOL = {
    "Author": "AutoGen Authors",
    "Title": "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation",
    "Description": "AutoGen enables diverse LLM-based applications",
    "ImageSize": "1615x1967",
    "DateTimeOriginal": "2024:03:14 22:10:00",
}

PDF_TEST_URL = "https://arxiv.org/pdf/2308.08155v2.pdf"
PDF_TEST_STRINGS = ["While there is contemporaneous exploration of multi-agent approaches"]

YOUTUBE_TEST_URL = "https://www.youtube.com/watch?v=V2qZ_lgxTzg"
YOUTUBE_TEST_STRINGS = [
    "## AutoGen FULL Tutorial with Python (Step-By-Step)",
    "This is an intermediate tutorial for installing and using AutoGen locally",
    "PT15M4S",
    "the model we're going to be using today is GPT 3.5 turbo",  # From the transcript
]

XLSX_TEST_STRINGS = [
    "## 09060124-b5e7-4717-9d07-3c046eb",
    "6ff4173b-42a5-4784-9b19-f49caff4d93d",
    "affc7dad-52dc-4b98-9b5d-51e65d8a8ad0",
]

DOCX_TEST_STRINGS = [
    "314b0a30-5b04-470b-b9f7-eed2c2bec74a",
    "49e168b7-d2ae-407f-a055-2167576f39a1",
    "## d666f1f7-46cb-42bd-9a39-9a39cf2a509f",
    "# Abstract",
    "# Introduction",
    "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation",
]

PPTX_TEST_STRINGS = [
    "2cdda5c8-e50e-4db4-b5f0-9722a649f455",
    "04191ea8-5c73-4215-a1d3-1cfb43aaaf12",
    "44bf7d06-5e7a-4a40-a2e1-a2e42ef28c8a",
    "1b92870d-e3b5-4e65-8153-919f4ff45592",
    "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation",
]

BLOG_TEST_URL = "https://microsoft.github.io/autogen/blog/2023/04/21/LLM-tuning-math"
BLOG_TEST_STRINGS = [
    "Large language models (LLMs) are powerful tools that can generate natural language texts for various applications, such as chatbots, summarization, translation, and more. GPT-4 is currently the state of the art LLM in the world. Is model selection irrelevant? What about inference parameters?",
    "an example where high cost can easily prevent a generic complex",
]

WIKIPEDIA_TEST_URL = "https://en.wikipedia.org/wiki/Microsoft"
WIKIPEDIA_TEST_STRINGS = [
    "Microsoft entered the operating system (OS) business in 1980 with its own version of [Unix]",
    'Microsoft was founded by [Bill Gates](/wiki/Bill_Gates "Bill Gates")',
]
WIKIPEDIA_TEST_EXCLUDES = [
    "You are encouraged to create an account and log in",
    "154 languages",
    "move to sidebar",
]

SERP_TEST_URL = "https://www.bing.com/search?q=microsoft+wikipedia"
SERP_TEST_STRINGS = [
    "](https://en.wikipedia.org/wiki/Microsoft",
    "Microsoft Corporation is **an American multinational corporation and technology company headquartered** in Redmond",
    "1995–2007: Foray into the Web, Windows 95, Windows XP, and Xbox",
]
SERP_TEST_EXCLUDES = [
    "https://www.bing.com/ck/a?!&&p=",
    "data:image/svg+xml,%3Csvg%20width%3D",
]


@pytest.mark.skipif(
    skip_all,
    reason="do not run if dependency is not installed",
)
def test_mdconvert_remote():
    mdconvert = MarkdownConverter()

    # By URL
    result = mdconvert.convert(PDF_TEST_URL)
    for test_string in PDF_TEST_STRINGS:
        assert test_string in result.text_content

    # By stream
    response = requests.get(PDF_TEST_URL)
    result = mdconvert.convert_stream(io.BytesIO(response.content), file_extension=".pdf", url=PDF_TEST_URL)
    for test_string in PDF_TEST_STRINGS:
        assert test_string in result.text_content

    # # Youtube
    # result = mdconvert.convert(YOUTUBE_TEST_URL)
    # for test_string in YOUTUBE_TEST_STRINGS:
    #     assert test_string in result.text_content


@pytest.mark.skipif(
    skip_all,
    reason="do not run if dependency is not installed",
)
def test_mdconvert_local():
    mdconvert = MarkdownConverter()

    # Test XLSX processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test.xlsx"))
    for test_string in XLSX_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")

    # Test DOCX processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test.docx"))
    for test_string in DOCX_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")

    # Test PPTX processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test.pptx"))
    for test_string in PPTX_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")

    # Test HTML processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test_blog.html"), url=BLOG_TEST_URL)
    for test_string in BLOG_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")

    # Test Wikipedia processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test_wikipedia.html"), url=WIKIPEDIA_TEST_URL)
    for test_string in WIKIPEDIA_TEST_EXCLUDES:
        assert test_string not in result.text_content.replace(r"\-", "-")
    for test_string in WIKIPEDIA_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")

    # Test Bing processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test_serp.html"), url=SERP_TEST_URL)
    for test_string in SERP_TEST_EXCLUDES:
        assert test_string not in result.text_content.replace(r"\-", "-")
    for test_string in SERP_TEST_STRINGS:
        assert test_string in result.text_content.replace(r"\-", "-")


@pytest.mark.skipif(
    skip_exiftool,
    reason="do not run if exiftool is not installed",
)
def test_mdconvert_exiftool():
    mdconvert = MarkdownConverter()

    # Test JPG metadata processing
    result = mdconvert.convert(os.path.join(TEST_FILES_DIR, "test.jpg"))
    for key in JPG_TEST_EXIFTOOL:
        target = f"{key}: {JPG_TEST_EXIFTOOL[key]}"
        assert target in result.text_content


if __name__ == "__main__":
    """Runs this file's tests from the command line."""
    test_mdconvert_remote()
    test_mdconvert_local()
    test_mdconvert_exiftool()

View File

@@ -0,0 +1,226 @@
#!/usr/bin/env python3 -m pytest

import hashlib
import math
import os
import pathlib
import re
import sys

import pytest
import requests

BLOG_POST_URL = "https://microsoft.github.io/autogen/blog/2023/04/21/LLM-tuning-math"
BLOG_POST_TITLE = "Does Model and Inference Parameter Matter in LLM Applications? - A Case Study for MATH | AutoGen"
BLOG_POST_STRING = "powerful tools that can generate natural language texts for various applications"
BLOG_POST_FIND_ON_PAGE_QUERY = "an example where high * complex"
BLOG_POST_FIND_ON_PAGE_MATCH = "an example where high cost can easily prevent a generic complex"

WIKIPEDIA_URL = "https://en.wikipedia.org/wiki/Microsoft"
WIKIPEDIA_TITLE = "Microsoft"
WIKIPEDIA_STRING = "Redmond"

PLAIN_TEXT_URL = "https://raw.githubusercontent.com/microsoft/autogen/main/README.md"
DOWNLOAD_URL = "https://arxiv.org/src/2308.08155"

PDF_URL = "https://arxiv.org/pdf/2308.08155.pdf"
PDF_STRING = "Figure 1: AutoGen enables diverse LLM-based applications using multi-agent conversations."

DIR_TEST_STRINGS = [
    "# Index of ",
    "[.. (parent directory)]",
    "/test/browser_utils/test_requests_markdown_browser.py",
]

LOCAL_FILE_TEST_STRINGS = [
    BLOG_POST_STRING,
    BLOG_POST_FIND_ON_PAGE_MATCH,
]

try:
    from autogen.browser_utils import BingMarkdownSearch, RequestsMarkdownBrowser
except ImportError:
    skip_all = True
else:
    skip_all = False


def _rm_folder(path):
    """Remove all the regular files in a folder, then deletes the folder. Assumes a flat file structure, with no subdirectories."""
    for fname in os.listdir(path):
        fpath = os.path.join(path, fname)
        if os.path.isfile(fpath):
            os.unlink(fpath)
    os.rmdir(path)


@pytest.mark.skipif(
    skip_all,
    reason="do not run if dependency is not installed",
)
def test_requests_markdown_browser():
    # Create a downloads folder (removing any leftover ones from prior tests)
    downloads_folder = os.path.join(os.getcwd(), "downloads")
    if os.path.isdir(downloads_folder):
        _rm_folder(downloads_folder)
    os.mkdir(downloads_folder)

    # Instantiate the browser
    viewport_size = 1024
    browser = RequestsMarkdownBrowser(
        viewport_size=viewport_size,
        downloads_folder=downloads_folder,
        search_engine=BingMarkdownSearch(),
    )

    # Test that we can visit a page and find what we expect there
    top_viewport = browser.visit_page(BLOG_POST_URL)
    assert browser.viewport == top_viewport
    assert browser.page_title.strip() == BLOG_POST_TITLE.strip()
    assert BLOG_POST_STRING in browser.page_content

    # Check if page splitting works
    approx_pages = math.ceil(len(browser.page_content) / viewport_size)  # May be fewer, since it aligns to word breaks
    assert len(browser.viewport_pages) <= approx_pages
    assert abs(len(browser.viewport_pages) - approx_pages) <= 1  # allow only a small deviation
    assert browser.viewport_pages[0][0] == 0
    assert browser.viewport_pages[-1][1] == len(browser.page_content)

    # Make sure we can reconstruct the full contents from the split pages
    buffer = ""
    for bounds in browser.viewport_pages:
        buffer += browser.page_content[bounds[0] : bounds[1]]
    assert buffer == browser.page_content

    # Test scrolling (scroll all the way to the bottom)
    for i in range(1, len(browser.viewport_pages)):
        browser.page_down()
        assert browser.viewport_current_page == i
    # Test scrolling beyond the limits
    for i in range(0, 5):
        browser.page_down()
    assert browser.viewport_current_page == len(browser.viewport_pages) - 1

    # Test scrolling back up (scroll all the way to the top)
    for i in range(len(browser.viewport_pages) - 2, 0, -1):
        browser.page_up()
        assert browser.viewport_current_page == i
    # Test scrolling beyond the limits
    for i in range(0, 5):
        browser.page_up()
    assert browser.viewport_current_page == 0

    # Test Wikipedia handling
    assert WIKIPEDIA_STRING in browser.visit_page(WIKIPEDIA_URL)
    assert WIKIPEDIA_TITLE.strip() == browser.page_title.strip()

    # Visit a plain-text file
    # response = requests.get(PLAIN_TEXT_URL)
    # response.raise_for_status()
    # expected_results = re.sub(r"\s+", " ", response.text, re.DOTALL).strip()
    # browser.visit_page(PLAIN_TEXT_URL)
    # assert re.sub(r"\s+", " ", browser.page_content, re.DOTALL).strip() == expected_results

    # Directly download a ZIP file and compute its md5
    response = requests.get(DOWNLOAD_URL, stream=True)
    response.raise_for_status()
    expected_md5 = hashlib.md5(response.raw.read()).hexdigest()

    # Download it with the browser and check for a match
    viewport = browser.visit_page(DOWNLOAD_URL)
    m = re.search(r"Saved file to '(.*?)'", viewport)
    download_loc = m.group(1)
    with open(download_loc, "rb") as fh:
        downloaded_md5 = hashlib.md5(fh.read()).hexdigest()

    # MD5s should match
    assert expected_md5 == downloaded_md5

    # Fetch a PDF
    viewport = browser.visit_page(PDF_URL)
    assert PDF_STRING in viewport

    # Test find in page
    browser.visit_page(BLOG_POST_URL)
    find_viewport = browser.find_on_page(BLOG_POST_FIND_ON_PAGE_QUERY)
    assert find_viewport is not None
    assert BLOG_POST_FIND_ON_PAGE_MATCH in find_viewport
    loc = browser.viewport_current_page

    find_viewport = browser.find_on_page("LLM app*")
    assert find_viewport is not None

    # Find next using the same query
    for i in range(0, 10):
        find_viewport = browser.find_on_page("LLM app*")
        assert find_viewport is not None
        new_loc = browser.viewport_current_page
        assert new_loc != loc
        loc = new_loc

    # Find next using find_next
    for i in range(0, 10):
        find_viewport = browser.find_next()
        assert find_viewport is not None
        new_loc = browser.viewport_current_page
        assert new_loc != loc
        loc = new_loc

    # Bounce around
    browser.viewport_current_page = 0
    find_viewport = browser.find_on_page("For Further Reading")
    assert find_viewport is not None
    loc = browser.viewport_current_page
    browser.page_up()
    assert browser.viewport_current_page != loc
    find_viewport = browser.find_on_page("For Further Reading")
    assert find_viewport is not None
    assert loc == browser.viewport_current_page

    # Find something that doesn't exist
    find_viewport = browser.find_on_page("7c748f9a-8dce-461f-a092-4e8d29913f2d")
    assert find_viewport is None
    assert loc == browser.viewport_current_page  # We didn't move

    # Clean up
    _rm_folder(downloads_folder)


@pytest.mark.skipif(
    skip_all,
    reason="do not run if dependency is not installed",
)
def test_local_file_browsing():
    directory = os.path.dirname(__file__)
    test_file = os.path.join(directory, "test_files", "test_blog.html")
    browser = RequestsMarkdownBrowser()

    # Directory listing via open_local_file
    viewport = browser.open_local_file(directory)
    for target_string in DIR_TEST_STRINGS:
        assert target_string in viewport

    # Directory listing via file URI
    viewport = browser.visit_page(pathlib.Path(os.path.abspath(directory)).as_uri())
    for target_string in DIR_TEST_STRINGS:
        assert target_string in viewport

    # File access via open_local_file
    browser.open_local_file(test_file)
    for target_string in LOCAL_FILE_TEST_STRINGS:
        assert target_string in browser.page_content

    # File access via file URI
    browser.visit_page(pathlib.Path(os.path.abspath(test_file)).as_uri())
    for target_string in LOCAL_FILE_TEST_STRINGS:
        assert target_string in browser.page_content


if __name__ == "__main__":
    """Runs this file's tests from the command line."""
    test_requests_markdown_browser()
    test_local_file_browsing()
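The paging assertions in test_requests_markdown_browser pin down a useful invariant: viewport_pages is a list of (start, end) character offsets that tile page_content exactly, with boundaries nudged to word breaks where possible. A minimal sketch of a splitter with those properties, purely as an illustration (not the library's actual implementation):

from typing import List, Tuple


def split_viewports(text: str, viewport_size: int) -> List[Tuple[int, int]]:
    """Split text into (start, end) offsets of at most viewport_size characters,
    preferring to break at whitespace. Consecutive pages tile the text exactly,
    so concatenating the slices reconstructs the original content."""
    pages = []
    start = 0
    while start < len(text):
        end = min(start + viewport_size, len(text))
        if end < len(text):
            # Back up to the last space so a word is not split across pages,
            # unless doing so would produce an empty page.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        pages.append((start, end))
        start = end
    return pages or [(0, 0)]

Under this scheme the first bound starts at 0, the last ends at len(text), and the reconstruction check in the test holds by construction.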

View File

@@ -1,10 +1,9 @@
#!/usr/bin/env python3 -m pytest
import hashlib
import math
import os
import re
import sys
import tempfile
import pytest
import requests
@@ -57,96 +56,91 @@ def _rm_folder(path):
reason="do not run if dependency is not installed",
)
def test_simple_text_browser():
# Create a downloads folder (removing any leftover ones from prior tests)
downloads_folder = os.path.join(KEY_LOC, "downloads")
if os.path.isdir(downloads_folder):
_rm_folder(downloads_folder)
os.mkdir(downloads_folder)
# Create a temp downloads folder (removing any leftover ones from prior tests)
with tempfile.TemporaryDirectory() as downloads_folder:
# Instantiate the browser
user_agent = "python-requests/" + requests.__version__
viewport_size = 1024
browser = SimpleTextBrowser(
downloads_folder=downloads_folder,
viewport_size=viewport_size,
request_kwargs={
"headers": {"User-Agent": user_agent},
},
)
# Instantiate the browser
user_agent = "python-requests/" + requests.__version__
viewport_size = 1024
browser = SimpleTextBrowser(
downloads_folder=downloads_folder,
viewport_size=viewport_size,
request_kwargs={
"headers": {"User-Agent": user_agent},
},
)
# Test that we can visit a page and find what we expect there
top_viewport = browser.visit_page(BLOG_POST_URL)
assert browser.viewport == top_viewport
assert browser.page_title.strip() == BLOG_POST_TITLE.strip()
assert BLOG_POST_STRING in browser.page_content
# Test that we can visit a page and find what we expect there
top_viewport = browser.visit_page(BLOG_POST_URL)
assert browser.viewport == top_viewport
assert browser.page_title.strip() == BLOG_POST_TITLE.strip()
assert BLOG_POST_STRING in browser.page_content.replace("\n\n", " ").replace("\\", "")
# Check if page splitting works
approx_pages = math.ceil(
len(browser.page_content) / viewport_size
) # May be fewer, since it aligns to word breaks
assert len(browser.viewport_pages) <= approx_pages
assert abs(len(browser.viewport_pages) - approx_pages) <= 1 # allow only a small deviation
assert browser.viewport_pages[0][0] == 0
assert browser.viewport_pages[-1][1] == len(browser.page_content)
# Check if page splitting works
approx_pages = math.ceil(len(browser.page_content) / viewport_size) # May be fewer, since it aligns to word breaks
assert len(browser.viewport_pages) <= approx_pages
assert abs(len(browser.viewport_pages) - approx_pages) <= 1 # allow only a small deviation
assert browser.viewport_pages[0][0] == 0
assert browser.viewport_pages[-1][1] == len(browser.page_content)
# Make sure we can reconstruct the full contents from the split pages
buffer = ""
for bounds in browser.viewport_pages:
buffer += browser.page_content[bounds[0] : bounds[1]]
assert buffer == browser.page_content
# Make sure we can reconstruct the full contents from the split pages
buffer = ""
for bounds in browser.viewport_pages:
buffer += browser.page_content[bounds[0] : bounds[1]]
assert buffer == browser.page_content
# Test scrolling (scroll all the way to the bottom)
for i in range(1, len(browser.viewport_pages)):
browser.page_down()
assert browser.viewport_current_page == i
# Test scrolling beyond the limits
for i in range(0, 5):
browser.page_down()
assert browser.viewport_current_page == len(browser.viewport_pages) - 1
# Test scrolling (scroll all the way to the bottom)
for i in range(1, len(browser.viewport_pages)):
browser.page_down()
assert browser.viewport_current_page == i
# Test scrolling beyond the limits
for i in range(0, 5):
browser.page_down()
assert browser.viewport_current_page == len(browser.viewport_pages) - 1
# Test scrolling back up (scroll all the way to the top)
for i in range(len(browser.viewport_pages) - 2, 0, -1):
browser.page_up()
assert browser.viewport_current_page == i
# Test scrolling beyond the limits
for i in range(0, 5):
browser.page_up()
assert browser.viewport_current_page == 0
# Test scrolling back up (scroll all the way to the top)
for i in range(len(browser.viewport_pages) - 2, 0, -1):
browser.page_up()
assert browser.viewport_current_page == i
# Test scrolling beyond the limits
for i in range(0, 5):
browser.page_up()
assert browser.viewport_current_page == 0
# Test Wikipedia handling
assert WIKIPEDIA_STRING in browser.visit_page(WIKIPEDIA_URL)
assert WIKIPEDIA_TITLE.strip() == browser.page_title.strip()
# Test Wikipedia handling
assert WIKIPEDIA_STRING in browser.visit_page(WIKIPEDIA_URL)
assert WIKIPEDIA_TITLE.strip() == browser.page_title.strip()
# Visit a plain-text file
response = requests.get(PLAIN_TEXT_URL)
response.raise_for_status()
expected_results = response.text
# Visit a plain-text file
response = requests.get(PLAIN_TEXT_URL)
response.raise_for_status()
expected_results = response.text
browser.visit_page(PLAIN_TEXT_URL)
assert browser.page_content.strip() == expected_results.strip()
browser.visit_page(PLAIN_TEXT_URL)
assert browser.page_content.strip() == expected_results.strip()
# Directly download an image, and compute its md5
response = requests.get(IMAGE_URL, stream=True)
response.raise_for_status()
expected_md5 = hashlib.md5(response.raw.read()).hexdigest()
# Directly download an image, and compute its md5
response = requests.get(IMAGE_URL, stream=True)
response.raise_for_status()
expected_md5 = hashlib.md5(response.raw.read()).hexdigest()
# Visit an image causing it to be downloaded by the SimpleTextBrowser, then compute its md5
viewport = browser.visit_page(IMAGE_URL)
m = re.search(r"Downloaded '(.*?)' to '(.*?)'", viewport)
fetched_url = m.group(1)
download_loc = m.group(2)
assert fetched_url == IMAGE_URL
# Visit an image causing it to be downloaded by the SimpleTextBrowser, then compute its md5
viewport = browser.visit_page(IMAGE_URL)
m = re.search(r"Downloaded '(.*?)' to '(.*?)'", viewport)
fetched_url = m.group(1)
download_loc = m.group(2)
assert fetched_url == IMAGE_URL
with open(download_loc, "rb") as fh:
downloaded_md5 = hashlib.md5(fh.read()).hexdigest()
with open(download_loc, "rb") as fh:
downloaded_md5 = hashlib.md5(fh.read()).hexdigest()
# MD5s should match
assert expected_md5 == downloaded_md5
# MD5s should match
assert expected_md5 == downloaded_md5
# Fetch a PDF
viewport = browser.visit_page(PDF_URL)
assert PDF_STRING in viewport
# Clean up
_rm_folder(downloads_folder)
# Fetch a PDF
viewport = browser.visit_page(PDF_URL)
assert PDF_STRING in viewport
@pytest.mark.skipif(