improve max_valid_n and doc (#933)

* improve max_valid_n and doc

* Update README.md

Co-authored-by: Li Jiang <lijiang1@microsoft.com>

* newline at end of file

* doc

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Susan Xueqing Liu <liususan091219@users.noreply.github.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Chi Wang 2023-03-05 08:40:57 -08:00 committed by GitHub
parent 97928609ba
commit 1ec77b58b4
9 changed files with 1780 additions and 1752 deletions

View File

@@ -14,20 +14,22 @@
<br>
</p>
:fire: An [upcoming tutorial on FLAML](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) at [AAAI-23](https://aaai.org/Conferences/AAAI-23/aaai23tutorials/) (to be held on Feb 08, 2023)
:fire: OpenAI GPT-3 models support in v1.1.3. ChatGPT support is coming.
:fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.
:fire: A [hands-on tutorial](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) on FLAML presented at KDD 2022
## What is FLAML
FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting
learners and hyperparameters for each learner. It can also be used to tune generic hyperparameters for MLOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.
models and hyperparameters for each model. It can also be used to tune generic hyperparameters for large language models (LLM), MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.
1. For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It supports both classifcal machine learning models and deep neural networks.
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including large language models such as the OpenAI GPT-3 models.
1. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
1. It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
and learner selection method invented by Microsoft Research.
and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).
FLAML has a .NET implementation in [ML.NET](http://dot.net/ml), an open-source, cross-platform machine learning framework for .NET. In ML.NET, you can use FLAML via low-code solutions like [Model Builder](https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet/model-builder) Visual Studio extension and the cross-platform [ML.NET CLI](https://docs.microsoft.com/dotnet/machine-learning/automate-training-with-cli). Alternatively, you can use the [ML.NET AutoML API](https://www.nuget.org/packages/Microsoft.ML.AutoML/#versions-body-tab) for a code-first experience.
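
For readers landing on this diff, the README's "minimal customization (computational resource budget)" claim maps to a two-argument call. A minimal sketch (the iris dataset and the 10-second budget are illustrative, not part of this commit):

```python
from flaml import AutoML
from sklearn.datasets import load_iris

# Toy dataset; any sklearn-style (X, y) works.
X, y = load_iris(return_X_y=True)

automl = AutoML()
# "Minimal customization": pick a task and a time budget (seconds);
# FLAML selects the models and hyperparameters within that budget.
automl.fit(X_train=X, y_train=y, task="classification", time_budget=10)
print(automl.best_estimator, automl.best_config)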

View File

@@ -207,11 +207,11 @@ def metric_loss_score(
except ImportError:
raise ValueError(
metric_name
+ " is not an built-in sklearn metric and nlp is not installed. "
+ " is not an built-in sklearn metric and [hf] is not installed. "
"Currently built-in sklearn metrics are: "
"r2, rmse, mae, mse, accuracy, roc_auc, roc_auc_ovr, roc_auc_ovo,"
"log_loss, mape, f1, micro_f1, macro_f1, ap. "
"If the metric is an nlp metric, please pip install flaml[nlp] ",
"If the metric is a huggingface metric, please pip install flaml[hf] ",
"or pass a customized metric function to AutoML.fit(metric=func)",
)
# If the metric is not found from huggingface dataset metric list (i.e., FileNotFoundError)
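
The revised message points users to `AutoML.fit(metric=func)` as the fallback when a metric is neither a built-in sklearn metric nor a huggingface one. A sketch of such a function, following the signature in FLAML's documentation (the log-loss body is illustrative):

```python
from sklearn.metrics import log_loss

def custom_metric(X_val, y_val, estimator, labels,
                  X_train, y_train, weight_val=None, weight_train=None, *args):
    # FLAML minimizes the first return value; the dict is logged alongside it.
    y_pred = estimator.predict_proba(X_val)
    val_loss = log_loss(y_val, y_pred, labels=labels, sample_weight=weight_val)
    return val_loss, {"val_loss": val_loss}

# Usage: automl.fit(X_train=X, y_train=y, task="classification", metric=custom_metric)
```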

View File

@@ -179,6 +179,7 @@ class Completion:
"""
cost = 0
data = cls.data
data_length = len(data)
target_n_tokens = (
1000 * cls.inference_budget / cls.price1K[config["model"]]
if cls.inference_budget and cls.price1K.get(config["model"])
@@ -187,26 +188,33 @@ class Completion:
prune_hp = cls._prune_hp
metric = cls._metric
config_n = config[prune_hp]
max_tokens = config["max_tokens"]
max_tokens = config.get("max_tokens", 16) # default value in OpenAI is 16
region_key = cls._get_region_key(config)
prompt = cls._prompts[config["prompt"]]
stop = cls._stops and cls._stops[config["stop"]]
if prune and target_n_tokens:
max_valid_n = cls._get_max_valid_n(region_key, max_tokens)
min_invalid_n = cls._get_min_invalid_n(region_key, max_tokens)
if min_invalid_n is not None and config_n >= min_invalid_n:
if config_n > max_valid_n:
if cls.avg_input_tokens:
# max_tokens bounds the maximum tokens
# so using it we can calculate a valid n according to the avg # input tokens
max_valid_n = max(
max_valid_n,
int((target_n_tokens - cls.avg_input_tokens) // max_tokens),
)
else:
input_tokens = [None] * data_length
if config_n <= max_valid_n:
start_n = config_n
else:
min_invalid_n = cls._get_min_invalid_n(region_key, max_tokens)
if min_invalid_n is not None and config_n >= min_invalid_n:
# prune this config
return {
"inference_cost": np.inf,
metric: np.inf if cls._mode == "min" else -np.inf,
"cost": cost,
}
# since config_n<=max_valid_n, there is a chance config_n is valid
start_n = config_n
else:
# start from a valid n
start_n = min(max_valid_n, config_n)
start_n = max_valid_n + 1
else:
start_n = config_n
params = config.copy()
@@ -214,7 +222,6 @@ class Completion:
temperature_or_top_p = params.pop("temperature_or_top_p", None)
if temperature_or_top_p:
params.update(temperature_or_top_p)
data_length = len(data)
num_completions, previous_num_completions = start_n, 0
n_tokens_list, result, responses_list = [], {}, []
while True: # n <= config_n
@@ -242,6 +249,14 @@
if previous_num_completions
else response["usage"]["total_tokens"]
)
if (
prune
and target_n_tokens
and not cls.avg_input_tokens
and not input_tokens[i]
):
# store the # input tokens
input_tokens[i] = response["usage"]["prompt_tokens"]
# Under Assumption 1, we should count both the input and output tokens in the first query,
# and only count ouput tokens afterwards
query_cost = (
@@ -335,6 +350,8 @@ class Completion:
result["inference_cost"] = (
avg_n_tokens * cls.price1K[config["model"]] / 1000
)
if prune and target_n_tokens and not cls.avg_input_tokens:
cls.avg_input_tokens = np.mean(input_tokens)
break
else:
if data_early_stop:
@@ -424,6 +441,7 @@ class Completion:
cls._total_cost = 0 # total optimization cost
cls._eval_func = eval_func
cls.data = data
cls.avg_input_tokens = None
search_alg = BlendSearch(
cost_attr="cost",
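
The change above is the `max_valid_n` improvement named in the commit title: once the average prompt length `avg_input_tokens` has been observed from responses, and since `max_tokens` caps each completion, a valid `n` follows directly from the per-request token budget `target_n_tokens`. A standalone sketch of that bound (hypothetical helper name, not FLAML API):

```python
# A request is affordable when avg_input_tokens + n * max_tokens <= target_n_tokens,
# where target_n_tokens = 1000 * inference_budget / price-per-1K-tokens.
def max_affordable_n(inference_budget: float, price1k: float,
                     max_tokens: int, avg_input_tokens: float) -> int:
    target_n_tokens = 1000 * inference_budget / price1k
    return int((target_n_tokens - avg_input_tokens) // max_tokens)

# Example: a $0.02/request budget at $0.02 per 1K tokens buys 1000 tokens;
# with ~200 prompt tokens and max_tokens=97, n can safely go up to 8.
print(max_affordable_n(0.02, 0.02, 97, 200))  # -> 8
```

Configs whose `n` exceeds this bound start probing from `max_valid_n + 1` instead of being evaluated blindly, and configs at or above a known-invalid `n` are pruned with infinite loss.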

File diff suppressed because one or more lines are too long

View File

@@ -30,10 +30,10 @@
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:52.317406Z",
"iopub.status.busy": "2023-02-13T23:40:52.316561Z",
"iopub.status.idle": "2023-02-13T23:40:52.321193Z",
"shell.execute_reply": "2023-02-13T23:40:52.320628Z"
"iopub.execute_input": "2023-02-24T23:25:36.910966Z",
"iopub.status.busy": "2023-02-24T23:25:36.910473Z",
"iopub.status.idle": "2023-02-24T23:25:36.914554Z",
"shell.execute_reply": "2023-02-24T23:25:36.914030Z"
}
},
"outputs": [],
@@ -54,10 +54,10 @@
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:52.324240Z",
"iopub.status.busy": "2023-02-13T23:40:52.323783Z",
"iopub.status.idle": "2023-02-13T23:40:52.330570Z",
"shell.execute_reply": "2023-02-13T23:40:52.329750Z"
"iopub.execute_input": "2023-02-24T23:25:36.917301Z",
"iopub.status.busy": "2023-02-24T23:25:36.917011Z",
"iopub.status.idle": "2023-02-24T23:25:36.923156Z",
"shell.execute_reply": "2023-02-24T23:25:36.922619Z"
}
},
"outputs": [],
@@ -81,10 +81,10 @@
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:52.333547Z",
"iopub.status.busy": "2023-02-13T23:40:52.333249Z",
"iopub.status.idle": "2023-02-13T23:40:52.336508Z",
"shell.execute_reply": "2023-02-13T23:40:52.335858Z"
"iopub.execute_input": "2023-02-24T23:25:36.925804Z",
"iopub.status.busy": "2023-02-24T23:25:36.925423Z",
"iopub.status.idle": "2023-02-24T23:25:36.928191Z",
"shell.execute_reply": "2023-02-24T23:25:36.927673Z"
}
},
"outputs": [],
@@ -109,10 +109,10 @@
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:52.339977Z",
"iopub.status.busy": "2023-02-13T23:40:52.339556Z",
"iopub.status.idle": "2023-02-13T23:40:54.603349Z",
"shell.execute_reply": "2023-02-13T23:40:54.602630Z"
"iopub.execute_input": "2023-02-24T23:25:36.931255Z",
"iopub.status.busy": "2023-02-24T23:25:36.930838Z",
"iopub.status.idle": "2023-02-24T23:25:39.148799Z",
"shell.execute_reply": "2023-02-24T23:25:39.148113Z"
}
},
"outputs": [
@@ -126,7 +126,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "454146d0f7224f038689031002906e6f",
"model_id": "35cd066a31b242bb87b2c106ee72e5f2",
"version_major": 2,
"version_minor": 0
},
@@ -186,10 +186,10 @@
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:54.607152Z",
"iopub.status.busy": "2023-02-13T23:40:54.606441Z",
"iopub.status.idle": "2023-02-13T23:40:54.610504Z",
"shell.execute_reply": "2023-02-13T23:40:54.609759Z"
"iopub.execute_input": "2023-02-24T23:25:39.152156Z",
"iopub.status.busy": "2023-02-24T23:25:39.151531Z",
"iopub.status.idle": "2023-02-24T23:25:39.155313Z",
"shell.execute_reply": "2023-02-24T23:25:39.154731Z"
},
"slideshow": {
"slide_type": "subslide"
@@ -238,10 +238,10 @@
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:54.613590Z",
"iopub.status.busy": "2023-02-13T23:40:54.613168Z",
"iopub.status.idle": "2023-02-13T23:40:54.616873Z",
"shell.execute_reply": "2023-02-13T23:40:54.616193Z"
"iopub.execute_input": "2023-02-24T23:25:39.158398Z",
"iopub.status.busy": "2023-02-24T23:25:39.157766Z",
"iopub.status.idle": "2023-02-24T23:25:39.161396Z",
"shell.execute_reply": "2023-02-24T23:25:39.160797Z"
}
},
"outputs": [
@@ -287,10 +287,10 @@
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:54.619618Z",
"iopub.status.busy": "2023-02-13T23:40:54.619218Z",
"iopub.status.idle": "2023-02-13T23:40:54.624272Z",
"shell.execute_reply": "2023-02-13T23:40:54.623664Z"
"iopub.execute_input": "2023-02-24T23:25:39.164187Z",
"iopub.status.busy": "2023-02-24T23:25:39.163867Z",
"iopub.status.idle": "2023-02-24T23:25:39.169009Z",
"shell.execute_reply": "2023-02-24T23:25:39.168427Z"
}
},
"outputs": [],
@@ -337,10 +337,10 @@
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:54.626998Z",
"iopub.status.busy": "2023-02-13T23:40:54.626593Z",
"iopub.status.idle": "2023-02-13T23:40:54.631383Z",
"shell.execute_reply": "2023-02-13T23:40:54.630770Z"
"iopub.execute_input": "2023-02-24T23:25:39.171752Z",
"iopub.status.busy": "2023-02-24T23:25:39.171347Z",
"iopub.status.idle": "2023-02-24T23:25:39.176343Z",
"shell.execute_reply": "2023-02-24T23:25:39.175510Z"
}
},
"outputs": [],
@@ -391,10 +391,10 @@
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:54.634335Z",
"iopub.status.busy": "2023-02-13T23:40:54.633929Z",
"iopub.status.idle": "2023-02-13T23:40:56.105700Z",
"shell.execute_reply": "2023-02-13T23:40:56.105085Z"
"iopub.execute_input": "2023-02-24T23:25:39.179030Z",
"iopub.status.busy": "2023-02-24T23:25:39.178624Z",
"iopub.status.idle": "2023-02-24T23:25:40.584410Z",
"shell.execute_reply": "2023-02-24T23:25:40.583802Z"
},
"slideshow": {
"slide_type": "slide"
@@ -418,10 +418,10 @@
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:56.109177Z",
"iopub.status.busy": "2023-02-13T23:40:56.108624Z",
"iopub.status.idle": "2023-02-13T23:40:56.112651Z",
"shell.execute_reply": "2023-02-13T23:40:56.112076Z"
"iopub.execute_input": "2023-02-24T23:25:40.587815Z",
"iopub.status.busy": "2023-02-24T23:25:40.587283Z",
"iopub.status.idle": "2023-02-24T23:25:40.590826Z",
"shell.execute_reply": "2023-02-24T23:25:40.590158Z"
},
"slideshow": {
"slide_type": "slide"
@@ -483,10 +483,10 @@
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:40:56.115383Z",
"iopub.status.busy": "2023-02-13T23:40:56.114975Z",
"iopub.status.idle": "2023-02-13T23:41:55.045654Z",
"shell.execute_reply": "2023-02-13T23:41:55.044973Z"
"iopub.execute_input": "2023-02-24T23:25:40.593603Z",
"iopub.status.busy": "2023-02-24T23:25:40.593269Z",
"iopub.status.idle": "2023-02-24T23:26:38.349191Z",
"shell.execute_reply": "2023-02-24T23:26:38.348392Z"
}
},
"outputs": [
@@ -494,119 +494,119 @@
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m[I 2023-02-13 23:40:56,159]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
"\u001b[32m[I 2023-02-24 23:25:40,643]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m[I 2023-02-13 23:40:56,161]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
"\u001b[32m[I 2023-02-24 23:25:40,646]\u001b[0m A new study created in memory with name: optuna\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:40:56] {806} INFO - trial 1 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:25:40] {811} INFO - trial 1 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:40:59] {215} INFO - result: {'expected_success': 0.6, 'success': 0.6, 'total_cost': 0.4624999999999999, 'cost': 0.4624999999999999, 'inference_cost': 0.023125, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 3.7016141414642334}\n"
"[flaml.tune.tune: 02-24 23:25:44] {215} INFO - result: {'expected_success': 0.6, 'success': 0.6, 'total_cost': 0.4624999999999999, 'cost': 0.4624999999999999, 'inference_cost': 0.023125, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 3.687161445617676}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:40:59] {806} INFO - trial 2 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:25:44] {811} INFO - trial 2 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:00] {215} INFO - result: {'expected_success': 0.35, 'success': 0.35, 'total_cost': 0.5671159999999997, 'cost': 0.104616, 'inference_cost': 0.0052308, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.673302412033081}\n"
"[flaml.tune.tune: 02-24 23:25:45] {215} INFO - result: {'expected_success': 0.35, 'success': 0.35, 'total_cost': 0.5671159999999997, 'cost': 0.104616, 'inference_cost': 0.0052308, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.36865945026811975}, 'max_tokens': 347, 'n': 1, 'prompt': 1, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.36865945026811975}, 'config/max_tokens': 347, 'config/n': 1, 'config/prompt': 1, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.6666913032531738}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:00] {806} INFO - trial 3 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:25:45] {811} INFO - trial 3 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:17] {215} INFO - result: {'expected_success': 0.5080706992649381, 'success': 0.55, 'total_cost': 1.1848999999999996, 'cost': 0.617784, 'inference_cost': 0.0287676, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.4985070123025904}, 'config/max_tokens': 97, 'config/n': 20, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 16.56331181526184}\n"
"[flaml.tune.tune: 02-24 23:26:01] {215} INFO - result: {'expected_success': 0.5080706992649381, 'success': 0.55, 'total_cost': 1.1424679999999998, 'cost': 0.575352, 'inference_cost': 0.0287676, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.4985070123025904}, 'max_tokens': 97, 'n': 20, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.4985070123025904}, 'config/max_tokens': 97, 'config/n': 20, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 16.66586470603943}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:17] {806} INFO - trial 4 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:26:01] {811} INFO - trial 4 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:51] {215} INFO - result: {'expected_success': 0.6186627404336135, 'success': 0.65, 'total_cost': 2.4239719999999987, 'cost': 1.2390720000000002, 'inference_cost': 0.059620799999999995, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6125260668293881}, 'config/max_tokens': 433, 'config/n': 29, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 34.57707595825195}\n"
"[flaml.tune.tune: 02-24 23:26:38] {215} INFO - result: {'expected_success': 0.6186627404336135, 'success': 0.65, 'total_cost': 2.3693479999999987, 'cost': 1.2268800000000002, 'inference_cost': 0.059620799999999995, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6125260668293881}, 'config/max_tokens': 433, 'config/n': 29, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 36.605130434036255}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:51] {806} INFO - trial 5 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:26:38] {811} INFO - trial 5 config: {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:51] {215} INFO - result: {'expected_success': 0, 'total_cost': 2.6356719999999987, 'cost': 0.2117, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.6177669784693172}, 'config/max_tokens': 231, 'config/n': 65, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0022132396697998047}\n"
"[flaml.tune.tune: 02-24 23:26:38] {215} INFO - result: {'expected_success': 0, 'total_cost': 2.5295479999999984, 'cost': 0.1602, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'temperature_or_top_p': {'temperature': 0.6177669784693172}, 'max_tokens': 231, 'n': 65, 'prompt': 3, 'stop': 0}, 'config/model': 'code-davinci-002', 'config/temperature_or_top_p': {'temperature': 0.6177669784693172}, 'config/max_tokens': 231, 'config/n': 65, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.0020499229431152344}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:51] {806} INFO - trial 6 config: {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}\n"
"[flaml.tune.tune: 02-24 23:26:38] {811} INFO - trial 6 config: {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:54] {215} INFO - result: {'expected_success': 0, 'total_cost': 3.003171999999999, 'cost': 0.3675, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}, 'config/model': 'code-davinci-002', 'config/max_tokens': 263, 'config/n': 41, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'top_p': 0.49834557213253655}, 'experiment_tag': 'exp', 'time_total_s': 3.3002660274505615}\n"
"[flaml.tune.tune: 02-24 23:26:38] {215} INFO - result: {'expected_success': 0, 'total_cost': 2.8578479999999984, 'cost': 0.32830000000000004, 'training_iteration': 0, 'config': {'model': 'code-davinci-002', 'max_tokens': 263, 'n': 41, 'prompt': 0, 'stop': 0, 'temperature_or_top_p': {'top_p': 0.49834557213253655}}, 'config/model': 'code-davinci-002', 'config/max_tokens': 263, 'config/n': 41, 'config/prompt': 0, 'config/stop': 0, 'config/temperature_or_top_p': {'top_p': 0.49834557213253655}, 'experiment_tag': 'exp', 'time_total_s': 0.002808809280395508}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:55] {806} INFO - trial 7 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}\n"
"[flaml.tune.tune: 02-24 23:26:38] {811} INFO - trial 7 config: {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:55] {215} INFO - result: {'expected_success': 0, 'total_cost': 4.046379999999999, 'cost': 1.043208, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.8286813263076767}, 'config/max_tokens': 57, 'config/n': 63, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.007852792739868164}\n"
"[flaml.tune.tune: 02-24 23:26:38] {215} INFO - result: {'expected_success': 0, 'total_cost': 4.028831999999999, 'cost': 1.170984, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'temperature': 0.8286813263076767}, 'max_tokens': 57, 'n': 63, 'prompt': 3, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'temperature': 0.8286813263076767}, 'config/max_tokens': 57, 'config/n': 63, 'config/prompt': 3, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 0.015198230743408203}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[flaml.tune.tune: 02-13 23:41:55] {827} WARNING - fail to sample a trial for 100 times in a row, stopping.\n"
"[flaml.tune.tune: 02-24 23:26:38] {834} WARNING - fail to sample a trial for 100 times in a row, stopping.\n"
]
}
],
@@ -656,10 +656,10 @@
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:41:55.049204Z",
"iopub.status.busy": "2023-02-13T23:41:55.048871Z",
"iopub.status.idle": "2023-02-13T23:41:55.053284Z",
"shell.execute_reply": "2023-02-13T23:41:55.052574Z"
"iopub.execute_input": "2023-02-24T23:26:38.352710Z",
"iopub.status.busy": "2023-02-24T23:26:38.352378Z",
"iopub.status.idle": "2023-02-24T23:26:38.356939Z",
"shell.execute_reply": "2023-02-24T23:26:38.356217Z"
}
},
"outputs": [
@@ -668,7 +668,7 @@
"output_type": "stream",
"text": [
"optimized config {'model': 'code-cushman-001', 'max_tokens': 433, 'n': 29, 'prompt': '{prompt}', 'stop': ['\\nclass', '\\ndef', '\\nif', '\\nprint'], 'top_p': 0.6125260668293881}\n",
"best result on tuning data {'expected_success': 0.6186627404336135, 'success': 0.65, 'total_cost': 2.4239719999999987, 'cost': 1.2390720000000002, 'inference_cost': 0.059620799999999995, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6125260668293881}, 'config/max_tokens': 433, 'config/n': 29, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 34.57707595825195}\n"
"best result on tuning data {'expected_success': 0.6186627404336135, 'success': 0.65, 'total_cost': 2.3693479999999987, 'cost': 1.2268800000000002, 'inference_cost': 0.059620799999999995, 'training_iteration': 0, 'config': {'model': 'code-cushman-001', 'temperature_or_top_p': {'top_p': 0.6125260668293881}, 'max_tokens': 433, 'n': 29, 'prompt': 0, 'stop': 0}, 'config/model': 'code-cushman-001', 'config/temperature_or_top_p': {'top_p': 0.6125260668293881}, 'config/max_tokens': 433, 'config/n': 29, 'config/prompt': 0, 'config/stop': 0, 'experiment_tag': 'exp', 'time_total_s': 36.605130434036255}\n"
]
}
],
@@ -696,10 +696,10 @@
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:41:55.056205Z",
"iopub.status.busy": "2023-02-13T23:41:55.055631Z",
"iopub.status.idle": "2023-02-13T23:41:56.039259Z",
"shell.execute_reply": "2023-02-13T23:41:56.038427Z"
"iopub.execute_input": "2023-02-24T23:26:38.359902Z",
"iopub.status.busy": "2023-02-24T23:26:38.359506Z",
"iopub.status.idle": "2023-02-24T23:26:39.343921Z",
"shell.execute_reply": "2023-02-24T23:26:39.343051Z"
},
"slideshow": {
"slide_type": "subslide"
@@ -921,7 +921,7 @@
"source": [
"### Evaluate the success rate on the test data\n",
"\n",
"You can use flaml's `oai.Completion.eval` to evaluate the performance of an entire dataset with the tuned config. To do that you need to set `oai.Completion.data` to the data to evaluate. The following code will take a while to evaluate all the 144 test data instances. Compared to the baseline success rate (0.46) on the [HELM benchmark](https://crfm.stanford.edu/helm/latest/?group=code_humaneval), the tuned config has a success rate of 0.68. It can be further improved if the inference budget and optimization budget are further increased."
"You can use flaml's `oai.Completion.eval` to evaluate the performance of an entire dataset with the tuned config. To do that you need to set `oai.Completion.data` to the data to evaluate. The following code will take a while to evaluate all the 144 test data instances. Compared to the baseline success rate (46%) on the [HELM benchmark](https://crfm.stanford.edu/helm/latest/?group=code_humaneval), the tuned config has a success rate of 68%. It can be further improved if the inference budget and optimization budget are further increased."
]
},
{
@@ -929,10 +929,10 @@
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2023-02-13T23:41:56.042764Z",
"iopub.status.busy": "2023-02-13T23:41:56.042086Z",
"iopub.status.idle": "2023-02-13T23:53:05.597643Z",
"shell.execute_reply": "2023-02-13T23:53:05.596603Z"
"iopub.execute_input": "2023-02-24T23:26:39.347295Z",
"iopub.status.busy": "2023-02-24T23:26:39.346994Z",
"iopub.status.idle": "2023-02-24T23:29:27.160335Z",
"shell.execute_reply": "2023-02-24T23:29:27.159519Z"
}
},
"outputs": [
@@ -940,7 +940,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'expected_success': 0.6364503360372493, 'success': 0.6805555555555556, 'total_cost': 12.227739999999997, 'cost': 8.181360000000003, 'inference_cost': 0.056815}\n"
"{'expected_success': 0.6364503360372493, 'success': 0.6805555555555556, 'total_cost': 12.210191999999997, 'cost': 8.181360000000003, 'inference_cost': 0.056815}\n"
]
}
],
@@ -977,60 +977,25 @@
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {
"2d910cfd2d2a4fc49fc30fbbdc5576a7": {
"model_module": "@jupyter-widgets/base",
"24dd93300e0442788ee6cc1310e5bf14": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"454146d0f7224f038689031002906e6f": {
"35cd066a31b242bb87b2c106ee72e5f2": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HBoxModel",
@@ -1045,95 +1010,34 @@
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_e4ae2b6f5a974fd4bafb6abb9d12ff26",
"IPY_MODEL_577e1e3cc4db4942b0883577b3b52755",
"IPY_MODEL_b40bdfb1ac1d4cffb7cefcb870c64d45"
"IPY_MODEL_8e7ee7687a99410d88a98a74ecfcea99",
"IPY_MODEL_421e02a11a974b40b3ddb75382b3b640",
"IPY_MODEL_77db9797e78b49438d21c5c8da34b4cb"
],
"layout": "IPY_MODEL_dc83c7bff2f241309537a8119dfc7555",
"layout": "IPY_MODEL_47d3046236a54b0e8f9ae455a82c7e0b",
"tabbable": null,
"tooltip": null
}
},
"577e1e3cc4db4942b0883577b3b52755": {
"3d5d106a38954af2bb3bde5777702f4e": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "FloatProgressModel",
"model_name": "HTMLStyleModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_2d910cfd2d2a4fc49fc30fbbdc5576a7",
"max": 1,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_74a6ba0c3cbc4051be0a83e152fe1e62",
"tabbable": null,
"tooltip": null,
"value": 1
}
},
"6086462a12d54bafa59d3c4566f06cb2": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"74a6ba0c3cbc4051be0a83e152fe1e62": {
"3e1ebb31412443b0bca86a301cbdac11": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "ProgressStyleModel",
@@ -1149,66 +1053,33 @@
"description_width": ""
}
},
"7d3f3d9e15894d05a4d188ff4f466554": {
"421e02a11a974b40b3ddb75382b3b640": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"b40bdfb1ac1d4cffb7cefcb870c64d45": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLModel",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLModel",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "HTMLView",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_f1355871cc6f4dd4b50d9df5af20e5c8",
"placeholder": "",
"style": "IPY_MODEL_ca245376fd9f4354af6b2befe4af4466",
"layout": "IPY_MODEL_e6398d4027c9459a97965b9d91ae484f",
"max": 1,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_3e1ebb31412443b0bca86a301cbdac11",
"tabbable": null,
"tooltip": null,
"value": " 1/1 [00:00&lt;00:00, 44.69it/s]"
"value": 1
}
},
"ca245376fd9f4354af6b2befe4af4466": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "StyleView",
"background": null,
"description_width": "",
"font_size": null,
"text_color": null
}
},
"dc83c7bff2f241309537a8119dfc7555": {
"47d3046236a54b0e8f9ae455a82c7e0b": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
@@ -1261,7 +1132,60 @@
"width": null
}
},
"e4ae2b6f5a974fd4bafb6abb9d12ff26": {
"754800f7feb04acea977696e4787d1ff": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"77db9797e78b49438d21c5c8da34b4cb": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLModel",
@@ -1276,15 +1200,91 @@
"_view_name": "HTMLView",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_6086462a12d54bafa59d3c4566f06cb2",
"layout": "IPY_MODEL_7b6c4e1c11e249409a1edcd63be450d8",
"placeholder": "",
"style": "IPY_MODEL_7d3f3d9e15894d05a4d188ff4f466554",
"style": "IPY_MODEL_3d5d106a38954af2bb3bde5777702f4e",
"tabbable": null,
"tooltip": null,
"value": " 1/1 [00:00&lt;00:00, 44.40it/s]"
}
},
"7b6c4e1c11e249409a1edcd63be450d8": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "2.0.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "2.0.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border_bottom": null,
"border_left": null,
"border_right": null,
"border_top": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"8e7ee7687a99410d88a98a74ecfcea99": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "2.0.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "2.0.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "2.0.0",
"_view_name": "HTMLView",
"description": "",
"description_allow_html": false,
"layout": "IPY_MODEL_754800f7feb04acea977696e4787d1ff",
"placeholder": "",
"style": "IPY_MODEL_24dd93300e0442788ee6cc1310e5bf14",
"tabbable": null,
"tooltip": null,
"value": "100%"
}
},
"f1355871cc6f4dd4b50d9df5af20e5c8": {
"e6398d4027c9459a97965b9d91ae484f": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "2.0.0",
"model_name": "LayoutModel",

View File

@@ -92,7 +92,14 @@ setuptools.setup(
"vw": [
"vowpalwabbit>=8.10.0, <9.0.0",
],
"nlp": [
"hf": [
"transformers[torch]==4.26",
"datasets",
"nltk",
"rouge_score",
"seqeval",
],
"nlp": [ # for backward compatibility; hf is the new option name
"transformers[torch]==4.26",
"datasets",
"nltk",

View File

@@ -2,9 +2,9 @@
### Requirements
This example requires GPU. Install the [nlp] option:
This example requires GPU. Install the [hf] option:
```python
pip install "flaml[nlp]"
pip install "flaml[hf]"
```
### A simple sequence classification example

View File

@@ -3,17 +3,17 @@
<!-- ### Welcome to FLAML, a Fast Library for Automated Machine Learning & Tuning! -->
FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner.
learning models automatically, efficiently and economically. It frees users from selecting models and hyperparameters for each model.
### Main Features
1. For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks.
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including large language models such as the OpenAI GPT-3 models.
2. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
3. It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
and learner selection method invented by Microsoft Research.
and model selection method invented by Microsoft Research, and many followup [research studies](Research).
### Quickstart
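
Point 3 above describes FLAML's tuning engine; as a concrete reference for what a budgeted `flaml.tune` run looks like (toy objective and search space, not taken from the docs being edited):

```python
from flaml import tune

def evaluate(config):
    # Toy objective: a quadratic bowl; real use cases train/evaluate a model here.
    loss = (config["x"] - 3) ** 2 + (config["y"] + 1) ** 2
    return {"loss": loss}

analysis = tune.run(
    evaluate,
    config={"x": tune.uniform(-10, 10), "y": tune.uniform(-10, 10)},
    metric="loss",
    mode="min",
    time_budget_s=5,   # economical: stop after a wall-clock budget
    num_samples=-1,    # unlimited trials within the budget
)
print(analysis.best_config)
```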

View File

@@ -24,8 +24,11 @@ install flaml with the [notebook] option:
pip install flaml[notebook]
```
#### Extra learners
#### Extra learners/models
* openai models
```bash
pip install flaml[openai]
```
* catboost
```bash
pip install flaml[catboost]
@@ -38,10 +41,9 @@
```bash
pip install flaml[forecast]
```
* natural language processing: transformers
* huggingface transformers
```bash
pip install flaml[nlp]
pip install flaml[hf]
```
#### Distributed tuning