{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "Copyright (c) 2021. All rights reserved.\n",
    "\n",
    "Contributed by: @bnriiitb\n",
    "\n",
    "Licensed under the MIT License."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Using AutoML in Sklearn Pipeline\n",
    "\n",
    "This tutorial shows how FLAML's AutoML can be used as a step in a scikit-learn pipeline, where it acts as the final estimator."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "## 1. Introduction\n",
    "\n",
    "### 1.1 FLAML - Fast and Lightweight AutoML\n",
    "\n",
    "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models at low computational cost. It is fast and cheap. Its simple, lightweight design makes it easy to use and extend, for example by adding new learners.\n",
    "\n",
    "FLAML can\n",
    "- serve as an economical AutoML engine,\n",
    "- be used as a fast hyperparameter tuning tool, or\n",
    "- be embedded in self-tuning software that requires low latency and low resource usage in repetitive\n",
    "   tuning tasks.\n",
    "\n",
    "In this notebook, we use one real-data example (binary classification) to showcase how to use the FLAML library.\n",
    "\n",
    "FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
    "```bash\n",
    "pip install flaml[notebook]\n",
    "```"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 1.2 Why are pipelines a silver bullet?\n",
    "\n",
    "In a typical machine learning workflow, every transformation has to be applied at least twice:\n",
    "1. During training\n",
    "2. During inference\n",
    "\n",
    "Scikit-learn pipelines provide an easy-to-use interface for automating ML workflows by allowing several transformers to be chained together.\n",
    "\n",
    "The key benefits of using pipelines:\n",
    "* Make ML workflows highly readable, enabling fast development and easy review\n",
    "* Help to build sequential and parallel processes\n",
    "* Allow hyperparameter tuning across the estimators\n",
    "* Make it easier to share and collaborate with multiple users (bug fixes, enhancements, etc.)\n",
    "* Enforce the implementation and order of steps"
   ],
   "metadata": {}
  },
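  {
   "cell_type": "markdown",
   "source": [
    "The train/inference point above can be sketched with a small, self-contained example. This is only an illustration under stated assumptions: the synthetic data, the step names and the `LogisticRegression` classifier are hypothetical stand-ins and are not part of the FLAML example that follows.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Synthetic data (hypothetical, for illustration only).\n",
    "rng = np.random.default_rng(0)\n",
    "X_demo = rng.normal(size=(200, 3))\n",
    "y_demo = (X_demo[:, 0] > 0).astype(int)\n",
    "X_demo[::10, 1] = np.nan  # inject missing values for the imputer\n",
    "\n",
    "demo_pipeline = Pipeline([\n",
    "    ('imputer', SimpleImputer()),\n",
    "    ('scaler', StandardScaler()),\n",
    "    ('clf', LogisticRegression()),\n",
    "])\n",
    "\n",
    "# Training: each transformer is fitted and applied, then the classifier is fitted.\n",
    "demo_pipeline.fit(X_demo[:150], y_demo[:150])\n",
    "\n",
    "# Inference: the already-fitted transformers are re-applied automatically.\n",
    "demo_pred = demo_pipeline.predict(X_demo[150:])\n",
    "print(demo_pred.shape)\n",
    "```\n",
    "\n",
    "Because the imputer and scaler live inside the pipeline, the same preprocessing is applied at training and at inference without repeating it by hand."
   ],
   "metadata": {}
  },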
  {
   "cell_type": "markdown",
   "source": [
    "#### Because FLAML's AutoML can be used as a step in a scikit-learn pipeline, we get all the benefits of pipelines and can write clean, reusable code."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "source": [
    "!pip install flaml[notebook];"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 2. Classification Example\n",
    "### Load data and preprocess\n",
    "\n",
    "Download the [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given information about its scheduled departure."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "from flaml.data import load_openml_dataset\n",
    "X_train, X_test, y_train, y_test = load_openml_dataset(\n",
    "    dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "load dataset from ./openml_ds1169.pkl\n",
      "Dataset name: airlines\n",
      "X_train.shape: (404537, 7), y_train.shape: (404537,);\n",
      "X_test.shape: (134846, 7), y_test.shape: (134846,)\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "X_train[0]"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "array([  12., 2648.,    4.,   15.,    4.,  450.,   67.], dtype=float32)"
      ]
     },
     "metadata": {},
     "execution_count": 5
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 3. Create a Pipeline"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "import sklearn\n",
    "from sklearn import set_config\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.impute import SimpleImputer\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from flaml import AutoML\n",
    "\n",
    "set_config(display='diagram')\n",
    "\n",
    "imputer = SimpleImputer()\n",
    "standardizer = StandardScaler()\n",
    "automl = AutoML()\n",
    "\n",
    "automl_pipeline = Pipeline([\n",
    "    (\"imputuer\",imputer),\n",
    "    (\"standardizer\", standardizer),\n",
    "    (\"automl\", automl)\n",
    "])\n",
    "automl_pipeline"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Pipeline(steps=[('imputuer', SimpleImputer()),\n",
       "                ('standardizer', StandardScaler()),\n",
       "                ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
      ],
      "text/html": [
       "<style>div.sk-top-container {color: black;background-color: white;}div.sk-toggleable {background-color: white;}label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.2em 0.3em;box-sizing: border-box;text-align: center;}div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}div.sk-estimator {font-family: monospace;background-color: #f0f8ff;margin: 0.25em 0.25em;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;}div.sk-estimator:hover {background-color: #d4ebff;}div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;}div.sk-item {z-index: 1;}div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}div.sk-parallel-item:last-child::after {align-self: flex-start;width: 
50%;}div.sk-parallel-item:only-child::after {width: 0;}div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0.2em;box-sizing: border-box;padding-bottom: 0.1em;background-color: white;position: relative;}div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}div.sk-label-container {position: relative;z-index: 2;text-align: center;}div.sk-container {display: inline-block;position: relative;}</style><div class=\"sk-top-container\"><div class=\"sk-container\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b91d1bdf-ccb8-4fa5-a2d0-67a3538c0afc\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('imputuer', SimpleImputer()),\n",
       "                ('standardizer', StandardScaler()),\n",
       "                ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"a8311733-9e55-4c0c-9c2a-6b9ba6227596\">SimpleImputer</label><div class=\"sk-toggleable__content\"><pre>SimpleImputer()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"52580e54-89ab-4fb7-83a1-ae13962854bb\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"52580e54-89ab-4fb7-83a1-ae13962854bb\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b9fe5397-bf24-491d-a938-c39a780e1ac0\">AutoML</label><div class=\"sk-toggleable__content\"><pre><flaml.automl.AutoML object at 0x7f046d56fb50></pre></div></div></div></div></div></div></div>"
      ]
     },
     "metadata": {},
     "execution_count": 6
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Run FLAML\n",
    "In the FLAML AutoML run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy, and so on. All of these arguments have default values, which are used when the user does not provide them. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "settings = {\n",
    "    \"time_budget\": 60,  # total running time in seconds\n",
    "    \"metric\": 'accuracy',  # primary metrics can be chosen from: ['accuracy','roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1','log_loss','mae','mse','r2']\n",
    "    \"task\": 'classification',  # task type\n",
    "    \"estimator_list\": ['xgboost', 'catboost', 'lgbm'],  # learners to search over\n",
    "    \"log_file_name\": 'airlines_experiment.log',  # flaml log file\n",
    "}"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "# Keyword arguments prefixed with 'automl__' are passed to the fit method of the 'automl' step\n",
    "automl_pipeline.fit(X_train, y_train,\n",
    "                    automl__time_budget=settings['time_budget'],\n",
    "                    automl__metric=settings['metric'],\n",
    "                    automl__estimator_list=settings['estimator_list'],\n",
    "                    automl__log_training_metric=True)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "[flaml.automl: 08-22 21:32:13] {1130} INFO - Evaluation method: holdout\n",
      "[flaml.automl: 08-22 21:32:14] {624} INFO - Using StratifiedKFold\n",
      "[flaml.automl: 08-22 21:32:14] {1155} INFO - Minimizing error metric: 1-accuracy\n",
      "[flaml.automl: 08-22 21:32:14] {1175} INFO - List of ML learners in AutoML Run: ['xgboost', 'catboost', 'lgbm']\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 0, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.5s,\tbest xgboost's error=0.3755,\tbest xgboost's error=0.3755\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 1, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.6s,\tbest xgboost's error=0.3755,\tbest xgboost's error=0.3755\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 2, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.6s,\tbest xgboost's error=0.3755,\tbest xgboost's error=0.3755\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 3, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.7s,\tbest xgboost's error=0.3755,\tbest xgboost's error=0.3755\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 4, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.7s,\tbest xgboost's error=0.3679,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 5, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.8s,\tbest lgbm's error=0.3811,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 6, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.8s,\tbest xgboost's error=0.3679,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 7, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 0.9s,\tbest xgboost's error=0.3679,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 8, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:14] {1515} INFO -  at 1.0s,\tbest xgboost's error=0.3679,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:14] {1358} INFO - iteration 9, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.1s,\tbest lgbm's error=0.3811,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 10, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.1s,\tbest lgbm's error=0.3755,\tbest xgboost's error=0.3679\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 11, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.2s,\tbest xgboost's error=0.3637,\tbest xgboost's error=0.3637\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 12, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.4s,\tbest xgboost's error=0.3594,\tbest xgboost's error=0.3594\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 13, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.5s,\tbest xgboost's error=0.3594,\tbest xgboost's error=0.3594\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 14, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.7s,\tbest xgboost's error=0.3591,\tbest xgboost's error=0.3591\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 15, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 1.7s,\tbest lgbm's error=0.3647,\tbest xgboost's error=0.3591\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 16, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:15] {1515} INFO -  at 2.0s,\tbest xgboost's error=0.3585,\tbest xgboost's error=0.3585\n",
      "[flaml.automl: 08-22 21:32:15] {1358} INFO - iteration 17, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.0s,\tbest lgbm's error=0.3647,\tbest xgboost's error=0.3585\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 18, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.1s,\tbest lgbm's error=0.3629,\tbest xgboost's error=0.3585\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 19, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.3s,\tbest xgboost's error=0.3553,\tbest xgboost's error=0.3553\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 20, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.6s,\tbest xgboost's error=0.3553,\tbest xgboost's error=0.3553\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 21, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.7s,\tbest xgboost's error=0.3553,\tbest xgboost's error=0.3553\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 22, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.8s,\tbest lgbm's error=0.3629,\tbest xgboost's error=0.3553\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 23, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:16] {1515} INFO -  at 2.9s,\tbest lgbm's error=0.3629,\tbest xgboost's error=0.3553\n",
      "[flaml.automl: 08-22 21:32:16] {1358} INFO - iteration 24, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:17] {1515} INFO -  at 3.1s,\tbest xgboost's error=0.3520,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:17] {1358} INFO - iteration 25, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:17] {1515} INFO -  at 3.3s,\tbest xgboost's error=0.3520,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:17] {1358} INFO - iteration 26, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:17] {1515} INFO -  at 3.4s,\tbest lgbm's error=0.3573,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:17] {1358} INFO - iteration 27, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:17] {1515} INFO -  at 3.5s,\tbest lgbm's error=0.3573,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:17] {1358} INFO - iteration 28, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:17] {1515} INFO -  at 3.9s,\tbest xgboost's error=0.3520,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:17] {1358} INFO - iteration 29, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:18] {1515} INFO -  at 4.1s,\tbest xgboost's error=0.3520,\tbest xgboost's error=0.3520\n",
      "[flaml.automl: 08-22 21:32:18] {1358} INFO - iteration 30, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:18] {1515} INFO -  at 4.8s,\tbest xgboost's error=0.3485,\tbest xgboost's error=0.3485\n",
      "[flaml.automl: 08-22 21:32:18] {1358} INFO - iteration 31, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:19] {1515} INFO -  at 5.2s,\tbest lgbm's error=0.3573,\tbest xgboost's error=0.3485\n",
      "[flaml.automl: 08-22 21:32:19] {1358} INFO - iteration 32, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:19] {1515} INFO -  at 5.7s,\tbest xgboost's error=0.3485,\tbest xgboost's error=0.3485\n",
      "[flaml.automl: 08-22 21:32:19] {1358} INFO - iteration 33, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:20] {1515} INFO -  at 6.6s,\tbest xgboost's error=0.3485,\tbest xgboost's error=0.3485\n",
      "[flaml.automl: 08-22 21:32:20] {1358} INFO - iteration 34, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:20] {1515} INFO -  at 6.9s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:20] {1358} INFO - iteration 35, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:21] {1515} INFO -  at 7.2s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:21] {1358} INFO - iteration 36, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:21] {1515} INFO -  at 7.4s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:21] {1358} INFO - iteration 37, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:22] {1515} INFO -  at 8.2s,\tbest xgboost's error=0.3485,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:22] {1358} INFO - iteration 38, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:22] {1515} INFO -  at 8.5s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:22] {1358} INFO - iteration 39, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:22] {1515} INFO -  at 8.8s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:22] {1358} INFO - iteration 40, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:23] {1515} INFO -  at 9.7s,\tbest xgboost's error=0.3485,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:23] {1358} INFO - iteration 41, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:25] {1515} INFO -  at 11.7s,\tbest lgbm's error=0.3481,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:25] {1358} INFO - iteration 42, current learner catboost\n",
      "[flaml.automl: 08-22 21:32:26] {1515} INFO -  at 12.2s,\tbest catboost's error=0.3647,\tbest lgbm's error=0.3481\n",
      "[flaml.automl: 08-22 21:32:26] {1358} INFO - iteration 43, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:28] {1515} INFO -  at 14.4s,\tbest lgbm's error=0.3427,\tbest lgbm's error=0.3427\n",
      "[flaml.automl: 08-22 21:32:28] {1358} INFO - iteration 44, current learner catboost\n",
      "[flaml.automl: 08-22 21:32:28] {1515} INFO -  at 14.6s,\tbest catboost's error=0.3647,\tbest lgbm's error=0.3427\n",
      "[flaml.automl: 08-22 21:32:28] {1358} INFO - iteration 45, current learner catboost\n",
      "[flaml.automl: 08-22 21:32:28] {1515} INFO -  at 14.8s,\tbest catboost's error=0.3601,\tbest lgbm's error=0.3427\n",
      "[flaml.automl: 08-22 21:32:28] {1358} INFO - iteration 46, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:30] {1515} INFO -  at 16.9s,\tbest lgbm's error=0.3427,\tbest lgbm's error=0.3427\n",
      "[flaml.automl: 08-22 21:32:30] {1358} INFO - iteration 47, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:34] {1515} INFO -  at 21.0s,\tbest xgboost's error=0.3332,\tbest xgboost's error=0.3332\n",
      "[flaml.automl: 08-22 21:32:34] {1358} INFO - iteration 48, current learner catboost\n",
      "[flaml.automl: 08-22 21:32:35] {1515} INFO -  at 21.1s,\tbest catboost's error=0.3601,\tbest xgboost's error=0.3332\n",
      "[flaml.automl: 08-22 21:32:35] {1358} INFO - iteration 49, current learner lgbm\n",
      "[flaml.automl: 08-22 21:32:37] {1515} INFO -  at 23.2s,\tbest lgbm's error=0.3409,\tbest xgboost's error=0.3332\n",
      "[flaml.automl: 08-22 21:32:37] {1358} INFO - iteration 50, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:38] {1515} INFO -  at 24.6s,\tbest xgboost's error=0.3332,\tbest xgboost's error=0.3332\n",
      "[flaml.automl: 08-22 21:32:38] {1358} INFO - iteration 51, current learner xgboost\n",
      "[flaml.automl: 08-22 21:32:53] {1515} INFO -  at 40.0s,\tbest xgboost's error=0.3279,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:32:53] {1358} INFO - iteration 52, current learner xgboost\n",
      "[flaml.automl: 08-22 21:33:01] {1515} INFO -  at 47.6s,\tbest xgboost's error=0.3279,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:01] {1358} INFO - iteration 53, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:01] {1515} INFO -  at 47.7s,\tbest catboost's error=0.3601,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:01] {1358} INFO - iteration 54, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:02] {1515} INFO -  at 48.2s,\tbest catboost's error=0.3601,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:02] {1358} INFO - iteration 55, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:02] {1515} INFO -  at 48.5s,\tbest catboost's error=0.3552,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:02] {1358} INFO - iteration 56, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:02] {1515} INFO -  at 48.7s,\tbest catboost's error=0.3552,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:02] {1358} INFO - iteration 57, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:02] {1515} INFO -  at 49.0s,\tbest catboost's error=0.3552,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:02] {1358} INFO - iteration 58, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:03] {1515} INFO -  at 49.1s,\tbest catboost's error=0.3552,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:03] {1358} INFO - iteration 59, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:03] {1515} INFO -  at 49.4s,\tbest catboost's error=0.3552,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:03] {1358} INFO - iteration 60, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:06] {1515} INFO -  at 52.2s,\tbest catboost's error=0.3453,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:06] {1358} INFO - iteration 61, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:07] {1515} INFO -  at 53.9s,\tbest catboost's error=0.3453,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:07] {1358} INFO - iteration 62, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:09] {1515} INFO -  at 55.3s,\tbest catboost's error=0.3453,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:09] {1358} INFO - iteration 63, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:10] {1515} INFO -  at 56.4s,\tbest catboost's error=0.3453,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:10] {1358} INFO - iteration 64, current learner catboost\n",
      "[flaml.automl: 08-22 21:33:11] {1515} INFO -  at 57.5s,\tbest catboost's error=0.3453,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:11] {1358} INFO - iteration 65, current learner lgbm\n",
      "[flaml.automl: 08-22 21:33:13] {1515} INFO -  at 59.8s,\tbest lgbm's error=0.3409,\tbest xgboost's error=0.3279\n",
      "[flaml.automl: 08-22 21:33:13] {1592} INFO - selected model: XGBClassifier(base_score=0.5, booster='gbtree',\n",
      "              colsample_bylevel=0.810466508891351, colsample_bynode=1,\n",
      "              colsample_bytree=0.8005378817953572, gamma=0, gpu_id=-1,\n",
      "              grow_policy='lossguide', importance_type='gain',\n",
      "              interaction_constraints='', learning_rate=0.06234183309508761,\n",
      "              max_delta_step=0, max_depth=0, max_leaves=1797,\n",
      "              min_child_weight=0.07275175679381725, missing=nan,\n",
      "              monotone_constraints='()', n_estimators=63, n_jobs=-1,\n",
      "              num_parallel_tree=1, random_state=0, reg_alpha=0.5768305704485758,\n",
      "              reg_lambda=6.867180836557797, scale_pos_weight=1,\n",
      "              subsample=0.9814772488195874, tree_method='hist',\n",
      "              use_label_encoder=False, validate_parameters=1, verbosity=0)\n",
      "[flaml.automl: 08-22 21:33:26] {1633} INFO - retrain xgboost for 13.0s\n",
      "[flaml.automl: 08-22 21:33:26] {1636} INFO - retrained model: XGBClassifier(base_score=0.5, booster='gbtree',\n",
      "              colsample_bylevel=0.810466508891351, colsample_bynode=1,\n",
      "              colsample_bytree=0.8005378817953572, gamma=0, gpu_id=-1,\n",
      "              grow_policy='lossguide', importance_type='gain',\n",
      "              interaction_constraints='', learning_rate=0.06234183309508761,\n",
      "              max_delta_step=0, max_depth=0, max_leaves=1797,\n",
      "              min_child_weight=0.07275175679381725, missing=nan,\n",
      "              monotone_constraints='()', n_estimators=63, n_jobs=-1,\n",
      "              num_parallel_tree=1, random_state=0, reg_alpha=0.5768305704485758,\n",
      "              reg_lambda=6.867180836557797, scale_pos_weight=1,\n",
      "              subsample=0.9814772488195874, tree_method='hist',\n",
      "              use_label_encoder=False, validate_parameters=1, verbosity=0)\n",
      "[flaml.automl: 08-22 21:33:26] {1199} INFO - fit succeeded\n",
      "[flaml.automl: 08-22 21:33:26] {1200} INFO - Time taken to find the best model: 40.023393869400024\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "Pipeline(steps=[('imputuer', SimpleImputer()),\n",
       "                ('standardizer', StandardScaler()),\n",
       "                ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])"
      ],
      "text/html": [
       "<style>div.sk-top-container {color: black;background-color: white;}div.sk-toggleable {background-color: white;}label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.2em 0.3em;box-sizing: border-box;text-align: center;}div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}div.sk-estimator {font-family: monospace;background-color: #f0f8ff;margin: 0.25em 0.25em;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;}div.sk-estimator:hover {background-color: #d4ebff;}div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;}div.sk-item {z-index: 1;}div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: white;}div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}div.sk-parallel-item:last-child::after {align-self: flex-start;width: 
50%;}div.sk-parallel-item:only-child::after {width: 0;}div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0.2em;box-sizing: border-box;padding-bottom: 0.1em;background-color: white;position: relative;}div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}div.sk-label-container {position: relative;z-index: 2;text-align: center;}div.sk-container {display: inline-block;position: relative;}</style><div class=\"sk-top-container\"><div class=\"sk-container\"><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b994edf1-5e76-4cd3-b719-4a204af673dc\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"b994edf1-5e76-4cd3-b719-4a204af673dc\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('imputuer', SimpleImputer()),\n",
       "                ('standardizer', StandardScaler()),\n",
       "                ('automl', <flaml.automl.AutoML object at 0x7f046d56fb50>)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"c94ee64a-d8b1-4cbb-aeca-952bf6963c13\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"c94ee64a-d8b1-4cbb-aeca-952bf6963c13\">SimpleImputer</label><div class=\"sk-toggleable__content\"><pre>SimpleImputer()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"6a28d11a-19e2-4243-8b85-e3ba5f6f2a7e\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"6a28d11a-19e2-4243-8b85-e3ba5f6f2a7e\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"03dcbe59-a8be-4f09-a944-115d90939f81\" type=\"checkbox\" ><label class=\"sk-toggleable__label\" for=\"03dcbe59-a8be-4f09-a944-115d90939f81\">AutoML</label><div class=\"sk-toggleable__content\"><pre><flaml.automl.AutoML object at 0x7f046d56fb50></pre></div></div></div></div></div></div></div>"
      ]
     },
     "metadata": {},
     "execution_count": 8
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "# Get the fitted AutoML object (the 'automl' step) from the pipeline\n",
    "automl = automl_pipeline.named_steps['automl']\n",
    "\n",
    "# Report the best learner, its config, validation accuracy, and training time\n",
    "print('Best ML learner:', automl.best_estimator)\n",
    "print('Best hyperparameter config:', automl.best_config)\n",
    "print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
    "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Best ML learner: xgboost\n",
      "Best hyperparameter config: {'n_estimators': 63, 'max_leaves': 1797, 'min_child_weight': 0.07275175679381725, 'learning_rate': 0.06234183309508761, 'subsample': 0.9814772488195874, 'colsample_bylevel': 0.810466508891351, 'colsample_bytree': 0.8005378817953572, 'reg_alpha': 0.5768305704485758, 'reg_lambda': 6.867180836557797, 'FLAML_sample_size': 364083}\n",
      "Best accuracy on validation data: 0.6721\n",
      "Training duration of best run: 15.45 s\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "source": [
    "automl.model"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "<flaml.model.XGBoostSklearnEstimator at 0x7f03a5eada00>"
      ]
     },
     "metadata": {},
     "execution_count": 10
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 4. Persist the model binary file"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "# Persist the fitted AutoML object as a pickle file\n",
    "import pickle\n",
    "with open('automl.pkl', 'wb') as f:\n",
    "    pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "source": [
    "# Perform inference on the test dataset\n",
    "y_pred = automl_pipeline.predict(X_test)\n",
    "print('Predicted labels', y_pred)\n",
    "print('True labels', y_test)\n",
    "y_pred_proba = automl_pipeline.predict_proba(X_test)[:,1]\n",
    "print('Predicted probas ',y_pred_proba[:5])"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Predicted labels [0 1 1 ... 0 1 0]\n",
      "True labels [0 0 0 ... 1 0 1]\n",
      "Predicted probas  [0.3764987  0.6126277  0.699604   0.27359942 0.25294745]\n"
     ]
    }
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.0 64-bit ('blend': conda)"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  },
  "interpreter": {
   "hash": "0cfea3304185a9579d09e0953576b57c8581e46e6ebc6dfeb681bc5a511f7544"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}