Li Jiang da2cd7ca89
Add supporting using Spark as the backend of parallel training (#846)
* Added spark support for parallel training.

* Added tests and fixed a bug

* Added more tests and updated docs

* Updated setup.py and docs

* Added customize_learner and tests

* Update spark tests and setup.py

* Update docs and verbose

* Update logging, fix issue in cloud notebook

* Update github workflow for spark tests

* Update github workflow

* Remove hack of handling _choice_

* Allow for failures

* Fix tests, update docs

* Update setup.py

* Update Dockerfile for Spark

* Update tests, remove some warnings

* Add test for notebooks, update utils

* Add performance test for Spark

* Fix lru_cache maxsize

* Fix test failures on some platforms

* Fix coverage report failure

* resovle PR comments

* resovle PR comments 2nd round

* resovle PR comments 3rd round

* fix lint and rename test class

* resovle PR comments 4th round

* refactor customize_learner to broadcast_code
2022-12-23 08:18:49 -08:00
.devcontainer install editable package in codespace (#826) 2022-11-27 14:22:54 -05:00
.github Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
docs Finish the Multiple Choice Classification (#367) 2022-01-02 20:12:34 -05:00
flaml Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
notebook Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
test Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
website Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
.coveragerc code coverage (#79) 2021-04-26 20:04:57 -04:00
.flake8 v0.1.0 2020-12-04 09:40:27 -08:00
.gitignore adding evaluation (#495) 2022-03-25 17:00:08 -04:00
.pre-commit-config.yaml Updated hooks (#609) 2022-06-25 13:46:28 -04:00
CITATION.cff citation file (#364) 2022-01-04 15:13:14 -08:00
CODE_OF_CONDUCT.md v0.1.0 2020-12-04 09:40:27 -08:00
Dockerfile Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00
LICENSE add NOTICE file (#91) 2021-05-24 14:35:08 -04:00
NOTICE.md Finish the Multiple Choice Classification (#367) 2022-01-02 20:12:34 -05:00
README.md Update .NET documentation links (#847) 2022-12-13 22:56:45 -05:00
SECURITY.md Finish the Multiple Choice Classification (#367) 2022-01-02 20:12:34 -05:00
pytest.ini Finish the Multiple Choice Classification (#367) 2022-01-02 20:12:34 -05:00
setup.py Add supporting using Spark as the backend of parallel training (#846) 2022-12-23 08:18:49 -08:00

README.md

Join the chat at https://gitter.im/FLAMLer/community

A Fast Library for Automated Machine Learning & Tuning


🔥 An upcoming tutorial on FLAML at AAAI-23 (to be held on Feb 08, 2023)

🔥 A hands-on tutorial on FLAML presented at KDD 2022

What is FLAML

FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. It can also be used to tune generic hyperparameters for MLOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.

  1. For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks.
  2. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
  3. It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research.

FLAML has a .NET implementation in ML.NET, an open-source, cross-platform machine learning framework for .NET. In ML.NET, you can use FLAML via low-code solutions like the Model Builder Visual Studio extension and the cross-platform ML.NET CLI. Alternatively, you can use the ML.NET AutoML API for a code-first experience.

Installation

Python

FLAML requires Python version >= 3.7. It can be installed with pip:

pip install flaml

To run the notebook examples, install flaml with the [notebook] option:

pip install flaml[notebook]
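
As a quick sanity check after installation, the sketch below uses only the standard library to confirm that the interpreter meets the Python >= 3.7 requirement stated above and that a `flaml` package can be found. The helper name `check_environment` is hypothetical and is not part of FLAML.

```python
import importlib.util
import sys

# Minimum Python version stated in the installation notes above.
MIN_PYTHON = (3, 7)


def check_environment(package="flaml"):
    """Report whether the interpreter is new enough and the package is importable.

    Hypothetical helper for illustration; it is not provided by FLAML.
    """
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when a top-level package cannot be located.
    installed = importlib.util.find_spec(package) is not None
    return {"python_ok": python_ok, "installed": installed}


print(check_environment())
```

If `installed` is reported as false, the `pip install flaml` command above has most likely been run in a different environment from the one executing the check.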

.NET

To get started with FLAML in .NET, see the ML.NET guides for Model Builder, the ML.NET CLI and the AutoML API.

Quickstart

  • With three lines of code, you can use FLAML as a scikit-learn style AutoML estimator.
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
  • You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc. or a customized learner.
automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
  • You can also run generic hyperparameter tuning for a custom function.
from flaml import tune
tune.run(evaluation_function, config={}, low_cost_partial_config={}, time_budget_s=3600)
  • Zero-shot AutoML allows using the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task.
from flaml.default import LGBMRegressor

# Use LGBMRegressor in the same way as you use lightgbm.LGBMRegressor.
estimator = LGBMRegressor()
# The hyperparameters are automatically set according to the training data.
estimator.fit(X_train, y_train)
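
The `tune.run` call in the Quickstart leaves `evaluation_function` and the two config dictionaries unspecified. The following is a minimal sketch of how they might be filled in, not the project's reference example. The toy objective, the `"score"` metric name, the search-space bounds, and the choice of `x = 1` as the low-cost starting point are all assumptions made for illustration. `flaml` is imported inside `run_tuning` so that the objective can be exercised on its own.

```python
def evaluation_function(config):
    """Toy objective (illustrative only): squared distance from the point (10, 5)."""
    x, y = config["x"], config["y"]
    return {"score": (x - 10) ** 2 + (y - 5) ** 2}


def run_tuning(time_budget_s=10):
    """Tune the toy objective with flaml.tune and return the best config found."""
    # Imported lazily so evaluation_function can be tested without FLAML installed.
    from flaml import tune

    # Assumed search space: two integer hyperparameters with arbitrary bounds.
    search_space = {
        "x": tune.randint(lower=1, upper=100),
        "y": tune.randint(lower=1, upper=100),
    }
    analysis = tune.run(
        evaluation_function,
        config=search_space,
        metric="score",  # key in the dict returned by evaluation_function
        mode="min",  # lower score is better
        low_cost_partial_config={"x": 1},  # assumed cheap starting value for x
        time_budget_s=time_budget_s,
        num_samples=-1,  # no limit on trial count; stop on the time budget
    )
    return analysis.best_config
```

Calling `run_tuning()` requires FLAML to be installed. In a real workload, `low_cost_partial_config` would name the hyperparameter values that make a trial cheapest to evaluate, such as a small number of trees.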

Documentation

FLAML's detailed documentation covers the API, use cases and examples.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.