A Fast Library for Automated Machine Learning & Tuning

🔥 OpenAI GPT-3 models support in v1.1.3. ChatGPT support is coming.

🔥 A lab forum on FLAML at AAAI 2023.

🔥 A hands-on tutorial on FLAML presented at KDD 2022.

What is FLAML

FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting models and hyperparameters for each model. It can also be used to tune generic hyperparameters for large language models (LLM), MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.

For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including large language models such as the OpenAI GPT-3 models.
It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, cost-effective hyperparameter optimization and model selection method invented by Microsoft Research, and many follow-up research studies.

FLAML has a .NET implementation in ML.NET, an open-source, cross-platform machine learning framework for .NET. In ML.NET, you can use FLAML via low-code solutions like the Model Builder Visual Studio extension and the cross-platform ML.NET CLI. Alternatively, you can use the ML.NET AutoML API for a code-first experience.

Installation

Python

FLAML requires Python version >= 3.7. It can be installed from pip:

pip install flaml

To run the notebook examples, install flaml with the [notebook] option:

pip install "flaml[notebook]"

.NET

Use the following guides to get started with FLAML in .NET:

Quickstart

With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.

from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")

You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest, etc., or for a customized learner.

automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
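The two calls above assume that X_train and y_train already exist. A minimal end-to-end sketch might look like the following. The Iris dataset from scikit-learn, the 10-second time budget, the accuracy metric, and the names SETTINGS and run_demo are all assumptions made for illustration; they are not part of FLAML or of this README.

```python
# Hypothetical settings for illustration only; the values are assumptions,
# not recommendations from the FLAML authors.
SETTINGS = {
    "task": "classification",
    "estimator_list": ["lgbm"],  # restrict the search to LightGBM
    "metric": "accuracy",
    "time_budget": 10,           # search budget in seconds
    "seed": 0,
}


def run_demo():
    """Fit AutoML on a toy dataset and return (best learner, best config, accuracy)."""
    # Imports live inside the function so SETTINGS can be inspected
    # even when flaml or scikit-learn is not installed.
    from flaml import AutoML
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train, **SETTINGS)

    # Fraction of held-out samples predicted correctly.
    accuracy = float((automl.predict(X_test) == y_test).mean())
    return automl.best_estimator, automl.best_config, accuracy
```

Calling run_demo() runs the search for roughly the configured budget. Removing the "estimator_list" entry from SETTINGS lets FLAML choose among its default learners instead.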

You can also run generic hyperparameter tuning for a custom function.

from flaml import tune
tune.run(evaluation_function, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
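The call above leaves the evaluation function and the search space elided. As a rough sketch of how the pieces could fit together, the example below tunes a toy quadratic objective. The function evaluate, the helper run_search, the search ranges, and the budget values are hypothetical and chosen only for illustration; low_cost_partial_config is omitted because this toy objective has no notion of evaluation cost.

```python
def evaluate(config):
    """Hypothetical objective: a quadratic bowl with its minimum at x=3, y=-1."""
    x, y = config["x"], config["y"]
    return {"loss": (x - 3) ** 2 + (y + 1) ** 2}


def run_search(num_samples=100, time_budget_s=10):
    """Search for the config that minimizes the loss reported by `evaluate`."""
    # Imported here so `evaluate` can be used without flaml installed.
    from flaml import tune

    analysis = tune.run(
        evaluate,
        config={
            "x": tune.uniform(-10, 10),  # assumed search range
            "y": tune.uniform(-10, 10),  # assumed search range
        },
        metric="loss",  # key in the dict returned by `evaluate`
        mode="min",     # lower loss is better
        num_samples=num_samples,
        time_budget_s=time_budget_s,
    )
    return analysis.best_config
```

Calling run_search() should return a dictionary with values for "x" and "y" near the minimum, although the exact result depends on the budget and the sampled trials.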

Zero-shot AutoML allows using the existing training API from lightgbm, xgboost etc. while getting the benefit of AutoML in choosing high-performance hyperparameter configurations per task.

from flaml.default import LGBMRegressor

# Use LGBMRegressor in the same way as you use lightgbm.LGBMRegressor.
estimator = LGBMRegressor()
# The hyperparameters are automatically set according to the training data.
estimator.fit(X_train, y_train)

Documentation

Detailed documentation about FLAML, including the API documentation, use cases and examples, is available here.

In addition, you can find:

Talks and tutorials about FLAML.
Research around FLAML here.
FAQ here.
Contributing guide here.
ML.NET documentation and tutorials for Model Builder, ML.NET CLI, and AutoML API.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

If you are new to GitHub, here is a detailed help source on getting involved with development on GitHub.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.