burn/crates/burn-dataset
Ragy Abraham 04d7ff24f2
Add polars DataFrame support for Dataset (#2029)
* initial commit to try implement from_dataframes for a burn dataset

* added the beginnings of tests. removed ref to self in utility method

* added unit test for dataframe module. added utility methods to convert polars rows to burn dataset values

* putting polars and dataframe mod behind a feature flag

* testing both methods

* added an if let Ok so that it doesn't panic if we can't convert serde map to json string. added comments

* using polars serializer, renaming vars

* removed prints. just unwrapping

* setting feature flags back

* return Value::Null rather than panic if we can't serialize list value. no longer convert to object before converting to string. no longer using serde_json to_string method

* Use native deserializer instead of serde_json

* added support for lazyframes. added support to deserialize a few more data. added a few more tests

* Remove lazy, add more testing and other fixes

* Update the book

* Remove lazy feature

* Put back lazy feature for polars

---------

Co-authored-by: Dilshod Tadjibaev <939125+antimora@users.noreply.github.com>
2024-07-31 17:22:49 -05:00
examples Added parameter trust_remote_code to hf dataset call. (#2013) 2024-07-17 16:40:23 -05:00
src Add polars DataFrame support for Dataset (#2029) 2024-07-31 17:22:49 -05:00
tests/data [refactor] Move burn crates to their own crates directory (#1336) 2024-02-20 13:57:55 -05:00
Cargo.toml Add polars DataFrame support for Dataset (#2029) 2024-07-31 17:22:49 -05:00
LICENSE-APACHE Update licenses symlinks (#1613) 2024-04-12 14:43:58 -04:00
LICENSE-MIT Update licenses symlinks (#1613) 2024-04-12 14:43:58 -04:00
README.md [refactor] Move burn crates to their own crates directory (#1336) 2024-02-20 13:57:55 -05:00

README.md

Burn Dataset

The Burn Dataset library is designed to streamline your machine learning (ML) data pipeline creation process. It offers a variety of dataset implementations, transformation functions, and data sources.
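To make the three building blocks concrete, here is a minimal, dependency-free sketch of how a data source, a dataset abstraction, and a transformation can fit together. The names `Dataset`, `InMemory`, and `Mapped` below are hypothetical, simplified stand-ins written for illustration. They are not the crate's actual types, so check the crate documentation for the real API.

```rust
use std::marker::PhantomData;

// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the actual burn-dataset types.

/// A random-access collection of items of type `I`.
trait Dataset<I> {
    /// Returns the item at `index`, or `None` if out of bounds.
    fn get(&self, index: usize) -> Option<I>;
    /// Returns the number of items.
    fn len(&self) -> usize;
}

/// A data source that keeps every item in memory.
struct InMemory<I> {
    items: Vec<I>,
}

impl<I: Clone> Dataset<I> for InMemory<I> {
    fn get(&self, index: usize) -> Option<I> {
        self.items.get(index).cloned()
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

/// A transformation that applies `f` to each item of `inner` on access.
struct Mapped<D, F, I> {
    inner: D,
    f: F,
    _input: PhantomData<I>,
}

impl<D, F, I> Mapped<D, F, I> {
    fn new<O>(inner: D, f: F) -> Self
    where
        D: Dataset<I>,
        F: Fn(I) -> O,
    {
        Mapped { inner, f, _input: PhantomData }
    }
}

impl<D, F, I, O> Dataset<O> for Mapped<D, F, I>
where
    D: Dataset<I>,
    F: Fn(I) -> O,
{
    fn get(&self, index: usize) -> Option<O> {
        // Fetch from the wrapped dataset, then transform the item.
        self.inner.get(index).map(&self.f)
    }

    fn len(&self) -> usize {
        // Mapping never changes the number of items.
        self.inner.len()
    }
}
```

Wrapping an `InMemory` source in `Mapped::new(source, |x: f32| x * 2.0)` yields another dataset of the same length whose items are transformed only when they are requested, so transformations can be stacked without copying the underlying data.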

Feature Flags

  • audio - enables the audio dataset (SpeechCommandsDataset). Run the following example to try it out:

    cargo run --example speech_commands --features audio