From 61dd2c17b99771f8af50dc70be92eb864a248404 Mon Sep 17 00:00:00 2001
From: Aasheesh Singh
Date: Mon, 29 Jan 2024 14:39:24 -0500
Subject: [PATCH] 1. Address comments :) 2. Fix formatting 3. remove dead link

---
 burn-book/src/basic-workflow/README.md   |  8 ++-
 burn-book/src/basic-workflow/backend.md  | 23 ++++---
 burn-book/src/basic-workflow/data.md     |  6 +-
 burn-book/src/basic-workflow/model.md    |  6 +-
 burn-book/src/basic-workflow/training.md |  4 +-
 burn-book/src/building-blocks/dataset.md | 45 +++++++-----
 burn-book/src/building-blocks/tensor.md  | 87 +++++++++++++++---------
 burn-book/src/getting-started.md         |  4 +-
 8 files changed, 108 insertions(+), 75 deletions(-)

diff --git a/burn-book/src/basic-workflow/README.md b/burn-book/src/basic-workflow/README.md
index 697ec5122..ce928be68 100644
--- a/burn-book/src/basic-workflow/README.md
+++ b/burn-book/src/basic-workflow/README.md
@@ -4,9 +4,11 @@ This guide will walk you through the process of creating a custom model built wi
train a simple convolutional neural network model on the MNIST dataset and prepare it for
inference.

For clarity, we sometimes omit imports in our code snippets. For more details, please refer to the
-corresponding code in the `examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide)
-where this complete example is reproduced step-by-step from dataset & model definition to training.
-The demo example can be executed from burn's base directory using the command:
+corresponding code in the `examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide). We
+reproduce this example step by step, from dataset creation to modeling and training, in the
+following sections.
+The code for this demo can be executed from Burn's base directory using the command:
+
```bash
cargo run --example guide
```
diff --git a/burn-book/src/basic-workflow/backend.md b/burn-book/src/basic-workflow/backend.md
index 494263c2b..fd83d836c 100644
--- a/burn-book/src/basic-workflow/backend.md
+++ b/burn-book/src/basic-workflow/backend.md
@@ -28,9 +28,10 @@ fn main() {
You might be wondering why we use the `guide` prefix to bring the different modules we just
implemented into scope. Instead of including the code in the current guide in a single file, we
separated it into different files which group related code into _modules_. The `guide` is simply the
-name we gave to our _crate_, which contains the different files. If you named your project crate as `my-first-burn-model`,
-you can equivalently replace all usages of `guide` above with `my-first-burn-model`.Below is a brief explanation of the
-different parts of the Rust module system.
+name we gave to our _crate_, which contains the different files. If you named your project crate
+as `my-first-burn-model`, you can equivalently replace all usages of `guide` above with
+`my-first-burn-model`. Below is a brief explanation of the different parts of the Rust module
+system.

A **package** is a bundle of one or more crates that provides a set of functionality. A package
contains a `Cargo.toml` file that describes how to build those crates. Burn is a package.
@@ -40,8 +41,8 @@ split up crates into multiple _modules_ and possibly multiple files. A crate can
forms: a binary crate or a library crate. When compiling a crate, the compiler first looks in the
crate root file (usually `src/lib.rs` for a library crate or `src/main.rs` for a binary crate). Any
module declared in the crate root file will be inserted in the crate for compilation.
For this demo example, we will
-define a library crate where all the individual modules(model, data, training etc.) are listed inside `src/lib.rs` as
-following:
+define a library crate where all the individual modules (model, data, training, etc.) are listed inside `src/lib.rs` as
+follows:

```
pub mod data;
@@ -51,11 +52,11 @@ pub mod training;
```

A **module** lets us organize code within a crate for readability and easy reuse. Modules also allow
-us to control the _privacy_ of items. The pub keyword used above for example, is used to make a module publicly available
-inside the crate.
+us to control the _privacy_ of items. The `pub` keyword used above, for example, makes a module
+publicly available inside the crate.

-In our examples, demonstrating the code within this guide, we defined a library crate with a single example where
-the `main` function is defined inside the `guide.rs` file, as illustrated in the structure below.
+The entry point of our program is the `main` function, defined in the `examples/guide.rs` file. The file structure
+for this example is illustrated below:

```
guide
@@ -71,8 +72,8 @@ guide
```

The source for this guide can be found in our
-[GitHub repository](https://github.com/tracel-ai/burn/tree/main/examples/guide) which can be used to run this basic
-work flow example end-to-end.\
+[GitHub repository](https://github.com/tracel-ai/burn/tree/main/examples/guide), which can be used to run this basic
+workflow example end-to-end.\
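Before moving on, here is a minimal, self-contained sketch of the module privacy rules described above. The module and function names are made up purely for illustration and are not part of the guide's code:

```rust
mod data {
    // Private helper: only visible inside the `data` module.
    fn normalize(value: f32) -> f32 {
        value / 255.0
    }

    // `pub` makes this function reachable from the rest of the crate.
    pub fn preprocess(value: f32) -> f32 {
        normalize(value)
    }
}

fn main() {
    println!("{}", data::preprocess(128.0)); // prints 0.5019608
    // data::normalize(128.0); // error[E0603]: function `normalize` is private
}
```

Under the same rule, the `pub mod` items declared in `src/lib.rs` above are what make the library's modules reachable from the example's `main` function.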
diff --git a/burn-book/src/basic-workflow/data.md b/burn-book/src/basic-workflow/data.md
index d00949dde..44a778143 100644
--- a/burn-book/src/basic-workflow/data.md
+++ b/burn-book/src/basic-workflow/data.md
@@ -9,8 +9,10 @@ To iterate over a dataset efficiently, we will define a struct which will implem
trait. The goal of a batcher is to map individual dataset items into a batched tensor that can be
used as input to our previously defined model.

-Let us start by defining our dataset functionalities in a file `src/data.rs`. We shall omit some of the imports for brevity,
-but the full code for following this guide can be found at `examples/guide/` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide).
+Let us start by defining our dataset functionalities in a file `src/data.rs`. We shall omit some of
+the imports for brevity; the full code accompanying this guide can be found in the
+`examples/guide/` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide).

```rust , ignore
use burn::{
diff --git a/burn-book/src/basic-workflow/model.md b/burn-book/src/basic-workflow/model.md
index 0918a914c..0150c4084 100644
--- a/burn-book/src/basic-workflow/model.md
+++ b/burn-book/src/basic-workflow/model.md
@@ -19,7 +19,7 @@ version = "0.1.0"
edition = "2021"

[dependencies]
-burn = { version = "0.12.0", features=["train", "wgpu"]}
+burn = { version = "0.12.0", features = ["train", "wgpu"] }
```

Our goal will be to create a basic convolutional neural network used for image classification. We
@@ -281,12 +281,12 @@ network modules already built with Burn use the `forward` nomenclature, simply b
standard in the field.

Similar to neural network modules, the [`Tensor`](../building-blocks/tensor.md) struct given as a
-parameter also takes the Backend trait as a generic argument, alongside its rank(or dimensionality). Even if it is not
+parameter also takes the Backend trait as a generic argument, alongside its dimensionality. Even if it is not
used in this specific example, it is possible to add the kind of the tensor as a third generic
argument. For example, a 3-dimensional Tensor of different data types (float, int, bool) would be defined as follows:

```rust , ignore
-Tensor<B, 3> // Float tensor (default)
+Tensor<B, 3>        // Float tensor (default)
Tensor<B, 3, Float> // Float tensor (explicit)
Tensor<B, 3, Int>   // Int tensor
Tensor<B, 3, Bool>  // Bool tensor
diff --git a/burn-book/src/basic-workflow/training.md b/burn-book/src/basic-workflow/training.md
index 112a4da23..7bf414a60 100644
--- a/burn-book/src/basic-workflow/training.md
+++ b/burn-book/src/basic-workflow/training.md
@@ -1,7 +1,7 @@
# Training

-We are now ready to write the necessary code to train our model on the MNIST dataset.
-We shall define the code for this training section in the file: `src/training.rs`.
+We are now ready to write the necessary code to train our model on the MNIST dataset.
+We shall define the code for this training section in the file `src/training.rs`.

Instead of a simple tensor, the model should output an item that can be understood by the learner, a
struct whose responsibility is to apply an optimizer to the model. The output struct is used for all metrics
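As a concrete illustration of the kind of item the learner consumes, here is a sketch of the guide's classification step. It assumes the `Model` type defined earlier, `CrossEntropyLoss` from `burn::nn::loss`, and `ClassificationOutput` from `burn::train`; the exact code in `examples/guide` may differ slightly:

```rust , ignore
impl<B: Backend> Model<B> {
    pub fn forward_classification(
        &self,
        images: Tensor<B, 3>,
        targets: Tensor<B, 1, Int>,
    ) -> ClassificationOutput<B> {
        let output = self.forward(images);
        // The loss is what the optimizer minimizes; predictions and targets are
        // carried along so the learner can derive metrics (accuracy, etc.) from them.
        let loss = CrossEntropyLoss::new(None, &output.device())
            .forward(output.clone(), targets.clone());

        ClassificationOutput::new(loss, output, targets)
    }
}
```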
diff --git a/burn-book/src/building-blocks/dataset.md b/burn-book/src/building-blocks/dataset.md
index 6db2c23f1..ac87ff42e 100644
--- a/burn-book/src/building-blocks/dataset.md
+++ b/burn-book/src/building-blocks/dataset.md
@@ -1,6 +1,7 @@
# Dataset

-Most deep learning training being done on datasets –with perhaps the exception of reinforcement learning–, it is essential to provide a convenient and performant API.
+Since most deep learning training is done on datasets (with perhaps the exception of reinforcement
+learning), it is essential to provide a convenient and performant API.
The dataset trait is quite similar to the dataset abstract class in PyTorch:

```rust, ignore
@@ -25,20 +26,23 @@ transformations is to provide you with the necessary tools so that you can model
distributions.

| Transformation | Description |
-| ----------------- | ------------------------------------------------------------------------------------------------------------------------ |
+|-------------------|--------------------------------------------------------------------------------------------------------------------------|
| `SamplerDataset` | Samples items from a dataset. This is a convenient way to model a dataset as a probability distribution of a fixed size. |
| `ShuffledDataset` | Maps each input index to a random index, similar to a dataset sampled without replacement. |
| `PartialDataset` | Returns a view of the input dataset with a specified range. |
| `MapperDataset` | Computes a transformation lazily on the input dataset. |
| `ComposedDataset` | Composes multiple datasets together to create a larger one without copying any data. |

-Let us look at the basic usages of each dataset transform and how they can be composed together. The full documentation
-of each transform can be found at the [API reference](https://burn.dev/docs/burn/data/dataset/transform/index.html).
+Let us look at the basic usage of each dataset transform and how transforms can be composed together. These transforms
+are lazy by default unless specified otherwise, avoiding unnecessary intermediate allocations and improving
+performance. The full documentation of each transform can be found at
+the [API reference](https://burn.dev/docs/burn/data/dataset/transform/index.html).
+
+* **SamplerDataset**: This transform can be used to sample items from a dataset with (default) or without replacement.
+  The transform is initialized with a sampling size, which can be bigger or smaller than the input dataset size. This is
+  particularly useful in cases where we want to checkpoint larger datasets more often during training
+  and smaller datasets less often, as the size of an epoch is now controlled by the sampling size. Sample usage:

-* **SamplerDataset**: This transform can be used to sample items from a dataset with(default) and without replacement.
-Transform is initialized with a sampling size which can be bigger or smaller than the input dataset size. This is
-particularly useful in cases where we want to checkpoint larger datasets more often during training
-and smaller datasets less often as the size of an epoch is now controlled by the sampling size.
Sample usage:

```rust, ignore
type DbPedia = SqliteDataset<DbPediaItem>;
let dataset: DbPedia = HuggingfaceDatasetLoader::new("dbpedia_14")
        .dataset("train")
        .unwrap();
let dataset = SamplerDataset::new(dataset, 10000);
```

-* **ShuffledDataset**: This transform can be used to shuffle the items of a dataset. Particularly useful before splitting
-the raw dataset into train/test splits. Can be initialized with a seed to ensure reproducibility.
+* **ShuffledDataset**: This transform can be used to shuffle the items of a dataset. Particularly useful before
+  splitting the raw dataset into train/test splits. Can be initialized with a seed to ensure reproducibility.
+
```rust, ignore
let dataset = ShuffledDataset::with_seed(dataset, 42);
```

-* **PartialDataset**: This transform is useful to return a view of the dataset with a specified start, end index. Used
-to create train/val/test splits. Below example we show how to chain ShuffledDataset and PartialDataset to create splits.
+* **PartialDataset**: This transform is useful to return a view of the dataset with specified start and end indices.
+  Used to create train/val/test splits. In the example below, we show how to chain ShuffledDataset and
+  PartialDataset to create splits.
+
```rust, ignore
// define chained dataset type here for brevity
-let PartialData = PartialDataset<ShuffledDataset<DbPedia>>;
+type PartialData = PartialDataset<ShuffledDataset<DbPedia>>;
let dataset_len = dataset.len();
let split = "train"; // or "val"/"test"

@@ -70,11 +80,10 @@ let data_split = match split {
```

* **MapperDataset**: This transform is useful to apply a transformation on each of the items of a dataset. Particularly
-useful for normalization of image data when channel means are known.
-
-* **ComposedDataset**: This transform is useful to compose multiple datasets downloaded from multiple sources(say
-different HuggingfaceDatasetLoader sources) into a single bigger dataset which can be sampled from one source.
+  useful for normalization of image data when channel means are known. A minimal sketch follows this list.

+* **ComposedDataset**: This transform is useful to compose multiple datasets downloaded from multiple sources (say
+  different HuggingfaceDatasetLoader sources) into a single bigger dataset which can be sampled from one source.
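Since MapperDataset has no snippet above, here is a minimal sketch of the normalization use case. The item type (a flat `Vec<f32>` of pixels) and the mean/std constants are placeholders chosen for illustration, not values from the guide:

```rust, ignore
struct NormalizeImage;

impl Mapper<Vec<f32>, Vec<f32>> for NormalizeImage {
    // Shift and scale each pixel by a known channel mean and standard deviation.
    fn map(&self, item: &Vec<f32>) -> Vec<f32> {
        item.iter().map(|p| (p - 0.1307) / 0.3081).collect()
    }
}

// The mapping is applied lazily, when items are accessed.
let dataset = MapperDataset::new(dataset, NormalizeImage);
```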
## Storage

There are multiple dataset storage options available for you to choose from. The choice of
the dataset to use should be based on the dataset's size as well as its intended purpose.

| Storage | Description |
-| --------------- | ------------------------------------------------------------------------------------------------------------------------- |
+|-----------------|-----------------------------------------------------------------------------------------------------------------------------|
| `InMemDataset` | In-memory dataset that uses a vector to store items. Well-suited for smaller datasets. |
| `SqliteDataset` | Dataset that uses SQLite to index items that can be saved in a simple SQL database file. Well-suited for larger datasets. |

diff --git a/burn-book/src/building-blocks/tensor.md b/burn-book/src/building-blocks/tensor.md
index 0b77843a2..15aaa2f20 100644
--- a/burn-book/src/building-blocks/tensor.md
+++ b/burn-book/src/building-blocks/tensor.md
@@ -1,7 +1,7 @@
# Tensor

As previously explained in the [model section](../basic-workflow/model.md), the Tensor struct has 3
-generic arguments: the backend, the number of dimensions (rank) and the data type.
+generic arguments: the backend `B`, the dimensionality `D`, and the data type.

```rust , ignore
Tensor<B, D>        // Float tensor (default)
Tensor<B, D, Float> // Float tensor (explicit)
Tensor<B, D, Int>   // Int tensor
Tensor<B, D, Bool>  // Bool tensor
```

Note that the specific element types used for `Float`, `Int`, and `Bool` tensors are defined by
backend implementations.

-In contrast to PyTorch where a Tensor is characterised by the size of it's each dimension, a Burn Tensor is
-characterized by the number of dimensions D in its declaration. The actual shape of the tensor is inferred from
-its initialization. For eg, a Tensor of size: (5,) is initialized as below:
+A Burn Tensor is defined by the number of dimensions `D` in its declaration, as opposed to its shape. The
+actual shape of the tensor is inferred from its initialization. For example, a Tensor of size (5,) is initialized
+as below:
+
```rust, ignore
// correct: Tensor is 1-Dimensional with 5 elements
let tensor_1 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);
@@ -26,52 +27,68 @@ let tensor_1 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);

### Initialization

-Burn Tensors are primarily initialized using the `from_data()` method which takes the `Data` struct as input.
-The `Data` struct has two fields: value & shape. To access the data of any tensor, we use the method `.to_data()`.
-Let's look at a couple of examples for initializing a tensor from different inputs.
+Burn Tensors are primarily initialized using the `from_data()` method, which takes the `Data` struct as input.
+The `Data` struct has two fields: value & shape. To retrieve the data from a tensor, use the method `.to_data()`
+if you intend to reuse the tensor afterward; `.into_data()` is recommended for one-time use.
+Let's look at a couple of examples for initializing a tensor from different inputs.

```rust, ignore
-// Initialization from a float array
-let tensor_1 = Tensor::<Backend, 1>::from_data([1.0, 2.0, 3.0]);
-// equivalent to (recommended)
-let tensor_2 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0]);
+// Initialization from a given Backend (Wgpu)
+let tensor_1 = Tensor::<Wgpu, 1>::from_data([1.0, 2.0, 3.0]);
+
+// Initialization from a generic Backend
+let tensor_2 = Tensor::<Backend, 1>::from_data(Data::from([1.0, 2.0, 3.0]).convert());
+
+// Initialization using from_floats (recommended for the f32 ElementType).
+// The input is converted to Data internally; `.convert()` is not needed,
+// since from_floats() is defined for a fixed ElementType.
+let tensor_3 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0]);
+
+// Initialization of an Int Tensor from array slices
+let arr: [i32; 6] = [1, 2, 3, 4, 5, 6];
+let tensor_4 = Tensor::<Backend, 1, Int>::from_data(Data::from(&arr[0..3]).convert());

// Initialization from a custom type
-struct BMI {
+struct BodyMetrics {
    age: i8,
    height: i16,
    weight: f32
}

-let bmi = BMI{
-    age: 10,
-    height: 100,
-    weight: 50.0
+let bmi = BodyMetrics{
+    age: 25,
+    height: 180,
+    weight: 80.0
};
-let tensor_3 = Tensor::<Backend, 1>::from_data(Data::from([bmi.age as f32, bmi.height as f32, bmi.weight]).convert());
+let tensor_5 = Tensor::<Backend, 1>::from_data(Data::from([bmi.age as f32, bmi.height as f32, bmi.weight]).convert());

```
-The `.convert()` method for Data struct is called in the last example to ensure that the data's primitive type is
-consistent across all backends. The convert operation can also be done at element wise level as:
-`let tensor_4 = Tensor::<Backend, 1, Int>::from_data(Data::from([(item.age as i64).elem()])`.
+
+The `.convert()` method for the Data struct is called to ensure that the data's primitive type is
+consistent across all backends. With the `from_floats()` method, the ElementType is fixed as f32,
+so no convert operation is required across backends. The conversion can also be done at the
+element-wise level, as in
+`let tensor_6 = Tensor::<Backend, 1, Int>::from_data(Data::from([(bmi.age as i64).elem()]));`. Note that the
+`ElementConversion` trait needs to be imported for the element-wise operation.
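For completeness, here is the element-wise variant as a short sketch, reusing the `bmi` value from above. The import path shown is an assumption about where the trait is re-exported; adjust it to your Burn version:

```rust, ignore
use burn::tensor::ElementConversion;

// Convert each field to the backend's int element type individually with `.elem()`.
let tensor_6 = Tensor::<Backend, 1, Int>::from_data(Data::from([
    (bmi.age as i64).elem(),
    (bmi.height as i64).elem(),
]));
```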
## Ownership and Cloning

Almost all Burn operations take ownership of the input tensors. Therefore, reusing a tensor
multiple times will necessitate cloning it. Let's look at an example to understand the ownership
-rules and cloning better. Suppose we want to do a simple min-max normalization of an input tensor.
+rules and cloning better. Suppose we want to do a simple min-max normalization of an input tensor.
+
```rust, ignore
let input = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0, 4.0]);
let min = input.min();
let max = input.max();
let input = (input - min).div(max - min);
```
+
With PyTorch tensors, the above code would work as expected. However, Rust's strict ownership rules will give an error
and prevent using the input tensor after the first `.min()` operation. The ownership of the input tensor is transferred
-to the variable `min` and the input tensor is no longer available for further operations. Burn Tensors like most
-complex primitives do not implement the `Copy` trait and therefore have to be cloned explicitly. Now let's rewrite
+to the variable `min`, and the input tensor is no longer available for further operations. Burn Tensors, like most
+complex primitives, do not implement the `Copy` trait and therefore have to be cloned explicitly. Now let's rewrite
a working example of doing min-max normalization with cloning.

```rust, ignore
@@ -82,14 +99,16 @@ a working example of doing min-max normalization with cloning.
println!("{:?}", input.to_data()); // Success: [0.0, 0.33333334, 0.6666667, 1.0]

// Notice that max, min have been moved in last operation so the below print will give an error.
-// If we want to utlize them as well for further operations, they will need to be cloned in similar fashion.
+// If we want to use them for further operations, they will need to be cloned in a similar fashion.
// println!("{:?}", min.to_data());
```

-We don't need to be worried about memory overhead because with cloning, the tensor's buffer isn't copied,
-and only a reference to it is increased. This makes it possible to determine exactly how many times a tensor is used,
-which is very convenient for reusing tensor buffers and improving performance. For that reason, we don't provide
-explicit inplace operations. If a tensor is used only one time, inplace operations will always be used when available.
+We don't need to worry about memory overhead because, with cloning, the tensor's buffer isn't copied;
+only its reference count is increased. This makes it possible to determine exactly how many times a tensor is used,
+which is very convenient for reusing tensor buffers or even fusing operations into a single
+kernel ([burn-fusion](https://burn.dev/docs/burn_fusion/index.html)).
+For that reason, we don't provide explicit inplace operations. If a tensor is used only one time, inplace operations
+will always be used when available.

## Tensor Operations

@@ -108,7 +127,7 @@ for the sake of simplicity, we ignore type signatures. For more details, refer t

Those operations are available for all tensor kinds: `Int`, `Float`, and `Bool`.
| Burn | PyTorch Equivalent | -| ------------------------------------- | ------------------------------------ | +|---------------------------------------|--------------------------------------| | `Tensor::empty(shape, device)` | `torch.empty(shape, device=device)` | | `tensor.dims()` | `tensor.size()` | | `tensor.shape()` | `tensor.shape` | @@ -138,7 +157,7 @@ Those operations are available for all tensor kinds: `Int`, `Float`, and `Bool`. Those operations are available for numeric tensor kinds: `Float` and `Int`. | Burn | PyTorch Equivalent | -| ---------------------------------------------------------------- | ---------------------------------------------- | +|------------------------------------------------------------------|------------------------------------------------| | `tensor.into_scalar()` | `tensor.item()` (for single-element tensors) | | `tensor + other` or `tensor.add(other)` | `tensor + other` | | `tensor + scalar` or `tensor.add_scalar(scalar)` | `tensor + scalar` | @@ -194,7 +213,7 @@ Those operations are available for numeric tensor kinds: `Float` and `Int`. Those operations are only available for `Float` tensors. | Burn API | PyTorch Equivalent | -| -------------------------------------------- | ---------------------------------- | +|----------------------------------------------|------------------------------------| | `tensor.exp()` | `tensor.exp()` | | `tensor.log()` | `tensor.log()` | | `tensor.log1p()` | `tensor.log1p()` | @@ -226,7 +245,7 @@ Those operations are only available for `Float` tensors. Those operations are only available for `Int` tensors. | Burn API | PyTorch Equivalent | -| -------------------------------------- | ------------------------------------------------------- | +|----------------------------------------|---------------------------------------------------------| | `tensor.from_ints(ints)` | N/A | | `tensor.float()` | Similar to `tensor.to(torch.float)` | | `tensor.arange(5..10, device) ` | `tensor.arange(start=5, end=10, device=device)` | @@ -237,7 +256,7 @@ Those operations are only available for `Int` tensors. Those operations are only available for `Bool` tensors. | Burn API | PyTorch Equivalent | -| ---------------- | ----------------------------------- | +|------------------|-------------------------------------| | `tensor.float()` | Similar to `tensor.to(torch.float)` | | `tensor.int()` | Similar to `tensor.to(torch.long)` | | `tensor.not()` | `tensor.logical_not()` | @@ -245,7 +264,7 @@ Those operations are only available for `Bool` tensors. ## Activation Functions | Burn API | PyTorch Equivalent | -| ---------------------------------------- | ----------------------------------------------------- | +|------------------------------------------|-------------------------------------------------------| | `activation::gelu(tensor)` | Similar to `nn.functional.gelu(tensor)` | | `activation::log_sigmoid(tensor)` | Similar to `nn.functional.log_sigmoid(tensor)` | | `activation::log_softmax(tensor, dim)` | Similar to `nn.functional.log_softmax(tensor, dim)` | diff --git a/burn-book/src/getting-started.md b/burn-book/src/getting-started.md index 0140b2e69..4957dbffb 100644 --- a/burn-book/src/getting-started.md +++ b/burn-book/src/getting-started.md @@ -28,7 +28,7 @@ libraries/packages your code depends on, and build said libraries. Below is a quick cheat sheet of the main `cargo` commands you might use throughout this guide. 
| Command | Description | -| ------------------- | -------------------------------------------------------------------------------------------- | +|---------------------|----------------------------------------------------------------------------------------------| | `cargo new` _path_ | Create a new Cargo package in the given directory. | | `cargo add` _crate_ | Add dependencies to the Cargo.toml manifest file. | | `cargo build` | Compile the local package and all of its dependencies (in debug mode, use `-r` for release). | @@ -165,7 +165,7 @@ Tensor { ``` While the previous example is somewhat trivial, the upcoming -[basic workflow section](./basic-workflow/README.md) will walk you through a much more relevant example for +basic workflow section will walk you through a much more relevant example for deep learning applications. ## Running examples