Burn Tensor

Burn Tensor Library


This library provides multiple tensor implementations hidden behind an easy-to-use API that supports reverse-mode automatic differentiation.

Features

  • Flexible
  • CPU + GPU 🙏
  • Multi-Threads 🚀
  • Intuitive Usage 😌
  • No Global State 🚫
  • Multiple Backends 🦾
  • Reverse Mode Autodiff 🔥

Backends

For now, three backends are implemented, and some more are planned.

Autodiff

Automatic differentiation is implemented as just another tensor backend, without any global state. This is possible because we keep track of the order in which each operation has been executed, and the tape is only created when calculating the gradients. Each operation creates a new node that holds a reference to its parent nodes, so building the tape only requires a simple and efficient graph traversal.

    // Wrap the base tensors so that subsequent operations are recorded.
    let x = AutodiffTensor::from_tensor(x_ndarray);
    let y = AutodiffTensor::from_tensor(y_ndarray);

    let z = x.matmul(&y);

    // Build the tape by traversing the graph and compute the gradients.
    let grads = z.backward();

    let x_grad = x.grad(&grads);
    let y_grad = y.grad(&grads);

Cuda

To run with CUDA, set the TORCH_CUDA_VERSION environment variable, e.g. TORCH_CUDA_VERSION=cu121.
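A quick sketch of a typical invocation; the cargo test command here is an assumption, so substitute whatever you are building or running:

    # Hypothetical invocation: export the CUDA version before building,
    # then run any cargo command that compiles the CUDA-enabled backend.
    export TORCH_CUDA_VERSION=cu121
    cargo test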

Notes

This crate can be used on its own, without the entire burn stack, and with only selected backends for smaller binaries.

Feature Flags

This crate can be used without the standard library (#![no_std], relying on alloc) by disabling the default std feature; see the sketch after this list.

  • std - enables the standard library.
  • burn-tensor-testgen - enables test macros for generating tensor tests.
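A minimal Cargo.toml sketch for no_std usage; the crate version is an assumption here, so match it to your workspace:

    [dependencies]
    # Disabling the default `std` feature enables #![no_std] builds
    # (an allocator is still required, since the crate relies on alloc).
    burn-tensor = { version = "0.15", default-features = false }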