# candle

ML framework for Rust

```rust
let a = Tensor::zeros((2, 3), DType::F32, &Device::Cpu)?;
let b = Tensor::zeros((3, 4), DType::F32, &Device::Cpu)?;

let c = a.matmul(&b)?;
```
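As a complete program, this might look as follows (a minimal sketch, assuming `candle-core` is the dependency and that it exposes `Tensor`, `DType`, `Device`, and `Result` at the crate root):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Allocate two zero-filled tensors on the CPU and multiply them:
    // (2, 3) x (3, 4) -> (2, 4).
    let a = Tensor::zeros((2, 3), DType::F32, &Device::Cpu)?;
    let b = Tensor::zeros((3, 4), DType::F32, &Device::Cpu)?;

    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```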

## Check out our examples

```bash
cargo run --example bert --release
cargo run --example whisper --release
cargo run --example llama --release
cargo run --example falcon --release
```

In order to use CUDA, add `--features cuda` to the example command line.
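For instance, to run the llama example on a CUDA GPU:

```bash
cargo run --example llama --release --features cuda
```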

There are also some wasm examples for whisper and llama2.c. You can either build them with `trunk` or try them online: whisper, llama2.

For llama2, run the following command to retrieve the weight files and start a test server:

```bash
cd candle-wasm-examples/llama2-c
wget https://karpathy.ai/llama2c/model.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
trunk serve --release --public-url /candle-llama2/ --port 8081
```

And then browse to http://localhost:8081/candle-llama2.

## Features

- Simple syntax (looks and feels like PyTorch).
- CPU and CUDA backends, m1, f16, bf16 (and tentatively WASM); see the device-selection sketch after this list.
- Enables serverless (CPU), small and fast deployments.
- Model training.
- Distributed computing (NCCL).
- Models out of the box (Llama, Whisper, Falcon, ...).
- Emphasis on enabling users to use custom ops/kernels.
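For the backends bullet, here is a minimal device-selection sketch; it assumes `candle-core` provides a `Device::cuda_if_available` helper that falls back to the CPU when no GPU is present:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Assumed helper: pick the first CUDA device if one exists, else the CPU.
    let device = Device::cuda_if_available(0)?;

    // The rest of the code is device-agnostic.
    let x = Tensor::zeros((2, 3), DType::F32, &device)?;
    println!("{x}");
    Ok(())
}
```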

## How to use?

Cheatsheet:

|            | Using PyTorch                       | Using Candle                                           |
|------------|-------------------------------------|--------------------------------------------------------|
| Creation   | `torch.Tensor([[1, 2], [3, 4]])`    | `Tensor::new(&[[1f32, 2.], [3., 4.]], &Device::Cpu)?`  |
| Indexing   | `tensor[:, :4]`                     | `tensor.i((.., ..4))?`                                 |
| Operations | `tensor.view((2, 2))`               | `tensor.reshape((2, 2))?`                              |
| Operations | `a.matmul(b)`                       | `a.matmul(&b)?`                                        |
| Arithmetic | `a + b`                             | `&a + &b`                                              |
| Device     | `tensor.to(device="cuda")`          | `tensor.to_device(&Device::Cuda(0))?`                  |
| Dtype      | `tensor.to(dtype=torch.float16)`    | `tensor.to_dtype(&DType::F16)?`                        |
| Saving     | `torch.save({"A": A}, "model.bin")` | `tensor.save_safetensors("A", "model.safetensors")?`   |
| Loading    | `weights = torch.load("model.bin")` | TODO (see the examples for now)                        |
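To see several of these rows in one place, here is a hedged sketch; it assumes the `candle-core` crate name, that the indexing syntax comes from an `IndexOp` trait that must be in scope, and that `to_dtype` takes its `DType` by value:

```rust
use candle_core::{DType, Device, IndexOp, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let a = Tensor::new(&[[1f32, 2.], [3., 4.]], &dev)?; // creation
    let b = a.reshape((2, 2))?;                          // reshape (PyTorch `view`)
    let c = a.matmul(&b)?;                               // matrix multiplication
    let d = (&c + &c)?;                                  // arithmetic returns a Result
    let _col = d.i((.., ..1))?;                          // indexing: first column
    let _half = d.to_dtype(DType::F16)?;                 // dtype cast (by value assumed)
    Ok(())
}
```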

## Structure
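The repository is split into a number of crates:

- `candle-core`: core tensor library (ops, devices, dtypes).
- `candle-nn`: utilities for building neural networks.
- `candle-examples`: the runnable examples (bert, whisper, llama, falcon, ...).
- `candle-kernels`: CUDA kernels.
- `candle-pyo3`: Python bindings.
- `candle-transformers`: transformer-related utilities.
- `candle-flash-attn`: flash-attention support.
- `candle-wasm-examples`: the wasm examples mentioned above.
- `candle-book`: the documentation book.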

## FAQ

### Why Candle?

Candle stems from the need to reduce binary size in order to make serverless deployments possible: the whole engine is much smaller than PyTorch's very large library, which makes spinning up runtimes on a cluster much faster.

It is also about removing Python from production workloads: Python can add real overhead in more complex workflows, and the GIL is a notorious source of headaches.

Rust is cool, and a lot of the HF ecosystem already has Rust crates, such as safetensors and tokenizers.

### Other ML frameworks

- dfdx is a formidable crate, with shapes being included in types. This prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat. However, we found that some features still require nightly, and writing code can be a bit daunting for non-Rust experts.

  We're leveraging and contributing to other core crates for the runtime, so hopefully both crates can benefit from each other.

- burn is a general crate that can leverage multiple backends so you can choose the best engine for your workload.

- tch-rs: bindings to the torch library in Rust. Extremely versatile, but they bring the entire torch library into the runtime. The main contributor of tch-rs is also involved in the development of candle.

### Missing symbols when compiling with the mkl feature

If you get some missing symbols when compiling binaries/tests using the mkl feature, e.g.:

```
  = note: /usr/bin/ld: (....o): in function `blas::sgemm':
          .../blas-0.22.0/src/lib.rs:1944: undefined reference to `sgemm_' collect2: error: ld returned 1 exit status

  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
```

This is likely due to a missing linker flag that enables the mkl library. You can try adding the following at the top of your binary:

```rust
extern crate intel_mkl_src;
```
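In context, a minimal binary might look like this (a sketch; it assumes `intel-mkl-src` is already listed as a dependency in your Cargo.toml):

```rust
// Referencing the crate forces it to be linked even though no symbol from it
// is used directly, which pulls in the MKL symbols (such as `sgemm_`) that
// the linker was complaining about.
extern crate intel_mkl_src;

fn main() {
    // ... your candle code ...
}
```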

### How to know where an error comes from?

You can set `RUST_BACKTRACE=1` to be provided with backtraces when a candle error is generated.
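For example, when running one of the examples above:

```bash
RUST_BACKTRACE=1 cargo run --example llama --release
```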