Commit Graph

112 Commits

Author SHA1 Message Date
Laurent Mazare d728e646c2
Use resolver 2 explicitly. (#597) 2023-08-25 09:35:40 +01:00
Laurent Mazare aba1e90797
Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
Laurent Mazare 20ce3e9f39
Sketch the yolo wasm example. (#546)
* Sketch the yolo wasm example.

* Web ui.

* Get the web ui to work.

* UI tweaks.

* More UI tweaks.

* Use the natural width/height.

* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
Laurent Mazare a8f61e66cc
Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
Laurent Mazare 531f23b4d0
Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare 495e0b7580
Simd support (#448)
* Import the simd intrinsics in candle-core.

* simd version of reduce-sum.

* Bugfix.

* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
Laurent Mazare c84883ecf2
Add a cuda kernel for upsampling. (#441)
* Add a cuda kernel for upsampling.

* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
Laurent Mazare e29c7809ec
Parallelise the CPU kernels for the conv ops. (#401)
* Parallelise the conv2d op.

* Tighter control on threading.

* Also parallelise conv1d.

* Add some safety comment.
2023-08-11 05:51:58 +01:00
Nicolas Patry 379eadc68e Working now. 2023-08-10 19:43:25 +02:00
Nicolas Patry 7e4fbc1e17 [DO NOT MERGE] temporary PR so users can try out on older GPUs. 2023-08-10 19:36:31 +02:00
Laurent Mazare c8039579a5
Conv1d optimize (#392)
* Reorder the conv1d loops in the cpu backend.

* Optimize the 1d convolution.

* Conv1D optimize.

* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
Lei 3bbc08a8df
Fix randn cpu (#382)
* Change distributions

Standard generates in [0, 1), Normal is correct.

* Add test

Not sure if this is the best place to put the test

* Remove unnecessary use
2023-08-10 05:33:44 +01:00
Laurent Mazare da26e2832c
Update gemm to 0.15.6. (#378) 2023-08-09 21:04:28 +01:00
Laurent Mazare 3a62aee91f
Write the generated images using the image crate. (#363)
* Use the image crate to write the generated images.

* Make the dependency optional.
2023-08-09 15:26:44 +01:00
Laurent Mazare e72ba0b9e7
Add the license files. (#335) 2023-08-07 14:11:27 +01:00
Laurent Mazare b278834267
Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* Ffi tweaks.
2023-08-05 17:25:24 +01:00
Laurent Mazare 620f83cf66
Add the candle-datasets crate (#322)
* Move the vision datasets to a separate crate.

* Move the batcher bits.

* Update the readme.

* Move the tiny-stories bits.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
Laurent Mazare 4fe8a02f88
Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
Laurent Mazare d38943aadc
Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare 6e33ff62d6
Update cudarc now that it includes the cublas-f16 and nccl changes. (#300) 2023-08-02 05:54:28 +01:00
Nicolas Patry d2dea11ef6 Fixing nccl feature. 2023-07-28 12:19:20 +02:00
Nicolas Patry 4f260ef025
Merge pull request #216 from LaurentMazare/llama_multiprocess2
TP sharding v2
2023-07-28 08:06:13 +01:00
Nicolas Patry ca479a873e Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around
all the time)
2023-07-27 20:05:02 +02:00
Nicolas Patry b7814f66b4 PyO3 is back. 2023-07-27 09:58:47 +02:00
Nicolas Patry ed58de7551 Fixed TP sharded version. 2023-07-27 09:58:46 +02:00
Nicolas Patry 1735e4831e TP sharding v2 2023-07-27 09:58:14 +02:00
Laurent Mazare 6475bfadfe
Simplify Tensor::randn. (#255)
* Simplify Tensor::randn.

* Also switch Tensor::rand to use a generic dtype.

* Support sampling for f16.

* Cleanup.
2023-07-27 07:40:36 +01:00
Laurent Mazare 89fd988836
Update to the latest gemm. (#250) 2023-07-26 17:00:02 +01:00
Laurent Mazare d9f9c859af
Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.

* More flash attn.

* Set up the flash attn parameters.

* Get things to compile locally.

* Move the flash attention files in a different directory.

* Build the static C library with nvcc.

* Add more flash attention.

* Update the build part.

* Better caching.

* Exclude flash attention from the default workspace.

* Put flash-attn behind a feature gate.

* Get the flash attn kernel to run.

* Move the flags to a more appropriate place.

* Enable flash attention in llama.

* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
Laurent Mazare 5a26cba733
Re-organize the wasm examples (#231)
* Move the whisper example.

* More renaming.

* Add llama2 as a new wasm example.

* Live generation.

* More of the llama wasm example.

* Formatting.
2023-07-24 12:36:02 +01:00
Laurent Mazare dc416243a3
Bump the hf-hub dependency to 0.1.3. (#206) 2023-07-20 07:27:52 +01:00
Laurent Mazare c34f932319
Fix the mkl build. (#204)
* Fix the mkl build.

* Fix the build properly.
2023-07-19 19:41:11 +01:00
Nicolas Patry 439321745a Removing `candle-hub` internal to extract into `hf-hub` standalone. 2023-07-19 15:04:38 +02:00
Laurent Mazare b8abe2bb4b
Factorize the tokenizers version in the workspace cargo def. (#186) 2023-07-18 06:48:13 +01:00
Laurent Mazare f0cccd08f0
Bert tracing (#184)
* Add some tracing to bert.

* More tracing.

* Add a flag for tracing.
2023-07-17 19:40:42 +01:00
Laurent Mazare 49ea09c73c
Gemm update (#183)
* Update the gemm dependency.

* Update the comment too.

* Pin the sha256 dependency.
2023-07-17 14:05:39 +01:00
Laurent Mazare 104f89df31
Centralize the dependency versions and inherit them. (#177) 2023-07-16 07:47:17 +01:00
Laurent Mazare d1f5d44c04
Reenable pyo3 in the workspace list (#170)
* Enable pyo3 back.

* Adapt the CI.
2023-07-14 19:54:38 +01:00
Nicolas Patry 4ed56d7861 Removing cuda default.
Seems very important for a lot of exploring users, usually on laptops
without GPUs.

Adding more README instructions in a follow-up.
2023-07-14 16:52:15 +02:00
Laurent Mazare 88f666781f
Wasm proof of concept. (#167)
* Wasm proof of concept.

* Run whisper inference in the browser.

* Some fixes.

* Move the wasm example.

* Change the tokenizer config.
2023-07-14 14:51:46 +01:00
Laurent Mazare 21aa29ddce
Use a rwlock for inner mutability. (#156)
* Use a rw-lock.

* Make clippy happier.
2023-07-13 11:25:24 +01:00
Laurent Mazare 50b0946a2d
Tensor mutability (#154)
* Working towards tensor mutability.

* Use a ref-cell to provide tensor mutability.
2023-07-13 11:04:40 +01:00
Laurent Mazare ba35d895e7
Sketch the candle-transformers crate. (#147)
* Sketch the candle-transformers crate.

* Format the empty files.
2023-07-12 13:49:31 +01:00
Laurent Mazare 9ce0f1c010
Sketch the candle-nn crate. (#115)
* Sketch the candle-nn crate.

* Tweak the cuda dependencies.

* More cuda tweaks.
2023-07-10 08:50:09 +01:00
Laurent Mazare 4afa461b34
Sketch the Falcon model. (#93)
* Sketch the Falcon model.

* Add more substance to the falcon example.

* Falcon (wip).

* Falcon (wip again).

* Falcon inference.

* Get the weights from the api and properly generate the model.

* Use the proper model.

* Fix the file/revision names.

* Fix bias handling.

* Recompute the rot embeddings.

* Fix the input shape.

* Add the release-with-debug profile.

* Silly bugfix.

* More bugfixes.

* Stricter shape checking in matmul.
2023-07-06 19:01:21 +01:00
laurent fdb1acd2ff Move llama in a cargo-examples directory. 2023-07-03 11:30:58 +01:00
laurent ebb0fedf14 Very simple pyo3 bindings for candle. 2023-07-01 20:36:44 +01:00
laurent af66f0829e Revert the new profile. 2023-06-29 19:08:50 +01:00
laurent 3232df9458 Add some KV cache to llama. 2023-06-29 15:29:40 +01:00
Nicolas Patry 1a82bc50c9 [Tmp] Adding candle-hub 2023-06-27 13:58:23 +02:00
Nicolas Patry d7f729fb8f Refactor the hierarchy. 2023-06-27 11:57:27 +02:00
laurent 22da2c7e02 More f16 and bf16 support. 2023-06-26 20:52:01 +01:00
laurent a31411fd91 Start adding f16/bf16 support. 2023-06-26 19:37:47 +01:00
laurent 11696e6377 Faster model weight loading. 2023-06-26 07:40:11 +01:00
laurent 96c098b6cd Remove the unnecessary features. 2023-06-24 18:15:44 +01:00
laurent a7f80e258f Read and write npy files. 2023-06-24 18:12:10 +01:00
Nicolas Patry 04cf14f35a Moving to `gemm` and adding matmul backprop.
- Tentative `T` operator.
2023-06-22 12:37:02 +02:00
Nicolas Patry 9ea220fc6e Fixing tokenizers dep. 2023-06-22 12:25:58 +02:00
Nicolas Patry ce977b489e Adding matmul? 2023-06-22 12:25:58 +02:00
laurent 083ced4428 Integrate the kernels bits. 2023-06-22 09:59:00 +01:00
laurent 7adffafeda Abstract the gradient storage. 2023-06-21 14:29:48 +01:00
laurent 9698211d56 Add some very basic tensor type. 2023-06-19 17:26:50 +01:00