Laurent Mazare
d728e646c2
Use resolver 2 explicitly. ( #597 )
2023-08-25 09:35:40 +01:00
Laurent Mazare
aba1e90797
Add some group parameter to convolutions. ( #566 )
...
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
Laurent Mazare
20ce3e9f39
Sketch the yolo wasm example. ( #546 )
...
* Sketch the yolo wasm example.
* Web ui.
* Get the web ui to work.
* UI tweaks.
* More UI tweaks.
* Use the natural width/height.
* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
Laurent Mazare
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
Laurent Mazare
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
...
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare
495e0b7580
Simd support ( #448 )
...
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
Laurent Mazare
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
...
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
Laurent Mazare
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
...
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add some safety comment.
2023-08-11 05:51:58 +01:00
Nicolas Patry
379eadc68e
Working now.
2023-08-10 19:43:25 +02:00
Nicolas Patry
7e4fbc1e17
[DO NOT MERGE] temporary PR so users can try out on older GPUs.
2023-08-10 19:36:31 +02:00
Laurent Mazare
c8039579a5
Conv1d optimize ( #392 )
...
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
Lei
3bbc08a8df
Fix randn cpu ( #382 )
...
* Change distributions
Standard generates in [0, 1), Normal is correct.
* Add test
Not sure if this is the best place to put the test
* Remove unnecessary use
2023-08-10 05:33:44 +01:00
Laurent Mazare
da26e2832c
Update gemm to 0.15.6. ( #378 )
2023-08-09 21:04:28 +01:00
Laurent Mazare
3a62aee91f
Write the generated images using the image crate. ( #363 )
...
* Use the image crate to write the generated images.
* Make the dependency optional.
2023-08-09 15:26:44 +01:00
Laurent Mazare
e72ba0b9e7
Add the license files. ( #335 )
2023-08-07 14:11:27 +01:00
Laurent Mazare
b278834267
Support the Accelerate BLAS on macOS. ( #325 )
...
* Add the accelerate feature.
* Ffi tweaks.
2023-08-05 17:25:24 +01:00
Laurent Mazare
620f83cf66
Add the candle-datasets crate ( #322 )
...
* Move the vision datasets to a separate crate.
* Move the batcher bits.
* Update the readme.
* Move the tiny-stories bits.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
Laurent Mazare
4fe8a02f88
Update the repo location. ( #305 )
2023-08-02 11:12:18 +01:00
Laurent Mazare
d38943aadc
Add version numbers for all the candle crates ( #303 )
...
* Switch to candle-gemm for the time being.
* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare
6e33ff62d6
Update cudarc now that it includes the cublas-f16 and nccl changes. ( #300 )
2023-08-02 05:54:28 +01:00
Nicolas Patry
d2dea11ef6
Fixing nccl feature.
2023-07-28 12:19:20 +02:00
Nicolas Patry
4f260ef025
Merge pull request #216 from LaurentMazare/llama_multiprocess2
...
TP sharding v2
2023-07-28 08:06:13 +01:00
Nicolas Patry
ca479a873e
Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time)
2023-07-27 20:05:02 +02:00
Nicolas Patry
b7814f66b4
PyO3 is back.
2023-07-27 09:58:47 +02:00
Nicolas Patry
ed58de7551
Fixed TP sharded version.
2023-07-27 09:58:46 +02:00
Nicolas Patry
1735e4831e
TP sharding v2
2023-07-27 09:58:14 +02:00
Laurent Mazare
6475bfadfe
Simplify Tensor::randn. ( #255 )
...
* Simplify Tensor::randn.
* Also switch Tensor::rand to use a generic dtype.
* Support sampling for f16.
* Cleanup.
2023-07-27 07:40:36 +01:00
Laurent Mazare
89fd988836
Update to the latest gemm. ( #250 )
2023-07-26 17:00:02 +01:00
Laurent Mazare
d9f9c859af
Add flash attention ( #241 )
...
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files to a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
Laurent Mazare
5a26cba733
Re-organize the wasm examples ( #231 )
...
* Move the whisper example.
* More renaming.
* Add llama2 as a new wasm example.
* Live generation.
* More of the llama wasm example.
* Formatting.
2023-07-24 12:36:02 +01:00
Laurent Mazare
dc416243a3
Bump the hf-hub dependency to 0.1.3. ( #206 )
2023-07-20 07:27:52 +01:00
Laurent Mazare
c34f932319
Fix the mkl build. ( #204 )
...
* Fix the mkl build.
* Fix the build properly.
2023-07-19 19:41:11 +01:00
Nicolas Patry
439321745a
Removing `candle-hub` internal to extract into `hf-hub` standalone.
2023-07-19 15:04:38 +02:00
Laurent Mazare
b8abe2bb4b
Factorize the tokenizers version in the workspace cargo def. ( #186 )
2023-07-18 06:48:13 +01:00
Laurent Mazare
f0cccd08f0
Bert tracing ( #184 )
...
* Add some tracing to bert.
* More tracing.
* Add a flag for tracing.
2023-07-17 19:40:42 +01:00
Laurent Mazare
49ea09c73c
Gemm update ( #183 )
...
* Update the gemm dependency.
* Update the comment too.
* Pin the sha256 dependency.
2023-07-17 14:05:39 +01:00
Laurent Mazare
104f89df31
Centralize the dependency versions and inherit them. ( #177 )
2023-07-16 07:47:17 +01:00
Laurent Mazare
d1f5d44c04
Reenable pyo3 in the workspace list ( #170 )
...
* Enable pyo3 back.
* Adapt the CI.
2023-07-14 19:54:38 +01:00
Nicolas Patry
4ed56d7861
Removing cuda default.
...
Seems very important for a lot of exploring users usually on laptop
without GPUs.
Adding more README instructions in a follow up.
2023-07-14 16:52:15 +02:00
Laurent Mazare
88f666781f
Wasm proof of concept. ( #167 )
...
* Wasm proof of concept.
* Run whisper inference in the browser.
* Some fixes.
* Move the wasm example.
* Change the tokenizer config.
2023-07-14 14:51:46 +01:00
Laurent Mazare
21aa29ddce
Use a rwlock for inner mutability. ( #156 )
...
* Use a rw-lock.
* Make clippy happier.
2023-07-13 11:25:24 +01:00
Laurent Mazare
50b0946a2d
Tensor mutability ( #154 )
...
* Working towards tensor mutability.
* Use a ref-cell to provide tensor mutability.
2023-07-13 11:04:40 +01:00
Laurent Mazare
ba35d895e7
Sketch the candle-transformers crate. ( #147 )
...
* Sketch the candle-transformers crate.
* Format the empty files.
2023-07-12 13:49:31 +01:00
Laurent Mazare
9ce0f1c010
Sketch the candle-nn crate. ( #115 )
...
* Sketch the candle-nn crate.
* Tweak the cuda dependencies.
* More cuda tweaks.
2023-07-10 08:50:09 +01:00
Laurent Mazare
4afa461b34
Sketch the Falcon model. ( #93 )
...
* Sketch the Falcon model.
* Add more substance to the falcon example.
* Falcon (wip).
* Falcon (wip again).
* Falcon inference.
* Get the weights from the api and properly generate the model.
* Use the proper model.
* Fix the file/revision names.
* Fix bias handling.
* Recompute the rot embeddings.
* Fix the input shape.
* Add the release-with-debug profile.
* Silly bugfix.
* More bugfixes.
* Stricter shape checking in matmul.
2023-07-06 19:01:21 +01:00
laurent
fdb1acd2ff
Move llama in a cargo-examples directory.
2023-07-03 11:30:58 +01:00
laurent
ebb0fedf14
Very simple pyo3 bindings for candle.
2023-07-01 20:36:44 +01:00
laurent
af66f0829e
Revert the new profile.
2023-06-29 19:08:50 +01:00
laurent
3232df9458
Add some KV cache to llama.
2023-06-29 15:29:40 +01:00
Nicolas Patry
1a82bc50c9
[Tmp] Adding candle-hub
2023-06-27 13:58:23 +02:00
Nicolas Patry
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00
laurent
22da2c7e02
More f16 and bf16 support.
2023-06-26 20:52:01 +01:00
laurent
a31411fd91
Start adding f16/bf16 support.
2023-06-26 19:37:47 +01:00
laurent
11696e6377
Faster model weight loading.
2023-06-26 07:40:11 +01:00
laurent
96c098b6cd
Remove the unnecessary features.
2023-06-24 18:15:44 +01:00
laurent
a7f80e258f
Read and write npy files.
2023-06-24 18:12:10 +01:00
Nicolas Patry
04cf14f35a
Moving to `gemm` and adding matmul backprop.
...
- Tentative `T` operator.
2023-06-22 12:37:02 +02:00
Nicolas Patry
9ea220fc6e
Fixing tokenizers dep.
2023-06-22 12:25:58 +02:00
Nicolas Patry
ce977b489e
Adding matmul?
2023-06-22 12:25:58 +02:00
laurent
083ced4428
Integrate the kernels bits.
2023-06-22 09:59:00 +01:00
laurent
7adffafeda
Abstract the gradient storage.
2023-06-21 14:29:48 +01:00
laurent
9698211d56
Add some very basic tensor type.
2023-06-19 17:26:50 +01:00