candle

Commit Graph

Author	SHA1	Message	Date
Laurent Mazare	3144150b8d	Move the tensor-tools binary in a separate crate. (#1969 )	2024-03-30 15:49:37 +01:00
Laurent Mazare	efe4a0c84b	Add a print command to tensor-tools. (#1967 ) * Add a print command to tensor-tools. * Add some flags to tweak the formatting.	2024-03-30 11:34:33 +01:00
Laurent Mazare	badf886583	Cuda kernel for dequantizing q8k. (#1760 ) * Cuda kernel for dequantizing q8k. * Clippy lints.	2024-02-26 08:42:44 +01:00
Laurent Mazare	2f22afd80e	Cuda acceleration for quantized model. (#1754 ) * Boilerplate for the quantized cuda support. * More basic cuda support. * More cuda quantization (quantize on cpu for now). * Add the dequantization bit. * Start adding some dedicated cuda kernels from llama.cpp. * Move the kernel code. * Start interfacing with the kernel. * Tweak the kernel launch params. * Bugfix for quantized metal. * Fix some clippy lints. * Tweak the launch parameters. * Tweak cuda basics to perform a quantized matmul. * Perform the dequantization on the cpu + use cublas for matmul. * Add the dequantization kernel. * Test the qmatmul. * More kernels. * Matmul-vec kernel. * Add a couple kernels. * More dequantization kernels.	2024-02-25 18:11:47 +01:00
Dilshod Tadjibaev	b75e8945bc	Enhance pickle to retrieve state_dict with a given key (#1671 )	2024-02-06 21:17:33 +01:00
Nicolas Patry	403680f17d	Quantized GGUF style (#1523 ) * Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems Q8K metal -> Never implemented in metal Fixing Q2K bug (present in ggml). * Cleanup. * Fix the rebase. * Removing the fences speeds everything up and is correct this time... * Cleanup the fence. * After rebase. * Bad code removal. * Rebase after phi2 merge + fix replit default to CPU. * Making the CI happy. * More happy tests. --------- Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>	2024-01-17 10:27:58 +01:00
Laurent Mazare	0fc95c9f0c	Add a dequantize command to tensor-tools. (#1565 ) * Add a dequantize command to tensor-tools. * Clippy fixes.	2024-01-11 11:21:01 +01:00
Laurent Mazare	bfa7c8fc01	Implement the module trait directly for QMatMul. (#1372 )	2023-11-25 10:09:45 +00:00
Laurent Mazare	deee7612da	Quantized version of mistral. (#1009 ) * Quantized version of mistral. * Integrate the quantized mistral variant. * Use the quantized weight files. * Tweak the quantization command. * Fix the dtype when computing the rotary embeddings. * Update the readme with the quantized version. * Fix the decoding of the remaining tokens.	2023-09-30 18:25:47 +01:00
Laurent Mazare	ccf352f3d1	Use yoke to provide a self-referential container for mmaped safetenso… (#939 ) * Use yoke to provide a self-referential container for mmaped safetensor files. * Add the new self-owned type for safetensor files without removing the previous version. * Add routing. * Add an initializer for the case of multiple files.	2023-09-23 15:43:11 +01:00
Laurent Mazare	912a3d63b0	Use the proper block size for quantizing models. (#933 ) * Use the proper block size for quantizing models. * Use the proper dimension.	2023-09-22 21:36:56 +01:00
Laurent Mazare	3b557765e8	T5 quantized example (#922 ) * Load gguf files for the quantized t5. * Add the quantized t5 example. * Allow for loading local files. * Add some support for quantizing safetensor files. * Transpose before quantizing. * Quantized t5. * Retrieve the weights from the hub.	2023-09-21 12:33:15 +01:00
Laurent Mazare	1c9e5394a5	Add a custom softmax implementation. (#744 ) * Add a custom softmax implementation. * Add softmaxlastdim to the benchmarks. * And add a test. * Support more dtypes. * Polish the code. * Use the slow implementation on cuda. * Add a todo for the cuda kernel.	2023-09-05 14:20:23 +01:00
Laurent Mazare	a044907ffc	Dilated convolutions (#657 ) * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function.	2023-08-29 16:12:11 +01:00
Laurent Mazare	be471d50ab	Llama quantization. (#625 )	2023-08-27 14:08:15 +01:00
Laurent Mazare	7151f2cf63	Add the quantize command. (#624 ) * Add the quantize command. * Bugfix for writing gguf files. * And add a comment.	2023-08-27 11:35:19 +01:00
Laurent Mazare	2cde0cb74b	More pickle support. (#588 ) * More pickle support. * Be more verbose.	2023-08-24 18:45:10 +01:00
Laurent Mazare	ca318a6ec7	Add to the cuda example a reproduction of the issue. (#579 ) * Add to the cuda example a reproduction of the issue. * Tweak. * Add a test using non-square matrixes. * Fix the conv2d kernel. * Display the error. * And tweak the comment.	2023-08-24 12:07:31 +01:00
Laurent Mazare	dd64465899	Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578 ) * Add a test for conv2d with padding. * Cosmetic changes. * Bugfix the rand function on the cuda backend.	2023-08-24 10:16:37 +01:00
Laurent Mazare	aba1e90797	Add some group parameter to convolutions. (#566 ) * Add some group parameter to convolutions. * Avoid some unnecessary groups checks. * Move the tensor convolution bits. * Proper handling of groups. * Bump the crate version. * And add a changelog.	2023-08-23 12:58:55 +01:00
Laurent Mazare	0764741cc4	Handle GGUF files in tensor-tools. (#558 )	2023-08-23 06:32:07 +01:00
Laurent Mazare	551409092e	Small tweaks to tensor-tools. (#517 )	2023-08-19 16:50:26 +01:00
Laurent Mazare	6431140250	Retrieve tensor data from PyTorch files. (#516 )	2023-08-19 15:57:18 +01:00
Laurent Mazare	607ffb9f1e	Retrieve more information from PyTorch checkpoints. (#515 ) * Retrieve more information from PyTorch checkpoints. * Add enough support to load dino-v2 backbone weights.	2023-08-19 15:05:34 +01:00
Laurent Mazare	f861a9df6e	Add ggml support to tensor-tools (#512 ) * Pickle work-in-progress. * More unpickling. * More pickling. * Proper handling of setitems. * Clippy. * Again more pickling. * Restore the example. * Add enough pickle support to get the list of tensors. * Read the data from zip files. * Retrieve the tensor shape. * Extract the size and dtype. * More storage types. * Improve the destructuring. * Also support ggml files.	2023-08-19 11:45:22 +01:00
Laurent Mazare	ad33715c61	Preliminary support for importing PyTorch weights. (#511 ) * Pickle work-in-progress. * More unpickling. * More pickling. * Proper handling of setitems. * Clippy. * Again more pickling. * Restore the example. * Add enough pickle support to get the list of tensors. * Read the data from zip files. * Retrieve the tensor shape. * Extract the size and dtype. * More storage types. * Improve the destructuring.	2023-08-19 11:26:32 +01:00
Laurent Mazare	90ff04e77e	Add the tensor-tools binary. (#510 )	2023-08-19 09:06:44 +01:00
Laurent Mazare	a22b1bed7b	Tensor -> QTensor conversion (#496 ) * Sketch some qmatmul test. * Add the quantization function. * More testing. * Make the test smaller and faster. * Add some shape checking.	2023-08-18 08:19:20 +01:00
Laurent Mazare	306c8eee7a	AVX version of the vecdot for q4_0. (#474 ) * AVX version of the vecdot for q4_0. * Tweak the avx bits. * Add a qmatmul benchmark. * Fix the quantized test.	2023-08-17 07:03:32 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445 ) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	d379a76a9e	Add a softmax bench. (#433 ) * Add a softmax bench. * Add the vectorized sum reduce.	2023-08-13 20:09:18 +01:00
Laurent Mazare	5a63b51f14	Add a matmul benchmark. (#429 )	2023-08-13 13:41:03 +01:00
Laurent Mazare	9aca398a4f	More accelerate optimizations (#427 ) * Add more tracing to the whisper example. * Support accelerate in more examples. * Use accelerate for pointwise functions. * Use accelerate for binary operations too. * Bugfix for binary operation: use the rhs before the lhs.	2023-08-13 12:53:34 +01:00
Laurent Mazare	ff53f38467	Small example for benchmarking some cpu ops (#394 ) * Refactor the benchmark example. * Rename the example. * Add some comments.	2023-08-10 17:00:17 +01:00
Laurent Mazare	fcfdcbd337	Add a conv1d benchmark based on the whisper sizes. (#377 ) * Add a conv1d benchmark based on the whisper sizes. * Enforce the batch-dim in conv1d.	2023-08-09 20:27:03 +01:00
Laurent Mazare	608b2358c6	Add some conv1d test + bugfix using padding. (#349 )	2023-08-08 20:50:20 +01:00
Laurent Mazare	b278834267	Support the Accelerate BLAS on macOS. (#325 ) * Add the accelerate feature. * Ffi tweaks.	2023-08-05 17:25:24 +01:00
Laurent Mazare	51e51da896	Rename the candle crate to candle-core (#301 ) * Rename to candle-core. * More candle-core renaming.	2023-08-02 08:20:22 +01:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255 ) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	a2f72edc0d	Simplify the parameters used by sum and sum_keepdim. (#165 )	2023-07-14 08:22:08 +01:00
Laurent Mazare	2bfa791336	Use the same default as pytorch for sum. (#164 )	2023-07-13 21:32:32 +01:00
Laurent Mazare	e676f85f00	Sketch a fast cuda kernel for reduce-sum. (#109 ) * Sketch a fast cuda kernel for reduce-sum. * Sketch the rust support code for the fast sum kernel. * More work on the fast kernel. * Add some testing ground. * A couple fixes for the fast sum kernel.	2023-07-08 12:43:56 +01:00
Laurent Mazare	33479c5f1b	Add some very simple sum benchmark. (#108 ) * Add some very simple sum benchmark. * Rename the file.	2023-07-08 08:39:27 +01:00
Laurent Mazare	c297a50960	Add mkl support for matrix multiply. (#86 ) * Fix some rebase issues. * Use mkl instead. * Use mkl in bert. * Add the optional mkl feature. * Conditional compilation based on the mkl feature. * Add more mkl support.	2023-07-06 11:05:05 +01:00
laurent	fdb1acd2ff	Move llama in a cargo-examples directory.	2023-07-03 11:30:58 +01:00
Nicolas Patry	81cec86e75	Adding a bit more docs around safety.	2023-07-03 11:55:54 +02:00
laurent	783b7054ee	Move more safetensors bits to the shared module.	2023-07-03 09:34:08 +01:00
laurent	cf2789fb81	Move some safetensors bits in the candle-core crate.	2023-07-03 08:37:46 +01:00
laurent	7c65e2d187	Add a flag for custom prompt.	2023-07-01 06:36:22 +01:00
laurent	679b6987b6	Early conversion for the llama weights.	2023-06-30 16:42:53 +01:00

72 Commits