Commit Graph

610 Commits

Author SHA1 Message Date
Nicolas Patry 5ac3302fac Prebuild all our kernels. 2024-03-18 16:39:38 +01:00
OlivierDehaene b60064780d
feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
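The SiLU activation is x * sigmoid(x). Below is a minimal scalar sketch in plain Rust of the forward pass and of a gradient written in terms of the sigmoid of the input. It is not candle's kernel or autograd code, and the function names are illustrative.

```rust
/// Logistic sigmoid: 1 / (1 + e^-x).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// SiLU (also called swish): x * sigmoid(x).
fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

/// Derivative of SiLU, reusing the sigmoid of the input:
/// d/dx [x * s(x)] = s(x) + x * s(x) * (1 - s(x)) = s(x) * (1 + x * (1 - s(x))).
fn silu_grad(x: f64) -> f64 {
    let s = sigmoid(x);
    s * (1.0 + x * (1.0 - s))
}
```

Because the gradient only needs the input and its sigmoid, a backward pass can compute it from the saved argument without storing extra intermediates.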
Laurent Mazare 0de0795220
Qmetal tweaks (#1704)
* Add the dummy qmetal backend.

* Fix the metal compilation.
2024-02-13 18:11:17 +01:00
Nicolas Patry c1b418586c
Fixing quantized llama demo on metal. (#1703) 2024-02-13 16:28:56 +01:00
Laurent Mazare ad73e93da2
Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
Laurent Mazare d0aa197b07
ConvTranspose1d cuda support. (#1697)
* ConvTranspose1d cuda support.

* Add the conv-transpose1d kernel.

* Remove some unused variables.
2024-02-12 15:03:18 +01:00
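A transposed 1D convolution scatters each input element, weighted by the kernel, into the output. As a CPU reference for what such a kernel computes, here is a hypothetical single-channel, single-batch sketch. The output-length formula assumes the PyTorch `ConvTranspose1d` convention (stride, padding, output_padding, dilation); it has not been checked against candle's implementation.

```rust
/// Output length of a 1D transposed convolution, assuming the PyTorch convention:
/// (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1.
fn conv_transpose1d_len(
    l_in: usize, k: usize, stride: usize, padding: usize, output_padding: usize, dilation: usize,
) -> usize {
    assert!(l_in > 0 && k > 0, "input and kernel must be non-empty");
    // Positive terms first; assumes the padding is small enough for a positive length.
    (l_in - 1) * stride + dilation * (k - 1) + output_padding + 1 - 2 * padding
}

/// Naive scatter-style transposed convolution for one channel and one batch item.
fn conv_transpose1d(
    input: &[f32], kernel: &[f32], stride: usize, padding: usize, output_padding: usize, dilation: usize,
) -> Vec<f32> {
    let l_out = conv_transpose1d_len(input.len(), kernel.len(), stride, padding, output_padding, dilation);
    let mut out = vec![0.0f32; l_out];
    for (i, &x) in input.iter().enumerate() {
        for (k, &w) in kernel.iter().enumerate() {
            // Position in the unpadded output; padding trims `padding` entries on the left.
            let pos = i * stride + k * dilation;
            if pos >= padding && pos - padding < l_out {
                out[pos - padding] += x * w;
            }
        }
    }
    out
}
```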
Laurent Mazare 274bf11633
Support defaultdict in PyTorch checkpoints. (#1696)
* Support defaultdict in PyTorch checkpoints.

* Fix clippy lint.
2024-02-12 10:26:56 +01:00
Laurent Mazare cdc3823d8f
Pickle support: dig within the _rebuild_parameter calls. (#1681) 2024-02-08 13:09:49 +01:00
Dilshod Tadjibaev e5eb9602d0
Add support for loading Fortran contiguous tensors (#1672)
* Add support for loading Fortran contiguous tensors

This commit introduces the ability to handle Fortran contiguous tensors in the tensor loading process. Previously, the code only supported loading tensors that were contiguous in memory, failing with an error for non-contiguous tensors. With this update, tensors identified as Fortran contiguous (column-major order) are now correctly handled by reversing their dimensions after loading. This enhancement ensures broader compatibility with different tensor layouts, improving the robustness of tensor loading operations.

- Check if a tensor is Fortran contiguous using the `is_fortran_contiguous` flag.
- For Fortran contiguous tensors, reverse the dimensions after loading to correctly represent their layout in memory.
- Continue to bail out with an error for tensors that are neither C contiguous nor Fortran contiguous, maintaining the previous behavior for non-contiguous tensors without explicit support.

This change addresses the issue of loading Fortran contiguous tensors, which was previously unsupported, thereby extending the functionality of the tensor loading mechanism to accommodate a wider variety of tensor layouts.

* Add reshape step to handle fortran contiguous case

* Skip fortran contiguous fix if rank is < 2

* Fail on rank 0, 1 if contiguous
2024-02-07 21:49:59 +01:00
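The dimension-reversal trick works because a Fortran-ordered buffer of shape [d0, ..., dn] has the same memory layout as a C-ordered buffer of shape [dn, ..., d0]. Reversing the axes of that view yields the logical tensor. A minimal std-only sketch (the helper name is hypothetical) that materialises the equivalent C-ordered data by gathering through column-major strides:

```rust
/// Convert a column-major (Fortran-order) buffer into row-major (C-order) data.
/// Equivalent to viewing `data` with the reversed shape in C order and then
/// reversing all of its axes.
fn fortran_to_c_order<T: Copy>(data: &[T], shape: &[usize]) -> Vec<T> {
    let numel: usize = shape.iter().product();
    assert_eq!(data.len(), numel, "buffer length must match the shape");
    let rank = shape.len();
    // Column-major strides: the first axis varies fastest.
    let mut f_strides = vec![1usize; rank];
    for d in 1..rank {
        f_strides[d] = f_strides[d - 1] * shape[d - 1];
    }
    let mut out = Vec::with_capacity(numel);
    let mut idx = vec![0usize; rank];
    for _ in 0..numel {
        // Offset of the current multi-index inside the Fortran-ordered buffer.
        let off: usize = idx.iter().zip(&f_strides).map(|(i, s)| i * s).sum();
        out.push(data[off]);
        // Advance the multi-index in C order (last axis fastest).
        for d in (0..rank).rev() {
            idx[d] += 1;
            if idx[d] < shape[d] {
                break;
            }
            idx[d] = 0;
        }
    }
    out
}
```

For example, the 2x3 matrix [[1, 2, 3], [4, 5, 6]] is stored in Fortran order as [1, 4, 2, 5, 3, 6]; the helper turns that back into [1, 2, 3, 4, 5, 6].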
Dilshod Tadjibaev b75e8945bc
Enhance pickle to retrieve state_dict with a given key (#1671) 2024-02-06 21:17:33 +01:00
Laurent Mazare adfae2460a
Fix rustfmt. (#1669) 2024-02-06 12:06:06 +01:00
Laurent Mazare b545f54a19
Fix clippy lints. (#1667) 2024-02-06 09:03:36 +01:00
Roma Klapaukh 1ba11f22d6
Fix: pth files don't load on Windows (#1661)
* Don't treat zip path as OS path

* Add a test case

* Add code to generate test pth data
2024-02-06 08:50:55 +01:00
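Zip entry names always use `/` as the separator, so building them with `std::path::Path::join` produces `\` on Windows and the entry lookup fails. A minimal sketch of the string-based alternative; the helper name and the `archive/...` entry names are illustrative assumptions, not candle's code:

```rust
/// Build the name of an entry inside a zip archive such as a `.pth` checkpoint.
/// Plain string formatting keeps the '/' separator that the zip format requires,
/// whatever the host OS, instead of the platform separator used by `Path::join`.
fn zip_entry_name(dir: &str, file: &str) -> String {
    format!("{}/{}", dir.trim_end_matches('/'), file)
}
```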
Jiayu Liu 982722019b
add roll function to tensor (#1666) 2024-02-06 08:49:45 +01:00
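Assuming the op follows the usual `torch.roll` semantics (elements shifted along a dimension with wrap-around, negative shifts rolling the other way), here is a 1D std-only sketch. The function is hypothetical and is not candle's `Tensor` API.

```rust
/// Roll a slice by `shift` positions with wrap-around: the element at index `i`
/// moves to index `(i + shift) mod n`. Negative shifts roll to the left.
fn roll<T: Clone>(xs: &[T], shift: i64) -> Vec<T> {
    let n = xs.len();
    if n == 0 {
        return Vec::new();
    }
    // Normalise the shift into 0..n so that negative values work too.
    let s = shift.rem_euclid(n as i64) as usize;
    // The last `s` elements wrap around to the front.
    xs[n - s..].iter().chain(xs[..n - s].iter()).cloned().collect()
}
```

On a multi-dimensional tensor the same idea applies along one dimension: narrow the tensor into the last `s` slices and the rest, then concatenate them in swapped order.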
Ivar Flakstad db923517b3 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-17 18:03:57 +01:00
Nicolas Patry 403680f17d
Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create a new QMetal storage that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implementations.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously; the new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
Ivar Flakstad 86a8e58897 Update metal random kernel and set_seed method
* set_seed via buffer content pointer copy + did_modify_range

* ensure random.metal kernel does not write outside of buffer range when tid==0
2024-01-17 09:12:44 +01:00
Ivar Flakstad 79478ff5a1 Seed should be updated by random kernel result. 2024-01-15 11:58:25 +01:00
Laurent Mazare bdd8107fda
Expose the ndarray trait. (#1586) 2024-01-14 20:09:49 +01:00
Ivar Flakstad ecf88a6d38 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-14 17:10:54 +01:00
Laurent Mazare e6d86b0819
Add the pow operator. (#1583)
* Add the pow operator.

* Support the pow operation in onnx.
2024-01-13 20:24:06 +01:00
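One common way to build an element-wise pow from existing ops is exp(rhs * ln(lhs)). Whether candle uses exactly this formulation is an assumption here. A plain-Rust sketch with a hypothetical function name, valid only for strictly positive bases:

```rust
/// Element-wise power computed as exp(rhs * ln(lhs)).
/// Only meaningful for strictly positive bases: ln of a negative number is NaN.
fn pow_elementwise(lhs: &[f64], rhs: &[f64]) -> Vec<f64> {
    assert_eq!(lhs.len(), rhs.len(), "operands must have the same length");
    lhs.iter().zip(rhs).map(|(&x, &y)| (y * x.ln()).exp()).collect()
}
```

Composing pow from `ln`, `mul` and `exp` means its gradient follows from the chain rule over ops that already have backward passes.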
Nicolas Patry bafe95b660
Fix format. (#1576) 2024-01-12 14:23:17 +01:00
ivarflakstad a3d92ab226
Metal: Activate bfloat affine and add benchmark (#1543)
* Use cfg to separate benchmark results based on features

* Add bfloat affine and benchmarks

* Fix flops calculation

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
ivarflakstad e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features

* Add metal where_cond for f16 and bf16. Add benchmark

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Updated feature separated benchmarks

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
Ivar Flakstad e63bb8661b Merge branch 'main' into ivarflakstad/metal-prng 2024-01-12 07:19:58 +01:00
Laurent Mazare 41915184bb
Bugfix for dequantizing q5k layers. (#1569) 2024-01-11 23:15:11 +01:00
Kyle McCarthy 402349d120
feat(bf16): add cast support + tests for cast + bin ops (#1524) 2024-01-11 15:49:13 +01:00
ivarflakstad 9f0c99f0c1
Separate benchmarks by enabled features (#1538)
* Use cfg to separate benchmark results based on features

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Derive bench_name from actual device

* Run CPU benchmarks even when GPU feature is enabled

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
Laurent Mazare 0fc95c9f0c
Add a dequantize command to tensor-tools. (#1565)
* Add a dequantize command to tensor-tools.

* Clippy fixes.
2024-01-11 11:21:01 +01:00
Juarez Bochi ae06cb74bb
Add relu kernel for metal (#1488)
* Add relu kernel for metal

* Copy error messages proposed in #1491

* Revert non relu changes

* Fix name changes

* Fix the last of us (:

* Fix copy and paste mistakes

* Fix typo

* Revert order changes

* Revert order change

* Add deleted functions back

* Run rustfmt
2024-01-10 18:27:17 +01:00
Ivar Flakstad 87efb5d8eb Updated feature separated benchmarks 2024-01-09 19:04:31 +01:00
Ivar Flakstad ad181f9cdc Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng 2024-01-09 18:55:40 +01:00
Ivar Flakstad 88945f2c22 Improve benchmarks layout 2024-01-09 18:31:28 +01:00
Laurent Mazare 12b2a337f3
Handle start-offset when loading a tensor from a pickle file. (#1546) 2024-01-08 09:20:48 +01:00
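In PyTorch checkpoints a tensor can begin partway into a shared storage, at `storage_offset` elements from its start. A minimal sketch (hypothetical helper, assuming a C-contiguous tensor and a raw byte buffer) of slicing out the right bytes:

```rust
/// Return the bytes of a C-contiguous tensor that starts `storage_offset`
/// elements into `storage`, or `None` if the range falls outside the buffer.
fn tensor_bytes<'a>(
    storage: &'a [u8], storage_offset: usize, shape: &[usize], elem_size: usize,
) -> Option<&'a [u8]> {
    let numel: usize = shape.iter().product();
    // The offset is counted in elements, so convert it to bytes first.
    let start = storage_offset.checked_mul(elem_size)?;
    let len = numel.checked_mul(elem_size)?;
    let end = start.checked_add(len)?;
    storage.get(start..end)
}
```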
Laurent fb05af4c42 Avoid some unnecessary returns. 2024-01-08 07:19:59 +01:00
Ivar Flakstad ad075a5f7e Remove allow pragma 2024-01-08 06:48:33 +01:00
Laurent Mazare 0eb90ed783
Simpler repro for the neon optimization issue + bugfix (#1544)
* Simpler repro for the neon optimization issue.

* Bugfix for q4k.

* Improve the fix, share the dot-prod bit.

* Clippy fixes.

* Fix for q6k.

* Also fix for q2k.

* Use the new shared dotprod.

* Add more testing.
2024-01-07 20:21:49 +01:00
Ivar Flakstad 3f04a79ada Use cfg to separate benchmark results based on features 2024-01-07 14:40:15 +01:00
Nicolas Patry b4cb982e49
Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
Ivar Flakstad 6ebe043273 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-07 11:52:03 +01:00
Ivar Flakstad 6bf52b9fdf Gaussian normal distribution of PRNG via Box-Muller transform 2024-01-07 11:39:46 +01:00
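The Box-Muller transform turns two independent uniform samples into two independent standard normal samples. With r = sqrt(-2 ln u1) and theta = 2 pi u2, the outputs are r cos(theta) and r sin(theta). A CPU sketch in Rust; the Metal kernel itself is not reproduced and the function names are hypothetical.

```rust
/// Box-Muller transform: map u1 in (0, 1] and u2 in [0, 1) to two
/// independent standard normal samples. u1 must be non-zero to avoid ln(0).
fn box_muller(u1: f64, u2: f64) -> (f64, f64) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * std::f64::consts::PI * u2;
    (r * theta.cos(), r * theta.sin())
}

/// Shift and scale a standard normal sample to the requested mean and std.
fn to_normal(z: f64, mean: f64, std: f64) -> f64 {
    mean + std * z
}
```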
Ivar Flakstad 955e63c803 Implement hybrid Tausworthe + LCG pseudo random number generator in metal 2024-01-05 13:27:59 +01:00
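A hybrid Tausworthe + LCG generator XORs the outputs of several Tausworthe generators with a linear congruential generator. The sketch below uses the constants published in GPU Gems 3, chapter 37 (three taus88-style components plus one LCG). These constants have not been checked against candle's `random.metal`, and the type and method names are hypothetical.

```rust
/// One step of a Tausworthe generator with shifts `s1`, `s2`, `s3` and mask `m`.
fn taus_step(z: &mut u32, s1: u32, s2: u32, s3: u32, m: u32) -> u32 {
    let b = ((*z << s1) ^ *z) >> s2;
    *z = ((*z & m) << s3) ^ b;
    *z
}

/// One step of a 32-bit linear congruential generator (arithmetic mod 2^32).
fn lcg_step(z: &mut u32, a: u32, c: u32) -> u32 {
    *z = a.wrapping_mul(*z).wrapping_add(c);
    *z
}

/// Hypothetical generator state: three Tausworthe components and one LCG.
struct HybridTaus {
    z: [u32; 4],
}

impl HybridTaus {
    fn new(seeds: [u32; 4]) -> Self {
        // GPU Gems 3 advises Tausworthe seeds greater than 128.
        assert!(seeds[..3].iter().all(|&s| s > 128), "Tausworthe seeds must be > 128");
        HybridTaus { z: seeds }
    }

    /// Next sample in [0, 1): XOR the four components, then scale by 2^-32.
    /// Dividing in f64 keeps the largest output strictly below 1.0.
    fn next_f64(&mut self) -> f64 {
        let [z1, z2, z3, z4] = &mut self.z;
        let bits = taus_step(z1, 13, 19, 12, 0xFFFF_FFFE)
            ^ taus_step(z2, 2, 25, 4, 0xFFFF_FFF8)
            ^ taus_step(z3, 3, 11, 17, 0xFFFF_FFF0)
            ^ lcg_step(z4, 1_664_525, 1_013_904_223);
        bits as f64 / 4_294_967_296.0
    }
}
```

The appeal on a GPU is that each thread only needs four 32-bit words of state and a handful of shifts, masks and one multiply per sample.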
Nicolas Patry fa3ea98ba9
Adding bfloat16 support for the cast kernels. (#1520) 2024-01-04 12:12:56 +01:00
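bfloat16 keeps the sign and 8-bit exponent of an f32 but only 7 mantissa bits, so an f32 to bf16 cast drops the low 16 bits of the f32 representation. A minimal sketch of the standard round-to-nearest-even conversion in plain Rust (hypothetical helper names; not the code in candle's cast kernels):

```rust
/// Convert an f32 to bfloat16 bits with round-to-nearest-even.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Truncating a NaN could leave an all-zero mantissa (infinity),
        // so force the quiet-NaN bit to keep it a NaN.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// Widen bfloat16 bits back to f32; this direction is exact.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```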
Gonzalo 0a245e6fa4
Metal: support unary abs (#1503)
* Metal: support unary abs

* cargo fmt
2023-12-30 00:00:12 +01:00
Gonzalo 87d7f81b43
Metal: more u8/u32 (#1502)
* Adds more metal u8

* Metal: more u32
2023-12-29 23:56:21 +01:00
Gonzalo 4373534d59
Metal: i64 basic support (#1495)
* Adds basic metal i64 support

* metal copy i64
2023-12-29 19:42:50 +01:00
Nicolas Patry 488e02a3f6
Merge pull request #1496 from bayedieng/unary
Implement urecip op for metal backend
2023-12-29 12:20:52 +01:00
Nicolas Patry f5c98f22c7
Merge pull request #1491 from mimiquate/metal-errors
Improves metal's not implemented error messages
2023-12-29 12:03:40 +01:00
Baye Dieng cc06ba2294 fix bad pattern matching and function name 2023-12-29 09:46:24 +00:00
Baye Dieng 3922b42c18 add urecip op to metal backend 2023-12-28 21:50:12 +00:00