Default Branch

d01207dbf3 · Add a RotatingKVCache. (#2493) · Updated 2024-09-23 19:14:32 +08:00

Branches

fd3b53f48b · Fix for the quantized model. · Updated 2024-09-25 18:34:46 +08:00 · 0 behind, 5 ahead

a0ebe70429 · Deploy d01207dbf3 to gh-pages · Updated 2024-09-23 19:14:44 +08:00 · 2165 behind, 1 ahead

42c702a023 · Update cudarc to 0.12.1. · Updated 2024-09-23 02:16:57 +08:00 · 3 behind, 11 ahead

7ec4f64d38 · Attempt at fixing M1/M2 metal async copy bug · Updated 2024-09-06 21:59:35 +08:00 · 51 behind, 10 ahead

afa1ea118c · Merge branch 'main' into ivarflakstad/metal-reduce-3 · Updated 2024-09-03 02:47:43 +08:00 · 19 behind, 29 ahead

9105aa4390 · batched gemm work · Updated 2024-07-27 00:53:58 +08:00 · 84 behind, 5 ahead

56a1b7d97e · Apply rustfmt. · Updated 2024-06-05 04:47:20 +08:00 · 104 behind, 18 ahead

84cd5158ad · Update gemm requirement from 0.17.0 to 0.18.0 · Updated 2024-06-01 14:19:34 +08:00 · 110 behind, 1 ahead

a394dfe4c1 · Update imageproc requirement from 0.24.0 to 0.25.0 · Updated 2024-05-22 03:49:19 +08:00 · 123 behind, 1 ahead

567247fdcf · Update metal requirement from 0.27.0 to 0.28.0 · Updated 2024-05-22 03:45:53 +08:00 · 124 behind, 1 ahead

f7980abbcd · Improve the sampling methods. · Updated 2024-05-04 16:53:30 +08:00 · 138 behind, 1 ahead

6d6d87f8b3 · Use BF16 for llama v3 by default. · Updated 2024-04-19 20:22:01 +08:00 · 176 behind, 1 ahead

3754b834f4 · More prep work for phi. · Updated 2024-04-17 16:23:15 +08:00 · 183 behind, 3 ahead

6e92129f54 · Add missing bfloat unary strided kernels · Updated 2024-04-11 22:20:45 +08:00 · 204 behind, 3 ahead

33c9b66554 · Add the new gemma models. (#2023) · Updated 2024-04-07 03:25:38 +08:00 · 211 behind, 0 ahead · Included

09fafcfa99 · Copy multi metal [do not merge] · Updated 2024-04-06 16:11:16 +08:00 · 214 behind, 1 ahead

8c0db87992 · Avoid using the attn mask when not necessary. · Updated 2024-03-25 01:55:56 +08:00 · 277 behind, 0 ahead · Included

5ac3302fac · Prebuild all our kernels. · Updated 2024-03-18 23:39:38 +08:00 · 386 behind, 1 ahead

53f951f6e2 · Merge remote-tracking branch 'origin/main' into cuda-conv-tr1d · Updated 2024-03-18 04:17:56 +08:00 · 313 behind, 6 ahead

101a4c8389 · Moondream first bits. · Updated 2024-03-18 00:49:56 +08:00 · 315 behind, 1 ahead