vllm/csrc at 7878958c0de9da8c372495dab7d4f25257894c4f - vllm

Woosuk Kwon 6ef00b03a2 Enable CUDA graph for GPTQ & SqueezeLLM (#2318)	2024-01-03 09:52:29 -08:00
attention	[FIX] Support non-zero CUDA devices in custom kernels (#1959)	2024-01-02 19:09:59 -08:00
quantization	Enable CUDA graph for GPTQ & SqueezeLLM (#2318)	2024-01-03 09:52:29 -08:00
activation_kernels.cu	[FIX] Support non-zero CUDA devices in custom kernels (#1959)	2024-01-02 19:09:59 -08:00
cache.h	Avoid multiple redefinition (#1817)	2023-12-14 09:35:58 -08:00
cache_kernels.cu	[FIX] Support non-zero CUDA devices in custom kernels (#1959)	2024-01-02 19:09:59 -08:00
cuda_compat.h	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)	2023-12-07 23:16:52 -08:00
cuda_utils.h	Avoid multiple redefinition (#1817)	2023-12-14 09:35:58 -08:00
cuda_utils_kernels.cu	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)	2023-12-07 23:16:52 -08:00
dispatch_utils.h	Avoid multiple redefinition (#1817)	2023-12-14 09:35:58 -08:00
layernorm_kernels.cu	[FIX] Support non-zero CUDA devices in custom kernels (#1959)	2024-01-02 19:09:59 -08:00
ops.h	Add GPTQ support (#916)	2023-12-15 03:04:22 -08:00
pos_encoding_kernels.cu	[FIX] Support non-zero CUDA devices in custom kernels (#1959)	2024-01-02 19:09:59 -08:00
pybind.cpp	Add GPTQ support (#916)	2023-12-15 03:04:22 -08:00
reduction_utils.cuh	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)	2023-12-07 23:16:52 -08:00