dash-infer

Commit Graph

Author	SHA1	Message	Date
yejunjin	5ebd2c01a3	fix: update README since we support 32k context length (#12)	2024-06-06 19:04:19 +08:00
yejunjin	82c03732fd	fix: change to size_t to avoid overflow when seq is long (#11)	2024-06-06 18:45:20 +08:00
laiwen	f940c2cf01	support glm-4-9b-chat (#10)	2024-06-05 19:28:59 +08:00
Yifei LI	9d50215832	Merge pull request #8 from laiwenzh/dev examples: update qwen prompt template, add print func to examples	2024-05-30 15:24:52 +08:00
zhenglaiwen.zlw	577e95c51c	examples: update qwen prompt template, add print func to examples	2024-05-30 15:09:55 +08:00
laiwen	5657453f2c	Merge pull request #7 from leefige/fix-cache-mode fix: remove currently unsupported cache mode	2024-05-30 14:57:43 +08:00
Yifei Li	73e7f827be	fix: remove currently unsupported cache mode.	2024-05-30 11:35:58 +08:00
zhenglaiwen.zlw	1b9b010ced	support Qwen2, change dashinfer model extensions - support Qwen2, add model_type Qwen_v20 - change dashinfer model extensions (asgraph, asparam -> dimodel, ditensors) - remove xxx_quantize.json config file, use command line arg instead	2024-05-29 10:13:16 +08:00
Yingda Chen	add989c267	Update README.md	2024-05-14 16:50:16 +08:00
zhenglaiwen.zlw	9ef6e3566f	fix memory leak bug, add default config to helper, update convert_model api - bugfix - helper: check if get empty generated_elem - fix python input memory leak - avoid async copy python inputs - fix bug caused by inconsistent definition of RequestHandle - engine - worker, model: EnqueueRequest -> StartRequestImpl - generation: output token_logprobs - helper - add defualt config - add ConfigManager to merge and check user config - use torch related api only within the helper class - release torch model after conversion - examples - cpp: erase screen before get inputs - py: shutdown executor after finishing tasks - py: use jinja template to format prompt - py: update ipynb basic example and corresponding doc - doc - add model_type to root readme - update modelscope notebook pic and doc - update future plan in root readme	2024-05-13 16:12:26 +08:00
zhenglaiwen.zlw	fd2536453c	update examples, dockerfiles, test scripts - cpp example: inplace print - helper: add in/out len to Request class, add env setting before lscpu - arm dockerfile: dont specify torch version - add scripts for automatically building and testing many whl - add basic_example_qwen_v10.ipynb	2024-04-28 14:42:39 +08:00
zhenglaiwen.zlw	b8a6c3da51	add modelscope demo, update doc	2024-04-23 19:56:56 +08:00
zhenglaiwen.zlw	2355f84427	use single-numa examples by default, add EngineHelper to pkg, new dockerfiles - update examples: use single-numa by default - x86 gemm: remove meaningless if-else - add or update dockerfile: arm-centos8, arm/x86 manylinux env, arm/x86 test env - move EngineHelper class to DashInfer python pkg - update doc	2024-04-22 11:28:23 +08:00
zhenglaiwen.zlw	c6e3cb5475	update doc, dockerfiles, compilation scripts, mpirun env setting - update doc: - git lfs pull - update desc of conan pkg - add desc of model fmt - add desc of cxx11 abi option - dockerfile: - arm: update arm compiler 22.1 -> 24.04 - move model python dependencies from dockerfile to requirements.txt - conan: - remove xz_utils - arm: force build conan pkg - others: - cpp example use medium matmul precision by default - allow mpirun without plm_rsh_agent	2024-04-11 14:25:33 +08:00
laiwen	6f448735a6	Merge pull request #1 from modelscope/readme readme: typo fix and other refinements.	2024-04-09 16:57:56 +08:00
Yifei Li	51b83bbad3	readme: typo fix and other refinements.	2024-04-09 16:22:47 +08:00
Yingda Chen	e8bdf16d46	Update README.md	2024-04-08 09:40:38 +08:00
Jiejing Zhang	42f5da7dc8	readme: add pipy link.	2024-04-06 22:28:23 +08:00
Laiwen Zheng	877529e7f0	add source code	2024-04-04 21:50:11 +08:00
Yingda Chen	b4cd6cf7e2	Initial commit	2024-04-01 16:19:50 +08:00

20 Commits

All Branches