Commit Graph

20 Commits

Author SHA1 Message Date
yejunjin 5ebd2c01a3
fix: update README since we support 32k context length (#12) 2024-06-06 19:04:19 +08:00
yejunjin 82c03732fd
fix: change to size_t to avoid overflow when seq is long (#11) 2024-06-06 18:45:20 +08:00
laiwen f940c2cf01
support glm-4-9b-chat (#10) 2024-06-05 19:28:59 +08:00
Yifei LI 9d50215832
Merge pull request #8 from laiwenzh/dev
examples: update qwen prompt template, add print func to examples
2024-05-30 15:24:52 +08:00
zhenglaiwen.zlw 577e95c51c examples: update qwen prompt template, add print func to examples 2024-05-30 15:09:55 +08:00
laiwen 5657453f2c
Merge pull request #7 from leefige/fix-cache-mode
fix: remove currently unsupported cache mode
2024-05-30 14:57:43 +08:00
Yifei Li 73e7f827be
fix: remove currently unsupported cache mode. 2024-05-30 11:35:58 +08:00
zhenglaiwen.zlw 1b9b010ced support Qwen2, change dashinfer model extensions
- support Qwen2, add model_type Qwen_v20
- change dashinfer model extensions (asgraph, asparam -> dimodel, ditensors)
- remove xxx_quantize.json config file, use command line arg instead
2024-05-29 10:13:16 +08:00
Yingda Chen add989c267
Update README.md 2024-05-14 16:50:16 +08:00
zhenglaiwen.zlw 9ef6e3566f fix memory leak bug, add default config to helper, update convert_model api
- bugfix
  - helper: check if get empty generated_elem
  - fix python input memory leak
  - avoid async copy python inputs
  - fix bug caused by inconsistent definition of RequestHandle
- engine
  - worker, model: EnqueueRequest -> StartRequestImpl
  - generation: output token_logprobs
- helper
  - add default config
  - add ConfigManager to merge and check user config
  - use torch related api only within the helper class
  - release torch model after conversion
- examples
  - cpp: erase screen before get inputs
  - py: shutdown executor after finishing tasks
  - py: use jinja template to format prompt
  - py: update ipynb basic example and corresponding doc
- doc
  - add model_type to root readme
  - update modelscope notebook pic and doc
  - update future plan in root readme
2024-05-13 16:12:26 +08:00
zhenglaiwen.zlw fd2536453c update examples, dockerfiles, test scripts
- cpp example: inplace print
- helper: add in/out len to Request class, add env setting before lscpu
- arm dockerfile: don't specify torch version
- add scripts for automatically building and testing many whl
- add basic_example_qwen_v10.ipynb
2024-04-28 14:42:39 +08:00
zhenglaiwen.zlw b8a6c3da51 add modelscope demo, update doc 2024-04-23 19:56:56 +08:00
zhenglaiwen.zlw 2355f84427 use single-numa examples by default, add EngineHelper to pkg, new dockerfiles
- update examples: use single-numa by default
- x86 gemm: remove meaningless if-else
- add or update dockerfile: arm-centos8, arm/x86 manylinux env, arm/x86 test env
- move EngineHelper class to DashInfer python pkg
- update doc
2024-04-22 11:28:23 +08:00
zhenglaiwen.zlw c6e3cb5475 update doc, dockerfiles, compilation scripts, mpirun env setting
- update doc:
  - git lfs pull
  - update desc of conan pkg
  - add desc of model fmt
  - add desc of cxx11 abi option
- dockerfile:
  - arm: update arm compiler 22.1 -> 24.04
  - move model python dependencies from dockerfile to requirements.txt
- conan:
  - remove xz_utils
  - arm: force build conan pkg
- others:
  - cpp example use medium matmul precision by default
  - allow mpirun without plm_rsh_agent
2024-04-11 14:25:33 +08:00
laiwen 6f448735a6
Merge pull request #1 from modelscope/readme
readme: typo fix and other refinements.
2024-04-09 16:57:56 +08:00
Yifei Li 51b83bbad3
readme: typo fix and other refinements. 2024-04-09 16:22:47 +08:00
Yingda Chen e8bdf16d46
Update README.md 2024-04-08 09:40:38 +08:00
Jiejing Zhang 42f5da7dc8 readme: add PyPI link. 2024-04-06 22:28:23 +08:00
Laiwen Zheng 877529e7f0 add source code 2024-04-04 21:50:11 +08:00
Yingda Chen b4cd6cf7e2
Initial commit 2024-04-01 16:19:50 +08:00