- bugfix
  - helper: check for empty generated_elem
  - fix Python input memory leak
  - avoid async copy of Python inputs
  - fix bug caused by inconsistent definition of RequestHandle
- engine
  - worker, model: rename EnqueueRequest -> StartRequestImpl
  - generation: output token_logprobs
- helper
  - add default config
  - add ConfigManager to merge and check user config
  - use torch-related APIs only within the helper class
  - release torch model after conversion
- examples
  - cpp: erase screen before reading inputs
  - py: shut down executor after finishing tasks
  - py: use Jinja template to format prompt
  - py: update ipynb basic example and corresponding doc
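The executor shutdown mentioned above can be sketched as follows; `generate` is a hypothetical stand-in for the real generation call, not the project's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt):
    # Placeholder for a real generation call.
    return prompt.upper()

prompts = ["hello", "world"]
executor = ThreadPoolExecutor(max_workers=2)
futures = [executor.submit(generate, p) for p in prompts]

# shutdown(wait=True) blocks until all submitted tasks finish,
# so every result is complete before the program exits.
executor.shutdown(wait=True)
results = [f.result() for f in futures]
print(results)  # ['HELLO', 'WORLD']
```

Calling `shutdown(wait=True)` explicitly (rather than relying on interpreter teardown) guarantees in-flight tasks complete before the process exits.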
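A minimal sketch of Jinja-based prompt formatting; the template string here is hypothetical and only illustrates the approach, not the template the example actually ships:

```python
from jinja2 import Template

# Hypothetical chat template; the real one may use different role markers.
PROMPT_TEMPLATE = Template(
    "{% for m in messages %}<|{{ m.role }}|>{{ m.content }}\n{% endfor %}<|assistant|>"
)

def format_prompt(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    return PROMPT_TEMPLATE.render(messages=messages)

prompt = format_prompt([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
])
print(prompt)
```

Keeping the template in one place makes it easy to swap per-model prompt formats without touching the generation loop.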
- doc
  - add model_type to root README
  - update ModelScope notebook picture and doc
  - update future plan in root README
- cpp example: in-place print
- helper: add input/output length to Request class; set env before running lscpu
- arm dockerfile: don't pin torch version
- add scripts to automatically build and test multiple wheels
- add basic_example_qwen_v10.ipynb
- update doc:
  - git lfs pull
  - update description of the Conan package
  - add description of the model format
  - add description of the cxx11 ABI option
- dockerfile:
  - arm: update Arm compiler 22.1 -> 24.04
  - move model Python dependencies from Dockerfile to requirements.txt
- conan:
  - remove xz_utils
  - arm: force-build the Conan package
- others:
  - cpp example: use medium matmul precision by default
  - allow mpirun without plm_rsh_agent