vllm/examples
Latest commit: Peter Salas · 57792ed469 · [Doc] Fix incorrect docs from #7615 (#7788) · 2024-08-22 10:02:06 -07:00
fp8 Update README.md (#6847) 2024-07-27 00:26:45 +00:00
production_monitoring [CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266) 2024-08-20 10:02:21 -07:00
api_client.py [bugfix] make args.stream work (#6831) 2024-07-27 09:07:02 +00:00
aqlm_example.py [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
cpu_offload.py [core][model] yet another cpu offload implementation (#6496) 2024-07-17 20:54:35 -07:00
gguf_inference.py [Core] Support loading GGUF model (#5191) 2024-08-05 17:54:23 -06:00
gradio_openai_chatbot_webserver.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
gradio_webserver.py Remove deprecated parameter: concurrency_count (#2315) 2024-01-03 09:56:21 -08:00
llm_engine_example.py [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
logging_configuration.md [Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431) 2024-07-17 07:43:21 +00:00
lora_with_quantization_inference.py [Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) 2024-06-01 14:51:10 -06:00
multilora_inference.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
offline_inference.py [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
offline_inference_arctic.py [Model] Snowflake arctic model implementation (#4652) 2024-05-09 22:37:14 +00:00
offline_inference_audio_language.py [Doc] Fix incorrect docs from #7615 (#7788) 2024-08-22 10:02:06 -07:00
offline_inference_chat.py Chat method for offline llm (#5049) 2024-08-15 19:41:34 -07:00
offline_inference_distributed.py [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
offline_inference_embedding.py [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
offline_inference_encoder_decoder.py [Core] Support serving encoder/decoder models (#7258) 2024-08-09 10:39:41 +08:00
offline_inference_mlpspeculator.py [BugFix] Fix cuda graph for MLPSpeculator (#5875) 2024-06-27 04:12:10 +00:00
offline_inference_neuron.py Unmark more files as executable (#5962) 2024-06-28 17:34:56 -04:00
offline_inference_openai.md [Frontend] Support embeddings in the run_batch API (#7132) 2024-08-09 09:48:21 -07:00
offline_inference_tpu.py [CI/Build][TPU] Add TPU CI test (#6277) 2024-07-15 14:31:16 -07:00
offline_inference_vision_language.py [VLM][Doc] Add `stop_token_ids` to InternVL example (#7354) 2024-08-09 14:51:04 +00:00
offline_inference_with_prefix.py [Bugfix] Add warmup for prefix caching example (#5235) 2024-06-03 19:36:41 -07:00
openai_audio_api_client.py [Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
openai_chat_completion_client.py Add example scripts to documentation (#4225) 2024-04-22 16:36:54 +00:00
openai_completion_client.py lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
openai_embedding_client.py [Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755) 2024-07-24 22:48:07 -07:00
openai_example_batch.jsonl [docs] Fix typo in examples filename openi -> openai (#4864) 2024-05-17 00:42:17 +09:00
openai_vision_api_client.py [Model] Initialize support for InternVL2 series models (#6514) 2024-07-29 10:16:30 +00:00
run_cluster.sh [doc][distributed] doc for setting up multi-node environment (#6529) 2024-07-22 21:22:09 -07:00
save_sharded_state.py [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00
template_alpaca.jinja Support chat template and `echo` for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_baichuan.jinja Fix Baichuan chat template (#3340) 2024-03-15 21:02:12 -07:00
template_blip2.jinja [Model] Initial support for BLIP-2 (#5920) 2024-07-27 11:53:07 +00:00
template_chatglm.jinja Add chat templates for ChatGLM (#3418) 2024-03-14 23:19:22 -07:00
template_chatglm2.jinja Add chat templates for ChatGLM (#3418) 2024-03-14 23:19:22 -07:00
template_chatml.jinja Support chat template and `echo` for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_falcon.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_falcon_180b.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_inkbot.jinja Support chat template and `echo` for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_llava.jinja [Frontend] Add OpenAI Vision API Support (#5237) 2024-06-07 11:23:32 -07:00
tensorize_vllm_model.py [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) 2024-06-20 17:00:13 -06:00