[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Authored by Will Eaton on 2024-10-29 18:07:37 -04:00; committed by GitHub
parent 67bdf8e523
commit 882a1ad0de
7 changed files with 456 additions and 28 deletions

docs/source/serving/openai_compatible_server.md

@@ -185,7 +185,9 @@ from HuggingFace; and you can find an example of this in a `tokenizer_config.jso
If your favorite tool-calling model is not supported, please feel free to contribute a parser & tool use chat template!
#### Hermes Models (`hermes`)
All Nous Research Hermes-series models newer than Hermes 2 Pro should be supported.
* `NousResearch/Hermes-2-Pro-*`
* `NousResearch/Hermes-2-Theta-*`
@@ -197,7 +199,9 @@ step in their creation_.
Flags: `--tool-call-parser hermes`
#### Mistral Models (`mistral`)
Supported models:
* `mistralai/Mistral-7B-Instruct-v0.3` (confirmed)
* Additional mistral function-calling models are compatible as well.
@@ -216,7 +220,9 @@ when tools are provided, that results in much better reliability when working wi
Recommended flags: `--tool-call-parser mistral --chat-template examples/tool_chat_template_mistral_parallel.jinja`
#### Llama Models (`llama3_json`)
Supported models:
* `meta-llama/Meta-Llama-3.1-8B-Instruct`
* `meta-llama/Meta-Llama-3.1-70B-Instruct`
@@ -236,7 +242,9 @@ it works better with vLLM.
Recommended flags: `--tool-call-parser llama3_json --chat-template examples/tool_chat_template_llama3_json.jinja`
#### InternLM Models (`internlm`)
Supported models:
* `internlm/internlm2_5-7b-chat` (confirmed)
* Additional internlm2.5 function-calling models are compatible as well.
@@ -246,6 +254,7 @@ Known issues:
Recommended flags: `--tool-call-parser internlm --chat-template examples/tool_chat_template_internlm2_tool.jinja`
#### Jamba Models (`jamba`)
AI21's Jamba-1.5 models are supported.
* `ai21labs/AI21-Jamba-1.5-Mini`
@@ -255,6 +264,16 @@ AI21's Jamba-1.5 models are supported.
Flags: `--tool-call-parser jamba`
#### IBM Granite (`granite-20b-fc`)
Supported models:
* `ibm-granite/granite-20b-functioncalling`
Flags: `--tool-call-parser granite-20b-fc --chat-template examples/tool_chat_template_granite_20b_fc.jinja`
The example chat template deviates slightly from the original on Hugging Face, which is not compatible with vLLM. It blends function-description elements from the Hermes template and follows the same system prompt as the "Response Generation" mode from [the paper](https://arxiv.org/abs/2407.00121). Parallel function calls are supported.
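For example, a minimal client-side sketch against a server launched with the flags above (the `get_weather` tool and the request values are illustrative, not part of this PR):

```python
# Illustrative client call against a server started with:
#   vllm serve ibm-granite/granite-20b-functioncalling \
#       --enable-auto-tool-choice --tool-call-parser granite-20b-fc \
#       --chat-template examples/tool_chat_template_granite_20b_fc.jinja
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for demonstration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="ibm-granite/granite-20b-functioncalling",
    messages=[{"role": "user", "content": "What's the weather in Boston?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```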
### How to write a tool parser plugin
A tool parser plugin is a Python file containing one or more `ToolParser` implementations. You can write a `ToolParser` similar to the `Hermes2ProToolParser` in `vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py`; a minimal skeleton is sketched below.
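As a rough skeleton only — the method signatures mirror the ones used by the parsers in this PR, but the `ExampleToolParser` class name and the registered `example` module name are placeholders:

```python
# example_parser.py -- a hypothetical, minimal plugin skeleton
from typing import Sequence, Union

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                              DeltaMessage,
                                              ExtractedToolCallInformation)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
    ToolParser, ToolParserManager)


@ToolParserManager.register_module("example")
class ExampleToolParser(ToolParser):

    # non-streaming mode: called once with the complete model output
    def extract_tool_calls(
            self, model_output: str,
            request: ChatCompletionRequest) -> ExtractedToolCallInformation:
        # a real parser would detect tool-call markup here; this stub never does
        return ExtractedToolCallInformation(tools_called=False,
                                            tool_calls=[],
                                            content=model_output)

    # streaming mode: called once per generated chunk
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        # a real parser would emit DeltaToolCall deltas here
        return DeltaMessage(content=delta_text)
```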
@@ -312,5 +331,5 @@ Then you can use this plugin in the command line like this.
--tool-parser-plugin <absolute path of the plugin file>
--tool-call-parser example \
--chat-template <your chat template> \
```
```

examples/tool_chat_template_granite_20b_fc.jinja

@@ -0,0 +1,130 @@
{%- macro json_to_python_type(json_spec) %}
{%- set basic_type_map = {
"string": "str",
"number": "float",
"integer": "int",
"boolean": "bool"
} %}
{%- if basic_type_map[json_spec.type] is defined %}
{{- basic_type_map[json_spec.type] }}
{%- elif json_spec.type == "array" %}
{{- "list[" + json_to_python_type(json_spec|items) + "]" }}
{%- elif json_spec.type == "object" %}
{%- if json_spec.additionalProperties is defined %}
{{- "dict[str, " + json_to_python_type(json_spec.additionalProperties) + ']' }}
{%- else %}
{{- "dict" }}
{%- endif %}
{%- elif json_spec.type is iterable %}
{{- "Union[" }}
{%- for t in json_spec.type %}
{{- json_to_python_type({"type": t}) }}
{%- if not loop.last %}
{{- "," }}
{%- endif %}
{%- endfor %}
{{- "]" }}
{%- else %}
{{- "Any" }}
{%- endif %}
{%- endmacro %}
{%- if not full_function_description is defined %}
{%- set full_function_description = false %}
{%- endif %}
{%- macro full_description(tool) %}
{{- tool.name + '(' }}
{%- if tool.parameters is defined %}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- param_name + ": " + json_to_python_type(param_fields) }}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- ")" }}
{%- if tool.return is defined %}
{{- " -> " + json_to_python_type(tool.return) }}
{%- endif %}
{{- " - " + tool.description + "\n\n" }}
{%- if tool.parameters is defined %}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{%- if loop.first %}
{{- " Args:\n" }}
{%- endif %}
{{- " " + param_name + "(" + json_to_python_type(param_fields) + "): " + param_fields.description|trim }}
{%- endfor %}
{%- endif %}
{%- if tool.return is defined and tool.return.description is defined %}
{{- "\n Returns:\n " + tool.return.description }}
{%- endif %}
{{- '"' }}
{%- endmacro %}
{%- macro simple_description(tool) %}
{{- tool.description + '"' }}
{%- endmacro %}
{%- macro function_description(tool) %}
{%- if full_function_description %}
{{- full_description(tool) }}
{%- else %}
{{- simple_description(tool) }}
{%- endif %}
{%- endmacro %}
{%- if messages[0]["role"] == "system" %}
{%- set sys_prompt = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{% set sys_prompt = 'You are a helpful assistant with access to the following function calls. Your task is to understand the given conversation with function calls and responses and generate natural language response as the ASSISTANT to continue the conversation. You may use the following function calls to understand how to respond to the user query.' %}
{%- endif %}
{{ 'SYSTEM: ' + sys_prompt }}
{% if tools is iterable and tools | length > 0 %}
<|function_call_library|>
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- '{"name": "' + tool.name + '", ' }}
{{- '"description": "' + function_description(tool) }}
{{- ', "parameters": ' }}
{%- if not tool.parameters is defined or tool.parameters.properties | length == 0 %}
{{- "{}" }}
{%- else %}
{{- tool.parameters|tojson }}
{%- endif %}
{{- "}" }}
{%- if not loop.last %}
{{- "\n" }}
{%- endif %}
{%- endfor %}
If none of the functions are relevant or the given question lacks the parameters required by the function, please output \"<function_call> {\"name\": \"no_function\", \"arguments\": {}}\".
{%- endif %}
{% for message in loop_messages %}
{% if message['role'] == 'user' %}
{{- '\nUSER: ' + message['content'] }}
{% elif message['role'] == 'assistant' and message.tool_calls is defined %}
{{- '\nASSISTANT:' }}
{% for tc in message.tool_calls %}
{{- '<function_call> ' + {'name': tc.function.name, 'arguments': tc.function.arguments}|tojson }}
{% endfor %}
{{- '<|endoftext|>' }}
{% elif message['role'] == 'assistant' %}
{{- '\nASSISTANT: ' + message['content'] + ' <|endoftext|>' }}
{% elif message['role'] == 'tool' %}
{{- '<function_response> ' + message['content'] }}
{%- else %}
{{- raise_exception("Unexpected combination of role and message content") }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{- '\nASSISTANT: ' }}
{% endif %}
{% endfor %}
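As a quick sanity check, the template can be rendered standalone with `jinja2`. The sketch below assumes it is run from the vLLM repo root; the message and the `get_time` tool are made up for illustration:

```python
# Illustrative: render the template directly to inspect the prompt it builds.
import jinja2


def raise_exception(message):
    # stand-in for the raise_exception helper the serving layer provides
    raise jinja2.exceptions.TemplateError(message)


env = jinja2.Environment()
env.globals["raise_exception"] = raise_exception

with open("examples/tool_chat_template_granite_20b_fc.jinja") as f:
    template = env.from_string(f.read())

prompt = template.render(
    messages=[{"role": "user", "content": "What time is it in Tokyo?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical tool
            "description": "Get the current time for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    add_generation_prompt=True,
)
print(prompt)
```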

tests/tool_use/utils.py

@@ -88,6 +88,18 @@ CONFIGS: Dict[str, ServerConfig] = {
"without calling a tool. DO NOT CALL A TOOL THAT IS IRRELEVANT "
"to the user's question - just respond to it normally."
},
## FIXME: temporarily disabled due to lack of hardware specification
## for individual runs
#"granite20b": {
# "model":
# "ibm-granite/granite-20b-functioncalling",
# "arguments": [
# "--tool-call-parser", "granite-20b-fc", "--chat-template",
# str(VLLM_PATH / "examples/tool_chat_template_granite_20b_fc.jinja")
# ],
# "supports_parallel":
# False,
#},
"internlm": {
"model":
"internlm/internlm2_5-7b-chat",

vllm/entrypoints/openai/tool_parsers/__init__.py

@@ -1,4 +1,5 @@
from .abstract_tool_parser import ToolParser, ToolParserManager
from .granite_20b_fc_tool_parser import Granite20bFCToolParser
from .hermes_tool_parser import Hermes2ProToolParser
from .internlm2_tool_parser import Internlm2ToolParser
from .jamba_tool_parser import JambaToolParser
@@ -6,7 +7,7 @@ from .llama_tool_parser import Llama3JsonToolParser
from .mistral_tool_parser import MistralToolParser
__all__ = [
"ToolParser", "ToolParserManager", "Hermes2ProToolParser",
"MistralToolParser", "Internlm2ToolParser", "Llama3JsonToolParser",
"JambaToolParser"
"ToolParser", "ToolParserManager", "Granite20bFCToolParser",
"Hermes2ProToolParser", "MistralToolParser", "Internlm2ToolParser",
"Llama3JsonToolParser", "JambaToolParser"
]

vllm/entrypoints/openai/tool_parsers/granite_20b_fc_tool_parser.py

@@ -0,0 +1,251 @@
import json
import re
from json import JSONDecoder
from typing import Dict, Sequence, Union
import partial_json_parser
from partial_json_parser.core.options import Allow
from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
DeltaFunctionCall, DeltaMessage,
DeltaToolCall,
ExtractedToolCallInformation,
FunctionCall, ToolCall)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
ToolParser, ToolParserManager)
from vllm.entrypoints.openai.tool_parsers.utils import (consume_space,
find_common_prefix,
is_complete_json,
partial_json_loads)
from vllm.logger import init_logger
from vllm.transformers_utils.tokenizer import AnyTokenizer
from vllm.utils import random_uuid
logger = init_logger(__name__)
@ToolParserManager.register_module("granite-20b-fc")
class Granite20bFCToolParser(ToolParser):
"""
Tool call parser for the granite-20b-functioncalling model, intended
for use with the examples/tool_chat_template_granite_20b_fc.jinja
template.
Used when --enable-auto-tool-choice --tool-call-parser granite-20b-fc
are both set
"""
def __init__(self, tokenizer: AnyTokenizer):
super().__init__(tokenizer)
self.bot_token = "<function_call>"
self.tool_start_token = self.bot_token
self.tool_call_regex = re.compile(r"<function_call>\s*")
def extract_tool_calls(
self, model_output: str,
request: ChatCompletionRequest) -> ExtractedToolCallInformation:
if self.tool_start_token not in model_output:
return ExtractedToolCallInformation(tools_called=False,
tool_calls=[],
content=model_output)
dec = JSONDecoder()
try:
matches = list(self.tool_call_regex.finditer(model_output))
logger.debug("Found %d tool call matches", len(matches))
raw_function_calls = []
for i, match in enumerate(matches):
# position after the <function_call> tag
start_of_json = match.end()
# end_index == the start of the next function call
# (if exists)
next_function_call_start = (matches[i + 1].start()
if i + 1 < len(matches) else None)
raw_function_calls.append(
dec.raw_decode(
model_output[start_of_json:next_function_call_start])
[0])
logger.debug("Extracted %d tool calls", len(raw_function_calls))
tool_calls = [
ToolCall(
type="function",
function=FunctionCall(
name=function_call["name"],
# function call args are JSON but as a string
arguments=json.dumps(function_call["arguments"]),
),
) for function_call in raw_function_calls
]
content = model_output[:model_output.find(self.bot_token)]
return ExtractedToolCallInformation(
tools_called=True,
tool_calls=tool_calls,
content=content if content else None,
)
except Exception as e:
logger.error("Error in extracting tool call from response %s", e)
return ExtractedToolCallInformation(tools_called=False,
tool_calls=[],
content=model_output)
def extract_tool_calls_streaming(
self,
previous_text: str,
current_text: str,
delta_text: str,
previous_token_ids: Sequence[int],
current_token_ids: Sequence[int],
delta_token_ids: Sequence[int],
request: ChatCompletionRequest,
) -> Union[DeltaMessage, None]:
if len(current_text) < len(
self.bot_token) and self.bot_token.startswith(current_text):
return None
if not current_text.startswith(self.bot_token):
return DeltaMessage(content=delta_text)
# bit mask flags for partial JSON parsing. If the name hasn't been
# sent yet, don't allow sending an incomplete string, since OpenAI
# only ever (as far as we have seen) sends the entire tool/function
# name at once.
flags = Allow.ALL if self.current_tool_name_sent \
else Allow.ALL & ~Allow.STR
try:
tool_call_arr = []
is_complete = []
try:
start_idx = len(self.bot_token)
start_idx = consume_space(start_idx, current_text)
while start_idx < len(current_text):
(obj,
end_idx) = partial_json_loads(current_text[start_idx:],
flags)
is_complete.append(
is_complete_json(current_text[start_idx:start_idx +
end_idx]))
start_idx += end_idx
start_idx = consume_space(start_idx, current_text)
start_idx += len(self.bot_token)
start_idx = consume_space(start_idx, current_text)
tool_call_arr.append(obj)
except partial_json_parser.core.exceptions.MalformedJSON:
logger.debug('not enough tokens to parse into JSON yet')
return None
# select as the current tool call the one we are currently working on
current_tool_call: Dict = tool_call_arr[self.current_tool_id] \
if len(tool_call_arr) > 0 else {}
# case -- if no tokens have been streamed for the tool, e.g.
# only the array brackets, stream nothing
if len(tool_call_arr) == 0:
return None
# case: we are starting a new tool in the array
# -> array has > 0 length AND length has moved past cursor
elif (len(tool_call_arr) > 0
and len(tool_call_arr) > self.current_tool_id + 1):
# if we're moving on to a new call, first make sure we
# haven't missed anything in the previous one that was
# auto-generated due to JSON completions, but wasn't
# streamed to the client yet.
if self.current_tool_id >= 0:
cur_arguments = current_tool_call.get("arguments")
if cur_arguments:
cur_args_json = json.dumps(cur_arguments)
sent = len(
self.streamed_args_for_tool[self.current_tool_id])
argument_diff = cur_args_json[sent:]
logger.debug("got arguments diff: %s", argument_diff)
delta = DeltaMessage(tool_calls=[
DeltaToolCall(index=self.current_tool_id,
function=DeltaFunctionCall(
arguments=argument_diff).
model_dump(exclude_none=True))
])
self.streamed_args_for_tool[
self.current_tool_id] += argument_diff
else:
delta = None
else:
delta = None
# re-set stuff pertaining to progress in the current tool
self.current_tool_id = len(tool_call_arr) - 1
self.current_tool_name_sent = False
self.streamed_args_for_tool.append("")
logger.debug("starting on new tool %d", self.current_tool_id)
return delta
# if the current tool name hasn't been sent, send if available
# - otherwise send nothing
elif not self.current_tool_name_sent:
function_name = current_tool_call.get("name")
if function_name:
delta = DeltaMessage(tool_calls=[
DeltaToolCall(index=self.current_tool_id,
type="function",
id=f"chatcmpl-tool-{random_uuid()}",
function=DeltaFunctionCall(
name=function_name).model_dump(
exclude_none=True))
])
self.current_tool_name_sent = True
else:
delta = None
# now we know we're on the same tool call and we're streaming
# arguments
else:
cur_arguments = current_tool_call.get("arguments")
delta = None
if cur_arguments:
sent = len(
self.streamed_args_for_tool[self.current_tool_id])
cur_args_json = json.dumps(cur_arguments)
prev_arguments = self.prev_tool_call_arr[
self.current_tool_id].get("arguments")
argument_diff = None
if is_complete[self.current_tool_id]:
argument_diff = cur_args_json[sent:]
elif prev_arguments:
prev_args_json = json.dumps(prev_arguments)
if cur_args_json != prev_args_json:
prefix = find_common_prefix(
prev_args_json, cur_args_json)
argument_diff = prefix[sent:]
if argument_diff is not None:
delta = DeltaMessage(tool_calls=[
DeltaToolCall(index=self.current_tool_id,
function=DeltaFunctionCall(
arguments=argument_diff).
model_dump(exclude_none=True))
])
self.streamed_args_for_tool[
self.current_tool_id] += argument_diff
self.prev_tool_call_arr = tool_call_arr
return delta
except Exception as e:
logger.error("Error trying to handle streaming tool call: %s", e)
logger.debug(
"Skipping chunk as a result of tool streaming extraction "
"error")
return None
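
To make the non-streaming path concrete, here is a small illustrative driver that runs a granite-style completion with two parallel calls through the parser (the tool names and request values are made up):

```python
# Illustrative: exercise the non-streaming extraction path directly.
from transformers import AutoTokenizer

from vllm.entrypoints.openai.protocol import ChatCompletionRequest
from vllm.entrypoints.openai.tool_parsers import Granite20bFCToolParser

tokenizer = AutoTokenizer.from_pretrained(
    "ibm-granite/granite-20b-functioncalling")
parser = Granite20bFCToolParser(tokenizer)

# two parallel calls, formatted the way the chat template teaches the model
model_output = ('<function_call> {"name": "get_weather", '
                '"arguments": {"city": "Boston"}} '
                '<function_call> {"name": "get_time", '
                '"arguments": {"city": "Boston"}}')

info = parser.extract_tool_calls(
    model_output, request=ChatCompletionRequest(model="granite", messages=[]))
print(info.tools_called)                              # True
print([tc.function.name for tc in info.tool_calls])   # ['get_weather', 'get_time']
```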

vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py

@@ -1,6 +1,6 @@
import json
import re
from json import JSONDecodeError, JSONDecoder
from json import JSONDecoder
from typing import Dict, List, Sequence, Union
import partial_json_parser
@@ -14,34 +14,15 @@ from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
FunctionCall, ToolCall)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
ToolParser, ToolParserManager)
from vllm.entrypoints.openai.tool_parsers.utils import find_common_prefix
from vllm.entrypoints.openai.tool_parsers.utils import (find_common_prefix,
is_complete_json,
partial_json_loads)
from vllm.logger import init_logger
from vllm.utils import random_uuid
logger = init_logger(__name__)
# partial_json_parser doesn't support extra data and
# JSONDecoder.raw_decode doesn't support partial JSON
def partial_json_loads(input_str, flags):
try:
return (partial_json_parser.loads(input_str, flags), len(input_str))
except JSONDecodeError as e:
if "Extra data" in e.msg:
dec = JSONDecoder()
return dec.raw_decode(input_str)
else:
raise
def is_complete_json(input_str):
try:
json.loads(input_str)
return True
except JSONDecodeError:
return False
@ToolParserManager.register_module("llama3_json")
class Llama3JsonToolParser(ToolParser):
"""

vllm/entrypoints/openai/tool_parsers/utils.py

@@ -1,3 +1,11 @@
import json
from json import JSONDecodeError, JSONDecoder
from typing import Any, List, Tuple
import partial_json_parser
from partial_json_parser.core.options import Allow
def find_common_prefix(s1: str, s2: str) -> str:
"""
Finds a common prefix that is shared between two strings, if there is one.
@@ -72,7 +80,7 @@ def extract_intermediate_diff(curr: str, old: str) -> str:
return diff
def find_all_indices(string, substring):
def find_all_indices(string: str, substring: str) -> List[int]:
"""
Find all (starting) indices of a substring in a given string. Useful for
tool call extraction
@@ -85,3 +93,29 @@ def find_all_indices(string, substring):
break
indices.append(index)
return indices
# partial_json_parser doesn't support extra data and
# JSONDecoder.raw_decode doesn't support partial JSON
def partial_json_loads(input_str: str, flags: Allow) -> Tuple[Any, int]:
try:
return (partial_json_parser.loads(input_str, flags), len(input_str))
except JSONDecodeError as e:
if "Extra data" in e.msg:
dec = JSONDecoder()
return dec.raw_decode(input_str)
raise
def is_complete_json(input_str: str) -> bool:
try:
json.loads(input_str)
return True
except JSONDecodeError:
return False
def consume_space(i: int, s: str) -> int:
while i < len(s) and s[i].isspace():
i += 1
return i
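
A short illustrative run of these helpers on the kind of partial JSON seen mid-stream (the strings are made up):

```python
# Illustrative usage of the helpers above on partially generated JSON.
from partial_json_parser.core.options import Allow

from vllm.entrypoints.openai.tool_parsers.utils import (
    consume_space, is_complete_json, partial_json_loads)

# mid-stream: the "arguments" object is still being generated
partial = '{"name": "get_weather", "arguments": {"city": "Bos'
obj, end = partial_json_loads(partial, Allow.ALL)
print(obj)                        # {'name': 'get_weather', 'arguments': {'city': 'Bos'}}
print(is_complete_json(partial))  # False: the closing braces haven't arrived yet

# later: the first call is complete and a second one has started
text = '{"name": "get_weather", "arguments": {}} <function_call> {"name": "get'
obj, end = partial_json_loads(text, Allow.ALL)  # falls back to raw_decode on extra data
print(is_complete_json(text[:end]))  # True: the first object is closed
print(consume_space(end, text))      # index of the next "<function_call>" tag
```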