# ToolAssistedChat
(serverless.tool_assisted_chat)

## Overview

### Available Operations

* complete - Tool-assisted chat completions
* stream - Stream tool-assisted chat completions

## complete

Given a list of messages forming a conversation, the model generates a response. The model can also invoke built-in tools during generation, enabling more comprehensive and actionable responses.

### Example Usage

```python
import os

from friendli import SyncFriendli

with SyncFriendli(
    token=os.getenv("FRIENDLI_TOKEN", ""),
) as friendli:
    res = friendli.serverless.tool_assisted_chat.complete(
        messages=[
            {
                "content": "What is 3 + 6?",
                "role": "user",
            },
        ],
        model="meta-llama-3.1-8b-instruct",
        max_tokens=200,
        stream=False,
        tools=[
            {
                "type": "math:calculator",
            },
        ],
    )

    # Handle response
    print(res)
```
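The response (`models.ContainerChatCompleteSuccess`) follows an OpenAI-compatible chat completion shape. As a sketch under that assumption, the snippet below pulls the assistant's reply out of a plain-dict payload; the payload itself is a hypothetical example for illustration, not real API output:

```python
# Hypothetical OpenAI-compatible payload for illustration only;
# the real `res` is a typed models.ContainerChatCompleteSuccess object.
payload = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "3 + 6 = 9."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 8, "total_tokens": 23},
}

# The first choice carries the assistant's final message.
answer = payload["choices"][0]["message"]["content"]
print(answer)  # 3 + 6 = 9.
```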

### Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `messages` | List[models.Message] | ✔️ | A list of messages comprising the conversation so far. | `[{"content": "You are a helpful assistant.", "role": "system"}, {"content": "Hello!", "role": "user"}]` |
| `model` | str | ✔️ | Code of the model to use. See the available model list. | `meta-llama-3.1-8b-instruct` |
| `x_friendli_team` | OptionalNullable[str] | | ID of the team to run requests as (optional parameter). | |
| `chat_template_kwargs` | Dict[str, Any] | | Additional keyword arguments supplied to the template renderer; these parameters are available for use within the chat template. | |
| `eos_token` | List[int] | | A list of end-of-sentence tokens. | |
| `frequency_penalty` | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize tokens that have already been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim. | |
| `logit_bias` | Dict[str, Any] | | A JSON object that maps tokens to an associated bias value. The bias is added to the logits generated by the model prior to sampling. The exact effect varies per model. | |
| `logprobs` | OptionalNullable[bool] | | Whether to return log probabilities of the output tokens. | |
| `max_tokens` | OptionalNullable[int] | | The maximum number of tokens to generate. For decoder-only models like GPT, the length of your input tokens plus `max_tokens` should not exceed the model's maximum length (e.g., 2048 for OpenAI GPT-3). For encoder-decoder models like T5 or BlenderBot, `max_tokens` should not exceed the model's maximum output length. Similar to Hugging Face's `max_new_tokens` argument. | 200 |
| `min_p` | OptionalNullable[float] | | A scaling factor used to determine the minimum token probability threshold, calculated as `min_p` multiplied by the probability of the most likely token. Tokens with probabilities below this scaled threshold are excluded from sampling. Values range from 0.0 to 1.0 (both inclusive). Higher values result in stricter filtering; lower values allow greater diversity. The default of 0.0 disables filtering, allowing all tokens to be considered. | |
| `n` | OptionalNullable[int] | | The number of independently generated results for the prompt. Defaults to 1. Similar to Hugging Face's `num_return_sequences` argument. | |
| `parallel_tool_calls` | OptionalNullable[bool] | | Whether to enable parallel function calling. | |
| `presence_penalty` | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled at least once in the existing text. | |
| `repetition_penalty` | OptionalNullable[float] | | Penalizes tokens that have already appeared in the generated result (plus the input tokens for decoder-only models). Should be a positive value (1.0 means no penalty). See Keskar et al., 2019 for more details. Similar to Hugging Face's `repetition_penalty` argument. | |
| `resume_generation` | Optional[bool] | | Enable to continue text generation even after an error occurs during a tool call. Note that enabling this option may use more tokens, as the system generates additional content to handle errors gracefully; if the system fails more than 8 times, generation stops regardless. Tip: this is useful when you want to maintain the generation flow despite errors, such as when generating long-form content, so the user is not interrupted by tool call issues. | |
| `seed` | OptionalNullable[models.ServerlessToolAssistedChatCompletionBodySeed] | | Seed to control the random procedure. If none is given, a random seed is used for sampling and returned along with the generated result. When using the `n` argument, you can pass a list of seed values to control all of the independent generations. | |
| `stop` | List[str] | | When one of the stop phrases appears in the generation result, the API stops generation. The stop phrases are excluded from the result. Defaults to an empty list. | |
| `stream` | Optional[bool] | | Whether to stream the generation result. When set to true, each token is sent as a server-sent event once generated. | |
| `stream_options` | OptionalNullable[models.StreamOptions] | | Options related to streaming. Can only be used when `stream: true`. | |
| `temperature` | OptionalNullable[float] | | Sampling temperature. A smaller temperature makes the generation result closer to greedy, argmax (i.e., `top_k = 1`) sampling. Defaults to 1.0. Similar to Hugging Face's `temperature` argument. | |
| `tool_choice` | Optional[models.ServerlessToolAssistedChatCompletionBodyToolChoice] | | Determines the tool-calling behavior of the model. When set to `none`, the model bypasses tool execution and generates a response directly. In `auto` mode (the default), the model dynamically decides whether to call a tool or respond with a message. Setting `required` ensures the model invokes at least one tool before responding. You can also specify a particular tool with `{"type": "function", "function": {"name": "my_function"}}`. | |
| `tools` | List[models.ToolAssistedChatTool] | | A list of tools the model may call. A maximum of 128 functions is supported. Use this to provide a list of functions the model may generate JSON inputs for. For more detailed information about each tool, please refer here. | |
| `top_k` | OptionalNullable[int] | | Limits sampling to the top k tokens with the highest probabilities. Values range from 0 (no filtering) to the model's vocabulary size (inclusive). The default of 0 applies no filtering, allowing all tokens. | |
| `top_logprobs` | OptionalNullable[int] | | The number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used. | |
| `top_p` | OptionalNullable[float] | | Keeps only the smallest set of tokens whose cumulative probabilities reach `top_p` or higher. Values range from 0.0 (exclusive) to 1.0 (inclusive). The default of 1.0 includes all tokens, allowing maximum diversity. | |
| `retries` | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
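The interplay of `top_k`, `top_p`, and `min_p` described above can be illustrated with a small, self-contained sketch of the filtering rules. This follows the documented semantics only; it is not the server's actual sampler:

```python
def filter_tokens(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Return the token indices that survive the documented filters.

    probs: list of token probabilities (assumed to sum to 1.0).
    Per the docs, top_k=0, top_p=1.0, and min_p=0.0 disable their filters.
    """
    # Consider tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k highest-probability tokens (0 = no filtering).
    if top_k > 0:
        order = order[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # min_p: drop tokens below min_p times the top token's probability.
    threshold = min_p * probs[order[0]]
    return [i for i in kept if probs[i] >= threshold]

probs = [0.5, 0.25, 0.15, 0.1]
print(filter_tokens(probs))             # [0, 1, 2, 3] -- defaults keep everything
print(filter_tokens(probs, top_k=2))    # [0, 1]
print(filter_tokens(probs, top_p=0.7))  # [0, 1] -- 0.5 + 0.25 reaches 0.7
print(filter_tokens(probs, min_p=0.4))  # [0, 1] -- threshold is 0.4 * 0.5 = 0.2
```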

### Response

models.ContainerChatCompleteSuccess

### Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.SDKError | 4XX, 5XX | \*/\* |

## stream

Given a list of messages forming a conversation, the model generates a response. The model can also invoke built-in tools during generation, enabling more comprehensive and actionable responses.

### Example Usage

```python
import os

from friendli import SyncFriendli

with SyncFriendli(
    token=os.getenv("FRIENDLI_TOKEN", ""),
) as friendli:
    res = friendli.serverless.tool_assisted_chat.stream(
        messages=[
            {
                "content": "What is 3 + 6?",
                "role": "user",
            },
        ],
        model="meta-llama-3.1-8b-instruct",
        max_tokens=200,
        stream=True,
        tools=[
            {
                "type": "math:calculator",
            },
        ],
    )

    with res as event_stream:
        for event in event_stream:
            # Handle each event as it arrives
            print(event, flush=True)
```
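Each streamed event carries an incremental fragment of the reply. As a sketch assuming OpenAI-compatible streaming chunks, where each chunk exposes a `choices[0].delta` fragment (a hypothetical shape for illustration; real events are typed `ServerlessToolAssistedChatCompletionStreamSuccess` objects), the full text can be accumulated like this:

```python
# Hypothetical streamed chunks in OpenAI-compatible delta form, for illustration.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "3 + 6 "}}]},
    {"choices": [{"delta": {"content": "= 9."}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

# Concatenate the content fragments in arrival order.
parts = []
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        parts.append(delta["content"])

print("".join(parts))  # 3 + 6 = 9.
```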

### Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| `messages` | List[models.Message] | ✔️ | A list of messages comprising the conversation so far. | `[{"content": "You are a helpful assistant.", "role": "system"}, {"content": "Hello!", "role": "user"}]` |
| `model` | str | ✔️ | Code of the model to use. See the available model list. | `meta-llama-3.1-8b-instruct` |
| `x_friendli_team` | OptionalNullable[str] | | ID of the team to run requests as (optional parameter). | |
| `chat_template_kwargs` | Dict[str, Any] | | Additional keyword arguments supplied to the template renderer; these parameters are available for use within the chat template. | |
| `eos_token` | List[int] | | A list of end-of-sentence tokens. | |
| `frequency_penalty` | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize tokens that have already been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim. | |
| `logit_bias` | Dict[str, Any] | | A JSON object that maps tokens to an associated bias value. The bias is added to the logits generated by the model prior to sampling. The exact effect varies per model. | |
| `logprobs` | OptionalNullable[bool] | | Whether to return log probabilities of the output tokens. | |
| `max_tokens` | OptionalNullable[int] | | The maximum number of tokens to generate. For decoder-only models like GPT, the length of your input tokens plus `max_tokens` should not exceed the model's maximum length (e.g., 2048 for OpenAI GPT-3). For encoder-decoder models like T5 or BlenderBot, `max_tokens` should not exceed the model's maximum output length. Similar to Hugging Face's `max_new_tokens` argument. | 200 |
| `min_p` | OptionalNullable[float] | | A scaling factor used to determine the minimum token probability threshold, calculated as `min_p` multiplied by the probability of the most likely token. Tokens with probabilities below this scaled threshold are excluded from sampling. Values range from 0.0 to 1.0 (both inclusive). Higher values result in stricter filtering; lower values allow greater diversity. The default of 0.0 disables filtering, allowing all tokens to be considered. | |
| `n` | OptionalNullable[int] | | The number of independently generated results for the prompt. Defaults to 1. Similar to Hugging Face's `num_return_sequences` argument. | |
| `parallel_tool_calls` | OptionalNullable[bool] | | Whether to enable parallel function calling. | |
| `presence_penalty` | OptionalNullable[float] | | Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled at least once in the existing text. | |
| `repetition_penalty` | OptionalNullable[float] | | Penalizes tokens that have already appeared in the generated result (plus the input tokens for decoder-only models). Should be a positive value (1.0 means no penalty). See Keskar et al., 2019 for more details. Similar to Hugging Face's `repetition_penalty` argument. | |
| `resume_generation` | Optional[bool] | | Enable to continue text generation even after an error occurs during a tool call. Note that enabling this option may use more tokens, as the system generates additional content to handle errors gracefully; if the system fails more than 8 times, generation stops regardless. Tip: this is useful when you want to maintain the generation flow despite errors, such as when generating long-form content, so the user is not interrupted by tool call issues. | |
| `seed` | OptionalNullable[models.ServerlessToolAssistedChatCompletionStreamBodySeed] | | Seed to control the random procedure. If none is given, a random seed is used for sampling and returned along with the generated result. When using the `n` argument, you can pass a list of seed values to control all of the independent generations. | |
| `stop` | List[str] | | When one of the stop phrases appears in the generation result, the API stops generation. The stop phrases are excluded from the result. Defaults to an empty list. | |
| `stream` | Optional[bool] | | Whether to stream the generation result. When set to true, each token is sent as a server-sent event once generated. | |
| `stream_options` | OptionalNullable[models.StreamOptions] | | Options related to streaming. Can only be used when `stream: true`. | |
| `temperature` | OptionalNullable[float] | | Sampling temperature. A smaller temperature makes the generation result closer to greedy, argmax (i.e., `top_k = 1`) sampling. Defaults to 1.0. Similar to Hugging Face's `temperature` argument. | |
| `tool_choice` | Optional[models.ServerlessToolAssistedChatCompletionStreamBodyToolChoice] | | Determines the tool-calling behavior of the model. When set to `none`, the model bypasses tool execution and generates a response directly. In `auto` mode (the default), the model dynamically decides whether to call a tool or respond with a message. Setting `required` ensures the model invokes at least one tool before responding. You can also specify a particular tool with `{"type": "function", "function": {"name": "my_function"}}`. | |
| `tools` | List[models.ToolAssistedChatTool] | | A list of tools the model may call. A maximum of 128 functions is supported. Use this to provide a list of functions the model may generate JSON inputs for. For more detailed information about each tool, please refer here. | |
| `top_k` | OptionalNullable[int] | | Limits sampling to the top k tokens with the highest probabilities. Values range from 0 (no filtering) to the model's vocabulary size (inclusive). The default of 0 applies no filtering, allowing all tokens. | |
| `top_logprobs` | OptionalNullable[int] | | The number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to true if this parameter is used. | |
| `top_p` | OptionalNullable[float] | | Keeps only the smallest set of tokens whose cumulative probabilities reach `top_p` or higher. Values range from 0.0 (exclusive) to 1.0 (inclusive). The default of 1.0 includes all tokens, allowing maximum diversity. | |
| `retries` | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
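The four `tool_choice` modes described above can be sketched as the literal values they take in a request body; `my_function` is the placeholder name from the description:

```python
# The documented tool_choice forms, as request-body values.
tool_choice_none = "none"          # bypass tool execution, respond directly
tool_choice_auto = "auto"          # default: the model decides
tool_choice_required = "required"  # force at least one tool call first
# Pin a specific function by name ("my_function" is a placeholder):
tool_choice_specific = {"type": "function", "function": {"name": "my_function"}}
```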

### Response

Union[eventstreaming.EventStream[models.ServerlessToolAssistedChatCompletionStreamSuccess], eventstreaming.EventStreamAsync[models.ServerlessToolAssistedChatCompletionStreamSuccess]]

### Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| models.SDKError | 4XX, 5XX | \*/\* |