Tool Calling Guide for Local LLMs

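Tool calling is when a large language model is allowed to trigger specific functions (such as "search my files", "run a calculator", or "call an API") by emitting a structured request, instead of guessing the answer in text. You use tool calls because they make outputs more reliable and more up to date, and because they let the model take real actions (query systems, verify facts, enforce schemas) rather than hallucinate.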
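To make the idea concrete, here is a minimal sketch of what such a structured request can look like and how a program might act on it. The payload shape follows the OpenAI-style `tool_calls` format used later in this guide; the `get_time` tool, the `REGISTRY` dict, and the `dispatch` helper are hypothetical names made up for this illustration.

```python
import json

# Hypothetical tool, invented for this illustration only.
def get_time(city: str) -> str:
    return f"It is noon in {city}."

# Registry mapping tool names to Python callables.
REGISTRY = {"get_time": get_time}

# The kind of structured request a model emits instead of plain text.
# Note that "arguments" is a JSON-encoded string, not a dict.
tool_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_time", "arguments": '{"city": "Paris"}'},
}

def dispatch(call: dict) -> str:
    """Look up the requested function and run it with the decoded arguments."""
    fn = REGISTRY[call["function"]["name"]]
    kwargs = json.loads(call["function"]["arguments"])
    return str(fn(**kwargs))

result = dispatch(tool_call)
print(result)
```

The string returned by `dispatch` is what gets sent back to the model as the tool result, so the model can base its final answer on it.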

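In this tutorial, you will learn how to use local LLMs with tool calling, with examples covering math, stories, Python code, and terminal functions. Inference runs locally via llama.cpp, llama-server, and OpenAI-compatible endpoints.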

This guide should work for nearly any model, including:

Qwen3-Coder-Next, Qwen3-Coder, and other Qwen models
GLM-4.7, 4.6, GLM-4.7-Flash, Kimi K2.5, and Kimi K2 Thinking
DeepSeek-V3.1, DeepSeek-V3.2, and MiniMax
gpt-oss, NVIDIA Nemotron 3 Nano, and Devstral 2

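🔨 Tool Calling Setup

Our first step is to get the latest llama.cpp from its GitHub repository (https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and then continue as usual; Metal support is enabled by default.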

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

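In a new terminal (if you are using tmux, press CTRL+B then D to detach first), we create some tools, such as adding two numbers, executing Python code, executing Linux commands, and more: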

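import json, subprocess, random
# Arithmetic tools. Arguments may arrive as strings, so cast them to float.
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
# Returns one of several canned story openings at random.
def write_a_story() -> str:
    return random.choice([
        "A long time ago in a galaxy far, far away...",
        "There were two friends who loved sloths and code...",
        "The world was ending because every sloth evolved to have superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally coded a program to evolve sloths...",
    ])
# Runs a shell command, refusing any command string that contains a blocked word.
def terminal(command: str) -> str:
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
# Executes Python code and returns the resulting global variables as a string.
def python(code: str) -> str:
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)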
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, etc.",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]

In this example we use Devstral 2. When switching models, make sure you use the correct sampling parameters. You can find them in our guides.
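One way to avoid mixing up settings when switching models is to keep them in a small lookup table. The sketch below collects only the values that appear in this guide's example calls; the `SAMPLING` dict, the `sampling_for` helper, and the substring keys are hypothetical, and the per-model guides remain the authoritative source.

```python
# Sampling settings copied from the example calls in this guide.
# Keys are lowercase substrings expected in the served model name (an assumption).
SAMPLING = {
    "devstral": dict(temperature=0.15, top_p=1.0, top_k=-1, min_p=0.0),
    "glm-4.7": dict(temperature=0.7, top_p=1.0, top_k=-1, min_p=0.0),
    "qwen3-coder-next": dict(temperature=1.0, top_p=0.95, top_k=40, min_p=0.0),
}

def sampling_for(model_name: str) -> dict:
    """Pick settings by case-insensitive substring match on the model name."""
    lowered = model_name.lower()
    for key, params in SAMPLING.items():
        if key in lowered:
            return dict(params)  # return a copy so callers can tweak it safely
    raise KeyError(f"no sampling settings recorded for {model_name!r}")

print(sampling_for("unsloth/GLM-4.7"))
```

The returned dict can be unpacked into the inference call, for example `unsloth_inference(messages, **sampling_for(model_name))`.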

We then use the function below (copy, paste, and run it), which automatically parses function calls and calls the OpenAI-compatible endpoint for any model:

from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 0.7,
    top_p = 0.95,
    top_k = 40,
    min_p = 0.01,
    repetition_penalty = 1.0,
):
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    # Use whichever model name the local server reports first.
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            # Sampling fields outside the OpenAI schema are sent via extra_body.
            # The key names follow llama-server's request fields.
            extra_body = {
                "top_k": top_k,
                "min_p": min_p,
                "repeat_penalty": repetition_penalty,
            },
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        # Run every requested tool and append its result for the next round.
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Keep looping only while the model is still requesting tools.
        has_tool_calls = len(tool_calls) > 0
    return messages
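The dispatch step in the loop above, `MAP_FN[fx](**json.loads(args))`, raises if the model names a tool that does not exist or sends malformed arguments. A more forgiving variant, sketched below, turns those failures into an error string that can be sent back as the tool result so the model can retry. `safe_tool_call` and `demo_tools` are hypothetical names, not part of any library.

```python
import json

def safe_tool_call(map_fn: dict, name: str, arguments: str) -> str:
    """Run one tool call and always return a string for the 'tool' message."""
    if name not in map_fn:
        return f"Error: unknown tool '{name}'"
    try:
        kwargs = json.loads(arguments or "{}")
    except json.JSONDecodeError as e:
        return f"Error: arguments are not valid JSON ({e.msg})"
    if not isinstance(kwargs, dict):
        return "Error: arguments must be a JSON object"
    try:
        return str(map_fn[name](**kwargs))
    except Exception as e:
        # Report tool failures to the model instead of crashing the loop.
        return f"Error: {type(e).__name__}: {e}"

# Small stand-in for MAP_FN so the sketch runs on its own.
demo_tools = {"add_number": lambda a, b: float(a) + float(b)}
ok = safe_tool_call(demo_tools, "add_number", '{"a": "2", "b": "3"}')
print(ok)
print(safe_tool_call(demo_tools, "divide_number", "{}"))
```

In the loop, `out = safe_tool_call(MAP_FN, fx, args)` would be a drop-in replacement for the direct lookup.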

Below we show several ways to run tool calling across a range of use cases:

Writing a story:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Could you write me a story?"}],
}]
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
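The raw message list printed during inference is hard to read. Below is a small, hypothetical helper (`format_conversation` is not part of any library) that renders a conversation one line per message or tool call. It assumes the message shapes used in this guide: user content as a list of typed parts, assistant messages with an optional `tool_calls` list, and tool messages with plain string content. The `sample` history is hand-written for illustration, not real model output.

```python
def format_conversation(messages: list) -> str:
    """Render a chat history as one line per message or tool call."""
    lines = []
    for message in messages:
        role = message.get("role", "unknown")
        content = message.get("content") or ""
        if isinstance(content, list):
            # User content may be a list of typed parts; keep only the text.
            content = " ".join(
                part.get("text", "") for part in content if part.get("type") == "text"
            )
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            lines.append(f"[{role}] calls {fn['name']}({fn['arguments']})")
        if content.strip():
            lines.append(f"[{role}] {content.strip()}")
    return "\n".join(lines)

# Hand-written sample history in the same shape the inference loop builds.
sample = [
    {"role": "user", "content": [{"type": "text", "text": "Could you write me a story?"}]},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_0", "type": "function",
         "function": {"name": "write_a_story", "arguments": "{}"}},
    ]},
    {"role": "tool", "tool_call_id": "call_0", "name": "write_a_story",
     "content": "A long time ago in a galaxy far, far away..."},
]
rendered = format_conversation(sample)
print(rendered)
```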

Mathematical operations:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)

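Executing generated Python code:

messages = [{
    "role": "user",
    # Put any request that requires running Python code in the text field.
    "content": [{"type": "text", "text": "..."}],
}]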
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)

Executing arbitrary terminal functions:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write 'I'm a happy Sloth' to a file, then print it back to me."}],
}]
messages = unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
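Because the model can now run shell commands, the filter in the `terminal` tool matters. It uses a substring test, so a harmless command such as `echo format` is refused (it contains `rm`), while the check says nothing about program boundaries. A stricter variant, sketched below, tokenizes the command and compares program names exactly. `is_blocked`, `BLOCKED`, and `OPERATOR_CHARS` are hypothetical names, the sketch assumes Python 3.8 or newer, and it is still not a sandbox (environment prefixes and interpreters such as `bash -c` bypass it).

```python
import os
import shlex

BLOCKED = {"rm", "sudo", "dd", "chmod"}  # the same commands the guide refuses
OPERATOR_CHARS = set(";&|()")            # tokens made of these start a new command

def is_blocked(command: str) -> bool:
    """Return True if any program name in the command line is blocked."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess.
        return True
    expect_program = True
    for token in tokens:
        if token and set(token) <= OPERATOR_CHARS:
            expect_program = True
        elif expect_program:
            # Compare the program's base name so /bin/rm is caught too.
            if os.path.basename(token) in BLOCKED:
                return True
            expect_program = False
    return False

for cmd in ["echo format", "rm -rf /tmp/x", "ls; sudo reboot", "/bin/rm file"]:
    print(cmd, "->", is_blocked(cmd))
```

For anything beyond a demo, running tools inside a container or a restricted user account is a safer design than filtering command strings.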

🌠 Qwen3-Coder-Next Tool Calling

In a new terminal, we create the same tools as in the Tool Calling Setup section above, such as adding two numbers, executing Python code, and executing Linux commands. The definitions of add_number, multiply_number, substract_number, write_a_story, terminal, python, MAP_FN, and tools are identical, so copy them from that section.

We then use the function below (copy, paste, and run it), which automatically parses function calls and calls the OpenAI-compatible endpoint for any LLM:

from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 1.0,
    top_p = 0.95,
    top_k = 40,
    min_p = 0.01,
    repetition_penalty = 1.0,
):
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    # Use whichever model name the local server reports first.
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            # Sampling fields outside the OpenAI schema are sent via extra_body.
            # The key names follow llama-server's request fields.
            extra_body = {
                "top_k": top_k,
                "min_p": min_p,
                "repeat_penalty": repetition_penalty,
            },
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        # Run every requested tool and append its result for the next round.
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Keep looping only while the model is still requesting tools.
        has_tool_calls = len(tool_calls) > 0
    return messages

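Below we show several ways to run tool calling across a range of use cases:

Executing generated Python code:

messages = [{
    "role": "user",
    # Put any request that requires running Python code in the text field.
    "content": [{"type": "text", "text": "..."}],
}]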
unsloth_inference(messages, temperature = 1.0, top_p = 0.95, top_k = 40, min_p = 0.00)

Executing arbitrary terminal functions:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write 'I'm a happy Sloth' to a file, then print it back to me."}],
}]
messages = unsloth_inference(messages, temperature = 1.0, top_p = 1.0, top_k = 40, min_p = 0.00)

We can then confirm that the file was created, and it was!
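Since the model picks the file name itself, one way to confirm the result from Python is to search the working directory for the expected text. This is a minimal sketch: `find_files_containing` is a hypothetical helper, and the demo writes its own files into a temporary directory rather than relying on real model output.

```python
import tempfile
from pathlib import Path

def find_files_containing(text: str, root: str = ".") -> list:
    """Return names of regular files directly under root whose contents include text."""
    hits = []
    for path in sorted(Path(root).iterdir()):
        if not path.is_file():
            continue
        try:
            if text in path.read_text(encoding="utf-8", errors="ignore"):
                hits.append(path.name)
        except OSError:
            continue  # unreadable file: skip it
    return hits

# Self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sloth.txt").write_text("I'm a happy Sloth\n", encoding="utf-8")
    Path(tmp, "other.txt").write_text("nothing here\n", encoding="utf-8")
    found = find_files_containing("I'm a happy Sloth", tmp)
print(found)
```

After a real run, calling `find_files_containing("I'm a happy Sloth")` in the directory where the tools executed should list the file the model wrote.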

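⚡ GLM-4.7-Flash + GLM 4.7 Tool Calling

We first download GLM-4.7 or GLM-4.7-Flash with some Python code, then launch it via llama-server in a separate terminal (for example, using tmux). In this example we download the large GLM-4.7 model: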

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/GLM-4.7-GGUF",
    local_dir = "unsloth/GLM-4.7-GGUF",
    allow_patterns = ["*UD-Q2_K_XL*",], # For Q2_K_XL
)
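Large quantizations are downloaded as several shards, and llama-server is pointed at the first one. The sketch below shows one way to locate that file after the download; `first_gguf` is a hypothetical helper, and the `-00001-of-` naming and folder layout are assumptions based on the model path used in this guide. The demo builds empty placeholder files so it runs without downloading anything.

```python
import tempfile
from pathlib import Path

def first_gguf(local_dir: str) -> Path:
    """Return the file to pass to --model: shard 00001 if split, else the first .gguf."""
    files = sorted(
        p for p in Path(local_dir).rglob("*.gguf") if "mmproj" not in p.name
    )
    if not files:
        raise FileNotFoundError(f"no .gguf model files under {local_dir}")
    first_shards = [p for p in files if "-00001-of-" in p.name]
    return first_shards[0] if first_shards else files[0]

# Demo with empty placeholder files mirroring the layout assumed above.
with tempfile.TemporaryDirectory() as tmp:
    shard_dir = Path(tmp, "UD-Q2_K_XL")
    shard_dir.mkdir()
    for i in (1, 2, 3):
        (shard_dir / f"GLM-4.7-UD-Q2_K_XL-{i:05d}-of-00003.gguf").touch()
    chosen = first_gguf(tmp).name
print(chosen)
```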

Once the download finishes, launch the model via llama-server in a new terminal. Use tmux if you like:

./llama.cpp/llama-server \
    --model unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf \
    --alias "unsloth/GLM-4.7" \
    --threads -1 \
    --fit on \
    --prio 3 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --jinja
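Loading a large model can take a while, so it can help to wait until the server answers before sending requests. The sketch below polls the same `/v1/models` route that the inference function uses to look up the model name. `wait_for_server` is a hypothetical helper, and the default URL assumes the `--port 8001` setting from the command above.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str = "http://127.0.0.1:8001/v1", timeout: float = 60.0) -> bool:
    """Poll {base_url}/models until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet, or still loading: try again shortly.
            pass
        time.sleep(0.5)
    return False
```

A typical use is `if not wait_for_server(timeout=300): raise RuntimeError("llama-server did not become ready")` before the first call to `unsloth_inference`.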

Now, in a new terminal, run the Python code. As a reminder, first run the code from the Tool Calling Setup section of this guide, which has more details. We use the recommended parameters for GLM 4.7: temperature = 0.7 and top_p = 1.0. We can then make some tool calls:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]

unsloth_inference(messages, temperature = 0.7, top_p = 1.0, top_k = -1, min_p = 0.00)
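For a question like this, the model would typically answer by sending code to the `python` tool. The sketch below reproduces that tool's logic and feeds it one plausible snippet, to show what the tool hands back to the model. The `generated` string is a hypothetical example of model-written code, and it uses a fixed date instead of today's date so the result is reproducible.

```python
def python(code: str) -> str:
    # Same logic as the guide's `python` tool: run the code, return its variables.
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)

# Hypothetical code a model might generate for "a date plus 3 days".
generated = (
    "import datetime\n"
    "result = (datetime.date(2025, 1, 30) + datetime.timedelta(days=3)).isoformat()"
)
tool_output = python(generated)
print(tool_output)
```

The returned string contains every global the snippet defined, including the imported module, so the model has to pick out `result` (here `2025-02-02`) from it.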

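Executing generated Python code as a tool call (for GLM 4.7):

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]
unsloth_inference(messages, temperature = 0.7, top_p = 1.0, top_k = -1, min_p = 0.00)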

📙 Devstral 2 Tool Calling

We first download Devstral 2 with some Python code, then launch it via llama-server in a separate terminal (for example, using tmux):

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    local_dir = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    allow_patterns = ["*UD-Q4_K_XL*", "*mmproj-F16*"], # For Q4_K_XL
)

Once the download finishes, launch the model via llama-server in a new terminal. Use tmux if you like:

./llama.cpp/llama-server \
    --model unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf \
    --mmproj unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/mmproj-F16.gguf \
    --alias "unsloth/Devstral-Small-2-24B-Instruct-2512" \
    --threads -1 \
    --fit on \
    --prio 3 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --jinja

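Once the server has started successfully, we call the model using only Devstral's recommended sampling parameter, temperature = 0.15. As a reminder, first run the code from the Tool Calling Setup section; its example messages already use this temperature.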
