🥝 Kimi K2.5: Local Run Guide

A guide to running Kimi-K2.5 on your own local device!

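Kimi-K2.5 is Moonshot's new model, achieving state-of-the-art performance on vision, coding, agentic and chat tasks. This 1T-parameter hybrid reasoning model needs about 600GB of disk space, while the quantized Unsloth Dynamic 1.8-bit version reduces this to 240GB (-60% size): Kimi-K2.5-GGUF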

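All uploads use Unsloth Dynamic 2.0 for SOTA Aider and 5-shot MMLU performance. See how our dynamic 1-2 bit GGUFs perform on coding benchmarks.

⚙️ Recommended Requirements

You need >240GB of disk space to run the 1-bit quant!

For best performance, make sure your total available memory (VRAM + system RAM) exceeds the size of the quantized model file you download. If it does not, llama.cpp can still run via SSD/HDD offloading, but inference will be slower.

The 1.8-bit (UD-TQ1_0) quant can run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB of RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.

If the model fits, you will get >40 tokens/s when using a B200.

To run the model at near full precision, you can use the 4-bit or 5-bit quants. You can also go higher to be safe.

For strong performance, aim for >240GB of unified memory (or combined RAM+VRAM) to reach 10+ tokens/s. Below that, the model still works (llama.cpp can run via mmap/disk offloading), but speed may drop from ~10 tokens/s to <2 tokens/s.

We recommend UD-Q2_K_XL (375GB) as a good size/quality balance. Rule of thumb: RAM+VRAM ≈ the quant size; otherwise it will still run, just slower because of offloading.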
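The rule of thumb above can be turned into a quick check. The following is a minimal sketch, assuming the quant sizes quoted in this guide; the helper names `fits_in_memory` and `describe` are illustrative and not part of any tool.

```python
# Quant sizes in GB as quoted in this guide; the helpers are illustrative only.
QUANT_SIZES_GB = {"UD-TQ1_0": 240, "UD-Q2_K_XL": 375}


def fits_in_memory(quant: str, vram_gb: float, ram_gb: float) -> bool:
    """Return True when VRAM + system RAM covers the quant's file size."""
    return vram_gb + ram_gb >= QUANT_SIZES_GB[quant]


def describe(quant: str, vram_gb: float, ram_gb: float) -> str:
    """Summarize whether a quant fits or how much memory is missing."""
    total = vram_gb + ram_gb
    size = QUANT_SIZES_GB[quant]
    if total >= size:
        return f"{quant}: fits ({total:.0f}GB available for a {size}GB quant)"
    return f"{quant}: {size - total:.0f}GB short, expect disk offloading and slower inference"


# Example: a single 24GB GPU plus 256GB of system RAM.
print(describe("UD-TQ1_0", 24, 256))
print(describe("UD-Q2_K_XL", 24, 256))
```

This only compares file size against memory; it ignores the extra memory needed for the KV cache and context, so treat a narrow pass as borderline.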

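🥝 Run Kimi K2.5 Guide

Kimi-K2.5 requires different sampling parameters for different use cases.

Vision is not currently supported for this model in llama.cpp, but support will hopefully arrive soon.

To run the model at full precision, you only need the 4-bit or 5-bit Dynamic GGUFs (e.g. UD-Q4_K_XL), because the model was originally released in INT4 format.

You can choose a higher-bit quantization to guard against small quantization differences, but in most cases this is unnecessary.

Kimi K2.5 differences from Kimi K2 Thinking

Both models use a modified DeepSeek V3 MoE architecture.
rope_scaling.beta_fast: K2.5 uses 32.0, while K2 Thinking uses 1.0.
MoonViT is the native-resolution, 200M-parameter vision encoder. It is similar to the one used in Kimi-VL-A3B-Instruct.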

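🌙 Usage Guide:

According to Moonshot AI, these are the recommended settings for Kimi K2.5 inference:

Default settings (instant mode): temperature = 0.6
Thinking mode: temperature = 1.0
Both modes: top_p = 0.95, min_p = 0.01

Set temperature to 1.0 to reduce repetition and incoherence.
Suggested context length = 98,304 (up to 256K)
Note: using different tools may require different settings

We recommend setting min_p to 0.01 to suppress unlikely, low-probability tokens. If needed, disable the repeat penalty or set it to 1.0.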
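To make the effect of min_p concrete, here is a minimal sketch of min-p filtering as it is commonly defined: tokens whose probability falls below min_p times the top token's probability are dropped and the rest are renormalized. The function name and the toy distribution are illustrative; this is not llama.cpp's actual implementation.

```python
def min_p_filter(probs, min_p=0.01):
    """Drop tokens below min_p * max(probs), then renormalize the survivors."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Toy next-token distribution (made-up numbers for illustration).
probs = [0.90, 0.095, 0.004, 0.001]
filtered = min_p_filter(probs, min_p=0.01)
print(filtered)  # the two tail tokens end up with probability 0.0
```

Because the threshold scales with the top token's probability, the filter is strict when the model is confident and lenient when the distribution is flat.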

Chat template for Kimi K2.5

Running tokenizer.apply_chat_template([{"role": "user", "content": "What is 1+1?"},]) produces:

<|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|><|im_user|>user<|im_middle|>What is 1+1?<|im_end|><|im_assistant|>assistant<|im_middle|><think>
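The rendered prompt above can be reproduced by hand for the single-turn case. This is a sketch derived only from the output shown; the function name `render_single_turn` is hypothetical, and multi-turn conversations, tool calls, and instant mode are not covered because the source does not show them.

```python
# Default system prompt as it appears in the rendered template above.
DEFAULT_SYSTEM = "You are Kimi, an AI assistant created by Moonshot AI."


def render_single_turn(user_message: str, system: str = DEFAULT_SYSTEM) -> str:
    """Build the prompt string for one user message, ending with an open <think> tag."""
    return (
        f"<|im_system|>system<|im_middle|>{system}<|im_end|>"
        f"<|im_user|>user<|im_middle|>{user_message}<|im_end|>"
        "<|im_assistant|>assistant<|im_middle|><think>"
    )


prompt = render_single_turn("What is 1+1?")
print(prompt)
```

In practice, prefer `tokenizer.apply_chat_template`, since the real template is the source of truth; a hand-built string like this is mainly useful for checking what the model actually sees.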

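✨ Run Kimi K2.5 in llama.cpp

In this guide we run the smallest 1-bit quant, which is 240GB in size. Feel free to change the quant type to 2-bit, 3-bit, etc. To run the model at near full precision, you can use the 4-bit or 5-bit quants. You can also go higher to be safe.

Obtain the latest llama.cpp from GitHub here. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you do not have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and then continue as usual - Metal support is on by default.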

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

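If you want to use llama.cpp directly to load the model, you can do the following, where (:UD-TQ1_0) is the quantization type. You can also download the model via Hugging Face (see the download step further down). This is similar to ollama run. Use export LLAMA_CACHE="folder" to force llama.cpp to save to a specific location.

LLAMA_SET_ROWS=1 makes llama.cpp a bit faster! Use it! --fit on automatically fits the model across all your GPUs and CPUs.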

export LLAMA_CACHE="unsloth/Kimi-K2.5-GGUF"
LLAMA_SET_ROWS=1 ./llama.cpp/llama-cli \
    -hf unsloth/Kimi-K2.5-GGUF:UD-TQ1_0 \
    --temp 1.0 \
    --min-p 0.01 \
    --top-p 0.95 \
    --ctx-size 16384 \
    --seed 3407

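--fit on will automatically fit the model to your system. If you are not using --fit on and you have around 360GB of combined GPU memory, remove -ot ".ffn_.*_exps.=CPU" to get maximum speed.

Use --fit on for automatic fitting across GPUs and CPUs. If that does not work, see below:

Try -ot ".ffn_.*_exps.=CPU" to offload all MoE layers to the CPU! This effectively lets you fit all non-MoE layers on 1 GPU, improving generation speed. You can customize the regex to fit more layers if you have more GPU capacity.

If you have a bit more GPU memory, try -ot ".ffn_(up|down)_exps.=CPU". This offloads the up and down projection MoE layers.

Try -ot ".ffn_(up)_exps.=CPU" if you have even more GPU memory. This offloads only the up projection MoE layers.

Finally, offload all MoE layers with -ot ".ffn_.*_exps.=CPU". This uses the least VRAM.

You can also customize the regex. For example, -ot "\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU" offloads the gate, up and down MoE layers, but only from the 6th layer onwards.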
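To see which tensors a custom -ot pattern would catch, you can test the regex part (everything before =CPU) against tensor names. The sketch below assumes GGUF tensor names of the form blk.N.ffn_up_exps.weight and uses Python's re.search as a stand-in for llama.cpp's own regex matching, so treat it as an approximation.

```python
import re

# Regex part of the -ot override quoted above (the text before "=CPU").
PATTERN = r"\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps."


def goes_to_cpu(tensor_name: str) -> bool:
    """Return True if the override pattern matches this tensor name."""
    return re.search(PATTERN, tensor_name) is not None


# Assumed tensor names, following the usual blk.<layer>.<tensor>.weight layout.
names = [
    "blk.5.ffn_up_exps.weight",     # layer 5: stays on the GPU
    "blk.6.ffn_up_exps.weight",     # layer 6: offloaded
    "blk.42.ffn_down_exps.weight",  # two-digit layer: offloaded
    "blk.42.attn_q.weight",         # not an expert tensor: stays on the GPU
]
for name in names:
    print(name, "-> CPU" if goes_to_cpu(name) else "-> default device")
```

The single-digit alternatives 6|7|8|9 are what exclude layers 0 to 5; changing them moves the cut-off layer.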

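Download the model with the command below (after running pip install huggingface_hub hf_transfer). We recommend our 2-bit dynamic quant UD-Q2_K_XL to balance size and accuracy. All versions are at: huggingface.co/unsloth/Kimi-K2.5-GGUF. If the download gets stuck, see Hugging Face Hub, XET debugging.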

pip install -U huggingface_hub
hf download unsloth/Kimi-K2.5-GGUF \
    --local-dir unsloth/Kimi-K2.5-GGUF \
    --include "*UD-TQ1_0*" # For Dynamic 2-bit use "*UD-Q2_K_XL*"

If the download gets stuck at around 90 to 95%, please see our troubleshooting guide.
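The same download can be scripted from Python. This is a minimal sketch using `snapshot_download` from `huggingface_hub` with the repository and file pattern from the command above; the wrapper names `quant_patterns` and `download_quant` are illustrative, and the import is done lazily so the helpers can be defined without the package installed.

```python
def quant_patterns(quant: str = "UD-TQ1_0") -> list:
    """Glob patterns selecting only the files of one quant type."""
    return [f"*{quant}*"]


def download_quant(quant: str = "UD-TQ1_0", local_dir: str = "unsloth/Kimi-K2.5-GGUF") -> str:
    """Download one quant of Kimi-K2.5-GGUF and return the local folder path."""
    # Imported here so that defining this function does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="unsloth/Kimi-K2.5-GGUF",
        local_dir=local_dir,
        allow_patterns=quant_patterns(quant),
    )


print(quant_patterns("UD-Q2_K_XL"))
```

Calling `download_quant("UD-Q2_K_XL")` would fetch the 2-bit dynamic quant instead; nothing is downloaded until the function is called.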

Run any prompt.
Edit --ctx-size 16384 to set the context length. You can also leave it out and let --fit on discover the context length automatically.

LLAMA_SET_ROWS=1 ./llama.cpp/llama-cli \
    --model unsloth/Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00005.gguf \
    --temp 1.0 \
    --min-p 0.01 \
    --top-p 0.95 \
    --ctx-size 16384 \
    --seed 3407
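If you switch between instant and thinking mode often, the command can be assembled from a small table of presets. The sketch below only uses flags that appear in this guide; the function name and the `PRESETS` dictionary are illustrative, and applying top_p and min_p to instant mode is an assumption.

```python
import shlex

# Sampling presets from the usage guide; sharing top_p/min_p across modes is an assumption.
PRESETS = {
    "instant": {"temp": 0.6, "top_p": 0.95, "min_p": 0.01},
    "thinking": {"temp": 1.0, "top_p": 0.95, "min_p": 0.01},
}


def build_llama_cli_args(model_path, mode="thinking", ctx_size=16384, seed=3407):
    """Return the llama-cli argument list for the chosen sampling preset."""
    preset = PRESETS[mode]
    return [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--temp", str(preset["temp"]),
        "--min-p", str(preset["min_p"]),
        "--top-p", str(preset["top_p"]),
        "--ctx-size", str(ctx_size),
        "--seed", str(seed),
    ]


args = build_llama_cli_args(
    "unsloth/Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00005.gguf"
)
print(shlex.join(args))  # paste into a shell, or pass args to subprocess.run
```

The list form can be handed to `subprocess.run(args, env={**os.environ, "LLAMA_SET_ROWS": "1"})` if you want to launch it from Python.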

For example, try "Create a Flappy Bird game in HTML", and you will get:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Flappy Bird</title>
    <style>
        body {
            margin: 0;
            padding: 0;
            background: #222;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            font-family: 'Segoe UI', sans-serif;
            overflow: hidden;
            touch-action: none;
        }
        
        #game-container {
            position: relative;
            width: 400px;
            height: 600px;
            background: linear-gradient(to bottom, #70c5ce 0%, #70c5ce 80%, #c23810 80%, #c23810 100%);
            box-shadow: 0 0 20px rgba(0,0,0,0.5);
            overflow: hidden;
        }
        
        canvas {
            display: block;
        }
        
        .overlay {
            position: absolute;
            top: 50%;
            left: 50%;
            transform: translate(-50%, -50%);
            text-align: center;
            color: white;
            text-shadow: 2px 2px 0 #000;
            font-weight: bold;
            pointer-events: none;
        }
        
        .game-title {
            font-size: 48px;
            margin-bottom: 20px;
        }
        
        .score-display {
            font-size: 36px;
            margin-bottom: 10px;
        }
        
        .best-score {
            font-size: 24px;
            color: #ffe;
        }
        
        .instruction {
            font-size: 20px;
            animation: pulse 1s infinite;
        }
        
        @keyframes pulse {
            0%, 100% { opacity: 1; }
            50% { opacity: 0.5; }
        }
        
        .hidden { display: none; }
    </style>
</head>
<body>
    <div id="game-container">
        <canvas id="canvas" width="400" height="600"></canvas>
        
        <!-- Start Screen -->
        <div id="start-screen" class="overlay">
            <div class="game-title">FLAPPY BIRD</div>
            <div class="instruction">Click or Press Space to Fly</div>
        </div>
        
        <!-- Game Over Screen -->
        <div id="game-over-screen" class="overlay hidden">
            <div class="game-title">GAME OVER</div>
            <div class="score-display">Score: <span id="final-score">0</span></div>
            <div class="best-score">Best: <span id="best-score">0</span></div>
            <div class="instruction">Click to Restart</div>
        </div>
        
        <!-- Score Counter -->
        <div id="current-score" class="overlay hidden" style="top: 10%; font-size: 72px; color: white; text-shadow: 4px 4px 0 #000;">
            0
        </div>
    </div>

    <script>
        const canvas = document.getElementById('canvas');
        const ctx = canvas.getContext('2d');
        
        // Game constants
        const GRAVITY = 0.4;
        const JUMP_STRENGTH = -7;
        const PIPE_SPEED = 3;
        const PIPE_SPAWN_RATE = 120; // frames
        const PIPE_GAP = 120;
        
        // Game state
        let bird = { x: 50, y: 200, velocity: 0, radius: 15, wingState: 0 };
        let pipes = [];
        let score = 0;
        let bestScore = localStorage.getItem('flappyBest') || 0;
        let frameCount = 0;
        let isGameOver = false;
        let isPlaying = false;
        
        // DOM elements
        const startScreen = document.getElementById('start-screen');
        const gameOverScreen = document.getElementById('game-over-screen');
        const currentScoreDisplay = document.getElementById('current-score');
        const finalScoreEl = document.getElementById('final-score');
        const bestScoreEl = document.getElementById('best-score');
        
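        // Input handling
        function handleInput(e) {
            if (!isPlaying) {
                if (isGameOver) {
                    resetGame(); // restart after a game over (starts its own loop)
                } else {
                    startGame(); // first start from the title screen
                }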
            } else if (!isGameOver) {
                bird.velocity = JUMP_STRENGTH;
                bird.wingState = 1;
            }
        }
        
        document.addEventListener('keydown', (e) => {
            if (e.code === 'Space' || e.code === 'ArrowUp') handleInput(e);
        });
        canvas.addEventListener('pointerdown', handleInput);
        
        function startGame() {
            isPlaying = true;
            isGameOver = false;
            startScreen.classList.add('hidden');
            currentScoreDisplay.classList.remove('hidden');
            resetGameState();
            gameLoop();
        }
        
        function resetGameState() {
            bird = { x: 50, y: 200, velocity: 0, radius: 15, wingState: 0 };
            pipes = [];
            score = 0;
            frameCount = 0;
            currentScoreDisplay.textContent = score;
        }
        
        function resetGame() {
            isGameOver = false;
            isPlaying = true;
            gameOverScreen.classList.add('hidden');
            currentScoreDisplay.classList.remove('hidden');
            resetGameState();
            gameLoop();
        }
        
        function spawnPipe() {
            const minHeight = 100;
            const maxHeight = 400;
            const topHeight = Math.floor(Math.random() * (maxHeight - minHeight + 1) + minHeight);
            const bottomHeight = canvas.height - topHeight - PIPE_GAP;
            
            pipes.push({
                x: canvas.width,
                topHeight: topHeight,
                bottomY: topHeight + PIPE_GAP,
                bottomHeight: bottomHeight,
                passed: false
            });
        }
        
        function update() {
            if (isGameOver) return;
            
            // Bird physics
            bird.velocity += GRAVITY;
            bird.y += bird.velocity;
            
            // Floor/ceiling collision
            if (bird.y + bird.radius > canvas.height || bird.y - bird.radius < 0) {
                gameOver();
                return;
            }
            
            // Pipe spawning
            frameCount++;
            if (frameCount % PIPE_SPAWN_RATE === 0) {
                spawnPipe();
            }
            
            // Pipe movement and collision
            for (let i = pipes.length - 1; i >= 0; i--) {
                const pipe = pipes[i];
                pipe.x -= PIPE_SPEED;
                
                // Remove off-screen pipes
                if (pipe.x + 60 < 0) {
                    pipes.splice(i, 1);
                    continue;
                }
                
                // Collision detection (simplified rectangle-circle)
                const pipeWidth = 60;
                const pipeX = pipe.x;
                const pipeLeft = pipeX;
                const pipeRight = pipeX + pipeWidth;
                
                // The bird is a circle, the pipes are rectangles
                const birdLeft = bird.x - bird.radius + 4; // +4 offset for the beak
                const birdRight = bird.x + bird.radius + 2;
                const birdTop = bird.y - bird.radius;
                const birdBottom = bird.y + bird.radius;
                
                // Horizontal overlap check
                if (birdRight > pipeLeft && birdLeft < pipeRight) {
                    // Top pipe collision
                    if (birdTop < pipe.topHeight) {
                        gameOver();
                        return;
                    }
                    // Bottom pipe collision
                    if (birdBottom > pipe.bottomY) {
                        gameOver();
                        return;
                    }
                }
                
                // Scoring
                if (pipe.x + pipeWidth < bird.x && !pipe.passed) {
                    pipe.passed = true;
                    score++;
                    currentScoreDisplay.textContent = score;
                }
            }
            
            // Wing animation
            if (bird.wingState > 0) {
                bird.wingState = (bird.wingState + 0.2) % 2;
            }
        }
        
        function draw() {
            // Clear the canvas
            ctx.clearRect(0, 0, canvas.width, canvas.height);
            
            // Draw pipes
            pipes.forEach(pipe => {
                // Top pipe
                ctx.fillStyle = '#46c';
                ctx.fillRect(pipe.x, 0, 60, pipe.topHeight);
                ctx.fillStyle = '#34a';
                ctx.fillRect(pipe.x, pipe.topHeight - 20, 60, 20); // Pipe cap
                
                // Bottom pipe
                ctx.fillStyle = '#46c';
                ctx.fillRect(pipe.x, pipe.bottomY, 60, canvas.height - pipe.bottomY);
                ctx.fillStyle = '#34a';
                ctx.fillRect(pipe.x, pipe.bottomY - 20, 60, 20); // Pipe cap
            });
            
            // Draw the bird (a circle with a beak)
            ctx.fillStyle = '#e3bc4e';
            ctx.beginPath();
            ctx.arc(bird.x, bird.y, bird.radius, 0, Math.PI * 2);
            ctx.fill();
            
            // Beak
            ctx.fillStyle = '#e04c4c';
            ctx.beginPath();
            ctx.moveTo(bird.x + bird.radius - 4, bird.y - 4);
            ctx.lineTo(bird.x + bird.radius + 10, bird.y);
            ctx.lineTo(bird.x + bird.radius - 4, bird.y + 4);
            ctx.fill();
            
            // Eye
            ctx.fillStyle = 'black';
            ctx.beginPath();
            ctx.arc(bird.x + 5, bird.y - 6, 3, 0, Math.PI * 2);
            ctx.fill();
            
            // Wing
            ctx.fillStyle = '#c4a';
            ctx.beginPath();
            ctx.ellipse(bird.x - 5, bird.y + 5, 10, 6, 0, 0, Math.PI * 2);
            ctx.fill();
        }
        
        function gameOver() {
            isGameOver = true;
            isPlaying = false;
            
            // Update the best score
            if (score > bestScore) {
                bestScore = score;
                localStorage.setItem('flappyBest', bestScore);
            }
            
            // Show the game over screen
            currentScoreDisplay.classList.add('hidden');
            gameOverScreen.classList.remove('hidden');
            finalScoreEl.textContent = score;
            bestScoreEl.textContent = bestScore;
        }
        
        function gameLoop() {
            if (!isPlaying) return;
            
            update();
            draw();
            requestAnimationFrame(gameLoop);
        }
        
        // Initial draw
        draw();
    </script>
</body>
</html>

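✨ Deploy with llama-server and OpenAI's completion library

Using --kv-unified can make inference serving faster in llama.cpp! See https://www.reddit.com/r/LocalLLaMA/comments/1qnwa33/glm_47_flash_huge_performance_improvement_with_kvu/

After installing llama.cpp as described in the Kimi K2.5 llama.cpp section above, you can launch an OpenAI-compatible server with: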

LLAMA_SET_ROWS=1 ./llama.cpp/llama-server \
    --model unsloth/Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00005.gguf \
    --special \
    --alias "unsloth/Kimi-K2.5" \
    --min-p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --kv-unified

Then use OpenAI's Python library (after pip install openai):

from openai import OpenAI

# Point the client at the local llama-server started above.
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/Kimi-K2.5",
    # Example prompt; replace it with your own.
    messages = [{"role": "user", "content": "What is 1+1?"}],
)
print(completion.choices[0].message.content)

This prints the model's reply.
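Because the chat template ends with an open <think> tag, a reply may contain the model's reasoning followed by a closing </think> tag before the final answer. The helper below is a minimal sketch for separating the two, assuming the reasoning arrives inline in the reply text; depending on server settings it may instead be returned in a separate field or not at all, and the function name is hypothetical.

```python
def split_thinking(text: str):
    """Split a reply into (reasoning, answer) at the first closing </think> tag."""
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole reply as the answer.
        return "", text.strip()
    # Drop a leading <think> tag if the server echoed it.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


# Made-up reply text for illustration.
reasoning, answer = split_thinking("Simple addition: 1 + 1.</think>1+1 equals 2.")
print("reasoning:", reasoning)
print("answer:", answer)
```

With the client code above, you would pass `completion.choices[0].message.content` to `split_thinking`.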

📊 Benchmarks

You can view further benchmarks in table format below:

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

Qwen3-VL-235B-A22B-Thinking

30.1

34.5

30.8

37.5

HLE-Full

25.1†

50.2

45.5

43.2

45.8

HLE-Full (w/ tools)

40.8†

96.1

100

92.8

95.0

93.1

AIME 2025

95.4

99.4

92.9*

97.3*

92.5

HMMT 2025 (Feb)

81.8

86.3

78.5*

83.1*

78.3

IMO-AnswerBench

87.6

92.4

87.0

91.9

82.4

GPQA-Diamond

87.1

86.7*

89.3*

90.1

85.0

Image & Video

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

MMMU-Pro

78.5

79.5*

74.0

81.0

69.3

CharXiv (RQ)

77.5

82.1

67.2*

81.4

66.1

MathVision

84.2

83.0

77.1*

86.1*

74.6

MathVista (mini)

90.1

82.8*

80.2*

89.8*

85.8

ZeroBench

ZeroBench (w/ tools)

12*

OCRBench

92.3

80.7*

86.5*

90.3*

87.5

OmniDocBench 1.5

88.8

85.7

87.7*

88.5

82.0*

InfoVQA (val)

92.6

84*

76.9*

57.2*

89.5

SimpleVQA

71.2

55.8*

69.7*

56.8*

WorldVQA

46.3

28.0

36.8

47.4

23.5

VideoMMMU

86.6

85.9

84.4*

87.6

80.0

MMVU

80.4

80.8*

77.3

77.5

71.1

MotionBench

70.4

64.8

60.3

70.3

VideoMME

87.4

86.0*

88.4*

79.0

LongVideoBench

79.8

76.5*

67.2*

77.7*

65.6*

LVBench

75.9

73.5*

63.6

Coding

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

SWE-Bench Verified

76.8

80.0

80.9

76.2

73.1

SWE-Bench Pro

50.7

55.6

55.4*

SWE-Bench Multilingual

73.0

72.0

77.5

65.0

70.2

Terminal Bench 2.0

50.8

54.0

59.3

54.2

46.4

PaperBench

63.5

63.7*

72.9*

47.1

CyberGym

41.3

50.6

39.9*

17.3*

SciCode

48.7

52.1

49.5

56.1

38.9

OJBench (C++)

57.4

54.6*

68.5*

54.7*

LiveCodeBench (v6)

85.0

82.2*

87.4*

83.3

Long Context

| Benchmark | Kimi K2.5 | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro | DeepSeek V3.2 |
| --- | --- | --- | --- | --- | --- |
| Longbench v2 | 61.0 | 54.5* | 64.4* | 68.2* | 59.8* |
| AA-LCR | 70.0 | 72.3* | 71.3* | 65.3* | 64.3* |

Agentic Search

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

BrowseComp

60.6

65.8

37.0

37.8

51.4

BrowseComp (w/ context management)

74.9

65.8

57.8

59.2

67.6

BrowseComp (Agent Swarm)

78.4

WideSearch (item-f1)

72.7

76.2*

57.0

32.5*

WideSearch (item-f1, Agent Swarm)

79.0

DeepSearchQA

77.1

71.3*

76.1*

63.2*

60.9*

FinSearchCompT2&T3

67.8

66.2*

49.9

59.1*

Seal-0

57.4

45.0

47.7*

45.5*

49.5*

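Notes

* = score re-evaluated by the authors (not previously published).
† = DeepSeek V3.2 score corresponds to its text-only subset (as noted in the footnotes).
- = not evaluated / not available.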


Last updated 19 days ago
