How to Run Models with Unsloth Studio

Run AI models, LLMs and GGUFs locally with Unsloth Studio.

Unsloth Studio lets you run AI models 100% offline on your computer. Run model formats like GGUF and safetensors from Hugging Face or from your local files.

  • Works on macOS, Windows, Linux, and WSL, including CPU-only setups. No GPU required

  • Search, download, and run any model: GGUFs, LoRA adapters, safetensors, and more

  • Compare two different model outputs side-by-side

  • Self-healing tool calling and web search, code execution, and calling OpenAI-compatible APIs

  • Automatic inference parameter tuning (temperature, top-p, etc.) and editable chat templates

  • Upload images, audio, PDFs, code, DOCX and more file types to chat with.

Using Unsloth Studio Chat

Search and run models

You can search and download any model via Hugging Face or use local files.

Studio supports a wide range of model types, including GGUF, vision-language, and text-to-speech models. Run the latest models like Qwen3.5 or NVIDIA Nemotron 3.



Code execution

Unsloth Studio lets LLMs run Bash and Python, not just JavaScript. It also sandboxes program execution, similar to Claude Artifacts, so models can test code, generate files, and verify answers with real computation.

This makes answers from models more reliable and accurate.
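Studio's sandbox itself isn't exposed as a public API, but the idea can be sketched as a subprocess that runs model-generated code in a throwaway working directory under a timeout. This is only a minimal illustration of the concept, not Studio's actual implementation:

```python
import subprocess
import sys
import tempfile

def run_untrusted_python(code: str, timeout: float = 5.0) -> str:
    """Run model-generated Python in a fresh temp directory with a timeout.

    A minimal sketch of sandboxed execution; a real sandbox would also
    restrict filesystem, network, and memory access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir,        # isolate file writes to a throwaway directory
            capture_output=True,
            text=True,
            timeout=timeout,    # kill runaway programs
        )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()

# The model can verify an answer with real computation:
print(run_untrusted_python("print(sum(range(101)))"))  # → 5050
```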

Auto-healing tool calling

Unsloth Studio not only supports tool calling and web search, but also auto-fixes any errors that occur along the way.

This means you'll get complete inference outputs instead of runs interrupted by broken tool calls.

For example, Qwen3.5-4B searched more than 20 websites and cited its sources, with the web search happening inside its thinking trace.
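The auto-fixing behavior can be illustrated with a simple repair loop: when a model emits a malformed tool call, the runtime tries cheap fixes before giving up and re-prompting. This is a conceptual sketch only, not Studio's actual repair logic:

```python
import json

def parse_tool_call(raw: str) -> dict:
    """Parse a model-emitted tool call, repairing common formatting slips.

    Conceptual 'self-healing' sketch: try strict JSON first, then apply
    cheap fixes (code fences, trailing commas) before giving up.
    """
    candidates = [
        raw,
        raw.strip().removeprefix("```json").removesuffix("```"),  # strip code fences
        raw.replace(",}", "}").replace(",]", "]"),                # drop trailing commas
    ]
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("tool call could not be repaired; re-prompt the model")

broken = '{"name": "web_search", "arguments": {"query": "llama.cpp",},}'
print(parse_tool_call(broken)["name"])  # → web_search
```

If every repair attempt fails, a real runtime would feed the parse error back to the model and ask it to retry, rather than surfacing the broken call to the user.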

Auto parameter tuning

Inference parameters like temperature, top-p, top-k are automatically pre-set for new models like Qwen3.5 so you can get the best outputs without worrying about settings. You can also adjust parameters manually and edit the system prompt.

Context length adjustment is no longer necessary with llama.cpp’s smart auto context, which uses only the context you need without loading anything extra.
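Conceptually, the auto-tuning is a lookup from model family to recommended sampling parameters, with safe defaults as a fallback. The preset values below are illustrative placeholders, not the table Studio actually ships:

```python
# Illustrative presets only; the values Studio applies are model-specific.
PRESETS = {
    "qwen":  {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
    "llama": {"temperature": 0.6, "top_p": 0.9, "top_k": 40},
}
DEFAULTS = {"temperature": 0.8, "top_p": 0.95, "top_k": 40}

def sampling_params(model_name: str) -> dict:
    """Pick sampling parameters from the model name, falling back to defaults."""
    name = model_name.lower()
    for family, preset in PRESETS.items():
        if family in name:
            return preset
    return DEFAULTS

print(sampling_params("Qwen3.5-4B-GGUF"))  # matches the "qwen" preset
```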

Chat Workspace

Enter prompts, attach any documents, images (webp, png), code files, txt, or audio as additional context, and see the model’s responses in real time.

Toggle on or off: Thinking + Web search.

Model Arena

Studio Chat lets you compare any two models side-by-side using the same prompt, e.g. a base model and its LoRA adapter. Inference runs for one model first, then the second (parallel inference is in development).

After training, you can compare the base and fine-tuned models side by side with the same prompt, making it easy to see how fine-tuning changed the model's responses and whether it improved results for your use case.

Adding Files as Context

Studio Chat supports multimodal inputs directly in the conversation. You can attach documents, images, or audio as additional context for a prompt.

This makes it easy to test how a model handles real-world inputs such as PDFs, screenshots, or reference material. Files are processed locally and included as context for the model.
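As a rough sketch, attachment handling can be thought of as routing each file by type: images go to the vision input, audio to transcription, and text-like files are inlined as context. The routing below uses Python's stdlib mimetypes and is purely illustrative of the idea, not Studio's pipeline:

```python
import mimetypes

def attachment_route(filename: str) -> str:
    """Decide how an attached file would be fed to the model (illustrative)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "inline-text"          # e.g. source code with no registered type
    if mime.startswith("image/"):
        return "vision-input"         # webp, png, ...
    if mime.startswith("audio/"):
        return "audio-transcription"
    if mime == "application/pdf":
        return "pdf-text-extraction"
    return "inline-text"              # txt, code, extracted DOCX text, ...

print(attachment_route("diagram.png"))  # → vision-input
print(attachment_route("report.pdf"))   # → pdf-text-extraction
```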


Using old / existing GGUF models

Mar 27 update: Unsloth Studio now automatically detects older / pre-existing models downloaded from Hugging Face, LM Studio etc.

Manual instructions: Unsloth Studio detects models downloaded to your Hugging Face Hub cache (C:\Users\{your_username}\.cache\huggingface\hub). GGUF models downloaded through LM Studio are stored in C:\Users\{your_username}\.cache\lm-studio\models or C:\Users\{your_username}\lm-studio\models and are not visible to llama.cpp by default. To let Unsloth Studio load them, move or copy those .gguf files into your Hugging Face Hub cache directory (or another path accessible to llama.cpp).

After fine-tuning a model or adapter in Studio, you can export it to GGUF and run local inference with llama.cpp directly in Studio Chat. Unsloth Studio is powered by llama.cpp and Hugging Face.

Deleting model files

You can delete old model files either from the bin icon in model search or by removing the relevant cached model folder from the Hugging Face cache directory. By default, the cache lives at:

  • macOS, Linux, WSL: ~/.cache/huggingface/hub/

  • Windows: %USERPROFILE%\.cache\huggingface\hub\

If HF_HUB_CACHE or HF_HOME is set, use that location instead. On Linux and WSL, XDG_CACHE_HOME can also change the default cache root.
