Mistral Small 4: One Model to Rule Them All

Mistral AI just dropped Mistral Small 4 — a single open-source model that unifies instruct, reasoning, and multimodal capabilities. Here's why it matters and how to use it on Oboe Chat.

Mistral AI just made a bold move. Instead of maintaining separate models for chat, reasoning, and vision, they've merged everything into one: Mistral Small 4. It's fast, it reasons, it sees images — and it's fully open source under the Apache 2.0 license.

For developers and teams tired of juggling multiple models for different tasks, this is a big deal.

What Is Mistral Small 4?

Mistral Small 4 is a hybrid Mixture of Experts (MoE) model that combines the capabilities of three previously separate Mistral models:

  • Magistral — deep reasoning
  • Pixtral — multimodal (text + image)
  • Devstral — agentic coding

Instead of choosing between a fast instruct model, a reasoning engine, or a vision assistant, you get all three in one package.

The Numbers

Spec                 Detail
-------------------  -------------------------------------
Architecture         MoE — 128 experts, 4 active per token
Total Parameters     119B
Active Parameters    6B per token (8B with embeddings)
Context Window       256K tokens
License              Apache 2.0
Modalities           Text + image input

With only 6–8B parameters active per token, Mistral Small 4 is remarkably efficient for its capability level. It's not trying to be the biggest model — it's trying to be the smartest per compute dollar.
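To make that efficiency concrete, here's a quick back-of-the-envelope calculation using the spec-sheet numbers above (the resulting percentage is our own arithmetic, not an official benchmark):

```python
# Back-of-the-envelope: what fraction of Mistral Small 4's weights
# are active for any given token?
total_params_b = 119   # total parameters, in billions
active_params_b = 6    # active parameters per token, in billions

active_fraction = active_params_b / total_params_b
print(f"Active fraction per token: {active_fraction:.1%}")  # roughly 5%

# A dense model would run all 119B parameters for every token;
# the MoE router instead activates 4 of 128 experts, so per-token
# compute is closer to that of a ~6-8B dense model.
```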

Reasoning on Demand

The standout feature is the reasoning_effort parameter. You can dynamically control how much the model "thinks" before responding:

  • reasoning_effort="none" — Fast, lightweight responses. Great for everyday chat, quick questions, and simple tasks. Behaves like the previous Mistral Small 3.2.
  • reasoning_effort="high" — Deep, step-by-step reasoning for complex problems. Equivalent to the verbosity and depth of Magistral models.

This is a practical design choice. Most conversations don't need chain-of-thought reasoning — but when you're debugging a tricky algorithm or working through a math proof, you want the model to slow down and think. Now you can toggle that on the fly.
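In practice, the toggle is just a request parameter. Here's a minimal sketch in the OpenAI-compatible chat completions shape — note that the model identifier below is a placeholder; check the Oboe Chat or Mistral docs for the exact name your endpoint expects:

```python
# Sketch: building chat requests that toggle reasoning on and off.
# Payload shape follows the OpenAI-compatible chat completions format;
# "mistral-small-4" is a placeholder model identifier.

def build_request(prompt: str, effort: str) -> dict:
    """Assemble a chat completion payload with a given reasoning effort."""
    return {
        "model": "mistral-small-4",  # placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" or "high"
    }

# Fast path for everyday chat:
quick = build_request("Summarize this changelog in two lines.", "none")

# Deliberate path for hard problems:
deep = build_request("Prove that sqrt(2) is irrational.", "high")

print(quick["reasoning_effort"], deep["reasoning_effort"])
```

The same conversation can mix both modes: send routine turns with "none" and switch a single hard question to "high" without changing models or endpoints.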

Performance: Faster and More Efficient

Compared to Mistral Small 3, the improvements are significant:

  • 40% reduction in end-to-end completion time (latency-optimized)
  • 3× more requests per second (throughput-optimized)

On benchmarks, Mistral Small 4 with reasoning enabled matches or surpasses GPT-OSS 120B across LiveCodeBench, AIME 2025, and other reasoning benchmarks — while generating significantly shorter outputs.

This efficiency gap matters in practice:

  • Shorter outputs → lower latency and reduced inference costs
  • Higher accuracy per token → better value for enterprise deployments
  • Consistent quality → fewer retries and less manual intervention

On LiveCodeBench specifically, Mistral Small 4 outperforms GPT-OSS 120B while producing 20% less output. On reasoning benchmarks, it achieves comparable scores with 3–4× fewer characters than competing models like Qwen.

Native Multimodality

Mistral Small 4 accepts both text and image inputs natively. This opens up use cases like:

  • Document parsing — extract structured data from scanned documents, receipts, or forms
  • Visual analysis — describe charts, diagrams, or screenshots
  • Code from UI — generate code from design mockups or wireframes
  • Image-based Q&A — answer questions about photos, diagrams, or visual content

Unlike models that bolt vision onto a text backbone as an afterthought, multimodality is built into Small 4's architecture from the ground up.
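Image inputs typically travel as base64 data URIs inside the message content. A sketch, assuming the OpenAI-style multimodal content-part schema (Mistral's native API may use slightly different field names):

```python
import base64

# Sketch: packaging an image alongside a text question.
# Content-part schema follows the OpenAI-style multimodal format;
# field names are an assumption, not confirmed from Mistral's docs.

def image_part(raw_bytes: bytes, mime: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 data-URI content part."""
    b64 = base64.b64encode(raw_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}"}}

# In real use you'd read an actual file; fake bytes keep this runnable.
fake_png = b"\x89PNG fake bytes"

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What does this chart show?"},
        image_part(fake_png),
    ],
}
print(message["content"][1]["image_url"]["url"][:22])
```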

Infrastructure Requirements

For teams looking to self-host:

Setup        Hardware
-----------  -----------------------------------------------
Minimum      4× NVIDIA HGX H100, 2× HGX H200, or 1× DGX B200
Recommended  4× NVIDIA HGX H100, 4× HGX H200, or 2× DGX B200

It's also available on vLLM, llama.cpp, SGLang, and Transformers — so you can plug it into your existing serving stack.
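For the vLLM route, deployment can be a single command. A sketch only — the Hugging Face repo id below is a guess, so substitute the actual Mistral Small 4 checkpoint name, and size the tensor parallelism to your GPU count:

```shell
# Serve an OpenAI-compatible endpoint with vLLM.
# NOTE: the model id is a placeholder -- use the real checkpoint name.
vllm serve mistralai/Mistral-Small-4 \
    --tensor-parallel-size 4 \
    --max-model-len 262144  # 256K-token context window
```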

Don't want to manage GPU infrastructure? On Oboe Chat, we handle the serving so you can just use the model directly — pay only for the tokens you consume.

Who Should Care?

Developers

Coding automation, codebase exploration, and agentic workflows. The Devstral DNA means Small 4 is strong at understanding and generating code across languages.

Enterprises

A single model for chat assistants, document understanding, and multimodal analysis. Fewer models to manage means simpler infrastructure and lower operational overhead.

Researchers

Math, science, and complex reasoning tasks. The configurable reasoning effort lets you balance speed and depth depending on the problem.

The Bigger Picture

Mistral Small 4 represents a trend we're seeing across the industry: convergence. Instead of maintaining a zoo of specialized models, teams are building unified models that can adapt their behavior based on the task.

This is good for everyone:

  • Simpler deployments — one model endpoint instead of three
  • Lower costs — no need to route between models or maintain multiple serving stacks
  • Better UX — users don't need to know which model to pick for their task

Combined with the Apache 2.0 license, Mistral Small 4 is a strong contender for teams that want frontier-level capabilities without proprietary lock-in.

Try It on Oboe Chat

Mistral Small 4 is available on Oboe Chat alongside dozens of other open-source and proprietary models. No subscription required — just pick the model, start chatting, and pay only for what you use.

Whether you're using it for quick code reviews with reasoning_effort="none" or deep problem-solving with reasoning_effort="high", Mistral Small 4 adapts to your workflow.


Want to try Mistral Small 4? Head to oboe.chat and start chatting — no subscription required.
