Mixtral 8x22B - Mistral AI

Overview

Mixtral 8x22B is a sparse Mixture of Experts (MoE) model from Mistral AI. Only 39B of its 141B total parameters are active for any given token, so it runs faster than a dense model of similar capability.
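To make "active per token" concrete, here is a minimal sketch of sparse expert routing in plain Python. It assumes top-2 routing over 8 experts, which is how Mixtral models are generally described; the function names (`route`, `moe_layer`) and the toy scalar experts are illustrative only and are not Mistral AI's implementation.

```python
import math

NUM_EXPERTS = 8  # the "8x" in the model name
TOP_K = 2        # assumption: each token is sent to its 2 highest-scoring experts


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router_logits, experts):
    """Run only the selected experts and blend their outputs."""
    return sum(weight * experts[idx](token)
               for idx, weight in route(router_logits))


# Toy experts: expert k multiplies its input by (k + 1).
experts = [lambda x, k=k: x * (k + 1) for k in range(NUM_EXPERTS)]

# Hypothetical router scores for a single token.
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 3.0]
selected = route(logits)
output = moe_layer(1.0, logits, experts)
```

Only the two selected experts are evaluated for this token; the other six are skipped entirely, which is where the compute saving over a dense model comes from.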

Architecture: 8x22B MoE (39B active)

Context: 64K tokens

License: Apache 2.0

Access: Open weights + API

✅ Strengths

✓ Open weights: self-host or use the API
✓ Efficient MoE architecture
✓ Strong multilingual performance
✓ Good code generation capabilities
✓ Permissive Apache 2.0 license

⚠️ Weaknesses

✗ Requires significant VRAM for self-hosting
✗ Older than Mistral Large 2
✗ Complex architecture is harder to optimize
✗ Less efficient than newer MoE designs
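The VRAM requirement follows from simple arithmetic: although only 39B parameters are active per token, every expert must be resident in memory, so the full parameter count determines the footprint. The sketch below is a rough, weights-only estimate that assumes 141B total parameters (Mistral AI's published figure) and common precisions; it ignores the KV cache, activations, and framework overhead, so real requirements will be higher.

```python
TOTAL_PARAMS = 141e9   # assumption: published total parameter count
ACTIVE_PARAMS = 39e9   # parameters used per token

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

# Share of the model that does work for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for name, gigabytes in estimates.items():
    print(f"{name}: about {gigabytes:.1f} GB for weights")
print(f"active per token: {active_fraction:.1%} of parameters")
```

Even with aggressive quantization, the weights do not fit on a single consumer GPU, which is why self-hosting usually means a multi-GPU server.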

Best Use Cases

🏢 Self-Hosted AI: Enterprise deployment

🔬 Research: MoE architecture study

🌐 Multilingual: European languages

💻 Code: General development

📊 Analysis: Document processing

🎓 Education: AI research

Benchmarks

MMLU: 85.5%

HumanEval: 82.8%

GSM8K: 88.4%
