Mixtral 8x22B

Mistral's sparse MoE model - efficiency at scale

Last updated: May 22, 2026

Mistral AI
📅 Released: April 2024 · 🆓 Open Weights · MoE Architecture

Overview

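Mixtral 8x22B uses a sparse Mixture of Experts (MoE) architecture to deliver flagship-level performance at lower inference cost. Each transformer layer holds eight expert feed-forward networks, and a router sends every token through only two of them, so just 39B of the model's 141B parameters are active per token. Per-token compute is therefore closer to that of a 39B dense model, which makes Mixtral 8x22B faster than dense models of similar capability.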
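To make the "39B active" figure concrete, here is a minimal, framework-free Python sketch of top-2 expert routing for a single token. It assumes the routing scheme described for the Mixtral family: a linear router scores all eight experts, the two highest-scoring experts are kept, their logits are renormalized with a softmax, and the layer output is the weighted sum of only those two experts' outputs. The function names, the toy router, and the toy experts are hypothetical; real experts are SwiGLU feed-forward blocks operating on tensors, not Python lists.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Keep the k highest-scoring experts, then softmax over only those k logits."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x: Vector, router: Sequence[Vector], experts: Sequence[Expert],
                k: int = 2) -> Tuple[Vector, List[Tuple[int, float]]]:
    """Sparse MoE layer for one token: only the k routed experts are evaluated."""
    # Router logits: one dot product per expert (x . W_g).
    gate_logits = [sum(a * b for a, b in zip(x, row)) for row in router]
    routing = route_top_k(gate_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routing:
        y = experts[idx](x)  # unselected experts are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, routing


if __name__ == "__main__":
    # Hypothetical toy setup: 8 experts, expert i multiplies its input by i + 1,
    # and the router gives expert i a logit of i * x[0].
    experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
    router = [[float(i), 0.0] for i in range(8)]
    y, routing = moe_forward([1.0, 2.0], router, experts)
    print([idx for idx, _ in routing])  # [7, 6]
    print(y)
```

Because the softmax is applied after the top-k selection, the two chosen weights always sum to 1, and the cost per token is two expert evaluations regardless of how many experts exist.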

Architecture: 8x22B sparse MoE (141B total parameters, 39B active per token)
Context: 64K tokens
License: Apache 2.0
Access: Open weights + API
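The name "8x22B" does not imply 8 × 22 = 176B parameters. Only the feed-forward blocks are replicated per expert, while attention, norms, and embeddings are shared. The sketch below redoes the arithmetic from the model dimensions. The values in the hypothetical `MixtralDims` class are assumptions taken from the publicly released Hugging Face `config.json` for Mixtral-8x22B-v0.1 (hidden size 6144, expert FFN size 16384, 56 layers, 48 query heads, 8 key/value heads, 32000-token vocabulary, untied output head). Bias terms are assumed absent. Check these values against the checkpoint you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixtralDims:
    # Assumed values from the Mixtral-8x22B-v0.1 config.json (verify locally).
    hidden: int = 6144
    expert_ffn: int = 16384
    layers: int = 56
    q_heads: int = 48
    kv_heads: int = 8          # grouped-query attention: fewer K/V heads than Q heads
    vocab: int = 32000
    experts: int = 8
    experts_per_token: int = 2


def param_count(d: MixtralDims, experts_used: int) -> int:
    """Parameter count when `experts_used` experts per layer are included."""
    head_dim = d.hidden // d.q_heads
    # Wq and Wo are hidden x hidden; Wk and Wv project to kv_heads * head_dim.
    attention = 2 * d.hidden * d.hidden + 2 * d.hidden * d.kv_heads * head_dim
    router = d.hidden * d.experts                    # one logit per expert
    norms = 2 * d.hidden                             # two RMSNorm weight vectors per layer
    shared_per_layer = attention + router + norms
    expert = 3 * d.hidden * d.expert_ffn             # SwiGLU: gate, up, down projections
    embeddings = 2 * d.vocab * d.hidden + d.hidden   # input embedding, untied LM head, final norm
    return d.layers * (shared_per_layer + experts_used * expert) + embeddings


if __name__ == "__main__":
    dims = MixtralDims()
    for label, n in [("total", dims.experts), ("active", dims.experts_per_token), ("one expert", 1)]:
        print(f"{label:>10}: {param_count(dims, n) / 1e9:.1f}B")
```

With these assumed dimensions the totals come to about 140.6B and 39.2B, which matches the published 141B and 39B figures. A slice with one expert per layer comes to about 22.2B, and that is one way to read the "22B" in the name.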

✅ Strengths

  • ✓ Open weights: self-host or use the API
  • ✓ Efficient MoE architecture
  • ✓ Strong multilingual performance
  • ✓ Good code generation capabilities
  • ✓ Permissive Apache 2.0 license

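⚠️ Weaknesses

  • ✗ Requires significant VRAM for self-hosting: all 141B parameters must be loaded, even though only 39B are active per token
  • ✗ Predates Mistral Large 2 and other newer Mistral models
  • ✗ MoE routing makes it more complex to optimize and serve than a dense model
  • ✗ Less efficient than newer MoE designs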
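The VRAM requirement follows from simple arithmetic. Every expert must stay resident in memory because any token may be routed to any of them, so memory scales with the 141B total parameters and not with the 39B active ones. This back-of-envelope Python sketch estimates weight memory at a few precisions and the key/value cache for one full 64K-token sequence. It assumes 141B parameters, attention dimensions taken from the Mixtral-8x22B-v0.1 `config.json` (56 layers, 8 key/value heads, head size 128), 2-byte cache entries, and a context of 65,536 tokens. It ignores activations, quantization scales, and framework overhead, so real deployments need more memory than it reports.

```python
TOTAL_PARAMS = 141_000_000_000  # all experts must be loaded, not only the 39B active

# Bytes per weight at common precisions (quantization scale overhead ignored).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: int, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def kv_cache_bytes(tokens: int, layers: int = 56, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Key/value cache for one sequence: K and V per layer, per KV head, per token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


if __name__ == "__main__":
    for name, b in BYTES_PER_PARAM.items():
        print(f"weights @ {name}: {weight_gb(TOTAL_PARAMS, b):.1f} GB")
    print(f"KV cache @ 65,536 tokens: {kv_cache_bytes(65_536) / 1e9:.1f} GB")
```

Under these assumptions, bf16 weights alone take roughly 282 GB, and a full-length sequence adds about 15 GB of cache. That is why self-hosting typically calls for a multi-GPU node or aggressive quantization.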

Best Use Cases

  • 🏢 Self-Hosted AI: enterprise deployment
  • 🔬 Research: MoE architecture study
  • 🌐 Multilingual: European languages (English, French, German, Spanish, Italian)
  • 💻 Code: general development
  • 📊 Analysis: document processing
  • 🎓 Education: AI research

Benchmarks

MMLU: 85.5%
HumanEval: 82.8%
GSM8K: 88.4%
