Foundation models are massive neural networks trained on internet-scale data, the engines under every modern AI app. They are pre-trained once, then adapted to thousands of downstream tasks.
| Family | Maker | Open? | Strengths |
|---|---|---|---|
| GPT (4o, 4.1, 5) | OpenAI | Closed | Multimodal, broad capability, ChatGPT. |
| Claude (Sonnet, Opus, Haiku) | Anthropic | Closed | Long context, coding, safety, agentic workflows. |
| Gemini (Pro, Flash, Ultra) | Google DeepMind | Closed | Native multimodal, huge context windows, Workspace integration. |
| Llama (3.x, 4) | Meta | Open weights | Most-used open model; runs anywhere. |
| Mistral / Mixtral | Mistral AI | Mixed | European, efficient, MoE architecture. |
| DeepSeek | DeepSeek | Open weights | Strong reasoning at fraction of training cost. |
| Qwen | Alibaba | Open weights | Multilingual, strong on Asian languages. |
| Grok | xAI | Mixed | X (Twitter) integration, real-time data. |
Almost all modern foundation models are transformers — a 2017 architecture from Google ("Attention Is All You Need"). The core idea: self-attention lets the model weigh how every token relates to every other token, in parallel. This scales beautifully with compute.
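The core computation can be sketched in a few lines. This is a minimal pure-Python toy of scaled dot-product self-attention, with identity Q/K/V projections in place of the learned weight matrices a real transformer uses, so only the weighting mechanism itself is shown:

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns raw scores into weights summing to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.
    Each output vector is a weighted mix of ALL token vectors, where the
    weights come from that token's dot product with every other token.
    (Toy version: identity projections instead of learned Q/K/V matrices.)"""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Score this token against every token (including itself), scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Output = convex combination of all token vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs

# Three toy 2-d token embeddings.
rows = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Note that nothing in the loop over `q` depends on the other iterations: in a real transformer all of these scores are computed as one matrix multiply, which is why the architecture parallelizes so well on GPUs.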
A new generation of reasoning models (OpenAI o-series, Claude with extended thinking, Gemini 2.0 Flash Thinking, DeepSeek R1) spends extra compute on a private "thinking" pass before answering. The result is vastly better performance on math, code, and multi-step problems, at higher latency and cost.
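The labs' internal mechanisms are proprietary, but one published technique in the same spirit is self-consistency: sample several independent reasoning passes, keep only the majority answer, and discard the traces. A toy sketch, where `toy_solver` is a hypothetical stand-in for one noisy reasoning pass:

```python
from collections import Counter

def toy_solver(question, attempt):
    # Hypothetical stand-in for one noisy private reasoning pass:
    # correct (42) on two of every three attempts, wrong otherwise.
    return 42 if attempt % 3 else attempt

def answer_with_thinking(question, passes=9):
    # Spend extra inference-time compute: run several private passes,
    # then surface only the majority answer (self-consistency voting).
    votes = Counter(toy_solver(question, a) for a in range(passes))
    return votes.most_common(1)[0][0]

print(answer_with_thinking("What is 2 * 21?"))  # → 42
```

The trade-off in the sketch mirrors the real one: nine solver calls instead of one buys a more reliable answer at nine times the compute.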
Modern models accept images, audio, and video alongside text. Native multimodal models (GPT-4o, Gemini, Claude) are trained jointly on all modalities; older systems bolt vision encoders on top.