AI Glossary

What is a Transformer?

The Transformer is a neural network architecture introduced in the 2017 paper "Attention Is All You Need". It revolutionized natural language processing and now powers virtually all modern LLMs. Its key innovation is the self-attention mechanism, which lets the model weigh how relevant every other part of the input is when processing each element. Unlike earlier recurrent architectures, which read a sequence one token at a time, Transformers process all tokens in parallel, enabling much faster training on large datasets. The original architecture pairs an encoder stack with a decoder stack, though many modern LLMs use decoder-only variants. Transformers also power vision models (ViT), audio models, and multimodal systems.
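
The self-attention idea described above can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. This is an illustration under simplifying assumptions, not a production implementation: the function names (softmax, self_attention) and the toy token vectors are hypothetical, and the token vectors are used directly as queries, keys, and values, whereas a real Transformer first passes them through learned linear projections and uses several attention heads. The optional causal flag mimics the masking used in decoder-only models, where each position can only attend to itself and earlier positions.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over a sequence of token vectors.

    queries, keys, values: lists of float vectors, one vector per token.
    causal=True hides later tokens from each position (decoder-only style).
    Returns (outputs, weights), each with one row per token.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Positions this token is allowed to attend to.
        visible = range(i + 1) if causal else range(len(keys))
        # Similarity of this query to each visible key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        # Hidden positions get zero weight.
        weights = weights + [0.0] * (len(keys) - len(weights))
        # Output is the weighted average of the value vectors.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: three tokens, each a 2-dimensional vector (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
causal_outputs, causal_weights = self_attention(tokens, tokens, tokens, causal=True)
```

Each row of weights shows how strongly one token attends to every token in the sequence. No row depends on the result of another row, which is why the computation can run for all tokens in parallel (in practice as a single matrix multiplication) rather than step by step as in a recurrent network.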
Frequently Asked Questions

What is a Transformer in AI?

A Transformer is a neural network architecture that uses self-attention to process all tokens of a sequence in parallel. It is the foundation of virtually all modern large language models, including GPT, Claude, and Gemini.

Why are Transformers important?

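Because they process tokens in parallel rather than one at a time, Transformers made it practical to train on much larger datasets than previous recurrent architectures allowed. That scale led to breakthrough AI capabilities in language, vision, and multimodal understanding.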
