AI Glossary

What is Reinforcement Learning?

Reinforcement Learning (RL) is a machine learning paradigm in which an agent learns by interacting with an environment: it takes actions, receives rewards or penalties, and adjusts its behavior to maximize cumulative reward over time. For language models, RLHF (Reinforcement Learning from Human Feedback) is applied after pre-training to align model outputs with human preferences. RL has also driven breakthroughs in game playing (AlphaGo, AlphaZero), robotics, and autonomous systems. Recent developments include RLAIF (RL from AI Feedback), DPO (Direct Preference Optimization) as a simpler alternative to PPO-based RLHF, and RL-based reasoning training that enables models to "think" before answering.
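
The reward-driven loop described above can be illustrated with a toy example. The following is a minimal sketch, not a production algorithm: it assumes a three-armed bandit environment and an epsilon-greedy agent, and the function name run_bandit, the reward probabilities, and the +1/-1 reward scheme are all hypothetical choices made for illustration.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions  # running estimate of each action's average reward
    counts = [0] * n_actions    # how many times each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Assumed environment: +1 reward on success, -1 penalty on failure.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0

        # Incremental mean update of the chosen action's value estimate.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

    return values, counts, total_reward

values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("times each action was chosen:", counts)
```

With these assumed probabilities, the agent should end up choosing the third action far more often than the others, because its estimated value rises as rewards accumulate. Real RL problems add state and delayed rewards, which this sketch leaves out.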
Frequently Asked Questions

What is reinforcement learning?

Reinforcement learning is a type of machine learning in which an agent learns by trial and error: it receives rewards for good actions and penalties for bad ones, and adjusts its behavior over time to maximize total reward.

What is RLHF?

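RLHF (Reinforcement Learning from Human Feedback) trains AI models to produce outputs that humans prefer. Typically, human comparisons between candidate outputs are used to train a reward model, and the language model is then optimized against that reward. It is a key technique used to align LLMs like ChatGPT and Claude.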
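
To make "outputs that humans prefer" concrete, here is a minimal sketch of a pairwise preference loss of the kind commonly used to train reward models (a Bradley-Terry style objective). This is an assumption about one common formulation, not a description of how any particular system is trained; the function name preference_loss and the example scores are hypothetical.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    output above the rejected one, and large when it does the opposite.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward-model scores for two candidate answers to one prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss over many human-labeled pairs pushes the reward model to rank preferred outputs higher; the resulting scores can then serve as the reward signal in the RL step.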
