How the DeepSeek-R1 AI model was taught to teach itself to reason | Explained
Reinforcement learning alone, with the right design, could produce reasoning behaviour that was previously thought to require human examples