DQN vs Double DQN vs Dueling DQN: Atari Breakout Benchmark

#dqn #doubledqn #duelingdqn #reinforcementlearnin

The Overestimation Problem Cost Me 40% Performance

Vanilla DQN scored 287 average reward on Breakout after 10M frames. Double DQN hit 412. Dueling DQN reached 438.

That's not just a numbers game. The gap between vanilla and Double DQN represents the cost of Q-value overestimation bias — a silent failure mode that takes hours to surface in training curves. I ran all three variants on the same hardware (RTX 3080, Gymnasium 0.29.1, Python 3.11) with identical hyperparameters to isolate the architectural differences. Here's what actually breaks and why.