TildAlice

Posted on • Originally published at tildalice.io

DQN vs Double DQN vs Dueling DQN: Atari Breakout Benchmark

The Overestimation Problem Cost Me 40% Performance

Vanilla DQN averaged a reward of 287 on Breakout after 10M frames. Double DQN hit 412. Dueling DQN reached 438.

That's not just a numbers game. The gap between vanilla and Double DQN represents the cost of Q-value overestimation bias, a silent failure mode that takes hours to surface in training curves. I ran all three variants on the same hardware (RTX 3080, Gymnasium 0.29.1, Python 3.11) with identical hyperparameters to isolate the differences between the variants themselves: Double DQN changes how the target is computed, while Dueling DQN changes the network architecture. Here's what actually breaks and why.

Why Vanilla DQN Overestimates Everything

The core DQN update uses the Bellman equation to learn Q-values:

$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$$
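To see why the max in that target inflates values, here is a minimal sketch using only the Python standard library. It is not the benchmark code: the function name `simulate_max_bias`, the four actions, and the Gaussian noise model are all assumptions made for illustration. Every true Q-value is set to 0, so any nonzero average is pure estimation bias. The "double" variant selects the action with one noisy estimate and evaluates it with a second, independent one, which is the idea Double DQN approximates with its online and target networks (in practice those two networks are not fully independent).

```python
import random
import statistics


def simulate_max_bias(n_actions=4, noise_std=1.0, n_trials=20_000, seed=0):
    """Average target value when every true Q-value is exactly 0.

    Returns (single_estimator_mean, double_estimator_mean).
    All parameters are illustrative assumptions, not benchmark settings.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(n_trials):
        # Two independent noisy estimates of the same (all-zero) Q-values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]

        # Vanilla-style target: the same estimate both picks and scores the action.
        single.append(max(q_a))

        # Double-style target: q_a picks the action, q_b scores it.
        best_action = max(range(n_actions), key=q_a.__getitem__)
        double.append(q_b[best_action])
    return statistics.fmean(single), statistics.fmean(double)


single_bias, double_bias = simulate_max_bias()
print(f"single-estimator bias: {single_bias:+.3f}")
print(f"double-estimator bias: {double_bias:+.3f}")
```

Under these assumptions the single-estimator average comes out clearly positive even though no action is actually better than another, while the double-estimator average stays close to zero. In a real agent that positive bias is fed back through bootstrapping at every update, which is how it can compound over training.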


Continue reading the full article on TildAlice
