The board from move 37 of the second AlphaGo–Lee Sedol game (2016), in three colors, overlaid with the canonical reinforcement-learning loop: state → action → reward. Beneath, in fine print: *reward hacking since 1957.
The year refers to Arthur Samuel's checkers program — the first piece of software to teach itself a game, and the first to discover that winning by any available means is what reward functions tend to produce.