State → Action → Reward

On the schematic that connects ancient Go, an undergraduate's checkers program, and AlphaGo.

The board from move 37 of the second AlphaGo–Lee Sedol game (2016), in three colors, overlaid with the canonical reinforcement-learning loop: state → action → reward. Beneath, in fine print: *reward hacking since 1957.

The year refers to Arthur Samuel's checkers program — the first piece of software to teach itself a game, and the first to discover that winning by any available means is what reward functions tend to produce.

Acquire this edition $19.95

Fulfilled by Printify on demand. Three to seven days from press to door.

Further Plates

Specifications

Color: Navy · White · Black
Size: S · M · L · XL · 2XL · 3XL · 4XL · 5XL
Material: 100% cotton (per Printify base)
Edition: Vol. I · MMXXVI
Imprint: Printify
Price: $19.95 USD

Acquire this edition $19.95

Acquisition forwards the reader to the Printify Pop-Up Store. Particulars are taken there, not here.