New Tack Wins Prisoner’s Dilemma

[ For more on the Prisoner’s Dilemma, visit Serge Helfrich’s page. Thanks to Jared Childers for forwarding the article. –BL ]

Oct. 13, 2004 | Wired

by Wendy M. Grossman

Proving that a new approach can secure victory in a classic strategy game, a team from England’s Southampton University has won the 20th-anniversary Iterated Prisoner’s Dilemma competition, toppling the long-term winner from its throne.

The Southampton group, whose primary research area is software agents, said its strategy involved a series of moves allowing players to recognize each other and act cooperatively.

The Prisoner’s Dilemma is a game-theory problem for two players. As typically described, two accomplices are arrested and separated for interrogation by the police, who give each the same choice: confess to authorities (defect) or remain silent (cooperate). If one defects and the other cooperates, the defector walks free and the cooperator gets 10 years in jail. If both cooperate, both get six months. If both defect, both get six years. Neither suspect knows the other’s choice.
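To make the tension concrete, here is a minimal sketch (in Python, not from the article) of the payoff structure just described, scored in years of jail so that lower is better:

```python
# Payoffs from the article, in years of jail (lower is better), keyed by
# (my move, other prisoner's move) -> (my sentence, other's sentence).
SENTENCES = {
    ("defect",    "cooperate"): (0,   10),   # I walk free, the cooperator gets 10 years
    ("cooperate", "defect"):    (10,  0),
    ("cooperate", "cooperate"): (0.5, 0.5),  # six months each
    ("defect",    "defect"):    (6,   6),    # six years each
}

def my_sentence(my_move, other_move):
    return SENTENCES[(my_move, other_move)][0]

# Whatever the other prisoner chooses, defecting earns me the shorter sentence...
for other in ("cooperate", "defect"):
    assert my_sentence("defect", other) < my_sentence("cooperate", other)

# ...yet mutual defection (six years each) is far worse for both than mutual
# cooperation (six months each). That tension is the dilemma.
```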

“The Prisoner’s Dilemma is this canonical problem of how to get cooperation to emerge from selfish agents,” said Nick Jennings, a professor of computer science at Southampton University and leader of the winning team along with his Ph.D. student, Gopal Ramchurn. “People are very keen on it because they can see so many parallels in real life.”

Before Southampton came along, a strategy called Tit for Tat had a consistent record of winning the game. Under that strategy, a player always cooperates on its first move; after that, it simply echoes whatever its opponent did on the previous move. The strategy is similar to the one nuclear powers adopted during the Cold War, each promising not to use its weaponry so long as the other side refrained from doing so as well.
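Tit for Tat is short enough to express directly. A minimal sketch, assuming each strategy sees only its opponent's history of moves (the contest's actual programming interface is not described in the article):

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first move, then echo the opponent's previous move."""
    if not opponent_history:
        return "cooperate"          # always open cooperatively
    return opponent_history[-1]     # mirror whatever the opponent did last
```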

The 20th-anniversary competition was the brainchild of Graham Kendall, a lecturer in the University of Nottingham’s School of Computer Science and Information Technology and a researcher in game theory, and was based on the original 1984 competition run by a University of Michigan political scientist, Robert Axelrod.

The Iterated Prisoner’s Dilemma is a version of the game in which the choice is repeated over and over again and in which the players can remember their previous moves, allowing them to evolve a cooperative strategy. The 2004 competition had 223 entries, with each player playing all the other players in a round-robin setup. Because Axelrod’s original competition was run twice, Kendall will run a second competition in April 2005, for which he hopes to attract even more entries.
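That round-robin structure can be sketched as follows. The point values here (3 each for mutual cooperation, 5 to a lone defector, 0 to a lone cooperator, 1 each for mutual defection) are the conventional Axelrod scores and are an assumption; the article does not give the 2004 competition's exact scoring table or match length:

```python
from itertools import combinations

POINTS = {("cooperate", "cooperate"): (3, 3),
          ("defect",    "cooperate"): (5, 0),
          ("cooperate", "defect"):    (0, 5),
          ("defect",    "defect"):    (1, 1)}

def play_match(strategy_a, strategy_b, rounds=200):
    """Iterate the dilemma; each strategy sees only the other's move history."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_b), strategy_b(history_a)
        points_a, points_b = POINTS[(move_a, move_b)]
        score_a, score_b = score_a + points_a, score_b + points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

def round_robin(entries):
    """entries maps an entry name to a strategy; every entry plays every other once."""
    totals = dict.fromkeys(entries, 0)
    for (name_a, strat_a), (name_b, strat_b) in combinations(entries.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals
```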

Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These, Jennings explained, were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume “master and slave” roles — one would sacrifice itself so the other could win repeatedly.

If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the non-Southampton player. The result is that Southampton had the top three performers — but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team.
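Put together, the colluding strategy described in the last two paragraphs can be sketched roughly as follows. The handshake sequence and the way roles are assigned are illustrative guesses; the article does not publish Southampton's actual move series:

```python
# Illustrative five-move handshake; the real sequence was five to 10 moves long
# and is not given in the article.
HANDSHAKE = ["cooperate", "defect", "defect", "cooperate", "cooperate"]

def make_colluder(role):
    """role is 'master' (collects the points) or 'slave' (sacrifices itself)."""
    def colluder(opponent_history):
        turn = len(opponent_history)
        if turn < len(HANDSHAKE):
            return HANDSHAKE[turn]                       # still signalling
        if opponent_history[:len(HANDSHAKE)] == HANDSHAKE:
            # Teammate recognized: the slave cooperates while the master defects,
            # handing the master the maximum payoff every remaining round.
            return "defect" if role == "master" else "cooperate"
        return "defect"                                  # outsider: play spoiler
    return colluder
```

In a tournament like the round-robin sketch above, a handful of "masters" and many "slaves" built this way reproduce the pattern the article describes: a few entries near the top of the table and a load of deliberate failures at the bottom.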

Another twist to the game was the addition of noise, which allowed some moves to be deliberately misrepresented. In the original game, the two prisoners could not communicate. But Southampton’s design lets the prisoners do the equivalent of signaling to each other their intentions by tapping in Morse code on the prison wall.

Kendall noted that there was nothing in the competition rules to preclude such a strategy, though he admitted that the ability to submit multiple players means it’s difficult to tell whether this strategy would really beat Tit for Tat in the original version. But he believes it would be impossible to prevent collusion between entrants.

“Ultimately,” he said, “what’s more important is the research.”

“What’s interesting from our point of view,” said Jennings, “was to test some ideas we had about teamwork in general agent systems, and this detection of working together as a team is a quite fundamental problem. What was interesting was to see how many colluders you need in a population. It turns out we had far too many — we would have won with around 20.”

Jennings is also interested in testing the strategy on an evolutionary variant of the game in which each player plays only its neighbors on a grid. If a player’s neighbors do better than it does, it adopts their strategy.
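A minimal sketch of one generation of that spatial variant, under assumed details the article does not give (grid shape, neighborhood, and how scores are compared):

```python
def neighbours(i, j, rows, cols):
    # Four nearest neighbours on a grid that wraps around at the edges (an assumption).
    return [((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]

def evolve_one_generation(grid, rows, cols, play_match):
    """grid maps (row, col) -> strategy; play_match(a, b) returns (score_a, score_b)."""
    # Each cell accumulates its own score from matches against its neighbours.
    scores = {cell: 0 for cell in grid}
    for (i, j) in grid:
        for nb in neighbours(i, j, rows, cols):
            own_score, _ = play_match(grid[(i, j)], grid[nb])
            scores[(i, j)] += own_score
    # Each cell then copies its best-scoring neighbour, if that neighbour did better.
    new_grid = {}
    for cell in grid:
        best = max(neighbours(*cell, rows, cols), key=lambda nb: scores[nb])
        new_grid[cell] = grid[best] if scores[best] > scores[cell] else grid[cell]
    return new_grid
```

Here play_match can be the pairwise match function from the round-robin sketch above; iterating this update is what lets a seeded block of colluders spread through the grid.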

“Our initial results tell us that ours is an evolutionarily stable strategy — if we start off with a reasonable number of our colluders in the system, in the end everyone will be a colluder like ours,” he said.

The winners don’t get much — an unexpected $50 check and a small plaque. But, says Kendall, “Everybody in our field knows the name of Anatol Rapoport, who won the Axelrod competition. So if you can win the 20th-anniversary one, in our field there’s a certain historical significance.”
