Prisoner's Dilemma
Some of this text has been incorporated from an unedited article (A248483) by another researcher (U107233 Serendipity)
Irrationality is the square root of all evil...
- Douglas Hofstadter
Prisoner's Dilemma will be familiar to many people from the extended treatment Richard Dawkins gives it in his classic popular science work The Selfish Gene. It is a fairly simple problem which has, all the same, exercised - and exasperated - the minds of people drawn from diverse fields such as political science, economics, social psychology, and philosophy, since it first came to prominence in the 1950s. The basic problem at the heart of the dilemma is the question, 'How can co-operation emerge among rational, self-interested individuals without there being any form of central authority imposed on them?' In other words, it can be seen as an attempt to find a secular, rational alternative to old-fashioned 'top-down' moral codes such as those of religious doctrines.
The term is used to refer to any situation in which there appears to be a conflict between the rational individual's self-interest and the common good. The basic premise underpinning the Prisoner's Dilemma is the Darwinian insight that human beings are essentially selfish creatures, genetically programmed to place their own survival above all other considerations. However, as we all know, an individual who works against the 'common good' can in fact be undermining the very foundations on which his/her own self-interest thrives - an example being our continued short-sighted waste of the planet's resources as individuals, without taking the wider view that, since everyone else is doing the same, there may soon be little left of the planet for us to live on at all.
Brief outline of the Prisoner's Dilemma
The Prisoner's Dilemma has conventionally been illustrated by means of an example involving two prisoners trying to decide whether or not they should inform on one another. These two prisoners find themselves in jail in separate cells awaiting trial, having been caught and charged for their participation in the same crime. The police go to each prisoner in turn and offer them the same deal - if you inform on your friend, we will see to it that you get a shorter sentence. Both men know that precisely the same deal will have been offered to their partner-in-crime; however, neither man knows for certain, or has any way of finding out, which decision the other will make. This is the crux of the problem - the outcome of either prisoner's decision depends in part on the decision made by the other, which neither has any way of knowing for certain in advance.
There are four possible outcomes. If they both stick to their story and refuse to talk (ie, they 'co-operate' with one another), the law will have a hard time pinning anything on either of them and therefore they will both benefit - they will both end up with, at worst, a minimal sentence. However, each prisoner knows that, if he 'co-operates' while the other 'defects' (ie, turns informer), he will end up losing heavily, because in effect he will be doing the sentence for both of them - this outcome is known as the 'sucker's payoff'. Likewise, the other prisoner is aware that the same thing could happen to his disadvantage, if he keeps quiet while the other prisoner turns informer. The most likely outcome, then, if both men are rational, will be for both of them to inform and therefore both will 'lose', but each loses less than they would have done if they had got the sucker's payoff by keeping quiet while the other informed.
For those who find the prisoner example a little obscure, perhaps it is better explained in the form of a simple game. Two players face one another, each with two cards in their hand, one of which says 'co-operate' and the other 'defect'. Each player has to lay one of their two cards at the same time as the other player lays one of theirs. Neither player has any way of knowing which choice the other will make. The point of the game is not to eliminate the opponent, but simply to accrue as many points as possible for oneself. This is not a 'zero-sum' game such as chess - success does not depend on the failure of one's opponent, but rather on one's ability to adapt appropriately to their behaviour. The game is 'iterated' - that is, there will be a series of rounds rather than it simply being a one-off event. The four possible outcomes for each move are essentially the same as for the situation involving the two prisoners, outlined above. These outcomes are shown in the following table. For Player 1 read across the table, for Player 2 read down:
| CO-OPERATE | DEFECT |
CO-OPERATE | R=3, R=3 Reward for mutual co-operation | S=0, T=5 Sucker's payoff, and temptation to defect |
DEFECT | T=5, S=0 Temptation to defect, and sucker's payoff | P=1, P=1 Punishment for mutual defection |
The optimum outcome for both players is mutual co-operation, for which either player gets a reward (R) of 3 points. However, both players have the temptation (T) of knowing that, if they defect while the other co-operates, they will score 5 points while the other player gets the sucker's payoff (S) - no points at all. Therefore, if the game is being played between two rational players, the logical outcome - bearing in mind that neither player knows what the other is going to do - is that both will defect, and therefore they will both end up with P - only 1 point apiece. This, clearly, is by no means the most favourable outcome for either player. In fact, the most favourable outcome is for the two players to consistently co-operate with one another. However, two rational players will accept the results of mutual defection because of the possibility of an even worse outcome if they do not. Therefore both players, as a logical consequence of the rational pursuit of their own self-interest, end up with less than they would have got if they had been able to trust one another enough to co-operate.
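For readers who prefer to see the arithmetic laid out, here is a minimal sketch (in Python, with purely illustrative names) of the one-shot game using the payoff values from the table above. It simply checks that, whatever the other player does, defecting scores more in a single round - which is precisely why two 'rational' players end up with 1 point each rather than 3:

```python
# Payoff values taken from the table above (illustrative sketch, not from any library).
PAYOFFS = {
    # (my move, their move) -> (my score, their score)
    ("co-operate", "co-operate"): (3, 3),  # R: reward for mutual co-operation
    ("co-operate", "defect"):     (0, 5),  # S: sucker's payoff / T: temptation
    ("defect",     "co-operate"): (5, 0),  # T: temptation / S: sucker's payoff
    ("defect",     "defect"):     (1, 1),  # P: punishment for mutual defection
}

def my_score(my_move, their_move):
    """My payoff for a single round."""
    return PAYOFFS[(my_move, their_move)][0]

# Whichever move the opponent makes, defecting scores more in a one-off round:
for their_move in ("co-operate", "defect"):
    print(their_move,
          "-> if I co-operate:", my_score("co-operate", their_move),
          "| if I defect:", my_score("defect", their_move))
# Prints:
#   co-operate -> if I co-operate: 3 | if I defect: 5
#   defect -> if I co-operate: 0 | if I defect: 1
```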
So does this mean we're all doomed?
Maybe, maybe not. At any rate, it will be a disagreeable conclusion for anyone still prey to the notion that the human being is somehow distinct from other animals by virtue of having a 'soul', or some other kind of 'spirit' or innate 'moral sense'. Even for the open-minded sceptical type, however, it is a problematic conclusion. After all, in the real world, most of us will probably have some idea of what it feels like to get the sucker's payoff from time to time. Not nice - you do everyone else's work for them, and they end up with most of the rewards. Then you get the blame when everything goes wrong, which usually happens about two minutes after you have stormed off in protest at your shabby treatment. So it would seem to be in most of our interests to see if there is some rational way of dealing with the problem.
The first thing to bear in mind is that Nature is not simply governed by a brutal 'survival of the fittest' ethic. In fact, in Nature there is a surprisingly high degree of co-operation between members of a species, and also between members of different species. In other words, survival in Nature is not all about 'dog eat dog'. Many species have evolved sophisticated co-operative techniques which enable all participating parties to benefit. An example of this is mentioned in Matt Ridley's book The Origins of Virtue. Shoals of fish often stop off at specific points where they know there will be smaller fish waiting to 'clean' them of parasites. The benefit is mutual - the larger fish gets cleaned, the smaller fish get something to eat. However, from the point of view of the larger fish, the benefit would appear to be greater still if it were simply to eat the smaller fish after the latter has done its job. After all, it does not need the fish any more, now that it has done its work - so why not give it the sucker's payoff and take all the reward for oneself, in other words eat the poor creature?
The answer has to be simply that the larger fish knows that, if it were to 'defect' in this way, it would suffer for it later on, because other smaller fish, once they got wind of what had happened to one of their number, would no longer be so inclined to provide their useful service to that particular larger fish - they would, in other words, isolate the defector and make it more difficult for it to get back into the 'game' another time. So here we see the policy of reciprocity (or 'Tit-for-Tat') being enacted in real life - these fish have an understanding with one another to act not just purely for their own benefit, but also with the interests of the wider fish community in mind.
Tit-for-Tat
At any rate, all is not lost. In the late 1970s a political scientist named Robert Axelrod set about trying to find a rational solution to the Prisoner's Dilemma. He invited people within academia to send in computer programs containing a strategy for coping with the dilemma. He then pitted these programs against one another in a series of virtual Prisoner's Dilemma 'tournaments'. Numerous people took part from a wide range of disciplines, including psychology, philosophy, biology, and mathematics.
Axelrod found that the most effective strategies were, almost invariably, 'nice' ones - that is, they tended towards co-operation rather than defection. Furthermore, and rather pleasingly, the strategy that kept coming out on top was also the most straightforward, involving nothing more complex than a child's game of 'Tit-for-Tat'. You simply do what your opponent did on the previous move. However, for your first move, you always take the risk of co-operating before having any knowledge of what your opponent is going to do. This calculated risk is worth taking because if your opponent also co-operates you will be in a (potentially long-lasting) win-win situation, and everyone involved will go home as happy as can be. If he defects - well, fair enough, you get the sucker's payoff once, but next time you will be wise to your opponent and will know to defect from then on if necessary.
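Written down as a rule, Tit-for-Tat really is as short as it sounds. The sketch below (Python, illustrative names only) expresses the whole strategy in a couple of lines and shows what it does against an opponent who always defects - the sucker's payoff once, then defection for as long as necessary:

```python
def tit_for_tat(opponent_history):
    """Co-operate on the first move; thereafter copy the opponent's previous move."""
    if not opponent_history:
        return "co-operate"            # first move: take the risk of co-operating
    return opponent_history[-1]        # afterwards: mirror the opponent's last move

# Against an opponent who always defects:
opponent_moves = []
for round_no in range(1, 6):
    print(round_no, tit_for_tat(opponent_moves), "vs defect")
    opponent_moves.append("defect")
# Round 1: co-operate vs defect    (the one-off sucker's payoff)
# Rounds 2-5: defect vs defect     (and so on, until the opponent relents)
```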
Many of the worst performers were those strategies which attempted to exploit the weak points of others - for example, a strategy which always defected when confronted with a program that always co-operated. Such 'nasty' (ie, exploitative) strategies, while they often started out looking successful, would soon begin to reveal themselves as self-undermining because, of course, the easy prey they were feeding off was destined to be knocked out early on in the tournament. Thus, it became increasingly difficult for those 'nasty' strategies to find suckers to exploit, and consequently they themselves also tended to disappear as the tournament progressed to its later stages.
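The flavour of Axelrod's result can be reproduced with a toy tournament. The sketch below is an illustrative reconstruction, not his actual code or entries: it pits Tit-for-Tat and another 'nice but retaliatory' rule (here called Grudger) against a naive always-co-operate strategy and an exploitative always-defect strategy in a round robin, each program also playing a copy of itself, using the payoff values from the table above. The defector fattens itself on the naive co-operator but still finishes at the bottom overall:

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Each strategy sees its own history and the opponent's history, and returns "C" or "D".
def tit_for_tat(mine, theirs):      return "C" if not theirs else theirs[-1]
def grudger(mine, theirs):          return "D" if "D" in theirs else "C"   # never forgives a defection
def always_cooperate(mine, theirs): return "C"
def always_defect(mine, theirs):    return "D"

def play_match(strategy_a, strategy_b, rounds=200):
    """Total scores for a and b over an iterated match."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

strategies = {"Tit-for-Tat": tit_for_tat, "Grudger": grudger,
              "Always co-operate": always_cooperate, "Always defect": always_defect}

totals = {name: 0 for name in strategies}
names = list(strategies)
for i, a in enumerate(names):
    for b in names[i:]:
        score_a, score_b = play_match(strategies[a], strategies[b])
        totals[a] += score_a
        if b != a:                       # self-play counted once per program
            totals[b] += score_b

for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, total)
# With these payoffs, Tit-for-Tat and Grudger tie at the top, the naive co-operator
# comes next, and 'Always defect' finishes last despite its exploits.
```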
Axelrod argued that his experiment demonstrated that co-operation can evolve organically, from within a system of interactions, without having to be imposed by some or other external authority. This would seem to be the case in real life, even in situations where there is no particular trust or spirit of 'friendship' between people. Axelrod cites the example of British and German soldiers during the First World War, many of whom adopted a (highly unofficial) policy of 'live and let live' during quiet moments in trench warfare. This basically involved leaving the other side alone unless provoked into defensive action, and was a direct contravention of orders imposed from above.
The encouraging thing about 'Tit-for-Tat' is that it does not require any particular intellect or even self-awareness to be able to play it. Even lowly life forms such as bacteria are to some degree capable of playing out a version of this strategy, so modest are the requirements. All that is required is that the entity 'playing' the 'game' should be able to interact with at least one other entity, and that it is capable of responding to the last action made by the other player. So we clever humans should be able to figure it out too...
Problems with the 'Tit-for-Tat' approach
However, Tit-for-Tat is not a panacea for all evils. One very obvious question, which an attentive reader may already have thought of, is 'If it works so effectively, and evolves so inevitably, how come we do not now live in a world full of organisms that have evolved to live in a state of near-complete harmony with one another?' Obviously, the world isn't really like that. As Ridley points out, while some animals do use the 'Tit-for-Tat' strategy, most do not. While reciprocity seems to be prominent among human beings and some other species, the truth is that in nature it is far from universal.
Another problem with Tit-for-Tat is that it requires stability over an extended time period to work effectively. In other words, it is only any use in an iterated Prisoner's Dilemma-type game, in a situation where interactions between people are repeated. In a one-off situation, 'Tit-for-Tat' simply cannot work, because it is plain common sense to defect if one knows that one will never come across this particular situation again - simply because it is unlikely that the other player will ever have a chance to return the disfavour. Stability and longevity are features which tend to be in rather short supply in our globalised laissez-faire economy, and it is perhaps accurate to say that, for this among other reasons, Tit-for-Tat is in itself insufficient for the evolution of co-operative behaviour among people.
Even on the theoretical level, 'Tit-for-Tat' is no universal solution. Most problematic of all, if left to its own devices it can evolve into other strategies which are less conducive to co-operation, or lead to situations where non-cooperative strategies can once again begin to flourish. For example, as Ridley notes, if two Tit-for-Tat players come across one another, it only needs a single accidental defection, or a misunderstanding of some sort, for the players to become locked in a perpetual cycle of retaliation - one defects, the other defects in response, and this continues ad infinitum because neither player has any mechanism for breaking out of the cycle. Worse still, as Ridley also notes, in an environment in which everyone is accustomed to co-operating, things can easily degenerate into naive 'always co-operate' strategies, which then leave the territory ripe for exploitation by unscrupulous defectors. So, paradoxically, the 'nicest' strategies, if left to their own devices, can pave the way for the return to prominence of the 'nastiest' ones.
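The 'echo' effect Ridley describes is easy to simulate. In the sketch below (Python, illustrative names; the round number of the 'accident' is simply invented for the example), two Tit-for-Tat players co-operate happily until a single accidental defection, after which the retaliation bounces back and forth indefinitely, because neither rule contains any way of breaking out:

```python
def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

hist_a, hist_b = [], []
for round_no in range(1, 9):
    move_a = tit_for_tat(hist_b)
    move_b = tit_for_tat(hist_a)
    if round_no == 3:
        move_a = "D"   # a single accidental defection (or misunderstanding)
    hist_a.append(move_a)
    hist_b.append(move_b)
    print(round_no, move_a, move_b)
# Rounds 1-2: C C    (mutual co-operation, R=3 each per round)
# Round 3:    D C    (the accident)
# Round 4:    C D    (B retaliates, A has already 'forgiven')
# Round 5:    D C    (A retaliates in turn) ... and so on, for ever:
# from now on each player averages only 2.5 points per round instead of 3.
```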
Superrationality
It would seem, then, that we need to look beyond the merely 'technical' level if we wish to solve the Prisoner's Dilemma. However, this is not to say that rationality is no use to us but that, rather, our conventional understanding of what it means to be 'rational' is simply too narrow and needs broadening a little. Thus, we come back to the quote at the start of this article. Irrationality, says Douglas Hofstadter, is the 'square root' - ie, the cause - of all evil. Fair enough - but how can we distinguish, once and for all, what is rational from what is irrational? A possible answer lies in Hofstadter's concept of superrationality - that is, looking outside one's own decision and taking into account the decisions of others too, and consequently making the decision that one would hope they would also make. In other words, the 'superrationalist' thinks 'globally' - in the wider interest - rather than 'locally', simply with his/her own interest in mind.
The simple question to ask, when confronted with a Prisoner's Dilemma-type situation, is 'Which world would I prefer to live in - which is more in my interests?' A world in which all rational people recognised that to co-operate is more rational than to defect, or a world in which people get stuck at the point that says defecting is more rational in the short term? The truth is that the latter world would soon become (as this world is progressively becoming) uninhabitable. No one would be able to have any trust in anyone or anything at all - therefore, to choose to defect, even in a one-off Prisoner's Dilemma-type situation, is ultimately an irrational choice, because one is undermining the very foundations of reason on which one depends and by which one hopes to live. So the rational thing to do is to make the leap to that second, higher level of thought, and assume that one is dealing with other people who are also rational enough to make this leap.
As an illustration of this, consider the following example. Imagine you are on holiday hiking through a lovely part of the world, some unspoilt green area that the masses haven't yet got their hands on. You stop for a picnic, in the process of which, naturally, you generate a certain amount of litter. "Why bother to clear it up?" is the thought that might flash through the average mind. "I'll never be coming back here, and it'll probably all be ruined by next summer anyway. Besides, it's only a few bits and pieces." But, of course, you do take your litter home with you because you know that, if you leave the place in a mess, it will also discourage others from respecting the natural beauty of the area. You also know that, if you had come across such a mess yourself, it would have made your holiday a little less enjoyable. This, then, is the 'superrational' approach - only by living the value of superrationality can we expect our fellow hitch-hikers to live it also. The more superrational we become, the more superrational we can expect our fellow travellers to be.
Other examples spring to mind. The decision to turn your heating down a notch, putting on an extra pullover instead, is a superrational decision. Hearing a rumour that there is going to be a shortage of some commodity - coffee, for example - and therefore buying a little less than normal, rather than stocking up and actually helping to bring about the rumoured shortage, is a superrational decision - because, by taking account of the collective good, your superrational choice will eventually be reflected back to work for your own good. You hope. The fear, of course, is that those people who are addicted to their caffeine in a serious way will panic and hoard supplies, clearing the shelves so that next time around you will go short. And it is that fear which exerts such a strong grip over our minds, making us want to buy in bulk too, in order to guarantee our own supply. Just as in the Prisoner's Dilemma game, the overriding thought is that you can only be worse off as a result of co-operating. As we saw previously, the rational case for defection seems to be overwhelming. But, far from being rational, defecting is thoroughly irrational. We can only promote sanity with our own sane behaviour.
Only by promoting 'superrationality' - ie, by making the choice ourselves - will we be able to make the choice with any confidence, because one thus makes it more likely that other people will co-operate. After all, as Hofstadter points out, in a game played between truly (super)rational thinkers, choosing to defect "undermines your reasons for doing so" - that is to say, if you expect all of the others to reason exactly as you do, then whatever conclusion you reach, they will reach too; choosing to defect therefore amounts to choosing mutual defection, while choosing to co-operate amounts to choosing mutual co-operation - and so, between truly rational (ie, superrational) people, the only rational thing to do is to co-operate.
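Hofstadter's argument can be put in miniature as follows - a sketch under the stated assumption, using the payoff values from the table above: if all the players really will reach the same conclusion, then the only outcomes actually available are 'everyone co-operates' or 'everyone defects', and the first pays better.

```python
# Superrational assumption: every player reasons identically, so all make the same choice.
R, P = 3, 1                                    # payoffs from the table above
same_choice_payoff = {"co-operate": R, "defect": P}
best = max(same_choice_payoff, key=same_choice_payoff.get)
print(best)                                    # -> co-operate (3 per head beats 1 per head)
```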
Conclusion
No solution is perfect. After all, you cannot necessarily be sure that the people you are dealing with are sufficiently rational to understand the principles involved. But, as Hofstadter says, once the principle of superrationality has been established in a person's mind, there is no reason to suppose that a rational thinker will deviate from it, just as there is no reason to suppose that a person who has been taught basic mathematics will ever conclude that 2+2=5. It is a simple principle that, in theory, everyone can learn - and the more people who learn it, the better it is for all of us.
The attempts of Axelrod, Hofstadter, et al to solve the Prisoner's Dilemma logically may seem a little simplistic to some, and possibly rather too optimistic in their apparent faith that logic can indeed solve all problems eventually. At any rate, they stand as commendable attempts to think through the problems of living together in a complex world, without retreating into the superstition of a pre-scientific age.
References
Robert Axelrod The Evolution of Co-operation (1990, Penguin. First published 1984)
Richard Dawkins The Selfish Gene 2nd edition (1989, Oxford University Press. First published 1976)
Douglas R. Hofstadter Ch.29 and Ch.30 from Metamagical Themas: Questing for the Essence of Mind and Pattern (1986, Penguin. First published 1985)
Matt Ridley The Origins of Virtue (1997, Penguin. First published 1996)