Danny Loeb

Challenges in
Multi-Player
Gaming by Computers:

A Treatise on the Diplomacy Programming Project

Dedicated to my daughter Gabrielle Jeanette Loeb


Overview

This document is organized into an Introductory section, which is followed by a mathematical evaluation of negotiation in such games. Finally, the application of this theory and research in a programming project called the Bordeaux diplomat is detailed, and the conclusions which can be drawn from this article are presented. A bibliography containing the references cited in this article is provided as an appendix.

This article is also available for downloading in DVI and LaTeX format from the web site:

http://www.labri.u-bordeaux.fr/~loeb/dpp/table.of.contents.html

Introduction

Whereas a great deal of work has been done on two-player games, multi-player games are only beginning to become understood both on a theoretical and on a practical level. In so far as one studies games in an attempt to model the real world, this is a terrible loss, for large number of participants are involved in most real world conflicts.

The game of Diplomacy was invented in 1959 by Allan Calhamer to simulate one such conflict: the First World War. This game was chosen as a test bed for our multi-player game playing theories.

As in any multi-player game, positions are extremely difficult to analyze. The mathematical machinery developed to analyze two-player games crumbles, and must be rebuilt from scratch. In the final analysis, the most important factor in evaluating a position of a multi-player game is not located on the game board, but rather in the minds of the players. For this reason, in any multi-player game negotiation becomes necessary in order to determine, and perhaps modify these attitudes.

Diplomacy is an extremely rich example of a multi-player game. The number of players allows a large number of free-wheeling alliances without degenerating into total chaos. Many heuristics exist by which to measure a player's progress to victory, and there are a large number of points which need to be negotiated during the formation and continued operation of an alliance.

Furthermore, just as real life, Diplomacy is not a sequential game. Moves must be submitted simultaneously for all players for all of their units. Thus, normal search techniques either no longer apply or get bogged down in the tremendous size of the game tree, for it is impossible to complete even a one-ply search.

Finally, Diplomacy is an extremely suitable testbed for different approaches to multi-player games, since it is sufficiently popular to encourage unrelated groups of researchers to work on the game.

According to legend, the first play-by-mail Diplomacy game (1963A) included a diplomat (diplomacy automata) by Dave McDaniel which was eliminated in 1903. (See bibliography entry "MB".) In 1982, the Avalon Hill Game Corporation marketed "Computer Diplomacy." The program is designed primarily as a game master, but has very limited playing capabilities as well.

At the 1989 Conference on Computer Games, Kraus, Ephrati, and Lehmann introduced a diplomat they developed over three years at Hebrew University (see KLE1,KLE2). Their program is very good, and negotiates like a human. It constitutes the first serious attempt at a computer Diplomacy program.

During the 1991 Conference on Computer Games, Michael Hall and the author elaborated several ideas by which a computer could play the game Diplomacy. Whereas the Hebrew University diplomat is based primarily on the negotiation, our diplomat is based primarily on the strategy of the game. See bibliography entry HL for details. The present article outlines the Diplomacy Programming Project's progress since last year and connects our ideas with a new mathematical theory developed for multi-player games. (See also bibliography entry DL.)

In order to make the article as accessible as possible, we have preferred to omit all references to the detailed rules of the game. (For these, see the bibliography entry Rules.) In fact, most of the ideas discussed below can be be applied to any multi-person conflict.

I would like to express my appreciation to Constantin Staykoff for his dedication and hard work in implementing the strategic module of the Bordeaux Diplomat, and Ken Lowe for the use of his Diplomacy Adjudicator. I would further like to thank all the members of the Diplomacy Programming Project for their encouragement and help.

Three-Player Position Evaluation

Traditional Techniques

In theory, the minimax algorithm can be used to play any two-player, deterministic, sequential-move game perfectly.

  1. A tree is drawn representing the game with the current position as root. The children of each node represent the various positions obtained after making all of the possible moves. The type of a node is given by the player whose turn it is to move.
  2. Assign position values to all leaves. Increasing values indicate better results for player A. For example, in a game with win, loss, and draw, we can assign v(win)=1, v(loss)=-1, and v(draw)=0.
  3. For each internal node, calculate the value of the position assuming perfect play on both sides. For an A-node, the value is the maximum of the children's values. For a B-node, the value is the minimum of the children's values.
  4. A should choose a move from the root node with maximum value.
However, due to time constraints, it is seldom possible to analyze the entire game tree. Therefore, game programs often use various search techniques (such as alpha-beta from bibliography entry AB) and approximative evaluation functions which analyze the position using different heuristics, and thus need not search to the end of the game.

However, the above analysis guarantees that there exists a perfect strategy to which any approximation thereof can be compared.

However, what happens when a third player is added?

For simplicity, assume that one player will win, and the other two players lose. However, similar remarks hold in the absence of such assumptions.

A game tree can still be formed as above with values assigned to the leaves. However, problems are encountered when minimaxing the values back up the tree.

Let n be an A-node in the tree whose children have all been evaluated. v(n) must now be calculated. Obviously, if n has a child whose value is equal to an A-win, then this child is a good strategy for A, and the value of n will be a win for A. Similarly, if all of the children of n are wins for say B, then n is itself a win for B. However, what happens if some of the children of n are wins for B and some are wins for C? It is then impossible to evaluate n without some psychological information regarding A's attitude to such situations. His choice of move is indifferent, yet greatly affects the strategy of the other players.

Queer Games

Jim Propp calls such positions "queer" (see bibliography entry 3PIG).

                        0+1--0+0
                       / n    p
                      /    
                   0+2--0+0
                  / n    p
                 /           
              1+2--1+1--0+1--0+0
               q \  f \  n    p         
                  \    \        
                   1+0  1+0--0+0 
                    n \  n    p
                       \        
                        0+0
                         p
        
            Figure 1: Three player nim
For example, consider the game of "Nim" played with three players. Players take turns removing stones from two piles. Any number of stones (at least one) can be removed from a single pile. The player to make the last move wins. In such a game, 0+0 is a win for the previous player. Thus, 1+0, 0+1, 2+0 and 0+2 are wins for the next player. Hence, 1+1 is a win for the following player. Therefore, the position 1+2 is queer, as it yields a choice for the next player between wins for the previous player and the following player.

Are queer positions very common? If not, then we can hope to frequently apply the traditional techniques mentioned above.

Unfortunately, given any impartial game (one in which to answer the question "What moves are possible?", you need not pose the question "Whose turn is it?" Thus, Chess is not an impartial game, since only White can move White's pieces), G, the game 2+2+G is queer. Thus, impartial games large enough to contain a 2+2 component are queer.

A simple probabilistic argument shows that virtually all games are queer. Let a,b,c,q represent the probabilities that a position is a win for any given player or that it is queer respectively. Without loss of generality, we consider games in which all internal nodes have two children. By symmetry, a=b=c. Thus, the probability to be queer q=1-3a is equal to the probability that neither choice is a win for the current player, yet the two choices are wins for different opponents (1-a)^{2} - 2a^{2}. The only solution that represents a valid probability is a=0. Thus, q=1.

Sums of Games

A traditional technique in analyzing games has been to break down the game into a sum of simpler games. That is, the game G+H is the game in which players may play in either G or H but not both. The game is over when play is no longer possible in either summand. The last player to have played is the winner.

Typically, the most powerful results have been found for impartial games. The two possible results of a two-player game are N (win for the next player), and P (win for the previous player).

  +  |     N      P
  ---+-------------
  N  |  N or P    N
  P  |     N      P
The table above gives the sums of all two-player impartial games except N+N. In fact, the sum of two N's depends on the particular games, yet a theorem of Grundy allows us without loss of generality to consider only sums of games of "Nim" with one pile. In these trivial games, the P positions are equivalent to empty piles, and N positions are equivalent to non-empty piles. The sum of two N positions is N itself if and only if the sizes of their associated "Nim" piles are distinct (see bibliography entry WinningWays).

For three-player impartial games, the possible sums are quite complex (see the table below). The sums are calculated on the premise that play rotates NFPNFPNFP.... (see 3PIG).

  +  |  P     N     F     Q
  ---+-----------------------
  P  |  PQ    NQ    FQ    Q
  N  |  NQ    NFQ   PNQ   NQ
  F  |  FQ    PNQ   NQ    NFQ
  Q  |  Q     NQ    NFQ   FQ
Only in one particular case (P+Q) is the sum predetermined. To determine the value of the sum, there are two further problems. First, there is no generalized theorem of Grundy. Grundy's proof supposes that all "reversible" moves can be ignored. However, in three-player impartial games even if P's move is reversed by N, he may have gained Zugzwang, and placed F in a difficult position. Second, as can be seen below, even the game of "Nim" is no longer trivial once there are three players.

Classifying Queer Games

In the above classification, queer games are all considered to have the same value. Indeed, it has been believed (see JVN) that all three-player games are strictly-determined or symmetric non-strictly determined. However, are queer games always symmetric? For example, suppose N could choose between two queer positions: one where F would name N or P as winner, and one where N would name F or P as winner. Perhaps, N would prefer the first choice even though both are queer games, since he would then retain some chance of winning.

The two positions above must then be given different values: f1 and n1 respectively. Obviously, p1 can be defined in a similar manner. Now, denote forced wins by f0, p0, and n0 respectively. N has the following preferences n0 > f1 , p1 > f0 , p0. He is indifferent regarding f0 and p0. A position leading to such a choice can not be simplified by his opponents, and must be labeled simply n1.

Is N necessarily indifferent between f1 and p1? Unfortunately, we can not answer this question until we clarify how a player makes a choice between equally interesting possibilities. One assumption is that he makes a random move. In that case, f1 may be preferred to p1 or visa versa. However, consider a game with additional moves equivalent to certain other moves. For example, in Chess allowing the Rooks to be moved by either hand and all other pieces to be moved only by the left hand. Is it then more likely the player will move his rook?

Although plausible, this is not a very pleasing model since it does not allow us to prune redundant branches from our game tree. Instead, I propose the model by which the players establish equivalence classes of positions. (Two positions are said to be equivalent if the equivalence classes of their non-dominated options are in one-to-one correspondence via equivalence.) Moves are made by randomly choosing a non-dominated equivalence class, and then selecting a move within this class.

In this way, f1 and p1 are equivalence classes of equal interest to N. He makes his choice randomly between the two. This choice between F and P to select the eventual winner can be represented as n2.

Similarly, n3 is the choice by N of who will choose the person who will choose the person who selects the winner. In fact, it can be shown that any position in a three-player finite sequential perfect-knowlege game is equivalent to exactly one of the equivalence classes ni,pi,fi where i is an integer. ni+1 for example is defined to be a choice for N between pi and fi. Thus, we have the counter-intuitive result that no game is symmetric under the hypotheses above.

Calculations with these refined classes become even more complicated. For example, whereas "Nim" with two piles is completely trivial with two players (the next player wins unless the two piles are of equal size), there is no apparent rhyme or reason in the three-player two-pile "Nim" table (see the table below).


     i  \  j  |  1     2     3     4     5     6     7     8     9    10
     ---------+---------------------------------------------------------
     0    p0  | n0    n0    n0    n0    n0    n0    n0    n0    n0    n0
     1    n0  | n1    f1    n2    n2    n2    n2    n2    n2    n2    n2
     2    n0  | n1    f1    n2    n2    n2    n2    n2    n2    n2    n2
     3    n0  | f1    n2    p1    p1    p1    p1    p1    p1    p1    p1
     4    n0  | n2    n2    p1    f2    n3    f3    n4    n4    n4    n4
     5    n0  | n2    n2    p1    n3    f3    n4    n4    n4    n4    n4
     6    n0  | n2    n2    p1    f3    n4    p3    p3    p3    p3    p3
     7    n0  | n2    n2    p1    n4    n4    p3    f4    n5    f5    n6
     8    n0  | n2    n2    p1    n4    n4    p3    n5    f5    n6    n6
     9    n0  | n2    n2    p1    n4    n4    p3    f5    n6    p5    p5
     10   n0  | n2    n2    p1    n4    n4    p3    n6    n6    p5    f6

More players

Until now, we have only discussed games with three players. When a fourth player is added, the refined classification (ni,fi,pi) breaks down completely. However, the crude classification scheme N,F,P,Q can be extended. Instead of four types of games, now twelve are needed.
  1. [1-4] Forced win for a certain player.
  2. [5-8] An alliance of three players is needed to stop a certain player from winning.
  3. [9-12] One player plays no significant role whatsoever and the three remaining players participate in a "Queer" game. (In other word, any of the three players can be stopped by the other two.)
Under this view, game types are in one-to-one correspondence with voting schemes (see bibliography entry Isbell).

             Number of Voting Schemes

     n     0  1  2  3   4   5    6      7
     ---------------------------------------
     v(n)  0  1  2  4  12  81  2646  1422564
A voting scheme consists of a set of voters V, and a set of winning coalitions W subject to the condition that if V is a superset of B and B a superset of A in coalition set W, then B is in W, and if A=V-B, then exactly one of A and B can be a member of W. In other words, a voting scheme is an order preserving map f from the subsets of V to \{-1,1\} such that f(A)=-f(V-A).

Given a game G and a partition P of its players, we define the quotient game G/ to be a new game with one player for each block (or alliance) of P. The player B in P in the game G/P must make a move whenever it is the turn of some p\in B. Thus, in G/P there might not be simple rotation of play even if in G the players take turns playing in simple rotation.

The moves available to B in the game G/P are the same as those available to P in the game G. B wins the game G/P if any P in B wins.

A is said to be a winning (resp. losing) alliance of players in the game G if the quotient game G/{A,P-A} is a win (resp. loss) for the alliance A. The set of winning games is a voting scheme, since adding players to an alliance can only strengthen it, and given a pair of opposing coalitions exactly one can win while the other must lose. (If the game includes results other the simple wins and losses, then the notion of a voting scheme must be generalized. A generalized voting scheme is a order preserving map from the subsets of V into the set of results {-n,-n+2,...,n-2,n} such that f(A)=-f(V-A))

The play of a game can be viewed as the selection of a winning coalition (see Sh). The difference between the coarse classification and the fine classification of game is that when a winning alliance is formed in fact the alliance itself does not win, but rather only one member of the alliance. Thus, in the end, most alliances are unstable. It is this unstability which the fine types fi,ni,pi attempt to measure in three-player games.

In conclusion, just as two-dimensional projections are often insufficient to recreate a three-dimensional object, these coarse classifications based on two-player projections of a multi-player game do not contain the subtle nuances which give the game its interest. However, it is difficult to develop a single complete and rigorous theory as was the case for the theory of two-player games.

Negotiation

The Theoretical Basis

Three-player games are much more complicated than two-player games. Whereas, in a two-player game, there is only one adversarial relationship, in a three-player game, any of the three players can be opposed by an alliance of the other two players.

The exact strategy derived depends on many "psychological" assumptions on the behavior of the other players when confronted with choices of equal interest. For example, when a loss seems certain, many players will choose to share the defeat with the player most responsible for their loss. While maximizing your position against such a psychology, one must be careful not to give the impression that you are minimizing the position of any other player.

Whereas, when a victory seems possible, players chose their winning coalition according to their estimated chances of having the victory accorded to them. Thus, all critical members of an alliance must be assured of the likelihood of victory for them.

In any two-player game, players can only be pleasantly surprised when their opponent makes a bad move. In that way, the player will often find a previously unavailable winning strategy. However, this is not the case in three-player games. Suppose for example that, N chooses f0 (a forced win for F) when he could have chosen f1 (by which F would have decided whether P or N would win). This is a bad move for N since he no longer has any chance to win. Nevertheless, P also loses from this error.

Thus, in any three-player game, negotiation is necessary for several reasons. First, as we saw above, each player will attempt to shift the blame for the other player's misfortunes on the third player. Here we discuss matters of loyalty and treason, and try to place an appropriate "spin" on the moves that we have made.

Secondly, negotiation can be used to avoid the sort of error that N made in the example above. If P had convinced N of the danger that he was in, then perhaps N would have made the move that was in their mutual interest.


By giving a player useful suggestions, one can hope that his trust for you will be raised. On the other hand, you can suggest a move to a gullible player which is more in your interest than his. If you have promoted his trust in you up to now, then perhaps he will accept your advice (against his better judgement).

On the other hand, if the other player no longer has any trust in you, then he will be less receptive to any advice you might give. In an extreme case, he might ignore his own normal preferences, and simply hand the victory to the third player.

In a game with a very complicated search tree, communication does not merely serve for negotiation itself, but rather these suggestions allow a pair of players temporarily working together to exchange information, and thus examine the search tree in parallel.

Diplomacy

In order to better study the relationship between negotiation and multi-player games, we chose the game Diplomacy.

Obviously, many other multi-player games exist. However, certain of these prohibit negotiation among players. For example, in the game of Poker, perhaps some information could usefully be exchanged among the players. However, this is normally forbidden.

Some games such as three-player "Nim" are so simple that there is little to discuss. Moreover, in any impartial game, there is a great symmetry which overshadows the negotiation. In a position n1, what can the player F say to N different from what P might say in order to persuade N to attribute the victory to him.

On the other hand, although there is no inherent symmetry in the game Diplomacy, statistics show that each country has an approximately equal chance to win the game (see PG). In fact, according to Alan Calhamer, creator of the game, a perfect game of Diplomacy is one in which no country is eliminated, but rather everyone allies against the current leader until he is weakened and a new leader appears. All of this coalition building and rebuilding generally involves much detailed negotiation. Furthermore, an effective alliance will require coordinated movements. Through these negotiations the participants in any alliance will determine which of the units will make attacks, and which will support those attacks.

Moreover, the strategy of the game Diplomacy is very rich. Thus, in proposing alliances with Turkey, Russia and Austria-Hungary will each be able to offer a number of advantages and disadvantages to Turkey. Depending on the temperament of the Turkish player, he can accept one of the offers, hold off while waiting further developments, or even pretend to accept both offers while intending to soon betray the trust of one or both of the other players.

Diplomacy Programming Project

When testing game programs, we have learned much more from machine-versus-machine competitions than from machine-versus-human competitions. The humans are very unpredictable. Thus, a victory one day might not mean that you have improved your program, but rather that the human is tired, or has gotten overconfident, etc.

On the other hand, computer tests are repeatable. One variable can be changed at a time, while hundreds of test games are played automatically. Using such techniques, we are able to automatically fine-tune the position evaluation heuristics of a computer game program. Some programs use automated learning techniques; in this case, it is vital that the program be able to play thousands of games automatically.

For a game like Diplomacy, it is even more important to emphasize automated testing. Since there are seven players, an "average" program should win about one seventh of the time. Thus, as compared with a two-player game, approximately three or four times as many test games should be run in order to statistically prove a similar percentage increase in the number of wins. Furthermore, the heuristic position evaluation used by the Bordeaux Diplomat contains a large number of parameters whose values should eventually be optimized through repeated tests.

When testing a Diplomacy program with human players, we have seen that each player has many preconceived notions about the computer. These preconceptions lead to unwarranted prejudices for or against the computer, which in turn influence the alliance formation process much more than any purely game-related factor. Thus, the real capacity of the program is clouded by a large number of extraneous issues.

To avoid this, test games against humans have been organized using Gunboat rules. By these rules, the identity of each player is kept secret, and no communication is allowed other than through the movements themselves. However, by eliminating communication, we have eliminated the negotiation which was our main object of study.

Indeed, the ideal solution is to test our program against computer opponents. The first opponents one might think of are copies of our program, or alternate versions of our program. However, this is an extremely inadequate solution. Obviously, programs could recognize copies of themselves by sending special messages at the beginning of the game. The program would then ally with copies of itself, eliminate any remaining players, and finally declare a draw.

Here, we assume that the programmer is trying to recognize his own program. However, similar effects can occur without such intentions on the part of the programmer. For example, it is reasonable, all things being equal, to follow advice from an intelligent player -- yet how should a program judge the intelligence of others? The most reasonable criterion is that an intelligent player would suggest moves that you consider very strong. However, if the other program is a copy of yourself, then it will evaluate moves the same way you do, and then will probably suggest to you the very moves you would have thought of making yourself. Under such conditions, your program would inevitably ally with itself.

Once again the subject at hand is being clouded by a host of irrelevant details. To avoid these problems, any Diplomacy program should be tested against a set of programs written by different groups of programmers using completely different techniques.

But how can seven diplomats programmed by different people carry out any sort of meaningful dialog? How should messages be transmitted between programs? And how can these messages be understood?

In many computer game competitions, moves are transmitted by hand. The computers sit side-by-side while operators enter moves. This guarantees compatibility, but in a game of Diplomacy, where for each turn up to 34 movements must be entered, and may be preceded by arbitrarily many diplomatic messages, the game would inevitably slow down to a standstill. Indeed, the test at the 1992 Computer Olympiad (see DL) was terminated after 12 movement phases due to lack of time, whereas it is important to run a large number of complete game in order to accurately measure the potential of a Diplomacy program.

To run the necessary number of games in a reasonable amount of time, the computers communicate directly. The Diplomat Interface has been developed by the Diplomacy Programming Project to facilitate such communication. This interface is written in LCS, is a variant of SML developed by Bernard Berthomeiu and the Outil et Logiciel pour le Communication group at the Laboratoire d'Automatique et de l'Analyse de Systeme in Toulouse. LCS differs from SML in that it allows the management of parallel processing. This implementation choice notwithstanding, the interface is capable of launching programs written in any language within the UNIX operating system. The standard output of each player-diplomat is rerouted via the use of an LCS instream to the LCS agent responsible for the communication with that player-diplomat. This agent then can output in the corresponding LCS outstream where it is available on a FIFO basis as standard input for the diplomat.

To simulate the play of humans against machines, the Diplomacy Interface is capable of opening X or suntools windows for specified countries. Input and output for these windows are handled in a manner similar to that described above. Actually, it is impossible for a diplomat to determine who is playing the other countries: a human or a machine. Thus, no prejudices against humans can be built into the diplomats, and humans must perform a sort of Turing test if they wish to distinguish the machines from other humans.

Each player communicates only with the Diplomat Interface. The Diplomat Interface acts as a game master. That is to say, players can submit their moves with the SUB command. When all the moves are in and all players are ready to go, or when the deadline arrive, results are computed and distributed with the ORD message. The current position is announced via the NOW message, the missing orders for next turn are requested with the MIS message. It is the Diplomat Interface that declares the game over either because a player has reached the victory condition of 18 supply centers, or because the players have agreed on a draw with the DRW command.

However, the Diplomat Interface is also a real interface; that is to say, it can pass messages from one player to another.

It was decided that this communication should not be made in a natural language. The study of natural languages is an entirely separate and difficult field of research; but we wish to concentrate on the problems directly related to negotiation and game playing. Thus, a simplified language called the DPP Protocol was devised by which the players would be required to communicate (see DPPP,DPPEx).

Tests using actual negotiation transcripts show that this simple language allows the expression of nearly all of the concepts usually employed when negotiating in Diplomacy. It is impossible to say that it is sunny outside, but it is easy to threaten a player with war if he takes one of your cities. For example,


     France: "SND ENG (IFF DMZ(PAR BRE MAR) THN PCE ELS NOT(PCE))"
This language was developed independently of the diplomat we are developing. Thus, one can hope that other programmers will find the language as convenient as we are.

In order to make the language as rich as possible, several dozen predicates have been defined. In addition to their grammar, definitions of their suggested meaning and examples of their use are given. However, there are no guarantees that other programs will interpret the messages accordingly. One can always write a psychopathic program which attacks only those who propose peace. However, one could expect that, as in the real world, psychopaths would have a terribly short life expectancy in Diplomacy. In consequence, most reasonable diplomats will start all relationships with the tentative assumption that their partners will react "normally" to any signals made.

Certain programs might not understand all possible constructions in the protocol. However, for the most part, they should understand the messages we indicate in the dictionary as the most basic. This includes the message TRY. Its purpose is that upon receipt of an unknown message, we respond with TRY and a list of constructions understood by our program. In this way, some sort of communication can be held between programs of widely differing levels of sophistication.

The Bordeaux Diplomat

Division of Control

The Diplomat proposed in bibliography entry HL is based on a strategic core. This lower brain of the diplomat does all the calculations and thus forms the basis for the decisions taken by the upper brain, or negotiator part of the diplomat.

The negotiator does not know the details of the map, nor the rules of the game. On the other hand, it must be able to communicate with the other players, distinguish between friends and enemies, form alliances, and so on. To make decisions, the negotiators poses concrete questions to the strategic core, and following the answers to these questions, it chooses the appropriate moves to submit, or messages to send out.

The strategic core does not know which country the diplomat is playing. It is simply a disinterested, objective observer who calculates the best moves for any country given a certain number of constraints along with the value of the resulting position of all countries.

This strategic core can be tested separately from the rest of the program (as was done at the 1992 Computer Olympiad). However, its failure to communicate and to grasp the notions of balance of power, greatly limit its potential.

On the other hand, the program's division between the strategic core and the negotiator is completely modular. Thus, it would be possible to redesign the Bordeaux diplomat simply by replacing one of the two modules with a new program. Given a collection of several negotiators and several strategic cores based on different algorithms, we can easily construct an enormous number of full diplomats.

The strategic core is given the current position including the supply center distribution, and a set of "concerned" countries, which has been pre-divided into alliances by the negotiator. Each alliance can optionally have a list of "mortal enemies" designated. Next, the length and depth of the search is specified. The core can be told that certain countries are diplomatically bound not to enter certain "demilitarized zones," or to make or not make certain moves. Finally, a "seed" can be specified for the search.

In return, the strategic core must return for each concerned country the move that maximizes the total interest of their alliance (while minimizing the interest of their enemies) subject to the constraints given along with the values themselves.

Obviously the strategic core is used to select the moves to be used subject to the constraints of all of the treaties which we have agreed to, and decided to follow. However, the core has many other applications:

Further discussion of the diplomatic aspects of our program are postponed to a later article as continued work progresses on the negotiation module of our program by Per Westling (Linkoping, Sweden).

The Strategic Core

Many difficulties were encountered in writing the strategic core of our program. It was first outlined in bibliography entry HL, but several difficulties were noted in the search routine. In the end, a new algorithm was implemented by Constantin Staykoff.

As we have seen above, perfect-knowledge position evaluation is much more difficult to define in a multi-player environment -- especially in a game such as Diplomacy with as many as seven players.

Already in two-player games such as Chess, the strategy is sufficiently rich to prevent programs from playing perfectly. Hence, a search is made through the most "important" parts of the game tree, and value of the leave positions are then "calculated." The selection of tree paths and the evaluation of the terminal positions are governed by various heuristics.

The game tree of Diplomacy is much more complex. All units must move simultanously. Thus, the game is not a sequential game, but rather a seven-dimensional matricial game. For example, the number of possible non-equivalent openings is 4,430,690,040,914,420 (over four quadrillion, see bibliography entry Green) as compared to 20 in chess. In chess, the average number of moves mounts to about three dozen during the mid-game. However, in Diplomacy, the number of available moves increases by an additional factor of several billion during the movement phases as the number of units and possible supports rises.

An efficient simplex algorithm has been invented by Lemke and Howson (see LH) to find Nash equilibria in two-dimension matricial games. However, even if this method could be generalized to multi-player games, there would not be sufficient time to input the entire game matrix to the algorithm. Hence, whereas it is possible to search several turns ahead in chess, it is impossible to conduct a complete one-ply search during a movement phase of the game Diplomacy. Instead, we must carefully converge on the Nash equilibria in the enormous matrix without looking at all the possible moves.

It is senseless to consider looking at two movement phases at the same time. All predictional capacity must then be built into our position evaluation heuristics.

Fortunately, during the building and retreat phases, the number of possible moves is never quite so astronomical. It is thus possible to do a complete one-ply analysis of these phases of the game.

Optionally, the search of a build or retreat phase could be less complete, but anticipate the following phase via a recursive call to the strategic module of the program.

Search Techniques with Simultaneous Movement

A complete description of the heuristics of our position evaluator can be found in bibliography entry HL, so we will concentrate on the search techniques itself. What moves should be considered? In what combinations? And in what order?

The search technique proposed in EBBFS (Even Better Best-First Search) involved an iterative search for moves for all of the concerned countries in parallel. Moves were considered one at a time by each alliance following a set agenda and adding orders for one unit a time (Best-First). The results of a given move were calculated in relation to the "current best" moves for each of the other alliances. The position is kept as current best move or is rejected according to the output the position evaluator gives to the resulting position.

Suppose a move MA was rejected for the alliance A because the alliance B had considered the move MB to be its best move at the time A evaluated MA, but later B rejects MB for a better move M'B. According to the above algorithm, the alliance A would not be given a chance to reconsider the interest of the move MA in light of this development.

Given the drawbacks of EBBFS, we propose a new search technique called RES (Refined Evolutionary Search). The name is derived from the fact that this search strategy is similar to that adopted by nature to play the game "Biology."

In the game biology, each player must chose DNA sequences families. These sequences then define a certain number of living organisms. If the descendants of these living organisms are numerous after several billion years in a certain prechosen environment, then the player has won.

To play the game of biology, a player must in theory test every DNA sequences family in competition against every possible set of rivals. However, such a research is impossible. Thus, nature has adopted the evolutionary search techniques. By this method, animals are mutated one at a time by randomly changing a certain number of genomes. The successful mutations replace the original species, and the unsuccessful mutations are rejected.

Similarly, in Diplomacy, a large alliance cannot possibly consider all of its movements. Thus, the research is started by considering a certain "seed" move. This move being considered is then mutated by selected a certain number of units, and randomly selecting new moves for these units. Mutations are made for one alliance after another until the maximum search length has been reached.

The value of any given mutation with respect to the current moves of all other alliances is compared to the value of the original move with respect to the current moves of all other alliances. In other words, when an alliance adopts a new strategy, this may change the value of the strategies already being considered by the other alliances.

A certain number of consistency rules cut the loss of time due to the consideration of uninteresting mutations. In all cases where allied forces are ordered to attack each other, the moves are changed into mutual support. Furthermore, as a time saving measure, support is only considered among units belonging to the same alliance.

If the search passes through Nash equilibrium, then no alliance will have any interest in changing its moves. Thus, no Nash equilibria will be lost.

Nevertheless, the program only considers pure strategies whereas pure Nash equilibria do not exist for all games. Suppose Italy with his fleet in the Tyrrhenian Sea is trying to counter Turkey's incursion into the Ionian Sea. If Italy defends Naples, then Turkey should attack Tunis. However, if Italy defends Tunis, then Turkey should attack Naples.

Assuming that Italy and Turkey are the only countries concerned by this situation, the RES search will eventually enter a cyclic pattern of length four as described above. Eventually, we would like our program to be able to detect such cycles. The strategies involved in these cycles could then be extracted and subjected to a detailed matricial game analysis. A mixed Nash equilibrium would then be found by which Italy moves to Naples and Tunis with certain probabilities. At that point, a random choice would be made, and the move evaluated according the expected return.

Currently our program does not do such detailed analysis but merely halts at a random point (the search limit) and returns the strategies currently being considered and their actual return.

The research is designated as "refined," since that is how the mutation rate is set. At the beginning of the research, all units are reassigned new orders. There is no reason to believe that the old orders are good, so we attempt to examine as many completely different possibilities as we can. Later on in the research, we are more confident in the moves being considered, and therefore restrict ourselves to smaller and smaller modifications. As the search limit approaches, we only consider changing the order of one unit at a time, or permute the supports used.

The following procedure can be used to calculate best moves and their values for each given alliance with respect to a certain position (units and supply centers) in the game Diplomacy and under the following constraints: demilitarized zones, required moves, forbidden moves, search depth, and search width.

  1. [RES1] (Initialize.) Set best moves BestM, and best raw moves RBestM to the default moves.
    Calculate the position P resulting from BestM.
    For each alliance k, let VBestMk be the value of P for alliance k. (see note below)
    For each unit u, create a list of legal moves LM(u) given the map, demilitarized zones, forbidden moves, and required moves. Abort if LM(u) is empty.
    Set i=1.
  2. [RES2] (Choose Alliance.) Let p be i mod the number of alliances.
  3. [RES3] (Choose Units.) Let n be the smallest integer greater than or equal to the number of units in alliance p times width - width.
    Select a set SU of n units belonging to alliance p.
  4. [RES4] (Make Moves.) Set RM to RBestMp.
    For all units u in SU, set RM(u) to a random move or hold from LM(u).
    Set M to RM.
  5. [RES5] (Make Supports.) Replace in M all orders of the form A MovesTo C and B MovesTo C with appropriate random supports.
    Replace in M all orders of the form A MovesTo B and B Holds with appropriate supports.
  6. [RES6] (Evaluate Moves.) Calculate the position P resulting from M and BestMj for j not equal to p.
    Let NewVj be the value of P for the alliance j. (see note below)
  7. [RES7] (Select Moves.) If NewVp>VBestMp, then set BestMp to M, set RBestMp to RM, and for all alliances j, set VBestMj to NewVj.
  8. [RES8] (Test Exit.) Increment i.
    If i is less than width, then continue with step 2.
    Otherwise, output BestMj and VBestMj for each alliance, and exit.
NOTE: The value of a position for an alliance is the sum of the values for each member of the alliance minus the value for any indicated mortal enemy.

In biology, Mammals have retained their deep-diving instinct to some measure despite having left the oceans millions of years ago. Similarly, in Diplomacy, a player may attempt to defend a space which comes under attack early in the research. In response, the attacking alliance may add enough support that defense becomes hopeless. However, the defender will persist in his futile support unless there is positive gain by making some other move. (When there is a positive gain available from another move, a cyclic situation as described above often results.)

Such behavior in biology is useful; for example, when people fall into icy waters. Similarly, such a behavior is useful in Diplomacy in order to make life as difficult as possible for the attacking player.

Conclusions

The theory of multi-player games and the game Diplomacy in particular is too deep and rich to expect a complete analysis. However, in view of the applications of multi-player games, we must persist in our efforts to model such games.

Mathematically, we seem to lack a good language in which to describe such problems. Underlying psychological hypotheses are necessary, which open the game up to the fuzzy world of negotiation.

Our view of negotiation is that it must be based on a solid strategic core. Otherwise, the diplomat cannot decide what to ask for, nor evaluate any sort of offer. Early versions of this strategic core show good survival skills, but lack the savvy to pull off a win. Nevertheless, in blind tests, players are unable to distinguish the strategic core from human players.

In combination with an automatic negotiator which will temper the aggressivity of its strategic alter-ego, we should be able to compete at an equal level with humans in the game of Diplomacy. Simultaneously, we hope that existing diplomats will be made available in conformity with a universal language of communication among Diplomat which we have developed, and that many new programs are introduced.

Thus, we will be able to begin the repeated testing necessary to compare various theories, and to optimize our strategies.

By attempting to solve these difficult problems resulting from the interactions of intelligent agents, we hope to better understand what sort of capabilities such an agent should have, when it cannot assume that everyone in the world is its friend. Creating programs that can play complex multi-player games is a step towards creating programs that can deal with the complexities of the real world and the craftiness of humans.

Bibliography

Danny Loeb
Universite de Bordeaux I, France
(
[email protected])

If you wish to e-mail feedback on this article to the author, and clicking on the mail address above does not work for you, feel free to use the "Dear DP..." mail interface, which is located here....