Figure 8.2. Frequency distribution of agreements when neither player knows both prizes. Source: Roth and Murnighan, 1982.
There is also a strikingly different pattern of agreements when there are two focal points as compared to one focal point. With two focal points, the agreements were spread out between the focal points (see Figure 8.1), whereas with a unique focal point, the agreements were concentrated at the focal point, with some variation around it (see Figure 8.2). Similar results have been obtained in other experiments (see, e.g., Roth and Malouf, 1979; Roth, Malouf, and Murnighan, 1981).
[Figure: horizontal axis shows A's share of tickets, from 25 to 50.]
These results lend support to the proposition that a bargaining norm is a form of social capital: it has economic value because it facilitates coordination. But how do bargaining norms become established in the first place? In previous chapters we have argued that conventions arise through the accumulation of precedent: people come to expect a certain division of the pie because other people have agreed to divide the pie in a similar way under similar circumstances. In the case of bargaining, this feedback loop between precedents and expectations can be demonstrated in the laboratory: bargainers can be conditioned by precedent to favor one focal point over another. Roth and Schoumaker (1983) demonstrated this in a series of experiments that paralleled those of Roth and Murnighan (1982). Each pair of subjects was given one hundred lottery tickets to divide. The prize was $40 for A and $10 for B. Since this was common knowledge, there were two focal points: dividing the tickets 50:50 and dividing them 20:80. Each subject played twenty-five games in succession. Unknown to the subjects, however, the first fifteen plays were against a computer that was programmed to play one or the other focal point. After the first fifteen rounds, each subject was paired against a succession of other subjects who had been playing against the same programmed opponent. Furthermore, the solutions that each subject had agreed to in the previous five rounds were published for both to see. The conjecture was that the experience of the subjects in the programmed rounds would create mutual expectations that would tend to lock them into whatever solution they had become accustomed to playing. (A control group of subjects played the full twenty-five rounds without programmed opponents.)
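The two focal points can be checked with a little arithmetic. The sketch below assumes that a subject's share of the one hundred tickets is his or her probability of winning the prize; the function name and its defaults are illustrative only.

```python
def expected_winnings(a_tickets, prize_a=40.0, prize_b=10.0, total=100):
    """Expected monetary winnings (A, B) when A holds a_tickets of the total.

    Assumption: a ticket share is the probability of winning one's own prize.
    """
    p_a = a_tickets / total
    return p_a * prize_a, (1 - p_a) * prize_b


for a_tickets in (50, 20):
    ev_a, ev_b = expected_winnings(a_tickets)
    print(f"A holds {a_tickets} tickets: E[A] = ${ev_a:.2f}, E[B] = ${ev_b:.2f}")
```

Under these assumptions the 50:50 split equalizes the chances of winning, whereas the 20:80 split equalizes expected monetary winnings, which is why both divisions are salient.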
The results confirmed the hypothesis that expectations formed in early rounds of play strongly influence players' subsequent behavior. Almost all of the players who became accustomed to an opponent who insisted on 50:50 in the first fifteen rounds continued to make 50:50 agreements in the remaining ten rounds. This was true even though subjects in role B would have been better off under the alternative 20:80 norm. Similarly, the players in role A who had become accustomed to playing opponents who insisted on the 20:80 division continued to make agreements of this sort even though the alternative 50:50 norm would have been preferable.
These results provide empirical evidence for two general propositions: a bargaining norm has economic value as a coordination device, and the choice of norm can be conditioned through precedent. In the next few sections we shall draw out the implications of these propositions, using the framework developed in previous chapters. Before doing so, however, let us recall what classical bargaining theory has to say about these situations. In the bargaining model originated by Nash (1950), the outcome of a two-person distributive bargain depends only on the utility functions of the two parties (their attitudes toward risk) and their alternatives if they fail to reach agreement. Let \(u(x)\) be the row player's von Neumann-Morgenstern utility function, which we assume is concave and strictly increasing in the row player's share \(x\) of the pie. Similarly, let \(v(y)\) be the utility function of the column player as a function of that player's share \(y\). Assume that if the bargaining breaks down, their shares are \(x^{\circ}\) and \(y^{\circ}\), respectively, where \(x^{\circ} + y^{\circ} < 1\). Let \(u^{\circ} = u(x^{\circ})\) and \(v^{\circ} = v(y^{\circ})\). The Nash bargaining solution is the unique division of the pie that maximizes the product of the utility gains relative to the disagreement alternatives, that is, it is the unique division \((x^{*}, 1 - x^{*})\) that maximizes \([u(x) - u^{\circ}][v(1 - x) - v^{\circ}]\) subject to \(0 < x < 1\).
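To make the definition concrete, the following sketch locates the Nash division by brute-force grid search. The utility functions, the grid resolution, and the restriction to shares at or above the disagreement shares are assumptions chosen for illustration, not part of Nash's model as stated above.

```python
import math


def nash_solution(u, v, x0=0.0, y0=0.0, steps=10_000):
    """Grid search for the share x maximizing [u(x) - u(x0)] * [v(1 - x) - v(y0)]."""
    u0, v0 = u(x0), v(y0)
    best_x, best_val = None, -math.inf
    for k in range(steps + 1):
        x = k / steps
        if x < x0 or 1 - x < y0:
            continue  # assumption: nobody accepts less than the disagreement share
        val = (u(x) - u0) * (v(1 - x) - v0)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


# Hypothetical utilities: the row player is risk neutral, the column player risk averse.
x_star = nash_solution(lambda x: x, lambda y: math.sqrt(y))
print(f"row player's share: {x_star:.4f}")
```

With these particular utilities the maximizer of \(x\sqrt{1 - x}\) is \(x = 2/3\), so the more risk-averse column player ends up with the smaller share.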
This division arises in the subgame perfect equilibrium of the following noncooperative bargaining game (Stahl, 1972; Rubinstein, 1982). Two players take turns making offers to each other (an offer is a proposed division of the pie). First, player A makes an offer, which player B accepts or rejects. If player B accepts the offer, the game is over. If player B rejects the offer, she gets to make a counteroffer, which player A accepts or rejects, and so forth. After each rejection there is a small probability p that the game will cease with no further offers (the negotiations "break down"). It can be shown that this game has a unique subgame perfect equilibrium, and that the outcome is arbitrarily close to the Nash division when p is sufficiently small.
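As a check on the convergence claim, consider the special case, assumed here purely for illustration, in which both players are risk neutral (\(u(x) = x\), \(v(y) = y\)) and breakdown leaves each with nothing. Let \(x_p\) denote the share the proposer keeps under stationary strategies. The responder is just willing to accept \(1 - x_p\) when it equals what she expects from rejecting, namely the proposer's share \(x_p\) with probability \(1 - p\):

```latex
\[
1 - x_p = (1 - p)\,x_p
\quad\Longrightarrow\quad
x_p = \frac{1}{2 - p}
\;\longrightarrow\; \frac{1}{2}
\quad\text{as } p \to 0 .
\]
```

In this special case the limit coincides with the Nash division, since \(x(1 - x)\) is maximized at \(x = 1/2\).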
For the players to actually reach this outcome requires, however, that their utility functions be common knowledge and that the structure of the bargaining have the exact form described above. These assumptions strain credulity: if anything is common knowledge, it is that utility functions are almost never common knowledge. Moreover, there is no reason why the parties need to bargain with each other via alternating offers with a fixed probability p of breakdown. The Nash outcome depends crucially on these assumptions.
The model of norm formation that we propose dispenses entirely with common knowledge, common beliefs, and common priors. We posit instead that people take their cues from what other people have actually done before them. If lawyers usually get one-third of the award as a contingency fee, clients come to expect that lawyers will insist on this much, and lawyers come to expect that their clients will accept this. In short, common expectations emerge endogenously through the accumulation of precedent. It is also reasonable to assume that random perturbations jostle these expectations to some extent. We shall show that when all agents have the same sample size, and all agents within each population have the same utility function, the stochastically stable norm corresponds to the Nash bargaining solution. When the populations are heterogeneous in both sample sizes and utility functions, one obtains a generalization of the Nash solution that differs from the Harsanyi-Selten extension of the Nash solution. This result shows, in particular, how high-rationality solutions from classical game theory can emerge in low-rationality environments through the process of social learning.
8.2 Adaptive Learning in Bargaining
Consider two populations of players (landlords and tenants, lawyers and clients, franchisers and franchisees) who periodically bargain pairwise over their shares of a common pie. We shall refer to these (disjoint) populations as "row players" and "column players." Generically, let \(x\) denote the share that a row player gets, and let \(y\) denote the share that a column player gets. For the moment we shall assume that each population is homogeneous, that is, that everyone in the same population has the same utility function. Let \(u(x)\) denote the row players' utility as a function of the share \(x\), and let \(v(y)\) denote the column players' utility as a function of the share \(y\), where \(x, y \in [0, 1]\). As usual, we assume that \(u\) and \(v\) are concave and strictly increasing. For simplicity, we shall also assume that the disagreement shares satisfy \(x^{\circ} = y^{\circ} = 0\). This involves no real loss of generality, because we could always say that the pie to be divided is the surplus over and above the disagreement shares. Without loss of generality we can normalize \(u\) and \(v\) so that \(u(0) = v(0) = 0\) and \(u(1) = v(1) = 1\).
At the beginning of each period, one row and one column player are drawn at random from their respective populations. They play the Nash demand game: the row player demands some number \(x \in (0, 1]\), and simultaneously the column player demands some number \(y \in (0, 1]\). Note that the demands are strictly positive; it makes no sense to "demand" nothing. The outcomes and payoffs are as follows:

\[
\begin{array}{lll}
\text{Demands} & \text{Outcomes} & \text{Payoffs} \\
x + y \leq 1 & x,\ y & u(x),\ v(y) \\
x + y > 1 & 0,\ 0 & 0,\ 0
\end{array}
\]

To keep the state space finite, we shall discretize the strategies by allowing only demands that are expressible in \(d\) decimal places, where \(d\) is a fixed positive integer. The precision of the demands is \(\delta = 10^{-d}\). Let \(X_{\delta} = \{\delta, 2\delta, \ldots, 1\}\) denote the finite space of discretized demands.
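The discretized demand game is easy to write down explicitly. The sketch below is one possible encoding; the function names are illustrative, and exact fractions are used so that compatible demands such as .3 and .7 are never rejected because of floating-point rounding.

```python
from fractions import Fraction


def demand_grid(d):
    """Return the discretized demands {delta, 2*delta, ..., 1} with delta = 10**(-d)."""
    n = 10 ** d
    return [Fraction(k, n) for k in range(1, n + 1)]


def demand_game(x, y, u, v):
    """Outcomes and payoffs of the Nash demand game for demands x and y.

    Compatible demands (x + y <= 1) are met; otherwise both sides get nothing.
    The zero payoffs rely on the normalization u(0) = v(0) = 0.
    """
    if x + y <= 1:
        return (x, y), (u(x), v(y))
    return (0, 0), (0, 0)


grid = demand_grid(1)  # precision delta = 1/10
print(demand_game(grid[2], grid[6], float, float))  # demands .3 and .7 are compatible
print(demand_game(grid[3], grid[6], float, float))  # demands .4 and .7 are not
```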
The evolutionary process is adaptive play with memory \(m\) and error rate \(\varepsilon\). It will be convenient to express the sample sizes as fractions of \(m\). (The reason for this will become apparent later on.) Let \(a\) be the rational fraction of precedents that the row players sample, and let \(b\) be the rational fraction of the precedents that the column players sample, where \(0 < a, b < 1\). To avoid rounding problems, we shall henceforth assume that \(m\) is chosen so that \(am\) and \(bm\) are both integers (this is purely a matter of mathematical convenience). Let \((x^{t}, y^{t})\) denote the amounts demanded by the row player and column player in period \(t\). At the end of period \(t\), the state is

\[
h^{t} = ((x^{t-m+1}, y^{t-m+1}), \ldots, (x^{t}, y^{t})).
\]
At the beginning of period \(t + 1\), the current row player draws a sample of size \(am\) from the \(y\)-values in \(h^{t}\). Simultaneously and independently, the current column player draws a sample of size \(bm\) from the \(x\)-values in \(h^{t}\).

Let \(g^{t}(y)\) be the frequency distribution of demands \(y\) in the row player's sample. Thus \(g^{t}\) is a random variable with cumulative distribution function

\[
G^{t}(y) = \int_{0}^{y} g^{t}(z)\,dz.
\]

With probability \(1 - \varepsilon\) the row player chooses a best reply given \(G^{t}\), that is,

\[
x^{t+1} = \arg\max_{x \in X_{\delta}} u(x)\,G^{t}(1 - x).
\]

With probability \(\varepsilon\) he chooses a demand at random from \(X_{\delta}\). A similar rule applies to the column players. This yields a Markov process \(P^{\varepsilon,\delta,a,b,m}\) on the state space \(H = (X_{\delta})^{2m}\).
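One period of this process can be sketched in a few lines. The code below is a minimal illustration rather than a definitive implementation: it assumes sampling without replacement, uniform errors, and ties in the best reply broken in favor of the smallest demand, none of which is spelled out in the text, and all function and parameter names are hypothetical.

```python
import random
from fractions import Fraction


def best_reply(sample, grid, util):
    """Demand x in grid maximizing util(x) * G(1 - x), G being the sample's empirical CDF."""
    n = len(sample)

    def expected(x):
        compatible = sum(1 for z in sample if z <= 1 - x)  # equals n * G(1 - x)
        return util(x) * compatible / n

    # max() keeps the first maximizer; the grid is ascending, so ties go to the
    # smallest demand (an assumption of this sketch).
    return max(grid, key=expected)


def adaptive_play_step(history, grid, u, v, a_m, b_m, eps, rng):
    """Advance the state by one period; history is a list of m (x, y) pairs."""
    y_sample = rng.sample([y for _, y in history], a_m)  # row player's sample
    x_sample = rng.sample([x for x, _ in history], b_m)  # column player's sample
    # With probability eps a player errs and picks a demand uniformly at random.
    x_next = rng.choice(grid) if rng.random() < eps else best_reply(y_sample, grid, u)
    y_next = rng.choice(grid) if rng.random() < eps else best_reply(x_sample, grid, v)
    return history[1:] + [(x_next, y_next)]  # forget the oldest record


# Illustrative run: precision 1/10, memory m = 10, samples of size 5, linear utilities.
rng = random.Random(0)
grid = [Fraction(k, 10) for k in range(1, 11)]
state = [(Fraction(3, 10), Fraction(7, 10))] * 10
for _ in range(200):
    state = adaptive_play_step(state, grid, float, float, 5, 5, 0.05, rng)
```

With the error rate set to zero, a history in which every record is (.3, .7) reproduces itself, because .3 is the largest demand compatible with every sampled .7 and vice versa.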
A conventional division or norm is a state of the form

\[
h_{x} = ((x, 1 - x), (x, 1 - x), \ldots, (x, 1 - x)),
\]

where \(0 < x < 1\). Within the remembered history, row players have always demanded (and received) \(x\), while column players have always demanded (and received) \(1 - x\). Thus their actions and expectations are fully coordinated in the absence of errors and other stochastic perturbations. We shall say that a division \((x, 1 - x)\) is stochastically stable for a given precision \(\delta\) if the corresponding norm \(h_{x}\) is stochastically stable for all sufficiently large \(m\) such that \(am\) and \(bm\) are integers.
Theorem 8.1. Let \(G\) be the discrete Nash demand game with precision \(\delta\) played adaptively with memory \(m\) and sample sizes \(am\) and \(bm\), where \(0 < a, b < 1/2\). As \(\delta\) becomes small, the stochastically stable division(s) converge to the asymmetric Nash bargaining solution, namely, the unique division that maximizes \((u(x))^{a}(v(1 - x))^{b}\).
Note that if both sides have the same utility function (\(u = v\)) but have different amounts of information (\(a \neq b\)), the outcome favors the side that is better informed. Note also that the analog of theorem 8.1 holds when players demand discrete shares \(x > x^{\circ}\) and \(y > y^{\circ}\): the stochastically stable division comes arbitrarily close to the unique division \((x, y)\) that maximizes \((u(x) - u(x^{\circ}))^{a}(v(y) - v(y^{\circ}))^{b}\) subject to \(x > x^{\circ}\), \(y > y^{\circ}\), and \(x + y = 1\).
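The effect of unequal sample sizes can be seen by maximizing the asymmetric Nash product directly on the discretized demands. The sketch below does only that; it does not compute stochastic stability, and the utilities and sample fractions are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def linear(share):
    """Hypothetical linear utility, normalized so that u(0) = 0 and u(1) = 1."""
    return float(share)


def asymmetric_nash_division(u, v, a, b, d):
    """Interior grid point x maximizing u(x)**a * v(1 - x)**b at precision 10**(-d)."""
    n = 10 ** d
    interior = [Fraction(k, n) for k in range(1, n)]  # delta, ..., 1 - delta
    return max(interior, key=lambda x: u(x) ** a * v(1 - x) ** b)


# Identical utilities, but row players sample twice as many precedents.
x_star = asymmetric_nash_division(linear, linear, a=0.4, b=0.2, d=3)
print(float(x_star))
```

For linear utilities the continuous maximizer is \(x = a/(a + b)\), which is \(2/3\) in this example, so the better-informed row players obtain the larger share.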
We stress that this result does not depend on agents' calculations about others' utility functions, their information, or the extent of their rationality. None of these things is common knowledge (or even mutual knowledge). The only thing we assume is that agents respond more or less rationally to concrete information about actions taken by their predecessors. Since their responses depend on their attitudes toward risk, the outcome does in fact depend on their utility functions (even though the agents may not be aware of it). The Nash solution is favored over the long run simply because it is the most stable given the agents' preferences and the random forces that constantly buffet the process about.
8.3 Sketch of the Proof of Theorem 8.1
We shall outline the proof of theorem 8.1 under two simplifying assumptions: (A complete proof without these assumptions is given in Young, 1993b.)
To motivate the argument, consider a concrete example. Let the precision \(\delta = .1\), and suppose that the current convention is .3 for the row player and .7 for the column player. Then the state looks like this:
m periods
Row player's previous demands: .3 .3 .3 ... .3
Column player's previous demands: .7 .7 .7 ... .7
To tip the process into the basin of attraction of another norm requires a succession of errors. Since by assumption all errors are local, the row player can deviate only to .2 or .4, and the column player can deviate only to .6 or .8. Consider the possibility that the row player demands .4 for several periods. This is too high given the current convention, and the players who demand .4 will almost surely fail to make a bargain, at least at first. However, once enough of these errors accumulate, they will eventually change the expectations of future column players. For example, a column player will respond with .6 if there are \(i\) instances of .4 in her sample and \(v(.6) \geq (1 - i/bm)\,v(.7)\). In other words, \(i\) mistakes by the row player can push the process into the basin of attraction of the norm (.4, .6) if

\[
i \geq \lceil bm\,(v(.7) - v(.6))/v(.7) \rceil. \tag{8.1}
\]

Similarly, the column player can push the process into the basin of attraction of (.4, .6) by demanding .6, which is too little given the current convention. This weakness will eventually be exploited by the row player if there are \(j\) instances of .6 in the row player's sample and \((j/am)\,u(.4) \geq u(.3)\), that is, if

\[
j \geq \lceil am\,u(.3)/u(.4) \rceil. \tag{8.2}
\]
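The step from each best-reply condition to its threshold is a short rearrangement, spelled out here for convenience. The inequalities are read as weak, and the ceiling appears because \(i\) and \(j\) are integers; dividing by \(v(.7)\) and \(u(.4)\) is harmless because both are positive under the normalization \(u(0) = v(0) = 0\).

```latex
\begin{align*}
v(.6) \geq \Bigl(1 - \frac{i}{bm}\Bigr)\,v(.7)
  &\iff \frac{i}{bm} \geq \frac{v(.7) - v(.6)}{v(.7)}
  \iff i \geq \Bigl\lceil \frac{bm\,(v(.7) - v(.6))}{v(.7)} \Bigr\rceil, \\[4pt]
\frac{j}{am}\,u(.4) \geq u(.3)
  &\iff j \geq \Bigl\lceil \frac{am\,u(.3)}{u(.4)} \Bigr\rceil .
\end{align*}
```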
Each \(x \in X_{\delta}\) such that \(\delta \leq x \leq 1 - \delta\) corresponds to a norm that, for brevity, we shall denote by \((x, 1 - x)\). The resistance to going from the norm \((x, 1 - x)\) to the norm \((x + \delta, 1 - x - \delta)\) is

\[
r(x, x + \delta) = \lceil bm\,(v(1 - x) - v(1 - x - \delta))/v(1 - x) \rceil \wedge \lceil am\,u(x)/u(x + \delta) \rceil.
\]
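The resistance is straightforward to evaluate numerically. The sketch below reads the resistance as the smaller of the two error counts, with ceilings around each term; the function name, the linear utilities, and the sample sizes are assumptions made for illustration. Exact fractions keep the ceilings from being distorted by rounding, which would need more care with floating-point utilities.

```python
import math
from fractions import Fraction


def resistance_up(x, delta, u, v, am, bm):
    """Fewest local errors that tip the norm (x, 1 - x) to (x + delta, 1 - x - delta)."""
    # Errors by row players (demanding x + delta) needed to move column players.
    row_errors = math.ceil(bm * (v(1 - x) - v(1 - x - delta)) / v(1 - x))
    # Errors by column players (demanding 1 - x - delta) needed to move row players.
    col_errors = math.ceil(am * u(x) / u(x + delta))
    return min(row_errors, col_errors)


def identity(share):
    """Hypothetical linear utility that keeps shares as exact fractions."""
    return share


# The example from the text: norm (.3, .7), precision .1, with am = bm = 30 assumed.
r = resistance_up(Fraction(3, 10), Fraction(1, 10), identity, identity, am=30, bm=30)
print(r)  # 5: five row-player errors suffice, against 23 column-player errors
```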