
The stationarity equation (3.7) has a unique solution if and only if P has a unique recurrent class, in which case we refer to the stationary distribution μ of P. A fundamental result for finite Markov chains states: if P has a unique recurrent class, then the stationary distribution μ describes the time-average asymptotic behavior of the process independently of the initial state z⁰:

lim_{t→∞} p^t(z | z⁰) = p^∞(z | z⁰) = μ(z).    (3.8)

By contrast, if P has more than one recurrent class, then it is nonergodic, and the asymptotic distribution depends on where the process starts.
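To make (3.7) concrete: in matrix form the stationarity equation reads μ = μP together with Σ_z μ(z) = 1, and for small chains it can be solved directly. The following is a minimal Python sketch on an invented two-state matrix (our own example, not one from the text):

import numpy as np

# A small, invented two-state transition matrix with a unique recurrent class.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Solve mu P = mu subject to sum(mu) = 1: replace one stationarity equation
# by the normalization constraint and solve the resulting linear system.
n = P.shape[0]
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1.0
mu = np.linalg.solve(A, b)
print(mu)   # approximately [0.8333, 0.1667]: the unique stationary distribution

Because this chain has a single recurrent class, the solution is unique, as the text asserts.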

The learning processes we shall consider have a further property, which allows us to make even sharper statements about their asymptotic behavior. Let P be a finite Markov process on the set Z, and for each state z, let N_z be the set of all integers n ≥ 1 such that there is a positive probability of moving from z to z in exactly n periods. The process is aperiodic if for every z, the greatest common divisor of N_z is unity. For a process that is aperiodic and irreducible, not only is it true that the time-average behavior of the process converges to the unique stationary distribution μ; its position at each point in time t is also approximated by μ when t is sufficiently large. More precisely, let ν^t(z | z⁰) be the probability that the process is in state z at time t, given that the process was in state z⁰ at time t = 0. (By contrast, p^t(z | z⁰) is the frequency distribution over the first t periods, conditional on beginning in z⁰.) Letting P^t be the t-fold product of P, it follows that

ν^t(z | z⁰) = (P^t)_{z⁰,z}.    (3.9)

If the process is irreducible and aperiodic, it can be shown that P^t converges to the matrix P^∞ in which every row equals μ. Thus we have

lim_{t→∞} ν^t(z | z⁰) = μ(z) for all z ∈ Z,    (3.10)

where μ is the unique stationary distribution. It follows that in every aperiodic, irreducible, finite Markov process, the probability ν^t(z | z⁰) of being in state z at a given time t is essentially the same as the frequency p^t(z | z⁰) of being in state z up to time t when t is large. That is, with probability one, both ν^t(z | z⁰) and p^t(z | z⁰) converge to μ(z) independently of the initial state.
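The convergence asserted in (3.10) can be checked numerically on the same toy chain from the sketch above: raising P to a high power yields a matrix whose rows are all (approximately) μ, so the time-t position no longer depends on the initial state.

import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# For an irreducible, aperiodic chain, P^t converges to a matrix P^infinity
# whose rows are all equal to the stationary distribution mu.
Pt = np.linalg.matrix_power(P, 50)
print(Pt)   # both rows are approximately [0.8333, 0.1667]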

These ideas can be illustrated with the learning model introduced in the previous chapter. Let G be a finite n-person game with strategy space X = ∏ X_i. Consider adaptive learning with memory m, sample size s, and error rate ε. This is clearly a finite Markov chain: the state space Z = X^m consists of all length-m histories, and for each pair of histories h, h′ ∈ X^m, there is a fixed probability P_{h,h′} of transiting from h to h′ in one period. We shall denote this process by its transition matrix, P^{m,s,ε}.

We claim that when the error rate ε is positive, the process is irreducible. To establish this, suppose that the process is in state h = (x^{t−m+1}, …, x^t) at the end of period t. Because of random error, there is a positive probability of generating an arbitrary set of actions in the next period. Thus within m periods, there is a positive probability of reaching any other history of length m. It follows that the process is irreducible and its asymptotic distribution is the unique solution to (3.7). (Note also that the process is aperiodic.) We shall denote this distribution by μ^{m,s,ε}, or sometimes by μ^ε, when m and s are understood.



When the error rate is zero, by contrast, the process may be reducible. To see this, suppose that G has a strict pure strategy Nash equilibrium x*. Consider the history in which x* was played m times in succession: h* = (x*, x*, …, x*). For any values of s and m such that 1 ≤ s ≤ m, h* is an absorbing state of the process, because for every i, x*_i is the unique best reply to every set of samples that i might draw from the history h*. Such a state h* is called a convention. It is a situation in which, for as long as anyone can remember, people have always played x*. Assuming that everyone else continues to adhere to the convention, it is in each person's interest to adhere to it also, and thus the convention stays in place: it is a self-enforcing mechanism of social coordination. The proof of the following statement is straightforward and is left to the reader.

The only absorbing states of the process P^{m,s,0} are the conventions, that is, states of the form (x*, x*, …, x*) where x* is a strict Nash equilibrium of G.

Any game with more than one strict Nash equilibrium has multiple absorbing states; hence the learning process P^{m,s,0} is reducible and therefore not ergodic. To illustrate, consider the typewriter game with the following payoff matrix:

          D       Q
    D   5, 5    0, 0
    Q   0, 0    4, 4
Choose the parameters in adaptive learning as follows: memory, m = 4; sample size, s = 2; and error rate, ε = 0. Each state can be represented by a 2 × 4 block of Qs and Ds, where the top line represents the row player's previous four actions and the bottom line represents the column player's previous four actions. For example, the state

    DQDQ
    QDDQ

means that the row player chose D four periods ago, Q three periods ago, D two periods ago, and Q one period ago. Similarly, the column player chose Q four periods ago, D three and two periods ago, and Q one period ago. The probability of transiting to other states depends on the samples that the agents happen to choose from this history. (For convenience we assume that all samples are equally likely to be drawn.) Once the new actions are played, the leftmost (oldest) actions are deleted and the new actions are adjoined to the right. Since coordinating on D has a higher payoff than does coordinating on Q, D is the unique best reply to all samples of size two except the sample {Q, Q}. Therefore the one-period transition probabilities from the above state are those shown in Figure 3.1.
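These transition probabilities can be computed by brute force. The sketch below uses our own helper names and assumes, as in the text, that each player draws a size-2 sample from the opponent's four recorded actions, that all samples are equally likely, and that miscoordination pays zero:

from collections import Counter
from fractions import Fraction
from itertools import combinations

def best_reply(sample):
    # Typewriter-game payoffs: D/D pays 5, Q/Q pays 4, miscoordination 0,
    # so D's expected payoff against the sample is proportional to 5*(#D)
    # and Q's to 4*(#Q). D wins for every size-2 sample except {Q, Q}.
    n_d = sample.count("D")
    return "D" if 5 * n_d > 4 * (len(sample) - n_d) else "Q"

def reply_distribution(opponent_history, s=2):
    # Each size-s sample of the opponent's history is equally likely.
    samples = list(combinations(opponent_history, s))
    dist = Counter()
    for smp in samples:
        dist[best_reply(smp)] += Fraction(1, len(samples))
    return dist

row_hist, col_hist = "DQDQ", "QDDQ"          # the state discussed in the text
row_dist = reply_distribution(col_hist)      # row samples the column's actions
col_dist = reply_distribution(row_hist)      # column samples the row's actions

# Delete the oldest (leftmost) action and adjoin the new one on the right.
for a, pa in row_dist.items():
    for b, pb in col_dist.items():
        print(row_hist[1:] + a, "/", col_hist[1:] + b, " prob:", pa * pb)

Under these assumptions each player plays D with probability 5/6 and Q with probability 1/6, so, for instance, both new actions are D with probability 25/36.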

It is cumbersome to write down all 256 states, and even more cumbersome to solve the stationarity equations (3.7). Fortunately we can say a good deal about the asymptotic behavior of the process without taking this step. We already know that the process has two absorbing states: the conventions all-D and all-Q. The first question we need to address is whether these constitute the only recurrent classes. Note that this does not follow immediately, because there could exist a recurrent class that contains multiple states, none of which is absorbing. In the preceding example, however, this cannot happen, because from any state there is a positive probability of moving in a finite number of periods to the all-D or the all-Q state. (The verification is left to the reader.) It follows that all-D and all-Q are the only recurrent classes, and that all other states are transient. In other words, from any initial state the process ends up in all-D or all-Q with probability one. The process is nonergodic, however, because the probability that it ends up in all-D rather than all-Q depends on the starting position. Figure 3.2 shows two routes (among many) from a given starting state to all-D on the one hand and to all-Q on the other. In this sense, the long-run behavior of the process is only weakly predictable.
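The dependence on initial conditions can be estimated by simulation. The sketch below (helper names are ours; the sampling assumptions are as above) runs the unperturbed process to absorption repeatedly and reports the empirical probability of ending in the all-D convention; starting histories with different mixes of Ds and Qs give different estimates:

import random

M, S = 4, 2   # memory and sample size from the text's example

def step(row_hist, col_hist, eps=0.0, rng=random):
    # One period of adaptive learning: each player best-replies to a random
    # size-S sample of the opponent's last M actions, and with probability
    # eps trembles to a uniformly random action instead.
    def choose(opponent_history):
        if rng.random() < eps:
            return rng.choice("DQ")
        sample = rng.sample(opponent_history, S)
        return "D" if 5 * sample.count("D") > 4 * sample.count("Q") else "Q"
    a, b = choose(col_hist), choose(row_hist)
    return row_hist[1:] + a, col_hist[1:] + b

def absorption_in_all_D(row_hist, col_hist, trials=5000):
    # Empirical probability that the zero-error process is absorbed in all-D.
    hits = 0
    for _ in range(trials):
        r, c = row_hist, col_hist
        while not (r == c == "DDDD" or r == c == "QQQQ"):
            r, c = step(r, c)
        hits += (r == "DDDD")
    return hits / trials

print(absorption_in_all_D("DQDQ", "QDDQ"))   # the state from the text
print(absorption_in_all_D("QQQD", "QQQD"))   # a Q-heavy start: a different estimate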

[Figure 3.1. Transition probabilities to neighboring states.]

When the error rate is positive, by contrast, the process is ergodic and its long-run average behavior does not depend on the initial conditions. In particular, the stationary distribution gives the (approximate) long-run frequency with which each state is visited, whatever the initial state.
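This contrast shows up clearly in simulation. Reusing the step function from the previous sketch with a small positive error rate (the value 0.05 is illustrative, not from the text), the long-run fraction of time spent at each convention comes out essentially the same from opposite starting points:

from collections import Counter

def convention_frequencies(row_hist, col_hist, eps=0.05, periods=100_000):
    # Long-run fraction of periods the perturbed process spends at each
    # convention; ergodicity implies the answer is (approximately)
    # independent of the starting state.
    counts = Counter()
    r, c = row_hist, col_hist
    for _ in range(periods):
        r, c = step(r, c, eps=eps)
        if r == c == "DDDD":
            counts["all-D"] += 1
        elif r == c == "QQQQ":
            counts["all-Q"] += 1
    return {k: v / periods for k, v in counts.items()}

print(convention_frequencies("DDDD", "DDDD"))
print(convention_frequencies("QQQQ", "QQQQ"))   # approximately the same output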


[Figure 3.2. Alternative paths to different absorbing states: from the state QDQD/DDQD, one route leads through increasingly D-dominated histories to the all-D convention DDDD/DDDD, another through increasingly Q-dominated histories to the all-Q convention QQQQ/QQQQ.]

