In more or less complete violation of representative design precepts, a large body of research has emerged that broadly addresses subjects' abilities to learn to use probabilistic information. The general format is to present the subject with a (long) series of trials in each of which several cues are presented and the subject is asked to predict the value of some criterion variable to which the cues are related. After the subject makes an estimate, he or she is told the correct answer before proceeding to the next trial. Such a format lends itself to endless variations in task characteristics: number of cues presented, their validity, the functional form of their relationship to the underlying variable that the subject is to estimate, the quality of feedback presented, whether the task is embedded in a meaningful verbal context, whether learning aids are provided, and so on.
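To make the trial-by-trial structure concrete, the following is a minimal simulation sketch in Python. The task parameters (two cues, their weights, the feedback error level, the number of trials) and the simple error-driven stand-in "subject" are illustrative assumptions introduced here, not a model taken from any of the studies reviewed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative task parameters (assumed, not from any cited study)
n_trials = 100
cue_weights = np.array([0.8, 0.4])   # linear cue validities
error_sd = 0.5                       # feedback error level

def subject_prediction(cues, learned_weights):
    """Stand-in for the subject: a weighted guess based on current beliefs."""
    return float(cues @ learned_weights)

learned = np.zeros(2)                # the "subject" starts knowing nothing
lr = 0.05                            # crude learning rate for the stand-in subject

for t in range(n_trials):
    cues = rng.normal(size=2)                                  # present the cues
    criterion = cues @ cue_weights + rng.normal(0, error_sd)   # noisy criterion
    guess = subject_prediction(cues, learned)                  # subject estimates
    # Outcome feedback: the subject sees the correct answer and adjusts
    learned += lr * (criterion - guess) * cues

print("weights inferred from outcome feedback:", learned)
```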
The evidence from dozens of such studies is that except for the simplest versions, these multiple-cue probability learning (MCPL) tasks are very hard to learn. "Simple" generally means one or two cues, strongly and linearly related to the criterion, under conditions of low feedback error. For example, Slovic (1974) used a task with one linear cue that correlated .80 with the criterion and found subject estimates approaching maximum possible performance by the last of 100 trials. However, when the cue validity was -.80, learning after 100 trials was less than half this level. Deane, Hammond, and Summers (1972), using a three-cue task, found reasonable learning after 150 trials when all three relationships were positive, but almost no learning when the relationships were U-shaped. Learning improves somewhat when the subjects are warned about possible nonlinearities (Earle, 1970). Two-cue interactions are learned only if helpful verbal cues are provided (Camerer, 1981). Even after reaching high levels of performance under low-error feedback, subjects' performances rapidly decline when feedback error levels are increased (Connolly & Miklausich, 1978). In short, as Klayman (1988) suggested, learning from outcome feedback is "learning the hard way."
In many real-world tasks, of course, feedback is probably much less helpful than is the outcome feedback provided in these MCPL laboratory tasks. A human resources (HR) professional trying to learn the task of predicting candidates' potentials from application materials receives feedback only after significant delay (when the applicant has been hired and on the job for some time); under high error (supervisor ratings may introduce new sources of error); and, crucially, only for those applicants actually hired (see Einhorn, 1980, on the inferential problems facing waiters who believe that they can spot good tippers). Laboratory MCPL tasks show excruciatingly slow learning of simple tasks under relatively good outcome feedback. Real-world tasks are almost certainly more difficult, and real-world feedback almost certainly less helpful, than are the laboratory conditions. It thus seems unlikely that outcome feedback is the key to learning real-world tasks of this sort, and interest in laboratory MCPL studies seems to have largely subsided in recent years.
Policy Capturing
Policy capturing, also known as judgment analysis (Stewart, 1988), is the process of developing a quantitative model of a specific person making a specific judgment. The general form of such a model is an equation, often first-order linear, relating the judgments, J, to a weighted sum of the information cues, x_i. Hundreds of such studies have been conducted, dating at least to Wallace (1923), who modeled expert judges of corn. Hammond and Adelman (1976) studied judgments of handgun ammunition; Slovic (1969) studied stockbrokers; Phelps and Shanteau (1978) studied hog judges; and Doyle and Thomas (1995) studied audiologists. In addition, policy capturing has been commonly used for organizational applications, such as decisions concerning salary raises (Sherer, Schwab, & Heneman, 1987), alternative work arrangements (Powell & Mainiero, 1999), and applicant ratings and recommended starting salaries (Hitt & Barr, 1989). Policy capturing is thus a very widely used procedure.
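Written out, the usual first-order linear form is a weighted sum of the cues plus an error term; a sketch of this standard model, with notation introduced here for illustration, is:

```latex
% First-order linear policy-capturing model (illustrative notation)
J = b_0 + \sum_{i=1}^{k} b_i x_i + e
% J   : the judge's rating (e.g., the merit score written on a file)
% x_i : the k coded information cues
% b_i : estimated cue weights;  e : residual (unmodeled) variation
```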
It is also fair to say that the technique has been widely abused and that many of the findings are hard to assess or interpret. The basic approach is so simple and obvious that it is easy to overlook some important subtleties that vitiate the final conclusions. We shall sketch some of these points here; see Stewart (1988) and A. Brehmer and Brehmer (1988) for fuller discussion.
Suppose one were interested in modeling the judgment process of a university department head who is selecting candidates for graduate school. The department head reads an applicant's file, writes a merit score between 0 and 100 on the cover, and moves to another file. At a later stage the files are rank ordered, and applicants are admitted in descending order of merit score until all the places are filled. How might one model the department head's judgment process?
A first step is to establish what information she is collecting from each file: the cues. Simply asking her what cues she is using may be misleading: It is possible that she is biased toward (or against) women, minorities, left-handers, or Scrabble players and is either unaware of the fact or chooses not to admit it. Second, how does she code this information? What counts as a "strong" GPA or an "acceptable" letter of reference? Significant work may be needed to translate the department head's inspection of the file into a set of scale scores representing the cues she discovers in it and the values she assigns them. Stewart (1988) provided helpful practical advice on this process, and A. Brehmer and Brehmer (1988) discussed common failures. Doyle and Thomas (1995) reported an exemplary study in identifying the cues used by audiologists in assessing patients for hearing aids. Once cues and judgments have been identified and scored, estimation of a standard multiple linear regression model is straightforward. Interpretation, however, may not be. In particular, the interpretation of the relative weights given to each cue is conceptually difficult (see Stevenson, Busemeyer, & Naylor, 1991).
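Once such data are in hand, the estimation step can be as simple as an ordinary least-squares fit of the judgments on the coded cues. The following is a minimal sketch with hypothetical cue scores and merit judgments invented for illustration; it is not a reconstruction of any particular study's analysis.

```python
import numpy as np

# Hypothetical data: one row per applicant file, one column per coded cue
# (e.g., GPA, test score, letter strength); all values invented for illustration.
cues = np.array([
    [3.6, 720, 4],
    [3.1, 650, 3],
    [3.9, 780, 5],
    [2.8, 600, 2],
    [3.4, 700, 4],
    [3.7, 740, 3],
])
judgments = np.array([82, 61, 95, 48, 74, 79])  # merit scores written on the files

# Ordinary least-squares fit of the judgments on the cues (with an intercept term)
X = np.column_stack([np.ones(len(cues)), cues])
coefs, *_ = np.linalg.lstsq(X, judgments, rcond=None)

intercept, weights = coefs[0], coefs[1:]
predicted = X @ coefs
r_squared = 1 - np.sum((judgments - predicted) ** 2) / np.sum((judgments - judgments.mean()) ** 2)

print("estimated cue weights:", weights)
print("model fit (R^2):", round(r_squared, 2))
# Note: raw weights depend on cue scales and intercorrelations, which is one
# reason interpreting "relative weights" is conceptually difficult.
```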
One subtle (and, in our view, unsolved) problem in policy capturing is how to meet Brunswik's goal of representative design. This goal plainly prohibits constructing simple orthogonal designs among the cues: Such independence destroys patterns of cue intercorrelations on which expert judges may rely. Cue ranges and intercorrelations should reflect those found in some relevant environment, such as the pool of applicants or patients with whom the expert regularly deals. A sample of recent actual cases would appear to meet this requirement, but even here complexities arise. If one wishes to compare expert predictions with actual performance, then only the subset of applicants hired or admitted is relevant, and this subset will have predictably truncated cue ranges and intercorrelations compared to the entire pool. Changes in pool parameters arising from changes in the employment rate, prescreening, self-selection into or out of the pool, or even in educational practices may all affect the modeled judgment. The underlying problem of what exactly defines the environment that the sample of cases is intended to represent is a conceptually subtle and confusing one.
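To illustrate the truncation point, here is a small simulation sketch with assumed parameters (a pool of applicants with two correlated cues, admission of the top fifth on a simple composite); it shows how restricting the sample to the admitted subset shrinks cue ranges and attenuates cue intercorrelations relative to the full pool.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated applicant pool: two cues with an assumed correlation of .60
n = 5000
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
pool = rng.multivariate_normal([0, 0], cov, size=n)

# Admit only the top 20% on a simple composite of the two cues
composite = pool.sum(axis=1)
admitted = pool[composite > np.quantile(composite, 0.80)]

print("cue correlation, full pool:    ", round(np.corrcoef(pool.T)[0, 1], 2))
print("cue correlation, admitted only:", round(np.corrcoef(admitted.T)[0, 1], 2))
print("cue 1 SD, full pool vs admitted:",
      round(pool[:, 0].std(), 2), "vs", round(admitted[:, 0].std(), 2))
```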
Given these methodological worries, some caution is needed in summaries of research findings. Common generalizations (A. Brehmer & Brehmer, 1988; Slovic & Lichtenstein, 1971) include the following: (a) Judges generally use few cues, and their use of these cues is adequately modeled by simple first-order linear models; (b) judges describe themselves as using cues in complex, nonlinear, and interactive ways; (c) judges show modest test-retest reliabilities; and (d) inter-judge agreement is often moderate or low, even in areas of established expertise. In light of the methodological shortcomings just noted, we propose that such broad generalizations be taken as working hypotheses for new applications, not as settled fact.