







Multiple-Cue Probability Learning Studies

In more or less complete violation of representative design precepts, a large body of research has emerged that broadly addresses subjects' abilities to learn to use probabilistic information. The general format is to present the subject with a (long) series of trials in each of which several cues are presented and the subject is asked to predict the value of some criterion variable to which the cues are related. After the subject makes an estimate, he or she is told the correct answer before proceeding to the next trial. Such a format lends itself to endless variations in task characteristics: number of cues presented, their validity, the functional form of their relationship to the underlying variable that the subject is to estimate, the quality of feedback presented, whether the task is embedded in a meaningful verbal context, whether learning aids are provided, and so on.
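To make the trial-by-trial format concrete, the following sketch simulates a one-cue version of such a task; the cue validity near .80 echoes the Slovic (1974) example discussed below, but the error-driven updating rule, learning rate, and noise level are purely illustrative assumptions about a hypothetical learner, not a model taken from any of the studies cited.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 100        # length of the series; the cited studies used 100-150 trials
true_weight = 0.8     # single linear cue, giving a validity near .80
noise_sd = 0.6        # feedback error added to the criterion
learning_rate = 0.05  # crude error-driven update rule for the simulated subject

subject_weight = 0.0  # the cue weight the simulated subject gradually learns
estimates, criteria = [], []

for t in range(n_trials):
    cue = rng.normal()                                    # cue shown on this trial
    criterion = true_weight * cue + rng.normal(scale=noise_sd)

    estimate = subject_weight * cue                       # subject's prediction
    # Outcome feedback: the correct answer is revealed only after the estimate,
    # and the subject nudges its weight in proportion to the prediction error.
    subject_weight += learning_rate * (criterion - estimate) * cue

    estimates.append(estimate)
    criteria.append(criterion)

r = np.corrcoef(estimates[-25:], criteria[-25:])[0, 1]
print(f"achievement (r) over the final 25 trials: {r:.2f}")
```

Raising noise_sd lowers the ceiling on the achievable estimate-criterion correlation, loosely mirroring the effect of degraded feedback described below.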

The evidence from dozens of such studies is that, except for the simplest versions, these multiple-cue probability learning (MCPL) tasks are very hard to learn. "Simple" generally means one or two cues, strongly and linearly related to the criterion, under conditions of low feedback error. For example, Slovic (1974) used a task with one linear cue that correlated .80 with the criterion and found subject estimates approaching maximum possible performance in the last of 100 trials. However, when the cue validity was -.80, learning after 100 trials was less than half this level. Deane, Hammond, and Summers (1972), using a three-cue task, found reasonable learning after 150 trials when all three relationships were positive, but almost no learning when the relationships were U-shaped. Learning improves somewhat when the subjects are warned about possible nonlinearities (Earle, 1970). Two-cue interactions are learned only if helpful verbal cues are provided (Camerer, 1981). Even after reaching high levels of performance under low-error feedback, subjects' performances rapidly decline when feedback error levels are increased (Connolly & Miklausich, 1978). In short, as Klayman (1988) suggested, learning from outcome feedback is "learning the hard way."

In many real-world tasks, of course, feedback is probably much less helpful than is the outcome feedback provided in these MCPL laboratory tasks. A human resources (HR) professional trying to learn the task of predicting candidates' potentials from application materials receives feedback only after significant delay (when the applicant has been hired and on the job for some time); under high error (supervisor ratings may introduce new sources of error); and, crucially, only for those applicants actually hired (see Einhorn, 1980, on the inferential problems facing waiters who believe that they can spot good tippers). Laboratory MCPL tasks show excruciatingly slow learning of simple tasks under relatively good outcome feedback. Real-world tasks are almost certainly more difficult, and real-world feedback almost certainly less helpful, than are the laboratory conditions. It thus seems unlikely that outcome feedback is the key to learning real-world tasks of this sort, and interest in laboratory MCPL studies seems to have largely subsided in recent years.



Policy Capturing

Policy capturing, also known as judgment analysis (Stewart, 1988), is the process of developing a quantitative model of a specific person making a specific judgment. The general form of such a model is an equation, often first-order linear, relating the judgments, J, to a weighted sum of the information "cues," x_i. Hundreds of such studies have been conducted, dating at least to Wallace (1923), who modeled expert judges of corn. Hammond and Adelman (1976) studied judgments of handgun ammunition; Slovic (1969) studied stockbrokers; Phelps and Shanteau (1978) studied hog judges; and Doyle and Thomas (1995) studied audiologists. In addition, policy capturing has been commonly used for organizational applications, such as decisions concerning salary raises (Sherer, Schwab, & Heneman, 1987), alternative work arrangements (Powell & Mainiero, 1999), and applicant ratings and recommended starting salaries (Hitt & Barr, 1989). Policy capturing is thus a very widely used procedure.
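Written out, the first-order linear form referred to above is simply a regression of the judgments on the cue scores; the notation below (J_j for the judgment on case j, x_ij for the score of case j on cue i, b_i for the fitted weights, e_j for a residual) is generic rather than drawn from any particular study.

```latex
J_j = b_0 + \sum_{i=1}^{k} b_i \, x_{ij} + e_j
```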

It is also fair to say that the technique has been widely abused and that many of the findings are hard to assess or interpret. The basic approach is so simple and obvious that it is easy to overlook some important subtleties that vitiate the final conclusions. We shall sketch some of these points here; see Stewart (1988) and A. Brehmer and Brehmer (1988) for fuller discussion.

Suppose one were interested in modeling the judgment process of a university department head who is selecting candidates for graduate school. The department head reads an applicant's file, writes a merit score between 0 and 100 on the cover, and moves to another file. At a later stage the files are rank ordered, and applicants are admitted in descending order of merit score until all the places are filled. How might one model the department head's judgment process?

A first step is to establish what information she is collecting from each file: the cues. Simply asking her what cues she is using may be misleading: It is possible that she is biased toward (or against) women, minorities, left-handers, or Scrabble players and is either unaware of the fact or chooses not to admit it. Second, how does she code this information? What counts as a "strong" GPA or an "acceptable" letter of reference? Significant work may be needed to translate the department head's inspection of the file into a set of scale scores representing the cues she extracts from it. Stewart (1988) provided helpful practical advice on this process, and A. Brehmer and Brehmer (1988) discussed common failures. Doyle and Thomas (1995) reported an exemplary study in identifying the cues used by audiologists in assessing patients for hearing aids. Once cues and judgments have been identified and scored, estimation of a standard multiple linear regression model is straightforward. Interpretation, however, may not be. In particular, the interpretation of the relative weights given to each cue is conceptually difficult (see Stevenson, Busemeyer, & Naylor, 1991).
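As a concrete illustration of the estimation step just described, the sketch below fits an ordinary least-squares policy model to fabricated data; the three cue names, the number of files, and the simulated judgment policy are all hypothetical, chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fifty hypothetical applicant files, each reduced to three cue scores
# (say, GPA, test score, and letter-of-reference strength) rescaled to 0-1.
cues = rng.uniform(size=(50, 3))

# Hypothetical merit judgments (0-100) written on the file covers, generated
# here from a known linear policy plus noise so the fit has something to recover.
judgments = 100 * (0.5 * cues[:, 0] + 0.3 * cues[:, 1] + 0.2 * cues[:, 2])
judgments = judgments + rng.normal(scale=5, size=50)

# Standard first-order linear policy model: regress the judgments on the cues.
X = np.column_stack([np.ones(len(cues)), cues])      # intercept column plus cues
weights, *_ = np.linalg.lstsq(X, judgments, rcond=None)

fitted = X @ weights
print("intercept and cue weights:", np.round(weights, 2))
print("R^2 of the captured policy:", round(np.corrcoef(fitted, judgments)[0, 1] ** 2, 2))
```

The fit itself is the easy part; as noted above, interpreting the relative sizes of the recovered weights is where the conceptual difficulty lies.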

One subtle (and, in our view, unsolved) problem in policy capturing is how to meet Brunswik's goal of representative design. This goal plainly prohibits constructing simple orthogonal designs among the cues: Such independence destroys patterns of cue intercorrelations on which expert judges may rely. Cue ranges and intercorrelations should reflect those found in some relevant environment, such as the pool of applicants or patients with whom the expert regularly deals. A sample of recent actual cases would appear to meet this requirement, but even here complexities arise. If one wishes to compare expert predictions with actual performance, then only the subset of applicants hired or admitted is relevant, and this subset will have predictably truncated cue ranges and intercorrelations compared to the entire pool. Changes in pool parameters arising from changes in the employment rate, prescreening, self-selection into or out of the pool, or even educational practices may all affect the modeled judgment. The underlying problem of what exactly defines the environment that the sample of cases is intended to represent is a conceptually subtle and confusing one.
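The truncation problem can be made visible with a small simulation: draw a pool with correlated cues, admit only the top-scoring fraction, and compare cue statistics in the pool and in the admitted subset. The intercorrelation of .6, the 20% admission rate, and the selection rule below are arbitrary assumptions used only to illustrate the effect.

```python
import numpy as np

rng = np.random.default_rng(2)

# Full applicant pool: two cues intercorrelated at .6, plus a selection score
# (criterion) that depends on both cues.
cov = [[1.0, 0.6], [0.6, 1.0]]
cues = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
score = cues @ np.array([0.5, 0.5]) + rng.normal(scale=0.7, size=2000)

admitted = score > np.quantile(score, 0.80)   # keep only the top 20 percent

for label, subset in [("full pool", cues), ("admitted subset", cues[admitted])]:
    r = np.corrcoef(subset[:, 0], subset[:, 1])[0, 1]
    print(f"{label:15s} cue intercorrelation = {r:5.2f}, cue 1 SD = {subset[:, 0].std():.2f}")
```

In the admitted subset both the cue standard deviations and the cue intercorrelation shrink relative to the full pool, which is exactly the truncation described above when only hired or admitted cases are available for modeling.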


Given these methodological worries, some caution is needed in summaries of research findings. Common generalizations (A. Brehmer & Brehmer, 1988; Slovic & Lichtenstein, 1971) include the following: (a) Judges generally use few cues, and their use of these cues is adequately modeled by simple first-order linear models; (b) judges describe themselves as using cues in complex, nonlinear, and interactive ways; (c) judges show modest test-retest reliabilities; and (d) inter-judge agreement is often moderate or low, even in areas of established expertise. In light of the methodological shortcomings just noted, we propose that such broad generalizations be taken as working hypotheses for new applications, not as settled fact.



