Edwards (1968) ran the following simple experiment. He showed subjects two book bags containing 100 poker chips. Bag A contained mainly red chips, Bag B mainly black. He randomly chose one of the bags, drew out a small sample of chips, and showed them to the subjects. He then asked the subjects for their estimate of how likely it was that he was drawing from Bag A. He found that subjects, initially persuaded that the probabilities were 50/50 before seeing the sample, generally revised their estimates in the direction suggested by the sample (i.e., toward A if the sample was mainly red chips) but not as far as would be required by Bayes's theorem. Edwards (1968) labeled the phenomenon conservatism. It involves three elements: a well-structured probabilistic task (e.g., sampling from two known populations); a sensible normative model for how the task should be performed (Bayes's theorem); and an observation that actual behavior is systematically biased with regard to this normative model.
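To make the normative benchmark concrete, the following is a minimal sketch of the Bayesian calculation for a bookbag task of this kind. The specific numbers (70/30 chip mixtures, a sample of 12 chips with 8 red) are assumed for illustration and are not taken from Edwards's report.

```python
# Minimal Bayesian benchmark for a bookbag task (illustrative numbers only):
# Bag A is assumed to be 70% red, Bag B 30% red, with a 50/50 prior on each bag.
from math import comb

def posterior_bag_a(n_red, n_draws, p_red_a=0.7, p_red_b=0.3, prior_a=0.5):
    """Posterior probability that the sample came from Bag A, by Bayes's theorem."""
    like_a = comb(n_draws, n_red) * p_red_a**n_red * (1 - p_red_a)**(n_draws - n_red)
    like_b = comb(n_draws, n_red) * p_red_b**n_red * (1 - p_red_b)**(n_draws - n_red)
    return like_a * prior_a / (like_a * prior_a + like_b * (1 - prior_a))

print(posterior_bag_a(n_red=8, n_draws=12))  # ~0.97
```

Revisions reported in tasks like this typically stop well short of such extreme posteriors, which is the conservatism Edwards described.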
The dominant paradigm for research on judgment under uncertainty through the 1970s and 1980s, the so-called heuristics and biases paradigm (Tversky & Kahneman, 1981), was founded on observations of systematic errors of this sort: probabilistic tasks in which human behavior deviated systematically from a normative rule. The paradigm was, however, more than a simple catalog of errors. Tversky and Kahneman (1981) argued that the observed errors were manifestations of cognitive rules of thumb or heuristics that, though generally effective and low-cost, can be misleading in certain unusual circumstances. Thus, for example, we might be well guided as to the relative popularity among our acquaintances of various hobbies by noting the ease or difficulty with which we could bring examples to mind (the availability heuristic). We would surely be misled, however, about embarrassing or illegal hobbies, whose practitioners might well take pains to conceal their interest. Similarly, dramatic causes of death are judged
to be commoner than less dramatic ones (Slovic, Fischhoff, & Lichtenstein, 1979), and easily found words are judged more likely than those more difficult to search for (Tversky & Kahneman, 1973). (We discuss examples of heuristics and biases in prediction research more fully in the following section on simple prediction.)
Work in this paradigm has declined in recent years. First, whatever the theoretical intentions, much of it became an ever-growing catalog of errors, with modest or no theoretical underpinnings that might allow prediction of when a particular heuristic would be evoked or error displayed. Second, there was growing doubt about the appropriateness of some of the normative models invoked to demonstrate that errors had been made. Third, it became clear that at least some of the claimed errors were actually the result of subjects' working successfully on problems other than the one the experimenter intended. (See Jungermann, 1983, and Gigerenzer, 1991, for extended critiques of the heuristics and biases approach.) Research interest in documenting our shortcomings seems to have declined. Increasingly, researchers are exploring the actual mechanisms that account for our performance, including the sometimes excellent performance of experts in real settings. (See Goldstein & Hogarth, 1997, and Connolly et al., 2000, for recent samplings of the literature.)
PREDICTION
Simple Prediction
There is evidence that, in making predictions, we use a variety of the heuristics discussed earlier. We will discuss three such heuristics: anchoring and adjustment, availability, and representativeness.
Imagine that an organization wants to predict sales for the coming quarter. A common approach would be to start with current sales as an initial estimate (the anchor), and then make an adjustment to account for market trends, new incentives, and so on. While this anchor-and-adjust heuristic may provide a reasonable estimate, research indicates that two potential problems may arise. First, the anchor may not be appropriate: If a new motivation program is applied to only a subset of salespeople, then the average of this group's sales should be used as an anchor, rather than the average of all salespeople. Second, adjustment from the anchor may not be sufficient: The predicted value may be too close to the anchor of average sales. Bolger and Harvey (1993) found that decision makers used an anchor-and-adjust strategy for predicting events over time (e.g., sales) and that their adjustments were insufficient.
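A minimal sketch of the anchor-and-adjust idea, with insufficient adjustment modeled as a weight below 1.0 (all figures and the weight are invented for illustration):

```python
# Hypothetical sales forecast: start from current sales (the anchor) and adjust
# toward what trends and incentives imply. A weight below 1.0 models the
# insufficient adjustment that Bolger and Harvey (1993) observed.
def anchor_and_adjust(anchor, implied_by_evidence, weight):
    return anchor + weight * (implied_by_evidence - anchor)

current_sales = 100_000   # anchor (invented)
evidence_says = 120_000   # level warranted by trends and incentives (invented)
print(anchor_and_adjust(current_sales, evidence_says, weight=0.5))  # 110,000: under-adjusted
print(anchor_and_adjust(current_sales, evidence_says, weight=1.0))  # 120,000: full adjustment
```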
Another method for making predictions uses the availability heuristic: The likelihood of an event is judged by how easily instances come to mind through either memory or imagination. This heuristic is generally reasonable because frequent events will tend to be noticed and remembered more than will less frequent events. A manager may predict how likely a particular employee is to be late for work based on recollections of past episodes. However, availability may lead to biased predictions when we selectively attend to information that is available (e.g., a vivid or recent event) instead of considering historical-statistical data systematically. For instance, people who had recently experienced an accident or a natural disaster estimated similar future events as more likely than those who had not experienced these events (Kunreuther et al., 1978). Similarly, managers conducting performance appraisals can produce biased evaluations (either positive or negative) when they rely on memory alone: Vivid episodes and events within three months prior to the evaluation are overweighted relative to other information (Bazerman, 1998).
A third heuristic used in prediction is representativeness, in which the likelihood of an event is judged by its similarity to a stereotype of similar events. Thus, a manager might predict the success of an employee by how similar he is to other known successful employees. Again, while this is generally a good initial estimate, using the representativeness heuristic can lead to systematic biases. First, people have a tendency to make nonregressive predictions from unreliable predictors. For example, Tversky and Kahneman (1974) attempted to teach Israeli flight instructors that positive reinforcement promotes learning faster than does negative reinforcement. The flight instructors objected, citing examples of poor performance following praise and improved performance after reprimands. The instructors were attributing fluctuations in performance to interventions alone and not recognizing the effect of chance elements. Those trainees who received praise had performed at a level above their average performance, whereas those who were reprimanded had performed below their average. Statistically, both groups should tend to perform closer to their average performance on subsequent flights. Thus, the flight instructors falsely concluded that praise hurts and reprimands help because they predicted, by representativeness, that performance should be similar to the previous episode instead of regressing their predictions of performance to the mean. A parallel fallacy arises when we predict that the best performing salesperson this year will be the top performer the following year.
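The regression-to-the-mean point can be illustrated with a short simulation (all parameters invented): even when praise and reprimands have no effect whatsoever, trainees selected for an extreme first performance will, on average, perform closer to their own mean on the next flight.

```python
# Illustrative simulation: each trainee's landing quality is a stable skill plus
# random noise; praise and reprimands have no causal effect in this model.
import random

random.seed(1)
skill = [random.gauss(0, 1) for _ in range(10_000)]
first = [s + random.gauss(0, 1) for s in skill]    # first flight
second = [s + random.gauss(0, 1) for s in skill]   # next flight

praised = [second[i] for i, f in enumerate(first) if f > 1.0]        # did well, got praise
reprimanded = [second[i] for i, f in enumerate(first) if f < -1.0]   # did badly, got reprimand

print(sum(praised) / len(praised))          # well below the praised group's first-flight average
print(sum(reprimanded) / len(reprimanded))  # well above the reprimanded group's first-flight average
```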
Another bias that has been attributed to using the representativeness heuristic is the tendency to neglect base rates or the prior probabilities of outcomes (Kahneman & Tversky,
1973). Imagine that a company knows that a small percentage (e.g., 1%) of its employees is using illegal drugs. The company conducts a random drug test in order to determine which employees are using drugs and are subject to termination. The test is relatively accurate, being correct 90% of the time; that is, the test will be incorrect only 10% of the time, when either a drug user tests negative (false negative) or a nonuser tests positive (false positive). Should the company fire employees who test positive for drugs? Most would say yes, because the probability of being a drug user given the positive test result should be representative of the accuracy of the test (somewhere around 90%). The true answer is that it is very unlikely that this person is a drug user: Although the test is relatively accurate, it is not very diagnostic in this situation because the probability that a person who tests positive is a drug user is only 8.3%. The reason for this counterintuitive probability is that we neglect the influence of the base rate of drug users. Because the probability of being a drug user is so low, most of the people testing positive will not be drug users. For example, imagine that there were 1,000 employees in this company: 10 (1%) would be drug users, and 990 would be nonusers. Because the test is 90% accurate, 9 of the 10 drug users will test positive. However, 99 (10%) of the nonusers would also test positive (the false positives). Thus, of the 108 people who test positive, only 9 (8.3%) will be drug users. Note that even if the accuracy of the test in this example is increased to 99%, the probability that an individual who receives a positive test result is actually a drug user is still only 50%. This drug-testing example is an adaptation of the well-known cab problem from Kahneman and Tversky (1973).
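The arithmetic in this example follows directly from Bayes's theorem; the sketch below simply reproduces the figures given in the text.

```python
# P(drug user | positive test), using the base rate and error rates from the text.
def p_user_given_positive(base_rate, sensitivity, false_positive_rate):
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)

print(p_user_given_positive(0.01, 0.90, 0.10))  # ~0.083, as in the 1,000-employee example
print(p_user_given_positive(0.01, 0.99, 0.01))  # 0.50 even with a 99% accurate test
```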
There are other potential difficulties in making predictions. In some situations, our judgments are overconfident. Experiments demonstrating overconfidence often ask difficult almanac questions in which subjects either choose between two options (e.g., "Which river is longer, the Tigris or the Volga?") or state a range of values in which they are 90% confident a true value lies (e.g., "How long is the Tigris river?"). Klayman, Soll, Gonzalez-Vallejo, and Barlas (1999) found a general overconfidence for almanac questions, but the overconfidence was much higher for subjective confidence intervals than for the two-choice questions (approximately 45% versus 5%). They found significant differences between individuals, but overconfidence was stable across individuals answering questions from different domains (e.g., prices of shampoo and life expectancies in different countries). A person who was overconfident in one domain was likely to be overconfident in another. Overconfidence has been found in many, though not all, contexts (Yates, 1990). There is evidence that it declines with experience (Keren, 1987) and with instructions to think of ways in which an
estimate might be wrong (Fischhoff, 1982). Overconfidence and its control have obvious implications in such organizational contexts as hiring, estimating timelines and costs, and developing business strategies.
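As a minimal illustration of how interval overconfidence is scored (the stated intervals and true values below are invented): a well-calibrated judge should capture the truth within roughly 90% of his or her 90% confidence intervals, and hit rates far below that level indicate overconfidence of the kind Klayman et al. (1999) report.

```python
# Scoring hypothetical 90% confidence intervals against the true answers.
intervals = [(1500, 1900), (300, 500), (2000, 2500), (50, 90), (700, 800)]  # stated 90% ranges (invented)
truths = [1850, 650, 3700, 75, 1100]                                        # true values (invented)

hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
hit_rate = hits / len(truths)
print(f"hit rate {hit_rate:.0%}; overconfidence = {0.90 - hit_rate:.0%}")   # 40% hits, 50% overconfidence
```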
There are also problems with learning from experience to make better predictions. The hindsight bias (Fischhoff & Beyth, 1975) hinders us in learning from our mistakes. In retrospect, we believe that we knew all along what was going to happen and are unable to recover fully the uncertainty we faced before the event. This impedes learning the real relationships between decisions and outcomes that are necessary for good predictions. Unfortunately, warning people of this bias does not help (Fischhoff, 1977). In addition, we may not seek the necessary information to test our beliefs because we have a tendency to seek confirming evidence (also known as the confirmation bias; Wason, 1960) rather than disconfirming evidence. (See the section on information search and information purchase.) Finally, the structure of the environment may not readily provide information to test relationships because some information is naturally hidden. For example, personnel selection is often based on HR test scores whose observed correlations with future job performance may be low, even for valid predictors of performance. We hire only applicants with high scores, so the variance of test scores for those hired is low, and any variation in job performance will likely be due to other factors (e.g., motivation, training, random elements). We generally do not observe the performance of those we do not hire—data essential to testing the validity of our predictions.
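The range-restriction point can be illustrated with a short simulation (all parameters invented): even when the test is a genuinely valid predictor in the full applicant pool, the observed correlation between test scores and job performance among the high scorers who are actually hired is much weaker.

```python
# Illustrative simulation of range restriction in personnel selection.
# Requires Python 3.10+ for statistics.correlation.
import random
import statistics

random.seed(2)
test = [random.gauss(0, 1) for _ in range(10_000)]
performance = [0.6 * t + random.gauss(0, 0.8) for t in test]   # true validity ~0.6 (invented)

cutoff = 1.0                                                   # hire only high scorers (invented)
hired = [(t, p) for t, p in zip(test, performance) if t > cutoff]

print(statistics.correlation(test, performance))               # ~0.6 in the full applicant pool
print(statistics.correlation(*zip(*hired)))                    # much lower among those hired
```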
Idea Generation
Before an outcome's likelihood can be assessed, it must first be identified as a possibility. There is good evidence that we do not routinely generate many of the possible outcomes that may flow from our actions (Gettys & Fisher, 1979), and numerous remedial techniques have been proposed. One popular approach, group brainstorming, was first proposed in a nonacademic book (Osborn, 1953) as a way to generate as many ideas as possible. The participants were encouraged to improve, combine, and piggyback off other ideas without criticism in order to generate more ideas than could be generated when working individually. While this approach is intuitively appealing, subsequent research (McGrath, 1984) has shown that, compared to brainstorming groups, the same number of individuals working alone (called nominal groups) produce more ideas with the same level of quality. Diehl and Stroebe (1987) concluded that the main reason appears to be production blocking: Because only one group member can talk at a time, the other members may forget
their ideas, construct counterarguments, and so on in the meantime.
In the 1980s computerized technology was developed to aid group brainstorming and decision-making processes (fortunately ignoring the evidence discussed earlier!). One popular system consists of several networked computers with a common main screen that can be seen by all in the room (Connolly, 1997; Nunamaker, Dennis, Valacich, Vogel, & George, 1991). Group members type ideas on their computers and interact by passing files between machines. All members can thus be productive simultaneously, while drawing stimulation from reading and adding to one another's files. This form of interaction appears to overcome the problems of face-to-face brainstorming. Electronic brainstorming (EBS) groups can outperform equivalent nominal groups (Valacich, Dennis, & Connolly, 1994), at least when the EBS groups are large (approximately eight or more). It is not entirely clear why large EBS groups enjoy this advantage in idea generation (Connolly, 1997). Anonymity provided by the EBS system increases the number of ideas produced (Connolly, Jessup, & Valacich, 1990) and the number of controversial ideas (Cooper, Gallupe, Pollard, & Cadsby, 1998), but may decrease satisfaction with the task (Connolly et al., 1990).
It is interesting to note that businesses continue to use face-to-face group brainstorming even though the literature clearly shows that it is inferior to both nominal groups and EBS. One reason may be its strong intuitive appeal. Paulus, Dzindolet, Poletes, and Camacho (1993) found that subjects predicted future performance and perceived actual performance as better in face-to-face brainstorming groups than in nominal groups, when in fact performance was superior in the latter. Another reason for the popularity of face-to-face brainstorming is the lack of access to EBS equipment. There is also some evidence that the performance of face-to-face groups can be raised to that of nominal groups by using highly trained facilitators (Oxley, Dzindolet, & Paulus, 1996).