Microprocessor technology has advanced to the point that performance levels rivalling those of mainframes can be achieved. However, requirements for ever-increasing performance, particularly for engineering workstations, combined with continuing work in the design and implementation of microprocessor architectures, have recently led to an explosion of alternative architectures [11, 13, 27, 40, 41, 42, 44, 57, 61, 62].
These alternative architectures are based on new concepts of microprocessor design, collectively referred to as Reduced Instruction Set Computers (RISC). The conventional wisdom had been that a richer instruction set makes a better processor; RISC advocates have turned this idea on its head by proclaiming that when it comes to microprocessor instruction design, "less is more." The idea behind the RISC architectural philosophy is that by simplifying instruction sets and eliminating all non-essential instructions, the remaining instructions can be made to run very much faster. By non-essential instructions we mean those executed so infrequently that replacing them with sequences of simpler instructions does not have any noticeable impact on efficiency. The fundamental observation that inspired the original RISC research was that only a small part of the instruction set of most other processors was commonly executed; a large number of instructions were executed rather infrequently.
The existing philosophy, which has been described as Complex Instruction Set Computers (CISC) in a somewhat derisive fashion, is by no means dead, and indeed personal computers still use CISC chips.
The continuing controversy between the CISC and RISC camps is a fierce one. In the judgment of the reporter, the winner was RISC. We look at a number of representative CISC and RISC microprocessor designs, as well as some which do not clearly fall into either category; the line between the two philosophies is not always completely clear from a technical point of view. Our intention is to understand the strengths and weaknesses of the two approaches, and to begin to guess how the argument will eventually be settled.
A question that is commonly asked by both applications programmers and operating system designers is: To what extent does the processor provide specialized instructions that aid in solving the problem at hand? RISC designs generally provide only a minimal set of instructions from which more complex instructions can be constructed. On CISC processors we often find elaborate instructions intended to simplify programming of frequently occurring specialized operations.
The basic CISC philosophy is to provide an extensive set of instructions covering all sorts of special-purpose needs. In this approach, the cost of these extra instructions seems minimal: "You don't have to use them if you don't need them." The contrasting RISC attitude is that these fancy instructions are not really used often enough to justify the extra complexity in implementing the hardware and that this complexity tends to slow down the more commonly executed instructions.
In practice, the dividing line is not so clear. For example, floating-point division is an extremely complicated operation that, like other complicated operations, can be programmed using simpler instructions. However, nearly all RISC processors include a floating-point division instruction because in this particular case it seems that the complicated instruction is used often enough to justify its inclusion. On the other hand, the more sophisticated system-level instructions appearing on some CISC processors, such as those that handle tasking, do not seem to be important enough to be universally included in RISC designs.
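To make the idea of building a complicated operation from simpler instructions concrete, the following is a minimal sketch, not taken from any particular processor. It uses unsigned integer division rather than floating-point division to keep the example short, and relies only on shift, compare, subtract, and OR steps. The function name `divide_by_shift_subtract` and the `width` parameter are assumptions made for illustration.

```python
def divide_by_shift_subtract(dividend: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """Unsigned restoring division built from shift, compare, subtract, and OR.

    Illustrative sketch only: it mimics the kind of instruction sequence
    that can replace a single hardware divide instruction.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if divisor < 0 or not 0 <= dividend < (1 << width):
        raise ValueError("operands must be unsigned and fit in the given width")

    quotient = 0
    remainder = 0
    # Process the dividend one bit at a time, most significant bit first.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Compare and conditionally subtract; this yields one quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    print(divide_by_shift_subtract(100, 7))
```

The loop executes `width` iterations of very simple steps, which illustrates the trade-off discussed above: the operation is expressible without a dedicated instruction, but if it is needed often, the long sequence becomes costly.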
What is CISC? We will consider a computer in which the growth in the number of instructions was carried to its extreme, and which marks the dead end reached by the development of CISC architectures in their pure form. The VAX-11 is a 32-bit computer. All addresses and data paths are 32 bits wide. The main memory space is byte addressable. Therefore, a 32-bit address reaches a total of 2^32 bytes (4 gigabytes). This address space is more than adequate for most programming tasks; indeed the physical main memory provided in most implementations of the VAX-11 is usually only up to a few megabytes. There are sixteen 32-bit CPU registers, named R0 through R15. Registers R0 through R11 are general-purpose registers that can be used to hold data or addresses. Register R14 is the stack pointer (SP) and R15 is the program counter (PC). Registers R12 and R13 have special roles in conjunction with handling procedure calls and parameter passing. Their use will be discussed later.
The VAX-11 supports many different data types. Signed integers in byte, word (2 bytes), long-word (4 bytes), and quad-word (8 bytes) sizes are handled by the instruction set. Floating-point numbers in both long-word and quad-word sizes are also included.
All of these numeric data types can be stored in the main memory beginning at an arbitrary byte address; that is, there are no word, long-word, or quad-word boundary restrictions on the location of the multiple-byte types. Some typical examples are shown in Fig. 3.1. Note that the least significant byte of multiple-byte integers is stored in the lowest address location. In addition to these, specific formats are provided for representing binary-coded decimal (BCD) numbers, character strings, and bit strings. The instruction set in the VAX-11 is very extensive in comparison to the PDP-11. A large number of instructions are provided for operating directly on the various data types. In addition to the basic instructions, which include arithmetic and logic operations, tests, branches, and subroutine calls, there are a number of more complex machine instructions which facilitate the implementation of high-level language constructs. The addressing modes provided in the VAX-11 include the PDP-11 modes. Other modes are provided for efficient access to data arrays and compact representation for short immediate data.
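The storage rules just described (no alignment restrictions, least significant byte at the lowest address) can be sketched with a small byte-addressable memory model. This is an illustration only, not VAX-11 software; the names `SIZES`, `store_int`, and `load_int` are hypothetical, and the model covers only the signed integer types listed above.

```python
# Sizes in bytes of the signed integer types named in the text.
SIZES = {"byte": 1, "word": 2, "longword": 4, "quadword": 8}


def store_int(memory: bytearray, address: int, value: int, kind: str) -> None:
    """Store a signed integer at any byte address, least significant byte first."""
    size = SIZES[kind]
    if address < 0 or address + size > len(memory):
        raise IndexError("access outside the modelled memory")
    memory[address:address + size] = value.to_bytes(size, "little", signed=True)


def load_int(memory: bytearray, address: int, kind: str) -> int:
    """Load a signed integer of the given type from any byte address."""
    size = SIZES[kind]
    if address < 0 or address + size > len(memory):
        raise IndexError("access outside the modelled memory")
    return int.from_bytes(memory[address:address + size], "little", signed=True)


if __name__ == "__main__":
    memory = bytearray(16)
    # A long-word stored at an odd address: no boundary restriction applies.
    store_int(memory, 3, 0x12345678, "longword")
    print(memory[3:7].hex())  # bytes in increasing address order
    print(hex(load_int(memory, 3, "longword")))
```

In this model the byte at address 3 holds 0x78 and the byte at address 6 holds 0x12, which matches the rule that the least significant byte occupies the lowest address.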
Advances in VLSI technology, and, more specifically, in the number of transistors that could be packed onto a single chip, made such extensive instruction sets possible. The general mind set of most designers favored supporting instructions that mirrored the high-level operations and addressing modes of high-level programming languages. These complicated instructions and addressing modes were provided in the belief that if either clever compilers or assembly language programmers could utilize them, substantial gains in the performance of such machines would result. One of the basic assumptions underlying this belief was that anything implemented in hardware would be faster than if it were implemented in software.
What is RISC? The RISC design philosophy is based on two important observations. First of all, some of the instructions provided by CISC processors are so esoteric that many compilers simply do not attempt to use them. Second, even if compilers could use such specialized instructions, it is hard to imagine that they would be used very frequently. For example, instructions inherited from earlier versions of the x86 architecture are executed very rarely and account for very little execution time. Compatibility with earlier releases of the x86 architecture forces computer manufacturers, in order to keep their market, to retain a number of old features and the archaic architectural decisions that go with them. For such computers, therefore, a successful compromise between CISC and RISC has proved more viable. These observations are of great importance to those who implement hardware, because complex instructions contribute in many ways to reducing the efficiency of a chip as a whole. In the minds of RISC designers, the only complex instructions worth including are those whose benefit is clearly established in terms of performance. Not only can complex instructions require many cycles to execute, but the extra logic required to implement even a single instruction may lengthen the basic cycle time. One of the main motivations behind the idea of RISC is to simplify all of the architectural aspects of the design of a machine so that its implementation can be made more efficient.
George Radin, the head of the IBM 801 research effort, who is often regarded as the "father of RISC", described this effect as follows: "Complex, high-function instructions, which require several cycles to execute, are conventionally realized by some combination of random logic and microcode.
"We have no objection to this strategy, provided the frequency of use justifies the cost, and more importantly, provided these complex functions in no way slow down the primitive instructions.
"But it is just this pernicious effect on the primitive instructions that has made us suspicious. Most instruction frequency studies show a sharp skew in favor of high usage of primitive instructions (such as LOAD, STORE, BRANCH, COMPARE, ADD). If the presence of a more complex set adds just one logic level to a ten-level basic machine cycle (e.g., to fetch a microinstruction from ROM), the CPU has been slowed down by 10%. The frequency and performance improvement of the complex functions must first overcome this 10% degradation and then justify the additional cost."
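The arithmetic behind the 10% figure can be written out as a short sketch. It rests on a simplifying assumption that is not stated in the text: cycle time is taken to be proportional to the number of logic levels, and performance is measured as cycle count times cycle time. The function names are hypothetical.

```python
from fractions import Fraction


def cycle_slowdown(base_levels: int, extra_levels: int) -> Fraction:
    """Relative increase in cycle time when extra logic levels are added.

    Assumes cycle time is proportional to the number of logic levels.
    """
    return Fraction(extra_levels, base_levels)


def break_even_cycle_reduction(base_levels: int, extra_levels: int) -> Fraction:
    """Fraction of cycles the complex instructions must eliminate merely to
    match the simpler machine, before any additional cost is justified."""
    longer_cycle = Fraction(base_levels + extra_levels, base_levels)
    return 1 - 1 / longer_cycle


if __name__ == "__main__":
    # One extra logic level on a ten-level basic machine cycle.
    print(cycle_slowdown(10, 1))              # relative cycle-time increase
    print(break_even_cycle_reduction(10, 1))  # required cycle-count saving
```

Under these assumptions, a machine whose cycle is 10% longer must execute a program in one eleventh (about 9.1%) fewer cycles just to break even, which is the degradation the complex functions "must first overcome".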
We should point out that the term Reduced Instruction Set Computer really refers to a set of reduced instructions, not a reduced set of instructions. The goal of RISC is not simply to reduce the number of instructions, but rather to simplify the instructions that are included in the instruction set of a machine. Each individual instruction needs to be simplified to the extent that some significant advantage in performance can be obtained when the instruction set is implemented. RISC machines do, in general, have a smaller total number of instructions than many CISC processors, but this is a consequence of the fact that careful study of instruction frequencies fails to justify larger numbers of instructions. Some machines are clearly classified as RISC machines, and others equally clearly fall into the CISC classification. However, the line is not well defined, and we will encounter a number of machines that incorporate some, but not all, of the RISC design techniques.
Not all of these characteristics are present in all RISC designs, but taken together they represent a general philosophy of instruction set and architectural design that stands in definite contrast to the CISC tradition of providing as many useful instructions as possible.
The IBM 801 Project. The IBM 801 Project was the first attempt to develop an architecture whose design could be intentionally described as RISC. Almost all modern RISC ideas are present in the original 801 design. Although the CDC 6600 has sometimes been called the first RISC processor due to its similarities with current RISC architectures, the 801 can certainly be credited with being the first machine designed with these goals explicitly in mind. The project began with a specific application in view, the control of telephone exchanges, and a decision was made to investigate the development of a special-purpose architecture suitable for this application. The first 801 designs were significantly influenced by these requirements. In particular, there was no requirement for floating-point, so the first 801 did not provide any hardware floating-point capability. The primary requirements were extremely fast execution and low cost. As is often the case with development projects in computer science, the original application was eventually forgotten. Even though the idea of telephone exchanges was abandoned, the processor concepts were evidently important and powerful, and so work proceeded with the goal of producing an extremely fast computer suitable for use with high-level languages. From the inception of the 801 Project, it was decided that a pipelined design in which all instructions would take the same time to execute would be used. The design constraints were that instructions had to execute in one cycle and that the length of the cycle should not be increased unless it was absolutely necessary. Those were day-in and day-out constraints that affected which instructions were implemented and which instructions were excluded.
With the decision to use pipelining, several other issues needed to be resolved. Pretty quickly, the researchers came to the conclusion that they would not expose the pipeline. In other words, from the programmer's point of view, the semantics of instruction execution would be strictly sequential. The alternative approach exposes the pipeline and requires the compiler to ensure that instructions are not issued before their operands are available. A compiler can deal with an exposed pipeline in straight-line code, but handling an exposed pipeline across branches is much harder. Writing in assembly language, humans have an even harder time. The conclusion was that the cost of using interlocking was so small that it was unnecessary to expose the pipeline. (As we shall see, the MIPS, which makes the pipeline partly visible to the programmer, was designed with a different point of view.) The one exception was that jumps would have delay slots (i.e., the instruction after the jump would always be executed). This idea was introduced in the 801 design and, as we will see, has been adopted by nearly all subsequent RISC designs.
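The delay-slot rule, under which the instruction after a jump is always executed, can be sketched with a toy interpreter. This is an illustrative model, not the 801 instruction set: the two operations `add` and `jump`, the tuple encoding, and the function name `run` are all assumptions introduced for the example.

```python
def run(program, max_steps=100):
    """Execute a toy program in which every jump has one delay slot.

    Each instruction is ("add", register, constant) or ("jump", target_index).
    Returns the final registers and the list of executed instruction indices.
    """
    regs = {}
    trace = []
    pc = 0
    pending_target = None  # set by a jump, applied after the delay slot
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        trace.append(pc)
        # If the previous instruction was a jump, this one is its delay slot.
        taken = pending_target
        pending_target = None
        if op == "add":
            reg, constant = args
            regs[reg] = regs.get(reg, 0) + constant
        elif op == "jump":
            pending_target = args[0]
        else:
            raise ValueError(f"unknown operation: {op}")
        # Control transfers only after the delay-slot instruction has run.
        pc = taken if taken is not None else pc + 1
        steps += 1
    return regs, trace


if __name__ == "__main__":
    program = [
        ("add", "r1", 1),
        ("jump", 4),
        ("add", "r2", 10),    # delay slot: executed even though the jump is taken
        ("add", "r3", 100),   # skipped
        ("add", "r4", 1000),
    ]
    print(run(program))
```

In this model the instruction at index 2 runs before control reaches index 4, while index 3 is never executed. A compiler can therefore place useful work in the slot instead of leaving it idle.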
Briefly put, the IBM view in the design of the original 801 was that reordering of instructions might be needed for maximum efficiency of code. Although the compiler was expected to schedule instructions with this goal in mind, it was seen as undesirable for a compiler to be forced to reorder instructions in order to ensure correct code. This philosophy has been a central characteristic of all the IBM RISC designs.
The Influence of Paging. One of the important problems in maintaining the single-cycle property is dealing with storage references (loads and stores), particularly loads, which cannot complete in a single cycle. Part of the problem of loads is what to do about page faults. The 801 viewpoint was that a load instruction had completed when it had requested the storage access. It takes one cycle to compute an effective address and pass it to an independent parallel memory unit that can do the fetch and return the value. The target register is then interlocked on that load so that the software can try to (but doesn't have to) schedule other instructions while the value is being fetched. The expectation was that a machine could be designed in which the value of a load would only become available two instructions past the load, that is, the instruction immediately following the load could not use the result of the load instruction.
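The effect of the register interlock on scheduling can be sketched with a simple cycle counter. This is a rough model under stated assumptions, not a description of the actual 801 pipeline: every instruction issues in one cycle, a loaded value becomes usable one cycle later than an ordinary result, and an instruction stalls until all of its source registers are ready. The instruction encoding and the name `count_cycles` are hypothetical.

```python
def count_cycles(program, load_delay=1):
    """Count cycles for a straight-line program on an interlocked machine.

    Each instruction is (operation, destination_register, source_registers).
    A "load" result becomes usable `load_delay` cycles later than the result
    of any other operation; the interlock stalls instructions that need it.
    """
    cycles = 0
    ready_at = {}  # register -> first cycle in which its value may be read
    for op, dest, sources in program:
        issue = cycles
        # Interlock: wait until every source register is available.
        for src in sources:
            issue = max(issue, ready_at.get(src, 0))
        cycles = issue + 1
        ready_at[dest] = cycles + (load_delay if op == "load" else 0)
    return cycles


if __name__ == "__main__":
    # The add that needs r1 immediately follows the load: one stall cycle.
    unscheduled = [
        ("load", "r1", ["r9"]),
        ("add", "r2", ["r1", "r3"]),
        ("add", "r4", ["r5", "r6"]),
    ]
    # An independent instruction is moved between the load and its use.
    scheduled = [
        ("load", "r1", ["r9"]),
        ("add", "r4", ["r5", "r6"]),
        ("add", "r2", ["r1", "r3"]),
    ]
    print(count_cycles(unscheduled), count_cycles(scheduled))
```

Both orderings compute the same values because the interlock guarantees correctness. Reordering only removes the stall, which reflects the point made above that the software may schedule other instructions after a load but is not required to.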
The problem was page faults. It is difficult to determine quickly enough whether or not a load instruction can succeed. The first 801 did not address this issue and did not provide virtual memory mapping. This decision simplified the implementation of the load instruction, since the issue of how to back out subsequent instructions on a page fault did not arise. In retrospect, this decision was inappropriate. Virtual memory implementations are required on modern computer systems, not only to deal with programs with very large memory requirements, but also to accommodate the needs of modern operating systems for multiprogramming, and address space management. Even at the expense of complicating the design and slowing down instruction throughput, it is necessary to provide page fault traps that can recover and continue execution.
Instruction Sets. Reducing the size of the instruction set was never an explicit goal of the IBM RISC effort. However, the IBM researchers had access to extensive libraries of trace tapes that had been gathered at customer sites. These tapes were used by IBM to plan future performance enhancements to the 370 line, but, of course, they also contained exactly the critical information needed for the design of RISC architectures, namely, the instruction execution frequency figures that can be used to decide which instructions are needed and which may be omitted. These trace tapes were probably biased in favor of commercial applications, since this is typical of the environment of IBM mainframe customers. The 801 design reflects this bias in not having floating-point, and in providing an assist for packed decimal instructions (similar to the instructions on the x86). The 801 is the only RISC architecture to provide for decimal arithmetic; even IBM abandoned this feature in its latest RISC designs. Another feature of commercial applications is that they tend to be rather rich semantically, and to use a larger instruction set. Most of the other RISC designs have been biased by systems-type applications written in C, which tend to be much more Spartan in their use of instructions.
The Second 801 Architecture. The most important change was in the register structure: the second architecture used thirty-two 32-bit registers rather than the sixteen 24-bit registers of the first 801. From experiments with the simulator, IBM concluded that 32 registers was the best choice. With the 16 registers of the earlier design, there were many programs for which the compiler ran out of registers. With 32 registers, this occurred much less frequently, and experimentation with various numbers showed that going beyond 32 registers encountered diminishing returns.
Instruction Size. Another important change for the second 801 architecture was that all instructions were of uniform length (32 bits). In the earlier 801, there were 16-bit and 32-bit instructions. The 16-bit instructions were introduced because of concerns with code size. With the development of faster and cheaper memories, it became clear that the extra complexity of a non-uniform instruction size was not worth the saving in code size.
The Berkeley RISC and Stanford MIPS Projects. The Berkeley RISC I and RISC II processors were designed and implemented from 1980 through 1983 by a team of graduate students led by David Patterson. One of their most distinctive features was the use of overlapping register windows, which was carried over into the design of the SPARC (Scalable Processor Architecture). The two designs are, in fact, very similar in many, but not all, respects. The other major early RISC chips were designed and built at Stanford by a group led by John Hennessy. The design of the MIPS was distinguished by the lack of interlocks: MIPS is an acronym for Microprocessor without Interlocked Pipeline Stages.
The MIPS R2000, which is designed and manufactured by MIPS Computer Systems, is based on one of the earliest RISC designs, the Stanford MIPS chip. The MIPS chip is composed of two logically independent processors, a main processor known simply as the CPU, and an internal coprocessor known as CP0. The CPU is a 32-bit RISC processor that incorporates a standard set of arithmetic and logical instructions. Memory management facilities, as well as exception management and other operating system functions, are all under the control of the coprocessor CP0. In addition to CP0, the MIPS is able to support up to three additional coprocessors, CP1 through CP3, of which CP1 is conventionally used for floating-point calculations and the other two are free for special-purpose use. The instruction set of the MIPS includes special instructions that allow the main processor to communicate with any of the coprocessors that happen to be attached to it. Coprocessor instructions allow the CPU to read from and write to the registers of a coprocessor. Instructions on a coprocessor are triggered by sending it a pattern of bits whose interpretation is up to the designer of the coprocessor. The special coprocessor CP0 has exactly the same interface, but because it is on board the main chip, and because its structure is defined, it has some additional predefined and specialized instructions that deal with some of the memory and interrupt management functions. MIPS Inc. also produces a floating-point chip called the R2010 FPA, which is intended to be attached to the MIPS as CP1.