In designing instruction formats, there are two extreme positions. One approach is to have a very small number of formats and fit the instructions into this small set. The other approach is to design an optimal format for each instruction. Roughly speaking, RISC designers take the first approach, and CISC designers tend more to the second, although even in CISC processors there will be a degree of uniformity in that very similar instructions might as well have very similar formats.
A fundamental decision has to do with the size of the opcode, that is, the number of bits reserved for indicating the particular operation to be performed. Obviously, if more bits are used, then more distinct instructions can be supported. The cleanest approach is to use a fixed number of opcode bits for all instructions. Interestingly, although RISC processors do have uniform instruction sets, they are not quite that uniform, whereas there have been some CISC designs in the past which always used an 8-bit opcode.
In practice, a designer will recognize that certain operations are much more common than others and react by adjusting the number of opcode bits appropriately. For example, if we have determined that only 4 bits are necessary to represent the most commonly used instructions then 16 possible bit patterns are available. Fifteen of these are used for the most common 15 instructions, and the sixteenth is used to indicate all of the other instructions. Additional bits then need to be allocated elsewhere in the instruction format so that these less common instructions can be distinguished. In the CISC designs, this sort of principle is carried to extremes. For example, the number of opcode bits in x86 instructions ranges from 5 to 19. On the other hand, RISC machines tend to have fewer instructions, so fewer opcode bits are needed.
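To make the escape-code idea concrete, here is a minimal sketch in C of how such an opcode could be decoded, assuming a hypothetical 16-bit instruction word with a 4-bit primary opcode in the high-order bits; the field widths and the escape pattern are invented for illustration.

#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 15-12 hold the primary opcode.  Patterns
   0-14 name the 15 most common instructions; pattern 15 is an escape
   indicating that bits 11-8 hold an extended opcode. */
static unsigned decode_opcode(uint16_t insn)
{
    unsigned primary = insn >> 12;            /* top 4 bits */
    if (primary != 15)
        return primary;                       /* one of the 15 common ops */
    return 15 + ((insn >> 8) & 0xF);          /* 4 more opcode bits */
}

int main(void)
{
    assert(decode_opcode(0x3000) == 3);       /* common instruction */
    assert(decode_opcode(0xF200) == 17);      /* escaped, extended opcode */
    return 0;
}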
Since various operations need different numbers and kinds of operands, space can be saved if the layout of instructions is specialized to the particular needs of the instruction. Furthermore, once this typical CISC philosophy is followed, there is no particular requirement that different instructions have similar operand structures. On the other hand, RISC designs strongly favor a small number of uniform instruction formats, preferably all of the same size. The regularity of these formats simplifies the instruction-decoding mechanism, and means that a technique known as pipelining can be used. One aspect of pipelining is that several instructions will typically be in the pipeline, allowing the overlapped decoding and execution of several instructions. This kind of overlapped decoding becomes much more difficult for the numerous and complex instruction formats of CISC processors.
The operation of adding two numbers is a fundamental capability in any computer. We will use an Add instruction to illustrate some possibilities for instruction formats. From the programmer's viewpoint, the simplest form of addition is C ← A + B, where A, B, and C are the names of three variables. Let the values of these variables be stored in distinct memory locations. We will associate the variable names A, B, and C with the addresses of these locations. The above expression then has the following meaning. The summands in memory locations A and B are to be fetched from the memory and transferred into the CPU where they will be added in the ALU (arithmetic and logic unit). Then the resulting sum is to be stored into memory location C.
If the complete process is to be specified by a single machine instruction, three address fields, specifying A, B, and C, will be needed, together with the operation field that specifies the addition operation. This three-address instruction can be represented symbolically as
Add A,B,C
A problem with this approach is that the inclusion of three address fields in an instruction may require a large number of bits. If A, B, and C are arbitrary addresses in the memory space 0 to 2^m - 1, then m bits are needed to specify each address. Therefore, the instruction must contain 3m bits for addressing purposes, in addition to the bits needed to denote the operation to be performed. For example, if m = 24, the three address fields alone occupy 72 bits.
An alternate approach is to perform the desired calculation by using a sequence of instructions, each of which requires fewer than three memory address fields. Suppose we use only two addresses. If they are used in an Add instruction to specify the two operands A and B, an implicit assumption will need to be made about where the sum is to be sent. A number of computers have two-address Add instructions that send the sum back into memory to one of the operand locations, thus destroying the original operand. The instruction
Add A,B
performs the operation B ← A + B. A single two-address Add instruction cannot be used to solve our original problem, which is to add the contents of locations A and B, destroying neither of them, and place the sum in location C. However, the problem can be solved through the use of another two-address instruction that moves the contents of one memory location into another location. The desired instruction Move B,C performs the operation C ← B. Thus the operation C ← A + B can be performed by executing the two instructions Move B,C and Add A,C.
The next possibility is to consider using only one-address instructions. Of course, since addition is a two-operand operation, an implicit assumption must be made regarding the location of one of the operands as well as the result. A general-purpose CPU register, usually called an accumulator, may be used for this purpose. The machine instruction Add A then means add the contents of memory location A to the contents of the accumulator and place the sum into the accumulator. Let us also introduce the one-address instructions: Load A and Store A. Execution of the Load instruction moves the contents of memory location A into the accumulator, and execution of the Store instruction moves the contents of the accumulator into memory location A. The operation C ← A + B can then be performed by executing the sequence of instructions: Load A, Add B, and Store C.
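The three styles can be contrasted in a toy C model of a machine with three memory locations and an accumulator; the helper names (add3, move2, load1, and so on) are invented purely for illustration.

#include <stdio.h>

static int mem[3];                  /* memory locations A, B, C */
enum { A, B, C };
static int acc;                     /* the single accumulator */

/* Three-address: Add A,B,C performs C <- A + B in one instruction. */
static void add3(int a, int b, int c) { mem[c] = mem[a] + mem[b]; }

/* Two-address: Move B,C then Add A,C, destroying neither operand. */
static void move2(int s, int d) { mem[d] = mem[s]; }
static void add2(int s, int d)  { mem[d] = mem[s] + mem[d]; }

/* One-address: Load A, Add B, Store C through the accumulator. */
static void load1(int a)  { acc = mem[a]; }
static void add1(int a)   { acc += mem[a]; }
static void store1(int a) { mem[a] = acc; }

int main(void)
{
    mem[A] = 2; mem[B] = 3;

    add3(A, B, C);                   /* C <- A + B */
    printf("three-address: C = %d\n", mem[C]);

    mem[C] = 0;
    move2(B, C); add2(A, C);         /* C <- B, then C <- A + C */
    printf("two-address:   C = %d\n", mem[C]);

    mem[C] = 0;
    load1(A); add1(B); store1(C);    /* accumulator sequence */
    printf("one-address:   C = %d\n", mem[C]);
    return 0;
}

All three runs print 5; what differs is how many instructions are executed and how many address fields each instruction must carry.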
Many computers have a number of general-purpose CPU registers, each of which can be used as an accumulator. If there are 8 (or 16) registers, then 3 (or 4) bits will be needed in a field of an instruction to address the register that is to take part in the operation. This is considerably less than the number of bits needed to address a location in the main memory. Let Ri represent a CPU general-purpose register. Then the instructions Load A,Ri, Store A,Ri, and Add A,Ri are generalizations of the earlier Load, Store, and Add instructions of the single-accumulator case. In particular, execution of Load A,Ri moves the contents of memory location A into register Ri. Execution of Store A,Ri moves the contents of register Ri into memory location A, and execution of Add A,Ri adds the contents of memory location A to the contents of register Ri and then places the result in register Ri. This type of instruction, in which one address always refers to a location in the main memory and the other, shorter address always refers to a CPU register, has a format intermediate to the one- and two-address formats discussed earlier. Because of this property, it is often called a 1½-address format. Machines with multiple CPU registers also include instruction formats that permit operations among the registers themselves. Thus Add Ri,Rj adds the contents of register Ri to those of register Rj and places the answer in register Rj.
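Extending the same toy model, a 1½-address machine pairs one full memory address with a short register number; again, all names here are invented for illustration.

#include <assert.h>

static int mem[3], reg[8];           /* 8 registers need only a 3-bit field */
enum { A, B, C };

static void load_r (int a, int r) { reg[r] = mem[a]; }   /* Load A,Ri  */
static void add_r  (int a, int r) { reg[r] += mem[a]; }  /* Add A,Ri   */
static void store_r(int a, int r) { mem[a] = reg[r]; }   /* Store A,Ri */
static void add_rr (int i, int j) { reg[j] += reg[i]; }  /* Add Ri,Rj  */

int main(void)
{
    mem[A] = 2; mem[B] = 3;
    load_r(A, 1); add_r(B, 1); store_r(C, 1);   /* C <- A + B via R1 */
    assert(mem[C] == 5);

    reg[2] = 10;
    add_rr(1, 2);                               /* R2 <- R1 + R2 */
    assert(reg[2] == 15);
    return 0;
}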
A number of general-purpose registers are often used in computers with two-address instruction formats. In general, the registers improve processing efficiency by reducing the number of main memory accesses required when a particular data item is used repeatedly in a computation. In machines that have the two-address format, there is usually a way to specify a register instead of a main memory location in either of the address fields. Thus the instructions Move A,Ri and Move Ri,A achieve the same results as the instructions Load A,Ri and Store A,Ri in the more restrictive 1½-address format discussed earlier [4, 11, 23, 29, 41, 48, 56, 61, 64].
We have discussed instructions that use three-, two-, and one-address fields. It is also possible to use instructions where the locations of all operands are defined implicitly. They depend upon the use of a method for storing operands in what is called a pushdown stack. Such instructions are sometimes referred to as null- or zero-address instructions.
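A zero-address machine can be sketched the same way: only Push and Pop carry memory addresses, while Add works entirely on the pushdown stack. The helper names are again invented.

#include <assert.h>

static int stk[16], sp;              /* pushdown stack and stack pointer */
static void push(int v) { stk[sp++] = v; }
static int  pop(void)   { return stk[--sp]; }
static void add0(void)  { int b = pop(), a = pop(); push(a + b); } /* zero-address Add */

int main(void)
{
    int A = 2, B = 3, C;
    push(A); push(B);                /* Push A; Push B */
    add0();                          /* pops both operands, pushes the sum */
    C = pop();                       /* Pop C */
    assert(C == 5);
    return 0;
}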
Addressing Modes. An important issue in comparing microprocessors is the manner in which they allow memory to be addressed. Typically, a processor provides a variety of addressing modes that allow memory to be addressed in different ways. There is often a close relationship between these addressing modes and the kinds of memory references that must be generated when a program written in a high-level language is compiled; the modes exist largely to support compilers that generate such code. The relationship between high-level languages and the hardware that supports them has historically been something of a two-way street: programming languages have been influenced by the design of the hardware, and high-level languages have in turn influenced hardware design.
Direct Memory Addressing. Most programming languages allow a programmer to define static data, that is, data for which one can be certain that only one instance of that data will exist in memory during the execution of a program. Some, but not all, processors have a direct addressing mode that permits the actual address of static data to appear in the instruction. When direct addressing is not available on a processor, a commonly used technique is to have a compiler arrange the static data in a contiguous region of memory so that the offset of each variable within the region is known at compile time. Static data is then addressed by loading a register with the address of the first word in this region, and accessing the variables by adding the offset of each piece of data to the address in this register.
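The technique can be mimicked in C: pack the static variables into one region, take the region's address as the "base register", and reach each variable through its compile-time offset. The struct and names below are invented for illustration.

#include <assert.h>
#include <stddef.h>

struct statics {          /* stand-in for the contiguous static region */
    int  count;           /* offset 0, fixed at compile time */
    long total;           /* offset also fixed at compile time */
};

int main(void)
{
    static struct statics region;
    char *base = (char *)&region;            /* the "base register" */

    /* access: base address plus compile-time offset */
    int *count = (int *)(base + offsetof(struct statics, count));
    *count = 42;
    assert(region.count == 42);
    return 0;
}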
Indexed Addressing. The array appears as a data structure in almost all programming languages, and as a result, almost all hardware supports the use of array indexing. The semantics of a programming language define exactly how and where an array is allocated in memory. If an array is allocated statically, then a very simple addressing mode known as indexed addressing supports access to the elements of the array. If, on the other hand, the array is allocated at run time, then a slightly more complex addressing mode is required. Addressing an element in a statically allocated array requires that the base address of the array, known at either compile or link time, be added to the offset of the array element (known at run time). Indexed addressing is implemented in the hardware by having the offset (held in a register) added to a specified starting address.
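The effective-address computation can be spelled out in C. The sketch below assumes, as is common, that the index must be scaled by the element size before being added to the static base address.

#include <assert.h>
#include <stdint.h>

int main(void)
{
    static int table[8];                        /* statically allocated array */
    int i = 5;

    /* What indexed addressing computes: the instruction supplies the
       base address of table; a register supplies the scaled offset. */
    uintptr_t base   = (uintptr_t)table;
    uintptr_t offset = (uintptr_t)i * sizeof table[0];
    int *ea = (int *)(base + offset);           /* effective address */

    assert(ea == &table[i]);                    /* same element the compiler addresses */
    return 0;
}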
Based Addressing. An addressing mode which is related to indexed addressing, but whose use is not exactly the same, is known as based addressing. In based addressing, a register known as a base register contains the address of a block of storage. An important case of based addressing occurs when the block of storage to which the base register points is a procedure's stack frame on the run-time stack. The local variables of a procedure are stored in these stack frames, and a pointer known as the frame pointer points to the stack frame of the currently executing procedure.
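In C terms, a local variable access under based addressing is a load or store at a small compile-time offset from the frame pointer. The frame layout below is invented for illustration.

#include <assert.h>
#include <stddef.h>

struct frame {            /* locals of the current procedure */
    int x;                /* at offset 0 from the frame pointer */
    int y;                /* at a small compile-time offset */
};

int main(void)
{
    struct frame f;                            /* pretend this is the top stack frame */
    char *fp = (char *)&f;                     /* the frame pointer (base register) */

    *(int *)(fp + offsetof(struct frame, x)) = 1;   /* store [fp + 0] */
    *(int *)(fp + offsetof(struct frame, y)) = 2;   /* store [fp + offset of y] */
    assert(f.x == 1 && f.y == 2);
    return 0;
}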
Comparing Indexed and Based Addressing. Based addressing is similar to indexed addressing, but there are two important differences. First, the value in the base register is always a memory address, so it never needs scaling. Second, when a static variable is indexed, the base address encoded in the instruction is a full memory address, whereas in based addressing the offset into the based area of memory is usually small (both records and stack frames are typically small) and can therefore be encoded in a short field.
Base Plus Index Addressing. When an array is dynamically allocated, or if the array is a local variable allocated in a stack frame, the addressing of an element in this array requires both based and indexed addressing.
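Here the base address is known only at run time, so one register must hold the base and another the scaled index; the C sketch below makes the two-register addition explicit.

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    int *a = malloc(8 * sizeof *a);             /* dynamically allocated array */
    int  i = 3;
    if (a == NULL)
        return 1;

    uintptr_t base  = (uintptr_t)a;             /* base register */
    uintptr_t index = (uintptr_t)i * sizeof *a; /* index register, scaled */
    int *ea = (int *)(base + index);            /* effective address */

    assert(ea == &a[i]);
    free(a);
    return 0;
}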
Indirect Addressing. When parameters are passed to procedures, the value passed, and stored for use by the called procedure, is often the address of the actual parameter rather than a copy of the parameter's value. In some cases, a programming language may require the use of this method of passing parameters, known as call by reference (e.g., the VAR parameters of Pascal). In other cases, the method of passing parameters is optional.
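Call by reference maps directly onto indirect addressing, as the C fragment below shows: the register holds an address, and the operand is reached through one extra memory access.

#include <assert.h>

static void increment(int *p)    /* p holds the address of the actual parameter */
{
    *p = *p + 1;                 /* indirect load, add, indirect store */
}

int main(void)
{
    int actual = 41;
    increment(&actual);          /* pass the address, not a copy of the value */
    assert(actual == 42);
    return 0;
}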
Indirect Addressing with Indexing. When indirection must be combined with indexing, as when an element of an array passed by reference is accessed, the issue of whether to provide a single addressing mode that handles the whole computation becomes more contentious. Only one of the processors we look at has this addressing mode built in. On the other processors, a sequence of two or more instructions is needed to access an indirect array element.
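The two-step access such a sequence performs can be mirrored in C: first an indirect load of the array's base address, then an indexed access from that base.

#include <assert.h>

static int third(int **handle)   /* handle holds the address of a pointer to the array */
{
    int *base = *handle;         /* step 1: indirect load of the base address */
    return base[2];              /* step 2: indexed access from the base */
}

int main(void)
{
    int a[4] = {10, 20, 30, 40};
    int *p = a;
    assert(third(&p) == 30);
    return 0;
}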
Procedure Calls and the Call Instruction. In a language such as Pascal, which permits procedures to be nested and recursive procedure calls to be made, there are two issues that must be dealt with. The first issue is how a procedure will address its own local variables, since the possibility of recursion means that more than one instance of a procedure's variables may be alive at once. The second is how the procedure can address the variables of the procedures within which it is lexically nested, that is, non-local variables. The universal approach is to maintain a stack, often called the run-time stack (sometimes called the activation stack), which is used to hold the values of variables that are local to a procedure, as well as other important information. The run-time stack is organized into stack frames (also called activation records), one for each active procedure. The main purpose of a stack frame is to allocate storage for the local variables of a procedure. If a procedure has called itself recursively exactly once, then there will be two stack frames on the run-time stack, one for each instance of the procedure. On most machines the run-time stack builds down in memory, that is, it grows toward lower memory addresses.
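A short C program makes the one-frame-per-activation point visible: each recursive call gets its own copy of the local variable, at a distinct (and, on most machines, successively lower) stack address.

#include <stdio.h>

static void recurse(int n)
{
    int local = n;               /* lives in this activation's stack frame */
    printf("frame for n=%d: local at %p\n", n, (void *)&local);
    if (n > 0)
        recurse(n - 1);          /* pushes another frame onto the run-time stack */
}

int main(void)
{
    recurse(2);                  /* three activations, three distinct frames */
    return 0;
}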
The issue of how to manage the stack, that is, how to create and dismantle stack frames as procedures are called and returned from, forms a substantial part of what is known in textbooks as run-time storage management. When a procedure call occurs, a highly stylized sequence of events, called the calling sequence, takes place, during which a new stack frame is constructed. When a called procedure completes, the stack frame must be dismantled and the execution environment of the calling procedure restored, something that is done as part of the return sequence.
The calling and return sequences are sequences of instructions executed partly by the calling procedure and partly by the called procedure. A compiler generates these machine instructions as part of a standard set of prologues and epilogues for the procedures, together with the code surrounding the call instruction itself.
Before describing the structure of stack frames and how they are constructed during the calling sequence, we will begin with a short description of call instructions. Throughout this section, keep in mind that this is a simplified view of stack frames and the calling/return sequence. It ignores the differences that would arise due to either the particular machine for which a compiler is generating code or the particular programming language for which the compiler has been built. The two basic instructions required for implementing procedure calls are the call instruction and the return instruction. A call instruction is simply a jump in which the return address is saved. A return instruction returns control to the calling procedure by restoring the saved address after the called procedure has finished executing.
Most microprocessors use one of two basic styles of call instruction. In one, the return point is placed in a register, typically a specially designated register, so that no bits of the instruction format need be spent identifying which register is used. This is desirable because a call instruction needs most of its bits to specify the target address, and cannot afford to "waste" them on fields of little use to this type of instruction. The return instruction in this case simply needs to be able to branch indirectly to the return address stored in the register.
The second style of call instruction saves the return point on a hardware stack, adjusting the value of the register that serves as a stack pointer. The corresponding return instruction pops the return point off this stack and uses it to return to the instruction just after the call. This stack-based style of call and return is particularly well suited to languages such as Pascal and C that allow recursion and depend on the use of a stack to control procedure calls.
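The two styles can be contrasted with a toy model in C, in which return addresses are just integer labels; the names are invented for illustration. The stack style nests naturally, which is why it suits recursive languages.

#include <assert.h>

static int link_reg;             /* style 1: a designated link register */
static int rstack[16], sp;       /* style 2: a hardware return stack */

static void call_link(int ret)  { link_reg = ret;     /* ...then jump */ }
static int  return_link(void)   { return link_reg; }

static void call_push(int ret)  { rstack[sp++] = ret; /* ...then jump */ }
static int  return_pop(void)    { return rstack[--sp]; }

int main(void)
{
    call_link(100);
    assert(return_link() == 100);    /* fine, until a nested call overwrites link_reg */

    call_push(200);                  /* outer call */
    call_push(300);                  /* nested (possibly recursive) call */
    assert(return_pop() == 300);     /* returns unwind in LIFO order */
    assert(return_pop() == 200);
    return 0;
}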