The feature that distinguishes the SPARC from all of the other RISC machines is its use of register windows, first introduced in the Berkeley RISC designs. The first thing we should point out is that SPARC, an acronym for Scalable Processor Architecture, is the name of an architecture and not of a specific chip.
A SPARC implementation consists of up to three logical units: the integer unit (IU), the floating-point unit (FPU), and the coprocessor (CP). An implementation is only required to include the integer unit, although in practice all implementations will include the floating-point unit as well. The optional coprocessor is intended to support an additional set of functions tailored to the needs of a specific system. The logical organization of these units is shown in Fig. 5.1. The IU executes the basic arithmetic, logical, and shift operations, as well as other user and supervisor instructions. In addition, the IU has a set of load, store, and operate instructions that control the floating-point unit and the coprocessor if either is part of an implementation. For example, when a floating-point load instruction is issued by the IU, the contents of a given memory location are loaded into the appropriate register of the FPU. Since they are issued by the IU, these instructions are considered to be IU instructions. In this respect, the floating-point registers might be viewed as if they were an extension of the IU.
The floating-point unit is an IEEE-compatible design supporting single-, double-, and, optionally, extended-precision formats. Floating-point operate instructions are issued by the IU and cause the appropriate operation to be executed by the floating-point unit.
The SPARC definition has very little to say about the coprocessor other than that it can have up to thirty-two 32-bit registers that are visible to the user and up to 1024 different instructions. The coprocessor (CP) is also controlled by the IU through the use of load, store, and operate instructions. Unlike the FPU, the CP has no predefined operations, in the sense that a meaning can be attached to the bit pattern in the opcode field of an operate instruction only if one is describing a specific CP implementation.
It is up to an engineer to define those instructions, or some subset of them, as the coprocessor is designed. As in the case of the FPU, there are load and store instructions to put values into and get values from registers.
What are some examples of coprocessors that would be useful? A floating-point vector processor is one good example. A coprocessor could also be used to implement a commercial instruction set with decimal arithmetic, since, as we shall see, the SPARC instruction set is weak in supporting applications related to commercial processing. Yet another possibility would be a very simple coprocessor that had only a divide instruction, because the SPARC defines no divide instruction in its basic instruction set. If some useful function can be found for the coprocessor, a coprocessor chip can be built that implements what is needed, and then it can be put under the control of the IU.
The SPARC architecture does not specify whether each of these units is implemented as an individual chip, whether they should all be implemented on a single chip, or anything else of that nature. These decisions are made on the basis of available technology and other design considerations. The first implementation of SPARC, for example, used an IU implemented on a single chip and an FPU consisting of three chips [11], [60], [61], [62].
The IU Register Set. The register set of the integer unit has an organization quite different from other RISC processors. More specifically, the register model is based on the concept of overlapping register windows, an idea pioneered in the Berkeley RISC I processor.
The User Register Set. An executing procedure on the SPARC has access to a total of 32 registers, the same number as the MIPS. But unlike the MIPS, the SPARC architecture divides these registers into several classes because of the influence of the register window model. From the point of view of a single user procedure executing on the SPARC, all of these registers can be treated in the same way. However, a set of conventions specifies that registers be used in specific ways as part of the procedure call convention. For example, some registers are used to pass parameters, and other registers are used to hold the results of intermediate computations.
The view that a single procedure has of the standard register set is completely independent of the register window mechanism, but if these conventions are followed, the interface between procedures can use the register window mechanism in an effective manner. In other words, the conventions for which registers to use for parameter passing are not simply software conventions, as on the MIPS; they are dictated by, or at least suggested by, the register window mechanism.
One register that behaves differently from the others, whether register windows are used or not, is register 0. Like register 0 on the MIPS, it always contains the value 0. While the general intention of this hardware convention is similar to that of the MIPS, the uses of this register are somewhat different.
Register ZERO. The "trick" of wiring register 0 so that it always contains the value 0 is used on the SPARC just as on the MIPS. In both cases, this use of register 0 allows for some interesting applications. Remember that if R0 is used as a source operand, the value 0 is automatically provided. If R0 is used as a destination operand, then the value will vanish into a black hole.
Another use of register 0 is to remove the need for special simplified addressing modes. On a machine with double indexing like the SPARC, two registers are added to form an address. In some cases, only one register is needed to form the address. Instead of providing two separate addressing modes, one for single indexing and the other for double indexing, single indexing can be implemented as double indexing by using R0 as one of the two registers. Similarly, an absolute reference to low memory can be achieved using register R0 as the base register and the memory address as the offset.
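The behavior of register 0 and its use in address formation can be illustrated with a minimal Python sketch. This is a toy model, not SPARC code: the class `RegisterFile` and the helpers `effective_address` and `effective_address_imm` are hypothetical names chosen for illustration, and the register numbers and values are arbitrary.

```python
class RegisterFile:
    """Toy 32-register file in which register 0 is hardwired to zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        # Reading R0 always yields 0, whatever was "written" to it.
        return 0 if n == 0 else self._regs[n]

    def write(self, n, value):
        # Writes to R0 are discarded (the "black hole").
        if n != 0:
            self._regs[n] = value & 0xFFFFFFFF


def effective_address(regs, rs1, rs2):
    """Double indexing: the address is the sum of two registers."""
    return (regs.read(rs1) + regs.read(rs2)) & 0xFFFFFFFF


def effective_address_imm(regs, rs1, offset):
    """Base register plus constant offset."""
    return (regs.read(rs1) + offset) & 0xFFFFFFFF


regs = RegisterFile()
regs.write(0, 123)        # vanishes
regs.write(5, 0x1000)
regs.write(6, 0x20)

double_indexed = effective_address(regs, 5, 6)    # R5 + R6
single_indexed = effective_address(regs, 5, 0)    # R5 + R0, i.e. just R5
absolute = effective_address_imm(regs, 0, 0x40)   # R0 + 0x40, i.e. address 0x40
```

No separate single-index or absolute addressing mode is needed in this model: both fall out of the general forms once one operand is R0.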
The System Register Set. SPARC defines several registers in addition to the general-purpose registers. There are registers to control the register windows, a register to control traps, two program counters, a register to aid in multiplication, and a register defining the processor's state. Some of these registers are reserved for use in supervisor state, while others may be modified or tested by user programs.
The processor state register (PSR) has a number of fields that describe the current state of the processor (Fig. 5.2). The VER and IMPL fields indicate the version and implementation identification for the particular SPARC chip in use. The ICC field holds the four standard condition code bits: the negative bit, the zero bit, the overflow bit, and the carry bit. The PSR also includes bits that allow software to test which coprocessors, if any, are attached to a specific SPARC implementation. If the corresponding bit is not set, a trap will occur if one tries to issue a floating-point or coprocessor instruction. The S-bit determines whether the processor is in supervisor or user state. Finally, there is a 5-bit field that is the current window pointer (CWP).
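As an illustration of how software might pick these fields apart, here is a small Python sketch. The bit positions are an assumption taken from the SPARC Version 8 architecture manual rather than from this text, and the field names `ef` and `ec` (the FPU and coprocessor bits) follow that manual. The function name `decode_psr` is hypothetical.

```python
def decode_psr(psr):
    """Split a 32-bit PSR value into the fields discussed in the text.

    Assumed layout (SPARC V8 manual): impl 31:28, ver 27:24, icc 23:20,
    EC 13, EF 12, S 7, CWP 4:0. Fields not discussed here are ignored.
    """
    return {
        "impl": (psr >> 28) & 0xF,
        "ver": (psr >> 24) & 0xF,
        "n": (psr >> 23) & 1,    # negative
        "z": (psr >> 22) & 1,    # zero
        "v": (psr >> 21) & 1,    # overflow
        "c": (psr >> 20) & 1,    # carry
        "ec": (psr >> 13) & 1,   # coprocessor bit
        "ef": (psr >> 12) & 1,   # floating-point unit bit
        "s": (psr >> 7) & 1,     # 1 = supervisor, 0 = user
        "cwp": psr & 0x1F,       # current window pointer
    }


# Example value: zero flag set, FPU bit set, supervisor state, CWP = 5.
example = (1 << 22) | (1 << 12) | (1 << 7) | 5
fields = decode_psr(example)
```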
A few of the other 16 system registers are also used for system status information. The window invalid register is used with the register windows and will be discussed in a later section. Some of the bits are set to indicate how many windows are implemented on this machine. Obviously, the operating system needs to know this number, for example, to know how much space to save for the registers on a context switch.
Register Windows. The term register window is intended to suggest that a procedure's view of its available registers represents only a small portion of a larger panorama of registers known as a register file. The fundamental idea behind the use of register windows is to reduce the overhead of procedure calls. Procedure call overhead is a well-known source of irritation to hardware implementers and compiler writers. One cause of this overhead is the need to save registers containing live values in memory when a procedure call is made, and then restore those registers on procedure return. Another source of procedure call overhead comes from the need to pass parameters in both directions between procedure calls. The register window design of the Berkeley RISC chips and the SPARC provides an imaginative solution that reduces the overhead associated with these problems.
At any moment during execution, a procedure can refer to any one of 32 registers. The first eight registers are global registers that can always be referenced by any procedure. The remaining 24 registers constitute the register window. These 24 registers are divided into three groups: in registers, out registers, and local registers. The out registers are numbered 8 through 15, the local registers are numbered 16 through 23, and the in registers are numbered 24 through 31 (Fig. 5.3).
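The numbering of the four groups can be summarized in a few lines of Python. The helper name `register_name` is hypothetical; the `%g`, `%o`, `%l`, and `%i` prefixes are the usual SPARC assembler names for the global, out, local, and in registers.

```python
def register_name(n):
    """Map a register number 0..31 to its assembler name."""
    if not 0 <= n <= 31:
        raise ValueError("register number must be in 0..31")
    # 0-7 globals, 8-15 outs, 16-23 locals, 24-31 ins
    group = "goli"[n // 8]
    return f"%{group}{n % 8}"


names = [register_name(n) for n in (0, 8, 14, 16, 24, 30, 31)]
```

For example, register 14 is `%o6` and register 30 is `%i6`.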
When the currently executing procedure calls another procedure, the usual convention is that the called procedure executes the special instruction SAVE. The effect of this instruction is to decrement the CWP, a 5-bit field within the PSR, moving the current register window down by 16 registers in the register file. Even though each register window contains 24 registers, the current window is moved only 16 registers down in the register file. This makes the out registers of the calling procedure the in registers of the called procedure, in effect passing parameters without having to copy them from one set of registers to another. The local registers are not shared between the caller and the called procedures and are used for storage of local variables or temporary results computed by the called procedure. From the point of view of the called procedure, the in region of its window contains the required parameter values. One important case where a procedure may decide not to decrement CWP is when that procedure does not call any other procedures. These procedures, known as leaf procedures, need not change CWP because they are certain not to require additional register windows. The idea is for leaf procedures to perform all their computations using the set of out registers. If the leaf procedure needs more registers, then it can get a new window in the usual way, but if it can live entirely in the out registers, then the overhead of manipulating the window can be avoided. In practice, programs dynamically call a significant number of small leaf procedures that can be optimized in this way. This is why the save and restore instructions are separate from the call and return instructions.
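The overlap can be made concrete with a sketch of one possible mapping from a (window, register number) pair to a slot in the windowed part of the register file. The function `physical_index` and the particular slot numbering are assumptions made for illustration; a real implementation may lay out its register file differently, as long as adjacent windows share eight registers.

```python
NWINDOWS = 8  # assumed number of windows for this example


def physical_index(cwp, r, nwindows=NWINDOWS):
    """Slot of windowed register r (8..31) when the current window is cwp.

    Each window starts 16 slots after the one below it, so the 24 registers
    of a window spill 8 slots into the window above; the modulo closes the
    circle.
    """
    if not 8 <= r <= 31:
        raise ValueError("globals (0..7) are not part of the window")
    return (cwp * 16 + (r - 8)) % (16 * nwindows)


caller, callee = 5, 4   # SAVE decrements CWP

# The caller's %o0 (R8) and the callee's %i0 (R24) are the same slot.
shared = physical_index(caller, 8) == physical_index(callee, 24)

# The locals (R16..R23) of the two windows are distinct.
private = physical_index(caller, 16) != physical_index(callee, 16)

# The circle closes: the outs of window 0 are the ins of window 7.
wraps = physical_index(0, 8) == physical_index(NWINDOWS - 1, 24)
```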
The Stack and Frame Pointers. In preparation for a procedure call, a compiler must generate code that places the parameters in what the calling procedure refers to as %o0 through %o7, its eight out registers (R8 through R15). After the CALL instruction has been executed, the called procedure executes a SAVE instruction that bumps the register window; in the new window these same registers are now referred to as %i0 through %i7, the eight in registers of the called procedure (R24 through R31). The need to explicitly save and restore registers is eliminated by this mechanism. A key point of the register window approach is that it is expected that most parameters will be passed in registers in this way rather than on a stack. On other machines, the choice of passing parameters in registers or on the stack is one of the decisions that must be made by a compiler writer. On the SPARC it is more or less assumed that it will not be necessary to save and restore registers on procedure call and return because this windowing mechanism obviates any need to do this. However, we still need the usual arrangement of stack and frame pointers to manage local variables of procedures.
The suggested convention on the SPARC is to use only six of the eight out registers for passing parameters. Register 14 (%o6) is used as a stack pointer, and register 15 is reserved as a temporary register, leaving only six registers for parameters. Consider the normal requirement for manipulating the stack and frame pointers on entry to a procedure. The following three steps need to be performed.
• Save the old frame pointer.
• Set the new frame pointer to the stack pointer.
• Adjust the stack pointer for the new local frame.
The SAVE instruction automatically performs the first two steps. We use register 30 (%i6) as the frame pointer. When the register window is decremented, the old frame pointer (i.e., the old %i6) is safely saved away in the previous window, and the old stack pointer (i.e., the old %o6) is copied into the frame pointer (the new %i6). Of course, it isn't really copied; it happens to be there already due to the overlapping of the windows. What about the third step? So far in our description of SAVE, no operands were needed. However, this instruction is 32 bits like all others, so there is plenty of room, and we find that SAVE has the capability of performing the operation "Rd ← Rs + constant" as well as performing the register window manipulations. Now we can see how to use this addition operation. We set Rd to %o6 (the new stack pointer) and Rs to %o6 (which is a copy of the old stack pointer). The constant is set to the size of the procedure frame (either plus or minus depending on which way the stack grows).
The RESTORE instruction undoes the effect of the SAVE, restoring the original frame and stack pointers. Thus we see that the entire job of building nested stack frames, as well as that of saving and restoring registers, is handled by a single instruction on entry to and exit from the procedure, and these instructions require only a single clock.
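The way SAVE and RESTORE manage the stack and frame pointers can be traced with a small simulation. This is a sketch under several assumptions: the class `WindowedCPU` and its slot layout are hypothetical, window overflow checking is left out, RESTORE is modeled only as a window change, the stack is assumed to grow downward, and the 96-byte frame size and the pointer values are arbitrary. SAVE is modeled as reading Rs in the old window and writing Rd in the new one.

```python
SP, FP = 14, 30   # %o6 and %i6


class WindowedCPU:
    """Toy register file with overlapping windows (no overflow checks)."""

    def __init__(self, nwindows=8):
        self.nwindows = nwindows
        self.cwp = nwindows - 1
        self.globals = [0] * 8
        self.file = [0] * (16 * nwindows)

    def _slot(self, r):
        # Adjacent windows are 16 slots apart, so outs overlap the next ins.
        return (self.cwp * 16 + (r - 8)) % (16 * self.nwindows)

    def read(self, r):
        if r < 8:
            return 0 if r == 0 else self.globals[r]
        return self.file[self._slot(r)]

    def write(self, r, value):
        if r == 0:
            return                      # register 0 ignores writes
        if r < 8:
            self.globals[r] = value
        else:
            self.file[self._slot(r)] = value

    def save(self, rs, constant, rd):
        value = (self.read(rs) + constant) & 0xFFFFFFFF  # Rs from old window
        self.cwp = (self.cwp - 1) % self.nwindows
        self.write(rd, value)                            # Rd in new window

    def restore(self):
        self.cwp = (self.cwp + 1) % self.nwindows


cpu = WindowedCPU()
cpu.write(SP, 0x8000)   # caller's stack pointer
cpu.write(FP, 0x9000)   # caller's frame pointer

cpu.save(SP, -96, SP)   # procedure entry: all three steps at once
callee_fp = cpu.read(FP)   # the caller's stack pointer, via the overlap
callee_sp = cpu.read(SP)   # caller's stack pointer minus the frame size

cpu.restore()           # procedure exit
caller_sp, caller_fp = cpu.read(SP), cpu.read(FP)
```

After the SAVE, the callee's frame pointer equals the caller's stack pointer without any copy having taken place; after the RESTORE, the caller sees its original pointers again.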
The Number of Registers and Windows. The SPARC architectural specification allows an implementation to have anywhere from 2 to 32 register windows. There must be at least two register windows available because, as we shall discuss in more detail later, when a trap occurs there will definitely be a change of windows. Two register windows, on the other hand, is not a practical number, as the discussion of this section will show. Each register window is actually 16 registers long, since there is an overlap between adjacent sets. Furthermore, the out registers of the last window overlap the in registers of the first window. In other words, the register windows are arranged in a circle. The total number of registers available on a machine would therefore be 40 in the case of a two-window implementation, or 520 in the case of a 32-window implementation. The initial Fujitsu implementation of the SPARC used seven register windows, for a total of 120 registers. These figures are computed using the formula:
Total number of registers = 8 + (16 × Number of register windows)
The 8 in this formula reflects the eight global registers that are always available. The total number of register windows is hidden from software. The program is written as though an infinite number of windows are available, and it is the job of the operating system to manage the register windows so that this view is transparent.
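The three figures quoted above can be checked directly from the formula. The function name `total_registers` is a hypothetical helper.

```python
def total_registers(nwindows):
    """Total registers: 8 globals plus 16 per window (windows overlap by 8)."""
    if not 2 <= nwindows <= 32:
        raise ValueError("SPARC allows between 2 and 32 register windows")
    return 8 + 16 * nwindows


counts = {n: total_registers(n) for n in (2, 7, 32)}
```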
Managing the Register File. Because of the dynamic manner in which procedures call each other, and in particular the possibility of recursion in many common programming languages, procedure call depth cannot be predicted. Since the SPARC architecture requires that the number of register windows on any implementation be limited to between 2 and 32, it is to be expected that a typical large program will "run out" of available register windows.
At least two important questions arise from the fact that the number of register windows is limited. First, when the procedure call depth exceeds the maximum number of register windows, what mechanism is used to allocate the required windows? Second, what is the cost of dealing with such a register overflow, and does it occur so infrequently that this cost does not become a major consideration? To understand the answer to the first question, consider an implementation with eight register windows, as shown in Fig. 5.4. The CWP (current window pointer) register is initialized to 7. As successive procedures are called, CWP is decremented so that the next window is used. Let's look at the situation when CWP is set to 1 and we call one more procedure that does a SAVE operation. We would expect CWP to be decremented to zero, but we can't actually use window 0. Why not? Because it overlaps with window 7, which is in use at the highest level. If CWP is decremented to zero without doing something, the newly called procedure will destroy registers belonging to the procedure seven levels up on the call chain, which certainly will not do.
The required mechanism is provided by the window invalid mask (WIM), one of the supervisor registers. One bit exists in the WIM register for each possible register window. For implementations that provide fewer than the maximum number of windows, the bits for unimplemented windows are wired to 1 and cannot be changed.
The bits for the windows that are present can be modified, but only in supervisor mode. When a SAVE instruction is executed, it first decrements the CWP register, and then checks the corresponding bit in the WIM register to make sure that the resulting window is valid. If not, a window overflow trap occurs. Like other traps and interrupts, the window overflow trap routine operates in supervisor mode and is thus able to manipulate the CWP and WIM registers. This routine has the job of saving some of the registers in memory, adjusting WIM accordingly, and then returning control to the routine that caused the window overflow trap. A typical strategy is to save some number of previously used register windows in memory and then reset the appropriate bits in the WIM register. The best choice of how many windows to save and restore depends upon the pattern of calls in a program and the number of register windows. For our example, WIM would be set to indicate that w0 was invalid, thus causing an overflow trap on the SAVE instruction that tries to use this window. A typical approach in the overflow trap routine is as follows:
• Save the contents of windows w6 and w7.
• Set w6 as invalid in the WIM.
• Set w0 as now being valid in the WIM.
• Re-execute the SAVE instruction that caused the trap.
The SAVE now succeeds, and the procedure can use w0. On the next call, CWP is decremented; it wraps from w0 to w7, and one more procedure can be executed before another window invalid trap occurs. On the way back up, as the corresponding RESTORE instructions are executed, another window invalid trap will occur when we increment from w5 to w6, due to the new setting of WIM. The window invalid trap routine will then restore the contents of w7 and w6 and reset WIM to its original state.
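The overflow and underflow sequence for the eight-window example can be traced with a short simulation. This is a sketch under stated assumptions: the class `WindowManager` is hypothetical, it tracks only CWP, WIM, and which window numbers have been spilled (not register contents), it does not model the extra CWP change made by the trap itself, and it generalizes the two-window strategy from the text by always spilling the two oldest resident windows.

```python
class WindowManager:
    """Tracks CWP and WIM for a circular set of register windows."""

    def __init__(self, nwindows=8, spill=2):
        self.n = nwindows
        self.spill = spill        # windows saved or restored per trap
        self.cwp = nwindows - 1   # as in the text, CWP starts at 7
        self.wim = 1 << 0         # w0 is initially marked invalid
        self.memory = []          # stack of spilled window groups
        self.overflows = 0
        self.underflows = 0

    def _invalid(self, w):
        return (self.wim >> w) & 1 == 1

    def save(self):
        target = (self.cwp - 1) % self.n
        if self._invalid(target):
            self._overflow_trap(target)
        self.cwp = target         # the re-executed SAVE now succeeds

    def restore(self):
        target = (self.cwp + 1) % self.n
        if self._invalid(target):
            self._underflow_trap(target)
        self.cwp = target

    def _overflow_trap(self, target):
        self.overflows += 1
        # The oldest resident windows sit just past the target in the circle.
        oldest = [(target - k) % self.n for k in range(1, self.spill + 1)]
        self.memory.append(oldest)                       # save them
        self.wim = 1 << ((target - self.spill) % self.n)  # move the marker

    def _underflow_trap(self, target):
        if not self.memory:
            raise RuntimeError("return past the outermost procedure")
        self.underflows += 1
        self.memory.pop()                                 # reload them
        self.wim = 1 << ((target + self.spill) % self.n)  # marker moves back


wm = WindowManager()
for _ in range(8):        # eight nested calls starting from w7
    wm.save()
after_calls = (wm.cwp, wm.wim, wm.overflows, list(wm.memory))

for _ in range(8):        # and eight returns
    wm.restore()
after_returns = (wm.cwp, wm.wim, wm.underflows, list(wm.memory))
```

In this run the seventh SAVE traps on w0, windows w7 and w6 are spilled, and w6 becomes the invalid window; the eighth SAVE wraps to w7 without trapping. On the way back, the RESTORE that moves into w6 traps, and WIM returns to its original setting.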
A perceptive reader will notice that in Fig. 5.4 there is one set of eight registers that never gets used, namely, the locals of the window marked invalid. These registers actually do serve an important function. When a window invalid trap, or any other exception, occurs, the CWP is decremented without checking WIM. The exception routine is always free to use the locals of the current window for its own use, without the overhead of saving and restoring user program registers, and without interfering with the live registers of any active procedure.
The cyclical nature of the register window set is important to the operating efficiency of this approach. If the register windows were laid out in a vector with a fixed maximum and minimum value for CWP, then the window invalid trap would have to move the entire register set up and down, which would be unpleasantly inefficient. With the cyclical arrangement of the SPARC, only the registers that actually need to be saved must be moved.
The issue of how many windows to save on the window invalid trap is a matter for careful consideration. If too few windows are saved, then we will get too many traps. On the other hand, if too many windows are saved, we will waste time saving and restoring registers. The issue of how to manage register windows, and more specifically how to handle overflows and underflows, was an important component of the research conducted as part of the original Berkeley RISC effort. One study concluded that when the number of register windows was greater than eight, an effective number of windows to save and restore would be two. With fewer than eight register windows, saving and restoring one window was best. Literature cited by the designers of the RISC I claims that with a set of 8 register windows, only one percent of the total calls in a "typical" sample of programs resulted in window overflow.
The Use of Register Windows. The intention behind the provision of register windows is that a SAVE instruction should be issued on entry to a procedure, and a corresponding RESTORE on exit. Register Window Scheme #1B (Modified SunOS Scheme), for example, splits the register file into two pieces, one for user processes and the other for supervisor processes. A second register window scheme suggests that each task, including the kernel, should get a fixed number of register windows. The idea here is that no saving or restoring of windows is required on a context switch. This scheme is intended to be used in real-time systems where there is a small fixed number of tasks.
Although it is true that the window mechanism can be used in many different ways, a typical operating system and software environment makes a specific choice among the schemes. Since any application program will call the operating system to perform system services, including task management, the application program will find itself forced to use a particular approach. This means that the flexibility implied by the various schemes for using register windows is something that a systems designer can take advantage of, but not the programmer using a specific system.
The advantage of register windows is that it is almost never necessary to explicitly save and restore registers for a procedure call, because a new set of registers will be provided in the newly allocated register window. Since each procedure has its own register set, the compiler simply has to worry about allocating the registers within each procedure in an efficient manner and can ignore the issue of register allocation between procedures.
It is important to remember that no one is forcing a compiler writer or other programmer to use the register window mechanism; this is a decision to be made on the basis of performance considerations.