1. A SPARC implementation has K register windows. What is the number N of physical registers?
2. SPARC is lacking a number of instructions commonly found on CISC machines. Some of these are easily simulated using either register R0, which is always set to 0, or a constant operand. These simulated instructions are called pseudo instructions and are recognized by the SPARC compiler. Show how to simulate the following pseudo instructions, each with a single SPARC instruction. In all of these, src and dst refer to registers. (Hint: A store to R0 has no effect.)
a. MOV src, dst          d. NOT dst    g. DEC dst
b. COMPARE src1, src2    e. NEG dst    h. CLR dst
c. TEST src1             f. INC dst    i. NOP
3. Shared-memory multiprocessors and message-passing multicomputers are two architectures that support the parallel execution of interacting tasks. Which of the two can more easily emulate the other? Briefly explain your answer.
4. Suppose the processor and memory are implemented on the same chip. Is a cache still necessary in such a system? Explain your answer.
5. The branch instructions of the UltraSPARC II processor have an Annul bit. If the compiler sets this bit and the branch is not taken, the instruction in the delay slot is removed from the pipeline. Alternatively, the instruction could be removed when the branch is taken. What are the advantages of each of these approaches?
6. A program loop ends with a conditional branch back to the beginning of the loop. How can this loop be implemented on a pipelined processor that uses delayed branches with one delay slot? Under what conditions can the delay slot be filled with a useful instruction?
7. A computer has one delay slot. The instruction in this slot is issued regardless of the predicted branch outcome, but it is annulled if the branch is not taken. Propose an efficient way to implement program loops on such a computer.
8. Delayed branches are used in a pipelined processor, and one of two designs must be chosen. The first design has a 4-stage pipeline and one delay slot; the second has a 6-stage pipeline and two delay slots. Compare the throughput of the two designs, given that 20% of the executed instructions are branches and that the probability of filling a delay slot is 80% for the first design and 25% for the second.
9. A system has two general-purpose processors, each able to perform either an addition or a multiplication in one time slot. Assume that reading and writing data take no time and that storage for intermediate results is always available. What is the minimum number of time slots needed to execute the following program fragment?
a=c*d+e;
b=v*t+u;
f=x*y+z.
10. Suppose a program runs efficiently on a superscalar processor with a given set of independent functional units. Does it follow that the same program will run efficiently on a VLIW processor with the same set of units? Is the converse true?
11. Give an example of an algorithm whose structure is a "ring".
12. How many immediate neighbors does each processor have in the "three-dimensional torus" topology?
13. What must a user take into account when moving from an SMP computer to a computer with a NUMA architecture?
14. What must be taken into account when writing efficient programs for computers with a NUMA architecture and for computers with distributed memory?
15. What is the most efficient way to interconnect the processors in a cluster?
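
For Exercise 12, one way to check a hand count is to enumerate the neighbors directly. The sketch below assumes a torus in which every node is linked to the nodes one step away in each direction along each axis, with wraparound at the edges. The function name torus_neighbors and the 4 x 4 x 4 size are illustrative assumptions, not part of the exercise.

```python
def torus_neighbors(node, dims):
    """Return the set of distinct immediate neighbors of `node` in a torus.

    `node` is a coordinate tuple and `dims` gives the size of each dimension.
    Both names are hypothetical and chosen only for this illustration.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap around the edge
            neighbors.add(tuple(coord))
    # A dimension of size 1 would link a node to itself; do not count that.
    neighbors.discard(tuple(node))
    return neighbors


if __name__ == "__main__":
    # Count the neighbors of one corner node in an assumed 4 x 4 x 4 torus.
    print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))
```

The set is used deliberately: when a dimension has only two nodes, the steps in both directions reach the same neighbor, so the count can drop below the value for larger tori.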