At the elementary transistor gate level, we can formulate total power dissipation as the sum of three major components: switching loss, leakage, and short-circuit loss.
$$PW_{device} = \tfrac{1}{2}\, C\, V_{DD}\, V_{swing}\, a f + I_{leakage} V_{DD} + I_{sc} V_{DD} \qquad (6.1)$$
Here, $C$ is the output capacitance, $V_{DD}$ is the supply voltage, $f$ is the chip clock frequency, and $a$ is the activity factor ($0 < a < 1$) that determines the device switching frequency. $V_{swing}$ is the voltage swing across the output capacitor, $I_{leakage}$ is the leakage current, and $I_{sc}$ is the average short-circuit current.
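As a rough illustration of Eq. 6.1, the sketch below evaluates the three components for a single gate. The function name `device_power` and all numeric values are hypothetical, chosen only to show how the terms combine; they are not measured data.

```python
def device_power(c_out, v_dd, v_swing, activity, freq, i_leak, i_sc):
    """Return (switching, leakage, short_circuit, total) power in watts, per Eq. 6.1."""
    switching = 0.5 * c_out * v_dd * v_swing * activity * freq
    leakage = i_leak * v_dd
    short_circuit = i_sc * v_dd
    return switching, leakage, short_circuit, switching + leakage + short_circuit


# Hypothetical values for illustration only (not taken from the text).
sw, lk, sc, total = device_power(
    c_out=10e-15,   # output capacitance C: 10 fF
    v_dd=1.5,       # supply voltage VDD in volts
    v_swing=1.5,    # Vswing approximated as VDD
    activity=0.1,   # activity factor a
    freq=1e9,       # clock frequency f: 1 GHz
    i_leak=50e-9,   # leakage current in amperes
    i_sc=100e-9,    # average short-circuit current in amperes
)
print(f"switching={sw:.3e} W, leakage={lk:.3e} W, short-circuit={sc:.3e} W")
print(f"switching share of total: {sw / total:.1%}")
```

With these assumed inputs the switching term is the largest of the three, but the split depends entirely on the values chosen.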
The literature often approximates $V_{swing}$ as equal to $V_{DD}$ (or simply $V$ for short), which makes the switching loss approximately $(1/2)CV^2af$. For current ranges of $V_{DD}$ (say, 1 volt to 3 volts), the switching loss $(1/2)CV^2af$ also remains the dominant component. So, as a first-order approximation for the whole chip, we may formulate the power dissipation as
$$PW_{chip} = \tfrac{1}{2} \sum_i C_i V_i^2 a_i f_i \qquad (6.2)$$
$C_i$, $V_i$, $a_i$, and $f_i$ are unit- or block-specific average values. The summation is taken over all blocks or units $i$ at the microarchitecture level (instruction cache, data cache, integer unit, floating-point unit, load-store unit, register files, and buses). For the voltage range considered, the operating frequency is roughly proportional to the supply voltage; $C$ remains roughly the same if we keep the same design but scale the voltage. If a single voltage and clock frequency is used for the whole chip, the formula reduces to
$$PW_{chip} = V^3 \Big( \sum_i K_i^v a_i \Big) = f^3 \Big( \sum_i K_i^f a_i \Big) \qquad (6.3)$$
where the $K$'s are unit- or block-specific constants. If we consider the worst-case activity factor for each unit $i$, that is, if $a_i = 1$ for all $i$, then
$$PW_{chip} = K_v V^3 = K_f f^3 \qquad (6.4)$$
where $K_v$ and $K_f$ are design-specific constants.
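The step from the per-unit sum to the cubic form can be spelled out as follows. This is a sketch under the stated assumptions only: the proportionality constant $k$ (with $f = kV$) and the per-unit constants $K_i^v$ and $K_i^f$ are notation introduced here for illustration.

```latex
\begin{align*}
PW_{chip} &= \tfrac{1}{2} \sum_i C_i V_i^2 a_i f_i \\
          &= \tfrac{1}{2} \sum_i C_i V^2 a_i (kV)
             && \text{single supply: } V_i = V,\ f_i = f = kV \\
          &= V^3 \sum_i K_i^v a_i,
             && K_i^v = \tfrac{1}{2} k\, C_i \\
          &= f^3 \sum_i K_i^f a_i,
             && K_i^f = \frac{C_i}{2 k^2} \\
          &= K_v V^3 = K_f f^3
             && a_i = 1 \text{ for all } i,\ K_v = \sum_i K_i^v,\ K_f = \sum_i K_i^f
\end{align*}
```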
That equation leads to the so-called cube-root rule. This points to the single most efficient method for reducing power dissipation for a processor designed to operate at high frequency: reduce the voltage (and hence the frequency). This is the primary mechanism of power control in Transmeta's Crusoe chip. There is a limit, however, on how much $V_{DD}$ can be reduced (for a given technology), which has to do with manufacturability and circuit reliability issues. Thus, a combination of microarchitecture and circuit techniques to reduce power consumption, without necessarily employing multiple or variable supply voltages, is of special relevance.
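To make the effect of voltage scaling concrete, the sketch below sums switching power over a few units and rescales the supply voltage, assuming frequency is exactly proportional to voltage. The helper `chip_power`, the per-unit capacitances and activity factors, and the constant `K_FREQ` are all hypothetical values chosen for illustration.

```python
def chip_power(units, voltage, k_freq):
    """First-order chip switching power with one supply voltage and f = k_freq * voltage."""
    freq = k_freq * voltage
    return 0.5 * sum(c * voltage**2 * a * freq for c, a in units)


# Hypothetical (capacitance in farads, activity factor) pairs, one per unit.
units = [
    (2.0e-9, 0.30),  # instruction cache
    (2.5e-9, 0.25),  # data cache
    (1.5e-9, 0.40),  # integer unit
    (1.8e-9, 0.15),  # floating-point unit
]
K_FREQ = 1.0e9 / 1.5  # assumed proportionality: 1 GHz at 1.5 V (hertz per volt)

nominal = chip_power(units, 1.5, K_FREQ)
for scale in (1.0, 0.9, 0.8, 0.7):
    scaled = chip_power(units, 1.5 * scale, K_FREQ)
    # Power falls with the cube of the scale factor; frequency falls linearly.
    print(f"V x {scale:.1f}: power ratio {scaled / nominal:.3f}, "
          f"frequency ratio {scale:.1f}")
```

Under these assumptions the power ratio equals the cube of the voltage scale factor regardless of the per-unit values, since every term in the sum carries the same $V^3$ dependence.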
Performance Basics. The most straightforward metric for measuring performance is the execution time of a representative workload mix on the target processor. We can write the execution time as
$$T = \mathit{PL} \cdot \mathit{CPI} \cdot \mathit{CT} = \mathit{PL} \cdot \mathit{CPI} \cdot (1/f) \qquad (6.5)$$
Here, PL is the dynamic path length of the program mix, measured as the number of machine instructions executed. CPI is the average processor cycles per instruction incurred in executing the program mix, and CT is the processor cycle time (measured in seconds per cycle) whose inverse determines clock frequency f. Since performance increases with decreasing T, we may formulate performance PF as
$$PF_{chip} = K_{pf}\, f = K_{pv}\, V \qquad (6.6)$$
Here, the $K$'s are constants for a given microarchitecture-compiler implementation. The $K_{pf}$ value stands for the average number of machine instructions executed per cycle on the machine being measured. $PF_{chip}$ in this case is measured in MIPS.
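The relationship between execution time and the MIPS figure can be sketched numerically. The functions `execution_time` and `mips` and the workload numbers below are hypothetical, and CPI is assumed to stay constant as frequency changes, which is a simplification.

```python
def execution_time(path_length, cpi, freq_hz):
    """Execution time in seconds: T = PL * CPI * (1 / f)."""
    return path_length * cpi / freq_hz


def mips(cpi, freq_hz):
    """Millions of instructions per second: (instructions per cycle) * f / 1e6."""
    instructions_per_cycle = 1.0 / cpi  # plays the role of Kpf in the text
    return instructions_per_cycle * freq_hz / 1e6


# Hypothetical workload: 2 billion instructions at an average CPI of 1.25.
PATH_LENGTH = 2_000_000_000
CPI = 1.25

for freq in (1.0e9, 0.8e9):
    t = execution_time(PATH_LENGTH, CPI, freq)
    print(f"f = {freq / 1e9:.1f} GHz: T = {t:.3f} s, PF = {mips(CPI, freq):.0f} MIPS")
```

Lowering the frequency by 20 percent lengthens the execution time and lowers the MIPS figure in direct proportion, which is the linear dependence expressed by the performance equation.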
Adopting a noncontroversial weighted mix is not easy. Each ratio is calculated as the speedup with respect to execution time on a specified reference machine. This method has the advantage of allowing us to rank different machines unambiguously from a performance viewpoint. That is, we can show the ranking as independent of the reference machine used in such a formulation.
SMT/CMP Differences and Energy-Efficiency Issues. Consider the floating-point loop kernel shown in Table 6.1.