Moore's Law and Its Impact. In 1965, while preparing a talk, Gordon Moore noticed that up to that time microchip capacity had seemed to double each year. With the pace of change slowing somewhat over the last few years, the definition of Moore's law has changed (with Moore's approval) to reflect that the doubling occurs only every 18 months. Stated another way, transistor count grows at a compound annual rate of roughly 60% [6, 14, 26, 52].
Process and architectural changes drive Moore's law for transistor count and for performance as well. CPU clock frequency, measured in megahertz, tends to track Moore's law, following the same path as transistor count. However, system performance is affected not only by the CPU's clock rate but also by memory access time. The CPU can only execute code that it can fetch. This drives the requirement for the larger level-1 caches and on-chip level-2 caches found in current CPU architectures. Fig. 4.1 shows that the number of transistors used on Intel CPUs has grown 23-fold, from 1.2 million on the 486DX2 processor in 1992 to 28 million on the Intel Pentium III with an onboard L2 cache introduced in 1999. Correspondingly, CPU clock frequency has increased 20-fold, from 50 MHz to 1 GHz, during the same time period. As a check on Moore's law, the expected transistor count starting from the 486DX2 and projected forward 7 years to the Intel Pentium III is 1.2 million × 1.6⁷, or approximately 32 million transistors.
This is within about 12% of the actual figure. Project this forward 10 more years to 2009, and the expected transistor count approaches 28 million × 1.6¹⁰, or more than 3 billion transistors, with a corresponding expected CPU speed of over 100 GHz!
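To make the arithmetic behind these projections explicit, here is a minimal Python sketch; the 1.6× annual growth factor is the text's rounding of a doubling every 18 months (the exact factor is 2^(12/18) ≈ 1.59):

```python
# Moore's-law check: a doubling every 18 months is a compound annual growth
# rate of 2 ** (12 / 18) ~= 1.59, which the text rounds to 1.6x per year.
ANNUAL_GROWTH = 1.6

def project(count: float, years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Project a transistor count forward by the given number of years."""
    return count * rate ** years

# 486DX2 (1992, 1.2 million transistors) projected 7 years forward.
print(f"1999 projection: {project(1.2e6, 7) / 1e6:.0f} million")    # ~32 million

# Pentium III (1999, 28 million transistors) projected 10 more years to 2009.
print(f"2009 projection: {project(28e6, 10) / 1e9:.1f} billion")    # ~3.1 billion
```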
At the same time, current architectures are becoming more system-on-chip, or SOC, centric. This change to designs incorporating a whole system on one chip implies larger numbers of I/Os will need to escape from the die.
If the die size stays relatively constant for a pad-limited design, the available transistor count will explode; today's 0.18-micron process technology supports a density of 6 million transistors per square centimeter. Current pad-limited designs require approximately 5 to 6 cm². Ten years from now the pad-limited design will still be around 5 cm², but the transistor density will be 6 million × 1.6¹⁰, or roughly 660 million transistors/cm². The pad-limited die will then support 3.3- to 4-billion-transistor designs. This leaves quite a lot of free real estate for other functions on the die.
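The pad-limited die budget follows the same compounding; a minimal sketch using the 5 cm² die area and 6 million transistors/cm² figures quoted above:

```python
# Pad-limited die budget: density scales with the process while die area
# stays roughly constant, because the pad ring rather than the logic sets
# the die size. Figures are those quoted in the text.
ANNUAL_GROWTH = 1.6

density_today = 6e6      # transistors per cm^2 on a 0.18-micron process
die_area_cm2 = 5.0       # a typical pad-limited die (5 to 6 cm^2)

density_in_10_years = density_today * ANNUAL_GROWTH ** 10
transistor_budget = density_in_10_years * die_area_cm2

print(f"density in 10 years: {density_in_10_years / 1e6:.0f} million/cm^2")  # ~660 million
print(f"transistor budget:   {transistor_budget / 1e9:.1f} billion")          # ~3.3 billion
```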
Potential Roadblocks. Hard realities set in when considering multibillion-transistor designs. Each new process generation requires finer, more precise lithography techniques, additional and more intricate mask sets, cleaner fabs (chip fabrication plants), and advances in tester capabilities. These designs will be extremely dependent on the interconnects on the die. This drives the need for better simulation, modeling, timing, and layout tools. These issues require major design innovations in differing fields, and those innovations must all be available at the same time, or new process advances won't make it to market.
As transistor sizes approach the molecular level, physical boundaries will emerge that may derail the process migration path. Voltage tends to drop with each process improvement, which permits more transistors to switch at higher frequencies without a huge increase in power. However, with each new process step, the percentage voltage drop from one generation to the next decreases. At some point the voltage drops won't offset the transistor growth and switching factor in the power equation (power = capacitance × voltage² × frequency). Given a fixed voltage, power requirements will begin to grow linearly with transistor count. The overall die capacitance, including interconnect, parasitic, and coupling capacitance, will tend to negate the capacitance reduction offered by the process technology.
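A minimal sketch of the dynamic-power relation above, with purely illustrative capacitance, voltage, and frequency values (not taken from any specific process): once the supply voltage can no longer be reduced between generations, power tracks the growth in switched capacitance, that is, transistor count.

```python
# Dynamic power: P = C * V^2 * f (switched capacitance, supply voltage,
# switching frequency). All values below are illustrative only.

def dynamic_power(capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Dynamic switching power in watts."""
    return capacitance_f * voltage_v ** 2 * freq_hz

# Generation n: an effective switched capacitance of 10 nF at 1.8 V and 1 GHz.
p_now = dynamic_power(10e-9, 1.8, 1e9)

# Generation n+1, voltage still scaling: twice the capacitance (transistors),
# but dropping 1.8 V -> 1.5 V offsets much of the growth.
p_next_scaled = dynamic_power(20e-9, 1.5, 1e9)

# Generation n+1, voltage stuck at 1.8 V: power simply doubles.
p_next_fixed = dynamic_power(20e-9, 1.8, 1e9)

print(f"{p_now:.1f} W -> {p_next_scaled:.1f} W with voltage scaling, "
      f"{p_next_fixed:.1f} W without")
```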
With a doubling in power every 18 months, the costs associated with packaging and thermal dissipation will tend to dwarf any savings achieved with higher transistor densities. In fact, as shown in Fig. 4.2, the power density
measured in watts/cm² will increase from around 20 W/cm² to nuclear reactor densities (250 W/cm²) as power doubles every 18 months. At this point there is no practical way to dissipate the heat. Additionally, as frequencies increase, noise and coupling issues become more and more difficult to solve, especially with the decreased noise immunity of lower-voltage processes. Finally, market dynamics drive the need for faster design-to-production schedules. The world is moving at "Internet speed," yet technology's half-life is measured in months. Today's architecture trend will eventually hit either a brick wall or a number of stumbling blocks. CPU architects can't continue designing by looking in the rear-view mirror; they must recognize the necessity of change. The convergence of technical and market requirements will act as the catalyst needed to drive invention and the adoption of new CPU and system architectures.
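As a rough check on the Fig. 4.2 trend, the sketch below estimates how quickly a 20 W/cm² design reaches the 250 W/cm² "nuclear reactor" range if power density doubles every 18 months:

```python
# How quickly does 20 W/cm^2 reach "nuclear reactor" territory (~250 W/cm^2)
# if power density doubles every 18 months?
import math

start_density = 20.0      # W/cm^2, roughly today's figure
target_density = 250.0    # W/cm^2, the Fig. 4.2 endpoint
doubling_period = 1.5     # years

doublings = math.log2(target_density / start_density)
years = doublings * doubling_period
print(f"{doublings:.1f} doublings, roughly {years:.1f} years")   # ~3.6 doublings, ~5.5 years
```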
Conflicting Requirements. As Internet ubiquity and personalization drive the requirements of future CPU and system design, a few fundamental cornerstones for technology and architecture will emerge: battery life, portability, security, connectivity, user interface, application compatibility, universal data access, and cost. This list of requirements presents an enigma for the CPU and system architect. While battery life, portability, and cost call for simple, application-specific solutions, the previously described performance-impulse requirements, universal data access, security, and user interfaces demand higher performance. Of these, the user interface presents the most conflicting demands on system design specifications.
The consumer and the usage model dictate a more personal interaction with the CPU and system. Today's keyboard and mouse evolve into voice command and control, voice recognition, handwriting recognition, fingerprint identification, and other biometrics. These interfaces impose not only specific digital performance requirements but also the need for specialized analog capability to permit better interaction with the analog-centric human user.
CPU and system optimization for one set of requirements will cause unacceptable design trade-offs in other areas. For example, an architecture cannot be designed solely for high-overhead, general-purpose performance, or it will sacrifice battery life, portability, and cost.
The process, voltage, and design issues described earlier along with the requirements of the Internet and consumer usage model will bring about a divergence point that will require changes in CPU and system design philosophy. Fig. 4.3 describes the diverging requirements in the consumer marketplace for performance, user experience, and connectivity.
Sufficient performance provides a better experience for the novice user; beyond that point, however, the highest-performance system provides no additional perceived performance to that inexperienced user. As indicated earlier, connectivity is a key element of the system architecture. A user with the highest-performance system and a poor connection will perceive roughly the same usage model as a person with a very low-performance system and a high-bandwidth connection.
These issues are occurring because the required pace of technological innovation is increasing, while the actual time to market and the effective useful life of technology are decreasing. Human nature being what it is, we react skeptically and negatively to change. Technology is no exception. The current installed base of more-transistors-and-faster-MHz designers is comfortable with the status quo. The infrastructure to support the current design path is large and heavily capitalized. Software companies, original design manufacturers (ODMs), original equipment manufacturers (OEMs), and users are all familiar with the current architecture trend. They may acknowledge the need for change but lack the energy required to overcome "rear-view mirror" inertia.