Multicore Microprocessors.

Montecito. Montecito is the code name of a major release of Intel's Itanium 2 Processor Family (IPF), which implements the IA-64 instruction set architecture on a dual-core processor. It is one of several Intel projects to bring multicore processors to notebooks, PCs, and servers. Montecito targets multiprocessor server systems based on the IA-64 architecture; its dual-core desktop counterpart, based on the Intel NetBurst architecture, was code-named Smithfield. According to Intel, Montecito doubles performance compared with the previous single-core Itanium 2 processor while reducing power consumption by about 20%. It also adds multithreading (two threads per core), a greatly expanded cache subsystem (12 MB of L3 cache per core), and silicon support for virtualization.

Montvale. Contrary to earlier speculation, this next-generation Itanium 2 processor will not be a 65 nm shrink; instead, the code name Montvale covers a "Montecito on steroids", released approximately one year later, at the tail end of 2006. Earlier data suggested that Montvale's clock speed would likely reach 2.5-2.6 GHz on a 400 MHz front-side bus (FSB). As of today, Montvale might reach only 2 GHz, and given Montecito's delay, its delivery could slip as well. As with Montecito, Montvale is very likely to arrive together with updated Itanium compiler technology, which should further improve the performance of Itanium servers.

Opteron. At AMD, "multi-core" in practice means "dual-core": each physical Opteron chip contains two separate processor cores. This effectively doubles the computing power available per motherboard processor socket: one socket can deliver the performance of two processors, two sockets the performance of four, and so on. Because motherboard costs rise dramatically with the number of CPU sockets, multicore CPUs make it possible to build much higher-performing systems on more affordable motherboards. AMD's model numbering scheme has changed somewhat with its new multicore lineup. At the time of its introduction, AMD's fastest multicore Opteron was the model 875, with two cores running at 2.2 GHz each, while its fastest single-core Opteron was the model 252, with one core running at 2.6 GHz. Next-generation AMD Opteron processors are offered in three series: the 1000 Series (up to 1P/2-core), the 2000 Series (up to 2P/4-core), and the 8000 Series (4P/8-core to 8P/16-core).
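
The per-socket arithmetic above can be made concrete with a small sketch. It compares the models 875 and 252 by "aggregate GHz" (cores times clock), a crude ceiling that assumes perfectly parallel work and ignores memory and bus limits, and counts the cores on boards with 1 to 8 populated sockets of dual-core chips (8 sockets giving the 16 cores of the 8000 Series upper end). The function names are illustrative only.

```python
# Back-of-the-envelope comparison of the Opteron models named in the text.
# "Aggregate GHz" (cores x clock) is only a crude upper bound on throughput:
# it assumes perfectly parallel work and ignores memory and bus limits.

def aggregate_ghz(cores: int, clock_ghz: float) -> float:
    """Sum of the core clocks on one chip."""
    return cores * clock_ghz

def total_cores(sockets: int, cores_per_chip: int) -> int:
    """Cores available on a board with the given number of populated sockets."""
    return sockets * cores_per_chip

opteron_875 = aggregate_ghz(cores=2, clock_ghz=2.2)  # dual-core, 2.2 GHz per core
opteron_252 = aggregate_ghz(cores=1, clock_ghz=2.6)  # single-core, 2.6 GHz

print(f"875: {opteron_875:.1f} GHz aggregate, 252: {opteron_252:.1f} GHz aggregate")
for sockets in (1, 2, 4, 8):
    print(f"{sockets} socket(s) of dual-core chips -> {total_cores(sockets, 2)} cores")
```

The comparison cuts both ways: the 875 offers more aggregate clock for multithreaded server loads, while a single thread would typically run faster on the 252's higher-clocked core.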

Tukwila 4-Core Processor. Tukwila is Montvale's successor and the first design in the family built on a 65 nm process. Its key features include the QuickPath Interconnect (QPI), dual integrated memory controllers, and simultaneous multithreading. Its total cache capacity is 30 MB. The QuickPath Architecture (Fig. 4.19) provides high-speed data exchange among the processors, memory, and the I/O hub. Its key feature is a scalable shared memory, in which each processor accesses its own locally attached memory directly, instead of a traditional memory pool that all processors reach through a single bus, the FSB. To achieve this, the QuickPath Architecture places the memory controller directly on the processor and replaces the FSB with a completely new point-to-point system interconnect, the QuickPath Interconnect.
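
The idea of scalable shared memory can be illustrated with a toy model: every socket owns part of the physical memory through its integrated memory controller, yet any socket can address all of it, and accesses to another socket's memory travel over QPI. This is a sketch under a loudly stated assumption: memory is split into equal contiguous ranges, one per socket, whereas real systems may map addresses differently. The 4-socket, 16 GiB-per-socket configuration is hypothetical.

```python
# Toy model of scalable shared memory with per-socket memory controllers.
# ASSUMPTION: physical memory is divided into equal contiguous ranges,
# one per socket; real platforms may interleave addresses differently.

GIB = 1 << 30

def home_socket(addr: int, mem_per_socket: int) -> int:
    """Socket whose memory controller owns this address (hypothetical layout)."""
    return addr // mem_per_socket

def classify_access(requester: int, addr: int, mem_per_socket: int, sockets: int) -> str:
    """Local accesses go straight to the requester's own controller;
    remote ones are forwarded over the point-to-point interconnect."""
    home = home_socket(addr, mem_per_socket)
    if home >= sockets:
        raise ValueError("address beyond installed memory")
    return "local" if home == requester else f"remote via QPI (home socket {home})"

# Hypothetical 4-socket system with 16 GiB attached to each socket.
for addr in (1 * GIB, 20 * GIB, 63 * GIB):
    print(hex(addr), "->", classify_access(0, addr, mem_per_socket=16 * GIB, sockets=4))
```

Unlike a single FSB, adding a socket here also adds a memory controller, so aggregate memory bandwidth grows with the system; the price is that remote accesses take an extra interconnect hop.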

At present, Intel processors use an external bidirectional bus, the FSB. It links the processor cores to the chipset, which contains the memory controller and serves as the access point to the motherboard's other buses (e.g., PCI, AGP). The principal ways to increase FSB throughput are raising its frequency and combining several FSBs in a single system. To reduce the load on the FSB, Intel equips its processors with larger caches of higher associativity.
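
Both ways of raising FSB throughput show up directly in the peak-bandwidth formula: transfers per second times bus width, times the number of buses. The sketch below assumes a 64-bit (8-byte) data path, as on the x86 FSB of Xeon chips such as Dunnington; the 128-bit width used for the third line is the commonly documented Itanium 2 system bus width, taken here as an assumption. All figures are theoretical peaks, not sustained bandwidth.

```python
# Peak FSB bandwidth = transfer rate x bus width x number of buses.
# "1066 MT/s" means a bus clock of about 266 MHz transferring 4 times per clock.
# ASSUMPTION: 8-byte data path by default; 16 bytes for the Itanium 2 example.

def fsb_peak_gbps(mega_transfers_per_s: float, width_bytes: int = 8, buses: int = 1) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second) of one or more identical FSBs."""
    return mega_transfers_per_s * 1e6 * width_bytes * buses / 1e9

print(f"{fsb_peak_gbps(1066):.3f} GB/s")                 # one 64-bit bus at 1066 MT/s
print(f"{fsb_peak_gbps(1066, buses=2):.3f} GB/s")        # two such buses combined
print(f"{fsb_peak_gbps(400, width_bytes=16):.1f} GB/s")  # 128-bit bus at 400 MT/s
```

Because every core in the system competes for this one figure, larger and more associative caches help by keeping more requests off the bus altogether.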

Dunnington 6-Core Processor. These chips are based on the 45 nm version of the Core architecture. All six cores, together with the cache arrays, are placed on a single die. The chip uses a multilevel cache hierarchy with separate (disjoint) L2 caches: each pair of cores shares 3 MB of L2 cache, and a third cache level (L3) with a capacity of 16 MB is shared by all cores. Dunnington also uses an FSB running at 1066 MT/s and supports 40-bit physical addressing.
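
The Dunnington figures add up as follows: six cores form three pairs, each with 3 MB of L2, giving 9 MB of L2 plus the 16 MB L3, and 40 address bits cover 2^40 bytes. The sketch counts only L2 and L3; per-core L1 caches are left out because the text does not give their size.

```python
# Cache totals and physical address space for the Dunnington figures in the text.
CORES = 6
CORES_PER_PAIR = 2
L2_PER_PAIR_MB = 3
L3_MB = 16
PHYS_ADDR_BITS = 40

pairs = CORES // CORES_PER_PAIR             # 3 core pairs
l2_total_mb = pairs * L2_PER_PAIR_MB        # 9 MB of L2 in total
on_die_cache_mb = l2_total_mb + L3_MB       # 25 MB of L2 + L3 (L1 not counted)
addressable_bytes = 2 ** PHYS_ADDR_BITS     # bytes reachable with 40-bit addresses
addressable_tib = addressable_bytes / 2 ** 40  # 2^40 bytes = 1 TiB

print(pairs, l2_total_mb, on_die_cache_mb, addressable_tib)
```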

The 6-core chips are intended as an intermediate step between current 4-core Xeon (Core-based) chips and processors based on the next-generation Nehalem microarchitecture.

Intel Nehalem Microarchitecture. The main innovations of this microarchitecture are scalability up to 8 cores; the ability to process 4 instructions per clock cycle; Simultaneous Multi-Threading (SMT), in which each core can execute two software threads simultaneously; an integrated memory controller; a large shared L3 cache; a new system interconnect, the QuickPath Interconnect; dynamic power management; and a new instruction set extension, SSE4.2.

The Nehalem processor can be divided into 5 basic structural units: the processor cores, an integrated memory controller, the cache arrays, the QuickPath Interconnect, and an integrated graphics core. The integrated memory controller has 3 channels and supports up to 3 DIMMs per channel. The L1 cache is unchanged: 32 KB for instructions and 32 KB for data. There is a new low-latency L2 cache of 256 KB per core, and a new large 8 MB fully shared L3 cache with an inclusive policy to minimize snoop traffic, which is especially important in multi-core systems. Another architectural improvement is a new two-level TLB hierarchy, which adds a second-level Translation Look-aside Buffer with 512 entries. Branch prediction is also enhanced, with a new second-level branch predictor and a Renamed Return Stack Buffer.
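
Why an inclusive L3 cuts snoop traffic can be shown with a simplified model. Inclusion guarantees that any line held in a core's private L1/L2 is also in the L3, so a snoop that misses in the L3 needs no core probes at all; on a hit, per-line core-presence information limits probes to the cores that may hold the line. In this sketch that bookkeeping is a Python set per line, an assumption of the model rather than a description of the actual tag format; cache sizes, replacement and coherence states are omitted. Without inclusion or such tracking, every core would have to be probed.

```python
# Simplified model of snoop filtering by an inclusive shared L3.
# ASSUMPTION: presence tracking is modeled as a set of core IDs per line;
# capacities, replacement and MESI states are deliberately left out.

class InclusiveL3:
    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        # line address -> cores whose private caches may hold the line
        self.lines: dict[int, set[int]] = {}

    def core_fill(self, core: int, line: int) -> None:
        """A core caches a line privately; inclusion means the L3 holds it too."""
        if not 0 <= core < self.num_cores:
            raise ValueError(f"no such core: {core}")
        self.lines.setdefault(line, set()).add(core)

    def evict_from_l3(self, line: int) -> set[int]:
        """Evicting a line from L3 returns the cores that must drop their copies."""
        return self.lines.pop(line, set())

    def cores_to_snoop(self, line: int) -> set[int]:
        """Cores that must be probed for an external snoop of this line."""
        return set(self.lines.get(line, set()))

l3 = InclusiveL3(num_cores=4)
l3.core_fill(core=1, line=0x40)
l3.core_fill(core=3, line=0x40)
print(sorted(l3.cores_to_snoop(0x40)))  # hit: probe only cores 1 and 3
print(sorted(l3.cores_to_snoop(0x80)))  # L3 miss: no core is probed
```

The `evict_from_l3` method shows the cost of the policy: an L3 eviction forces the matching private copies to be invalidated to keep the inclusion property.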

Sandy Bridge Microarchitecture. Sandy Bridge 32 nm processors are expected to include AVX (Advanced Vector Extensions), a new set of vector instructions. A key feature of AVX is wider vector registers (256 bits instead of 128), which will enable doubling the peak number of floating-point operations per cycle. Support for 3-operand instructions with a so-called non-destructive syntax will bring further advantages, such as more efficient register use and smaller program code.
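
Both AVX points can be sketched with real x86 mnemonics. Widening registers from 128 to 256 bits doubles the number of lanes per instruction, and hence the peak FLOPs per cycle if the number of vector units stays the same. The 3-operand form removes the copy that the destructive 2-operand SSE form needs when a source must be preserved. The helper names are illustrative; the instruction strings are only listed, not executed.

```python
# Two AVX effects described in the text, modeled in plain Python.

def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one vector register holds."""
    return register_bits // element_bits

def add_keep_sources(isa: str) -> list[str]:
    """Instructions computing xmm2 = xmm0 + xmm1 while keeping xmm0 intact."""
    if isa == "sse":
        # 2-operand form: the destination is also a source, so copy first.
        return ["movaps xmm2, xmm0", "addps xmm2, xmm1"]
    if isa == "avx":
        # 3-operand (VEX) form: separate destination, sources untouched.
        # The 256-bit variant would be "vaddps ymm2, ymm0, ymm1".
        return ["vaddps xmm2, xmm0, xmm1"]
    raise ValueError(f"unknown ISA: {isa}")

print(lanes(128, 32), lanes(256, 32))  # single precision: 4 -> 8 lanes
print(lanes(128, 64), lanes(256, 64))  # double precision: 2 -> 4 lanes
print(len(add_keep_sources("sse")), len(add_keep_sources("avx")))  # 2 vs 1 instruction
```

Saving the extra `movaps` also frees the register allocator from reserving copies, which is where the more efficient register use comes from.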

Larrabee Visual Computing Architecture. This is a high-end computing platform of a new level, comprising a multi-core processor, a chipset, a graphics subsystem, and software together with a software development kit. The Larrabee architecture is based on the Visual Computing concept, which redefines traditional computer graphics concepts. The Larrabee design includes a high-performance SIMD Vector Processing Unit and a new vector instruction set for vector memory operations, conditionals, and integer and floating-point arithmetic. In addition, Larrabee supports a new circuit for handling cache coherence, a function that plays a key role in multi-core systems. Essentially, Larrabee can be classified as a GPGPU (general-purpose computing on graphics processing units) processor. The GPGPU idea has already been realized in the products of the leading graphics accelerator designers: AMD, for example, promotes its Stream Computing technology, and NVIDIA offers its CUDA hardware/software architecture. Larrabee products will be manufactured in a 45 nm process. Chips measuring 49.5 × 49.5 mm will include 16 to 24 cores, each able to process four threads simultaneously, at clock frequencies from 1.7 to 2.5 GHz.

Each core will have a 32 KB L1 cache with a latency of one clock cycle and a 256 KB L2 cache with a latency of 10 cycles. Each chip will also include two memory controllers and texture sampling units. The chip's floating-point throughput will scale to the teraFLOPS range. The new architecture will support current application programming interfaces (APIs) such as DirectX and OpenGL.
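
The claim that throughput scales to teraFLOPS can be checked with peak-rate arithmetic over the stated range of 16 to 24 cores at 1.7 to 2.5 GHz. Two parameters are assumptions not given in this text: a 16-lane single-precision vector unit per core and a fused multiply-add counted as 2 FLOPs per lane per cycle, figures commonly cited for Larrabee. Real workloads would reach only a fraction of these peaks.

```python
# Rough peak single-precision throughput for the Larrabee configurations in the text.
# ASSUMPTIONS (not stated in the text): 16 single-precision lanes per core,
# and one fused multiply-add (2 FLOPs) per lane per cycle.

VPU_LANES = 16       # assumed vector width in 32-bit floats
FLOPS_PER_LANE = 2   # assumed: multiply + add fused into one instruction

def peak_gflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: cores x lanes x FLOPs per lane x clock."""
    return cores * VPU_LANES * FLOPS_PER_LANE * clock_ghz

for cores, ghz in [(16, 1.7), (16, 2.5), (24, 1.7), (24, 2.5)]:
    print(f"{cores} cores @ {ghz} GHz -> {peak_gflops(cores, ghz):.0f} GFLOPS")
```

Under these assumptions every configuration except the slowest 16-core part lands above 1 TFLOPS, consistent with the teraFLOPS target.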

Date: 2016-06-12
