Model Specific Registers

Value (in Hex)	Register Name	Description
00H	Machine Check Address	Stores address of cycle causing the exception
01H	Machine Check Type	Stores cycle type of cycle causing the exception
0EH	Test Register 12 (TR12)	New feature control

Note: Do not execute RDMSR or WRMSR with undefined values in ECX. Software must not depend on the value of reserved bits in the model specific registers. Any write to a model specific register should write "0" into all reserved bits.
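The table and the note can be expressed as a small lookup. This is only an illustrative sketch: the names `DEFINED_MSRS`, `check_msr_index`, and `clear_reserved_bits` are hypothetical, the dictionary holds only the registers listed in the table above, and the reserved-bit mask must come from the caller because the text does not give the bit layouts.

```python
# Model specific registers listed in the table above, keyed by the ECX value.
DEFINED_MSRS = {
    0x00: "Machine Check Address",
    0x01: "Machine Check Type",
    0x0E: "Test Register 12 (TR12)",
}


def check_msr_index(ecx: int) -> str:
    """Return the register name for a listed ECX value, else raise ValueError."""
    if ecx not in DEFINED_MSRS:
        raise ValueError(f"ECX={ecx:02X}H is not a listed model specific register")
    return DEFINED_MSRS[ecx]


def clear_reserved_bits(value: int, reserved_mask: int) -> int:
    """Force reserved bits to 0 before a write (mask supplied by the caller)."""
    return value & ~reserved_mask
```

A caller would run `check_msr_index` before issuing RDMSR or WRMSR, and pass every value to be written through `clear_reserved_bits`.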

Floating Point Unit. The floating-point unit (FPU) of the Pentium processor is integrated with the integer unit on the same chip. It is heavily pipelined. The FPU is designed to be able to accept one floating-point operation every clock. It can receive up to two floating-point instructions every clock, one of which must be an exchange instruction.

Floating-Point Pipeline Stages. The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with the integer unit. Integer instructions pass through only the first 5 stages and use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages and the activities that are performed in them are summarized below: PF - Prefetch; D1 - Instruction Decode; D2 - Address generation; EX - Memory and register read; conversion of FP data to external memory format and memory write; X1 - Floating Point Execute stage one, conversion of external memory format to internal FP data format and write operand to FP register file, bypass 1; X2 - Floating Point Execute stage two; WF - Perform rounding and write floating-point result to register file, bypass 2; ER - Error Reporting/Update Status Word.

On-Chip Caches. The Pentium processor implements two internal caches for a total integrated cache size of 16 Kbytes: an 8 Kbyte data cache and a separate 8 Kbyte code cache. The data cache fully supports the MESI (modified/exclusive/shared/invalid) writeback cache consistency protocol. The code cache is inherently write protected to prevent code from being inadvertently corrupted, and as a consequence supports a subset of the MESI protocol, the S (shared) and I (invalid) states.

The caches have been designed for maximum flexibility and performance. The data cache is configurable as writeback or writethrough on a line by line basis. Memory areas can be defined as non-cacheable by software and external hardware. Cache writeback and invalidations can be initiated by hardware or software. Protocols for cache consistency and line replacement are implemented in hardware, easing system design.

Cache Organization. Each of the caches is 8 Kbytes in size and each is organized as a 2-way set associative cache. There are 128 sets in each cache, each set containing 2 lines (each line has its own tag address). Each cache line is 32 bytes wide. Replacement in both the data and instruction caches is handled by an LRU mechanism, which requires one bit per set in each of the caches. A conceptual diagram of the organization of the data and code caches is shown in Fig. 4.14. Note that the data cache supports the MESI write-back cache consistency protocol, which requires 2 state bits, while the code cache supports the S and I states only and therefore requires only one state bit.
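The figures above fix how an address maps onto a cache: 8 Kbytes / (2 ways x 32 bytes) gives 128 sets, so 5 bits select the byte within a line and 7 bits select the set. The sketch below works this out. The name `split_address` is hypothetical, and it assumes the conventional layout in which the set index sits directly above the line offset, which the text does not state explicitly.

```python
CACHE_BYTES = 8 * 1024   # 8 Kbytes per cache
WAYS = 2                 # 2-way set associative
LINE_BYTES = 32          # 32-byte lines
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # number of sets per cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # bits selecting a byte in a line
INDEX_BITS = SETS.bit_length() - 1          # bits selecting a set


def split_address(phys_addr: int) -> tuple[int, int, int]:
    """Split a 32-bit physical address into (tag, set index, byte offset).

    Assumes the set index occupies the bits directly above the line offset.
    """
    offset = phys_addr & (LINE_BYTES - 1)
    index = (phys_addr >> OFFSET_BITS) & (SETS - 1)
    tag = phys_addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under these assumptions the tag is the upper 20 bits of the physical address, and two addresses compete for the same set whenever bits 11-5 match.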

Cache Structure. The instruction and data caches can be accessed simultaneously. The instruction cache can provide up to 32 bytes of raw opcodes and the data cache can provide data for two data references, all in the same clock. This capability is implemented partially through the tag structure. The tags in the data cache are triple ported. One of the ports is dedicated to snooping while the other two are used to look up two independent addresses corresponding to data references from each of the pipelines. The instruction cache tags are also triple ported. Again, one port is dedicated to snooping and the other two ports facilitate split line accesses (simultaneously accessing the upper half of one line and the lower half of the next line). The storage array in the data cache is single ported but interleaved on 4-byte boundaries to be able to provide data for two simultaneous accesses to the same cache line.

Each of the caches is parity protected. In the instruction cache, there are parity bits on a quarter line basis and there is one parity bit for each tag. The data cache contains one parity bit for each tag and a parity bit per byte of data.

Each of the caches is accessed with physical addresses and each cache has its own TLB (translation lookaside buffer) to translate linear addresses to physical addresses. The data cache has a 4-way set associative, 64-entry TLB for 4 Kbyte pages and a separate 4-way set associative, 8-entry TLB to support 4 Mbyte pages. The code cache has one 4-way set associative, 32-entry TLB for 4 Kbyte pages and for 4 Mbyte pages, which are cached in 4 Kbyte increments. The TLBs associated with the instruction cache are single ported, whereas the data cache TLBs are fully dual ported to be able to translate two independent linear addresses for two data references simultaneously. Replacement in the TLBs is handled by a pseudo LRU mechanism that requires 3 bits per set. The tag and data arrays of the TLBs are parity protected, with a parity bit associated with each of the tag and data entries in the TLBs.
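The text says only that TLB replacement uses a pseudo LRU mechanism with 3 bits per 4-way set. One common arrangement that fits that budget is a binary tree of three bits, sketched below. It is an assumption that the Pentium processor uses exactly this scheme, and the class name `PseudoLRU4` is hypothetical.

```python
class PseudoLRU4:
    """Tree pseudo-LRU for one 4-way set, using 3 bits (assumed scheme)."""

    def __init__(self) -> None:
        # bits[0]: which pair holds the victim (0 -> ways 0/1, 1 -> ways 2/3)
        # bits[1]: victim within ways 0/1
        # bits[2]: victim within ways 2/3
        self.bits = [0, 0, 0]

    def touch(self, way: int) -> None:
        """Record an access: point each bit on the path away from `way`."""
        if way < 2:
            self.bits[0] = 1              # victim is now in the other pair
            self.bits[1] = 1 - way        # victim is the other way of this pair
        else:
            self.bits[0] = 0
            self.bits[2] = 1 - (way - 2)

    def victim(self) -> int:
        """Return the way that would be replaced next."""
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

The scheme only approximates true LRU: it tracks which half and which way of each half was used least recently, not a full ordering of the four ways.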

Cache Operating Modes. The operating modes of the caches are controlled by the CD (cache disable) and NW (not write-through) bits in CR0. For normal operation and highest performance, these bits should both be reset to "0." The bits come out of RESET as CD = NW = 1. To completely disable the cache, the following two steps must be performed.

1. CD and NW must be set to 1.

2. The caches must be flushed.

If the cache is not flushed, cache hits on reads will still occur and data will be read from the cache. In addition, the cache must be flushed after being disabled to prevent any inconsistencies with memory.
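The two-step procedure can be modelled on a plain integer standing in for CR0. In the x86 architecture CD is bit 30 and NW is bit 29 of CR0. The function names and the `flush` callback are hypothetical, and this is a model only: on real hardware the register is written with a privileged MOV to CR0 and the flush is performed by an instruction such as WBINVD or by the FLUSH# pin.

```python
CR0_NW = 1 << 29   # not write-through
CR0_CD = 1 << 30   # cache disable


def cache_control_bits(cr0: int) -> tuple[int, int]:
    """Return (CD, NW) extracted from a CR0 value."""
    return (cr0 >> 30) & 1, (cr0 >> 29) & 1


def enable_caches(cr0: int) -> int:
    """Clear CD and NW for normal, highest-performance operation."""
    return cr0 & ~(CR0_CD | CR0_NW)


def disable_caches(cr0: int, flush) -> int:
    """Step 1: set CD and NW. Step 2: flush via a caller-supplied callback."""
    new_cr0 = cr0 | CR0_CD | CR0_NW
    flush()   # without this step, read hits would still be served from the cache
    return new_cr0
```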

Page Cacheability. Two bits for cache control, PWT and PCD, are defined in the page table and page directory entries. The states of these bits are driven out on the PWT and PCD pins during memory access cycles. The PWT bit controls write policy for the second level caches used with the Pentium processor. Setting PWT to 1 defines a writethrough policy for the current page, while clearing PWT to 0 defines a writeback policy for the current page.

The PCD bit controls cacheability on a page by page basis. The PCD bit is internally ANDed with the KEN# signal to control cacheability on a cycle by cycle basis. PCD = 0 enables caching, while PCD = 1 disables it. Cache line fills are enabled when PCD = 0 and KEN# = 0.
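The rules for the two page-level bits reduce to two small predicates. The function names below are hypothetical, and `ken_n` stands for the level sampled on the active-low KEN# pin.

```python
def line_fill_enabled(pcd: int, ken_n: int) -> bool:
    """A cache line fill is enabled only when PCD = 0 and KEN# = 0 (asserted)."""
    return pcd == 0 and ken_n == 0


def second_level_write_policy(pwt: int) -> str:
    """Write policy requested for the current page through the PWT bit."""
    return "writethrough" if pwt == 1 else "writeback"
```

Either side can veto caching: software by setting PCD in the page entry, external hardware by leaving KEN# inactive for the cycle.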

Inquire Cycles. Inquire cycles are initiated by the system to determine if a line is present in the code or data cache, and what its state is. (This document refers to inquire cycles and snoop cycles interchangeably.) Inquire cycles are driven to the Pentium processor when a bus master other than the Pentium processor initiates a read or write bus cycle. When the other bus master initiates a read, the inquire cycle determines whether the Pentium processor data cache contains the latest information. If the snooped line is in the Pentium processor data cache in the modified state, the Pentium processor has the most recent information and must schedule a writeback of the data. When the other bus master initiates a write, the inquire cycle determines whether the Pentium processor code or data cache contains the snooped line and invalidates the line if it is present.
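The inquire behaviour described above can be summarised as a function from the current line state and the other bus master's operation to a new state and a writeback flag. This is a simplified sketch with a hypothetical function name. The text does not say what state a line is left in after a snooped read, so the "S" result in that branch is an assumption.

```python
def inquire(state: str, other_master_op: str) -> tuple[str, bool]:
    """Return (new_state, writeback_needed) for a snooped data cache line.

    state is one of "M", "E", "S", "I"; other_master_op is "read" or "write".
    """
    if state == "I":
        return "I", False            # line not present: nothing to do
    writeback = state == "M"         # a modified line holds the latest data
    if other_master_op == "write":
        return "I", writeback        # a snooped write invalidates the line
    return "S", writeback            # ASSUMPTION: a snooped read leaves it shared
```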

Cache Flushing. The on-chip cache can be flushed by external hardware or by software instructions. Flushing the cache through hardware is accomplished by driving the FLUSH# pin low. This causes the cache to writeback all modified lines in the data cache and mark the state bits for both caches invalid. The Flush Acknowledge special cycle is driven by the Pentium processor when all writebacks and invalidations are complete. The INVD and WBINVD instructions cause the on-chip caches to be invalidated also. WBINVD causes the modified lines in the internal data cache to be written back, and all lines in both caches to be marked invalid. After execution of the WBINVD instruction, the Writeback and Flush special cycles are driven to indicate to any external cache that it should writeback and invalidate its contents. INVD causes all lines in both caches to be invalidated. Modified lines in the data cache are not written back. The Flush special cycle is driven after the INVD instruction is executed to indicate to any external cache that it should invalidate its contents. Care should be taken when using the INVD instruction that cache consistency problems are not created. Note that the implementations of the INVD and WBINVD instructions are processor dependent. Future processor generations may implement these instructions differently.
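The difference between WBINVD and INVD is easiest to see on a toy model. Here the data cache is a hypothetical dictionary from line address to MESI state and `memory_writes` records which lines were written back. The special bus cycles are not modelled.

```python
def wbinvd(data_cache: dict[int, str], memory_writes: list[int]) -> None:
    """Write back modified lines, then mark every line invalid."""
    for addr, state in data_cache.items():
        if state == "M":
            memory_writes.append(addr)   # modified data reaches memory first
        data_cache[addr] = "I"


def invd(data_cache: dict[int, str]) -> None:
    """Mark every line invalid without writing modified lines back."""
    for addr in data_cache:
        data_cache[addr] = "I"           # any modified data is lost
```

Running `invd` on a cache that still holds a line in the M state discards the only up-to-date copy of that line, which is the consistency problem the text warns about.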

Data Cache Consistency Protocol (MESI Protocol). The Pentium processor Cache Consistency Protocol is a set of rules by which states are assigned to cached entries (lines). The rules apply for memory read/write cycles only. I/O and special cycles are not run through the data cache. Every line in the Pentium processor data cache is assigned a state dependent on both Pentium processor generated activities and activities generated by other bus masters (snooping). The Pentium processor Data Cache Protocol consists of 4 states that define whether a line is valid (HIT/MISS), if it is available in other caches, and if it has been MODIFIED. The four states are the M (Modified), E (Exclusive), S (Shared) and I (Invalid) states, and the protocol is referred to as the MESI protocol. The Pentium processor code cache follows a subset of the MESI protocol.

Bus Functional Description. The Pentium processor bus is designed to support a high data transfer rate over its 64-bit data bus. All data transfers occur as a result of one or more bus cycles.

Physical Memory and I/O Interface. Pentium processor memory is accessible in 8-, 16-, 32-, and 64-bit quantities. Pentium processor I/O is accessible in 8-, 16-, and 32-bit quantities. The Pentium processor can directly address up to 4 Gbytes of physical memory, and up to 64 Kbytes of I/O. In hardware, memory space is organized as a sequence of 64-bit quantities. Each 64-bit location has eight individually addressable bytes at consecutive memory addresses (Fig. 4.15). I/O space is organized as a sequence of 32-bit quantities. Each 32-bit quantity has four individually addressable bytes at consecutive memory addresses. See Fig. 4.16 for a conceptual diagram of the I/O space. 64-bit memories are organized as arrays of physical quadwords (8-byte words). Physical quadwords begin at addresses evenly divisible by 8. The quadwords are addressable by physical address lines A31-A3. 32-bit memories are organized as arrays of physical dwords (4-byte words). Physical dwords begin at addresses evenly divisible by 4. The dwords are addressable by physical address lines A31-A3 and A2. 16-bit memories are organized as arrays of physical words (2-byte words). Physical words begin at addresses evenly divisible by 2. The words are addressable by physical address lines A31-A3, A2-A1, BHE#, and BLE#. To address 8-bit memories, the lower 3 address lines (A2-A0) must be decoded from the byte enables.
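The way one physical address selects a quadword, dword, word, or byte follows directly from the divisibility rules above. The helper below is a hypothetical illustration that reports the index of each unit containing a given byte address.

```python
def select_units(phys_addr: int) -> dict[str, int]:
    """Break a physical byte address into the units described above."""
    return {
        "qword_index": phys_addr >> 3,      # carried on A31-A3
        "dword_index": phys_addr >> 2,      # A31-A3 plus A2
        "word_index": phys_addr >> 1,       # A31-A3 plus A2-A1
        "byte_in_qword": phys_addr & 0x7,   # decoded from the byte enables
    }
```

For example, byte address 1006H lies in quadword 200H, dword 401H, and word 803H, and is byte 6 of its quadword.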

Data Transfer Mechanism. All data transfers occur as a result of one or more bus cycles. Logical data operands of byte, word, dword, and quadword lengths may be transferred. Data may be accessed at any byte boundary, but two cycles may be required for misaligned data transfers. The Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte boundary to be misaligned. In addition, an 8-byte operand that crosses an 8-byte boundary is misaligned. The Pentium processor address signals are split into two components. The address lines A31-A3 provide the high-order address bits. The byte enables BE7#-BE0# form the low-order address and select the appropriate bytes of the 8-byte data bus. The byte enable outputs are asserted when their associated data bus bytes are involved with the present bus cycle. For both memory and I/O accesses, the byte enable outputs indicate which of the associated data bus bytes are driven valid for write cycles and on which bytes data is expected back for read cycles. Non-contiguous byte enable patterns will never occur. Because the data bus is 64 bits wide, special considerations need to be made for interfacing to 32-bit memory systems. Address bit 2, along with the appropriate byte enable signals, needs to be generated by external hardware.
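The misalignment rules and the byte enable encoding can be checked with a short sketch. Both function names are hypothetical. Byte enables are represented as an 8-bit integer in which bit i is the level of BEi#, so 0 means asserted. The split of a misaligned operand into two bus cycles is deliberately not modelled.

```python
def is_misaligned(addr: int, size: int) -> bool:
    """Apply the misalignment rules stated above for 2-, 4- and 8-byte operands."""
    if size in (2, 4):
        return (addr % 4) + size > 4   # crosses a 4-byte boundary
    if size == 8:
        return addr % 8 != 0           # crosses an 8-byte boundary
    return False                       # single bytes are always aligned


def byte_enables(addr: int, size: int) -> int:
    """Return the BE7#-BE0# levels (active low) for an aligned operand."""
    if is_misaligned(addr, size):
        raise ValueError("misaligned operands may need two bus cycles; not modelled")
    first = addr % 8
    active = sum(1 << i for i in range(first, first + size))
    return ~active & 0xFF              # invert: asserted byte enables read as 0
```

Because an operand occupies consecutive bytes, the asserted bits always form one contiguous run, which matches the statement that non-contiguous patterns never occur.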

Interfacing with 8-, 16-, 32-, and 64-Bit Memories. In 64-bit physical memories such as that shown in Fig. 4.17, each 8-byte qword begins at a byte address that is a multiple of eight. A31-A3 are used as an 8-byte qword select and BE7#-BE0# select individual bytes within the qword. Memories that are 32 bits wide require external logic for generating A2 and BE3#-BE0#. Memories that are 16 bits wide require external logic for generating A2, A1, BHE#, and BLE#. Memories that are 8 bits wide require external logic for generating A2, A1, and A0. All memory systems that are less than 64 bits wide require external byte swapping logic for routing data to the appropriate data lines. The Pentium processor expects all the data requested by the byte enables to be returned as one transfer (with one BRDY#), so byte assembly logic is required to return all requested bytes to the Pentium processor at one time. Note that the Pentium processor does not support BS8# or BS16# (or BS32#), so this logic must be implemented externally if necessary. Fig. 4.18 shows the Pentium processor address bus interface to 64-, 32-, 16-, and 8-bit memories. Address bits A2, A1, and A0 and BHE#, BLE#, and BE3'#-BE0'# are decoded. External byte swapping logic is needed on the data lines so that data is supplied to and received from the Pentium processor on the correct data pins. For memory widths smaller than 64 bits, byte assembly logic is needed to return all bytes of data requested by the Pentium processor in one cycle. Operand alignment and size dictate when two cycles are required for a data transfer. When multiple cycles are required to transfer a multi-byte logical operand, the highest order bytes are transferred first.
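For a 32-bit memory, the external logic has to turn one BE7#-BE0# pattern into an A2 value and a 4-bit byte enable group. The sketch below is one way to express that decoding. The function name is hypothetical, and the order in which the two halves are listed is an assumption, not something the text specifies.

```python
def to_32bit_accesses(be_n: int) -> list[tuple[int, int]]:
    """Map BE7#-BE0# levels (active low) to (A2, BE3#-BE0#) pairs.

    Two pairs mean the byte assembly logic must access the 32-bit memory
    twice before returning all requested bytes in one transfer.
    """
    accesses = []
    low = be_n & 0x0F            # BE3#-BE0#: lower dword, A2 = 0
    high = (be_n >> 4) & 0x0F    # BE7#-BE4#: upper dword, A2 = 1
    if low != 0x0F:              # at least one lower byte is requested
        accesses.append((0, low))
    if high != 0x0F:             # at least one upper byte is requested
        accesses.append((1, high))
    return accesses
```

A full 8-byte request (all byte enables asserted) yields two accesses, while a request confined to one half of the bus yields a single access with A2 selecting that half.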

Date: 2016-06-12
