Instruction-Level Parallelism

In planning the new EPIC architecture, Intel designers wanted to exploit the high level of instruction-level parallelism (ILP) found in application code. To accomplish this goal, they incorporated a powerful set of features such as control and data speculation, predication, register rotation, loop branches, and a large register file. Because these features deliver their benefit only when the generated code uses them, the compiler plays a crucial role in the overall performance of an IA-64 platform. The Electron code generator (ECG), which is designed to maximize the benefits of these features, is described here.

The ECG consists of multiple phases occurring in the order shown in Fig. 7.16. The first phase, translation, converts the optimizer's intermediate representation (IL0) of the program into the ECG IR. Predicate region formation, if conversion, and compare generation occur in the predication phase. The ECG contains two schedulers: the software pipeliner for targeted cyclic regions and the global code scheduler for all remaining regions. Both schedulers make use of control and data speculation. The software pipeliner also uses rotating registers, predication, and loop branches to generate efficient schedules for integer as well as floating-point loops.

The ECG's register allocator must handle several IA-64-specific issues. These include not-a-thing (NaT) bit maintenance during spill/fill, advanced load address table (ALAT) awareness for data-speculative registers, correct rotating register allocation for the software pipeliner, and predicate awareness. NaT bits are part of the control speculation mechanism. The ALAT is the hardware mechanism that enables data speculation.
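As a rough illustration of why the register allocator has to be ALAT-aware, the following toy model sketches how an advanced load, an intervening store, and a later check interact. It is a simplification under stated assumptions: the class `ToyALAT` and its method names are hypothetical, entries are keyed by register name and matched on exact addresses, and real hardware details such as partial address tags, access sizes, and table capacity are ignored.

```python
class ToyALAT:
    """Toy advanced load address table (hypothetical model, not the hardware)."""

    def __init__(self):
        # Maps a target register name to the address it was loaded from.
        self.entries = {}

    def advanced_load(self, reg, addr, memory):
        # An advanced load records (register, address) and returns the value.
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        # A store removes every entry whose address it overwrites.
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, reg, addr, memory, speculated):
        # If the entry survived, the speculated value is still valid.
        if self.entries.get(reg) == addr:
            return speculated, False
        # Otherwise the load must be redone (second item: reloaded flag).
        return memory[addr], True


memory = {100: 1, 200: 2}
alat = ToyALAT()
value = alat.advanced_load("r1", 100, memory)   # load hoisted above stores
alat.store(200, 9, memory)                      # no conflict
ok_value, reloaded_1 = alat.check_load("r1", 100, memory, value)
alat.store(100, 5, memory)                      # conflicting store
new_value, reloaded_2 = alat.check_load("r1", 100, memory, value)
```

In this sketch the entry is tied to the register that received the advanced load, which hints at why an allocator cannot freely spill or reassign a data-speculative register between the load and its check.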

The Intel IA-64 Compiler Code Generator.

Predication, or conditional execution of an instruction based on a predicate, is one of the key IA-64 architectural features. Using predication, the compiler can merge the execution of multiple control flow paths. This increases ILP by removing the penalty of mispredicted branches and nonsequential control flow in pipelined regions. In addition, predication increases code motion freedom by allowing instructions to be moved upward across branches in a nonspeculative manner and to be pushed downward into subsequent join blocks.
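To make the idea of merging control flow paths concrete, here is a minimal sketch that compares a branching function with an if-converted, straight-line version run by a toy interpreter. The names `run_predicated`, `IF_CONVERTED`, and the register names are hypothetical, and the instruction format is invented for illustration; it is not IA-64 assembly and not the ECG's internal representation.

```python
def abs_diff_branchy(a, b):
    # Original form: two control flow paths selected by a branch.
    if a < b:
        return b - a
    return a - b


def run_predicated(program, regs):
    """Run straight-line code; an instruction whose qualifying
    predicate is false is nullified (it has no effect)."""
    for qp, dest, fn in program:
        if qp is None or regs[qp]:
            regs[dest] = fn(regs)
    return regs


# If-converted form: no branch, both former paths are in one block.
# Each entry is (qualifying predicate or None, destination, operation).
# The two compare entries model a single compare that writes a
# complementary predicate pair.
IF_CONVERTED = [
    (None, "p1", lambda r: r["a"] < r["b"]),
    (None, "p2", lambda r: not r["p1"]),
    ("p1", "r", lambda r: r["b"] - r["a"]),   # former "then" path
    ("p2", "r", lambda r: r["a"] - r["b"]),   # former "else" path
]


def abs_diff_predicated(a, b):
    regs = {"a": a, "b": b, "p1": False, "p2": False, "r": 0}
    return run_predicated(IF_CONVERTED, regs)["r"]
```

Because the two guarded operations have no branch between them, a scheduler is free to place them in any order or in the same cycle, which is the code motion freedom the text describes.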

Region Formation and If Conversion. Predicate region formation selects a group of connected basic blocks (a predicate region) to be if-converted, that is, to have the control flow edges within the region removed using predicates. The basic characteristic of a selected predicate region is that if conversion should reduce the total number of static branches within it.

The predicate region selection criteria depend on the availability of dynamic profile information. Without dynamic profile feedback, the selection algorithm focuses on the availability of processor resources and the compatibility of individual critical paths. The algorithm avoids including basic blocks in the predicate region if they cause processor resource oversubscription or if they significantly increase the critical path through the region. With dynamic profile feedback, the selection criteria are extended to include the cost of branch misprediction and the weight of individual critical paths. The algorithm focuses on the branches that incur the largest misprediction penalties and chooses the surrounding blocks to form a predicate region.

Compare generation materializes all the necessary predicates through the use of predicate generation instructions. It computes control dependence information for all basic blocks within the predicate region and for the predicate region's exit basic blocks. A basic block B is control dependent on another basic block A when the condition computed in A dictates whether B gets executed. A controlling predicate, using the assigned virtual predicate name, is generated for all the controlled basic blocks within the predicate region and all the region exit basic blocks.
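The control dependence relation used by compare generation can be sketched with the standard postdominator-based definition: B is control dependent on A when B postdominates some successor of A but does not strictly postdominate A itself. The code below is a minimal sketch under that textbook definition; the function names and the dictionary-based CFG encoding are assumptions for illustration, and the source does not say how the ECG computes this information.

```python
def postdominators(succ, exit_node):
    """Iteratively compute postdominator sets for a CFG given as
    {block: [successor blocks]}. Assumes every block other than
    exit_node has at least one successor and can reach exit_node."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependence(succ, exit_node):
    """Return {block: set of blocks it is control dependent on}."""
    pdom = postdominators(succ, exit_node)
    deps = {n: set() for n in succ}
    for a in succ:
        if len(succ[a]) < 2:
            continue  # only blocks ending in a branch control others
        for s in succ[a]:
            for b in pdom[s]:  # b postdominates s (b may equal s)
                # b must not strictly postdominate a
                if b == a or b not in pdom[a]:
                    deps[b].add(a)
    return deps


# Diamond: A branches to B or C, both join at D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
deps = control_dependence(diamond, "D")
```

In the diamond, B and C each depend on A and would each receive a controlling predicate, while the join block D executes on every path and depends on nothing inside the region.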



Path collapsing merges the control flow paths within the predicate region into a minimal set of control flow paths. The merged basic blocks are physically placed next to each other. Each basic block within the predicate region is guarded with the correct virtual predicate name. A side exit from the predicate region occurs when the exiting basic block isn't contained within the predicate region. When there are identical branch targets outside the predicate region, these exit flow edges are merged to remove duplicates.

Predicate Optimizations. Predicate registers are virtually named and are materialized only when necessary. Predicate name assignment generates a virtual predicate name for all basic blocks within a function (Fig. 7.17). Basic blocks exhibiting identical control flow behaviors are assigned the same virtual predicate name. For every critical edge (an edge from a node with multiple successors to a node with multiple predecessors, such as the edge that stands in for a missing else-block in an if-then statement), predicate name assignment creates a nonvisible placeholder basic block and assigns it a virtual predicate name. Later, when the control flow changes, predicate name assignment creates new virtual predicate names.
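The two ideas in this paragraph, placeholder blocks on critical edges and shared names for blocks with identical control behavior, can be sketched as follows. This is an illustrative sketch only: the helper names, the `"A->D"` placeholder naming, the `vpN` name format, and the notion of a caller-supplied control dependence key are all assumptions, not the ECG's actual data structures.

```python
def split_critical_edges(succ):
    """Insert a placeholder block on every critical edge of a CFG
    given as {block: [successors]}. Returns (new CFG, placeholders)."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)

    new_succ = {n: list(targets) for n, targets in succ.items()}
    holders = []
    for n, targets in succ.items():
        if len(targets) < 2:
            continue  # source must have multiple successors
        for i, t in enumerate(targets):
            if len(preds[t]) >= 2:  # target has multiple predecessors
                holder = f"{n}->{t}"
                new_succ[n][i] = holder
                new_succ[holder] = [t]
                holders.append(holder)
    return new_succ, holders


def assign_predicate_names(cd_key):
    """Give blocks with equal control dependence keys the same
    virtual predicate name. cd_key maps block -> hashable key."""
    names = {}
    result = {}
    for block, key in cd_key.items():
        if key not in names:
            names[key] = f"vp{len(names)}"
        result[block] = names[key]
    return result


# if-then with a missing else-block: the edge A -> D is critical.
if_then = {"A": ["B", "D"], "B": ["D"], "D": []}
new_cfg, holders = split_critical_edges(if_then)

# Keys here are (controlling block, taken edge index) tuples.
vnames = assign_predicate_names(
    {"A": (), "B": (("A", 0),), "A->D": (("A", 1),), "D": ()}
)
```

After splitting, the placeholder `A->D` plays the role of the missing else-block, so both outcomes of the branch in A have a block that can carry a predicate name.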

Compare optimization examines the collapsed region for opportunities to insert, merge, and replace predicate-generating instructions so that the predicate region executes efficiently and correctly. The first step is to convert predicate-generating instructions to parallel compare semantics, if possible. This reduces the critical path through the predicate region. The remaining conditional compares are examined to discover unconditional compare opportunities. Converting a conditional compare into an unconditional compare removes the need for predicate initialization instructions. In general, predicate initialization instructions are inserted when a predicate register is not defined on all paths that reach a use of that register.

The decision to predicate is based on a combination of dynamic profile information, resource availability, and critical-path length compatibility. Merging the control flow paths accomplishes two tasks. First, the unbiased conditional branch is eliminated, leaving only a highly biased conditional branch. Second, a larger basic block is formed from otherwise small basic blocks. This offers more opportunity for ILP to fill the issue bandwidth and hide long-latency instructions.
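The difference between conditional, unconditional, and parallel compares can be illustrated with a toy model of a predicate register file. The function names below are hypothetical and the model is simplified, but the behavior follows the IA-64 compare types: a normal compare writes nothing when its qualifying predicate is false, an unconditional compare clears both targets in that case, and an or-type parallel compare only ever writes 1.

```python
def cmp_normal(preds, qp, p_true, p_false, result):
    # Conditional compare: targets are untouched when qp is false,
    # so they may hold stale values unless initialized beforehand.
    if preds[qp]:
        preds[p_true] = result
        preds[p_false] = not result


def cmp_unc(preds, qp, p_true, p_false, result):
    # Unconditional compare: both targets are cleared when qp is
    # false, so no separate initialization is needed.
    preds[p_true] = preds[qp] and result
    preds[p_false] = preds[qp] and not result


def cmp_or(preds, qp, target, result):
    # Or-type parallel compare: sets the target only when the
    # condition holds. Several of these can write the same target
    # in any order, or at once, since they all write the same value.
    if preds[qp] and result:
        preds[target] = True


# Nested if: the inner compare is guarded by p1, which is false here.
stale = {"p0": True, "p1": False, "p3": True, "p4": True}
cmp_normal(stale, "p1", "p3", "p4", True)      # p3/p4 keep stale values

clean = {"p0": True, "p1": False, "p3": True, "p4": True}
cmp_unc(clean, "p1", "p3", "p4", True)         # p3/p4 cleared

# p5 = (a or b), with a false and b true; p5 must start at False.
par = {"p0": True, "p5": False}
for cond in (False, True):
    cmp_or(par, "p0", "p5", cond)
```

Here `p0` stands for the always-true predicate. Note that the or-type form trades one cost for another: it shortens the dependence chain but requires its target to be initialized first, which is consistent with the text's point that initialization instructions are a cost the optimizer tries to avoid where an unconditional compare will do.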

Predicate Query System. The predicate query system (PQS) is a predicate relational database accessed by later phases of the code generator. It contains information such as predicate disjointness, predicate dominance and postdominance, predicate promotion, and predicate addition and subtraction. Without accurate predicate information, the scheduler, software pipeliner, and register allocator must make conservative decisions that result in suboptimal code.
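A disjointness query of the kind PQS answers can be sketched by brute force: enumerate every combination of branch conditions and record where each predicate is true. The class `ToyPQS`, its methods, and the way predicates are supplied as Python functions are all assumptions made for illustration; the source does not describe how the real PQS represents or derives these relations.

```python
from itertools import product


class ToyPQS:
    """Toy predicate query system (hypothetical; exhaustive truth table)."""

    def __init__(self, conditions, predicates):
        # conditions: list of condition names
        # predicates: {name: function(dict of condition values) -> bool}
        worlds = [
            dict(zip(conditions, values))
            for values in product([False, True], repeat=len(conditions))
        ]
        # For each predicate, the set of "worlds" in which it is true.
        self.true_in = {
            name: frozenset(i for i, w in enumerate(worlds) if fn(w))
            for name, fn in predicates.items()
        }

    def disjoint(self, p, q):
        # p and q are never true at the same time.
        return not (self.true_in[p] & self.true_in[q])

    def implies(self, p, q):
        # Whenever p is true, q is also true.
        return self.true_in[p] <= self.true_in[q]


pqs = ToyPQS(
    ["a", "b"],
    {
        "p1": lambda w: w["a"],
        "p2": lambda w: not w["a"],
        "p3": lambda w: w["a"] and w["b"],
        "p4": lambda w: w["a"] and not w["b"],
    },
)
```

A client phase could use such answers to avoid conservative choices: for example, two values that are live only under disjoint predicates such as `p3` and `p4` never interfere, so in principle they could share one register.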


Date: 2016-06-12

