Global Code Scheduler and Register Allocation

GCS is capable of scheduling the code in any acyclic control flow subgraph. No restrictions are placed on the number of entries into or exits from the subgraph. GCS may generally move code in either direction across a visible edge. To schedule loops, GCS removes back edges to tender the subgraph acyclic. After the region of code is scheduled, it's abstracted away by nesting it. Thus, after scheduling code in an inner loop, the nested loop is represented as a single node containing summary data flow information. This information enables code motion across the nested inner loop. The code schedule in the inner loop is not disturbed when scheduling the outer loop. Also, GCS takes into account all latencies for live-in and live-out values of the inner loop.

To support global analysis in the presence of control flow within the scheduling region, a novel path-based data dependence representation is used. This representation merges control flow and data dependence information in one structure.

Wavefront Scheduling and Deferred Compensation Code.Compensation is duplicated code that needs to be inserted when moving code up across control flow joins or down below control flow splits. Empty blocks called join-split (JS) blocks are added on each edge that both starts from a node with multiple successors (split) and ends at a node with multiple predecessors (join). These are placeholders for scheduling compensation code. Blocks called interface blocks are created for this same purpose at side entries/exits, that is, blocks that conditionally enter or exit the region. Adding JS and interface blocks to an acyclic region forms a scheduling region. We define a wavefront in the scheduling region as a strongly independent cut set. Fig. 7.18 shows an acyclic scheduling region and all possible wavefronts in it.

In wavefront scheduling, the wavefront is the partition between scheduled and unscheduled code. If scheduling top down, the nodes above the wavefront are already scheduled, and the schedule in these nodes won't be changed (unless we resort to back tracking). The nodes below the wavefront don't contain scheduled code. The wavefront represents the set of blocks into which instructions are currently being scheduled. The candidates for scheduling into a block on the wavefront include unscheduled data-ready instructions originating from the same block or another block that is either above or below the wavefront. In a top-down scheduling scheme, the wavefront first passes through the region entry blocks. When code scheduling into a block is completed, the scheduler declares it dosed and advances the wavefront across it. This block now lies above the wavefront and represents a fully scheduled block. Thus the wavefront advances down the region until it finally passes through all the exit nodes in the region. When block B is closed, any unscheduled instructions from B, or above, are implicitly moved below B to be scheduled later. Hence before declaring a block closed, the scheduler examines both the correctness and profitability of such downward code motion.

Fig. 7.18 numbers the wavefronts to show a wavefront's advancement in a top-down scheduling scheme. Note that there can be multiple alternatives to the way the wavefront is advanced, as shown in the figure by W2a and W2b.

When performing an upward or downward code motion that requires code duplication, GCS postpones the generation of this compensation code until it is actually scheduled. In doing this, scheduling freedom is provided for the compensation code since its destination block is not fixed a priori. At most one copy of the instruction is executed on any path through the region.

Fig. 7.19 illustrates how this is done. When the wavefront is Wl, copy I' of instruction I originating from block G is first scheduled in block F. This motion requires compensation since F doesn't dominate G. Instead of immediately generating and inserting the compensation code for I in block B, instruction I is left behind in block G along with some bookkeeping information that indicates that it remains to be scheduled along control flow paths ABDEG and ABCEG. The bookkeeping data also indicates that the compensation code must be scheduled on or before E to avoid execution of I (or its copies) multiple times along path AFG.

The scheduling heuristics may not select I for scheduling in block B if it isn't an important enough candidate in B. Assume that this is the case and that the wavefront is advanced to W2. It is then selected for scheduling in block C, so copy I" of instruction I originating from block G is scheduled into block C. Since F and C don't collectively dominate G ? the block of origin of I ?- this scheduling choice requires compensation. Now the remaining instruction I in G represents the compensation that needs to be scheduled in block D. To avoid multiple executions of I along path ABCEG, advancement of the wavefront past W2 is constrained until I is finally scheduled in block D [7].

Speculation and Prediction.GCS selects an instruction to schedule into block B from a candidate list of data-ready instructions. These instructions may originate from any block in the region connected to B by at least one control flow path. The speculation support in IA-64 allows many dependencies to be ignored when determining data readiness. Control dependencies and data dependencies on qualifying predicates can be broken using control speculation. Data speculation allows unlikely memory dependencies to be broken.

GCS selects the best candidate from all the speculative and nonspeculative candidates based on a cost-benefit analysis. Among other things, this analysis takes into consideration the instruction's global critical-path length, its resource requirements, and its speculation and code duplication costs. Speculation check instructions and recovery code are generated as a byproduct of speculation; load safety information avoids unnecessary control speculation.

Global Register Allocation.The GRA in the IA-64 compiler is a region-based Chaitin/Briggs-style graph-coloring scheme. Special consideration for IA-64 features must be taken into account in the design.

Each general register has an associated NaT bit. When the register is spilled, the NaT bit is stored in the UNAT application register. The spill address determines the bit location. To obey this association, the register allocator spills registers draft may contain speculated values into contiguous memory. In addition, since the UNAT is a 64-bit register, after 64 registers are spilled, the UNAT itself must be spilled. This mechanism requires extra bookkeeping for the register allocator.

Rotating Registers.Traditional register allocators don't support register rotation. However, live ranges that span multiple pipelined loop stages can benefit from the use of rotating registers. A special allocator within the software pipeliner allocates these live ranges. The GRA then allocates the remaining live ranges. The use of data speculation often increases register lifetimes.

In the presence of predication, virtual registers liveness can be a function of multiple predicates because the qualifying predicate of an instruction guards its definitions and uses. Traditionally, liveness is represented as a bit vector. However, in our predicate-aware register allocator, it's represented as a bit matrix with predicates being the additional dimension.

We designed the ECG compiler to exploit the features provided in the IA-64 architecture. Many novel techniques were devised and implemented to make good use of predication, control and data speculation, rotating registers, and the register stack engine.

As IA-64 is a new architecture, we'll learn much more about how to more effectively use these features over the next several years. Currently, the ECG compiler provides high performance and a solid base upon which to further leverage IA-64 strengths.

Date: 2016-06-12; view: 269

<== previous page	\|	next page ==>
Instruction-Level Parallelism	\|	The 16-Way Itanium Server

doclecture.net - lectures - 2014-2026 year. Copyright infringement or personal data (0.072 sec.)