4.5.21 -unroll, -ur, (-unroll=4), -unroll2, -ur2, (-unroll2=160), -unroll3, -ur3, (-unroll3=20)

The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls scalar inner loops. Unrolled loops often execute more efficiently: fewer iterations with more work per iteration require less loop-control overhead. KAP unrolls a loop until either the number of unrolled iterations reaches the value given by the -unroll switch, or the amount of "work" in each iteration reaches the value given by the -unroll2 switch.


Note
If you use kapf with the Digital Fortran compiler optimization level set to -O5, you should turn off loop unrolling by setting -unroll=1 and -notransform_loops. See Section 2.7 and Section 3.4 for more information about kapf.

Outer loop unrolling is a part of memory management and is not controlled by these switches.

The -scalaropt=2 level is required to enable loop unrolling.

The syntax for -unroll and -unroll2 is as follows:

There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.

The default (4,100) means that the maximum number of iterations to unroll is 4 and that the maximum amount of work is 100.

Loop overhead is reduced by performing more iterations from the original loop for each pass through the new loop, but the gain is less with each additional unrolled iteration. Eventually, the cost in extra memory exceeds the gain from unrolling. The -unroll switch sets a maximum number of iterations to unroll.


Note
When the total number of iterations to be executed by the loop is constant, KAP searches for a number of iterations to unroll that is near the -unroll value and which exactly divides the iteration count. This avoids having extra iterations left over, which must be handled separately and generate extra code. The range over which KAP searches for an exact divisor is the -unroll value plus or minus 25%.
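The divisor search described in this note can be sketched as a small stand-alone program. This is an illustration only, not KAP's actual code. The names NITER, UNR, LO, HI, and BEST are hypothetical, and so are three details the note does not specify: how the 25% is rounded, which divisor is preferred when several qualify, and what happens when none is found.

```fortran
      PROGRAM DIVSRC
      IMPLICIT NONE
      INTEGER NITER, UNR, LO, HI, K, BEST
! Hypothetical inputs: a constant trip count and the -unroll value
      NITER = 100
      UNR   = 8
! Search window: the -unroll value minus/plus 25 percent.
! Assumption: the 25 percent is truncated to an integer.
      LO = MAX(1, UNR - UNR/4)
      HI = UNR + UNR/4
! BEST = 0 means that no exact divisor has been found yet
      BEST = 0
      DO 10 K = LO, HI
         IF (MOD(NITER, K) .EQ. 0) THEN
            IF (BEST .EQ. 0) THEN
               BEST = K
            ELSE IF (ABS(K - UNR) .LT. ABS(BEST - UNR)) THEN
! Assumption: prefer the divisor closest to the -unroll value
               BEST = K
            END IF
         END IF
   10 CONTINUE
! Assumption: fall back to the -unroll value if nothing divides
      IF (BEST .EQ. 0) BEST = UNR
      PRINT *, 'Search window: ', LO, HI
      PRINT *, 'Chosen factor: ', BEST
      END
```

With 100 iterations and -unroll=8, the window in this sketch is 6 through 10, and 10 is the only value in it that divides 100 exactly, so the sketch selects 10 and no leftover iterations remain.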

To use the "work per unrolled iteration" limit, KAP analyzes a given loop by computing an estimate of the computational work that is inside the loop for ONE iteration. This rough estimate is based on the following criteria:


# of assignments +
# of IF statements +
# of subscripts +
# of arithmetic operations

For the following example, the user has specified 8 for the maximum number of iterations to unroll (-unroll=8) and 100 for the maximum "work per unrolled iteration" (-unroll2=100):

      DO 10 I = 2,N
         A(I) = B(I)/A(I-1)
   10 CONTINUE

This example has:

     1 assignment
     0 IFs
     3 subscripts
     2 arithmetic operators
     ---------
     6 is the weighted sum (the work for 1 iteration)

The -unroll2 limit of 100 is then divided by this weighted sum, giving a potential unrolling factor of 16 (100/6, truncated to an integer). However, because the user has also specified 8 for the maximum number of unrolled iterations, KAP takes the minimum of 8 and 16. Therefore, KAP will unroll only 8 iterations. The maximum number of iterations that KAP will unroll is 100. If the user requests more than that, NO warning will be given.
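The arithmetic of this example can be reproduced with a short program. It is a sketch of the calculation as described in this section, not KAP's implementation. All variable names are hypothetical, and the lower bound of 1 on the work-based factor is an assumption that the text does not state.

```fortran
      PROGRAM UFACT
      IMPLICIT NONE
      INTEGER NASSGN, NIFS, NSUBS, NOPS
      INTEGER MAXUNR, MAXWRK, WORK, BYWORK, FACTOR
! Counts for the loop body  A(I) = B(I)/A(I-1)
      NASSGN = 1
      NIFS   = 0
      NSUBS  = 3
      NOPS   = 2
! Switch values from the example: -unroll=8 and -unroll2=100
      MAXUNR = 8
      MAXWRK = 100
! Work estimate for one iteration (the sum of the four counts)
      WORK = NASSGN + NIFS + NSUBS + NOPS
! Integer division truncates, so 100/6 gives 16.
! Assumption: the work-based factor is never allowed below 1.
      BYWORK = MAX(1, MAXWRK / WORK)
! Take the smaller limit, capped at the documented maximum of 100
      FACTOR = MIN(MAXUNR, BYWORK, 100)
      PRINT *, 'Work per iteration:  ', WORK
      PRINT *, 'Limit from -unroll2: ', BYWORK
      PRINT *, 'Unrolling factor:    ', FACTOR
      END
```

For the example loop the three values are 6, 16, and 8. Raising MAXUNR to 20 while leaving MAXWRK at 100 would make the work limit the deciding factor, and the result would be 16.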

Because the number of iterations is unknown in this case, KAP will generate two loops: the primary unrolled loop, and a cleanup loop that executes any iterations left over when the iteration count is not a multiple of the unrolling factor. The result is the following:

      DO 11 I=2,N-7,8
         A(I) = B(I) / A(I-1)
         A(I+1) = B(I+1) / A(I)
         A(I+2) = B(I+2) / A(I+1)
         A(I+3) = B(I+3) / A(I+2)
         A(I+4) = B(I+4) / A(I+3)
         A(I+5) = B(I+5) / A(I+4)
         A(I+6) = B(I+6) / A(I+5)
         A(I+7) = B(I+7) / A(I+6)
   11 CONTINUE
      DO 2 I=I,N,1
         A(I) = B(I) / A(I-1)
    2 CONTINUE
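One way to confirm that the unrolled form is equivalent to the original is to run both on the same data and compare the results. The following test harness is a sketch: the array size N = 30, the data in B, the second result array C, and the tolerance are all hypothetical choices made for illustration.

```fortran
      PROGRAM UNRCHK
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N = 30)
      REAL A(N), B(N), C(N)
      INTEGER I, NBAD
! Hypothetical test data; B, A(1), and C(1) are kept positive
! so that no division by zero can occur.
      DO 5 I = 1, N
         B(I) = REAL(I) + 1.0
    5 CONTINUE
      A(1) = 1.0
      C(1) = 1.0
! Original loop, writing its results to A
      DO 10 I = 2, N
         A(I) = B(I) / A(I-1)
   10 CONTINUE
! Unrolled loop followed by the cleanup loop, writing to C
      DO 11 I = 2, N-7, 8
         C(I)   = B(I)   / C(I-1)
         C(I+1) = B(I+1) / C(I)
         C(I+2) = B(I+2) / C(I+1)
         C(I+3) = B(I+3) / C(I+2)
         C(I+4) = B(I+4) / C(I+3)
         C(I+5) = B(I+5) / C(I+4)
         C(I+6) = B(I+6) / C(I+5)
         C(I+7) = B(I+7) / C(I+6)
   11 CONTINUE
! I now holds the first index the unrolled loop did not process
      DO 2 I = I, N, 1
         C(I) = B(I) / C(I-1)
    2 CONTINUE
! Count elements that differ beyond a small relative tolerance
      NBAD = 0
      DO 20 I = 1, N
         IF (ABS(A(I) - C(I)) .GT. 1.0E-6 * ABS(A(I))) THEN
            NBAD = NBAD + 1
         END IF
   20 CONTINUE
      PRINT *, 'Mismatched elements: ', NBAD
      END
```

With N = 30, the unrolled loop handles indices 2 through 25 in three passes and the cleanup loop handles 26 through 30, so all 29 iterations of the original loop are covered and the program should report 0 mismatched elements.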

Additional examples are in Chapter 8.

The -unroll3=n switch sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as for -unroll2), the loop will not be unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. Leave this switch at 1, the default.

