The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls scalar inner loops. Loop execution is often more efficient when loops are unrolled, because fewer iterations with more work per iteration require less loop-control overhead. KAP unrolls a loop until either it has been unrolled the number of times given by the -unroll switch, or the amount of "work" in each unrolled iteration reaches the value given by the -unroll2 switch.
If you use kapf with the DEC Fortran compiler optimization level set to -O5, you should turn off loop unrolling by setting -unroll=1 and -notransform_loops. See Section 2.7 and Section 3.4 for more information about kapf.
Outer loop unrolling is a part of memory management and is not controlled by these switches.
The -scalaropt=2 level is required to enable loop unrolling.
The syntax for -unroll, -unroll2, and -unroll3 is as follows:
   -unroll=<#it>   or   -unroll2=<weight>   or   -unroll3=<weight>
   -u=<#it>, -ur=<#it>, -ur2=<weight>, -ur3=<weight>

where <#it> is the maximum number of iterations to unroll. Two settings have special meanings:

   =0 means use the default value.
   =1 means no unrolling.

<weight> is the maximum (-unroll2) or minimum (-unroll3) weight in an unrolled loop. <weight> is estimated by counting operands and operators in a loop. The amount of work in each loop iteration is shown in the loop table in the annotated listing.
There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.
The defaults, -unroll=4 and -unroll2=100, mean that the maximum number of iterations to unroll is 4 and that the maximum amount of work in an unrolled iteration is 100.
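As a rough model of how the two limits combine, the following free-form Fortran sketch computes an unrolling factor from the -unroll and -unroll2 settings and the estimated work for one iteration. It is an illustration under stated assumptions, not KAP's own code: the module and function names are hypothetical, a setting of 0 is assumed to select the default (4 or 100) for both switches, the work ratio is assumed to use integer (truncating) division, which matches the 100/6 = 16 figure in the example later in this section, and a result below 1 is assumed to mean "do not unroll".

```fortran
! Hypothetical model of how KAP combines -unroll and -unroll2.
! Not KAP source code; names and details are illustrative assumptions.
module unroll_model
  implicit none
  integer, parameter :: default_unroll  = 4    ! default -unroll value
  integer, parameter :: default_unroll2 = 100  ! default -unroll2 value
  integer, parameter :: max_factor      = 100  ! KAP never unrolls more than 100
contains
  ! Returns the number of original iterations placed in each pass of the
  ! unrolled loop; 1 means the loop is left alone.
  integer function unroll_factor(unroll, unroll2, work) result(factor)
    integer, intent(in) :: unroll   ! -unroll=<#it>      (0 = use default)
    integer, intent(in) :: unroll2  ! -unroll2=<weight>  (0 = use default, assumed)
    integer, intent(in) :: work     ! estimated work for ONE iteration (> 0)
    integer :: max_it, max_work

    max_it = unroll
    if (max_it == 0) max_it = default_unroll
    max_work = unroll2
    if (max_work == 0) max_work = default_unroll2

    ! Smallest of the iteration limit, the work limit, and the hard cap;
    ! requests above 100 are silently clamped (KAP gives no warning).
    factor = min(max_it, max_work / work, max_factor)
    ! Assumption: if one iteration already exceeds the work limit, do not unroll.
    factor = max(factor, 1)
  end function unroll_factor
end module unroll_model
```

With the values from the example below (-unroll=8, -unroll2=100, and a work estimate of 6), unroll_factor(8, 100, 6) returns min(8, 16, 100) = 8.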
Loop overhead is reduced by performing more iterations of the original loop in each pass through the new loop, but the gain diminishes with each additional unrolled iteration. Eventually, the cost in extra memory exceeds the gain from unrolling. The -unroll switch sets a maximum number of iterations to unroll.
When the iteration count is known, KAP looks for an unrolling factor near the -unroll value that exactly divides the iteration count. This avoids leftover iterations, which must be handled separately and generate extra code. The range over which KAP searches for an exact divisor is the -unroll value plus or minus 25%.
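The exact-divisor search can be modeled as follows. This free-form Fortran sketch is hypothetical: the manual does not say how the plus-or-minus 25% bounds are rounded or which divisor wins when several fall in the range, so this version assumes the bounds are rounded inward, prefers the divisor closest to the -unroll value (the smaller one on a tie), and falls back to the -unroll value itself, leaving work for a cleanup loop, when no exact divisor exists.

```fortran
! Hypothetical model of the exact-divisor search; not KAP source code.
module divisor_model
  implicit none
contains
  ! Picks an unrolling factor within +/-25% of UNROLL that divides TRIP_COUNT
  ! exactly, so no cleanup loop is needed. Returns UNROLL unchanged if no
  ! such divisor exists (a cleanup loop then handles the leftover iterations).
  integer function exact_divisor_factor(unroll, trip_count) result(best)
    integer, intent(in) :: unroll, trip_count   ! both assumed > 0
    integer :: lo, hi, k

    lo = max(1, (3*unroll + 3) / 4)   ! ceiling(0.75 * unroll)
    hi = (5*unroll) / 4               ! floor(1.25 * unroll)
    best = unroll
    do k = lo, hi
      if (mod(trip_count, k) /= 0) cycle
      ! Assumption: prefer the divisor closest to UNROLL; on a tie the
      ! smaller one wins because it is found first.
      if (mod(trip_count, best) /= 0 .or. &
          abs(k - unroll) < abs(best - unroll)) best = k
    end do
  end function exact_divisor_factor
end module divisor_model
```

For example, with -unroll=8 and a trip count of 100, the search range is 6 through 10, and exact_divisor_factor(8, 100) returns 10, the only divisor of 100 in that range. With a trip count of 97 no divisor exists, so the factor stays at 8.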
To apply the "work per unrolled iteration" limit, KAP computes a rough estimate of the computational work inside the loop for ONE iteration. The estimate is a weighted sum of items such as assignments, IF statements, subscripts, and arithmetic operators.
In the following example, the user has specified 8 for the maximum number of iterations to unroll (-unroll=8) and 100 for the maximum "work per unrolled iteration" (-unroll2=100):
      DO 10 I = 2,N
         A(I) = B(I)/A(I-1)
   10 CONTINUE
This example has:

      1   assignment
      0   ifs
      3   subscripts
      2   arithmetic operators
    ---
      6   is the weighted sum (the work for 1 iteration)
Dividing 100 by this weighted sum gives a potential unrolling factor of 16 (100/6, truncated to a whole number). However, because the user has also specified 8 as the maximum number of unrolled iterations, KAP takes the smaller of 8 and 16 and therefore unrolls only 8 iterations. The maximum number of iterations that KAP will unroll is 100; if the user requests more than that, NO warning is given.
In this case the number of iterations is unknown at compile time, so KAP generates two loops: the primary unrolled loop, which covers a number of original iterations that is a multiple of the unrolling factor, and a cleanup loop that executes any remaining iterations. The result is the following:
      DO 11 I=2,N-7,8
         A(I) = B(I) / A(I-1)
         A(I+1) = B(I+1) / A(I)
         A(I+2) = B(I+2) / A(I+1)
         A(I+3) = B(I+3) / A(I+2)
         A(I+4) = B(I+4) / A(I+3)
         A(I+5) = B(I+5) / A(I+4)
         A(I+6) = B(I+6) / A(I+5)
         A(I+7) = B(I+7) / A(I+6)
   11 CONTINUE
      DO 2 I=I,N,1
         A(I) = B(I) / A(I-1)
    2 CONTINUE
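To check that the unrolled loop and its cleanup loop compute exactly what the original DO 10 loop computes, the following free-form Fortran sketch runs both versions and compares the results. The module and subroutine names (unroll_check, original_loop, unrolled_loop) are hypothetical. The cleanup loop starts from the value I holds when the unrolled loop finishes, as in the generated DO 2 loop; here that value is copied into a separate variable for readability.

```fortran
! Check that the unrolled loop plus cleanup loop computes the same values
! as the original loop. Module and routine names are illustrative.
module unroll_check
  implicit none
contains
  ! Original loop: DO 10 I = 2,N / A(I) = B(I)/A(I-1)
  subroutine original_loop(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in)    :: b(n)
    integer :: i
    do i = 2, n
      a(i) = b(i) / a(i-1)
    end do
  end subroutine original_loop

  ! Same computation unrolled by 8, followed by a cleanup loop for the
  ! 0 to 7 iterations left over when N-1 is not a multiple of 8.
  subroutine unrolled_loop(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in)    :: b(n)
    integer :: i, first
    do i = 2, n-7, 8
      a(i)   = b(i)   / a(i-1)
      a(i+1) = b(i+1) / a(i)
      a(i+2) = b(i+2) / a(i+1)
      a(i+3) = b(i+3) / a(i+2)
      a(i+4) = b(i+4) / a(i+3)
      a(i+5) = b(i+5) / a(i+4)
      a(i+6) = b(i+6) / a(i+5)
      a(i+7) = b(i+7) / a(i+6)
    end do
    ! On exit, i holds the first iteration the unrolled loop did not run
    ! (2 if the unrolled loop executed zero times).
    first = i
    do i = first, n
      a(i) = b(i) / a(i-1)
    end do
  end subroutine unrolled_loop
end module unroll_check
```

Because both versions perform the same divisions in the same order, the results should match bit for bit for any N, including values of N for which the unrolled loop never executes and only the cleanup loop runs.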
Additional examples are in Chapter 8.
The -unroll3=n switch sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as -unroll2), the loop is not unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. Leave this switch at 20, the default; a smaller value could result in a program that executes more slowly.
Copyright © Digital Equipment Corporation. 1997.
All Rights Reserved.