Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide

5.6.8 -fpregisters, -fpr, (-fpregisters=32)

The -fpregisters switch specifies the number of single-precision (that is, ordinary) floating-point registers that each processor has.

5.6.9 -fuse, -nfuse, (-nofuse)

The -fuse switch tells KAP to perform loop fusion. Loop fusion is a conventional compiler optimization that transforms two adjacent loops into a single loop. Because KAP uses data dependence tests, it can fuse more loops than standard techniques allow. Before KAP can perform loop fusion, you must specify the switch -scalaropt=2 or -optimize=5.
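The following Fortran 90 module is a hand-written illustration of what loop fusion does. It is not actual KAP output, and the module and routine names are hypothetical. The module contains two adjacent loops over the same index range and the single loop that fusing them produces. Fusion is legal here because the second loop reads only A(I), which the fused loop has already computed earlier in the same iteration.

```fortran
module fusion_demo
  implicit none
contains
  ! Before fusion: two adjacent loops with identical iteration spaces.
  subroutine separate_loops(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), c(n)
    integer :: i
    do i = 1, n
      a(i) = 2.0 * b(i)
    end do
    do i = 1, n
      c(i) = a(i) + 1.0
    end do
  end subroutine separate_loops

  ! After fusion: one loop, one pass over the index range.  Legal because
  ! c(i) depends only on a(i), computed earlier in the same iteration.
  subroutine fused_loop(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), c(n)
    integer :: i
    do i = 1, n
      a(i) = 2.0 * b(i)
      c(i) = a(i) + 1.0
    end do
  end subroutine fused_loop
end module fusion_demo
```

Both routines produce the same A and C. The fused version makes one pass over the data instead of two, so A(I) can be reused while it is still in a register or in cache.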

5.6.10 -fuselevel, (-fuselevel=0)

The -fuselevel switch further controls the level of loop fusion. (Whenever you set -fuselevel, KAP automatically sets -fuse.)

The possible settings for this option are the following:

0 --- KAP performs standard fusion techniques. This is the default.
1 --- This setting instructs the fusion pass to move nonadjacent loops to adjacent positions. This movement is validated using a test based on data dependence. If the movement is successful, KAP attempts to fuse the repositioned loops.
2 --- This setting instructs the fusion pass to attempt loop-iteration space reversal and loop peeling to provide additional opportunities to fuse loops. Both of these transformations are validated by data dependence tests. Reversing the iteration space of adjacent loops so that their index sets match can allow loops to fuse that would not fuse otherwise. Loop peeling removes iterations from a loop so that adjacent loops have the same iteration space.
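The following sketch illustrates the loop-peeling case of -fuselevel=2. It is hand-written rather than KAP output, and the names are hypothetical. The first loop runs from 1 to N and the second from 2 to N, so the two loops cannot be fused directly. Peeling iteration 1 off the first loop gives both loops the range 2 to N, and they can then be fused. Fusion remains legal because C(I) reads A(I) and A(I-1), and both values have already been computed when the fused body reaches that statement.

```fortran
module peel_demo
  implicit none
  ! Both routines assume n >= 1.
contains
  ! Before: adjacent loops over different ranges (1:n and 2:n).
  subroutine before_peel(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), c(n)
    integer :: i
    c(1) = 0.0
    do i = 1, n
      a(i) = b(i) * b(i)
    end do
    do i = 2, n
      c(i) = a(i) - a(i-1)
    end do
  end subroutine before_peel

  ! After: iteration 1 of the first loop is peeled off, which leaves two
  ! loops over 2:n.  Those two loops are then fused into one.
  subroutine after_peel_and_fuse(a, b, c, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), c(n)
    integer :: i
    c(1) = 0.0
    a(1) = b(1) * b(1)          ! the peeled iteration
    do i = 2, n
      a(i) = b(i) * b(i)
      c(i) = a(i) - a(i-1)
    end do
  end subroutine after_peel_and_fuse
end module peel_demo
```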

5.6.11 -generateh

KAP needs two passes to resolve Fortran 90 forward declarations. The first pass, the generateh pass, builds the information needed to analyze the program for forward references.

KAP automatically sets the -generateh switch correctly for you. Compaq recommends that you do not set the -generateh switch.

5.6.12 -hdir, -hd, (-hdir=current_directory)

The -hdir=directoryname switch specifies the name of the directory where the KAP -generateh pass stores the temporary files containing information about forward references. The -useh switch picks up the information from that directory. The default is the current directory.

KAP automatically sets the -hdir switch for you. Compaq recommends that you do not set the -hdir switch.

5.6.13 -heaplimit, -heap, (-heaplimit=100)

KAP may require large amounts of memory to process your source code. The -heaplimit switch specifies the maximum size, in megabytes, to which the KAP heap can grow. If this limit is reached, KAP stops processing your source code and tries to exit with an out-of-memory error message.

If you choose a -heaplimit setting that is greater than the amount of memory available on your machine, KAP may run out of memory before it reaches the -heaplimit. In that case, KAP relies on the operating system to report that the process has run out of memory. Some operating systems kill KAP without first reporting that there is insufficient memory, and KAP may then stop processing your code and exit in an undefined manner. Setting -heaplimit below the available memory makes a graceful exit more likely.

5.6.14 -hoist_loop_invariants, -hli, (-hoist_loop_invariants=1)

The -hoist_loop_invariants switch controls the hoisting of loop-invariant expressions out of loops. This switch is independent of the -each_invariant_if_growth and -max_invariant_if_growth switches, which control the floating of invariant IFs out of loops. The possible settings for -hoist_loop_invariants are the following:

0 --- Turns off the hoisting of invariant code from loops.
1 --- Hoists (floats) all loop-invariant expressions that are not under the control of an IF-structure within the given loop nest. This is the default setting.
2 --- Same as 1, except that KAP does not create a zero-trip IF statement to guard the loop. That guard protects array references that are potentially out of bounds once they are floated outside the loop. This setting can give a slight performance increase at the expense of a possible runtime error.
3 --- Floats all loop-invariant expressions from the loop structure.

If there is invariant code that is protected by an IF-structure and the hoisting value is less than 3, then KAP generates the following message in the output listing:

An invariant expression not hoisted because -hoist_loop_invariants < 3
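The following hand-written sketch shows what hoisting a loop-invariant expression means. It is not KAP output, and the names are hypothetical. The product X*Y*C(K) does not change inside the loop, so it can be computed once into a temporary. The IF (N >= 1) guard plays the role of the zero-trip IF described for setting 1: if the loop would not execute at all, C(K) is never referenced, even when K is out of bounds. Setting 2 omits that guard.

```fortran
module hoist_demo
  implicit none
contains
  ! Before hoisting: x*y*c(k) is recomputed on every iteration.
  subroutine not_hoisted(a, b, c, n, m, k, x, y)
    integer, intent(in) :: n, m, k
    real, intent(in)    :: b(n), c(m), x, y
    real, intent(inout) :: a(n)
    integer :: i
    do i = 1, n
      a(i) = b(i) + x * y * c(k)
    end do
  end subroutine not_hoisted

  ! After hoisting (as with setting 1): the invariant is computed once,
  ! inside a zero-trip guard, so c(k) is read only if the loop body
  ! would have executed at least once.
  subroutine hoisted(a, b, c, n, m, k, x, y)
    integer, intent(in) :: n, m, k
    real, intent(in)    :: b(n), c(m), x, y
    real, intent(inout) :: a(n)
    integer :: i
    real    :: t
    if (n >= 1) then
      t = x * y * c(k)
      do i = 1, n
        a(i) = b(i) + t
      end do
    end if
  end subroutine hoisted
end module hoist_demo
```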

5.6.15 -interleave, -intl, (-interleave)

The -interleave switch controls loop unrolling and rescheduling by turning on interleaved unrolling. Interleaved unrolling can help the compiler recognize quad-word loads and stores, which are more efficient than ordinary loads and stores. Interleaved unrolling works in two steps. First, KAP unrolls the loop as in ordinary loop unrolling. Second, it interchanges statements in the loop, where possible, so that references to the same array are adjacent to each other.

The following example demonstrates interleaved unrolling:

      REAL A(100), B(100)
      DO I = 1, 100
         A(I) = 99.
         B(I) = 100.
      ENDDO
      PRINT *, A, B
      END

The output from KAP with interleaved unrolling turned on is as follows:

      REAL A(100), B(100)
      DO I=1,97,4
         A(I) = 99.
         A(I+1) = 99.
         A(I+2) = 99.
         A(I+3) = 99.
         B(I) = 100.
         B(I+1) = 100.
         B(I+2) = 100.
         B(I+3) = 100.
      ENDDO
      PRINT *, A, B
      END

The default value is -interleave.

5.6.16 -library_calls, -lc, (off)

The -library_calls switch directs KAP to replace sections of code with calls to standard numerical library routines that have the same functionality. This can simplify the source code. If a version of the library that has been highly tuned for the target machine is available, it can also improve the performance of the application program. For example, if you specify this switch and link the application with the DIGITAL Extended Math Library (DXML), KAP replaces suitable sections of code with calls to the DXML Basic Linear Algebra Subroutines (BLAS). Use a command such as the following:

kf90 -fkapargs='-lc' -ldxml myprog.f90

The argument for -library_calls identifies the library for which KAP generates CALLs. The DXML BLAS libraries are BLAS1, which performs vector-vector operations such as dot product; BLAS2, which performs matrix-vector operations such as matrix-vector multiplication; and BLAS3, which performs matrix-matrix operations such as matrix multiplication.

To specify both BLAS1 and BLAS2, specify BLAS12.
To specify both BLAS2 and BLAS3, specify BLAS23; this is the recommended switch.
Specifying BLAS is equivalent to specifying BLAS23. This switch can be disabled within a section of code with the !*$* optimize=o directive. This switch is disabled if -roundoff=0. See Chapter 6.
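The sketch below makes the idea concrete by writing a matrix-vector product as a plain loop nest. This is the kind of code that a BLAS2 replacement targets. The comment gives the equivalent call to the reference BLAS routine DGEMV. This guide does not say whether KAP recognizes this particular loop nest, and the module and routine names are hypothetical. The sketch does not call DGEMV, so it runs without DXML.

```fortran
module blas2_candidate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! y = A*x for an m-by-n matrix A, written as an explicit loop nest.
  ! With a BLAS library, the same operation would be written as
  !   CALL DGEMV('N', m, n, 1.0D0, a, m, x, 1, 0.0D0, y, 1)
  subroutine matvec(a, x, y, m, n)
    integer, intent(in)   :: m, n
    real(dp), intent(in)  :: a(m, n), x(n)
    real(dp), intent(out) :: y(m)
    integer :: i, j
    y = 0.0_dp
    do j = 1, n          ! column index outermost: a(:, j) is read with stride 1
      do i = 1, m
        y(i) = y(i) + a(i, j) * x(j)
      end do
    end do
  end subroutine matvec
end module blas2_candidate
```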

Caution

This switch introduces calls to BLAS routines that are linked from system libraries. Using it can cause a collision between KAP-generated BLAS routine names and user-provided routines in the source code. Rename or remove the user routines even if they are functionally identical to the library routines. If the calls can be satisfied by the user-provided routines, the linker uses those routines instead of the optimized library routines.

5.6.17 -limit, -lm, (-limit=10)

To reduce compilation time, KAP estimates the length of time required to analyze each loop nest construct. If a loop is too deeply nested, KAP ignores the outer loop and recursively visits the inner loops. The loop nest limit is a rough dial to control what KAP considers too deeply nested.

Loops that exceed this threshold are marked in the Loop Table (-listoptions=l) of the listing file. (See Chapter 10.)

Larger loop nest limits may allow more optimizations to be performed for deeply nested loop structures, but may take more compilation time. The limit does not correspond to the DO loop nest level; rather, it is an estimate of the number of loop orderings that can be generated from a loop nest. The -limit switch resets this internal limit. The loop nest limit can also be modified with the !*$* limit <integer> directive. Most users do NOT need to change this value.

5.6.18 -machine, -ma, -noma, (-machine=s)

Set the machine switch according to the characteristics of the system on which Compaq KAP Fortran/OpenMP output runs.

Use any combination of the following switch settings, but do not specify the s and n settings at the same time:

n --- Prefers non-stride-1 array access over stride-1 array access. This is suitable for machine architectures that have special interleaved memory hardware where non-stride-1 array access provides the best performance.
o --- Directs Compaq KAP Fortran/OpenMP not to parallelize innermost loops when optimizing, that is, to parallelize only outermost loops. This setting prevents concurrentization in applications that have small inner loop bounds and so reduces overhead costs. KAP weighs overhead against benefit when it makes concurrentization decisions. When the loop bounds are unknown at compile time, however, Compaq KAP Fortran/OpenMP may generate concurrent code for innermost loops, which may be inefficient for the actual loop bounds.
s --- Directs Compaq KAP Fortran/OpenMP to prefer optimization of a DO loop that generates stride-1 (contiguous) references over one that generates nonstride-1 operands. Some computers perform better if consecutive references are contiguous in memory.

To disable all of the switch settings, enter -nomachine on the command line.
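Fortran stores arrays in column-major order. In the sketch below, the loop ordering with I innermost therefore makes consecutive references to A(I,J) contiguous (stride 1). The ordering with J innermost jumps N elements between references. The s setting expresses a preference for the first form. The routine names are hypothetical, and the sketch does not show which ordering KAP actually chooses for a given loop.

```fortran
module stride_demo
  implicit none
contains
  ! Stride-1 ordering: i (the first subscript) varies fastest, so the
  ! inner loop touches adjacent elements of one column.
  subroutine scale_stride1(a, n, s)
    integer, intent(in) :: n
    real, intent(inout) :: a(n, n)
    real, intent(in)    :: s
    integer :: i, j
    do j = 1, n
      do i = 1, n
        a(i, j) = s * a(i, j)
      end do
    end do
  end subroutine scale_stride1

  ! Non-stride-1 ordering: the inner loop varies j, so consecutive
  ! references are n elements apart in memory.
  subroutine scale_stride_n(a, n, s)
    integer, intent(in) :: n
    real, intent(inout) :: a(n, n)
    real, intent(in)    :: s
    integer :: i, j
    do i = 1, n
      do j = 1, n
        a(i, j) = s * a(i, j)
      end do
    end do
  end subroutine scale_stride_n
end module stride_demo
```

Both orderings compute the same result. They differ only in their memory access pattern, and that pattern is what the s and n settings describe.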

5.6.19 -max_invariant_if_growth, -miifg, (-miifg=500)

When a loop contains an IF statement whose condition does not change from one iteration to another, the same test must be repeated for every iteration. The code can often be made more efficient by floating the IF outside the loop and putting the THEN and ELSE sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the THEN and ELSE loops. The -max_invariant_if_growth switch allows you to limit the total number of additional lines of code generated in each program unit through invariant-IF restructuring.
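The following hand-written sketch shows invariant-IF floating. It is not KAP output, and the names are hypothetical. The test on FLAG does not change inside the loop, so the IF is moved outside and the loop is duplicated into the THEN and ELSE branches. The statement that computes D(I) is not controlled by the IF, so it is copied into both loops. That copy is the kind of code growth that -max_invariant_if_growth limits.

```fortran
module invariant_if_demo
  implicit none
contains
  ! Before: the loop-invariant test on flag is evaluated every iteration.
  subroutine if_inside(a, b, d, n, flag)
    integer, intent(in) :: n
    logical, intent(in) :: flag
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), d(n)
    integer :: i
    do i = 1, n
      d(i) = b(i) + 1.0
      if (flag) then
        a(i) = 2.0 * b(i)
      else
        a(i) = -b(i)
      end if
    end do
  end subroutine if_inside

  ! After: the IF is floated out and the loop is duplicated.  The d(i)
  ! statement appears in both copies, which is where the extra lines
  ! come from.
  subroutine if_floated(a, b, d, n, flag)
    integer, intent(in) :: n
    logical, intent(in) :: flag
    real, intent(in)    :: b(n)
    real, intent(out)   :: a(n), d(n)
    integer :: i
    if (flag) then
      do i = 1, n
        d(i) = b(i) + 1.0
        a(i) = 2.0 * b(i)
      end do
    else
      do i = 1, n
        d(i) = b(i) + 1.0
        a(i) = -b(i)
      end do
    end if
  end subroutine if_floated
end module invariant_if_demo
```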

The -miifg setting is the maximum number of lines to which a program unit may grow due to invariant-IF floating. If restructuring a loop with invariant IFs would cause the size of the program unit to exceed this limit, the restructuring will not be performed. For example, if -miifg=500 and the original size of a subroutine was 450 noncomment lines, then at most 50 additional lines may be added by invariant-IF floating. Because other KAP transformations can add or delete lines, the number of lines actually added by invariant-IF floating and the final size of a program unit may differ from what the -miifg value alone would cause.

This can be controlled on a loop-by-loop basis with the !*$* max_invariant_if_growth (<integer>) directive (see Chapter 6). The maximum amount of additional code generated in a single loop through invariant-IF floating can be limited with the -each_invariant_if_growth switch.

5.6.20 -routine, -rt, -nrt, (-noroutine)

The -routine switch allows you to specify switches that apply only to specific routines within the source file KAP processes. The only switches that -routine can specify are as follows:

-each_invariant_if_growth
-max_invariant_if_growth
-optimize
-roundoff
-scalaropt
-skip
-unroll
-unroll2
-unroll3

The syntax of a KAP command using the -routine switch is as follows:

kapf90 [-<switches>] source_file.f \
   -routine=<routine_name>[,<routine_name>...] <switches_for_routine_names> ...

Place the -routine switch after the name for the Fortran 90 source file. Specify switches that apply to all routines in the source file after kapf90. The <routine_name> argument must be a routine in source_file.f.

For example, consider the following command line:

kapf90 -scalaropt program.f90 -routine=sub_1 -roundoff -optimize -freeformat

This command invokes KAP and passes the -scalaropt switch to all program units in file program.f90, including sub_1. Program unit sub_1 is processed with both the -roundoff and -optimize switches.

Using the -routine switch implies that directives equivalent to the specified switches are asserted only while processing particular routines. The effect is the same as if the implied directives were inserted at the top of the associated routines.

Using the -routine switch divides the kapf90 command into two halves. The first half looks like any other kapf90 command: it contains kapf90, switches other than -routine, and a source file name. The second half contains one or more -routine switches. Each -routine switch is followed by its associated routines and by switches chosen from the list given earlier in this section.

For example, consider the following command line:

kapf90 -cachesize=8,0 -syntax=a -freeformat my_program.f90 \
   -routine=sub_1,sub_2,sub_3 -roundoff -optimize -routine=sub_4 -unroll

An explanation of the two halves follows:

This command invokes KAP and passes the -cachesize=8,0, -freeformat, and -syntax=a switches to all program units in file my_program.f90. The program units include sub_1, sub_2, sub_3, and sub_4.
Program units sub_1, sub_2, and sub_3 are processed with both the -roundoff and -optimize switches. Routine sub_4 is processed with the -unroll switch.

The usual rules for shortening the names of switches also apply to the -routine switch. For example, the following KAP command fragments produce identical results:

-routine=subroutine_a -optimize -unroll

-routine=subroutine_a -opt -unr

5.6.21 -setassociativity, -sasc, (-setassociativity=1,1)

The -setassociativity switch provides information on the mapping of physical addresses in main memory to cache pages in the Level 1 and Level 2 caches. The first integer describes the set associativity of the Level 1 cache, and the second integer describes the set associativity of the Level 2 cache. A setting of n means that a page can appear in any of n places in the cache. For instance, a setting of 1 means that a page in main memory can be placed in only one place in the cache. If that cache page is already in use, its contents must be overwritten or flushed before the newly accessed page can be copied into the cache.
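The function below illustrates set associativity in general terms. It uses standard cache arithmetic and does not describe how KAP models the cache. Given the total cache size, the size of one cache page, and the associativity n, it computes which set a byte address falls into. With n=1 (direct-mapped), two addresses that differ by exactly the cache size land in the same set and evict each other. With n=2, that set has two places and can hold both. The function name and the sizes in the test (an 8192-byte cache with 32-byte pages) are hypothetical.

```fortran
module cache_map_demo
  implicit none
contains
  ! Return the set (numbered from 0) that a byte address maps to.
  !   cache_bytes: total cache size in bytes
  !   page_bytes:  size of one cache page in bytes
  !   assoc:       set associativity (1 = direct-mapped)
  ! Each set holds assoc pages, so the number of sets is
  ! cache_bytes / (page_bytes * assoc).
  integer function cache_set(addr, cache_bytes, page_bytes, assoc)
    integer, intent(in) :: addr, cache_bytes, page_bytes, assoc
    integer :: nsets
    nsets = cache_bytes / (page_bytes * assoc)
    cache_set = mod(addr / page_bytes, nsets)
  end function cache_set
end module cache_map_demo
```

In the direct-mapped case, addresses 0 and 8192 both map to set 0 and conflict. Address 4096 maps to set 128. In the two-way case, the cache has half as many sets, so 4096 also maps to set 0, but that set can hold two pages.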

5.6.22 -srlcd, -nsrlcd, (-nosrlcd)

The -srlcd switch tells KAP to remove loop-carried dependencies. SRLCD is an abbreviation of Scalar Replacement of Loop-Carried Dependencies. KAP holds array values that are read or written across multiple loop iterations in temporary scalars, so faster temporary (register) accesses replace slower memory accesses in the loop body.

Before KAP can remove loop-carried dependencies, you must specify the switch -scalaropt=n, where n is greater than or equal to 2.
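The following hand-written sketch shows scalar replacement applied to a recurrence. It is not KAP output, and the names are hypothetical. In the original loop, each iteration reloads A(I-1), which the previous iteration has just stored. After scalar replacement, that value is carried in the temporary T, which a compiler can keep in a register. The loop still stores A(I) on every iteration, but it no longer reloads it.

```fortran
module srlcd_demo
  implicit none
contains
  ! Before: each iteration reloads a(i-1), which the previous iteration
  ! has just stored.
  subroutine recurrence(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(inout) :: a(n)      ! a(1) is input; a(2:n) are results
    integer :: i
    do i = 2, n
      a(i) = 0.5 * a(i-1) + b(i)
    end do
  end subroutine recurrence

  ! After: the value carried from one iteration to the next lives in the
  ! scalar t, so a(i-1) is never reloaded from memory.
  subroutine recurrence_srlcd(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(inout) :: a(n)
    integer :: i
    real    :: t
    if (n < 1) return
    t = a(1)
    do i = 2, n
      t = 0.5 * t + b(i)
      a(i) = t
    end do
  end subroutine recurrence_srlcd
end module srlcd_demo
```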

5.6.23 -unroll, -ur, (-unroll=4), -unroll2, -ur2, (-unroll2=160), -unroll3, -ur3, (-unroll3=1)

The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls scalar inner loops. Loop execution is often more efficient when the loops are unrolled. Fewer iterations with more work per iteration will require less loop-control overhead. KAP unrolls the loop until either the loop has been unrolled the number of times given in the -unroll switch, or the amount of "work" in each iteration reaches the value given by the -unroll2 switch.

Note

If you use kapf90 with the Compaq Fortran compiler optimization switch set to -O5, you should turn off loop unrolling by setting -unroll=1.

Outer loop unrolling is a part of memory management and is not controlled by these switches.

The -scalaropt=2 level is required to enable loop unrolling.

The syntax for -unroll, -unroll2, and -unroll3 is as follows:

Long forms: -unroll=<#it>, -unroll2=<weight>, or -unroll3=<weight>
Short forms: -u=<#it>, -ur=<#it>, -ur2=<weight>, or -ur3=<weight>

where <#it> is the maximum number of iterations to unroll. Other settings are as follows:

=0 --- use the default value.
=1 --- no unrolling.
<weight> --- the maximum (-unroll2) or minimum (-unroll3) weight, that is, the estimate of work, in an unrolled loop. KAP estimates <weight> by counting operands and operators in a loop.

There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.

The default (4,100) means that the maximum number of iterations to unroll is 4 and that the maximum amount of work is 100.

Loop overhead is reduced by performing more iterations from the original loop for each pass through the new loop, but the gain is less with each additional unrolled iteration. Eventually, the cost in extra memory exceeds the gain from unrolling. The -unroll switch sets a maximum number of iterations to unroll.

Note

When the total number of iterations to be executed by the loop (the iteration count) is constant, KAP searches for a number of iterations to unroll that is near the -unroll value and which exactly divides the iteration count. This avoids having extra iterations left over, which must be handled separately and generate extra code. The range over which KAP searches for an exact divisor is the -unroll value plus or minus 25%.
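The search described in this Note can be modeled roughly as shown below. This is a sketch under stated assumptions, not KAP's actual code. It assumes that the plus-or-minus 25% window is computed with integer division and that candidates closest to the -unroll value are tried first, with the smaller candidate first on ties. It also assumes that when no candidate divides the iteration count, the -unroll value itself is used together with a cleanup loop. The function name is hypothetical.

```fortran
module divisor_search_demo
  implicit none
contains
  ! niter:   constant iteration count of the loop
  ! nunroll: the -unroll value (assumed >= 1)
  ! Returns a factor within nunroll +/- 25% that divides niter exactly,
  ! or nunroll itself (leftover iterations then need a cleanup loop).
  integer function pick_unroll(niter, nunroll)
    integer, intent(in) :: niter, nunroll
    integer :: delta, width, d
    width = nunroll / 4          ! assumed: 25% window, rounded down
    pick_unroll = nunroll        ! fallback when no exact divisor is found
    do delta = 0, width
      d = nunroll - delta        ! assumed: try the smaller candidate first
      if (mod(niter, d) == 0) then
        pick_unroll = d
        return
      end if
      d = nunroll + delta
      if (mod(niter, d) == 0) then
        pick_unroll = d
        return
      end if
    end do
  end function pick_unroll
end module divisor_search_demo
```

Under these assumptions, -unroll=4 with 99 iterations gives a factor of 3, and -unroll=8 with 98 iterations gives 7. With 97 iterations (a prime), no candidate between 6 and 10 divides the count, so the factor stays 8.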

To use the "work per unrolled iteration" limit, KAP analyzes a given loop by computing an estimate of the computational work that is inside the loop for ONE iteration. This rough estimate is based on the following criteria:

# of assignments +
# of IF statements +
# of subscripts +
# of arithmetic operations

For the following example, the user has specified 8 for the maximum number of iterations to unroll (-unroll=8) and 100 for the maximum "work per unrolled iteration" (-unroll2=100):

      DO 10 I = 2,N
         A(I) = B(I)/A(I-1)
   10 CONTINUE

This example has:
1 assignment
0 ifs
3 subscripts
2 arithmetic operators
-------------------------
6 is the weighted sum (The work for 1 iteration)

KAP then divides 100 by this weighted sum, which gives a potential unrolling factor of 16 (100/6, truncated). However, the user has also specified 8 as the maximum number of unrolled iterations, so KAP takes the minimum of 8 and 16 and unrolls only 8 iterations. The maximum number of iterations that KAP will unroll is 100. If the user requests more than that, NO warning is given.
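The arithmetic in this example can be written as a small function. The sketch models only the rule stated above: the work limit divided by the work per iteration, capped by the -unroll value and by 100. It ignores the divisor search, the -unroll3 lower limit, and any other details of KAP's decision. Treating a result below 1 as "no unrolling" is an assumption, and the function name is hypothetical.

```fortran
module unroll_factor_demo
  implicit none
contains
  ! work:     estimated work for one iteration (assignments + IF statements
  !           + subscripts + arithmetic operations); assumed >= 1
  ! max_iter: the -unroll value
  ! max_work: the -unroll2 value
  integer function unroll_factor(work, max_iter, max_work)
    integer, intent(in) :: work, max_iter, max_work
    integer, parameter  :: hard_cap = 100   ! KAP never unrolls more than 100
    unroll_factor = min(max_iter, max_work / work, hard_cap)
    unroll_factor = max(unroll_factor, 1)   ! assumed: at least 1 (no unrolling)
  end function unroll_factor
end module unroll_factor_demo
```

For the loop above, the work is 1 + 0 + 3 + 2 = 6, and unroll_factor(6, 8, 100) evaluates min(8, 16, 100) = 8.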

In this case (an unknown number of iterations), KAP generates two loops: the primary unrolled loop, whose iteration count is a multiple of the unrolling factor, and a cleanup loop that executes any remaining iterations. The result is the following:

      DO 11 I=2,N-7,8
         A(I) = B(I) / A(I-1)
         A(I+1) = B(I+1) / A(I)
         A(I+2) = B(I+2) / A(I+1)
         A(I+3) = B(I+3) / A(I+2)
         A(I+4) = B(I+4) / A(I+3)
         A(I+5) = B(I+5) / A(I+4)
         A(I+6) = B(I+6) / A(I+5)
         A(I+7) = B(I+7) / A(I+6)
   11 CONTINUE
      DO 2 I=I,N,1
         A(I) = B(I) / A(I-1)
    2 CONTINUE
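The cleanup loop in this output relies on a Fortran rule. When a DO loop terminates normally, its DO variable holds the value it would have taken on the next iteration. DO 2 I=I,N,1 therefore resumes exactly where the unrolled loop stopped, and it starts at 2 if the unrolled loop executed zero times. The self-checking sketch below writes the same two loops in Fortran 90 syntax and compares them with the original loop for several values of N. The module and routine names are hypothetical.

```fortran
module unroll_check_demo
  implicit none
contains
  ! Original recurrence: a(i) = b(i)/a(i-1) for i = 2..n (a(1) is input).
  subroutine recur_orig(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(inout) :: a(n)
    integer :: i
    do i = 2, n
      a(i) = b(i) / a(i-1)
    end do
  end subroutine recur_orig

  ! Unrolled by 8 plus a cleanup loop, following the output shown above.
  subroutine recur_unrolled(a, b, n)
    integer, intent(in) :: n
    real, intent(in)    :: b(n)
    real, intent(inout) :: a(n)
    integer :: i
    do i = 2, n - 7, 8
      a(i)   = b(i)   / a(i-1)
      a(i+1) = b(i+1) / a(i)
      a(i+2) = b(i+2) / a(i+1)
      a(i+3) = b(i+3) / a(i+2)
      a(i+4) = b(i+4) / a(i+3)
      a(i+5) = b(i+5) / a(i+4)
      a(i+6) = b(i+6) / a(i+5)
      a(i+7) = b(i+7) / a(i+6)
    end do
    ! On exit, i is the first index the unrolled loop did not process.
    do i = i, n
      a(i) = b(i) / a(i-1)
    end do
  end subroutine recur_unrolled
end module unroll_check_demo
```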

Additional examples are in Chapter 9.

The -unroll3=n switch sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as -unroll2), the loop is not unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. Leave this switch at 1, the default. A value less than the default could result in a program that executes more slowly.

5.6.24 -useh

KAP needs two passes to resolve Fortran 90 forward declarations. The second pass, the useh pass, resolves any forward references.

KAP automatically sets the -useh switch correctly for you. Compaq recommends that you do not set the -useh switch.

5.7 Directive Recognition Switches for kapf90

The following section explains the function of each directive recognition switch.

5.7.1 -directives, -dr, (-directives=akpv), -nodirectives, -ndr

The -directives switch tells KAP which directives to accept. KAP directives and assertions use the following syntax:

!*$* key word(s)
!*$* key word(s) (argument)
!*$* assert key word(s) (argument)

Directives work regardless of whether the initial C precedes the string.

The -directives settings are as follows:

a --- KAP assertions
k --- KAP !*$* directives
p --- KAP Parallel Computing Forum directives
v --- VAST cvd$ directives

The KAP directives are described in Chapter 6. The Parallel Computing Forum directives are described in Chapter 3. KAP assertions are described in Chapter 7.

For example, -directives=k enables KAP directives only, whereas -directives=ka enables both KAP directives and assertions. Any combination of the settings listed above is acceptable. To disable all of them, enter -nodirectives on the command line.

KAP assertions are similar in form to directives, but they assert program characteristics that KAP may use in its optimizations. (See Chapter 7.) The acceptance of assertions can also be controlled with the !*$* assertions and !*$* noassertions directives.
