Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide

5.1.8 -verbose, -v, (-nov)

This switch prints the passes as they execute, along with their arguments and their input and output files. It also prints final resource usage in the C-shell time format.

5.2 General Optimization Switches for kapf90

The following sections explain the function of each general optimization switch.

5.2.1 -interchange, -nointerchange, (-interchange)

Use the -interchange switch to enable loop interchanging. KAP enables loop interchange when -interchange is specified and the -optimize switch level is at least 1 or the -scalaropt switch level is 3. If you specify -nointerchange, KAP disables loop interchange regardless of the -optimize or -scalaropt switch settings. Loop interchanging is enabled by default.
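As an illustration of what loop interchange means, the following hand-written sketch (free-form Fortran 90, not actual KAP output) computes the same array with the original loop order and with the I and J loops swapped. The program and array names are invented for this example. In the interchanged nest the inner loop runs down a column, which matches Fortran's column-major array storage.

```fortran
program interchange_sketch
  implicit none
  integer, parameter :: n = 4
  real :: a(n,n), b(n,n)
  integer :: i, j

  ! Original order: the inner loop over j strides across a row of a.
  do i = 1, n
    do j = 1, n
      a(i,j) = real(i) + 10.0*real(j)
    end do
  end do

  ! Interchanged order: the inner loop over i touches contiguous memory.
  do j = 1, n
    do i = 1, n
      b(i,j) = real(i) + 10.0*real(j)
    end do
  end do

  ! The iterations are independent, so both orders give the same values.
  if (all(a == b)) then
    print *, 'same result'
  else
    print *, 'results differ'
  end if
end program interchange_sketch
```

The interchange is legal here because no iteration depends on another; a preprocessor has to prove that before swapping the loops.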

5.2.2 -namepartitioning, -namepart, -nnamepart, (-nonamepartitioning)

This switch tells KAP to look at distinct array names and limit the number of arrays that appear in a loop to avoid cache thrashing. That is, this switch breaks a loop containing, for example, references to arrays A and B into two loops. One loop references array A and the other loop references array B.

Two arguments, i and j, given in a -namepartitioning=i,j switch control name partitioning as follows:

i --- Specifies the minimum number of partitions. This is the preferred smallest number of distinct arrays in each distributed loop.
j --- Specifies the maximum number of partitions. This is the preferred largest number of distinct arrays in each distributed loop.

If no arguments appear with the -namepartitioning switch, KAP uses its default values of 2 for the minimum and 8 for the maximum number of partitions.

Before KAP can perform name partitioning, you must specify the switch -scalaropt=n, where n is greater than or equal to 3.

The -nonamepartitioning switch explicitly prevents name partitioning.
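The splitting described in this section can be pictured with a small hand-written sketch (free-form Fortran 90, not actual KAP output). A loop that references arrays A and B is rewritten as two loops, each touching only one array name. All names are invented for this example.

```fortran
program namepart_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), a2(n), b2(n)
  integer :: i

  ! Original loop: every iteration references both a and b.
  do i = 1, n
    a(i) = 2.0*real(i)
    b(i) = real(i) + 1.0
  end do

  ! Partitioned by array name: one loop per array.
  do i = 1, n
    a2(i) = 2.0*real(i)
  end do
  do i = 1, n
    b2(i) = real(i) + 1.0
  end do

  ! Prints T when the partitioned loops reproduce the original values.
  print *, all(a == a2) .and. all(b == b2)
end program namepart_sketch
```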

5.2.3 -optimize, -o, (-optimize=5)

The -optimize switch sets the base optimization and code analysis level, ranging from 0 (no optimization) to 5 (maximum optimization). The optimization level can also be modified on a loop-by-loop basis by the !*$* optimize (<integer>) directive. Some of the code analysis techniques can be enabled with the -scalaropt switch.

The meaning of each optimization level is as follows. The levels are cumulative; for example, level 4 performs what is listed below for that level in addition to what is listed for levels 0--3.

0 --- KAP performs no loop optimization.
1 --- KAP performs only simple analysis and optimization. Induction variables are recognized. DO loop interchanging techniques are applied.
2 --- Lifetime analysis is performed to determine when last-value assignment of subprogram-local scalar variables is necessary. More powerful data dependence tests are used.
3 --- More loop interchanging is attempted, such as interchanging of triangular loops. Special-case data dependence tests are used. Special index sets called wraparound variables are recognized.
4 --- Loop interchanging around reductions is attempted. More exact data dependence tests are used.
5 --- Array expansion is enabled at this point.

A higher optimization level results in more optimization, more analysis, and more ambitious transformations, along with increased compilation time.
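Induction variable recognition, listed under level 1, can be illustrated with a hand-written sketch (free-form Fortran 90, not actual KAP output; all names are invented). The variable K is incremented by a constant on every trip, so its value can be expressed directly in terms of the loop index, which removes the dependence between iterations.

```fortran
program induction_sketch
  implicit none
  integer, parameter :: n = 5
  integer :: a(2*n), b(2*n)
  integer :: i, k

  a = 0
  b = 0

  ! Original form: k is bumped by 2 on each trip, so every iteration
  ! depends on the value of k left by the previous one.
  k = 0
  do i = 1, n
    k = k + 2
    a(k) = i
  end do

  ! Once k is recognized as 2*i, each iteration stands alone.
  do i = 1, n
    b(2*i) = i
  end do

  ! Prints T: both forms store the same values.
  print *, all(a == b)
end program induction_sketch
```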

5.2.4 -recursion, -rec, (-norecursion), -norec

The -recursion switch informs KAP that subroutines and functions in the source program may be called recursively (that is, a routine calls itself or calls another routine that calls it). This affects storage allocation decisions and the interpretation of the -save option. The -recursion switch must be in force in each recursive routine that kapf90 processes, or unsafe transformations could result.

The -norecursion option tells KAP to assume that recursion is not used in the program being processed.
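For reference, the following minimal sketch shows the kind of routine this switch is concerned with: a function that calls itself. It is written in free-form Fortran 90 with invented names and is not taken from the KAP distribution.

```fortran
program recursion_sketch
  implicit none
  print *, factorial(5)
contains
  ! A directly recursive function: factorial calls factorial.
  recursive function factorial(n) result(f)
    integer, intent(in) :: n
    integer :: f
    if (n <= 1) then
      f = 1
    else
      f = n * factorial(n - 1)
    end if
  end function factorial
end program recursion_sketch
```

Each active call needs its own copy of the local variables, which is why the presence of recursion affects storage allocation decisions.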

5.2.5 -roundoff, -r, (-roundoff=3)

The -roundoff switch allows you to specify the change from serial roundoff error that is tolerable. If an arithmetic reduction is accumulated in a different order than in the scalar program, the roundoff error is accumulated differently and the final result may differ from that of the original program. While the difference is usually insignificant, certain restructuring transformations performed by KAP must be disabled to obtain exactly the same results as the scalar program. These transformations are discussed further in Chapter 9.

KAP classifies its transformations by the amount of difference in roundoff error that can accumulate so you can decide what level of roundoff error differences is allowable. The -roundoff command switch has the values 0 to 3.

Roundoff levels are cumulative, performing what is listed for each level, as well as what is listed for the lower levels. The meaning of each roundoff level is as follows:

0 --- Allow no roundoff-changing transformations.
1 --- Enable expression simplification and code floating. Recognize arithmetic reductions. Allow loop interchanging around arithmetic reductions, if -optimize>=4. Allow loop rerolling, when -scalaropt>1.
2 --- Allow reciprocal substitution in loops.
3 --- Recognize induction variables whose types are not the default integer type if -scalaropt>2 or -optimize>1. Enable memory management, if -scalaropt=3. INTEGER multiple division can be rotated into multiplication.
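The effect of accumulating a reduction in a different order can be observed with a small experiment. The sketch below (free-form Fortran 90, invented names, not KAP output) sums the same single-precision values once in serial order and once as two partial sums that are combined at the end, which is one way a restructured reduction might be accumulated. The printed difference depends on the compiler and hardware, so no particular value is claimed here.

```fortran
program roundoff_sketch
  implicit none
  integer, parameter :: n = 100000   ! even, so the split loop covers all elements
  real :: x(n), s_serial, s_split, p1, p2
  integer :: i

  do i = 1, n
    x(i) = 1.0 / real(i)
  end do

  ! Serial order: one running sum.
  s_serial = 0.0
  do i = 1, n
    s_serial = s_serial + x(i)
  end do

  ! Reordered: odd and even elements go to separate partial sums.
  p1 = 0.0
  p2 = 0.0
  do i = 1, n, 2
    p1 = p1 + x(i)
    p2 = p2 + x(i+1)
  end do
  s_split = p1 + p2

  print *, 'serial order :', s_serial
  print *, 'split order  :', s_split
  print *, 'difference   :', s_serial - s_split
end program roundoff_sketch
```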

5.2.6 -scalaropt, -so, (-scalaropt=3)

The -scalaropt command-line switch sets the level to which dusty-deck and other scalar transformations are performed. Unlike the -scalaropt command-line switch, the !*$* scalar optimize directive sets the level of loop-based optimizations (for example, loop fusion) only, and not straight-code optimizations (for example, dead-code elimination).

The allowed values and their meanings are as follows:

0 --- No scalar optimizations are performed.
1 --- IF loops are changed into DO loops. Simple code floating out of loops is performed. Inaccessible or unused code is removed. Forward substitution of variables is performed. Dusty-deck IF transformations are enabled.
2 --- The full range of scalar optimization is performed. Included are floating invariant IFs out of loops, induction variable recognition, loop rerolling (if -roundoff>0), loop peeling, loop fusion, and loop unrolling.
3 --- Memory management is performed (if -roundoff=3); additional dead-code elimination is performed during output conversion.
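One of the level 2 transformations, floating an invariant IF out of a loop, is sketched below by hand (free-form Fortran 90, invented names, not actual KAP output). The test does not change inside the loop, so it can be made once, outside it.

```fortran
program invariant_if_sketch
  implicit none
  integer, parameter :: n = 6
  real :: a(n), b(n)
  logical :: use_double
  integer :: i

  use_double = .true.

  ! Original: the same test is repeated on every iteration.
  do i = 1, n
    if (use_double) then
      a(i) = 2.0*real(i)
    else
      a(i) = real(i)
    end if
  end do

  ! After floating: one test, then a loop with no branch in its body.
  if (use_double) then
    do i = 1, n
      b(i) = 2.0*real(i)
    end do
  else
    do i = 1, n
      b(i) = real(i)
    end do
  end if

  ! Prints T: both forms compute the same values.
  print *, all(a == b)
end program invariant_if_sketch
```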

5.2.7 -skip, -sk, -nsk, (-noskip)

Use the -skip switch following the -routine switch to stop KAP from processing specified routines. KAP writes out unchanged source code for the specified routines. See the description of the -routine switch in Section 5.6.20.

5.2.8 -tune, -tune, (-tune=<architecture>)

The KAP preprocessor determines whether the host architecture is ev4, ev5, or ev6 and then optimizes your program for that architecture by default. In the event you compile a program on one architecture but plan to run it on another, override the default by setting -tune equal to the architecture where the program will run. For example, if you compile a program on ev4 architecture, but plan to run it on ev5, use -tune=ev5.

5.3 Parallel Processing Switches for kapf90

The following sections describe the switches you use to control how the multiprocessor version of KAP prepares programs for parallel execution.

5.3.1 -chunk

This switch modifies, and is used only with, the -scheduling switch. The -chunk switch determines the number of loop iterations that are in a group. Its default value is 1.

5.3.2 -concurrent, -conc, -noconc, (-noconcurrent)

The -concurrent switch directs KAP to restructure the source code for parallel processing.

Setting -noconcurrent disables parallel execution and allows all serial optimizations to take place. You can enable and disable parallel execution on a module-by-module basis using KAP directives or on a loop-by-loop basis using KAP assertions. For more information about parallel processing directives, see Section 6.4. Parallel processing assertions are described in Section 7.3.

Programs containing many loops that require synchronization, or programs that have loops with small iteration counts, may run slower when parallelized. In these cases, disable parallel execution.

Section 3.1.1 summarizes the two methods of parallelization, automatic and combined, that require the -conc switch. Several examples of the -conc switch are in the descriptions of these two methods.

5.3.3 -minconcurrent, -mc, (-minconcurrent=1000)

Executing a loop in parallel incurs overhead that varies with different systems. If a loop has little computational work, the overhead required to set up parallel execution may make the loop execute more slowly than it executes serially. The -minconcurrent switch sets the level of work in a loop above which KAP should execute the loop in parallel. Setting the -minconcurrent switch causes KAP to automatically set the -concurrent switch.

The range of values for -minconcurrent is all integers greater than or equal to 0. The higher the minconcurrent value, the more iterations and/or statements the loop body must have to run concurrently.

At compilation time, KAP estimates the amount of work inside a loop on the basis of loop computations and loop iterations. KAP multiplies the loop iteration count by the sum of the noindex operands/results and the nonassignment operators. KAP compares its estimate with the minconcurrent value. If the estimated amount of work is greater than the minconcurrent value, KAP generates parallel code for the loop. Otherwise, the loop executes serially. A loop for which KAP generates both a parallel and a serial version is called a two-version loop.

If the DO loop bounds are known at compilation time, KAP computes the exact iteration count. However, if the DO loop bounds are unknown, KAP generates a block IF around the parallel code. The block IF allows a run-time decision whether or not to execute the loop in parallel.
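The run-time decision can be sketched by hand as follows. This is only an illustration in free-form Fortran 90 with invented names. The per-iteration count of 6 is an assumption based on one reading of the rule above (four operands/results plus two operators for the statement shown), and the OpenMP directive merely stands in for the parallel code that KAP itself would generate.

```fortran
program two_version_sketch
  implicit none
  integer, parameter :: nmax = 1000
  integer, parameter :: minconc = 1000     ! default -minconcurrent value
  ! Assumed count for a(i) = b(i)*c(i) + d(i):
  ! operands/results a, b, c, d plus operators * and +.
  integer, parameter :: work_per_iter = 6
  real :: a(nmax), b(nmax), c(nmax), d(nmax)
  integer :: i, n

  n = 300          ! stands in for a bound known only at run time
  a = 0.0
  b = 1.0
  c = 2.0
  d = 3.0

  ! Block IF around the two versions of the loop.
  if (n * work_per_iter > minconc) then
    !$omp parallel do
    do i = 1, n
      a(i) = b(i)*c(i) + d(i)
    end do
    !$omp end parallel do
    print *, 'parallel version chosen'
  else
    do i = 1, n
      a(i) = b(i)*c(i) + d(i)
    end do
    print *, 'serial version chosen'
  end if
end program two_version_sketch
```

With n = 300 the estimate is 1800, which exceeds 1000, so the parallel branch is taken. Without OpenMP support the directive is treated as a comment and the same branch simply runs serially.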

To disable the generation of two-version loops throughout the program, use the command-line switch -minconcurrent=0. To disable this action in specific DO loops, use the !*$* minconcurrent(0) directive.

The following loop illustrates this switch using the minconcurrent default of 1000:

      DO 10 I = 1,N
         A(I) = B(I) + C(I)
   10 CONTINUE

Becomes:

      IF (N .GE. 425) THEN
         CALL mppfrk (P$PLP10,0)
      ELSE
         DO 2 I=1,N-3,4
            A(I) = B(I) + C(I)
            A(I+1) = B(I+1) + C(I+1)
            A(I+2) = B(I+2) + C(I+2)
            A(I+3) = B(I+3) + C(I+3)
    2    CONTINUE
      ENDIF

At run time, if the iteration count N is greater than or equal to 425 (1000/4), the concurrent loop executes in parallel; otherwise, it executes serially.

When KAP restructures DO loops whose bounds are not known in a source program named MYPROG.F, it inserts calls to subroutine MPPFRK whose first parameter comes from the sequence PKMYPROG_, PKMYPROG_1, PKMYPROG_2, ...

5.3.4 -parallelio, -nopio, -pio, (-noparallelio)

The -parallelio switch allows parallel execution of loops with I/O. Use this switch when you know the I/O will not execute. An example is a test for an error condition that causes a message to be printed.

Its complement, -noparallelio (short name -nopio), prevents parallel execution of loops containing I/O statements. The default value is -noparallelio.
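A loop of the kind described above might look like the following hand-written sketch (free-form Fortran 90, invented names). The WRITE statement is reached only if an error condition occurs, and the programmer knows the data never triggers it.

```fortran
program parallelio_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), r(n)
  integer :: i

  do i = 1, n
    a(i) = real(i*i)
  end do

  ! The I/O statement sits on an error path that this data never takes.
  do i = 1, n
    if (a(i) < 0.0) then
      write (*,*) 'error: negative value at element', i
    else
      r(i) = sqrt(a(i))
    end if
  end do

  print *, r
end program parallelio_sketch
```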

5.3.5 -pdefault

This switch tells KAP how to process variables that are not listed in an OpenMP data environment directive. It is used only during directed parallelization. The values of this switch and their meanings are as follows:

safe
This is the default value. However, if OpenMP directives are present and you have specified this value and KAP cannot determine the classification of a variable, then KAP will display an error message.
none
Make no attempt to classify variables that have not been explicitly classified.
private(list)
This switch provides a mechanism to make the listed variables private to each thread in a team. The behavior of a variable declared in list is as follows:
- A new object of the same type is declared once for each thread in the team. The new object is no longer storage associated with the original object.
- All references to the original object in the lexical extent of the directive construct are replaced with references to the private object.
shared(list)
This clause provides a mechanism to make the variables that appear in list shared among all the threads in a team. All threads within a team access the same storage area for shared data.
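The private and shared classifications correspond to the OpenMP clauses of the same names. The sketch below (free-form Fortran 90, invented names) writes the classification out explicitly in an OpenMP directive, so that nothing is left for a default to decide.

```fortran
program data_scope_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), t
  integer :: i

  do i = 1, n
    b(i) = real(i)
  end do

  ! t is scratch storage, so each thread gets its own copy (private).
  ! a and b are accessed element by element, so all threads share them.
  !$omp parallel do private(t) shared(a, b)
  do i = 1, n
    t = 2.0*b(i)
    a(i) = t + 1.0
  end do
  !$omp end parallel do

  print *, a
end program data_scope_sketch
```

If t were shared instead, two threads could overwrite each other's value between the two statements in the loop body.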

5.3.6 -psyntax

This switch specifies the set of parallel directives that KAP recognizes. Its values are openmp (the default) and kap.

The setting -psyntax=kap is useful if you are migrating applications that contain KAP Parallel Computing Forum (PCF) directives. The KAP parallel runtime library (libkmp-osfp10.a) will be used to implement the multithreading.

The setting -psyntax=openmp is required if your applications use OpenMP directives. Usage of OpenMP directives implies one of the following conditions:

- You have manually inserted these directives in your source program and want KAP to process them.
- You specify automatic parallelization to KAP and thus KAP inserts OpenMP directives.

The compiler will be used to implement the multithreading.

5.3.7 -scheduling=<list>, -sched=<list>, (-scheduling=e)

The -scheduling switch tells KAP the kind of scheduling to use for loop iterations on a multiprocessor system. The -scheduling options are as follows:

e --- selects even scheduling where KAP divides the iterations into equal size chunks so that the number of chunks does not exceed the number of processors. For example, if there are 9 iterations and 3 processors, the first processor would execute iterations 1, 2, and 3, the second processor, iterations 4, 5, and 6, and the third processor, iterations 7, 8, and 9.
s --- selects static scheduling where the processors execute iterations starting from their processor number and skipping iterations equal to the total number of processors. For example, if there are 9 iterations and 3 processors, the first processor would execute iterations 1, 4, and 7, the second processor, iterations 2, 5, and 8, and the third processor, iterations 3, 6, and 9.
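The two assignments of iterations to processors can be reproduced with a short program. This is a sketch in free-form Fortran 90 with invented names. It models only the index arithmetic of the 9-iteration, 3-processor example above, and the rounding-up of the chunk size for uneven cases is an assumption.

```fortran
program scheduling_sketch
  implicit none
  integer, parameter :: niter = 9, nproc = 3
  integer :: p, i, chunk

  ! Even scheduling: each processor gets one contiguous block.
  chunk = (niter + nproc - 1) / nproc   ! block size, rounded up (assumed)
  print *, 'even scheduling'
  do p = 1, nproc
    print *, 'processor', p, ':', &
             (i, i = (p-1)*chunk + 1, min(p*chunk, niter))
  end do

  ! Static scheduling: start at the processor number, stride by nproc.
  print *, 'static scheduling'
  do p = 1, nproc
    print *, 'processor', p, ':', (i, i = p, niter, nproc)
  end do
end program scheduling_sketch
```

For even scheduling the program lists iterations 1 2 3, 4 5 6, and 7 8 9; for static scheduling it lists 1 4 7, 2 5 8, and 3 6 9.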

5.4 Fortran Dialect Switches for kapf90

The following sections explain the function of each Fortran dialect switch.

5.4.1 -align_common, -align_common, (-align_common=8)

The -align_common switch aligns data elements in COMMON blocks. Its integer value represents the boundary size in bytes. The default is -align_common=8.

5.4.2 -align_struct, -align_struct, (-align_struct=4)

The -align_struct switch aligns subfields. Its integer value represents the boundary size in bytes. The default is -align_struct=4.

5.4.3 -assume, -a, (-assume=cel), -noassume, -na

The -assume switch tells KAP to make certain global assumptions about the program being processed. Most of these can also be controlled by various assertions (see Chapter 7). The -assume switch settings and the corresponding KAP assertions are as follows:

a --- Different subroutine or function parameters can refer to the same object. Corresponding assertion: !*$* assert argument aliasing
b --- Array subscripts can go outside the declared bounds. Corresponding assertion: !*$* assert bounds violations
c --- Constants used in subroutine or function calls will be placed in temporary variables. Corresponding assertion: !*$* assert temporaries for constant arguments
e --- EQUIVALENCE statements can cause different names to refer to the same memory location. Corresponding assertion: !*$* assert equivalence hazard
l --- Unless KAP can prove they are not needed, KAP must insert code to assign to variables in transformed loops the values they would have had after the original serial loop. Corresponding assertion: !*$* assert last value needed

By default, KAP assumes that a program conforms to the Fortran 77 standard, that is, -assume=el. The default includes -assume=c to simplify some analysis and inlining.

To disable all the above assumptions, enter -noassume on the command line.
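The equivalence hazard covered by the e setting can be seen in a small hand-written sketch (free-form Fortran 90, invented names). The loop appears to read one array and write another, but the EQUIVALENCE statement makes both names refer to the same storage.

```fortran
program equivalence_sketch
  implicit none
  integer, parameter :: n = 5
  real :: a(n), b(n)
  integer :: i
  equivalence (a(1), b(1))   ! a and b now occupy the same storage

  a = 0.0

  ! Judged by name alone the iterations look independent, but b(i) is
  ! really a(i), so each iteration reads what the previous one wrote.
  do i = 1, n - 1
    a(i+1) = b(i) + 1.0
  end do

  print *, a
end program equivalence_sketch
```

Run serially, the loop leaves the values 0, 1, 2, 3, 4 in the array. Reordering the iterations would change that result, which is why the assumption matters to a restructurer.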

5.4.4 -datasave, -ds, (-datasave), -nodatasave, -nds

The -datasave switch tells KAP to treat local variables in a subroutine or function that appear in DATA statements as if they were also in SAVE statements. That is, their values will be retained between invocations of the subroutine or function. This is the practice of many commercial Fortran compilers. This choice affects certain optimizations performed by KAP.

The negative switch, -nodatasave, complies with the Fortran 77 standard.
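The behavior in question is illustrated by the sketch below (free-form Fortran 90, invented names). The local variable COUNT is initialized in a DATA statement and has no SAVE statement, yet the routine relies on its value surviving from one call to the next.

```fortran
program datasave_sketch
  implicit none
  integer :: k
  do k = 1, 3
    call tick()
  end do
contains
  subroutine tick()
    integer :: count
    data count /0/          ! initialized by DATA, no SAVE statement
    count = count + 1
    print *, 'call number', count
  end subroutine tick
end program datasave_sketch
```

Under Fortran 90 rules a DATA-initialized local variable is saved implicitly, so a Fortran 90 compiler prints call numbers 1, 2, and 3. Strict Fortran 77 does not guarantee that the value is retained between calls.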

5.4.5 -dlines, -dl, (-nodlines), -ndl

The -dlines switch causes a D in column 1 to be treated as a blank. The rest of that line is then parsed as a normal Fortran 90 statement. By default, KAP treats these lines as comments. This switch is useful for the inclusion or exclusion of debugging lines. Data dependence relationships may be different when the D lines are included.

In the following example, the -nodlines default would cause the WRITE statement to be treated as a comment:

      DO 10 I = 1,N
         A (I) = B (I)
D        WRITE (*,*) A (I)
   10 CONTINUE

But when -dlines is specified, KAP sees the WRITE statement and, rather than optimizing the loop as a whole, splits it into two loops:

      DO 2 I=1,N
         A(I) = B(I)
    2 CONTINUE
      DO 3 I=1,N
         WRITE (*, *) A(I)
    3 CONTINUE

5.4.6 -escape, -noescape, (-escape)

The -escape switch causes KAP to scan escape characters in input lines.

5.4.7 -freeformat, -ff, (-nofreeformat)

The -freeformat command-line switch removes the standard column restrictions for Fortran source code. For example, source lines can be up to 132 columns wide and use an ampersand (&) at the end of the line to indicate continuation. See the Fortran Language Reference manual for more information.

The -freeformat switch is off by default, and the usual Fortran 77 conventions apply. For example, lines are truncated after column 72 unless you specify the Compaq Fortran flag -extend_source. Any character other than a zero or a blank in column 6 indicates a continuation line.
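A minimal free-form source file might look like the following sketch (invented names). Statements begin in any column, and a trailing ampersand continues a statement on the next line.

```fortran
program freeform_sketch
  implicit none
  real :: total
  ! The statement starts in column 3 and is continued with a trailing &.
  total = 1.0 + 2.0 + &
          3.0 + 4.0
  print *, total
end program freeform_sketch
```

Processed as fixed-form source, the same text would be rejected, because the statements do not start in column 7 and the ampersand is not a fixed-form continuation mark.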

5.4.8 -integer, -int, (-integer=4)

This switch specifies a size in bytes, N, for the default size of INTEGER variables. When N=2 or 4, take INTEGER*N as the default INTEGER type. When N=0, use the ordinary default length for INTEGER variables.

Executing kf90 and explicitly calling the compiler switch -noi4 will cause KAP to be called with the command switches -integer=2 and -logical=2.

5.4.9 -intlog, (-intlog)

The -intlog switch enables the mixing of integer and logical operands in expressions. When integer operands are used with logical operators, the operations are performed in a bitwise manner. When logical operands are used with arithmetic operators, the operands are treated as integers.
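Mixed integer and logical expressions are an extension, so the sketch below does not use them directly. Instead it uses the standard bit intrinsics IAND, IOR, and IEOR to show what a bitwise evaluation of two integers produces. Treating these intrinsics as equivalent to the results obtained under -intlog is an assumption made for illustration; the names are invented.

```fortran
program intlog_sketch
  implicit none
  integer :: i, j
  i = 12   ! binary 1100
  j = 10   ! binary 1010
  print *, iand(i, j)   ! bitwise AND:          1000 = 8
  print *, ior(i, j)    ! bitwise OR:           1110 = 14
  print *, ieor(i, j)   ! bitwise exclusive OR: 0110 = 6
end program intlog_sketch
```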

5.4.10 -kind, (-kind), (-kind=4)

The -kind switch establishes the value for the Fortran 90 KIND type parameter used when KIND has not been specified or KIND=0 is specified. The -kind switch applies to all data types: logical, integer, real, and complex. The values for -kind are 4 or 8, with 4 being the default. The -kind switch allows you to change the underlying precision of computations without violating the Fortran 90 standard constraints that default logical, default integer, and default real occupy the same amount of storage and that default double precision and default complex occupy twice the storage of default real.

5.4.11 -logical, -log, (-logical=4)

This switch specifies a size in bytes, N, for the default size of LOGICAL variables. When N=1, 2, or 4, take LOGICAL*N as the default LOGICAL type. When N=0, use the ordinary default length for LOGICAL variables.

Executing kf90 and explicitly calling the compiler switch -noi4 will cause KAP to be called with the command switches -integer=2 and -logical=2.

5.4.12 -natural, -nat, -nonatural

This switch no longer exists. Its replacement is the pair of switches -align_common, described in Section 5.4.1, and -align_struct, described in Section 5.4.2.
