Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide

KAP runs after the standard C preprocessor. The code examples in this chapter show the original code before preprocessing and the KAP-transformed code, with some of the C preprocessor additions stripped off for clarity.

4.1 Switches for the kcc Driver

The following sections explain the function of each kcc driver switch.

4.1.1 -cc, -nocc, (-cc=/usr/bin/cc)

The -cc switch specifies an alternate path to the C compiler; -nocc inhibits execution of the C compiler.

4.1.2 -cext, (C file extension)

This switch tells kapc to treat files with the indicated extension as C source files.

4.1.3 -ckap, (-ckap='/usr/bin/kapc')

This switch specifies an alternate path to the kapc preprocessor (translator).

4.1.4 -ckapargs

The -ckapargs switch passes switches to the kapc translator. This switch must precede switches to the kapc translator.

4.1.5 -cpp, (-cpp='/usr/bin/cc')

This switch specifies an alternate path to the C preprocessor that runs before kapc.

4.1.6 -sif, -S (off)

Save intermediate files. Specifying -sif is equivalent to -sif=cpp,kap, which will save all kapc and C preprocessor intermediate files. Specifying -S is equivalent to -sif=kap and passing -S to the compiler, which saves the assembly-language output. Intermediate file-naming conventions are as follows:

<file>.cpp --- cpp output file
K<file>.c --- kapc translator output file

The path and switch strings shown above must be enclosed in single or double quotes if they contain white space characters.

4.1.7 -tmpdir, (-tmpdir=/tmp/)

Specifies the directory in which to place temporary files. This directory can also be set with the environment variable TMPDIR.

4.1.8 -tune, (-tune=current system architecture)

By default, kapc determines whether the host Alpha architecture is ev4, ev5, or ev6 and optimizes your program for that architecture. If you compile a program on one architecture but plan to run it on another, override the default by setting -tune to the architecture of the target system.

The kapc -tune switch and the C compiler -tune host switch work independently and perform different optimizations. If the switch appears on the command line inside -ckapargs='-tune...', for example:

> kcc myprog.c -ckapargs='-tune=ev6'

the switch value will be applied only to the kapc translator. However, in the case:

> kcc myprog.c -tune=ev6

the switch will be applied to both kapc and the C compiler.

4.1.9 -verbose, -v, (-nov)

Prints the passes as they execute with their arguments and their input and output files. Also prints final resource usage in the C-shell time format.

4.2 General Optimization Switches for the kapc Preprocessor

The following sections explain the function of each kapc general optimization switch.

4.2.1 -interchange, -nointerchange, (-interchange)

Use the -interchange switch to enable loop interchanging. KAP enables loop interchange when -interchange is specified and the -optimize level is at least 1 or the -scalaropt level is 3. If you specify -nointerchange, KAP disables loop interchange regardless of the -optimize or -scalaropt levels. Loop interchanging is enabled by default.

4.2.2 -namepartitioning, -namepart, -nonamepart, (-nonamepartitioning)

This switch tells KAP to look at distinct array names and limit the number of arrays that appear in a loop to avoid cache thrashing. That is, this switch breaks a loop containing, for example, references to arrays A and B into two loops. One loop references array A and the other loop references array B.

The two arguments i and j in a -namepartitioning=i,j switch control name partitioning as follows:

i --- specifies the minimum number of partitions. This is the preferred smallest number of distinct arrays in each distributed loop.
j --- specifies the maximum number of partitions. This is the preferred largest number of distinct arrays in each distributed loop.

If no arguments appear with the -namepartitioning switch, KAP uses its default values of 2 for the minimum and 8 for the maximum number of partitions.

Before KAP can perform name partitioning, you must specify the switch -scalaropt=n where n is greater than or equal to 3.

The -nonamepartitioning switch explicitly prevents name partitioning.

4.2.3 -natural, -nat, (-nonatural), -nnat

This switch selects between "natural" alignment (for example, double entities start on eight-byte boundaries) and non-alignment of data elements.

The -natural switch causes variables and arrays to start on boundaries that correspond to their size.

4.2.4 -optimize, -o, (-optimize=5)

The -optimize switch sets the program analysis and optimization level, ranging from 0 for minimum optimization to 5 for maximum optimization.

Each optimization level is cumulative. For example, -optimize=5 performs everything up to and including that level. Table 4-3 shows the meaning of each of the different optimization levels.

Table 4-3 Optimization Levels
Value Meaning

0 KAP performs only simple program analysis.

1 KAP performs only simple analysis and optimization. KAP can distribute loops to optimize only a part of a loop.

2 KAP optimizes any loop (and perhaps nested loops) in a loop nest. Performs lifetime analysis to determine when last-value assignment of scalars is necessary. Performs more powerful data dependence tests to find opportunities for optimization.

3 Special techniques are used to break data dependence cycles that otherwise prevent advanced optimizations. Linear recurrences are recognized. Triangular loops are recognized and loop interchanging will be attempted to improve memory referencing. Special case data dependence tests are used.

4 Two versions of a loop are generated, if necessary, to break a data dependence arc. FOR loop interchanging techniques are applied. Exact data dependence tests are used to allow more opportunities for optimization to be discovered. Special index sets, called wraparound variables, are recognized.

5 Loop fusion is enabled.

A higher optimize level allows more sophisticated optimization, along with increased compilation time. Many programs that are written to be easily optimized do not need advanced transformations; with these programs, a lower optimization level will suffice.

4.2.5 -recursion, -rc, (-norecursion), -nrc

The -recursion switch informs KAP that functions in the source program may be called recursively, that is, a function calls itself, or calls another routine that in turn calls it.

The -recursion switch must be in force in each recursive routine that KAP processes, or unsafe transformations could result.

4.2.6 -roundoff, -r, (-roundoff=3)

The -roundoff switch allows you to specify the change from serial roundoff error that is acceptable. Certain reductions are sensitive to the algorithms used to compute them. In particular, if an arithmetic reduction is accumulated in a different order than in the scalar program, the roundoff error is accumulated differently and the final result may differ from that of the original program. While the difference is usually insignificant, some restructuring transformations performed by KAP must be disabled in order to obtain exactly the same numerical results as the original program.

KAP classifies its transformations by the amount of difference in roundoff that can accumulate, so you can decide what level of roundoff error differences is allowable. The command switch -roundoff sets the roundoff error level from 0 to 3.

Each nonzero roundoff level is cumulative. For example, level 3 performs everything up to and including that level. Table 4-4 shows the meaning of each roundoff level.

Table 4-4 Roundoff Levels
Value Meaning

0 Allow no roundoff-changing transformations. Loops containing nonarithmetic reductions (such as the largest element of a vector) may still be optimized.

1 Interchange loops around serial reductions, if -optimize>4. Simplification of expressions from forward substitution or from inside trigonometric intrinsic functions returning integer values is performed. Code floating is enabled, if -scalaropt>2. Loop rerolling is enabled, if -scalaropt>2.

2 Perform reciprocal substitution to move an expensive division outside a loop.

3 Recognize induction variables whose types are not the default integer type. Floating-point (float or double) induction variables are recognized. If -scalaropt=3, memory management is enabled. Integer expressions such as L/M/N can be rotated to L/(M*N).

4.2.7 -scalaropt, -so, (-scalaropt=3)

The -scalaropt switch sets the level of scalar optimization that KAP will perform. These scalar optimizations include dusty-deck transformations, dead-code elimination, and loop unrolling. The parameter indicates which level of optimization is desired.

Table 4-5 shows the value and meaning of scalar levels.

Table 4-5 Scalar Levels
Value Meaning

0 No scalar optimizations are performed.

1 Only simple scalar optimizations are performed. These include dead-code elimination, global forward substitution, and dusty-deck IF transformations.

2 The full range of scalar optimization is performed. These include floating invariant IFs out of loops, induction variable recognition, loop rerolling (if -roundoff>0), loop peeling, loop fusion, and loop unrolling.

3 Memory management is enabled, if -roundoff=3.

4.2.8 -skip, -sk, -nsk, (-noskip)

Use the -skip switch following the -routine switch to stop KAP from processing specified routines. KAP writes out unchanged source code for the specified routines. See the description of the -routine switch in Section 4.6.15.

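4.2.9 -tune, -tune, (-tune=<architecture>)

This switch selects the Alpha architecture (ev4, ev5, or ev6) for which kapc optimizes your program; see Section 4.1.8. The kapc -tune switch and the C compiler -tune switch work independently. If the switch appears on the command line inside -ckapargs, for example:
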
> kcc myprog.c -ckapargs='-tune=ev6'

the switch value will be applied only to the kapc translator. However, in the case:

> kcc myprog.c -tune=ev6

the switch will be applied to both kapc and the C compiler.

4.3 Parallel Processing Switches for the kapc Preprocessor

The following sections describe the kapc switches you use to control how the multiprocessor version of KAP prepares programs for parallel execution.

4.3.1 -concurrentize, -conc, -noconc, (-noconcurrentize)

The -concurrentize switch directs KAP to restructure the source code for parallel processing.

Setting -noconcurrentize disables parallel execution and allows all serial optimizations to take place. You can enable or disable parallel execution on a loop-by-loop basis using KAP pragmas. See Section 5.2 for more information.

Programs containing many loops that require synchronization or programs that have loops with small iteration counts may run slower when parallelized. In these cases you should disable parallel execution.

4.3.2 -minconcurrent, -mc, (-minconcurrent=1000)

Executing a loop in parallel incurs overhead that varies with different systems. If a loop has little work, the overhead required to set up parallel execution may make the loop execute more slowly than it would using serial execution. The -minconcurrent switch sets the level of work in a loop above which KAP executes the loop in parallel. Setting the -minconcurrent switch causes KAP to automatically set the -concurrentize switch for you.

The -minconcurrent switch accepts any value greater than or equal to 0. The higher the -minconcurrent value, the more iterations or statements the loop body must have to run concurrently.

KAP estimates the amount of work inside a loop by adding the number of operators and the number of operands, excluding the loop index, in each iteration. KAP multiplies this sum by the number of iterations and designates this product as the amount of "work" of the loop. KAP then compares this estimate with the minconcurrent value. If the loop bounds are constant and the estimated amount of work is greater than the minconcurrent value, KAP generates parallel code for the loop. Otherwise, the loop executes serially.

If the FOR loop bounds are not known at compilation time, KAP generates an IF expression in the parallel pragma. The compiler interprets this parallel pragma as a request to generate a two-version loop; one version is parallel and the other is serial. A run-time check decides whether or not to execute the loop in parallel. To disable the generation of two-version loops throughout a program, use the command-line switch -minconcurrent=0.

4.3.3 -scheduling=<list>, -sched=<list>, (-sched=e)

The -scheduling switch tells KAP which kind of scheduling to use for loop iterations on a multiprocessor system. The -scheduling options are as follows:

e --- selects even scheduling, where KAP divides the iterations into equal-size chunks so that the number of chunks does not exceed the number of processors. For example, if there are nine iterations and three processors, the first processor would execute iterations 1, 2, and 3, the second processor, iterations 4, 5, and 6, and the third processor, iterations 7, 8, and 9.
s --- selects static scheduling where the processors execute iterations starting from their processor number and skipping iterations equal to the total number of processors. For example, if there are nine iterations and three processors, the first processor would execute iterations 1, 4, and 7, the second processor, iterations 2, 5, and 8, and the third processor, iterations 3, 6, and 9.

4.4 Inlining and Interprocedural Analysis Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch used in function inlining and Interprocedural Analysis (IPA). Inlining is the process of replacing a function reference with the text of the function. IPA is the process of inspecting a called function to identify relationships between the function arguments, the function returned value, global data, and the code surrounding the call, in order to identify opportunities for optimization.

Inlining and IPA can be performed in the same KAP run. The only restriction is that the same function may not be in global lists for both inlining and IPA. You can use the inline and IPA pragmas to inline a function in one place and IPA it in another. For additional information about these switches and examples of their use, see Chapter 5 and Chapter 6.

4.4.1 -inline, -inl, (off), -noinline, -ninl, -ipa, -ipa, (off), -noipa, -nipa

The -inline switch provides KAP with a list of functions to inline. The -ipa switch provides KAP with a list of functions to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind, -inll, and -ipall switches on a loop-by-loop basis.

If you specify either the -inline or the -ipa switch without an argument list, KAP will try to inline or analyze all the called functions in the inlining (or IPA) universe specified by the -inline_from_... and -ipa_from_... switches. If you specify a list of names, for example -inline=mkcoef,yval, just the routines named are inlined or analyzed.

The -inline and -ipa command switches can be overridden by the #pragma _KAP inline and #pragma _KAP ipa directives. See Chapter 5 and Chapter 6 for more information about these pragmas.

A list of routines must be included with -noinline or -noipa. All routines in the inlining (or IPA) universe except the listed ones are candidates for inlining (or IPA). See Chapter 6 for more information.

4.4.2 -inline_and_copy, -inlc, (off)

The -inline_and_copy command switch functions like the -inline switch, except that if all references to a function are inlined, the text of the function is not optimized, but is copied unchanged to the transformed code file. This switch is intended for use when inlining routines from the same file as the call, and has no special effect when the routines being inlined are taken from a library or another source file.

When a function has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized function will still be available in case one of the other source files contains a reference to it, so no errors will result.

Note

The -inline_and_copy algorithm assumes that all references to the routine precede it in the source file. If the routine is referenced after the text of the routine, and that particular call site cannot be inlined, the unoptimized version of the routine will be invoked.

4.4.3 -inline_create, -incr, (off), -ipa_create, -ipacr, (off)

These switches cause KAP to build a library file containing partially analyzed routines for later inlining or analysis. The library created is used with the -inline_from_libraries and -ipa_from_libraries switches.

When you specify either of these switches, no transformed code file is generated.

Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the functions included. Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the functions, just the data relationship information.

You can use any name for the created library. However, for maximum compatibility with the -inline_from_libraries and -ipa_from_libraries switches, Compaq recommends that you use the .klib extension.

4.4.4 -inline_depth, -ind, (-inline_depth=2), -ipa_depth, -ipad, (-ipa_depth=2)

The -inline_depth and -ipa_depth switches set the maximum level of function nesting, that is, calls to functions with calls to functions and so forth, that KAP will attempt to inline or analyze. Higher switch values cause KAP to trace function references further. The values and their meanings are as follows:

1 to 10 --- Recursively inline/analyze functions to this depth.
0 --- Use the default value.
-1 --- Inline/analyze only routines that do not contain function references.

The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives, when enabled, are not affected by -inline_depth or -ipa_depth restrictions.

4.4.5 -inline_from_files, -inff, (Current Source File)

See Section 4.4.8.

4.4.6 -inline_from_libraries, -infl, (off)

See Section 4.4.8.

4.4.7 -ipa_from_files, -ipaff, (Current Source File)

See Section 4.4.8.

4.4.8 -ipa_from_libraries, -ipafl, (off)

The -..._from_... switches provide KAP with the locations of functions available for inlining/IPA. The total set of available functions is called the inlining (or IPA) universe.

The -..._from_files switches take the names of source files and directories containing source files.

The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.

Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas and enclosed by parentheses. Multiple -..._from_... switches can be specified on the command line. KAP searches for functions in the provided files and libraries in the order in which they appear on the command line.
