Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide

4.6.18 -syntax, -sy, (-syntax=d)

Use the -syntax switch to select the dialect of C that KAP accepts. The default, -syntax=d, selects the default C dialect. Specifying -syntax=a tells KAP to accept ANSI C; extensions are flagged with warning messages. Specifying -syntax=k tells KAP to accept common C. The -standard compiler switch setting determines the -syntax switch setting, as follows:

Explicitly calling the compiler switch -standard=ansi will cause KAP to be called with the command switch -syntax=a.
Calling the C compiler without any compiler switches or with the compiler switch -standard=relaxed_ansi will cause KAP to be called with the command switch -syntax=d.

4.6.19 -unroll, -ur, (-unroll=4), -unroll2, -ur2, (-unroll2=160), -unroll3, -ur3, (-unroll3=1)

The -unroll, -unroll2, and -unroll3 switches modify how KAP unrolls inner FOR loops. Unrolling does more work per iteration in fewer iterations, which reduces loop overhead. The -scalaropt=2 switch level is required to enable inner loop unrolling. The work in each unrolled iteration is limited by the value given by the -unroll2 switch.

Note

If you use kapc with the Compaq C compiler optimization switch set to -O5, you should turn off KAP's loop unrolling by setting -unroll=1.

Outer FOR loop unrolling is performed as part of memory management and is not controlled by these switches.

The syntax for -unroll, -unroll2, and -unroll3 is as follows:

Long forms: -unroll=<#IT> or -unroll2=<WEIGHT> or -unroll3=<WEIGHT>
Short forms: -ur=<#IT> or -ur2=<WEIGHT> or -ur3=<WEIGHT>

Where:

<#IT> is the maximum number of iterations to unroll.
=0 uses default values to unroll.
=1 means no unrolling.
<WEIGHT> is the maximum (-unroll2) or minimum (-unroll3) weight, that is, estimate of work, in an unrolled loop. <WEIGHT> is estimated by counting operands and operators in a loop. The amount of work in each loop iteration is shown in the loop table in the annotated listing.

There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.

The default (4,200) means that the maximum number of iterations to unroll is 4 and that the maximum amount of work is 200.

By increasing or decreasing the maximum iteration workload, you can control the amount of work that ends up in each loop iteration, as long as the number of unrolled iterations does not exceed the unroll limit. The workload is estimated by adding operations, including subscripts and assignments; scalars, not including the loop index; and if statements. Loops with function calls are weighted more heavily and are never unrolled. The following example demonstrates the workload limit. Assume that -unroll=3 and -unroll2=24 are the switch settings.

for ( i=0; i<n; i++ ) { a[i] = b[i]+c[i]; }

The amount of work in this loop is 5. With these settings, the loop would be unrolled three times, because that is the maximum allowed by the -unroll limit, and the resulting weight (3 x 5 = 15) is less than the -unroll2 limit of 24.

If you set the -unroll2 limit to 10, the loop would be unrolled twice because unrolling the original loop three times would produce a loop with workload of 15, which would exceed the -unroll2 limit. The result would be the following:

for ( i = 0; i<=n - 2; i+=2 ) {
    a[i] = b[i] + c[i];
    a[i+1] = b[i+1] + c[i+1];
}
for ( ; i<n; i++ ) {
    a[i] = b[i] + c[i];
}
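
As a quick check that the unrolled form computes the same values as the original, the following sketch places the two loops side by side. The function names add_rolled and add_unrolled2 are hypothetical and chosen only for illustration; the loop bodies follow the example above.

```c
/* Original loop from the example: one element per iteration. */
void add_rolled(double *a, const double *b, const double *c, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}

/* Two-way unrolled form: two elements per iteration, followed by a
   cleanup loop that handles the leftover element when n is odd. */
void add_unrolled2(double *a, const double *b, const double *c, int n)
{
    int i;
    for (i = 0; i <= n - 2; i += 2) {
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }
    for ( ; i < n; i++) {
        a[i] = b[i] + c[i];
    }
}
```

Calling both functions on the same inputs with an odd n exercises the cleanup loop; the two result arrays should match element by element.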

The -unroll3=n switch sets the lower limit for unrolling. If the loop contains fewer than n units of work (the same units as -unroll2), the loop is not unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. This switch value should be left at 1, the default. A value less than the default could result in a program that executes more slowly.
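
The interaction of the three limits can be modeled with a small function. This is only a sketch of the rules as stated in this section, not KAP's actual implementation; the name unroll_factor and its parameters are hypothetical, and the =0 "use default values" form is not modeled.

```c
/* Hard cap on unrolled iterations, as stated in the text. */
#define KAP_MAX_UNROLL 100

/* Hypothetical model of the limits described in this section.
   max_iter   : value of -unroll  (maximum iterations to unroll)
   max_weight : value of -unroll2 (maximum work in an unrolled iteration)
   min_weight : value of -unroll3 (minimum work for a loop to be unrolled)
   work       : estimated work in one iteration of the original loop
   Returns the number of original iterations placed in one unrolled
   iteration; 1 means the loop is left as it is. */
int unroll_factor(int max_iter, int max_weight, int min_weight, int work)
{
    int factor;
    int by_weight;

    if (work <= 0 || max_iter <= 1) {
        return 1;               /* -unroll=1 means no unrolling */
    }
    if (work < min_weight) {
        return 1;               /* below the -unroll3 lower limit */
    }
    factor = (max_iter < KAP_MAX_UNROLL) ? max_iter : KAP_MAX_UNROLL;
    by_weight = max_weight / work;   /* largest k with k * work <= max_weight */
    if (by_weight < factor) {
        factor = by_weight;
    }
    return (factor < 1) ? 1 : factor;
}
```

With the settings from the example, unroll_factor(3, 24, 1, 5) yields 3 and unroll_factor(3, 10, 1, 5) yields 2, matching the two cases worked through above.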

4.7 Input-Output Files Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch that affects KAP input-output file selection.

4.7.1 -cmp, (<file>.cmp.c), (<file>.cmp), -nocmp, -ncmp

The -cmp switch causes KAP to save the optimized C program under the file name of your choice. By default, kcc names the optimized source <file>.cmp.c, where <file> is the program name from the command line. The kapc default is to name the optimized source program <file>.cmp.

The Compaq C compiler will not process a file with the default .cmp extension. Consequently, you should override the default by using the -cmp switch in the kapc command line to name the optimized source <file>.cmp.c. See the examples in Section 2.7.

Both kcc and kapc place the optimized source file in the current directory. To disable generation of the optimized C output file, enter -nocmp on the command line.


4.7.2 -list, -l, (-nolist), -nolist, -nl

The -list switch tells KAP where to put the listing requested with the -listoptions switch. If -list=file is specified, the listing is written to that file. To explicitly disable generation of the listing file, enter -nolist on the command line.

Specifying just -list with no file name will cause the listing file to be written to <file>.out, where <file> is the input file name with any trailing .c stripped off.
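
The naming rule can be sketched as a short string routine. The function default_listing_name is hypothetical and only mirrors the rule stated above (strip a trailing .c, append .out); it does not show how KAP itself builds the name.

```c
#include <string.h>

/* Writes the default listing file name for `input` into `out`.
   A trailing ".c" is stripped before ".out" is appended.
   Returns 0 on success, or -1 if `out` is too small. */
int default_listing_name(const char *input, char *out, size_t out_size)
{
    size_t len = strlen(input);

    if (len >= 2 && strcmp(input + len - 2, ".c") == 0) {
        len -= 2;                       /* drop the trailing ".c" */
    }
    if (len + 4 + 1 > out_size) {       /* stem + ".out" + terminator */
        return -1;
    }
    memcpy(out, input, len);
    memcpy(out + len, ".out", 5);       /* copies the terminator as well */
    return 0;
}
```

Under this reading, both prog.c and prog map to prog.out.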

4.8 Listing Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch concerning the listing file or the optional listing information available in the transformed code file.

4.8.1 -cmpoptions, -cp, (-nocmpoptions), -nocmpoptions, -ncp

The -cmpoptions switch specifies optional additional information for inclusion in the transformed code (.cmp) file. The only additional information currently selectable is special line-number directives. These are enabled with -cmpoptions=i.

Special line numbers are # line directives that may appear in the transformed program file in order to reference line numbers of the source code. The line in the transformed code that immediately follows a # line directive is either the transformed version of the line in the source code that is referenced, or a line that KAP inserted before the referenced line. The name of the source file is included, in the form it had on the KAP command line.

In the following unrolled loop, the for in the source code was on line 7, and the assignment was on line 8:

# line 7 "-./csource/unr5.c"
for ( i = i1 + 1; i<=n; i+=3 ) {
    a[i] = b[i] / a[i-1];
# line 8 "-./csource/unr5.c"
    a[i+1] = b[i+1] / a[i];
# line 8 "-./csource/unr5.c"
    a[i+2] = b[i+2] / a[i+1];
# line 8 "-./csource/unr5.c"
}

4.8.2 -lines, -ln, (-lines=55)

The -lines switch tells KAP to paginate the listing file properly for printing on different line printers. You can change the number of lines per page on the listing by using -lines=number. The -lines=0 switch tells KAP to paginate only after completing a -listoptions request (see Section 4.8.4) for each program unit (function) processed.

4.8.3 -listingwidth, -lw, (-listingwidth=132)

The -listingwidth switch sets the maximum line length for the listing file produced by KAP. The switch setting affects the format of the loop summary table (-lo=l) and the KAP switches table (-lo=k). The default, 132, is optimal for most line printers. The alternative, 80, is more convenient for looking at the listing file on most terminals. At present, no other values are allowed.

4.8.4 -listoptions, -lo, (-no listing)

The -listoptions switch tells KAP what information to include in the listing and error files:

Value	Output
c	Calling tree at the end of the program listing
k	KAP switches used written at the end of each program unit
l	Loop information table
n	Program unit names, as processed, to the error file
p	Compilation performance statistics
s	Loop summary

The transformed code is recorded in the transformed code file regardless of whether you request a listing file.

See Chapter 8 for examples of the different types of KAP listing output.

Chapter 5
Assertions and Directives

Assertions enable the programmer to provide KAP with additional information about the program. Although many KAP users run the product without assertions, sometimes assertions can improve the optimization results. Directives control advanced features and transformations on a local basis.

KAP does not guarantee that an assertion will have an effect. KAP notes the information provided by the assertion, and if that information helps, KAP uses it.

A variable used in a pragma needs to be declared before it is used in that pragma; otherwise, KAP detects an error.

To understand the process KAP uses in interpreting assertions, it is necessary to understand assumed dependences. In the following loop where X is an array, n and m are scalars, and nothing is known about the relationship between n and m, there are two types of dependences, as follows:

for (i=0; i<n; i++) X[i] = X[i-1] + X[m];

Between X[i] and X[i-1] there is a FORWARD dependence, and the distance is known to be one. Between X[i] and X[m], KAP tries to find a relation but cannot, because it does not know the value of m in relation to n. The second dependence is called an ASSUMED dependence, because it is assumed but cannot be proven to exist.
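
The following sketch shows why KAP has to assume the second dependence. It compares the loop as written with a version that reads X[m] once before the loop, a rewrite that is valid only if the loop never stores to X[m]. The function names are hypothetical, and the loop starts at i = 1 here so that X[i-1] stays within the array.

```c
/* The loop from the text, started at i = 1 to keep X[i-1] in bounds. */
void update_in_order(double *X, int n, int m)
{
    int i;
    for (i = 1; i < n; i++) {
        X[i] = X[i-1] + X[m];
    }
}

/* Same loop with X[m] read once before the loop. This is equivalent
   only when m lies outside the written range 1..n-1. */
void update_hoisted(double *X, int n, int m)
{
    double xm = X[m];
    int i;
    for (i = 1; i < n; i++) {
        X[i] = X[i-1] + xm;
    }
}
```

When m is outside the range of elements the loop writes, the two versions agree. When m falls inside that range, X[m] changes partway through the loop and the results diverge. Because KAP cannot tell which case applies, it must treat the dependence as present.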

Assertions can be unsafe, because KAP cannot check the correctness of the assertions. If you provide incorrect information, the KAP-generated code may give different results than the original scalar program.

Avoid giving a series of similar, redundant assertions, and avoid mixing assertions and hand-coded parallel C pragmas. Either can cause KAP to process some of the pragmas and not others, potentially giving undesired results.

Table 5-1 lists KAP assertions and directives and their durations.

Table 5-1 KAP Assertions and Directives
Name	Duration
Assertions
#pragma _KAP arl=(<integer>)	Selectable
#pragma _KAP distinct (name, name)	Selectable
#pragma _KAP no side effects(name)	Program unit
Assertions (Parallel Processing) --- affect automatic detection
#pragma _KAP concurrent	Loop
#pragma _KAP concurrent call	Loop
#pragma _KAP concurrent ignore call	Loop
#pragma _KAP serial	Loop
Directives (Inlining and IPA)
#pragma _KAP inline [here|routine|global] [( name[,name..] )]	Selectable
#pragma _KAP ipa [here|routine|global] [( name[,name..] )]	Selectable
Directives (Parallel Processing)
#pragma _KAP minconcurrent (<integer>)	Program unit

5.1 Assertions

The following sections describe each of the KAP assertions.

5.1.1 #pragma _KAP arl=<integer>

This is the assertion form of the -addressresolution command switch. You can use it to specify, on a function-by-function basis, the degree of data aliasing in a program. Aliasing is the use of multiple names including pointers to refer to the same memory location. This is often a useful technique, but it complicates data-dependence analysis and can reduce the opportunities for optimization. You can use this pragma to tell KAP how cautious to be about multiple names affecting the same object.

The levels are cumulative --- the assumptions made at one level include the assumptions made at earlier levels. The permitted values are as follows:

0 --- KAP makes no assumptions about aliasing.
1 --- Assume there is no pointer self-referencing (default).
2 --- Assume function arguments are distinct from each other.
3 --- Assume local pointers and arrays are distinct from global pointers and arrays.
4 --- No aliasing is used --- all variables, pointers, and arrays refer to different objects.

See Chapter 4, the description of the -addressresolution switch, for a detailed description of the meaning of each level.

When this pragma appears within a function between the outer { and } of the function definition, it applies only to that function. When this pragma appears elsewhere, it applies to all functions following it in the source file. Placing a pragma within a function overrides (for that function) the global pragma or command switch, so different values can be used in different parts of a program. The -addressresolution command switch acts like an assertion at the beginning of the source file, and can be overridden by #pragma _KAP arl assertions later in the file.

See also the #pragma _KAP distinct assertion (Section 5.1.2), which can be used to identify specific pairs of variables that are not aliased.
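
The placement rules can be illustrated with a short sketch. The functions scale and prefix_sum are hypothetical, and the pragma is written in the form given in the heading of this section. A compiler that does not recognize the pragma ignores it, possibly with a warning.

```c
/* File scope: applies to every function that follows in this file. */
#pragma _KAP arl=2

/* Processed at level 2: dst and src are assumed to be distinct. */
void scale(double *dst, const double *src, int n, double k)
{
    int i;
    for (i = 0; i < n; i++) {
        dst[i] = k * src[i];
    }
}

void prefix_sum(double *dst, const double *src, int n)
{
/* Inside the braces: applies to prefix_sum only and overrides the
   file-scope level, because callers may pass the same array twice. */
#pragma _KAP arl=1
    int i;
    if (n > 0) {
        dst[0] = src[0];
    }
    for (i = 1; i < n; i++) {
        dst[i] = dst[i-1] + src[i];
    }
}
```

In this sketch the lower level inside prefix_sum is a deliberate choice: the function supports in-place use, so asserting that its arguments are distinct would be incorrect.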

5.1.2 #pragma _KAP distinct

This assertion tells KAP that listed objects (variables, arrays, items pointed to with a pointer) do not overlap in memory. The syntax of this pragma is as follows:

#pragma _KAP distinct (expr1, expr2[, expr3,...])

Where expr1, expr2,... represent objects. The form is as follows:

id --- a variable
*id --- what the pointer id points to
id[] --- the array whose name is id

For example, if the object pointed to by a pointer p never overlaps with the array a[i] for any i used in the program, this can be asserted with #pragma _KAP distinct (*p, a[]).

All variables specified in this assertion must have been previously declared.

The range of this assertion is the function where it was written and any succeeding functions. If this assertion is made about local variables or function parameters, it will have no effect outside the immediate function.
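
A minimal sketch of the assertion in use follows. The array a, the constant N, and the function add_scalar are hypothetical. Without the assertion, KAP would have to allow for p pointing into a, in which case each store to a[i] could change *p.

```c
#define N 8

double a[N];    /* declared before the pragma, as required */

void add_scalar(const double *p)
{
/* Asserts that the object p points to never overlaps the array a.
   Because p is a parameter, the assertion has no effect outside
   this function. */
#pragma _KAP distinct (*p, a[])
    int i;
    for (i = 0; i < N; i++) {
        a[i] += *p;
    }
}
```

The caller remains responsible for the truth of the assertion: passing a pointer into a (for example, &a[3]) would make it false.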

5.1.3 #pragma _KAP no side effects ( name [,name...] )

C functions frequently produce more information than just the returned value. Changing values of arguments with pointers or arrays, changing global data, and I/O make a function unsafe to parallelize. The no side effects assertion indicates that all of the functions named can be assumed to be safe to execute concurrently. This means that they perform no I/O and they modify only local variables. If pointers or array names are passed to the routines, it is assumed that the memory locations they represent are not modified. The functions named by the #pragma _KAP no side effects () must have been declared before the assertion.

Warning

The #pragma _KAP no side effects () assertion tells KAP to assume that all external functions are thread-reentrant. This will override KAP default behavior, which is to assume that all external functions are NOT thread-reentrant or thread-safe. If the external functions are not thread-safe, and you use #pragma _KAP no side effects (), your program may not execute correctly. For example, local variables in functions are thread-safe only if they are stored as thread-specific data. See the Guide to DECthreads for further information on thread-safe functions.
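
The following sketch shows a function that meets the conditions described above. The names cube and cube_all are hypothetical. The function performs no I/O, touches no global data, and modifies only its own parameter copy.

```c
double cube(double x);          /* declared before the assertion */

#pragma _KAP no side effects (cube)

double cube(double x)
{
    return x * x * x;           /* result depends only on the argument */
}

void cube_all(double *out, const double *in, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        out[i] = cube(in[i]);   /* calls are independent of one another */
    }
}
```

A function that wrote to a static buffer or called printf would not qualify, and naming it in the assertion could produce incorrect parallel code.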

5.2 Parallel Processing Assertions

The following sections describe assertions available in the multiprocessor version of KAP.

5.2.1 #pragma _KAP concurrent

This assertion tells KAP to ignore assumed dependences and to prefer parallel execution of the immediately following loop. KAP continues to honor dependences it finds.

This assertion is in effect only for the loop it precedes. KAP does not generate parallel code if you use the -noconcurrentize command-line switch.
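
A typical candidate is a loop that stores through an index array. In the sketch below, the hypothetical function scatter assumes that perm holds each index from 0 to n-1 exactly once. KAP cannot prove this, so it would assume a dependence between iterations unless told otherwise.

```c
/* Assumption made by the programmer: perm is a permutation of 0..n-1,
   so no two iterations store to the same element of dst. */
void scatter(double *dst, const double *src, const int *perm, int n)
{
    int i;
#pragma _KAP concurrent
    for (i = 0; i < n; i++) {
        dst[perm[i]] = src[i];
    }
}
```

If perm contained a repeated index, the assertion would be false and the parallel result could differ from the serial one.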

5.2.2 #pragma _KAP concurrent call

The #pragma _KAP concurrent call assertion tells KAP that the function calls in the immediately following loop can execute in parallel. KAP ignores all potential data dependences due to function arguments.

This assertion does not apply to any nested or surrounding loops. Place the concurrent call assertion before each loop with function references that can execute in parallel, as shown in the following example:

main()
{
    double x[800][5000];
    int row,col;
    float time1,time2;

    csettime_(1000);
    time1 = ctimec_();
#pragma _KAP concurrent call
    for(row=0; row<800; row++){
        for(col=0; col <5000; col++){
            x[row][col]=sin(-(double)(row)) + sin(-(double)(row));
        }
    }
    time2 = ctimec_();
    printf("time: %f\n",time1-time2);
}

Using the #pragma _KAP concurrent call assertion and processing with the -unroll=1 and -conc switches causes KAP to generate the following code:

int main( )
{
    double x[800][5000];
    int row;
    int col;
    float time1;
    float time2;
    int _Kii1;
    int _Kii2;

    csettime_( 1000 );
    time1 = ctimec_( );
#pragma omp parallel shared(x) private(_Kii1,_Kii2)
    {
#pragma omp for nowait
        for (_Kii1 = 0;_Kii1<=799;_Kii1++) {
            for (_Kii2 = 0;_Kii2<=4999;_Kii2++) {
                x[_Kii1][_Kii2] = sin(-((double)(_Kii1)))+sin(-((double)(_Kii1)) );
            }
        }
    }
    time2 = ctimec_( );
    printf("time:%f\n",time1 - time2 );
}

KAP does not generate parallel code if you use the -noconcurrentize command-line switch.

5.2.3 #pragma _KAP concurrent ignore call

The #pragma _KAP concurrent ignore call assertion tells KAP to apply both the #pragma _KAP concurrent and the #pragma _KAP concurrent call assertions to the immediately following loop.
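
The combined form suits a loop that has both an assumed dependence and a function call. In the following sketch, the hypothetical function scatter_halved stores through an index array that the programmer knows to be a permutation, and calls a function that the programmer knows to be safe to run in parallel. Both are assumptions made for this illustration.

```c
double halve(double x);

/* Assumptions made by the programmer: perm is a permutation of
   0..n-1, and halve() can safely execute in parallel. */
void scatter_halved(double *dst, const double *src, const int *perm, int n)
{
    int i;
#pragma _KAP concurrent ignore call
    for (i = 0; i < n; i++) {
        dst[perm[i]] = halve(src[i]);
    }
}

double halve(double x)
{
    return 0.5 * x;
}
```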

5.2.4 #pragma _KAP serial

The #pragma _KAP serial assertion forces the loop immediately following it to be serial, and restricts optimization by forcing all enclosing loops to be serial also. Inner loops and other loops inside the same enclosing loop nest, but not enclosing the serial loop, may be optimized. KAP always honors this assertion.
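
The following sketch marks which loops in a nest are affected. The function smooth_rows and the constants ROWS and COLS are hypothetical; the comments restate the rule given above.

```c
#define ROWS 4
#define COLS 6

void smooth_rows(double g[ROWS][COLS], int passes)
{
    int p, r, c;
    for (p = 0; p < passes; p++) {          /* encloses the serial loop: also serial */
#pragma _KAP serial
        for (r = 1; r < ROWS; r++) {        /* forced serial by the assertion */
            for (c = 0; c < COLS; c++) {    /* inner loop: may still be optimized */
                g[r][c] = 0.5 * (g[r][c] + g[r-1][c]);
            }
        }
    }
}
```

Here each row depends on the row before it, so the r loop must run in order, while the columns within a row are independent of one another.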

5.3 Inlining and IPA Directives

The -inline_... and -ipa_... directives control manual inlining and IPA. See Chapter 6 for a more complete description.

The -inline_... and -ipa_... directives let you manually select which functions to inline or interprocedure-analyze at which call sites. If these directives appear with a name list, all occurrences of the named functions will be inlined/analyzed, if possible, in all references within the scope of the directive. If a directive appears without a list of functions, all function references are eligible.

The no forms turn off inlining and IPA of the named functions.

The routine and global scopes can be terminated by the corresponding no directives. Likewise, -noinline directives can be terminated with the positive directive.

Enabled pragmas override the -inline, -ipa, -inline_depth, -inline_looplevel, and -ipa_looplevel command switches. You can use them in addition to, or in place of, command-line controlled inlining/IPA.

Note

An -inline or -ipa command switch must be specified for the corresponding directive to be enabled. The -inline_manual or -ipa_manual switch will enable the corresponding directive without activating the automatic selection algorithms. See the description of the -..._manual command switches in Chapter 4 or Chapter 6 for more information.
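
As an illustration only, the sketch below uses the routine scope with a name list, following the syntax shown in Table 5-1. The functions dot3 and norm_sq are hypothetical, and the placement of the pragma inside the calling function is an assumption; Chapter 6 describes the exact scope rules.

```c
static double dot3(const double *u, const double *v)
{
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

double norm_sq(const double *u)
{
/* Assumed placement: requests inlining of dot3 at call sites within
   this routine. Takes effect only when KAP is run with -inline or
   -inline_manual, as the Note above explains. */
#pragma _KAP inline routine (dot3)
    return dot3(u, u);
}
```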
