Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide



4.4.9 -inline_looplevel, -inll, (-inline_looplevel=2), -ipa_looplevel, -ipall, (-ipa_looplevel=2)

The -..._looplevel switches enable you to limit inlining to just functions that are referenced in nested loops, where the effects of reduced function call overhead or enhanced optimizations will be multiplied.

The parameter is defined from the most deeply nested function reference. The -inll=1 switch restricts inlining to functions referenced in the deepest loop nest. The -inll=3 switch restricts inlining to those routines referenced at the three deepest levels. The for loop nest level of each function reference is included in the optional calling tree section of the listing file.
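
As a hedged illustration of how call sites sit at different loop depths, the sketch below uses three hypothetical functions (init_row, weight, and accumulate; none of these names come from KAP). Counting from the most deeply nested reference, as described above, accumulate is at level 1, weight at level 2, and init_row at level 3. Under that reading, -inll=1 would consider only accumulate, while -inll=3 would consider all three.

```c
/* Hypothetical helper functions; the names are illustrative only. */
static double init_row(int i)                         { return (double)i; }
static double weight(int i, int j)                    { return (double)(i + j); }
static double accumulate(double acc, double w, int k) { return acc + w * k; }

/* Triple loop nest with one call site at each nesting depth. */
double nest_total(int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        double base = init_row(i);               /* outermost loop: level 3 from the deepest */
        for (int j = 0; j < n; j++) {
            double w = weight(i, j);             /* middle loop: level 2 from the deepest */
            for (int k = 0; k < n; k++) {
                base = accumulate(base, w, k);   /* innermost loop: level 1 (deepest) */
            }
        }
        total += base;
    }
    return total;
}
```

The call to accumulate executes n*n*n times, so removing its call overhead is multiplied far more than for init_row, which executes only n times.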

The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives, when enabled, are not affected by the looplevel restrictions.

4.4.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)

These switches cause KAP to recognize the #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives. This allows manual control over which functions are inlined/analyzed at specific call sites.

The default is to ignore these pragmas. When any inlining or IPA switch is included on the command line, the inline or ipa pragmas, respectively, are enabled. The -inline_manual and -ipa_manual switches are provided so the pragmas can be enabled without activating the automatic inlining or IPA algorithms. Because #pragma _KAP [no]inline and #pragma _KAP [no]ipa are not otherwise affected by the -inline=, -ipa=, -inline_depth, and -..._looplevel command switches, you can use them with command-line control to select functions or call sites that the regular selection algorithm would reject.

See Chapter 5 and Chapter 6 for more information about the inline and ipa pragmas.

4.4.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)

The -inline_optimize and -ipa_optimize switches help you optimize large programs by causing KAP to set other switches depending on the value you supply for <integer>. The values and meanings for <integer> are as follows:

4.5 Language Switches for the kapc Preprocessor

The following sections provide information about kapc language switches.

4.5.1 -signed, -signed, (-signed)

The -signed switch changes char objects to signed char. This switch is sometimes necessary when porting code from other platforms whose C compiler defaults char to signed char.

To turn off the -signed switch, use -unsigned on the command line.
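
The following is a minimal sketch of why the signedness of char matters when porting. It is ordinary C and does not depend on KAP; the function names are hypothetical. The last function shows the kind of code whose behavior changes with the default.

```c
#include <limits.h>

/* Returns 1 when plain char behaves as signed char on this compiler. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widening a byte to int: the result depends on the signedness of the type. */
int widen_signed(signed char c)     { return c; }   /* sign-extends */
int widen_unsigned(unsigned char c) { return c; }   /* zero-extends */

/* Code like this silently depends on the default: with an unsigned char
   default the test is never true; with a signed char default it is true
   for any byte whose high bit is set. */
int high_bit_looks_negative(char c)
{
    return c < 0;
}
```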

4.6 Advanced Optimization Switches for the kapc Preprocessor

These kapc switches control, or provide information for, transformations that are machine-specific or program-specific. They are provided to allow the advanced user to experiment with obtaining the maximum optimization of a specific application code.

Some of these switches set parameters that KAP uses to optimize memory hierarchy usage.

Knowing how much data can be kept in fast memory (cache or arithmetic registers), and the costs of moving data in the memory hierarchy, enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.

4.6.1 -addressresolution, -arl, (-arl=1)

This switch tells KAP what level of data aliasing (the use of multiple names for the same memory location) may be present in the program. When there might be multiple ways for the same variable to be referenced, KAP is more cautious about transforming the code in ways that might change the order in which variables and arrays are used.

The associated pragma #pragma _KAP arl=<n> has the same meaning. The command switch is equivalent to a pragma at the beginning of the file, and is thus overridden by other pragmas later in the file.
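
To see why aliasing makes reordering unsafe, consider this sketch (plain C, with hypothetical function names; it does not show what KAP generates). The two functions execute the same assignment in opposite iteration orders. They are interchangeable only when dst and src do not overlap.

```c
/* Copies n elements front to back, in the order written. */
void copy_forward(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Same statement, opposite iteration order.
   Equivalent to copy_forward only when dst and src do not overlap. */
void copy_backward(int *dst, const int *src, int n)
{
    for (int i = n - 1; i >= 0; i--)
        dst[i] = src[i];
}
```

With int a[4] = {1, 2, 3, 4}, the call copy_forward(a + 1, a, 3) leaves {1, 1, 1, 1}, whereas copy_backward(a + 1, a, 3) on the same starting data leaves {1, 1, 2, 3}. When two names might refer to the same memory, changing the order of references can change the result.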

The meanings of the individual levels are as follows:

4.6.2 -arclimit, -arclm, -noarclimit, (-arclimit=5000)

The -arclimit switch is used to set the size of the dependence arc data structure that KAP uses to perform data dependence analysis. This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. See Appendix A for a description of data-dependence analysis.

The formula that you use to estimate the number of dependence arcs for a given loop nest is as follows:


dependence_array_size = max (#_of_statements * 4,  arclimit value) 

This is an estimate because KAP assumes that each statement, in the worst case, has four dependence arcs.
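
The formula can be written as a small C helper. This is only a transcription of the estimate as stated above, with a hypothetical function name; it is not KAP's internal code.

```c
/* Estimate from the formula above: max(statements * 4, arclimit). */
long dependence_array_size(long num_statements, long arclimit)
{
    long worst_case_arcs = num_statements * 4;   /* four arcs per statement, worst case */
    return worst_case_arcs > arclimit ? worst_case_arcs : arclimit;
}
```

For example, a loop nest of 100 statements with -arclimit=5000 yields 5000, because 400 is below the arclimit value.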

When the Loop Information Table is included in the listing file (-listoptions=l), any loop that was too complex for the dependence data structure to hold the information will be marked as too many stmts/DD arcs. Increasing the -arclimit value may enable KAP to optimize the loop. If -arclimit is already at its maximum value, you can try simplifying the loop or dividing it into smaller loops.

The maximum -arclimit value allowed is 5000. If you specify a value greater than 5000, KAP will default to 5000 in its allocation of the data-dependence array.

Note

Most users do NOT need to change this value.

4.6.3 -cacheline, -chl, (-cacheline=32)

The -cacheline switch tells KAP the width of the memory channel in bytes between cache and main memory.

The -cacheline switch can take a second argument, for example, -cacheline=32,64. When two arguments are specified, the first argument gives the width of the memory channel between the primary cache and the secondary cache, and the second argument gives the width of the memory channel between the secondary cache and main memory. Omitting the second argument, or specifying the default 32, tells KAP not to optimize secondary cache usage.

4.6.4 -cache_prefetch_line_count, -cplc, (-cplc=0)

The -cache_prefetch_line_count switch gives the number of additional lines prefetched into the cache during a cache miss.

4.6.5 -cachesize, -chs, (-cachesize=8,0)

The -cachesize switch tells KAP the size in kilobytes of the cache memory.

The -cachesize switch can take a second argument, for example, -cachesize=8,128. When two arguments are specified, the first argument gives the size of the primary cache, and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying the default 0, tells KAP not to optimize secondary cache usage.

When -tune=ev6, the default values for -chs are 32,0.
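
As a rough sketch of what the -cachesize and -cacheline values mean for array code, the helpers below convert them into line and element counts. The function names are hypothetical, and the arithmetic is generic cache geometry rather than anything KAP is documented to compute.

```c
/* Number of cache lines, given -cachesize (kilobytes) and -cacheline (bytes). */
long cache_line_count(long cachesize_kb, long cacheline_bytes)
{
    return (cachesize_kb * 1024) / cacheline_bytes;
}

/* How many array elements of a given size share one cache line. */
long elements_per_line(long cacheline_bytes, long element_bytes)
{
    return cacheline_bytes / element_bytes;
}

/* Largest number of elements that fit in the cache at once. */
long elements_in_cache(long cachesize_kb, long element_bytes)
{
    return (cachesize_kb * 1024) / element_bytes;
}
```

With the defaults -cachesize=8 and -cacheline=32, these give 256 lines, 4 doubles (8 bytes each) per line, and room for 1024 doubles in the primary cache.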

4.6.6 -dpregisters, -dpr, (-dpregisters=32)

The -dpregisters switch specifies the number of double precision registers for each processor.

4.6.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)

When a loop contains an if statement whose condition does not change from one iteration to another, the same test must be repeated for every iteration. The code can often be made more efficient by floating the if outside the loop and putting the then and else sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops, as shown in the following example:


for ( i = ...) {
    section-1
    if ( ) {
        section-2
    }
    else {
        section-3
    }
    section-4
}

Becomes:


if ( ) {
    for ( i = ...) {
        section-1
        section-2
        section-4
    }
}
else {
    for ( i = ...) {
        section-1
        section-3
        section-4
    }
}

When sections 1 and 4 are large, the extra code generated can slow a program down through cache contention, extra paging, and so on, more than the reduced number of if tests speed it up. The -each_invariant_if_growth switch provides a maximum size in number of lines of executable code of sections 1 and 4 below which KAP tries to float an invariant if outside a loop.
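
A concrete, compilable version of the floating transformation is sketched below. The function names, the scale flag, and the arithmetic in each section are hypothetical; the code is hand-written to mirror the schematic and is not actual KAP output.

```c
/* Original form: the loop-invariant test on scale runs every iteration. */
void update_in_loop(double *a, const double *b, int n, int scale)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;        /* section-1 */
        if (scale) {
            a[i] *= 2.0;          /* section-2 */
        } else {
            a[i] -= 1.0;          /* section-3 */
        }
        a[i] += b[i];             /* section-4 */
    }
}

/* Floated form: one test, with sections 1 and 4 duplicated in each loop. */
void update_floated(double *a, const double *b, int n, int scale)
{
    if (scale) {
        for (int i = 0; i < n; i++) {
            a[i] = b[i] + 1.0;    /* section-1 */
            a[i] *= 2.0;          /* section-2 */
            a[i] += b[i];         /* section-4 */
        }
    } else {
        for (int i = 0; i < n; i++) {
            a[i] = b[i] + 1.0;    /* section-1 */
            a[i] -= 1.0;          /* section-3 */
            a[i] += b[i];         /* section-4 */
        }
    }
}
```

Both functions produce the same values in a. The floated form tests scale once instead of n times, at the cost of one extra copy of sections 1 and 4.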

The total amount of additional code generated in a program unit through invariant-if floating can be limited with the -max_invariant_if_growth switch.

4.6.8 -fpregisters, -fpr, (-fpregisters=32)

The -fpregisters switch specifies the number of single precision (that is, float) registers for each processor.

4.6.9 -fuse, -nfuse, (-nofuse)

The -fuse switch tells KAP to perform loop fusion. Loop fusion is a conventional compiler optimization that transforms two adjacent loops into a single loop. Using data dependence tests allows fusion of more loops than standard techniques allow. Before KAP can perform loop fusion, you must specify the switch -scalaropt=2 or -optimize=5.
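
Loop fusion can be illustrated with a small hand-written example. The function names are hypothetical, the arrays are assumed not to overlap, and the fused version is written by hand rather than taken from KAP output.

```c
/* Two adjacent loops over the same range. */
void unfused(double *a, double *c, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * 2.0;
    for (int i = 0; i < n; i++)
        c[i] = a[i] + 1.0;
}

/* Fused into one loop. This is legal here because c[i] needs only the
   value of a[i] computed in the same iteration. */
void fused(double *a, double *c, const double *b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] * 2.0;
        c[i] = a[i] + 1.0;
    }
}
```

The fused loop pays the loop overhead once and can reuse a[i] while it is still in a register. If the second loop read a[i + 1] instead, fusion would change the result, which is the kind of case that data dependence tests detect.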

4.6.10 -fuselevel, (-fuselevel=0)

The -fuselevel option further controls the level of loop fusion. (Whenever you set -fuselevel, KAP automatically sets -fuse.)

The possible settings for this option are the following:

4.6.11 -heaplimit, -heap, (-heaplimit=100)

KAP may require large amounts of memory in order to process your source code. The -heaplimit option specifies the maximum size in megabytes to which the KAP heap can grow. If this limit is reached, KAP will stop processing your source code and try to exit with an "out of memory" error message.

If you choose a -heaplimit setting that is greater than the amount of memory that your machine has available, KAP may run out of memory before it reaches the -heaplimit. KAP relies upon the operating system to tell it that the process has run out of memory before that problem occurs. Some operating systems kill KAP without first telling KAP that there is insufficient memory. In that case, KAP may stop processing your code and exit in an undefined manner. Using -heaplimit makes a graceful exit more likely.

4.6.12 -limit, -lm, (-limit=50)

KAP estimates how much time is required to analyze each loop nest construct. If a loop is too deeply nested, KAP ignores the outer loop and recursively visits the inner loops. The -limit switch is a rough dial to control what KAP thinks is too deeply nested.

Larger loop nest limits may allow more optimization for deeply nested loop structures, but may take more compilation time. The limit does not correspond to the for loop nest level; rather, it is an estimate of the number of loop orderings that can be generated from the loop nest. The -limit switch resets this internal limit.
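
To give a feel for why the number of loop orderings grows so much faster than the nest depth, the sketch below counts the orderings of a perfect loop nest as depth factorial. This is an assumption made for illustration only: how KAP actually forms its estimate is not described here, and the function name is hypothetical.

```c
/* Number of distinct orderings of a perfect nest of `depth` loops: depth!
   (illustrative assumption, not KAP's documented estimate). */
unsigned long loop_orderings(unsigned depth)
{
    unsigned long count = 1;
    for (unsigned d = 2; d <= depth; d++)
        count *= d;
    return count;
}
```

Under this assumption, a nest of depth 3 has 6 orderings, depth 4 has 24, and depth 5 has 120, so a small change in nesting depth can move a loop nest across the limit.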

Note

Most users do NOT need to change this value.

4.6.13 -machine, -ma, -noma, (-machine=s)

Set the -machine switch according to the characteristics of the system on which KAP C output runs.

Use any combination of the following switch settings, except do not specify switches s and n simultaneously:
n Prefers non-stride-1 array access over stride-1 array access. This is suitable for machine architectures that have special interleaved memory hardware where non-stride-1 array access provides the best performance.
o Directs KAP C not to parallelize innermost loops when optimizing, that is, to parallelize only outermost loops. This capability is available to prevent concurrentization on applications that have small inner loop bounds, thereby reducing overhead costs. KAP C makes decisions concerning the overhead:benefit ratio when making concurrentization decisions. When the loop bounds are unknown at compile time, KAP C may generate concurrent code for innermost loops, a practice that may be inefficient for the actual loop bounds.
s Directs KAP C to prefer optimization of a for loop that generates stride-1 (contiguous) references over one that generates non-stride-1 operands. Some computers perform better if consecutive references are contiguous in memory.

To disable all the switches, enter -nomachine on the command line.
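
The difference between stride-1 and non-stride-1 access, which the s and n settings refer to, can be sketched as follows. The array dimensions and function names are hypothetical. C stores two-dimensional arrays row by row, so the first function touches consecutive memory locations in its inner loop and the second does not.

```c
#define ROWS 4
#define COLS 8

/* Stride-1: the inner loop walks consecutive elements of one row. */
double sum_stride_1(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += m[i][j];
    return sum;
}

/* Non-stride-1: the inner loop jumps COLS elements between references. */
double sum_stride_cols(double m[ROWS][COLS])
{
    double sum = 0.0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += m[i][j];
    return sum;
}
```

Both functions visit every element exactly once; only the order of memory references differs.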

4.6.14 -max_invariant_if_growth, -miifg, (-miifg=500)

When a loop contains an if statement whose condition does not change from one iteration to another (loop-invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the if outside the loop and putting the then and else sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops. The -max_invariant_if_growth switch allows you to set the total number of noncomment lines of code in each function. Invariant-if restructuring will not create additional lines of code if that would exceed this limit. Other KAP transformations can add or delete code.

The maximum amount of additional code generated in a single loop through invariant-if floating can be limited with the -each_invariant_if_growth switch.

4.6.15 -routine, -rt, -nrt, (-noroutine)

The -routine switch allows you to specify other switches that apply to specific routines within the source file that KAP processes. The only switches that -routine can specify are as follows:

The syntax of a KAP command with the -routine switch requires that -routine and the switches it specifies come at the end of the command line after the C source file, for example:


kapc [-<switches>] source_file.c 
-routine=<routine_name>[,<routine_name>...] 
-<switches_for_routine_names> 
... 

Note

If the -routine switch and the switches it specifies are not at the end of the command line after the source file, KAP generates the following error message:


Command line error -- An input file has not been specified on the 
command line. 
KAP -- Command Line Syntax Error Detected. 

You can specify switches that apply to all routines in the source file after kcc or kapc. Of course, <routine_name> must be a routine in source_file.c. Finally, switches for each instance of <routine_name> must come from the preceding bulleted list. In particular, when -skip is specified, KAP does not process the associated routine.

For example, consider the following command line:


kapc -scalaropt program.c -routine=sub_1 -roundoff=2 -optimize=3 

This command invokes KAP and passes the -scalaropt switch to all program units in file program.c, including sub_1. Furthermore, program unit sub_1 is processed with both the -roundoff=2 and -optimize=3 switches.

Using the -routine switch implies that directives equivalent to the specified switches are asserted only while processing particular routines. The effect is the same as if the implied directives were inserted at the top of the associated routines.

Using the -routine switch also divides the resulting KAP command into two halves. The first half looks like any other KAP command because it contains KAP switches other than -routine and a source file name. The second half is different because it contains one or more -routine switches, each with associated routines and switches for the routines selected from the preceding bulleted list.

For example, consider the following command line:


kapc -cachesize=8,0 -syntax=a my_program.c \
-routine=sub_1,sub_2,sub_3 -roundoff=2 -optimize=3 -routine=sub_4 -unroll 

Next is an explanation of the two halves:

  1. This command invokes KAP and passes the -cachesize=8,0 and -syntax=a switches to all program units in file my_program.c. The program units include sub_1, sub_2, sub_3, and sub_4.
  2. Program units sub_1, sub_2, and sub_3 are processed with both the -roundoff and -optimize switches, while routine sub_4 is processed with the -unroll switch. Of course, the three switches -roundoff, -optimize, and -unroll are in the preceding bulleted list.

Finally, the usual rules for shortening the names of switches also apply to the -routine switch. For example, the following KAP command fragments produce identical results:

-routine=subroutine_a -optimize=3 -unroll=4

4.6.16 -setassociativity, -sasc, (-setassociativity=1,1)

The -setassociativity switch provides information on the mapping of physical addresses in main memory to cache pages in the Level 1 and Level 2 cache. The first integer describes the set associativity of the Level 1 cache, and the second integer describes the set associativity of the Level 2 cache. A setting of n means that a page can appear in any of n places in the cache. For instance, a setting of 1 means that a page in main memory can be placed in only one place in the cache. If that place is already in use, its contents will have to be rewritten or flushed in order to copy the newly accessed page into the cache.
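
The mapping can be sketched with generic cache arithmetic. The function names are hypothetical, and the sketch uses the cache line as the unit of placement, which is an assumption; the text above speaks of cache pages.

```c
/* Number of sets in a cache with the given size, line size, and associativity. */
unsigned long cache_sets(unsigned long cache_bytes, unsigned long line_bytes,
                         unsigned long ways)
{
    return cache_bytes / (line_bytes * ways);
}

/* Set that a given address maps to. */
unsigned long set_index(unsigned long addr, unsigned long cache_bytes,
                        unsigned long line_bytes, unsigned long ways)
{
    return (addr / line_bytes) % cache_sets(cache_bytes, line_bytes, ways);
}
```

In an 8 KB cache with 32-byte lines and associativity 1, addresses 0 and 8192 both map to set 0, so each access evicts the other. With associativity 2, addresses 0 and 4096 also share set 0, but the set has two places and both can stay resident.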

4.6.17 -stdio, -stdio, (off)

The -stdio switch tells KAP to perform strength reduction on calls to certain functions in the standard I/O library. Programs that heavily use functions such as printf will generally have improved I/O performance when this is done.

The -scalaropt=3 switch is required when you use -stdio with standalone KAP, described in Section 2.7 and Section 3.2.1. Additionally, you must specify the linker switch, -lkio, when you link your program.
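
As a hedged illustration of what strength reduction on a standard I/O call can look like, the sketch below replaces a formatted write with cheaper unformatted calls that emit the same bytes. Which functions KAP rewrites, and how, is not described in this section; the function names here are hypothetical and the reduced form is hand-written.

```c
#include <stdio.h>

/* General form: the format string is parsed at run time.
   Returns the number of characters written, or a negative value on error. */
int write_line_formatted(FILE *out, const char *s)
{
    return fprintf(out, "%s\n", s);
}

/* Reduced form: writes the same bytes without format parsing.
   Returns 0 on success or EOF on error, so the return value differs
   from the formatted version. */
int write_line_reduced(FILE *out, const char *s)
{
    if (fputs(s, out) == EOF)
        return EOF;
    return fputc('\n', out) == EOF ? EOF : 0;
}
```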

