Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide



5.4.13 -onetrip, -1, (-noonetrip), -n1

The -onetrip switch allows you to specify "one-trip" DO loops. Many pre-Fortran 77 compilers implemented DO loops that always executed at least one iteration, even if the initial value of the loop control variable was greater than the final value. This switch informs KAP that the program being processed contains loops that rely on this one-trip behavior.
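As an illustrative sketch (the program and variable names are hypothetical and do not come from the KAP documentation), the loop below has an initial value greater than its final value. Under standard Fortran 77 and Fortran 90 rules the body is skipped. A one-trip implementation of the kind described above would execute it once.

```fortran
PROGRAM onetrip_demo
  IMPLICIT NONE
  INTEGER :: i, trips

  trips = 0

  ! Initial value (5) exceeds the final value (1); the step defaults to +1.
  DO i = 5, 1
    trips = trips + 1
  END DO

  ! Standard semantics: the iteration count is zero, so trips stays 0.
  ! A one-trip (pre-Fortran 77 style) loop would leave trips equal to 1.
  PRINT *, 'iterations executed:', trips
END PROGRAM onetrip_demo
```

Code that depends on the body running once in this situation is the kind of code for which -onetrip is intended.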

Executing kf90 with the compiler switch -nof77 specified explicitly causes KAP to be called with the -onetrip command switch.

5.4.14 -real, -rl, (-real=4)

The -real switch tells KAP the Compaq Fortran compiler's default size, in bytes, for REAL variables. The size N, as in REAL*N, can be 4 or 8. To change the default size of REAL variables, for example from 4 to 8, first set the Compaq Fortran compiler switch -r=8, then tell KAP the new size with the -real=8 switch.

5.4.15 -save, -sv, (-save=manual_adjust)

The -save switch tells KAP whether to perform live variable analysis to determine if the value of a local scalar variable in a subroutine or function needs to be saved between invocations of the routine being processed. SAVE statements will be generated for any variables requiring them. KAP will not delete or ignore a SAVE statement coded by the user.

Saving local variables may be required for correct execution of the program, but can restrict KAP optimizations.
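The following minimal sketch shows a local scalar whose value must survive between invocations. The module, routine, and variable names are hypothetical. The local variable kept is read in one call after being written in an earlier call, which is the kind of use that requires a SAVE statement.

```fortran
MODULE remember_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE remember(store, val)
    LOGICAL, INTENT(IN)    :: store
    INTEGER, INTENT(INOUT) :: val
    INTEGER :: kept
    SAVE kept              ! without SAVE, kept is undefined on re-entry

    IF (store) THEN
      kept = val           ! written in one invocation ...
    ELSE
      val = kept           ! ... and read in a later invocation
    END IF
  END SUBROUTINE remember
END MODULE remember_mod

PROGRAM save_demo
  USE remember_mod
  IMPLICIT NONE
  INTEGER :: x

  x = 42
  CALL remember(.TRUE., x)     ! store 42 in the saved local
  x = 0
  CALL remember(.FALSE., x)    ! recall the stored value into x
  PRINT *, 'recalled value:', x
END PROGRAM save_demo
```

Because kept must retain its value across calls, it cannot be treated as a purely temporary scalar, which illustrates how saved variables can limit optimization.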

With -save=manual, KAP assumes you have inserted the necessary SAVE statements into the code and performs no corresponding analysis of its own. The user-written SAVE statements are assumed to be correct and sufficient. This combination is not affected by the -[no]recursion switch.

The effect of -save=manual_adjust depends on the -[no]recursion setting:

The effect of -save=all_adjust depends on the -[no]recursion setting:

With -recursion, this is the same as -save=all_adjust:

5.4.16 -scan, (-scan=72)

The -scan switch allows you to set the length of the Fortran 90 input lines. KAP ignores (treats as a comment) any characters in columns beyond the value of the -scan switch. The value must be 72, 120, or 132.

5.4.17 -syntax, -sy, (off)

The -syntax switch directs KAP to check for compliance with certain syntactic rules. Using a dialect switch can prevent a construct from being translated differently than expected by a user who is familiar with a different implementation of Fortran. The default is to accept the superset of the ANSI Fortran 77 standard defined by Compaq Fortran, which includes many common Fortran 77 extensions. See your Fortran language reference manual for differences in the dialects.

The -syntax switch has settings as follows:

5.4.18 -type, -ty, (-notype), -nty

The -type switch causes KAP to issue warning messages for variables not explicitly typed. The -notype default suppresses this checking.
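As a small illustration (a hypothetical program, not taken from the KAP documentation), the code below omits IMPLICIT NONE and uses two variables that are never declared. Under Fortran's default implicit typing rules they are typed by their first letter, and these are the kind of variables a check for explicit typing is meant to catch.

```fortran
PROGRAM type_demo
  ! No IMPLICIT NONE here, so undeclared names are typed by first letter.
  REAL :: total

  total = 10.0
  totl = total + 1.0     ! misspelling of total: implicitly REAL (starts with T)
  nitems = 2.5           ! implicitly INTEGER (starts with N): value truncates to 2

  PRINT *, total, totl, nitems
END PROGRAM type_demo
```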

5.5 Inlining and Interprocedural Analysis Switches for kapf90

The following sections explain the function of each switch used in subprogram inlining and Interprocedural Analysis (IPA).

Inlining is the process of replacing a subroutine CALL or function reference with the text of the subroutine or function. IPA is the process of inspecting a called routine to identify relationships between the arguments, the returned value, and the code surrounding the call to identify opportunities for optimization.
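The effect of inlining can be sketched by hand. In the hypothetical example below (all names are illustrative), the first loop contains a function reference and the second loop shows the same computation with the function body substituted at the reference. This is a manual illustration of the idea, not output produced by KAP.

```fortran
MODULE coef_mod
  IMPLICIT NONE
CONTAINS
  REAL FUNCTION scale_add(a, x, y)
    REAL, INTENT(IN) :: a, x, y
    scale_add = a * x + y
  END FUNCTION scale_add
END MODULE coef_mod

PROGRAM inline_demo
  USE coef_mod
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  REAL :: x(n), y(n), called(n), inlined(n)
  INTEGER :: i

  x = (/ 1.0, 2.0, 3.0, 4.0 /)
  y = 0.5

  ! Original form: one function reference per iteration.
  DO i = 1, n
    called(i) = scale_add(2.0, x(i), y(i))
  END DO

  ! Hand-inlined form: the function text replaces the reference,
  ! so the loop body no longer contains a call.
  DO i = 1, n
    inlined(i) = 2.0 * x(i) + y(i)
  END DO

  PRINT *, 'results match:', ALL(called == inlined)
END PROGRAM inline_demo
```

With the call removed, the second loop contains only array arithmetic, which is generally easier for an optimizer to analyze.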

Inlining and IPA can be performed in the same KAP run. The only restriction is that the same routine cannot be in global lists for both inlining and IPA. You can use the !*$* inline and !*$* ipa directives to inline a subroutine or function in one place and interprocedurally analyze it in another. (See Chapter 6 and Chapter 8 for information about these directives.)

For additional information about these switches and examples of their use, see Chapter 8.

5.5.1 -inline, -inl, (off), -noinline, -ninl, -ipa, -ipa, (off), -noipa, -nipa

The -inline switch provides KAP with a list of routines to inline. The -ipa switch provides KAP with a list of routines to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind, -inll, and -ipall switches on a loop-by-loop basis.

If you specify either the -inline or the -ipa switch without an argument list, KAP will try to inline/analyze all the called subroutines and functions in the inlining (or IPA) universe specified by the -inline_from... (-ipa_from...) switches, subject to restrictions imposed by the -inline_depth and -inline_looplevel (-ipa_looplevel) switches.

To permit KAP to inline routines that contain static SAVE or DATA variables, use the -aggressive=c switch with -inline. The -aggressive=c switch promotes the static variables to members of a COMMON block that is introduced into the program. See Section 5.6.1 for more information.

If you include a list of names, for example -inline=mkcoef,yval, only the named routines are inlined or analyzed.

A list of routines must be included with -noinline or -noipa. All routines in the inlining/IPA universe except the listed ones are candidates for inlining or IPA.

The -[no]inline and -[no]ipa command switches can be overridden by the !*$* [no]inline and !*$* [no]ipa directives. (See Chapter 6 and Chapter 8 for more information about these directives.)

5.5.2 -inline_and_copy, -inlc, (off)

The -inline_and_copy command switch functions like the -inline switch, except that if all CALLs or references to a subprogram are inlined, the text of the routine is not optimized, but is copied unchanged to the transformed code file. This is intended for use when inlining routines from the same file as the call, and has no special effect when the routines being inlined are being taken from a library or another source file.

When a subprogram has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine will still be available in case one of the other source files contains a reference to it, so no errors will result.

Note

The -inline_and_copy algorithm assumes that all CALLs and references to the routine precede it in the source file. If the routine is referenced after the text of the routine, and that particular call site cannot be inlined, the unoptimized version of the routine will be invoked.

5.5.3 -inline_create, -incr, (off), -ipa_create, -ipacr, (off)

These switches cause KAP to build a library file containing partially analyzed routines for later inlining or IPA. The library created is used with the -inline_from_libraries (-ipa_from_libraries) switch.

Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the subroutines and functions included. Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the routines, just the data relationship information.

You can use any name for the created library. However, for maximum compatibility with the -inline_from_libraries and -ipa_from_libraries switches, Compaq recommends that you use the .klib extension.

5.5.4 -inline_depth, -ind, (-inline_depth=2), -ipa_depth, -ipad, (-ipa_depth=2)

The -inline_depth and -ipa_depth switches set the maximum level of subprogram nesting that KAP will attempt to inline or analyze. Higher values instruct KAP to trace CALLs and function references further. The values and their meanings are as follows:

Chapter 8 has examples of recursive inlining with different values of -inline_depth.

The !*$* [no]inline and !*$* [no]ipa directives, when enabled, are not affected by the -inline_depth or -ipa_depth restrictions.

5.5.5 -inline_from_files, -inff, (current source file)

See Section 5.5.8.

5.5.6 -inline_from_libraries, -infl, (off)

See Section 5.5.8.

5.5.7 -ipa_from_files, -ipaff, (current source file)

See Section 5.5.8.

5.5.8 -ipa_from_libraries, -ipafl, (off)

The -..._from_... switches provide KAP with the locations of subroutines and functions available for inlining/IPA. The total set of available routines is called the inlining (or IPA) universe.

The -..._from_files switches take the names of source files and of directories containing source files. Specifying a directory, for example -ipaff=/work, is equivalent to the notation /work/*.f90.

The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.

Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas or colons. Multiple -..._from_... switches can be specified on the command line. KAP searches for subroutines and functions in the provided files and libraries in the order in which they appear on the command line.

The -..._from_... switches do not activate inlining or IPA. The -inline or -ipa switches must be specified.

5.5.9 -inline_looplevel, -inll, (-inline_looplevel=2), -ipa_looplevel, -ipall, (-ipa_looplevel=2)

The -..._looplevel switches enable you to limit inlining to just routines that are referenced in nested loops, where the effects of reduced call overhead or enhanced optimizations will be multiplied.

The parameter is defined from the most deeply nested subprogram reference. The -inll=1 switch restricts inlining to subroutines and functions referenced in the deepest loop nest. The -inll=3 switch restricts inlining to those routines referenced at the three deepest levels. The DO loop nest level of each CALL or function reference is included in the optional calling tree section of the listing file.
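The sketch below shows how call sites at different loop depths relate to the -inll values described above. The routines are hypothetical, and the comments reflect one reading of the description (levels counted from the most deeply nested reference) rather than verified KAP behavior.

```fortran
MODULE nest_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE row_setup(i, total)
    INTEGER, INTENT(IN)    :: i
    INTEGER, INTENT(INOUT) :: total
    total = total + 100 * i
  END SUBROUTINE row_setup

  SUBROUTINE col_setup(j, total)
    INTEGER, INTENT(IN)    :: j
    INTEGER, INTENT(INOUT) :: total
    total = total + 10 * j
  END SUBROUTINE col_setup

  SUBROUTINE kernel(k, total)
    INTEGER, INTENT(IN)    :: k
    INTEGER, INTENT(INOUT) :: total
    total = total + k
  END SUBROUTINE kernel
END MODULE nest_mod

PROGRAM looplevel_demo
  USE nest_mod
  IMPLICIT NONE
  INTEGER :: i, j, k, total

  total = 0
  DO i = 1, 2
    CALL row_setup(i, total)       ! third level from the deepest: needs -inll=3
    DO j = 1, 3
      CALL col_setup(j, total)     ! second level from the deepest: needs -inll=2 or 3
      DO k = 1, 4
        CALL kernel(k, total)      ! deepest level: a candidate even with -inll=1
      END DO
    END DO
  END DO

  PRINT *, 'total =', total
END PROGRAM looplevel_demo
```

Here kernel is called 24 times but row_setup only twice, which is why restricting inlining to the deepest levels concentrates the benefit where call overhead is multiplied most.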

The -..._looplevel switches do not activate inlining or IPA. The -inline or -ipa switches must be specified.

The !*$* [no]inline and !*$* [no]ipa directives, when enabled, are not affected by the -..._looplevel restrictions.

5.5.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)

These switches cause KAP to recognize the !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which subroutines and functions are inlined/analyzed at specific call sites.

The default is to ignore these directives. They are enabled when any inlining (IPA) switch is given on the command line. When -inline_manual (-ipa_manual) is included on the command line, the !*$* inline (!*$* ipa) directives are enabled without enabling the automatic inlining algorithms. The !*$* [no]inline and !*$* [no]ipa directives override the -inline=, -ipa=, -inline_depth, and -..._looplevel command switches. You can therefore use them along with command-line control to select routines or call sites that the regular selection algorithm would reject, or to prevent specific routines or call sites from being inlined or analyzed.

See Chapter 6 and Chapter 8 for more information about the !*$* inline and !*$* ipa directives.

5.5.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)

The -inline_optimize and -ipa_optimize switches help you optimize large programs by causing KAP to set other switches according to the value you substitute for <integer>. The values and meanings for <integer> are as follows:

5.6 Advanced Optimization Control for kapf90

The following sections describe command switches that the advanced user may want to use for maximum performance.

Some of these switches (-aggressive, -cacheline, -cachesize, -dpregisters, -fpregisters, -setassociativity) set parameters that KAP uses to optimize memory usage. Knowing how much data can be kept in fast memory (cache or arithmetic registers), and the costs of moving data within the memory hierarchy, enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.

5.6.1 -aggressive, -ag, (-noaggressive), -nag

The -aggressive switch takes a list of options as follows:

To explicitly disable these options, specify -noaggressive.

See also the -natural, -cacheline, -cachesize, and -setassociativity command-line switches.

5.6.2 -arclimit, -arclm, (-arclimit=5000)

The -arclimit switch sets the size of the dependence arc data structure that KAP uses to perform data dependence analysis (see Appendix B).

This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. By default, it is allocated with a size of max(4 * number of statements, -arclimit value). If a loop contains too many dependence relationships to be represented in the dependence data structure, KAP abandons optimization of that loop. Loops that exceed this threshold are marked in the Loop Table (-listoptions=l) in the listing file. (See Chapter 10.)
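The sizing rule above can be worked through for a few cases. The short program below is only an arithmetic sketch. The statement counts are made-up inputs, and treating "number of statements" as the statement count of the loop nest is an assumption.

```fortran
PROGRAM arclimit_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: arclimit = 5000      ! default -arclimit value
  INTEGER :: nstmt(3), i

  ! Hypothetical statement counts for three loop nests.
  nstmt = (/ 100, 1250, 2000 /)

  DO i = 1, 3
    ! size = max(4 * number of statements, -arclimit value)
    PRINT *, 'statements:', nstmt(i), &
             ' structure size:', MAX(4 * nstmt(i), arclimit)
  END DO
END PROGRAM arclimit_demo
```

For 100 and 1250 statements the -arclimit value dominates (5000 in both cases). For 2000 statements the statement term dominates (8000).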

You can use the -arclimit switch to increase the size of the data structure to enable KAP to perform more optimizations. Reducing the -arclimit value will (slightly) reduce the size of the KAP executable, while reducing the complexity of loops that KAP can analyze. (Most users will not need to change this value.)

The maximum value is 5000. If a larger value is specified, the value is set to 5000 and, if the "KAP switches" section (-listoptions=k) is enabled, the entry for arclimit reads -arclimit override=5000.

The dependence arc data structure size can also be modified with the !*$* arclimit <integer> directive.

5.6.3 -cacheline, -chl, (-cacheline=32,32)

The -cacheline switch informs KAP of the width of the memory channel in bytes between cache and main memory.

The -cacheline switch can take a second argument, for example, -cacheline=16,64.

When two arguments are specified, the first argument gives the width of the memory channel between the primary cache and the secondary cache and the second argument gives the width of the memory channel between the secondary cache and main memory. Omitting the second argument, or specifying it as 32 (the default), tells KAP to not optimize secondary cache usage.

5.6.4 -cache_prefetch_line_count, -cplc, (-cplc=0)

The -cache_prefetch_line_count switch gives the number of additional lines prefetched into the cache during a cache miss.

5.6.5 -cachesize, -chs, (-cachesize=8,0)

The -cachesize switch informs KAP of the size in kilobytes of the cache memory.

The -cachesize switch can take a second argument, for example, -cachesize=8,128. When two arguments are specified, the first argument gives the size of the primary cache and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying it as 0 (the default), tells KAP to not optimize secondary cache usage.

When -tune=ev6 is specified, the default values for -cachesize are 32,0.

5.6.6 -dpregisters, -dpr, (-dpregisters=32)

The -dpregisters switch specifies the number of DOUBLE PRECISION registers each processor has.

5.6.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)

When a loop contains an IF statement whose condition does not change from one iteration to another (loop-invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the IF outside the loop and putting the THEN and ELSE sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the THEN and ELSE loops, for example:


 DO I = ...
   section-1
   IF ( ) THEN
     section-2
   ELSE
     section-3
   ENDIF
   section-4
 ENDDO

Becomes:


 IF ( ) THEN
   DO I = ...
     section-1
     section-2
     section-4
   ENDDO
 ELSE
   DO I = ...
     section-1
     section-3
     section-4
   ENDDO
 ENDIF

When sections 1 and 4 are large, the extra code generated can slow a program down (through cache contention, extra paging, and so on) more than the reduced number of IF tests speeds it up. The -each_invariant_if_growth switch sets the maximum size, in lines of executable code, of sections 1 and 4 for which KAP will try to float an invariant IF outside a loop.
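A concrete, hand-written instance of this transformation is sketched below. The variable names and the contents of each section are hypothetical, and the second form is written by hand to mirror the schematic, not generated by KAP.

```fortran
PROGRAM invariant_if_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  REAL :: a(n), b(n)
  LOGICAL :: use_square
  INTEGER :: i

  use_square = .TRUE.            ! does not change inside the loops

  ! Original form: the invariant test is repeated on every iteration.
  DO i = 1, n
    a(i) = REAL(i)               ! section-1
    IF (use_square) THEN
      a(i) = a(i) * a(i)         ! section-2
    ELSE
      a(i) = -a(i)               ! section-3
    END IF
    a(i) = a(i) + 1.0            ! section-4
  END DO

  ! Floated form: one test, with sections 1 and 4 duplicated in each loop.
  IF (use_square) THEN
    DO i = 1, n
      b(i) = REAL(i)             ! section-1
      b(i) = b(i) * b(i)         ! section-2
      b(i) = b(i) + 1.0          ! section-4
    END DO
  ELSE
    DO i = 1, n
      b(i) = REAL(i)             ! section-1
      b(i) = -b(i)               ! section-3
      b(i) = b(i) + 1.0          ! section-4
    END DO
  END IF

  PRINT *, 'results match:', ALL(a == b)
END PROGRAM invariant_if_demo
```

Sections 1 and 4 are one line each here, so the duplication is cheap. The growth limit matters when those sections run to many lines.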

The limit can be set on a loop-by-loop basis with the !*$* each_invariant_if_growth (<integer>) directive (see Chapter 6). The total amount of additional code generated in a program unit through invariant-IF floating can be limited with the -max_invariant_if_growth switch.

