The -onetrip switch allows you to specify "one-trip" DO loops. Many pre-Fortran 77 compilers implemented DO loops that would always have at least one iteration, even if the initial value of the loop control variable were higher than the final value. This switch informs KAP that the program being processed contains loops that need the one-trip feature.
Executing kf90 and explicitly specifying the compiler switch -nof77 causes KAP to be called with the -onetrip command switch.
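The difference shows up in a loop whose initial value is higher than its final value. The following is a minimal sketch, not taken from the KAP documentation; the program and variable names are illustrative assumptions.

```fortran
program onetrip_demo
  implicit none
  integer :: i, trips

  trips = 0
  ! The initial value (5) is higher than the final value (1); the step is +1.
  do i = 5, 1
     trips = trips + 1
  end do

  ! Standard Fortran 77/90 semantics: the body runs zero times, so trips is 0.
  ! Under one-trip semantics the body would run once, so trips would be 1.
  print *, 'iterations executed:', trips
end program onetrip_demo
```

A program that depends on the second behavior is the kind of code the -onetrip switch is meant for.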
5.4.14 -real, -rl, (-real=4)
The -real switch tells KAP the Compaq Fortran compiler's default size, in bytes, for REAL variables. The size N, as in REAL*N, can be 4 or 8. To change the default size of REAL variables, for example, from 4 to 8, first set the Compaq Fortran compiler switch -r=8. Then tell KAP the new size with the -real=8 switch.
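One way to check which default REAL size the compiler is actually using is to print the properties of an undeclared-kind REAL. This is a small sketch with illustrative names; the assumption that the kind number equals the size in bytes holds for many compilers but is not guaranteed by the Fortran standard.

```fortran
program real_size_demo
  implicit none
  real :: x   ! default REAL; its size follows the compiler's default-size setting

  x = 1.0
  ! kind(x) is processor dependent; on many compilers it is the size in bytes.
  print *, 'default REAL kind:  ', kind(x)
  ! precision(x) reports the decimal digits of precision of the default REAL.
  print *, 'decimal precision:  ', precision(x)
end program real_size_demo
```

Whatever size this reports for the compiler settings in use is the value that the -real switch should describe to KAP.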
5.4.15 -save, -sv, (-save=manual_adjust)
The -save switch tells KAP whether to perform live variable analysis to determine if the value of a local scalar variable in a subroutine or function needs to be saved between invocations of the routine being processed. SAVE statements will be generated for any variables requiring them. KAP will not delete or ignore a SAVE statement coded by the user.
Saving local variables may be required for correct execution of the program, but can restrict KAP optimizations.
With -save=manual, KAP assumes you have inserted the necessary SAVE statements into the code and performs no corresponding analysis of its own. The user-written SAVE statements are assumed to be correct and sufficient. This combination is not affected by the -[no]recursion switch.
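As an illustration of a local variable whose value must survive between invocations, consider the sketch below. The routine and variable names are hypothetical. The user-written SAVE statement is the kind that -save=manual assumes is already present, and that the live variable analysis would otherwise have to discover.

```fortran
program save_demo
  implicit none
  real :: total

  call running_total(1.0, .true.,  total)   ! reset, then add 1.0
  call running_total(2.0, .false., total)   ! relies on the saved value of acc
  call running_total(3.0, .false., total)
  print *, 'running total:', total          ! 1.0 + 2.0 + 3.0
end program save_demo

subroutine running_total(x, reset, total)
  implicit none
  real,    intent(in)  :: x
  logical, intent(in)  :: reset
  real,    intent(out) :: total
  real :: acc
  save acc          ! user-written SAVE: acc keeps its value between calls

  ! When reset is false, acc is read before it is assigned in this call,
  ! so its value from the previous call is needed.
  if (reset) acc = 0.0
  acc = acc + x
  total = acc
end subroutine running_total
```

Without the SAVE statement, acc would be undefined on entry to the second and third calls.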
The effect of -save=manual_adjust depends on the [no]recursion setting:
The effect of -save=all_adjust depends on the [no]recursion setting:
With -recursion, this is the same as -save=all_adjust:
The -scan switch allows you to set the length of Fortran 90 input lines. KAP will ignore (treat as a comment) characters in columns beyond the value of the -scan switch. The value must be 72, 120, or 132.
5.4.17 -syntax, -sy, (off)
The -syntax switch directs KAP to check for compliance with certain syntactic rules. Using a dialect switch can prevent a construct from being translated differently than expected by a user who is familiar with a different implementation of Fortran. The default is to accept the superset of the ANSI Fortran 77 standard defined by Compaq Fortran, which includes many common Fortran 77 extensions. See your Fortran language reference manual for differences in the dialects.
The -syntax switch has settings as follows:
The -type switch causes KAP to issue warning messages for
variables not explicitly typed. The -notype default suppresses
this checking.
5.5 Inlining and Interprocedural Analysis Switches for kapf90
The following sections explain the function of each switch used in subprogram inlining and Interprocedural Analysis (IPA).
Inlining is the process of replacing a subroutine CALL or function reference with the text of the subroutine or function. IPA is the process of inspecting a called routine to identify relationships between the arguments, the returned value, and the code surrounding the call to identify opportunities for optimization.
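The effect of inlining can be sketched by hand. In the example below, which uses illustrative names and is not KAP output, the second loop is what the first loop looks like after the CALL has been replaced by the text of the routine.

```fortran
program inline_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n)
  integer :: i

  ! Original form: the loop body is a subroutine CALL.
  do i = 1, n
     call scale_add(real(i), 2.0, a(i))
  end do

  ! Inlined form: the CALL is replaced by the text of the routine,
  ! with the dummy arguments replaced by the actual arguments.
  do i = 1, n
     b(i) = 2.0 * real(i) + 1.0
  end do

  print *, 'results agree:', all(a == b)
end program inline_demo

subroutine scale_add(x, factor, y)
  implicit none
  real, intent(in)  :: x, factor
  real, intent(out) :: y
  y = factor * x + 1.0
end subroutine scale_add
```

The inlined loop has no call overhead, and its body is visible to the optimizer as ordinary loop code.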
Inlining and IPA can be performed in the same KAP run. The only restriction is that the same routine cannot be in global lists for both inlining and IPA. You can use the !*$* inline and !*$* ipa directives to inline a subroutine or function in one place and interprocedurally analyze it in another. (See Chapter 6 and Chapter 8 for information about these directives.)
For additional information about these switches and examples of their
use, see Chapter 8.
5.5.1 -inline, -inl, (off), -noinline, -ninl, -ipa, -ipa, (off), -noipa, -nipa
The -inline switch provides KAP with a list of routines to inline. The -ipa switch provides KAP with a list of routines to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind, -inll, and -ipall switches on a loop-by-loop basis.
If you specify either the -inline or the -ipa switch without an argument list, KAP will try to inline/analyze all the called subroutines and functions in the inlining (or IPA) universe specified by the -inline_from... (-ipa_from...) switches, subject to restrictions imposed by the -inline_depth and -inline_looplevel (-ipa_looplevel) switches.
To permit KAP to inline routines that contain static SAVE or DATA variables, use the -aggressive=c switch with -inline. The -aggressive=c switch promotes the static variables to members of a COMMON that is introduced into the program. See Section 5.6.1 for more information.
If you include a list of names, for example: -inline=mkcoef,yval, then just the routines named will be inlined or analyzed.
A list of routines must be included with -noinline or -noipa. All routines in the inlining/IPA universe are candidates for inlining or analysis except the listed ones.
The -[no]inline and -[no]ipa command switches can be
overridden by the !*$* [no]inline and !*$* [no]ipa
directives. (See Chapter 6 and Chapter 8 for more information
about these directives.)
5.5.2 -inline_and_copy, -inlc, (off)
The -inline_and_copy command switch functions like the -inline switch, except that if all CALLs or references to a subprogram are inlined, the text of the routine is not optimized, but is copied unchanged to the transformed code file. This is intended for use when inlining routines from the same file as the call, and has no special effect when the routines being inlined are being taken from a library or another source file.
When a subprogram has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine will still be available in case one of the other source files contains a reference to it, so no errors will result.
The -inline_and_copy algorithm assumes that all CALLs and references to the routine precede it in the source file. If the routine is referenced after the text of the routine, and that particular call site cannot be inlined, the unoptimized version of the routine will be invoked.
The -inline_create and -ipa_create switches cause KAP to build a library file containing partially analyzed routines for later inlining or IPA. The library created is used with the -inline_from_libraries (-ipa_from_libraries) switch.
Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the subroutines and functions included. Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the routines, just the data relationship information.
You can use any name for the created library. However, for maximum
compatibility with the -inline_from_libraries and
-ipa_from_libraries switches, Compaq recommends that you use
the .klib extension.
5.5.4 -inline_depth, -ind, (-inline_depth=2), -ipa_depth, -ipad, (-ipa_depth=2)
The -inline_depth and -ipa_depth switches set the maximum level of subprogram nesting that KAP will attempt to inline. Higher values instruct KAP to trace CALLs and function references further. The values and their meanings are as follows:
Chapter 8 has examples of recursive inlining with different values of -inline_depth.
The !*$* [no]inline and !*$* [no]ipa directives, when
enabled, are not affected by the -inline_depth or
-ipa_depth restrictions.
5.5.5 -inline_from_files, -inff, (current source file)
See Section 5.5.8.
5.5.6 -inline_from_libraries, -infl, (off)
See Section 5.5.8.
5.5.7 -ipa_from_files, -ipaff, (current source file)
See Section 5.5.8.
5.5.8 -ipa_from_libraries, -ipafl, (off)
The -..._from_... switches provide KAP with the locations of subroutines and functions available for inlining/IPA. The total set of available routines is called the inlining (or IPA) universe.
The -..._from_files switches take the names of source files and directories containing source files. Including a directory, for example, -ipaff=/work, is equivalent to the notation /work/*.f90.
The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.
Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas or colons. Multiple -..._from_... switches can be specified on the command line. KAP searches for subroutines and functions in the provided files and libraries in the order in which they appear on the command line.
The -..._from_... switches do not activate inlining or IPA.
The -inline or -ipa switches must be specified.
5.5.9 -inline_looplevel, -inll, (-inline_looplevel=2), -ipa_looplevel, -ipall, (-ipa_looplevel=2)
The -..._looplevel switches enable you to limit inlining to just routines that are referenced in nested loops, where the effects of reduced call overhead or enhanced optimizations will be multiplied.
The parameter is defined from the most deeply nested subprogram reference. The -inll=1 switch restricts inlining to subroutines and functions referenced in the deepest loop nest. The -inll=3 switch restricts inlining to those routines referenced at the three deepest levels. The DO loop nest level of each CALL or function reference is included in the optional calling tree section of the listing file.
The -..._looplevel switches do not activate inlining or IPA. The -inline or -ipa switches must be specified.
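The loop levels described above can be pictured with a three-deep nest. This sketch uses hypothetical routine names, and the comments give one reading of the description above rather than verified KAP behavior.

```fortran
program looplevel_demo
  implicit none
  integer :: i, j, k
  real :: s

  s = 0.0
  do i = 1, 2
     call outer_work(s)          ! referenced one loop deep
     do j = 1, 2
        call middle_work(s)      ! referenced two loops deep
        do k = 1, 2
           call inner_work(s)    ! referenced in the deepest loop nest
        end do
     end do
  end do
  print *, 'accumulated value:', s
end program looplevel_demo

! Counting from the most deeply nested reference:
!   -inll=1 would consider only inner_work,
!   -inll=2 would consider inner_work and middle_work,
!   -inll=3 would consider all three routines.
subroutine outer_work(s)
  implicit none
  real, intent(inout) :: s
  s = s + 100.0
end subroutine outer_work

subroutine middle_work(s)
  implicit none
  real, intent(inout) :: s
  s = s + 10.0
end subroutine middle_work

subroutine inner_work(s)
  implicit none
  real, intent(inout) :: s
  s = s + 1.0
end subroutine inner_work
```

Here inner_work is called eight times, middle_work four times, and outer_work twice, which is why the deepest references benefit most from reduced call overhead.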
The !*$* [no]inline and !*$* [no]ipa directives, when enabled, are not affected by the -..._looplevel restrictions.
5.5.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)
These switches cause KAP to recognize the !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which subroutines and functions are inlined/analyzed at specific call sites.
The default is to ignore these directives. They are enabled when any inlining (IPA) switch is given on the command line. When -inline_manual (-ipa_manual) is included on the command line, the !*$* inline (!*$* ipa) directives are enabled without enabling the automatic inlining algorithms. Because !*$* [no]inline and !*$* [no]ipa override the -inline=, -ipa=, -..._depth, and -..._looplevel command switches, you can use them along with command-line control to select routines or call sites that the regular selection algorithm would reject, or to prevent specific routines or CALL sites from being inlined/analyzed.
See Chapter 6 and Chapter 8 for more information about the
!*$* inline and !*$* ipa directives.
5.5.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)
The -inline_optimize and -ipa_optimize switches help you to optimize large programs by causing KAP to set other switches depending on the value you substitute for <integer>. The values and meanings for <integer> are as follows:
The following sections describe command switches that the advanced user may want to use for maximum performance.
Some of these switches (-aggressive, -cacheline, -cachesize, -dpregisters, -fpregisters, -setassociativity) set parameters that KAP uses to optimize memory usage. Knowing how much data can be kept in fast memory (cache or arithmetic registers), and the costs of moving data in the memory hierarchy, enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.
5.6.1 -aggressive, -ag, (-noaggressive), -nag
The -aggressive switch takes a list of options as follows:
To explicitly disable these options, specify -noaggressive.
See also the -natural, -cacheline,
-cachesize, and -setassociativity command-line
switches.
5.6.2 -arclimit, -arclm, (-arclimit=5000)
The -arclimit switch sets the size of the dependence arc data structure that KAP uses to perform data dependence analysis (see Appendix B).
This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. By default, this data structure is allocated with a size = max (# of statements * 4, -arclimit value). If a loop contains too many dependence relationships and cannot be represented in the dependence data structure, KAP will give up optimization of the loop. Loops that exceed this threshold are marked in the Loop Table (-listoptions=l) in the listing file. (See Chapter 10.)
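The default sizing rule above is simple arithmetic, sketched here with a hypothetical helper function (arc_structure_size is not a KAP routine, and the statement counts are made-up inputs).

```fortran
program arc_size_demo
  implicit none

  ! Small loop nest: the -arclimit value dominates.
  print *, arc_structure_size(200, 5000)     ! max(800, 5000)  = 5000
  ! Large loop nest: four times the statement count dominates.
  print *, arc_structure_size(2000, 5000)    ! max(8000, 5000) = 8000

contains

  ! Size of the dependence arc data structure for one loop nest,
  ! following the rule max(number of statements * 4, -arclimit value).
  integer function arc_structure_size(nstatements, arclimit)
    integer, intent(in) :: nstatements, arclimit
    arc_structure_size = max(4 * nstatements, arclimit)
  end function arc_structure_size

end program arc_size_demo
```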
You can use the -arclimit switch to increase the size of the data structure to enable KAP to perform more optimizations. Reducing the -arclimit value will (slightly) reduce the size of the KAP executable, while reducing the complexity of loops that KAP can analyze. (Most users will not need to change this value.)
The maximum value is 5000. If a larger value is specified, and the "KAP switches" (-listoptions=k) section is enabled, the entry for arclimit is -arclimit override=5000. The value will be set to 5000.
The dependence arc data structure size can also be modified with the
!*$* arclimit <integer> directive.
5.6.3 -cacheline, -chl, (-cacheline=32,32)
The -cacheline switch informs KAP of the width of the memory channel in bytes between cache and main memory.
The -cacheline switch can take a second argument, for example, -cacheline=16,64.
When two arguments are specified, the first argument gives the width of
the memory channel between the primary cache and the secondary cache
and the second argument gives the width of the memory channel between
the secondary cache and main memory. Omitting the second argument, or
specifying it as 32 (the default), tells KAP to not optimize secondary
cache usage.
5.6.4 -cache_prefetch_line_count, -cplc, (-cplc=0)
The -cache_prefetch_line_count switch gives the number of
additional lines prefetched into the cache during a cache miss.
5.6.5 -cachesize, -chs, (-cachesize=8,0)
The -cachesize switch informs KAP of the size in kilobytes of the cache memory.
The -cachesize switch can take a second argument, for example, -cachesize=8,128. When two arguments are specified, the first argument gives the size of the primary cache and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying it as 0 (the default), tells KAP to not optimize secondary cache usage.
When -tune=ev6 is specified, the default values for -chs are 32,0.
5.6.6 -dpregisters, -dpr, (-dpregisters=32)
The -dpregisters switch specifies the number of DOUBLE
PRECISION registers each processor has.
5.6.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)
When a loop contains an IF statement whose condition does not change from one iteration to another (loop-invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the IF outside the loop and putting the THEN and ELSE sections into their own loops.
This gets more complicated when there is other code in the loop, because a copy of it must be included in both the THEN and ELSE loops, for example:
    DO I = ...
       section-1
       IF ( ) THEN
          section-2
       ELSE
          section-3
       ENDIF
       section-4
    ENDDO
Becomes:
    IF ( ) THEN
       DO I = ...
          section-1
          section-2
          section-4
       ENDDO
    ELSE
       DO I = ...
          section-1
          section-3
          section-4
       ENDDO
    ENDIF
When sections 1 and 4 are large, the extra code generated can slow a program down (through cache contention, extra paging, and so on) more than the reduced number of IF tests speeds it up. The -each_invariant_if_growth switch provides a maximum size (in number of lines of executable code) of sections 1 and 4 for which KAP will try to float an invariant IF outside a loop.
This can be controlled on a loop-by-loop basis with the
!*$* each_invariant_if_growth (<integer>) directive (see
Chapter 6). The total amount of additional code generated in a
program unit through invariant-IF floating can be limited with the
-max_invariant_if_growth switch.
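A concrete instance of invariant-IF floating is sketched below with illustrative names and statements. The second form is written by hand to mirror the transformation described above; it is not KAP output.

```fortran
program invariant_if_demo
  implicit none
  integer, parameter :: n = 5
  real :: a(n), b(n)
  logical :: flag
  integer :: i

  flag = .true.

  ! Original form: the loop-invariant test on flag is repeated every iteration.
  do i = 1, n
     a(i) = real(i)            ! section-1
     if (flag) then
        a(i) = a(i) * 2.0      ! section-2
     else
        a(i) = -a(i)           ! section-3
     end if
     a(i) = a(i) + 1.0         ! section-4
  end do

  ! Floated form: one test, with sections 1 and 4 duplicated in each loop.
  if (flag) then
     do i = 1, n
        b(i) = real(i)         ! section-1
        b(i) = b(i) * 2.0      ! section-2
        b(i) = b(i) + 1.0      ! section-4
     end do
  else
     do i = 1, n
        b(i) = real(i)         ! section-1
        b(i) = -b(i)           ! section-3
        b(i) = b(i) + 1.0      ! section-4
     end do
  end if

  print *, 'results agree:', all(a == b)
end program invariant_if_demo
```

Sections 1 and 4 are one line each here, so the duplicated code is small; the growth limit matters when those sections are long.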