Previous | Contents | Index |
The following specifies names of particular routines to inline:
-inline[=name[,name...]] [-inl=] -ipa[=name[,name...]] [-ipa=] |
The default is all routines in the universe specified by any -inline_from... (-ipa_from...) switches, subject to the -inline_looplevel (-ipa_looplevel) setting.
Inlining and IPA are off by default, that is, if you do not specify inlining (IPA) switches and no inlining (IPA) directives are found in the source code, no inlining (IPA) will take place.
If you omit -inline (-ipa) from the command line, automatic selection of routines to inline is disabled. You can perform manual selection of routines to inline (analyze) with the -inline_manual (-ipa_manual) switches and the inline and IPA directives.
If you specify -inline (-ipa) on the command line without a list of routine names, then all routines in the inlining (IPA) universe are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values.
If you specify -inline (-ipa) on the command line with a list of routine names, then only the routines that are included in the list are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values. The list items can be separated by commas or colons.
The following switches have no versions, but they must have arguments as shown:
-noinline=name[,name..] [-ninl=] -noipa=name[,name..] [-nipa=] |
These switches enable the automatic inlining (IPA) algorithms in the same way that inline (IPA) does when given without arguments, but the routines listed are ones to NOT be inlined (analyzed). That is, all the subroutines and functions but the named ones are eligible.
A list of function names is required.
You cannot specify both -inline and -noinline (-ipa and -noipa) on the same command line.
If all call sites of a subroutine or function are to be inlined, the following variant of the -inline switch may be of interest:
-inline_and_copy[=name[,name..]] [-inlc=] |
The -inline_and_copy command switch functions like the
-inline switch, except that if all references in the source
file to a function are inlined, the text of the function is copied to
the transformed code file unchanged. This is intended for use when the
functions being inlined are in the same file as the function reference,
and has no special effect when the routines being inlined are being
taken from a library or another source file.
8.1.4 DO Loop Level
The switches, -inline_looplevel=<n> [-inll] and -ipa_looplevel=<n> [-ipall], set a minimum DO loop nest level for CALL/function reference expansion. The -inline_looplevel and -ipa_looplevel switches enable you to limit inlining and IPA to just routines that are referenced in nested loops, where the reduced call overhead or enhanced optimization will be multiplied.
The argument is defined from the most deeply nested leaf of the call tree. The default, 10, allows inlining (IPA) for the 10 deepest nest levels, for example:
PROGRAM MAIN .. CALL A ---> SUBROUTINE A .. DO DO CALL B --> SUBROUTINE B ENDDO DO ENDDO DO CALL C ---> SUBROUTINE C ENDDO ENDDO |
The CALL B is inside a doubly nested loop, and would be more profitable to expand than the CALL A. The CALL C is quadruply nested, so inlining C would yield the biggest gain of the three.
The argument is defined from the most deeply nested CALL or function reference:
The -inline_depth switch (-inline_depth=<n> [-ind]) sets the maximum level of subprogram nesting (CALLs in routines that are CALLed) that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further. The values and their meanings are as follows:
Recursive inlining can be quite expensive in compilation time. You must
exercise discretion in its use.
8.1.6 Manual Control
These switches (-inline_manual [-inm] and -ipa_manual [-ipam]) cause KAP to recognize the KAP !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which routines are inlined/analyzed at which call sites.
The default is to ignore these directives. They are enabled when any
inlining/IPA command switch is given on the command line. When the
-inline_manual (-ipa_manual) is included on the command line,
the directives are enabled without enabling the automatic inlining
(IPA) algorithms. Because !*$* [no]inline and !*$*
[no]ipa are not restricted by the -inline= -ipa=,
-..._looplevel, and -inline_depth command switches,
they can be used either with or without command-line controlled
inlining.
8.2 Inlining and IPA Directives
This section describes the following KAP directives:
!*$* [no]inline [here|routine|global] [(name[,name..])] !*$* [no]ipa [here|routine|global] [(name[,name..])] |
The !*$* inline and !*$*ipa directives tell KAP to inline or IPA the named routines. The !*$* noinline and !*$* noipa directives tell KAP not to inline or analyze the named routines. These directives combine next-line, entire routine, and global (entire program) scope. If none of the optional elements are included, all routines referenced on the next line of code that are in the inlining/analyzing universe are inlined on that line.
These directives are disabled by default. They are enabled when any inlining or IPA switch, respectively, is given on the command line. They can be enabled without activating the automatic inlining/IPA selection algorithms with the -inline_manual and -ipa_manual command switches. They are not restricted by the other inlining and IPA command switches, and can be used instead of, or in addition to, command-line controlled inlining.
The optional names are routine names. If any routines are named in the directive, it applies only to them. If NO routine names are given, the directive applies to ALL routines. The parentheses around the routine names are not required if the list of routine names is empty.
If a !*$* inline or !*$* ipa names a routine not in
the corresponding universe, a Warning message is issued, and the
directive is ignored.
8.3 Listing File Support
The following section describes the metric used for the optional
calling tree.
8.3.1 -Listoptions=c
The optional calling tree includes the loop nest depth level of each
CALL/function reference. The metric used is the convention of the
-inline_looplevel and -ipa_looplevel switches --- the
farthest-out leaf is 1, and higher values trace back to the main
program.
8.4 Inlining/IPA Examples
The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because KAP undergoes constant enhancement, the code that your version of KAP produces may not be identical to that of these examples. The temporary variable names, in particular, can change without significantly altering the transformed code.
Unless otherwise noted, the following examples were run with
-optimize=0 -scalaropt=0 to show the inlining more
clearly. If nonzero values are specified, the routines are first
inlined or analyzed, and then the regular serial transformations are
applied.
8.4.1 Inlining Example --- Same Source File
The following example demonstrates inlining both with -inline=setup, meaning only the subroutine setup will be inlined, and with -inline, meaning both subroutines are inlined. The KAP output includes optimized versions of both routines, in addition to the expanded main program. Setting -inline_looplevel>2 is required, because one CALL is in a loop and one is not.
Source file: PROGRAM TSTEXP REAL A(200,200),B(200,200),C(200,200) CALL SETUP(B,200) CALL SETUP(C,200) DO 100 N = 25,200,25 CALL MXMR(N,A,B,C) WRITE(*,900) N,A(7,13) 100 CONTINUE 900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4) END SUBROUTINE SETUP(E,N) REAL E(N,N) DO 10 I=1,N DO 10 J=1,N E(I,J) = MOD( I + 7*J, 10) /10.0 10 CONTINUE RETURN END SUBROUTINE MXMR(N,A,B,C) REAL A(200,200),B(200,200),C(200,200) DO 1000 I=1,N DO 1000 J=1,N A(I,J) = 0.0 DO 1000 K=1,N A(I,J)=A(I,J)+B(I,K)*C(K,J) 1000 CONTINUE RETURN END |
The -inline=setup switch generates the following main program:
PROGRAM TSTEXP REAL A(200,200),B(200,200),C(200,200) INTEGER II1, II2, II3, II4 DO 2 II1=1,200 DO 2 II2=1,200 B(II1,II2) = MOD (II1 + 7 * II2, 10) / 10.0 2 CONTINUE DO 3 II3=1,200 DO 3 II4=1,200 C(II3,II4) = MOD (II3 + 7 * II4, 10) / 10.0 3 CONTINUE DO 100 N=25,200,25 CALL MXMR(N,A,B,C) WRITE(*,900) N,A(7,13) 100 CONTINUE 900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4) END |
The -inline switch generates the following output:
PROGRAM TSTEXP REAL A(200,200),B(200,200),C(200,200) INTEGER II1, II2, II3, II4, II5, II6, II7 DO 3 II4=1,200 DO 3 II5=1,200 B(II4,II5) = MOD (II4 + 7 * II5, 10) / 10.0 3 CONTINUE DO 4 II6=1,200 DO 4 II7=1,200 C(II6,II7) = MOD (II6 + 7 * II7, 10) / 10.0 4 CONTINUE DO 100 N=25,200,25 DO 2 II1=1,N DO 2 II2=1,N A(II1,II2) = 0.0 DO 2 II3=1,N A(II1,II2) = A(II1,II2) + B(II1,II3) * C(II3,II2) 2 CONTINUE WRITE (*, 900) N, A(7,13) 100 CONTINUE 900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4) END |
The following example demonstrates the creation of a library and inlining routines from it; a two-step process:
Step 1: Create the library.
SUBROUTINE MKCOEF (COEF,N) REAL COEF(N) DO 99 I = 1,N COEF(I) = 1.0/I 99 CONTINUE RETURN END REAL FUNCTION YVAL (X, COEF, N) REAL COEF(N), X, SUM SUM = 0.0 DO 99 I=1,N SUM = SUM + COEF(I) * SIN(I*X) 99 CONTINUE YVAL = SUM RETURN END |
If the file subfil.f contains the previous two routines, then executing the KAP command, KAP -inline_create subfil.f will create a library file subfil.klib with the two routines, and a listing file subfil.out that contains only a list of routines and whether or not each was saved in the library:
subroutine MKCOEF -- saved function YVAL -- saved |
Step 2: Inline the routines into a calling program.
PROGRAM LIBCR PARAMETER (NC = 15) PARAMETER (PI = 3.141593) REAL COEF(NC), YVAL, Y(2000) CALL MKCOEF (COEF, NC) DO 900 I = 1,2000 Y(I) = YVAL( I*0.001*PI, COEF, NC) 900 CONTINUE J=1 DO 910 I=1,2000,10 PRINT *, (Y(J),J=I,I+9) 910 CONTINUE END |
If the file sqwv.f contains the main program LIBCR, then running the command KAP -inline-infl=subfil.klib -inll=2 -o=0 -r=0 -so=0 sqwv.f will put the following into the file sqwv.cmp:
PROGRAM LIBCR PARAMETER (NC = 15) PARAMETER (PI = 3.141593) REAL COEF(NC), YVAL, Y(2000) SAVE J REAL RR1, RR2, RR3 INTEGER II1, II2 DO 3 II2=1,NC COEF(II2) = 1.0 / II2 3 CONTINUE DO 900 I=1,2000 RR1 = I * 0.001 * PI RR2 = 0.0 DO 2 II1=1,NC RR2 = RR2 + COEF(II1) * SIN (II1 * RR1) 2 CONTINUE RR3 = RR2 Y(I) = RR3 900 CONTINUE J=1 DO 910 I=1,2000,10 PRINT *, (Y(J),J=I,I+9) 910 CONTINUE END |
In the previous example, all other optimizations were turned off to
show the expansion more clearly. If you specify nonzero values for the
-optimize, -scalaropt, and -roundoff
switches, KAP first inlines the routines, then performs the
optimizations in the usual manner.
8.4.3 IPA Example
In the following example, the variable N always has the same value, so the same IF branch will always be taken. This information is hidden behind a subroutine call, however, so KAP normally will not try to perform dead-code elimination to simplify the block IF in the first routine. When the -ipa=setn command switch is specified, KAP will inspect the named subroutine for information on the relationship of its arguments and returned value and the surrounding code. Once the CALLed routine is examined, KAP global forward substitution and dead-code elimination transformations (see Chapter 9) can delete the unused code.
If a subroutine or function cannot be inlined, or if you do not want to inline it, it can often still be analyzed for its effects on the calling routine.
The following example was run with the default values for -optimize and -scalaropt:
CALL SETN ( N ) IF ( N.GT.10 ) THEN X = 1. ELSE X = 2. ENDIF ... SUBROUTINE SETN (N) INTEGER N N = 12 RETURN END |
The example becomes the following:
CALL SETN (N) X = 1. ... |
Just the CALL and the simplified IF block were shown.
8.4.4 Recursive Inlining Examples
The -inline_depth switch sets the maximum level of subprogram nesting that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further.
Consider the following simplified example:
PROGRAM EXDDEM REAL A,B,C,D,E,F,G CALL S1 (A,B) CALL S2 (C,D,E,F) CALL S3 (G) PRINT *,A,B,C,D,E,F,G END SUBROUTINE S1 (W,X) REAL W,X W=1.0 CALL S4(X) RETURN END |
SUBROUTINE S2 (Q,R,S,T) REAL Q,R,S,T Q = 2.0 CALL S1 (R,S) CALL S3 (T) RETURN END SUBROUTINE S3 (U) REAL U U = 137.0 RETURN END SUBROUTINE S4 (V) REAL V V = 2.7 RETURN END |
When run with -inline and -inline_depth=4, all the subroutines are inlined, including calls to calls to calls, and the main program becomes:
PROGRAM EXDDEM REAL A,B,C,D,E,F,G EXTERNAL S4 A = 1.0 B = 2.7 C = 2.0 D = 1.0 E = 2.7 F = 137.0 G = 137.0 PRINT *, A, B, C, D, E, F, G END |
When run with -inline and -inline_depth=1, meaning inline only one routine deep, all the CALLs in the main program and subroutines are expanded, but CALLs in inlined routines are not. The main program becomes:
PROGRAM EXDDEM REAL A,B,C,D,E,F,G EXTERNAL S4 A = 1.0 CALL S4 (B) C = 2.0 CALL S1 (D,E) CALL S3 (F) G = 137.0 PRINT *, A, B, C, D, E, F, G END |
When run with -inline and -inline_depth=-1 (inline only routines that do not contain CALLs or FUNCTION references), the main program becomes:
PROGRAM EXDDEM REAL A,B,C,D,E,F,G CALL S1 (A,B) CALL S2 (C,D,E,F) G = 137.0 PRINT *, A, B, C, D, E, F, G END |
In this last case, only SUBROUTINEs S3 and S4 could
be inlined. Repeated runs with -inline_depth=1 can be used to
inline additional levels of routines.
8.4.5 Manual Inlining Example
Manual inlining and IPA allow you greater control over the routines that are inlined/analyzed at CALL sites. They use directives (!*$*inline and !*$*ipa) that are placed into the source code. The directives are normally ignored by KAP, but are enabled when an inlining or IPA switch, respectively, is given on the command line.
The following example is based on the Recursive Inlining example from Section 8.4.4. It was run with default switches, except for -inline_manual.
The directives used have different scopes. Routine S1 is inlined everywhere it is used, routine S3 is inlined only in the main program, and routine S4 is inlined only in routine S1, which is then inlined elsewhere.
With the default value for -scalaropt, forward substitution places the assigned values into the PRINT statement and dead-code elimination deletes the now-unneeded inlined assignments:
!*$* INLINE GLOBAL (S1) !*$* INLINE ROUTINE (S3) PROGRAM EXDTST REAL A,B,C,D,E,F,G CALL S1 (A,B) CALL S2 (C,D,E,F) CALL S3 (G) PRINT *,A,B,C,D,E,F,G END |
SUBROUTINE S1 (W,X) REAL W,X W=1.0 !*$* INLINE CALL S4(X) RETURN END SUBROUTINE S2 (Q,R,S,T) REAL Q,R,S,T Q = 2.0 CALL S1 (R,S) CALL S3 (T) RETURN END |
SUBROUTINE S3 (U) REAL U U = 137.0 RETURN END SUBROUTINE S4 (V) REAL V V = 2.7 RETURN END |
Becomes:
C KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011 o5r3so3 11-Aug-1992 21:23:10 !*$* INLINE ( ) GLOBAL S1 !*$* INLINE ( ) ROUTINE S3 PROGRAM EXDTST REAL A, B, C, D, E, F, G SAVE G, F, E, D, C, B, A EXTERNAL S4 CALL S2 (C,D,E,F) PRINT *, 1., 2.7, C, D, E, F, 137. END |
C KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011 o5r3so3 11-Aug-1992 21:23:10 SUBROUTINE S1 (W, X) REAL W, X W = 1. !*$* INLINE ( ) X = 2.7 RETURN END C KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011 o5r3so3 11-Aug-1992 21:23:10 SUBROUTINE S2 (Q, R, S, T) REAL Q, R, S, T EXTERNAL S4 Q = 2. R = 1. S = 2.7 CALL S3 (T) RETURN END |
Subroutines S3 and S4 are unchanged.
8.4.6 Notes on Inlining and IPA
Routines to be inlined must pass all the criteria (-inline=, -inline_looplevel, -inline_depth) to be inlined.
The !*$* [no]inline and !*$* [no]ipa directives, when enabled, override the inlining/IPA command switches.
A !*$* inline global directive without a name list tells KAP to inline every routine it can, regardless of the -inline, -inline_depth, and -inline_looplevel settings. A !*$* noinline global directive tells KAP not to inline anything, regardless of the -inline, -inline_depth, and -inline_looplevel settings. The !*$* inline and !*$* ipa directives are disabled by default; they are enabled when any inlining or IPA command switch is specified.
When a library is specified with -inline_from_libraries, routines may be taken from that library for inlining into the source code. No attempt is made to inline routines from the source file into routines from the library. For example, if the main program calls routine BB, which is in the library, and BB calls routine DD, which is in the source file, then BB can be inlined into the main program, but KAP will not attempt to inline DD into the text from library routine BB.
A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code. However, a library created with -ipa_create may not appear in an -inline_from_libraries= list. Attempting to do so is flagged with a Warning message.
Inlining and IPA are slow, memory-intensive activities. Specifying large values for -inline_looplevel and -inline_depth (inline all available routines everywhere they are used) for a large set of inlinable routines for a large source file can absorb significant system resources. For most programs, specifying small values for -inline_looplevel and -inline_depth and/or a small number of routines with -inline= can provide most of the benefits of inlining. The same applies for the IPA switches.
Previous | Next | Contents | Index |