Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide


Previous Contents Index

8.1.3 Naming Specific Routines

The following specifies names of particular routines to inline:


-inline[=name[,name...]]    [-inl=] 
-ipa[=name[,name...]]       [-ipa=] 

The default is all routines in the universe specified by any -inline_from... (-ipa_from...) switches, subject to the -inline_looplevel (-ipa_looplevel) setting.

Inlining and IPA are off by default, that is, if you do not specify inlining (IPA) switches and no inlining (IPA) directives are found in the source code, no inlining (IPA) will take place.

If you omit -inline (-ipa) from the command line, automatic selection of routines to inline is disabled. You can perform manual selection of routines to inline (analyze) with the -inline_manual (-ipa_manual) switches and the inline and IPA directives.

If you specify -inline (-ipa) on the command line without a list of routine names, then all routines in the inlining (IPA) universe are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values.

If you specify -inline (-ipa) on the command line with a list of routine names, then only the routines that are included in the list are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values. The list items can be separated by commas or colons.

The following switches have no versions, but they must have arguments as shown:


-noinline=name[,name..]   [-ninl=] 
-noipa=name[,name..]      [-nipa=] 

These switches enable the automatic inlining (IPA) algorithms in the same way that inline (IPA) does when given without arguments, but the routines listed are ones to NOT be inlined (analyzed). That is, all the subroutines and functions but the named ones are eligible.

A list of function names is required.

You cannot specify both -inline and -noinline (-ipa and -noipa) on the same command line.

If all call sites of a subroutine or function are to be inlined, the following variant of the -inline switch may be of interest:


-inline_and_copy[=name[,name..]]      [-inlc=] 

The -inline_and_copy command switch functions like the -inline switch, except that if all references in the source file to a function are inlined, the text of the function is copied to the transformed code file unchanged. This is intended for use when the functions being inlined are in the same file as the function reference, and has no special effect when the routines being inlined are being taken from a library or another source file.

8.1.4 DO Loop Level

The switches, -inline_looplevel=<n> [-inll] and -ipa_looplevel=<n> [-ipall], set a minimum DO loop nest level for CALL/function reference expansion. The -inline_looplevel and -ipa_looplevel switches enable you to limit inlining and IPA to just routines that are referenced in nested loops, where the reduced call overhead or enhanced optimization will be multiplied.

The argument is defined from the most deeply nested leaf of the call tree. The default, 10, allows inlining (IPA) for the 10 deepest nest levels, for example:


 PROGRAM MAIN 
  .. 
 
   CALL A ---> SUBROUTINE A 
 
 .. 
 
    DO 
     DO 
       CALL B --> SUBROUTINE B 
     ENDDO  DO 
 
     ENDDO  DO 
        CALL C ---> SUBROUTINE C 
 
     ENDDO 
    ENDDO 

The CALL B is inside a doubly nested loop, and would be more profitable to expand than the CALL A. The CALL C is quadruply nested, so inlining C would yield the biggest gain of the three.

The argument is defined from the most deeply nested CALL or function reference:

8.1.5 Recursive Inlining

The -inline_depth switch (-inline_depth=<n> [-ind]) sets the maximum level of subprogram nesting (CALLs in routines that are CALLed) that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further. The values and their meanings are as follows:

Recursive inlining can be quite expensive in compilation time. You must exercise discretion in its use.

8.1.6 Manual Control

These switches (-inline_manual [-inm] and -ipa_manual [-ipam]) cause KAP to recognize the KAP !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which routines are inlined/analyzed at which call sites.

The default is to ignore these directives. They are enabled when any inlining/IPA command switch is given on the command line. When the
-inline_manual (-ipa_manual) is included on the command line, the directives are enabled without enabling the automatic inlining (IPA) algorithms. Because !*$* [no]inline and !*$* [no]ipa are not restricted by the -inline= -ipa=, -..._looplevel, and -inline_depth command switches, they can be used either with or without command-line controlled inlining.

8.2 Inlining and IPA Directives

This section describes the following KAP directives:


!*$* [no]inline  [here|routine|global]  [(name[,name..])] 
!*$* [no]ipa     [here|routine|global]  [(name[,name..])] 

The !*$* inline and !*$*ipa directives tell KAP to inline or IPA the named routines. The !*$* noinline and !*$* noipa directives tell KAP not to inline or analyze the named routines. These directives combine next-line, entire routine, and global (entire program) scope. If none of the optional elements are included, all routines referenced on the next line of code that are in the inlining/analyzing universe are inlined on that line.

These directives are disabled by default. They are enabled when any inlining or IPA switch, respectively, is given on the command line. They can be enabled without activating the automatic inlining/IPA selection algorithms with the -inline_manual and -ipa_manual command switches. They are not restricted by the other inlining and IPA command switches, and can be used instead of, or in addition to, command-line controlled inlining.

The optional names are routine names. If any routines are named in the directive, it applies only to them. If NO routine names are given, the directive applies to ALL routines. The parentheses around the routine names are not required if the list of routine names is empty.

If a !*$* inline or !*$* ipa names a routine not in the corresponding universe, a Warning message is issued, and the directive is ignored.

8.3 Listing File Support

The following section describes the metric used for the optional calling tree.

8.3.1 -Listoptions=c

The optional calling tree includes the loop nest depth level of each CALL/function reference. The metric used is the convention of the -inline_looplevel and -ipa_looplevel switches --- the farthest-out leaf is 1, and higher values trace back to the main program.

8.4 Inlining/IPA Examples

The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because KAP undergoes constant enhancement, the code that your version of KAP produces may not be identical to that of these examples. The temporary variable names, in particular, can change without significantly altering the transformed code.

Unless otherwise noted, the following examples were run with -optimize=0 -scalaropt=0 to show the inlining more clearly. If nonzero values are specified, the routines are first inlined or analyzed, and then the regular serial transformations are applied.

8.4.1 Inlining Example --- Same Source File

The following example demonstrates inlining both with -inline=setup, meaning only the subroutine setup will be inlined, and with -inline, meaning both subroutines are inlined. The KAP output includes optimized versions of both routines, in addition to the expanded main program. Setting -inline_looplevel>2 is required, because one CALL is in a loop and one is not.


Source file: 
   PROGRAM TSTEXP 
   REAL A(200,200),B(200,200),C(200,200) 
   CALL SETUP(B,200) 
   CALL SETUP(C,200) 
   DO 100  N = 25,200,25 
   CALL MXMR(N,A,B,C) 
   WRITE(*,900) N,A(7,13) 
100 CONTINUE 
900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
   END 
 
   SUBROUTINE SETUP(E,N) 
   REAL E(N,N) 
   DO 10 I=1,N 
      DO 10 J=1,N 
        E(I,J) = MOD( I + 7*J, 10) /10.0 
    10 CONTINUE 
       RETURN 
       END 
 
   SUBROUTINE MXMR(N,A,B,C) 
   REAL A(200,200),B(200,200),C(200,200) 
   DO 1000 I=1,N 
     DO 1000 J=1,N 
         A(I,J) = 0.0 
           DO 1000 K=1,N 
             A(I,J)=A(I,J)+B(I,K)*C(K,J) 
   1000 CONTINUE 
        RETURN 
        END 

The -inline=setup switch generates the following main program:


PROGRAM TSTEXP 
  REAL A(200,200),B(200,200),C(200,200) 
  INTEGER II1, II2, II3, II4 
  DO 2 II1=1,200 
  DO 2 II2=1,200 
  B(II1,II2) = MOD (II1 + 7 * II2, 10) / 10.0 
2 CONTINUE 
  DO 3 II3=1,200 
  DO 3 II4=1,200 
  C(II3,II4) = MOD (II3 + 7 * II4, 10) / 10.0 
3 CONTINUE 
 DO 100 N=25,200,25 
   CALL MXMR(N,A,B,C) 
   WRITE(*,900) N,A(7,13) 
100 CONTINUE 
900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
  END 

The -inline switch generates the following output:


PROGRAM TSTEXP 
  REAL A(200,200),B(200,200),C(200,200) 
  INTEGER II1, II2, II3, II4, II5, II6, II7 
  DO 3 II4=1,200 
  DO 3 II5=1,200 
  B(II4,II5) = MOD (II4 + 7 * II5, 10) / 10.0 
3 CONTINUE 
  DO 4 II6=1,200 
  DO 4 II7=1,200 
  C(II6,II7) = MOD (II6 + 7 * II7, 10) / 10.0 
4 CONTINUE 
  DO 100 N=25,200,25 
  DO 2 II1=1,N 
  DO 2 II2=1,N 
  A(II1,II2) = 0.0 
  DO 2 II3=1,N 
  A(II1,II2) = A(II1,II2) + B(II1,II3) * C(II3,II2) 
2 CONTINUE 
  WRITE (*, 900) N, A(7,13) 
100 CONTINUE 
900 FORMAT(1X,'For N=',I5,',  A(7,13):',F12.4) 
  END 

8.4.2 Inlining Example with a Library

The following example demonstrates the creation of a library and inlining routines from it; a two-step process:

Step 1: Create the library.


SUBROUTINE  MKCOEF (COEF,N) 
  REAL COEF(N) 
  DO 99 I = 1,N 
  COEF(I) = 1.0/I 
99   CONTINUE 
  RETURN 
  END 
 
  REAL FUNCTION YVAL (X, COEF, N) 
  REAL COEF(N), X, SUM 
  SUM = 0.0 
  DO 99 I=1,N 
  SUM = SUM + COEF(I) * SIN(I*X) 
99   CONTINUE 
  YVAL = SUM 
  RETURN 
  END 

If the file subfil.f contains the previous two routines, then executing the KAP command, KAP -inline_create subfil.f will create a library file subfil.klib with the two routines, and a listing file subfil.out that contains only a list of routines and whether or not each was saved in the library:


subroutine MKCOEF -- saved 
function YVAL -- saved 

Step 2: Inline the routines into a calling program.


  PROGRAM  LIBCR 
  PARAMETER (NC = 15) 
  PARAMETER (PI = 3.141593) 
  REAL COEF(NC), YVAL, Y(2000) 
 
  CALL MKCOEF (COEF, NC) 
  DO 900 I = 1,2000 
  Y(I) = YVAL( I*0.001*PI, COEF, NC) 
900  CONTINUE 
  J=1 
  DO 910 I=1,2000,10 
  PRINT *, (Y(J),J=I,I+9) 
910  CONTINUE 
END 

If the file sqwv.f contains the main program LIBCR, then running the command KAP -inline-infl=subfil.klib -inll=2 -o=0 -r=0 -so=0 sqwv.f will put the following into the file sqwv.cmp:


  PROGRAM LIBCR 
  PARAMETER (NC = 15) 
  PARAMETER (PI = 3.141593) 
  REAL COEF(NC), YVAL, Y(2000) 
  SAVE J 
  REAL RR1, RR2, RR3 
  INTEGER II1, II2 
  DO 3 II2=1,NC 
  COEF(II2) = 1.0 / II2 
3 CONTINUE 
 
  DO 900 I=1,2000 
  RR1 = I * 0.001 * PI 
  RR2 = 0.0 
  DO 2 II1=1,NC 
  RR2 = RR2 + COEF(II1) * SIN (II1 * RR1) 
  2 CONTINUE 
  RR3 = RR2 
  Y(I) = RR3 
900  CONTINUE 
 
  J=1 
  DO 910 I=1,2000,10 
    PRINT *, (Y(J),J=I,I+9) 
910  CONTINUE 
  END 

In the previous example, all other optimizations were turned off to show the expansion more clearly. If you specify nonzero values for the -optimize, -scalaropt, and -roundoff switches, KAP first inlines the routines, then performs the optimizations in the usual manner.

8.4.3 IPA Example

In the following example, the variable N always has the same value, so the same IF branch will always be taken. This information is hidden behind a subroutine call, however, so KAP normally will not try to perform dead-code elimination to simplify the block IF in the first routine. When the -ipa=setn command switch is specified, KAP will inspect the named subroutine for information on the relationship of its arguments and returned value and the surrounding code. Once the CALLed routine is examined, KAP global forward substitution and dead-code elimination transformations (see Chapter 9) can delete the unused code.

If a subroutine or function cannot be inlined, or if you do not want to inline it, it can often still be analyzed for its effects on the calling routine.

The following example was run with the default values for -optimize and -scalaropt:


  CALL SETN ( N ) 
  IF ( N.GT.10 ) THEN 
    X = 1. 
  ELSE 
    X = 2. 
  ENDIF 
  ... 
  SUBROUTINE SETN (N) 
  INTEGER N 
  N = 12 
  RETURN 
  END 

The example becomes the following:


  CALL SETN (N) 
   X = 1. 
  ... 

Just the CALL and the simplified IF block were shown.

8.4.4 Recursive Inlining Examples

The -inline_depth switch sets the maximum level of subprogram nesting that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further.

Consider the following simplified example:


  PROGRAM EXDDEM 
  REAL  A,B,C,D,E,F,G 
 
  CALL S1 (A,B) 
  CALL S2 (C,D,E,F) 
  CALL S3 (G) 
 
  PRINT *,A,B,C,D,E,F,G 
  END 
 
  SUBROUTINE S1 (W,X) 
  REAL W,X 
  W=1.0 
  CALL S4(X) 
  RETURN 
  END 


 
  SUBROUTINE S2 (Q,R,S,T) 
  REAL Q,R,S,T 
  Q = 2.0 
  CALL S1 (R,S) 
  CALL S3 (T) 
  RETURN 
  END 
 
  SUBROUTINE S3 (U) 
  REAL U 
  U = 137.0 
  RETURN 
  END 
 
  SUBROUTINE S4 (V) 
  REAL V 
  V = 2.7 
  RETURN 
  END 

When run with -inline and -inline_depth=4, all the subroutines are inlined, including calls to calls to calls, and the main program becomes:


 
 PROGRAM EXDDEM 
 REAL  A,B,C,D,E,F,G 
 EXTERNAL S4 
 
 A = 1.0 
 B = 2.7 
 C = 2.0 
 D = 1.0 
 E = 2.7 
 F = 137.0 
 G = 137.0 
 
 PRINT *, A, B, C, D, E, F, G 
 END 

When run with -inline and -inline_depth=1, meaning inline only one routine deep, all the CALLs in the main program and subroutines are expanded, but CALLs in inlined routines are not. The main program becomes:


 PROGRAM EXDDEM 
 REAL  A,B,C,D,E,F,G 
 EXTERNAL S4 
 
 A = 1.0 
 CALL S4 (B) 
 C = 2.0 
 CALL S1 (D,E) 
 CALL S3 (F) 
 G = 137.0 
 
 PRINT *, A, B, C, D, E, F, G 
 END 

When run with -inline and -inline_depth=-1 (inline only routines that do not contain CALLs or FUNCTION references), the main program becomes:


PROGRAM EXDDEM 
REAL  A,B,C,D,E,F,G 
 
CALL S1 (A,B) 
CALL S2 (C,D,E,F) 
G = 137.0 
 
PRINT *, A, B, C, D, E, F, G 
END 

In this last case, only SUBROUTINEs S3 and S4 could be inlined. Repeated runs with -inline_depth=1 can be used to inline additional levels of routines.

8.4.5 Manual Inlining Example

Manual inlining and IPA allow you greater control over the routines that are inlined/analyzed at CALL sites. They use directives (!*$*inline and !*$*ipa) that are placed into the source code. The directives are normally ignored by KAP, but are enabled when an inlining or IPA switch, respectively, is given on the command line.

The following example is based on the Recursive Inlining example from Section 8.4.4. It was run with default switches, except for -inline_manual.

The directives used have different scopes. Routine S1 is inlined everywhere it is used, routine S3 is inlined only in the main program, and routine S4 is inlined only in routine S1, which is then inlined elsewhere.

With the default value for -scalaropt, forward substitution places the assigned values into the PRINT statement and dead-code elimination deletes the now-unneeded inlined assignments:


!*$* INLINE GLOBAL (S1) 
!*$* INLINE ROUTINE (S3) 
PROGRAM EXDTST 
REAL  A,B,C,D,E,F,G 
 
CALL S1 (A,B) 
CALL S2 (C,D,E,F) 
CALL S3 (G) 
 
PRINT *,A,B,C,D,E,F,G 
END 


SUBROUTINE S1 (W,X) 
REAL W,X 
W=1.0 
!*$* INLINE 
CALL S4(X) 
RETURN 
END 
 
SUBROUTINE S2 (Q,R,S,T) 
REAL Q,R,S,T 
Q = 2.0 
CALL S1 (R,S) 
CALL S3 (T) 
RETURN 
END 
 


SUBROUTINE S3 (U) 
REAL U 
U = 137.0 
RETURN 
END 
 
SUBROUTINE S4 (V) 
REAL V 
V = 2.7 
RETURN 
END 

Becomes:


C   KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011  o5r3so3  11-Aug-1992 21:23:10 
!*$* INLINE (  ) GLOBAL  S1 
!*$* INLINE (  ) ROUTINE  S3 
PROGRAM EXDTST 
REAL A, B, C, D, E, F, G 
SAVE G, F, E, D, C, B, A 
EXTERNAL S4 
CALL S2 (C,D,E,F) 
 
PRINT *, 1., 2.7, C, D, E, F, 137. 
END 


C KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011  o5r3so3  11-Aug-1992 21:23:10 
 
SUBROUTINE S1 (W, X) 
REAL W, X 
W = 1. 
!*$* INLINE (  ) 
X = 2.7 
RETURN 
END 
 
C KAP/F_DEC_OSF/1_AXP 0.0 k093201 911011  o5r3so3  11-Aug-1992 21:23:10 
 
SUBROUTINE S2 (Q, R, S, T) 
REAL Q, R, S, T 
EXTERNAL S4 
Q = 2. 
R = 1. 
S = 2.7 
CALL S3 (T) 
RETURN 
END 

Subroutines S3 and S4 are unchanged.

8.4.6 Notes on Inlining and IPA

Routines to be inlined must pass all the criteria (-inline=, -inline_looplevel, -inline_depth) to be inlined.

The !*$* [no]inline and !*$* [no]ipa directives, when enabled, override the inlining/IPA command switches.

A !*$* inline global directive without a name list tells KAP to inline every routine it can, regardless of the -inline, -inline_depth, and -inline_looplevel settings. A !*$* noinline global directive tells KAP not to inline anything, regardless of the -inline, -inline_depth, and -inline_looplevel settings. The !*$* inline and !*$* ipa directives are disabled by default; they are enabled when any inlining or IPA command switch is specified.

When a library is specified with -inline_from_libraries, routines may be taken from that library for inlining into the source code. No attempt is made to inline routines from the source file into routines from the library. For example, if the main program calls routine BB, which is in the library, and BB calls routine DD, which is in the source file, then BB can be inlined into the main program, but KAP will not attempt to inline DD into the text from library routine BB.

A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code. However, a library created with -ipa_create may not appear in an -inline_from_libraries= list. Attempting to do so is flagged with a Warning message.

Inlining and IPA are slow, memory-intensive activities. Specifying large values for -inline_looplevel and -inline_depth (inline all available routines everywhere they are used) for a large set of inlinable routines for a large source file can absorb significant system resources. For most programs, specifying small values for -inline_looplevel and -inline_depth and/or a small number of routines with -inline= can provide most of the benefits of inlining. The same applies for the IPA switches.


Previous Next Contents Index