Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide



5.4 Parallel Processing Directive

The following directive is available in the multiprocessor version of KAP:


#pragma _KAP minconcurrent=<integer> 

Executing a loop in parallel incurs overhead that varies from system to system. If a loop does little work, the overhead of setting up parallel execution can make it run more slowly in parallel than it would serially. The minconcurrent directive sets the level of work in a loop above which KAP executes the loop in parallel. The higher the minconcurrent value, the more iterations or statements (or both) the loop body must have to run in parallel.

Specifying #pragma _KAP minconcurrent (0) tells KAP to parallelize all the following loops in the program unit, regardless of the loop bounds and the work in the loop. The range of values for this directive is 0 to 999.
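
The trade-off between loop work and parallel overhead can be illustrated with a simple cost model. The sketch below is an assumption for illustration only and is not KAP's actual heuristic. The function parallel_is_faster and its parameters work, overhead, and nproc are hypothetical names, with work and overhead measured in the same arbitrary time units.

```c
/*
 * Illustrative cost model only; this is NOT how KAP computes its threshold.
 * work     : time to run the whole loop serially (arbitrary units)
 * overhead : fixed cost of starting parallel execution (same units)
 * nproc    : number of processors sharing the loop iterations
 */
int parallel_is_faster(double work, double overhead, int nproc)
{
    double serial_time = work;
    double parallel_time = overhead + work / nproc;

    /* Parallel execution pays off only when the work outweighs the overhead. */
    return parallel_time < serial_time;
}
```

Under this model, a loop with 1000 units of work and 100 units of overhead on 4 processors is worth running in parallel, while a loop with only 50 units of work is not.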


Chapter 6
Inlining and IPA

This chapter presents additional information about the KAP command switches and inline pragmas used to inline functions or to perform Interprocedural Analysis (IPA).

6.1 Inlining

Inlining is the process of replacing a function reference with the text of the function. Inlining eliminates the overhead of the function call, and can assist other optimizations by making relationships between function arguments, returned values, and the surrounding code easier to find.
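
As a minimal sketch of what inlining means, the following C fragment shows a loop that calls a small function and a hand-inlined equivalent in which the call is replaced by the function body. The function names square, sum_squares_call, and sum_squares_inlined are hypothetical and chosen only for this illustration. The code KAP actually generates may differ.

```c
/* Hypothetical helper used only for illustration. */
static double square(double x)
{
    return x * x;
}

/* Original form: every iteration pays for a call to square(). */
double sum_squares_call(const double *v, int n)
{
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}

/* Hand-inlined form: the call is replaced by the text of the function. */
double sum_squares_inlined(const double *v, int n)
{
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++)
        total += v[i] * v[i];
    return total;
}
```

Both functions return the same value. In the second form, the relationship between v[i] and the accumulated total is visible directly in the loop body.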

IPA is the process of inspecting called functions for information on relationships between arguments, returned values, and global data. IPA can provide many of the benefits of inlining, but without replacing the function reference.

The rest of this chapter covers the inlining and IPA command switches and pragmas, related command switches, examples of their use, and information about program constructs that inhibit inlining. Inlining and IPA are almost symmetrical from the command-line standpoint: there are parallel sets of switches and pragmas for them. The exception is -inline_depth, which has no IPA counterpart. In many places in this chapter, the term "inlining" applies to both inlining and IPA.

6.2 Inlining and IPA Command Switches

There are two phases to inlining: defining the universe of inlinable functions and selecting which functions in that universe to inline or analyze. The -inline_from... and -ipa_from... switches define the universe of inlinable functions. The -inline/-ipa, -inline_depth, and -..._looplevel switches select which of the available functions are to be inlined/analyzed. The -inline_create and -ipa_create switches set up collections of functions for inclusion in later KAP runs.

All of the inlining and IPA command switches are listed in the following sections. The short forms of their names are in brackets.

6.2.1 -inline_from and -ipa_from Switches

The following list shows the -inline_from and -ipa_from switches:

In these switches, <list> is one or more source file names, library file names, or directories, separated by commas. The default is the current source file.

You can distinguish types of files by their extensions. For example, the switch -inline_from_files=xj.c,yy.c,-./mrtn/ looks for functions in the C source files xj.c and yy.c, and in the C source files in the directory -./mrtn. All source files that contain C preprocessor directives must be preprocessed by cc before being inlined.

The -..._libraries versions of these switches take as their arguments lists of function libraries and directories containing such libraries.

KAP recognizes the type of file from its extension, or lack of one, as follows:

Two special abbreviations are defined:

If multiple -inline_from... (-ipa_from...) switches are given, their lists are concatenated to form a larger universe.

Function names are resolved by searching files in the order in which they appear in the -inline_from... (-ipa_from...) switches on the command line. Libraries are searched in their original lexical order. Multiple -inline_from... (-ipa_from...) lists are searched in the order in which they appear on the command line.
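
The search order can be pictured as a first-match lookup over an ordered list. The following sketch is an illustration under stated assumptions, not KAP's implementation. The struct source_entry type, the resolve_function name, and the limit of four function names per file are all hypothetical.

```c
#include <string.h>

/* Hypothetical model of one file in the inlining universe. */
struct source_entry {
    const char *file;            /* file name as given on the command line */
    const char *functions[4];    /* up to 4 function names; unused slots are NULL */
};

/*
 * Returns the file that supplies `name`, searching the entries in the order
 * given. The first match wins, so earlier entries shadow later ones.
 * Returns NULL when no entry defines the function.
 */
const char *resolve_function(const struct source_entry *universe, int count,
                             const char *name)
{
    int i, j;

    for (i = 0; i < count; i++)
        for (j = 0; j < 4 && universe[i].functions[j] != NULL; j++)
            if (strcmp(universe[i].functions[j], name) == 0)
                return universe[i].file;
    return NULL;
}
```

If two listed files both define a function of the same name, this model picks the definition from the file that appears first, which mirrors the ordering rule stated above.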

6.2.2 Library Creation

To specify an existing library file to inline from, use -inline_from_libraries= or -ipa_from_libraries=. To create a preprocessed library, use the following switches:


-inline_create=<library name>  [-incr] 
-ipa_create=<library name>     [-ipacr] 

The default source for functions to put into the library is the current source file. If -inline_from... or -ipa_from... is specified, the functions in the listed files are the ones put into the library. This provides a method to combine or expand libraries: include the old library or libraries in an -inline_from_libraries or -ipa_from_libraries switch, along with an -inline_from_files or -ipa_from_files switch giving source files containing any new functions.

Functions are included in libraries in the order in which they appear in the input files. This is to make sure that if multiple functions with the same name are in the same source file, the one chosen for inlining will be the one you expect from the algorithm under -inline_from....

A library created with -inline_create works for both inlining and IPA, because it is just partially reduced source code. However, a library created with -ipa_create must not appear in an -inline_from... list. If it does, KAP flags it with a warning message.

If no library name is given, the name used is file.klib, where file is the input file name with any trailing .c stripped off.
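
The default naming rule can be sketched in a few lines of C. This is an illustration of the rule as stated, not code taken from KAP. The function name default_klib_name and its buffer-based interface are assumptions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Builds the default library name for an input file: strip a trailing ".c"
 * if present, then append ".klib". Returns 0 on success, or -1 if the result
 * does not fit in `out`.
 */
int default_klib_name(const char *input, char *out, size_t outsize)
{
    size_t len = strlen(input);
    int written;

    if (len >= 2 && strcmp(input + len - 2, ".c") == 0)
        len -= 2;    /* drop the trailing ".c" */

    written = snprintf(out, outsize, "%.*s.klib", (int)len, input);
    return (written >= 0 && (size_t)written < outsize) ? 0 : -1;
}
```

For example, an input file named matm.c yields matm.klib, and an input name without a .c suffix simply gains the .klib extension.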

When creating a library, only one -inline_create (-ipa_create) switch may be given. That is, only one library may be created per KAP run. If the library file existed prior to running KAP, it is overwritten.

When -inline_create (-ipa_create) is specified on the command line, no transformed code file will be generated.

See the description of the -inline_from_libraries and -ipa_from_libraries switches for information about using libraries created with these switches.

If the -inline (-ipa) switch is not given, or is given without a list of function names, the default will be to include all the functions from the inlining universe in the library, if possible. If -inline=<name list> or -ipa=<name list> is specified, only the named functions will be included in the library.

6.2.3 Naming Specific Functions

The following switches specify names of particular functions to inline. The default is all functions in the function universe specified by any -inline_from... (-ipa_from...) switches, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth settings.


-inline[=name[,name...]]    [-inl=] 
-ipa[=name[,name...]]       [-ipa=] 

Inlining and IPA are off by default. That is, if you do not specify inlining (IPA) switches, then no inlining (IPA) will take place.

If you omit -inline (-ipa) from the command line, you can still enable automatic selection of functions to inline (analyze) with one of the -..._from_... switches. You can perform manual selection of functions to inline (analyze) with the -inline_manual (-ipa_manual) switches and the inline (IPA) pragmas.

If you specify -inline (-ipa) on the command line without a list of function names, then all functions in the inlining (IPA) universe are eligible, subject to the -inline_looplevel (-ipa_looplevel) value.

If you specify -inline (-ipa) on the command line with a list of function names, then only the functions that are included in the list are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values.

The -inline and -ipa switches also have negated (no) versions. Unlike -inline and -ipa, these must be given arguments, as follows:


-noinline=name[,name...]    [-ninl=] 
-noipa=name[,name...]       [-nipa=] 

These switches enable the automatic inlining (IPA) algorithms in the same way that -inline (-ipa) does when given without arguments, but the listed functions are the ones that are not to be inlined (analyzed). That is, all functions except the named ones are eligible.

You cannot specify both -inline and -noinline (-ipa and -noipa) on the same command line.
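
The name-selection rules for the three switch forms can be summarized as a small predicate. The sketch below is a simplified model, not KAP code. It assumes the function is already in the inlining universe and ignores the -inline_looplevel and -inline_depth limits. The enum selection_mode type and the is_eligible function are hypothetical names.

```c
#include <string.h>

/* Hypothetical model of the three ways to select functions by name. */
enum selection_mode {
    SELECT_ALL,      /* -inline (or -ipa) given without names           */
    SELECT_ONLY,     /* -inline=name[,name...] or -ipa=name[,name...]   */
    SELECT_EXCEPT    /* -noinline=name[,...] or -noipa=name[,...]       */
};

static int in_list(const char *name, const char *const *names, int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 1 if `name` is eligible under the given mode and name list. */
int is_eligible(enum selection_mode mode, const char *name,
                const char *const *names, int count)
{
    switch (mode) {
    case SELECT_ALL:
        return 1;
    case SELECT_ONLY:
        return in_list(name, names, count);
    case SELECT_EXCEPT:
        return !in_list(name, names, count);
    }
    return 0;
}
```

With the name list containing only setup, the SELECT_ONLY mode makes setup eligible and matm ineligible, while SELECT_EXCEPT reverses that outcome.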

If all call sites of a function are to be inlined, the following variant of the -inline switch may be of interest:


-inline_and_copy[=name[,name...]]        [-inlc=] 

The -inline_and_copy command switch functions like the -inline switch, except that if all references to a function are inlined, the inlined function is copied to the transformed code file unchanged.

When a function has been inlined everywhere it is used, not optimizing it saves compilation time and deleting its text saves memory. These switches are intended for use when the functions being inlined are in the same file as the function reference, and have no special effect when the functions being inlined are being taken from a library or another source file.

Note

These switches assume that all references to a function to be inlined precede it in the source file, and that the file being processed will not be combined or linked with files containing references to the inlined functions. With -inline_and_copy, later references in the same file will either be inlined or execute the unoptimized function; references in other files will execute the unoptimized function.

6.2.4 Call Nesting (Recursive Inlining)

The switch -inline_depth=<n> [-ind] sets the maximum depth of nested calls that KAP will attempt to inline. Recursive inlining here means inlining calls to functions that themselves contain calls to other functions, which in turn contain further calls, and so on.

The parameter values and their meanings are as follows:

There is no corresponding -ipa_depth switch. IPA always looks at the called function, and only at the called function.
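
To make the idea of nested call levels concrete, the following sketch shows a two-level call chain and hand-written versions with one and two levels expanded. The function names are hypothetical, and the sketch does not state which -inline_depth value produces which form. It only illustrates what a level of call nesting is.

```c
/* Hypothetical call chain: top -> middle -> leaf. */
static int leaf(int x)
{
    return x + 1;
}

static int middle(int x)
{
    return 2 * leaf(x);
}

/* Original form: two levels of nested calls below top. */
int top_original(int x)
{
    return middle(x) + 3;
}

/* One level expanded: middle() is inlined, the call to leaf() remains. */
int top_one_level(int x)
{
    return 2 * leaf(x) + 3;
}

/* Two levels expanded: leaf() is inlined into the already inlined middle(). */
int top_two_levels(int x)
{
    return 2 * (x + 1) + 3;
}
```

All three versions compute the same result. They differ only in how many call levels have been expanded into the caller.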

6.2.5 For-Loop Level

The following switches set a minimum for-loop nest level for function call expansion. The -inline_looplevel and -ipa_looplevel switches enable you to limit inlining and IPA to just functions that are referenced in nested loops, where the reduced function call overhead or enhanced optimization will be multiplied:


-inline_looplevel=<n>  [-inll] 
-ipa_looplevel=<n>     [-ipall] 

The argument is defined from the most deeply nested leaf of the call tree. A small value restricts inlining (IPA) to the best candidate functions, for example:


main
{
  ..
  a(); ------>  a() {...}
  ..
  for (..) {
    for (..) {
      b(); --------->  b() {
    }                    for (..) {
  }                        for (..) {
}                            c(); -------> c() {...}
                           }
                         }
                       }

The call to b is inside a doubly nested loop, and would be more profitable to expand than the call to a. The call to c is quadruply nested, so inlining c would yield the biggest gain of the three.
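
The effect of loop nesting on call counts can be checked with a small program shaped like the call tree above. This is an illustrative sketch. The trip count N, the counter variables, and the function run_call_tree are assumptions introduced for the example.

```c
#define N 3   /* hypothetical trip count for every loop */

/* Call counters, one per function in the call tree. */
long calls_a, calls_b, calls_c;

static void c(void)
{
    calls_c++;
}

static void a(void)
{
    calls_a++;
}

/* b() calls c() from inside its own doubly nested loop. */
static void b(void)
{
    int k, l;

    calls_b++;
    for (k = 0; k < N; k++)
        for (l = 0; l < N; l++)
            c();
}

/* Mirrors main in the diagram: a() outside any loop, b() two loops deep. */
void run_call_tree(void)
{
    int i, j;

    calls_a = calls_b = calls_c = 0;
    a();
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            b();
}
```

With N set to 3, a is called once, b is called 9 times, and c is called 81 times, so any per-call overhead saved by inlining c is multiplied the most.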

The argument is defined from the most deeply nested function reference:

6.2.6 Manual Control

The following switches cause KAP to recognize the #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives. These switches allow manual control over which functions are inlined/analyzed at which call sites. (See Section 6.3.)


-inline_manual        [-inm] 
-ipa_manual           [-ipam] 

The default is to ignore these pragmas. They are enabled when any of the inlining or IPA command switches, respectively, are specified. The -inline_manual and -ipa_manual switches permit you to enable the directives without performing other inlining.

6.3 Inlining Pragmas

The inline and IPA pragmas tell KAP to inline/IPA the named functions.


#pragma _KAP [no]inline  [here|routine|global]  [(name[,name...])] 
#pragma _KAP [no]ipa     [here|routine|global]  [(name[,name...])] 

The noinline and noipa pragmas tell KAP to not inline/analyze the named functions. These pragmas combine next-statement, entire routine (function), and global (entire program file) scope. If none of the optional elements are included, all functions referenced in the next statement that are in the inlining/analyzing universe are inlined/analyzed on that line.

These pragmas are disabled by default. You can enable them by specifying any of the inlining (IPA) command switches. Also, you can enable them without enabling any other inlining/IPA with the -inline_manual (-ipa_manual) command switch. They are otherwise independent of the other inlining (IPA) command switches, and can be used instead of, or in addition to, command-line controlled inlining and IPA.

The keywords, including the word pragma, must be lowercase. On some systems, the function names are case sensitive.

The effects of scope keywords on pragmas are as follows:

The optional names are function names. If any functions are named in the directive, it applies only to them. If NO function names are given, the pragma applies to ALL functions. The parentheses around the function names are not required if the list of function names is empty.

If a #pragma _KAP inline or #pragma _KAP ipa names a function not in the inlining or IPA universe, a Warning message is issued, and the pragma is ignored.
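
As a minimal sketch of manual control, the fragment below places the inline pragma, with no scope keyword and no function names, immediately before the statement whose calls are to be inlined. The functions scale and sum_scaled are hypothetical. The sketch assumes that KAP is run with -inline_manual or another inlining switch and that scale is in the inlining universe. A C compiler that does not recognize the pragma ignores it.

```c
/* Hypothetical function that is a candidate for inlining. */
static double scale(double x)
{
    return 2.0 * x;
}

double sum_scaled(const double *v, int n)
{
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        /* With no optional elements, the pragma applies to the functions
           referenced in the next statement. */
#pragma _KAP inline
        total += scale(v[i]);
    }
    return total;
}
```

Because the pragma only requests a transformation, the function returns the same value whether or not the call is inlined.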

6.4 Listing File Support

The optional calling tree and loop tables include the loop nest depth level of each for loop. (See Chapter 8 for examples.) This information can be used to determine the nest level for function calls for setting -inline_looplevel or -ipa_looplevel.

6.5 Inlining/IPA Examples

The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because KAP undergoes constant enhancement, the code that your version of KAP produces may not be identical to that of these examples. The temporary variable names, in particular, can change without significantly altering the transformed code.

Unless otherwise noted, the following examples were run with -o=0 and -so=0 to show the inlining more clearly. If nonzero values are specified, the functions are first inlined or analyzed, and then the concurrent and scalar transformations are applied.

In some cases, C preprocessor additions or modifications to the code were removed to clarify the example outputs.

6.5.1 Inlining Example

The following example demonstrates inlining with -inline=setup, where only the function setup will be inlined, and with -inline, where both functions are inlined. The KAP output includes optimized versions of both functions, in addition to the expanded main program, as follows:


Source file (before the C preprocessor): 
 
#include <math.h>
#include <stdio.h>
#define  SIZE  200

main ()
{
  int i,n;
  double a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];
  double cksum, matm();

  setup(b,SIZE);
  setup(c,SIZE);

  for (n=25; n<=SIZE; n=n+25)
  {
    cksum = matm(n,a,b,c);
    printf("For N=  %d   checksum= %g \n", n, cksum);
  }
}

setup (e,n)
double e[SIZE][SIZE];
int    n;
{
  int    i,j;
  for (i=0; i<n; i++)
  {
    for (j=0; j<n; j++)
      e[i][j] = ( (i+ 7*j) % 10 )/10.0;
  }
  return;
}

double matm (n,a,b,c)
int    n;
double a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];
{
  int i,j,k;

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
    {
      a[i][j] = 0.0 ;
      for (k=0; k<n; k++)
        a[i][j] = a[i][j] + b[i][k]*c[k][j];
    }

  return (a[3][5]);
}

The main function generated by -inline=setup is as follows:


int main(  )

{
  int i;
  int n;
  double a[200][200];
  double b[200][200];
  double c[200][200];
  double cksum;
  double matm( );
  int _Kii3;
  int j;
  int _Kii6;
  int _Kii7;

  for ( _Kii3 = 0; _Kii3<=199; _Kii3++ ) {
    for ( j = 0; j<=199; j++ ) {
      b[_Kii3][j] = ((_Kii3 + j * 7) % 10) / 10.0;
    }
  }
  for ( _Kii6 = 0; _Kii6<=199; _Kii6++ ) {
    for ( _Kii7 = 0; _Kii7<=199; _Kii7++ ) {
      c[_Kii6][_Kii7] = ((_Kii6 + _Kii7 * 7) % 10) / 10.0;
    }
  }
  for ( n = 25; n<=200; n+=25 ) {
    cksum = matm( n, a, b, c );
    printf( "For N=  %d   checksum= %g \n", n, cksum );
  }
}

The main function generated by -inline is as follows:


int main(  )

{
  int i;
  int n;
  double a[200][200];
  double b[200][200];
  double c[200][200];
  double cksum;
  double matm( );
  double _Kaa1;
  int _Kii2;
  int j;
  int k;
  int _Kii5;
  int _Kii6;
  int _Kii9;
  int _Kii10;

  for ( _Kii5 = 0; _Kii5<=199; _Kii5++ ) {
    for ( _Kii6 = 0; _Kii6<=199; _Kii6++ ) {
      b[_Kii5][_Kii6] = ((_Kii5 + _Kii6 * 7) % 10) / 10.0;
    }
  }
  for ( _Kii9 = 0; _Kii9<=199; _Kii9++ ) {
    for ( _Kii10 = 0; _Kii10<=199; _Kii10++ ) {
      c[_Kii9][_Kii10] = ((_Kii9 + _Kii10 * 7) % 10) / 10.0;
    }
  }
  for ( n = 25; n<=200; n+=25 ) {
    for ( _Kii2 = 0; _Kii2<n; _Kii2++ ) {
      for ( j = 0; j<n; j++ ) {
        a[_Kii2][j] = 0.0;
        for ( k = 0; k<n; k++ ) {
          a[_Kii2][j] +=  b[_Kii2][k] * c[k][j];
        }
      }
    }
    _Kaa1 = a[3][5];
    cksum = _Kaa1;
    printf( "For N=  %d   checksum= %g \n", n, cksum );
  }
}

6.5.2 IPA Example

In the following example, the variables n and np1 have a simple relationship. This relationship is hidden behind a function call, however, so KAP normally does not try to concurrentize the loop in the main program. When the -ipa=rxgfs command switch is specified, KAP inspects the named function for information on the relationship of its arguments and returned value and the surrounding code. The assumed dependence is lifted and the loop can be safely concurrentized. If a function cannot be inlined, or if you do not want to inline it, it can often still be analyzed for its effects on the calling function.

The following example was run with the default values for -optimize and -scalaropt.


main()
{
  int np1, i, m, n;
  int a[100][100];

  np1 = rxgfs( n );
  for ( i=0; i<m; i++ ) {
    a[i][n] = a[i-1][np1];
  }

}

int rxgfs( n )
int n;
{
  return (n+1);
}

Becomes:


int main(  )
{
  int np1;
  int i;
  int m;
  int n;
  int a[100][100];

  np1 = rxgfs ( n ) ;
  {
    for ( i = 0; i<m; i++ ) {
      a[i][n] = a[i-1][np1];
    }
  }
}

The function rxgfs is not shown in the transformed output.
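
The reason the relationship between n and np1 matters can be demonstrated directly. The sketch below is an illustration, not KAP output. It runs the loop from the IPA example in forward and in reverse order and reports whether both orders give the same array. The 8-by-8 array size, the fill helper, and the same_in_any_order function are assumptions, and the loop starts at i = 1 rather than 0 so that a[i-1] stays within the array.

```c
#include <string.h>

#define ROWS 8
#define COLS 8

/* Same body as the function in the example. */
int rxgfs(int n)
{
    return n + 1;
}

/* Gives every element a distinct value so reordering effects are visible. */
static void fill(int a[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            a[i][j] = i * COLS + j;
}

/*
 * Runs a[i][n] = a[i-1][np1] forward and backward and returns 1 when both
 * orders produce identical arrays, that is, when iterations are independent.
 */
int same_in_any_order(int n, int np1)
{
    int fwd[ROWS][COLS], rev[ROWS][COLS];
    int i;

    fill(fwd);
    fill(rev);
    for (i = 1; i < ROWS; i++)
        fwd[i][n] = fwd[i - 1][np1];
    for (i = ROWS - 1; i >= 1; i--)
        rev[i][n] = rev[i - 1][np1];
    return memcmp(fwd, rev, sizeof fwd) == 0;
}
```

When np1 comes from rxgfs(n), the loop writes column n and reads only column n+1, so the iteration order does not matter. If np1 were equal to n, each iteration would read a value written by the previous one and the two orders would disagree, which is the dependence KAP must assume when it cannot see inside the function.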

