Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide



3.3 Directed Parallelization Using the kcc Driver and OpenMP Directives

Under the directed method, Compaq KAP does not detect parallelism automatically. As always, any OpenMP directives in the original source program are passed to the C compiler for processing.

Parallelization (that is, creating an executable file that executes as a multithreaded application on symmetric multiprocessor systems) by inserting OpenMP directives is most useful for programs under the following circumstances:

The directed method applies only to FOR loops with OpenMP directives. Consider a C application with two procedures called "example_1" and "example_2" with the following contents:


int a[1000], b[1000], c[1000], d[1000], n;

void example_1(void)
{
    int i;

#pragma omp parallel shared(n, a, b, c, d) private(i)
    {
#pragma omp for nowait
        for (i = 0; i < n; i++) {
            a[i] = b[i] + c[i];
            if (d[i]) {
                a[i] /= d[i];
            }
        }
    }
}

void example_2(void)
{
    int j;

    for (j = 0; j < n; j++) {
        a[j] = b[j] * c[j];
        d[j] = a[j] / d[j];
    }
}

Compaq KAP passes the OpenMP directives on the FOR loop with index "i" to the compiler for processing. Compaq KAP does not parallelize the FOR loop with index "j". Under the directed method, therefore, any loop not enclosed by OpenMP directives is not parallelized. For Compaq KAP to transform both FOR loops, it would have to run under the combined method, and the procedure "example_2" would have to reside in a C source file that contains no OpenMP directives.
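As a rough sketch of what the directed method expects, the second loop could be given its own directive so that the compiler parallelizes it as well. The name example_2_directed is hypothetical and does not appear in the original example. The sketch assumes every d[j] is nonzero for j < n, as the original example_2 implicitly does.

```c
/* Sketch only: example_2 with an explicit OpenMP directive.
   The name example_2_directed is hypothetical. */
int a[1000], b[1000], c[1000], d[1000], n;

void example_2_directed(void)
{
    int j;

    /* Each iteration reads and writes only element j, so the
       iterations are independent. Assumes d[j] != 0 for all j < n. */
#pragma omp parallel for shared(n, a, b, c, d) private(j)
    for (j = 0; j < n; j++) {
        a[j] = b[j] * c[j];
        d[j] = a[j] / d[j];
    }
}
```

If the compiler is run without OpenMP support, the pragma is ignored and the loop runs serially with the same result.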

The following command processes a program for which no automatic parallelization is desired:


 
         kcc  -ckapargs='-noconc'  my_prog.c \
               -omp  -pthread  -call_shared 

The results include a transformed source program and its processing by the compiler and linker to create executable file a.out. Because of the -noconc switch, Compaq KAP does not automatically set compiler and linker switches related to parallel processing. Therefore the user must explicitly set the -omp and -pthread compiler and -call_shared linker switches.

3.3.1 Changing Source Programs

Insert OpenMP directives (beginning with #pragma omp) only with loops that are safe to parallelize. When Compaq KAP sees a loop prefaced with OpenMP directives, it does not perform data dependence analysis on that loop and does not prevent you from using a parallel directive incorrectly. The OpenMP directives are described in your Compaq C user's guide.
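Because no data dependence analysis is performed on a directed loop, it is up to you to decide whether a loop is safe. The sketch below contrasts a loop that can safely carry a directive with one that cannot. Both function names are hypothetical and chosen only for illustration.

```c
/* Sketch: deciding whether a loop is safe before adding a directive.
   sum_array and prefix_sum are hypothetical names. */

/* Safe: iterations are independent except for the running total,
   which the reduction clause combines correctly across threads. */
long sum_array(const int *x, int n)
{
    long total = 0;
    int i;

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}

/* Not safe to parallelize as written: iteration i reads x[i - 1],
   which iteration i - 1 has just written (a loop-carried dependence).
   This loop is therefore left without a directive. */
void prefix_sum(int *x, int n)
{
    int i;

    for (i = 1; i < n; i++) {
        x[i] += x[i - 1];
    }
}
```

Placing a parallel directive on the second loop would compile without complaint but could produce wrong results, since threads might read elements before earlier iterations have updated them.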

The OpenMP directives (pragmas) are listed below:

3.3.2 Giving Command Line Switches

There are no Compaq KAP switches that affect the processing by the compiler of OpenMP directives inserted by the user.

3.3.3 Directing the Compilation and Linking Process

To parallelize a program containing OpenMP directives, you normally need to give only the kcc command with the -noconc KAP switch, the -omp and -pthread C compiler switches, and the -call_shared linker switch.

An example follows:


   kcc  -ckapargs='-noconc'  myprog.c \
         -omp  -pthread  -call_shared 

Because of the -noconc switch, Compaq KAP does not automatically set the compiler and linker switches needed for parallelization, so the command sets them explicitly.

3.4 Combined Automatic and Directed Parallelization Using the kcc Driver

Parallelization (that is, creating an executable file that executes as a multithreaded application on symmetric multiprocessor systems) by the combined method is most useful for large programs in which you want to explicitly control the parallelization of some FOR loops by inserting OpenMP parallel directives while letting Compaq KAP automatically parallelize the remaining loops. The combined method merges the automatic and directed methods. The appropriate command line for processing a program with the combined method is:

kcc -ckapargs='-concurrent' openmp.c no_openmp.c

3.4.1 Changing Source Programs

You insert OpenMP directives around those FOR loops that you want to explicitly parallelize. As mentioned previously, KAP C/OpenMP does not perform automatic parallel decomposition or serial optimizations on files that contain OpenMP directives.

In addition, you can insert guiding assertions, that is, non-OpenMP directives, around loops that you want to help Compaq KAP parallelize automatically. Compaq KAP cannot automatically parallelize loops with data dependencies between loop iterations or loops with calls to external routines. You can help Compaq KAP handle these loops by placing parallel processing assertions and parallel processing directives (each beginning with #pragma _KAP) in the source program. These assertions and directives are:


#pragma _KAP concurrent 
#pragma _KAP concurrent call 
#pragma _KAP concurrent ignore call 
#pragma _KAP serial 
#pragma _KAP minconcurrent 
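The following sketch suggests how one of these assertions might be used on a loop that calls another routine. The names scale and scale_all are hypothetical, and placing the assertion directly before the loop is an assumption made for illustration, not something this section specifies. The file contains no OpenMP directives, as the combined method requires for automatically parallelized code.

```c
/* Sketch: a KAP assertion ahead of a loop that contains a call.
   scale() and scale_all() are hypothetical names. The placement of
   the assertion immediately before the loop is an assumption. */

/* The called routine has no side effects, so concurrent calls
   cannot interfere with one another. */
static double scale(double v)
{
    return 2.0 * v;
}

void scale_all(double *x, int n)
{
    int i;

#pragma _KAP concurrent call
    for (i = 0; i < n; i++) {
        x[i] = scale(x[i]);
    }
}
```

A compiler that does not recognize the _KAP pragma ignores it, so the code still builds and runs serially.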

3.4.2 Giving Command Line Switches

Command-line switches you can give to Compaq KAP that affect its transformation of FOR loops are:

3.4.3 Directing the Compilation and Linking Process

To construct a program for parallel execution via the combined method, you normally need to give only the -concurrent switch to the kcc command as follows:

kcc -ckapargs='-concurrent' my_prog.c

The -concurrent switch tells KAP to automatically parallelize appropriate FOR loops within files that do not contain OpenMP directives. The -concurrent switch also sets the compiler and linker switches needed for parallelization. KAP inserts OpenMP directives around loops that it automatically detects are good candidates for parallelization. The actual parallelization is done by the compiler, which processes both the OpenMP directives inserted automatically by KAP and those inserted by the programmer.

Finally, you may want to create a completely non-parallelized program so you can compare its execution time with the times of programs that are parallelized in various ways (such as the automatic method and the directed method). The following command does this:

kcc -ckapargs='-noconc' -noomp myprog.c

The -noconc switch prevents automatic parallelization of FOR loops. The -noomp switch prevents the C compiler from responding to any parallel directive statements in the transformed source file it receives.
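To make such a comparison, the program needs some way to report its own elapsed time. The sketch below shows one possible approach using the POSIX gettimeofday() call. All of the names in it (sample_kernel, wall_seconds, time_kernel, WORK_SIZE) are hypothetical, and sample_kernel merely stands in for the code being measured.

```c
#include <stddef.h>
#include <sys/time.h>

#define WORK_SIZE 1000

static int work[WORK_SIZE];

/* Hypothetical kernel standing in for the code being measured. */
void sample_kernel(void)
{
    int i;

#pragma omp parallel for
    for (i = 0; i < WORK_SIZE; i++) {
        work[i] = 2 * i;
    }
}

/* Current wall-clock time in seconds, from POSIX gettimeofday(). */
double wall_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1.0e6;
}

/* Run a kernel once and return the elapsed wall-clock seconds. */
double time_kernel(void (*kernel)(void))
{
    double start = wall_seconds();

    kernel();
    return wall_seconds() - start;
}
```

Wall-clock time is used rather than CPU time because CPU time summed over several threads can hide the benefit of running them in parallel. Building the same source once per method and printing the value returned by time_kernel gives comparable figures.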

3.5 Compiling a Program for Parallel Execution Using kapc

Note

Normally, you use the kcc command with the -conc switch to create an optimized and parallelized executable file. Compaq recommends this command because it sets the compiler and linker switches correctly.

To view these switches, include the -v switch with the kcc command.

To compile a program for parallel execution using the kapc command on Tru64 UNIX, issue the following commands:


   kapc -conc -cmp=myprog_mp.c myprog.c 
 
   cc myprog_mp.c -fast -tune host -automatic \
       -omp -pthread 

The kapc command preprocesses myprog.c to produce a new source file, myprog_mp.c, which contains OpenMP directives inserted by Compaq KAP for the loops it has selected for automatic parallelization. The file myprog_mp.c is then processed by the compiler and linker to produce a parallelized executable, a.out. Further explanation of the switches used follows:

3.6 Running a Parallelized Program

To run a program parallelized with OpenMP directives, you may want to change the following environment variables:


OMP_SCHEDULE               (static,dynamic,guided,runtime) 
OMP_DYNAMIC                (true,false) default is false. 
OMP_NESTED                 (true,false) default is false. 
OMP_NUM_THREADS            (number) default value is the number of 
                           processors on the current system. 

For further information on environment variables read by the C compiler, see your Compaq C user's guide.
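As an illustration of how these variables interact with source code, the sketch below shows a loop whose schedule is left to OMP_SCHEDULE through the standard schedule(runtime) clause, together with a small helper that reads OMP_NUM_THREADS for reporting purposes. The names parse_thread_count, requested_threads, and add_arrays are hypothetical. The OpenMP run-time reads these variables on its own, so the helper is not needed for them to take effect.

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a thread-count string such as "4". Returns fallback when the
   string is missing, empty, not a plain positive integer, or too large.
   Hypothetical helper, for reporting only. */
int parse_thread_count(const char *s, int fallback)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0') {
        return fallback;
    }
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > INT_MAX) {
        return fallback;
    }
    return (int)v;
}

/* Thread count requested through the environment, or fallback. */
int requested_threads(int fallback)
{
    return parse_thread_count(getenv("OMP_NUM_THREADS"), fallback);
}

/* schedule(runtime) defers the scheduling choice to OMP_SCHEDULE. */
void add_arrays(int *dst, const int *x, const int *y, int n)
{
    int i;

#pragma omp parallel for schedule(runtime)
    for (i = 0; i < n; i++) {
        dst[i] = x[i] + y[i];
    }
}
```

With this arrangement, different schedules can be tried by changing OMP_SCHEDULE between runs, without recompiling the program.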

3.7 Parallel Processing Options

KAP provides parallel command switches, directives, and assertions. For further information about these options, refer to the following sections:

3.8 Parallel Programming Tips


Chapter 4
KAP Command Switches

This chapter describes Compaq KAP for C command switches that allow you to alter KAP defaults.

You will frequently be satisfied with the default switch settings of Compaq KAP for C. However, you can alter default settings to customize optimizations for a given application program and machine. These alterations include limiting the search space for loop optimization, adjusting the parameters that describe cache memory, and enabling or disabling classes of transformations.

Command switches are switches requested on the command line when submitting a KAP job, rather than in the source file.

To specify a command switch, you can use the long name or short name. If a command switch appears more than once on the command line, the last value given is used. Multiple occurrences of an input/output file selection switch are not allowed.

Note

The short names for switches are provided as a convenience, especially for interactive users. However, they may not remain unique from one version of KAP to another. Use the long names in situations that require long-term compatibility, for example, a canned shell script.

Tables 4-1 and 4-2 list the command switches for the KCC driver and the KAPC preprocessor. The first column lists the long name of each switch. This column also lists the functional categories of switches: general optimization, parallel processing, inlining and interprocedural analysis (IPA), advanced optimization control, input/output files, and listing. The next two columns list the short name and default value of each switch. Switches that have different argument syntax in their regular and negative (no) forms are shown on two lines.

Note

File names are case sensitive on Tru64 UNIX systems, so file-name parameters must match the names of the files wanted.

A hyphen (-) is required before each switch listed in the following tables, but the hyphen is not shown in the tables.

Table 4-1 Command-Line Switches for KCC Driver

Long Name                        Related Switch  Short Name  Default Value
cc=C_compiler_path                                           /usr/bin/cc
cext=C file extension                                        c
ckap=path to kapc                                            /usr/bin/kapc
ckapargs=kap_switch_string
cpp=cpp_path                                                 /usr/bin/cc
sif=cpp, kap, -S                                             off
tmpdir=temporary_directory_path                              /tmp/
tune=architecture                                            current system architecture
verbose                                          v           nov

Table 4-2 Command-Line Switches for KAPC Preprocessor

Long Name                              Related Switch  Short Name                            Default Value

General Optimization
[no]interchange                                                                              interchange
namepartitioning=integer,integer       so              namepart=<integer>,<integer>          nonamepartitioning
natural                                                nat                                   nonatural
optimize=integer                                       o=<integer>                           optimize=5
recursion                                              rc                                    nrc
roundoff=integer                       o, so           r=<integer>                           roundoff=3
scalaropt=integer                      r               so=<integer>                          scalaropt=3
skip                                                   sk                                    nosk
tune=architecture                                      tune=<architecture>                   tune=<architecture>

Parallel Processing Switches
[no]concurrentize                                      [n]conc                               noconcurrentize
minconcurrent=integer                                  mc                                    minconcurrent=1000
scheduling=list                                        sched=<list>                          scheduling=e

Inlining and IPA
inline[=names]                                         inl[=<names>]                         off
noinline[=names]                                       ninl[=<names>]
ipa[=names]                                            ipa[=<names>]                         off
noipa[=names]                                          nipa[=<names>]
inline_and_copy=names                                  inlc=<names>                          off
inline_create=file                                     incr=<file>                           off
ipa_create=file                                        ipacr=<file>                          off
inline_depth=integer                                   ind=<integer>                         ind=2
ipa_depth=integer                                      ipad=<integer>                        ipad=2
inline_from_files=file,file            inl             inff=<file>,<file>                    current source file
ipa_from_files=file,file               ipa             ipaff=<file>,<file>                   current source file
inline_from_libraries=library,library  inl             infl=<library>,<library>              off
ipa_from_libraries=library,library     ipa             ipafl=<library>,<library>             off
inline_looplevel=integer                               inll=<integer>                        inll=2
ipa_looplevel=integer                                  ipall=<integer>                       ipall=2
inline_manual                                          inm                                   off
ipa_manual                                             ipam                                  off
inline_optimize=integer                                                                      inline_optimize=0
ipa_optimize=integer                                                                         ipa_optimize=0

Input-Output
cmp[=file]                                             cmp[=<file>]                          See Section 4.7.1
nocmp                                                  ncmp
list[=file]                                            l[=<file>]                            nl
nolist                                                 nl

Listing
cmpoptions[=list]                                      cp[=<list>]                           ncp
nocmpoptions                                           ncp
lines=integer                                          ln=<integer>                          ln=55
listingwidth=integer                                   lw=<integer>                          lw=132
listoptions=list                                       lo=<list>                             See Section 4.8.4

Language Switches
signed                                                 signed                                See Section 4.5.1

Advanced Optimization
addressresolution=integer              so, r           arl=<integer>                         arl=1
[no]arclimit=integer                   so, r           arclm=<integer>                       arclm=5000
cacheline=integer[,integer]                            chl=<integer>[,<integer>]             chl=32,32
cache_prefetch_line_count=integer                      cplc=<integer>                        cplc=0
cachesize=integer[,integer]                            chs=<integer>[,<integer>]             chs=8,0
dpregisters=integer                                    dpr=<integer>                         dpr=32
each_invariant_if_growth=integer       so, r, miifg    eiifg=<integer>                       eiifg=20
fpregisters=integer                                    fpr=<integer>                         fpr=32
[no]fuse                               so, o           fuse                                  nofuse
fuselevel=integer                      fuse            =<integer>                            fuselevel=0
heaplimit=integer                                      heap=<integer>                        heaplimit=100
limit=integer                                          lm=<integer>                          lm=50
machine=list                           so, r           ma=<list>                             ma=s
max_invariant_if_growth=integer        so, r, eiifg    miifg=<integer>                       miifg=500
routine=routine_name/limited switches  See text        rt=<routine_name>/<limited switches>  noroutine
setassociativity=integer[,integer]     so, r           sasc=<integer>[,<integer>]            sasc=1,1
stdio                                  so, r           stdio                                 off
syntax=value                                           sy=<value>                            sy=d
unroll=integer                         so, r           ur=<integer>                          ur=4
unroll2=integer                        so, r           ur2=<integer>                         ur2=160
unroll3=integer                        so, r           ur3=<integer>                         ur3=1

