Under the directed method, Compaq KAP does not perform any automatic parallelism detection. As always, any OpenMP directives in the original source program are passed to the C compiler for processing.
Parallelization (that is, creating an executable file that runs as a multithreaded application on symmetric multiprocessor systems) by inserting OpenMP directives is most useful for programs in the following circumstances:
The directed method applies only to FOR loops with OpenMP directives. Consider a C application with two functions, "example_1" and "example_2", defined as follows:
    int a[1000], b[1000], c[1000], d[1000], n;

    void example_1(void)
    {
        int i;
    #pragma omp parallel shared(n, a, b, c, d) private(i)
        {
    #pragma omp for nowait
            for (i = 0; i < n; i++) {
                a[i] = b[i] + c[i];
                if (d[i]) {
                    a[i] /= d[i];
                }
            }
        }
    }

    void example_2(void)
    {
        int j;
        for (j = 0; j < n; j++) {
            a[j] = b[j] * c[j];
            d[j] = a[j] / d[j];
        }
    }
Compaq KAP passes the OpenMP directives of the FOR loop with index "i" on to the compiler for processing. It does not parallelize the FOR loop with index "j", because under the directed method any loop not marked with OpenMP directives is left serial. For Compaq KAP to transform both FOR loops, it would have to run under the combined method, and the function "example_2" would have to reside in a C source file that contains no OpenMP directives.
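For comparison, the loop in example_2 is itself safe to parallelize, because each iteration reads and writes only element j. The following sketch shows how you might bring it under the directed method by adding an OpenMP directive by hand. The function name example_2_directed is hypothetical, and the sketch assumes every d[j] is nonzero. When the file is compiled without OpenMP processing (for example, with -noomp), the compiler ignores the pragma and the loop runs serially.

```c
/* Same data as example_1 and example_2. */
int a[1000], b[1000], c[1000], d[1000], n;

/* Hypothetical variant of example_2 with a hand-inserted directive.
   Iteration j touches only a[j], b[j], c[j], and d[j], so iterations
   are independent and the directive is safe. */
void example_2_directed(void)
{
    int j;
#pragma omp parallel for shared(n, a, b, c, d) private(j)
    for (j = 0; j < n; j++) {
        a[j] = b[j] * c[j];
        d[j] = a[j] / d[j]; /* assumes the caller guarantees d[j] != 0 */
    }
}
```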
The following example uses KAP to process a program for which no automatic parallelization is desired:
    kcc -ckapargs='-noconc' my_prog.c \
        -omp -pthread -call_shared
The command produces a transformed source program, which the compiler
and linker then process to create the executable file a.out. Because of
the -noconc switch, Compaq KAP does not automatically set compiler and
linker switches related to parallel processing. Therefore, you must
explicitly set the -omp and -pthread compiler switches and the
-call_shared linker switch.
3.3.1 Changing Source Programs
Insert OpenMP directives (beginning with #pragma omp) only around loops that are safe to parallelize. When Compaq KAP sees a loop prefaced with OpenMP directives, it does not perform data dependence analysis on that loop, so it cannot stop you from using a parallel directive incorrectly. The OpenMP directives are described in your Compaq C user's guide.
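As an illustration of that judgment, the sketch below contrasts a loop that is safe for a hand-inserted directive with one that is not. Both function names are hypothetical. Because KAP skips dependence analysis on loops that carry OpenMP directives, nothing would flag the error if a directive were added to the second loop.

```c
/* Safe: iteration i reads and writes only x[i] and y[i], so no
   iteration depends on the result of another. */
void scale_add(double *x, const double *y, double s, int n)
{
    int i;
#pragma omp parallel for private(i)
    for (i = 0; i < n; i++)
        x[i] = s * x[i] + y[i];
}

/* NOT safe: iteration i reads x[i - 1], which iteration i - 1 writes
   (a loop-carried dependence). A "#pragma omp parallel for" here would
   produce wrong results, so the loop is deliberately left serial. */
void running_sum(double *x, int n)
{
    int i;
    for (i = 1; i < n; i++)
        x[i] = x[i - 1] + x[i];
}
```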
The OpenMP directives (pragmas) are listed below:
No Compaq KAP switches affect how the compiler processes OpenMP
directives inserted by the user.
3.3.3 Directing the Compilation and Linking Process
To parallelize a program containing OpenMP directives, you normally need to give only the kcc command with the -noconc KAP switch, the -omp and -pthread C compiler switches, and the -call_shared linker switch.
An example follows:
    kcc -ckapargs='-noconc' myprog.c \
        -omp -pthread -call_shared
Because of the -noconc switch, Compaq KAP does not automatically set
the compiler and linker switches needed for parallelization, so the
command sets them explicitly.
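If you are unsure whether the -omp switch took effect, a check like the following sketch can help. It relies on the OpenMP convention that a conforming compiler defines the _OPENMP macro (to a yyyymm version date) when it processes OpenMP directives. The function name openmp_version is hypothetical, and the sketch assumes your compiler follows that convention.

```c
/* Hypothetical helper: returns the value of _OPENMP (a yyyymm date)
   when this file was compiled with OpenMP processing enabled, or 0
   when the compiler is ignoring OpenMP directives. */
long openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}
```

A result of 0 in a build that is supposed to be parallel suggests that -omp was missing from the command line.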
3.4 Combined Automatic and Directed Parallelization Using the kcc Driver
Parallelization (that is, creating an executable file that runs as a multithreaded application on symmetric multiprocessor systems) by the combined method is most useful for large programs in which you want to control the parallelization of some FOR loops explicitly, by inserting OpenMP parallel directives, while letting Compaq KAP automatically parallelize the remaining loops. The combined method merges the automatic and directed methods. The command line for processing a program with the combined method is:
kcc -ckapargs='-concurrent' openmp.c no_openmp.c
3.4.1 Changing Source Programs
You insert OpenMP directives around those FOR loops that you want to explicitly parallelize. As mentioned previously, KAP C/OpenMP does not perform automatic parallel decomposition or serial optimizations on files that contain OpenMP directives.
In addition, you can insert guiding assertions (that is, non-OpenMP directives) around loops that you want to help Compaq KAP parallelize automatically. Compaq KAP cannot automatically parallelize loops that have data dependencies between iterations or that call external routines. You can help Compaq KAP with such loops by placing parallel processing assertions and parallel processing directives (each beginning with #pragma _KAP) in the source program. These assertions and directives are:
    #pragma _KAP concurrent
    #pragma _KAP concurrent call
    #pragma _KAP concurrent ignore call
    #pragma _KAP serial
    #pragma _KAP minconcurrent
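As an illustration, the sketch below marks a loop that calls a side-effect-free routine with #pragma _KAP concurrent call. It assumes that the assertion is placed immediately before the FOR loop it applies to and that it tells KAP the calls in that loop are safe to execute concurrently. The routine and function names are hypothetical. Compilers that do not recognize _KAP pragmas ignore them (some issue a warning), so the code still builds without KAP.

```c
/* Hypothetical routine with no side effects: it reads only its argument
   and writes no global state, so concurrent calls cannot interfere.
   If it lived in another source file, KAP could not see that. */
double clamp_unit(double v)
{
    if (v > 1.0)
        return 1.0;
    if (v < -1.0)
        return -1.0;
    return v;
}

/* Without the assertion, KAP must assume the call might have side
   effects and would keep this loop serial. */
void clamp_all(double *x, int n)
{
    int i;
#pragma _KAP concurrent call
    for (i = 0; i < n; i++)
        x[i] = clamp_unit(x[i]);
}
```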
Command-line switches you can give to Compaq KAP that affect its transformation of FOR loops are:
To construct a program for parallel execution via the combined method, you normally need to give only the -concurrent switch to the kcc command as follows:
kcc -ckapargs='-concurrent' my_prog.c
The -concurrent switch tells KAP to automatically parallelize suitable FOR loops in files that do not contain OpenMP directives; it also sets the compiler and linker switches needed for parallelization. KAP inserts OpenMP directives around the loops it detects as good candidates for parallelization. The compiler performs the actual parallelization by processing both the OpenMP directives inserted automatically by KAP and those inserted by the programmer.
Finally, you may want to create a completely non-parallelized program so you can compare its execution time with the times of programs that are parallelized in various ways (such as the automatic method and the directed method). The following command does this:
kcc -ckapargs='-noconc' -noomp myprog.c
The -noconc switch prevents automatic parallelization of FOR
loops. The -noomp switch prevents the C compiler from
responding to any parallel directive statements in the transformed
source file it receives.
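To make such comparisons fair, time the same workload in each build. The sketch below assumes a POSIX system that provides gettimeofday and measures wall-clock time; clock() is avoided because it reports processor time, which in a multithreaded run accumulates across all threads. The names wall_seconds, workload, and time_workload are hypothetical. In a build made with -noconc -noomp, the directive in workload is ignored and the loop runs serially, which gives the baseline.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds since the epoch. */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical workload: fills x and returns the sum of its elements.
   The reduction clause gives each thread a private partial sum. */
double workload(double *x, int n)
{
    int i;
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        x[i] = 0.5 * i;
        sum += x[i];
    }
    return sum;
}

/* Runs the workload once and returns the elapsed wall-clock seconds.
   *result receives the workload's value so the work is not discarded. */
double time_workload(double *x, int n, double *result)
{
    double start = wall_seconds();
    *result = workload(x, n);
    return wall_seconds() - start;
}
```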
3.5 Compiling a Program for Parallel Execution Using kapc
Normally, you use the kcc command with the -conc switch to create an optimized and parallelized executable file. Compaq recommends this command because it sets the compiler and linker switches correctly. To view these switches, include the -v switch with the kcc command.
To compile a program for parallel execution using the kapc command on Tru64 UNIX, issue the following commands:
    kapc -conc -cmp=myprog_mp.c myprog.c
    cc myprog_mp.c -fast -tune host -automatic \
        -omp -pthread
The kapc command preprocesses myprog.c to produce a new source file, myprog_mp.c, which contains OpenMP directives that Compaq KAP inserted around the loops it selected for automatic parallelization. The compiler and linker then process myprog_mp.c to produce a parallelized executable, a.out. Further explanation of the switches used follows:
To run a program parallelized with OpenMP directives, you may want to change the following environment variables:
Variable | Values | Default |
---|---|---|
OMP_SCHEDULE | static, dynamic, guided, runtime | |
OMP_DYNAMIC | true, false | false |
OMP_NESTED | true, false | false |
OMP_NUM_THREADS | number | number of processors on the current system |
For further information on these environment variables, see your
Compaq C user's guide.
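Two points about these variables are easy to miss: OMP_SCHEDULE affects only loops whose directive specifies schedule(runtime), and OMP_NUM_THREADS sets the upper bound on team size that omp_get_max_threads() reports (with OMP_DYNAMIC set to true, the runtime may use fewer threads). The sketch below shows both. The function names are hypothetical, and the omp.h calls are guarded so that the file still builds when OpenMP processing is off.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the number of threads the next parallel region may
   use; initially set from OMP_NUM_THREADS. Returns 1 when the file is
   compiled without OpenMP. */
int max_team_size(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* schedule(runtime) defers the schedule choice to OMP_SCHEDULE, for
   example OMP_SCHEDULE="dynamic,4". Loops with an explicit schedule
   kind, or with no schedule clause, do not consult OMP_SCHEDULE. */
long sum_of_squares(int n)
{
    int i;
    long total = 0;
#pragma omp parallel for schedule(runtime) reduction(+:total)
    for (i = 1; i <= n; i++)
        total += (long)i * i;
    return total;
}
```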
3.7 Parallel Processing Options
KAP provides parallel command switches, directives, and assertions. For further information about these options, refer to the following sections:
    ps mOpcpu
This chapter describes Compaq KAP for C command switches that allow you to alter KAP defaults.
You will frequently be satisfied with the default switch settings of Compaq KAP for C. However, you can alter default settings to customize optimizations for a given application program and machine. These alterations include limiting the search space for loop optimization, adjusting the parameters that describe cache memory, and enabling or disabling classes of transformations.
Command switches are options that you specify on the command line when submitting a KAP job, rather than in the source file.
To specify a command switch, you can use the long name or short name. If a command switch appears more than once on the command line, the last value given is used. Multiple occurrences of an input/output file selection switch are not allowed.
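The last-value rule can be modeled as in the sketch below, assuming that a long name and its short name count as the same switch. It is a hypothetical illustration, not KAP's own parser, and it handles only the optimize switch (short name o, default optimize=5, as listed in Table 4-2). For example, given -o=2 followed by -optimize=4, the resulting level is 4.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: scans the switches left to right and keeps the
   value of the last -optimize=<n> or -o=<n> seen. Returns deflt when
   neither form appears. */
int resolve_optimize(int argc, const char *argv[], int deflt)
{
    int i;
    int level = deflt;
    for (i = 0; i < argc; i++) {
        const char *arg = argv[i];
        if (strncmp(arg, "-optimize=", 10) == 0)
            level = atoi(arg + 10);      /* long name */
        else if (strncmp(arg, "-o=", 3) == 0)
            level = atoi(arg + 3);       /* short name */
    }
    return level;
}
```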
The short names for switches are provided as a convenience, especially for interactive users. However, they may not remain unique from one version of KAP to the next. Use the long names wherever long-term compatibility matters, for example, in a canned shell script.
Tables 4-1 and 4-2 list the command switches for the kcc driver and the kapc preprocessor, respectively. The first column lists the long name of each switch. This column also lists the functional categories of switches: general optimization, parallel processing, inlining and Interprocedural Analysis, advanced optimization control, input/output files, and listing. The remaining columns list related switches, the short name, and the default value of each switch. Switches that have different argument syntax in their regular and negative (no) forms are shown on two lines.
File names are case sensitive on Tru64 UNIX systems, so file-name parameters must match the names of the files exactly, including case. A hyphen (-) is required before each switch listed in the following tables, but the hyphen is not shown in the tables.
Table 4-1 Command Switches for the kcc Driver
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
cc=C_compiler_path | | | /usr/bin/cc |
cext=C file extension | | | c |
ckap=path to kapc | | | /usr/bin/kapc |
ckapargs=kap_switch_string | | | |
cpp=cpp_path | | | /usr/bin/cc |
sif=cpp, kap, -S | | | off |
tmpdir=temporary_directory_path | | | /tmp/ |
tune=architecture | | | current system architecture |
verbose | | v | nov |
Table 4-2 Command Switches for the kapc Preprocessor
Long Name | Related Switch | Short Name | Default Value |
---|---|---|---|
General Optimization | | | |
[no]interchange | | | interchange |
namepartitioning=integer, integer | so | namepart=<integer>,<integer> | nonamepartitioning |
natural | | nat | nonatural |
optimize=integer | | o=<integer> | optimize=5 |
recursion | | rc | nrc |
roundoff=integer | o, so | r=<integer> | roundoff=3 |
scalaropt=integer | r | so=<integer> | scalaropt=3 |
skip | | sk | nosk |
tune=architecture | | tune=<architecture> | tune=<architecture> |
Parallel Processing Switches | | | |
[no]concurrentize | | [n]conc | noconcurrentize |
minconcurrent=integer | | mc | minconcurrent=1000 |
scheduling=list | | sched=<list> | scheduling=e |
Inlining and IPA | | | |
inline[=names] | | inl[=<names>] | off |
noinline[=names] | | ninl[=<names>] | |
ipa[=names] | | ipa[=<names>] | off |
noipa[=names] | | nipa[=<names>] | |
inline_and_copy=names | | inlc=<names> | off |
inline_create=file | | incr=<file> | off |
ipa_create=file | | ipacr=<file> | off |
inline_depth=integer | | ind=<integer> | ind=2 |
ipa_depth=integer | | ipad=<integer> | ipad=2 |
inline_from_files=file, file | inl | inff=<file>,<file> | current source file |
ipa_from_files=file, file | ipa | ipaff=<file>,<file> | current source file |
inline_from_libraries=library, library | inl | infl=<library>,<library> | off |
ipa_from_libraries=library, library | ipa | ipafl=<library>,<library> | off |
inline_looplevel=integer | | inll=<integer> | inll=2 |
ipa_looplevel=integer | | ipall=<integer> | ipall=2 |
inline_manual | | inm | off |
ipa_manual | | ipam | off |
inline_optimize=integer | | | inline_optimize=0 |
ipa_optimize=integer | | | ipa_optimize=0 |
Input-Output | | | |
cmp[=file] | | cmp[=<file>] | See Section 4.7.1 |
nocmp | | ncmp | |
list[=file] | | l[=<file>] | nl |
nolist | | nl | |
Listing | | | |
cmpoptions[=list] | | cp[=<list>] | ncp |
nocmpoptions | | ncp | |
lines=integer | | ln=<integer> | ln=55 |
listingwidth=integer | | lw=<integer> | lw=132 |
listoptions=list | | lo=<list> | See Section 4.8.4 |
Language Switches | | | |
signed | | signed | See Section 4.5.1 |
Advanced Optimization | | | |
addressresolution=integer | so, r | arl=<integer> | arl=1 |
[no]arclimit=integer | so, r | arclm=<integer> | arclm=5000 |
cacheline=integer[,integer] | | chl=<integer>[,<integer>] | chl=32,32 |
cache_prefetch_line_count=integer | | cplc=<integer> | cplc=0 |
cachesize=integer[,integer] | | chs=<integer>[,<integer>] | chs=8,0 |
dpregisters=integer | | dpr=<integer> | dpr=32 |
each_invariant_if_growth=integer | so, r, miifg | eiifg=<integer> | eiifg=20 |
fpregisters=integer | | fpr=<integer> | fpr=32 |
[no]fuse | so, o | fuse | nofuse |
fuselevel=integer | fuse | =<integer> | fuselevel=0 |
heaplimit=integer | | heap=<integer> | heaplimit=100 |
limit=integer | | lm=<integer> | lm=50 |
machine=list | so, r | ma=<list> | ma=s |
max_invariant_if_growth=integer | so, r, eiifg | miifg=<integer> | miifg=500 |
routine=routine_name/limited switches | See text | rt=<routine_name>/<limited switches> | noroutine |
setassociativity=integer[,integer] | so, r | sasc=<integer>[,<integer>] | sasc=1,1 |
stdio | so, r | stdio | off |
syntax=value | | sy=<value> | sy=d |
unroll=integer | so, r | ur=<integer> | ur=4 |
unroll2=integer | so, r | ur2=<integer> | ur2=160 |
unroll3=integer | so, r | ur3=<integer> | ur3=1 |