Compaq KAP Fortran/OpenMP for Tru64 UNIX User Guide
2.7 Compiling a Program Using kapf90
Use the following command to execute KAP as a standalone preprocessor:
kapf90 [kap_switch_string] myprog.f90 -cmp=myprog.cmp.f90 -freeformat
By default, the kapf90 command assumes that the input source file is in
fixed format. Use the Compaq KAP Fortran/OpenMP -freeformat switch to
make KAP treat source files as free format, as in the previous command.
For more information about the -freeformat switch, see Section 5.4.7.
After preprocessing your program, pass myprog.cmp.f90 to the
Compaq Fortran compiler, as follows:
f90 -fast -tune host -non_shared myprog.cmp.f90
Note
When you use kapf90 to process a file, you must set the
Compaq Fortran compiler and linker switches appropriately. For
this reason, Compaq recommends that you use kf90 whenever
possible, because kf90 automatically sets the compiler and
linker switches correctly.
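If you do run kapf90 directly, the two commands shown earlier in this section can be wrapped in a small shell function that derives the -cmp file name from the source name and skips the compiler step when KAP fails. This is a minimal POSIX sh sketch, not part of KAP: the function name kap_build is hypothetical, and it assumes a free-format source file whose name ends in .f90.

```sh
# Minimal sketch (hypothetical helper, not part of KAP): run kapf90 on a
# free-format .f90 file, then compile the optimized output with f90.
kap_build() {
  kb_src=$1                       # for example, myprog.f90
  kb_cmp=${kb_src%.f90}.cmp.f90   # myprog.f90 -> myprog.cmp.f90
  shift                           # remaining arguments are KAP switches
  kapf90 "$@" "$kb_src" -cmp="$kb_cmp" -freeformat || return 1
  # Compiler switches copied from the f90 example above.
  f90 -fast -tune host -non_shared "$kb_cmp"
}
# Usage: kap_build myprog.f90 -optimize=5 -roundoff=3
```

Note that this helper still leaves the choice of compiler and linker switches to you, which is the reason the note recommends kf90.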
2.8 Compiling a Program Containing C Preprocessor Directives Using kapf90
If a Fortran 90 program contains C preprocessor directives, run it
through cpp before you process it with kapf90. For example, if your
program contains #include directives, process it as follows:
cpp -P myprog.f > myprog.i
kapf90 myprog.i -cmp=myprog.f90
f90 myprog.f90
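The three commands above can also be chained in a shell function that stops at the first failing step, so that kapf90 is not run after cpp reports an error. This is a sketch under stated assumptions: the function name kap_cpp_build is hypothetical, and the source file name is assumed to end in .f, as in the example.

```sh
# Sketch (hypothetical helper): cpp, then kapf90, then f90, stopping at
# the first step that fails. Assumes the source name ends in .f.
kap_cpp_build() {
  kc_base=${1%.f}                                   # myprog.f -> myprog
  cpp -P "$1" > "$kc_base.i" || return 1            # expand C directives
  kapf90 "$kc_base.i" -cmp="$kc_base.f90" || return 1
  f90 "$kc_base.f90"
}
# Usage: kap_cpp_build myprog.f
```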
2.9 Using KAP Syntax
Specify switches in lowercase with the syntax -switch[=value]. Do not
leave spaces between the switch name and the value. Switches can appear
before or after the input file as follows:
kapf90 -inm myprog.f90 -roundoff=2 -freeformat
KAP recognizes standard abbreviations for switches. For switches that
take a list of names, separate the names with commas and do not include
spaces, for example, -inline_and_copy=aaa,bbb,ccc.
When you pass KAP switches through kf90, specify them with the
-fkapargs switch and enclose them in single quotation marks, as follows:
kf90 -fkapargs='-optimize=5 -roundoff=3 -scalaropt=3' -w myprog.f90
Compaq Fortran compiler switches, such as -w, are specified directly on
the kf90 command line and do not require quotation marks.
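When a list of names comes from a script, such as a list of frequently called routines, the comma-separated switch value can be built rather than typed. The following POSIX sh sketch joins its arguments with commas and no spaces; the function name kap_list_switch is hypothetical.

```sh
# Sketch (hypothetical helper): build a list-valued KAP switch such as
# -inline_and_copy=aaa,bbb,ccc from separate names. The body runs in a
# subshell, so changing IFS here does not affect the caller.
kap_list_switch() (
  IFS=,
  name=$1
  shift
  printf '%s=%s\n' "$name" "$*"   # "$*" joins arguments with the first IFS character
)
# Usage through kf90:
#   kf90 -fkapargs="-optimize=4 $(kap_list_switch -inline_and_copy aaa bbb ccc)" -w myprog.f90
```

The usage comment uses double quotation marks instead of single quotation marks so that the command substitution expands; kf90 still receives the whole KAP switch string as one -fkapargs argument.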
2.10 Using File Naming Conventions
Any input file name is valid. If the file name does not have an
extension, KAP assumes the extension .f90. Processing a Fortran 90 file
produces three output files: the optimized program file, the optional
listing file, and the executable file.
The default output file names are as follows:
- <file>.cmp.f90: the optimized Fortran 90 program from the kf90 driver
- <file>.cmp: the optimized Fortran 90 program from kapf90
- <file>.out: the annotated KAP listing file
- a.out: the executable file
Other output file names can be specified with the -cmp and
-list switches.
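The default naming rules above can be summarized as a small shell function. This sketch assumes that <file> is the input name without its extension and that the file is in the current directory; the function name kap_default_names and its output format are hypothetical. It reports a.out only for kf90, on the assumption that kapf90 alone does not compile or link, and the .out name applies only when a listing is requested.

```sh
# Sketch of the default output names (hypothetical helper). Assumes <file>
# is the input name without its extension, in the current directory.
# $1 = input file name, $2 = kf90 or kapf90
kap_default_names() {
  kd_file=${1%.*}                        # strip the extension, if any
  if [ "$kd_file" = "$1" ]; then
    kd_input=$1.f90                      # no extension: .f90 is assumed
  else
    kd_input=$1
  fi
  case $2 in
    kf90)   kd_cmp=$kd_file.cmp.f90 ;;
    kapf90) kd_cmp=$kd_file.cmp ;;
    *)      echo "unknown driver: $2" >&2; return 1 ;;
  esac
  printf 'input %s\nprogram %s\nlisting %s.out\n' "$kd_input" "$kd_cmp" "$kd_file"
  # Assumption: only kf90 compiles and links, producing the executable.
  if [ "$2" = kf90 ]; then
    printf 'executable a.out\n'
  fi
}
```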
When KAP detects an error condition, it writes a message to standard
error.
2.11 Optimization Hints and Tips
The information in this section applies to both multiprocessor and
single-processor systems, and to both the Fortran and C versions of all
KAP products. As a result, it may refer to command-line switches or
settings that are unavailable in, or different from, those of the KAP
product you are using.
This section provides separate protocols for small and large programs.
Small programs are defined as those that can be compiled and run
quickly. Because each compile-and-run cycle is inexpensive, you can
afford to experiment. This section further assumes that small programs
have a small number of program units.
Large programs are defined as those that take more time to compile and
run than it takes for you to check the results. A program can be large
either because the source code is very large or because the execution
time is long.
2.11.1 Optimizing Small Programs with KAP
Follow these guidelines to optimize small programs:
1. Compile the program without KAP, with minimum compiler optimization
and with all compiler run-time checks enabled. Note the execution time
and verify the results. If the program fails at this step, there is
little optimization you can do.
   If you have the time and you know what the program is supposed to
do, you can try to isolate the incorrect code, correct it, and proceed.
This approach may not be feasible for problems in large programs, but
it might work for isolated portability problems. If the problem code is
isolated and the program runs correctly without KAP optimization, you
may be able to run KAP on the rest of the program and leave out the
problematic sections.
   You can also refer to Section 2.15. You may be able to diagnose and
correct some problems, and then run KAP on your program successfully.
2. If the program runs correctly with minimum compiler optimization,
turn on all KAP optimizations except inlining by specifying
-optimize=4.
3. If step 2 succeeds and the results are correct, try the suggestions
in Section 2.14 about additional performance improvement techniques.
   If step 2 fails, reduce the KAP optimizations one at a time
(-roundoff=0, -scalaropt=1, -optimize=3), along with any compiler
optimizations, until the program runs correctly. Use the -lo=k switch
setting to create a listing of the KAP command-line switches and
settings.
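The fallback in step 3 is mechanical enough to script. The sketch below tries a sequence of KAP switch strings through kf90 and stops at the first one that produces correct results. The function try_kap_settings is hypothetical, as is check_results, a command you would write yourself to run the program and compare its output with known-good results. Reducing the settings cumulatively, as in the usage comment, is one reading of "one at a time"; trying each reduction on its own is equally valid.

```sh
# Sketch (hypothetical helper): try each KAP switch string in order and
# stop at the first build whose results are correct. check_results is a
# user-supplied command (assumption) that returns 0 for correct output.
try_kap_settings() {
  tk_src=$1
  shift
  for tk_switches in "$@"; do
    if kf90 -fkapargs="$tk_switches" "$tk_src" && check_results; then
      printf 'passed: %s\n' "$tk_switches"
      return 0
    fi
    printf 'failed: %s\n' "$tk_switches" >&2
  done
  return 1
}
# Usage (each argument is one complete KAP switch string):
#   try_kap_settings myprog.f90 '-optimize=4' \
#     '-optimize=4 -roundoff=0' \
#     '-optimize=4 -roundoff=0 -scalaropt=1' \
#     '-optimize=3 -roundoff=0 -scalaropt=1'
```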
2.11.2 Optimizing Large Programs with KAP
Follow these guidelines to optimize large programs:
1. Compile the program without KAP, with minimum compiler
optimization, and with all compiler run-time checks enabled. Note the
execution time and verify the results. If the program fails at this
step, there is little optimization you can do.
   Some older programs use standard-violating techniques that KAP
cannot transform safely. If KAP fails because of this problem, there is
little optimization you can do.
   If you have the time and you know what the program is supposed to
do, you can try to isolate the incorrect code, correct it, and proceed.
This approach is feasible for large programs only if the problems are
easily understood and isolated, or if you have enough time to track
down more intractable problems.
   If the problem code is isolated and the program runs correctly
without KAP optimization, you may be able to run KAP on the rest of the
program and leave out the problematic sections. You can also refer to
Section 2.15 on KAP problems. You may be able to diagnose and correct
some problems, and then run KAP on your program successfully.
2. Compile without KAP but with maximum compiler optimization. Note
the execution time and verify the results. If the program fails, reduce
compiler optimization and try again.
3. Rebuild the fastest correct non-KAP version with profiling enabled
(for example, with gprof), and run it to identify the program units
that take the most time.
   Time-intensive units that have many iterative loops and arrays are
good candidates for KAP loop optimizations. In this case, go to step 4.
   If these units are not good candidates, lower-payoff optimizations
such as inlining may still improve performance, especially where
inlining inside loop nests allows KAP to perform vectorization
optimizations. In this case, go to step 6.
4. If time-intensive routines were identified as good candidates, run
KAP on them with modest KAP optimization (-optimize=2), compile the
whole program with the other switches used in the best run from step 2,
note the execution time, and verify the results.
   If the program fails, try again with the KAP switch -roundoff=0. If
that works, the failure is probably due to a roundoff-sensitive
operation. If it still fails with -roundoff=0, try -scalaropt=1.
5. If step 4 works, repeat it with full KAP optimization and full
compiler optimization, adding -roundoff=0 or -scalaropt=1 if needed.
   If the program fails, lower the KAP optimization level or the
compiler optimization level, and try again. If this step succeeds, you
can also try the suggestions in Section 2.14.
6. If there are no routines with arrays and loops, run KAP on the whole
program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,..., where
aaa, bbb, and so forth are the most frequently called routines from the
profiling run in step 3.
   If this succeeds, repeat with the -optimize=4 and
-inline_and_copy=... switches. If it fails, try rerunning with
-roundoff=0 or -scalaropt=1, or with fewer routines inlined. (See
Section 2.15 for an explanation of binary chop.) If this step succeeds,
also try the suggestions in Section 2.14.
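Steps 4 and 5 involve running KAP on only some source files and compiling the rest directly. The following is a sketch of that build, assuming one routine per free-format .f90 file and file names without spaces. The names build_partial_kap and COMPILER_FLAGS are hypothetical, and -fast only stands in for whatever switches gave the best run in step 2. Because the list of files sent through KAP is an argument, the same function can also drive the binary chop described in Section 2.15.

```sh
# Sketch (hypothetical helper): run KAP only on the files named in $1,
# compile every file with f90 -c, then link. Assumes free-format .f90
# files whose names contain no spaces.
COMPILER_FLAGS=${COMPILER_FLAGS:--fast}   # stand-in for your best step-2 switches
build_partial_kap() {
  bp_hot=" $1 "          # space-separated list of files to send through KAP
  shift
  bp_objs=""
  for bp_src in "$@"; do
    bp_base=${bp_src%.f90}
    case $bp_hot in
      *" $bp_src "*)     # time-intensive file: optimize with KAP first
        kapf90 -optimize=2 -freeformat "$bp_src" -cmp="$bp_base.cmp.f90" || return 1
        f90 $COMPILER_FLAGS -c "$bp_base.cmp.f90" || return 1
        bp_objs="$bp_objs $bp_base.cmp.o" ;;
      *)                 # everything else: compile directly
        f90 $COMPILER_FLAGS -c "$bp_src" || return 1
        bp_objs="$bp_objs $bp_base.o" ;;
    esac
  done
  f90 $COMPILER_FLAGS $bp_objs   # link the objects into a.out
}
# Usage: build_partial_kap "solver.f90 matmul.f90" main.f90 solver.f90 matmul.f90
```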
2.12 General Optimization Tips
- Use the -v switch on the kf90 command line to
view the switches the KAP preprocessor passes to the compiler and the
linker.
- Use the -ipa switch to make KAP report, in the annotated listing,
appropriate settings for the -ipall switch on a loop-by-loop basis.
- Avoid writing code that accesses an array outside its declared
bounds. Such code requires the -assume=b switch, and setting
-assume=b prevents KAP from performing many of its optimizations.
2.13 Improving and Customizing KAP Performance
After you have used the KAP protocol for either small or large
programs, you can find ways to fine-tune KAP to fit your application.
This section helps you determine which KAP command-line switches,
directives, or assertions may improve KAP performance for a particular
application program. It lists goals and program situations that KAP
users commonly encounter, with suggestions for possible improvements.
Remember that KAP is a tool to optimize Compaq Fortran code.
Like any tool, it performs best when you are familiar with the details
of how it works and are able to use its switches correctly and
advantageously.
Although the default KAP switch settings usually improve performance,
you can often achieve greater improvement if you understand and use
alternative switch settings. You can also insert directives or
assertions to improve performance further.
See Table 2-2 for user actions and specific goals.
Table 2-2 User Actions for Specific Goals

Goal | User action
-----|-----
Have a more informative listing to help answer your questions. | Use -lo=otkl or other -listoptions settings.
Recognize more reductions. | Increase the -roundoff switch setting.
Answer a KAP-generated question. | Use the appropriate assertion.
Eliminate unnecessary last-value assignment. | Use !*$* assert no last value needed, or -assume without the l switch; or try -save=manual.
Spend less time optimizing deeply nested loops. | Reduce -limit and -arclimit, or use their directives.
Disable inner loop unrolling. | Use -unroll=1 or a -scalaropt setting less than 2.
Disable outer loop unrolling. | Use a -roundoff setting less than 3 or a -scalaropt setting less than 3.
Prevent a given loop from being optimized. | Use !*$* assert do (serial), !*$* assert do prefer (serial), !*$* noconcurrent, or !*$* optimize (0). (Remember to reenable optimization after the serial loop.)
Disable some data dependence checking. | Use !*$* assert no recurrence for one loop nest.
Expand (inline) subroutine calls within DO loops. | Use -inline, -inline_from_files, or -inline_create and -inline_from_libraries. Or, if the goal is to execute the subroutine body concurrently, try -ipa or !*$* assert concurrent call.
Inline more routines. | Increase -inline_depth and -inline_looplevel. (See also the !*$* inline directive.)
Turn off directives and assertions. | Use the -nodirectives switch.
Process a program that uses intentional array bounds violations. | Use !*$* assert bounds violations.
Use STATIC storage. | Insert SAVE statements or use -save=all_adjust.
2.14 Using Additional Performance Improvement Techniques
After you have successfully run KAP on a working program by using
either the protocol for small programs or that for large programs, you
can try the following procedures to find additional opportunities for
optimization within your program:
- If each COMMON block is laid out the same way everywhere that it is
used or if no data is passed between different layouts, then try
-aggressive=a or -aggressive=ab. In some cases, these
actions can yield up to a 20% performance improvement.
- If you have successfully run KAP on some routines in a large
program, then try running KAP on the whole program with the same
switches.
- Try -save=manual. If the user-written SAVE statements are
sufficient and correct for the compiler you are using, this action may
reduce unneeded precautions that KAP ordinarily takes.
- Try lowering the settings on the Invariant-IF switches
-eiifg and -miifg. These actions may reduce the total
code space enough to make paging or caching the program code work
better.
- You can try brute-force inlining. Set -inline_and_copy and
-inll=2. Inlining is usually more effective if you inline only
a few carefully chosen routines rather than inlining everything and
cluttering up the code with too much low-payoff inlining. However, the
shotgun approach can sometimes produce good results.
- Experiment with each of the following switches to determine if they
improve the run-time of your program.
However, the above switches may increase the amount of time and
memory KAP needs to process your source files.
2.15 Correcting KAP Problems
The following are some problems you may encounter when using KAP and
possible fixes and workarounds:
- KAP works best on programs that are CPU-intensive, that spend a
great deal of time doing floating-point calculations, and that have
large loop bounds.
The two most common reasons KAP fails to improve the performance of
application code are the following:
  - Small loop limits or too few loops cause the KAP vectorization
setup overhead to outweigh the speedup.
  - A program that is I/O bound is unlikely to improve much, because no
amount of improvement to the computation sections will change execution
time significantly. However, in a Fortran 90 program, I/O strength
reduction can improve I/O performance.
Profiling information may provide clues to either problem. You may need
to insert additional print statements to verify loop limits.
- If the program is correct but the output is significantly different
when KAP is run on the program, try reducing the setting on the
-optimize switch.
- Nonsensical or nonrepeatable values in the output can be the result
of the program violating declared array bounds. Try compiling the
original source code with array bounds checking enabled. If the program
is violating array bounds, you may be able to work around the problem
by using -assume=b or the assertion !*$* assert bounds
violation. Nonsensical or nonrepeatable values in the output may
also be the result of unstable algorithms. Try setting
-roundoff=0.
For Fortran 90 programs, also try -save=all_adjust in case the SAVE
statements in your code are incorrect or insufficient for the compiler
you are using.
- If you get incorrect results from a large program with a few
routines run through KAP, or a small program run through KAP, or a
program with a few routines inlined by KAP, you may be able to
determine the source of the problem by means of binary chop.
For
example, suppose you have five routines, a, b, c, d, and e. When all
five are processed with KAP, the program produces incorrect results or
dies. Try running KAP again, but only on routines a and b. If they
succeed, then the problem is in c, d, or e. If they fail, try with just
routine a, and so on. By breaking the list of suspects into approximate
halves for each test, you can fairly quickly identify which routine or
routines cause the failure. Leave the problematic routines out of
future KAP runs.
- If you have link errors, ensure that the link step loaded all the
libraries needed for all parts of the program. A link failure may also
occur because KAP failed while processing a file, and the routines that
came after the point of failure in that file were not copied to the
compile file. Determine why KAP failed, correct the cause, and then
rerun KAP and relink.
- If the compiler issues a syntax error on a transformed program,
compare the source code with the transformed code. KAP detects and
flags some run-time errors, especially in I/O statements, at
compilation time.
- Insufficient memory for KAP to run can sometimes be fixed by
placing fewer names on the -inline_and_copy switch or by
reducing the -eiifg and -miifg settings. To
compensate for insufficient memory, you can also break up a source file
into smaller logical units and run KAP on the separate units.
- If you receive the messages Preprocessor Failed or Translator
Error, try lowering switch values, especially -scalaropt.
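The binary chop procedure described earlier in this list can be sketched as a shell function that halves the list of suspect routines after each test. This sketch assumes that a single routine causes the failure. run_with_kap is a hypothetical command you would supply: it runs KAP on only the named routines, rebuilds the program, and returns success if the results are correct. Neither function is part of KAP.

```sh
# Sketch of binary chop (hypothetical helper). Assumes exactly one routine
# causes the failure; run_with_kap is user-supplied and returns 0 when the
# program is correct with KAP applied only to the routines it is given.
binary_chop() {
  while [ "$#" -gt 1 ]; do
    bc_half=$(( $# / 2 ))
    bc_first=""
    bc_i=0
    for bc_r in "$@"; do               # collect the first half of the suspects
      if [ "$bc_i" -lt "$bc_half" ]; then bc_first="$bc_first $bc_r"; fi
      bc_i=$((bc_i + 1))
    done
    # Word splitting of $bc_first is intended (names contain no spaces).
    if run_with_kap $bc_first; then
      shift "$bc_half"                 # first half is fine: suspect the rest
    else
      set -- $bc_first                 # failure reproduced: suspect the first half
    fi
  done
  printf '%s\n' "$1"
}
# Usage: binary_chop a b c d e
```

With the five routines a through e, the first test runs KAP on a and b only, exactly as in the example above. Because the function always narrows the list to one name, confirm the result by running KAP on the reported routine alone before leaving it out of future runs.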