Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide

2.7 Compiling a Program Using kapf90

Use the following command to execute KAP as a standalone preprocessor:

kapf90 [kap_switch_string] myprog.f90 -cmp=myprog.cmp.f90 -freeformat

The kapf90 command assumes that the source file input is fixed format by default. Use the Compaq KAP Fortran/OpenMP -freeformat switch to cause KAP to treat source files as free format, as shown in the previous code example. For more information about the -freeformat switch, see Section 5.4.7.

After preprocessing your program, give myprog.cmp.f90 to the Compaq Fortran compiler, as follows:

f90 -fast -tune host -non_shared myprog.cmp.f90

Note

When you use kapf90 to process a file, you must set the Compaq Fortran compiler and linker switches appropriately. For this reason, Compaq recommends that you use kf90 whenever possible, because kf90 automatically sets the compiler and linker switches correctly.
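The two-step sequence in this section can be sketched as a small dry-run script. This is an illustration only: kap_two_step is a hypothetical helper name, the script assumes a free-format source file whose name ends in .f90, and it prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: print the kapf90 step and the f90 step for one source file.
# kap_two_step is a hypothetical helper, not part of KAP.
kap_two_step() {
    src=$1                  # for example, myprog.f90
    base=${src%.f90}        # strip the .f90 extension
    out=$base.cmp.f90       # name given to -cmp, as in the example above
    echo "kapf90 $src -cmp=$out -freeformat"
    echo "f90 -fast -tune host -non_shared $out"
}

kap_two_step myprog.f90
```

To run the commands instead of printing them, the output could be piped to sh. The compiler switches shown are only those from the example above; as the note says, kf90 sets them automatically.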

2.8 Compiling a Program Containing C Preprocessor Directives Using kapf90

If a Fortran 90 program contains C preprocessor directives, preprocess it with cpp before you process it with kapf90. For example, if your program has C include statements, process it as follows:

cpp -P myprog.f > myprog.i
kapf90 myprog.i -cmp=myprog.f90
f90 myprog.f90

2.9 Using KAP Syntax

Specify switches in lowercase with the syntax -switch[=value]. Do not leave spaces between the switch name and the value. Switches can appear before or after the input file as follows:

kapf90 -inm myprog.f90 -roundoff=2 -freeformat

KAP recognizes standard abbreviations for switches. Switches that take a list of names must have the names separated by commas and with no spaces, for example:

-inff=besl.f90,util.f90

Enclose KAP command switches passed through kf90 by using the -fkapargs switch with single quotation marks, as follows:

kf90 -fkapargs='-optimize=5 -roundoff=3 -scalaropt=3' -w myprog.f90

Compaq Fortran compiler switches, for example, -w, do not require quotation marks.
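The quoting rule for -fkapargs can be illustrated with a short shell sketch. The helper name kapargs is hypothetical. It joins KAP switches with single spaces, wraps them in single quotation marks, and rejects any switch that contains a space, on the assumption that a space would indicate a gap between a switch name and its value.

```shell
#!/bin/sh
# kapargs: hypothetical helper that builds a -fkapargs='...' string.
kapargs() {
    for sw in "$@"; do
        case $sw in
            *" "*)
                echo "kapargs: no spaces allowed in switch: $sw" >&2
                return 1 ;;
        esac
    done
    # "$*" joins the switches with single spaces.
    printf '%s\n' "-fkapargs='$*'"
}

# Dry run: print the kf90 command line from the example above.
echo "kf90 $(kapargs -optimize=5 -roundoff=3 -scalaropt=3) -w myprog.f90"
```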

2.10 Using File Naming Conventions

Any input file name is valid. If the file name does not have an extension, the extension .f90 is assumed. As KAP processes a Fortran 90 file, it generates three output files: the optimized program file, the optional listing file, and the executable file.

The default output file names are as follows:

<file>.cmp.f90 --- the optimized Fortran 90 program from the kf90 driver
<file>.cmp --- the optimized Fortran 90 program from kapf90
<file>.out --- the annotated KAP listing file
a.out --- the executable file

Other output file names can be specified with the -cmp and -list switches.
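The default naming rules can be sketched as follows. The helper kap_default_names is hypothetical, and the sketch assumes that <file> means the input name with its extension removed, which is consistent with the myprog.cmp.f90 example in Section 2.7 but is not stated explicitly here.

```shell
#!/bin/sh
# kap_default_names: hypothetical helper that prints the default output
# names for a given input file name.
# Assumption: <file> is the input name without its extension.
kap_default_names() {
    in=$1
    case ${in##*/} in
        *.*) ;;                 # the file name already has an extension
        *)   in=$in.f90 ;;      # no extension: .f90 is assumed
    esac
    file=${in%.*}
    echo "input=$in"
    echo "kf90_cmp=$file.cmp.f90"
    echo "kapf90_cmp=$file.cmp"
    echo "listing=$file.out"
    echo "executable=a.out"
}

kap_default_names myprog
```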

When KAP detects an error condition, KAP writes a message to standard error.

2.11 Optimization Hints and Tips

This information can be used with both multiprocessor and single-processor systems, and with both Fortran and C versions of all KAP products. Therefore, the information may contain references to command-line switches or settings that are unavailable or that are different from those in the KAP that you are using.

This section provides separate protocols for small and large programs. Small programs are defined as those that can be compiled and run quickly. Because the cost of each iteration is small, you can take risks. The information presented here further assumes that small programs have a small number of program units.

Large programs are defined as those that take more time to compile and run than it takes for you to check the results. A program can be large either because the source code is very large or because the execution time is long.

2.11.1 Optimizing Small Programs with KAP

Follow these guidelines to optimize small programs:

1. Compile the program without KAP, with minimum compiler optimization and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is little optimization you can do.
   - If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action may not be feasible for handling problems in large programs, but it might work for isolated portability problems.
   - If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections.
   - You can also refer to Section 2.15. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
2. If the program compiles with minimum compiler optimization enabled, turn on all optimization except inlining by invoking -optimize=4.
   - If step 2 succeeds and the results are correct, try the suggestions in Section 2.14 about additional performance improvement techniques.
   - If step 2 fails, try reducing one optimization at a time (-roundoff=0, -scalaropt=1, -optimize=3), and any compiler optimizations, until the program runs correctly. Use the -lo=k switch setting to create a listing of the KAP command-line switches and settings.
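The fallback when step 2 fails can be sketched as a dry-run loop. This is one possible reading, not a prescribed procedure: it assumes each attempt changes a single switch relative to the -optimize=4 baseline, that the switches are passed through kf90 with -fkapargs, and that -lo=k is added to each attempt. The helper name fallback_commands is hypothetical.

```shell
#!/bin/sh
# fallback_commands: hypothetical helper that prints one kf90 command per
# reduced setting, in the order listed in the guidelines above.
fallback_commands() {
    src=$1
    for reduce in -roundoff=0 -scalaropt=1 -optimize=3; do
        case $reduce in
            -optimize=*) switches="$reduce -lo=k" ;;              # replaces -optimize=4
            *)           switches="-optimize=4 $reduce -lo=k" ;;  # baseline plus one reduction
        esac
        echo "kf90 -fkapargs='$switches' $src"
    done
}

fallback_commands myprog.f90
```

Each printed command would be run in turn, stopping at the first one that yields correct results.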

2.11.2 Optimizing Large Programs with KAP

Follow these guidelines to optimize large programs:

1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is not much optimization you can do.
   - Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
   - If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
   - If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections. You can also refer to Section 2.15 on KAP problems. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
2. Compile without KAP but with maximum compiler optimization. Note the execution time and verify the results. If the program fails, reduce compiler optimization and try again.
3. Compile the fastest/best non-KAP run and run it again with profiling enabled (for example, gprof) to identify the program units that take the most time to run.
   - Time-intensive units that have many iterative loops and arrays are good candidates for KAP loop optimizations. Go to step 4.
   - If these units are not good candidates, then the lower-payoff optimizations, such as inlining, may provide some performance improvement, especially if there are places where inlining inside loop nests may also allow KAP to perform vectorization optimizations. In this case, go to step 6.
4. If time-intensive routines were identified as good candidates, run KAP on them with modest KAP optimization (-optimize=2), compile the whole program with the other switches used in the best run from step 2, note the execution time, and verify the results.
   - If the program fails, try again with the KAP switch -roundoff=0. If that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with -roundoff=0, try -scalaropt=1.
5. If step 4 works, repeat with full KAP optimization, with full compiler optimization, and with -roundoff=0 or -scalaropt=1, if needed.
   - If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again. If you have success at this step, you can also try the suggestions found in Section 2.14.
6. If there are no routines with arrays and loops, run the whole program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,.., where aaa, bbb, and so forth, are the most frequently called routines from the profiling run in step 3.
   - If this action succeeds, repeat with the -optimize=4 and -inline_and_copy=... switches. If this action fails, try rerunning with -roundoff=0 or -scalaropt=1 or with fewer routines inlined. (See Section 2.15 for an explanation of binary chop.) Also, if you have success at this step, try the suggestions in Section 2.14.
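For the inlining step, the routine names taken from the profile must be turned into a comma-separated list with no spaces (see Section 2.9). A minimal sketch follows. The helper name join_commas is hypothetical, the names aaa, bbb, and ccc are the placeholders used in the text, and the sketch assumes the switches are passed through kf90 with -fkapargs.

```shell
#!/bin/sh
# join_commas: hypothetical helper that joins its arguments with commas.
# The function body runs in a subshell so the IFS change stays local.
join_commas() (
    IFS=,
    printf '%s\n' "$*"
)

# Dry run: print a command line for the first inlining attempt.
routines=$(join_commas aaa bbb ccc)
echo "kf90 -fkapargs='-optimize=0 -inline_and_copy=$routines' myprog.f90"
```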

2.12 General Optimization Tips

Use the -v switch on the kf90 command line to view the switches the KAP preprocessor passes to the compiler and the linker.
Use the -ipa switch to cause KAP to give information in the annotated listing about appropriate settings for the -ipall switch on a loop-by-loop basis.
Avoid writing code that accesses an array outside of the array bounds, because this necessitates that you use the -assume=b switch. Setting -assume=b prevents KAP from performing many of its optimizations.

2.13 Improving and Customizing KAP Performance

After you have used the KAP protocol for either small or large programs, you can find ways to fine-tune KAP to fit your application.

This section helps you discover which KAP command-line switches, directives, or assertions can improve KAP performance for a particular application program. It lists common goals and program situations that KAP users encounter and suggests possible improvements for each.

Remember that KAP is a tool to optimize Compaq Fortran code. Like any tool, it performs best when you are familiar with the details of how it works and are able to use its switches correctly and advantageously.

Although KAP default switch settings will achieve performance improvement, you can often achieve greater improvement if you understand and use alternate switch settings. Moreover, you can often insert directives or assertions to achieve improved performance.

See Table 2-2 for user actions and specific goals.

Table 2-2 User Actions for Specific Goals
Goal	User Action
Have a more informative listing to help answer your questions.	Use -lo=otkl or other listing switches under the -listoptions command-line switch.
Recognize more reductions.	Increase the -roundoff switch setting.
Answer a KAP-generated question.	Use the appropriate assertion.
Eliminate unnecessary last-value assignment.	Use !*$* assert no last value needed or -assume without the l switch; or try -save=manual.
Spend less time optimizing deeply nested loops.	Reduce -limit and -arclimit or their directives.
Disable inner loop unrolling.	Use -unroll=1 or -scalaropt < 2.
Disable outer loop unrolling.	Use -roundoff < 3 or -scalaropt < 3.
Prevent a given loop from being optimized.	Use !*$* assert do (serial), !*$* assert do prefer (serial), !*$* noconcurrent, or !*$* optimize (0). (Remember to reenable optimization after the serial loop.)
Disable some data dependence checking.	Use !*$* assert no recurrence for one loop nest.
Expand (inline) subroutine calls within DO loops.	Use -inline, -inline_from_files, or -inline_create and -inline_from_libraries. Or, if the goal is to execute the subroutine body concurrently, try -ipa or !*$* assert concurrent call.
Inline more routines.	Increase -inline_depth and -inline_looplevel. (See also the !*$* inline directive.)
Turn off directives and assertions.	Use the -nodirectives switch.
Process a program that uses intentional array bounds violation.	Use !*$* assert bounds violations.
Use STATIC storage.	Insert SAVE statements or use -save=all_adjust.

2.14 Using Additional Performance Improvement Techniques

After you have successfully run KAP on a working program by using either the protocol for small programs or that for large programs, you can try the following procedures to find additional opportunities for optimization within your program:

If each COMMON block is laid out the same way everywhere that it is used or if no data is passed between different layouts, then try -aggressive=a or -aggressive=ab. In some cases, these actions can yield up to a 20% performance improvement.
If you have successfully run KAP on some routines in a large program, then try running KAP on the whole program with the same switches.
Try -save=manual. If the user-written SAVE statements are sufficient and correct for the compiler you are using, this action may reduce unneeded precautions that KAP ordinarily takes.
Try lowering the settings on the Invariant-IF switches -eiifg and -miifg. These actions may reduce the total code space enough to make paging or caching the program code work better.
You can try brute-force inlining. Set -inline_and_copy and -inll=2. Inlining is usually more effective if you inline only a few carefully chosen routines rather than inlining everything and cluttering up the code with too much low-payoff inlining. However, the shotgun approach can sometimes produce good results.
Experiment with each of the following switches to determine if they improve the run-time of your program.
However, the above switches may increase the amount of time and memory KAP needs to process your source files.

2.15 Correcting KAP Problems

The following are some problems you may encounter when using KAP and possible fixes and workarounds:

KAP works best on programs that are CPU-intensive, that spend a great deal of time doing floating-point calculations, and that have large loop bounds.
The two most common reasons KAP is unable to achieve performance improvement in application code are the following:
- A program with small loop limits or too few loops causes the KAP vectorization setup overhead to outweigh the speedup.
- A program that is I/O bound is not likely to achieve much performance improvement, because no amount of improvement to the computation sections will change execution time significantly. However, in the case of a Fortran 90 program, I/O strength reduction can improve I/O performance.
Profiling information may provide clues to either problem. You may need to insert additional print statements to verify loop limits.
If the program is correct but the output is significantly different when KAP is run on the program, try reducing the setting on the -optimize switch.
Nonsensical or nonrepeatable values in the output can be the result of the program violating declared array bounds. Try compiling the original source code with array bounds checking enabled. If the program is violating array bounds, you may be able to work around the problem by using -assume=b or the assertion !*$* assert bounds violation. Nonsensical or nonrepeatable values in the output may also be the result of unstable algorithms. Try setting -roundoff=0.
For Fortran 90 programs, also try -save=all_adjust in case the save statements you have supplied in the code are incorrect or insufficient for the compiler you are using.
If you get incorrect results from a large program with a few routines run through KAP, or a small program run through KAP, or a program with a few routines inlined by KAP, you may be able to determine the source of the problem by means of binary chop.
For example, suppose you have five routines, a, b, c, d, and e. When all five are processed with KAP, the program produces incorrect results or dies. Try running KAP again, but only on routines a and b. If they succeed, then the problem is in c, d, or e. If they fail, try with just routine a, and so on. By breaking the list of suspects into approximate halves for each test, you can fairly quickly identify which routine or routines cause the failure. Leave the problematic routines out of future KAP runs.
If you have link errors, ensure that the link step loaded all the libraries needed for all parts of the program. A link failure may also occur because KAP failed while processing a file, and the routines that came after the point of failure in that file were not copied to the compile file. Determine the reason that KAP failed, and try relinking.
If the compiler issues a syntax error on a transformed program, compare the source code with the transformed code. KAP detects and flags some run-time errors, especially in I/O statements, at compilation time.
Insufficient memory for KAP to run can sometimes be fixed by placing fewer names on the -inline_and_copy switch or by reducing the -eiifg and -miifg settings. To compensate for insufficient memory, you can also break up a source file into smaller logical units and run KAP on the separate units.
If you receive the messages Preprocessor Failed or Translator Error, try lowering switch values, especially -scalaropt.
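The binary chop procedure described above can be sketched in shell. This is an illustration under stated assumptions: the full set of routines fails, exactly one routine is responsible, and routine names contain no spaces. The function program_ok is a hypothetical stand-in for "run KAP on only these routines, rebuild, run, and check the results"; here it simulates a failure whenever routine d is included.

```shell
#!/bin/sh
# program_ok: hypothetical stand-in for a KAP run plus a results check.
# It simulates a single bad routine named d.
program_ok() {
    for r in "$@"; do
        [ "$r" = d ] && return 1
    done
    return 0
}

# binary_chop: halve the list of suspects until one routine remains.
# Assumes exactly one routine causes the failure.
binary_chop() {
    while [ $# -gt 1 ]; do
        half=$(( $# / 2 ))
        first=""
        n=0
        for r in "$@"; do
            n=$((n + 1))
            if [ "$n" -le "$half" ]; then first="$first $r"; fi
        done
        # Word splitting of $first is intentional here.
        if program_ok $first; then
            shift "$half"       # first half is clean; keep the second half
        else
            set -- $first       # the failure is in the first half
        fi
    done
    echo "$1"
}

binary_chop a b c d e
```

With the five routines from the example, the first test covers a and b, as in the text. If more than one routine can cause a failure, the sketch would need to be rerun on the remaining routines after the first culprit is removed.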
