Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide



2.7 Compiling a Program Using kapf90

Use the following command to execute KAP as a standalone preprocessor:


kapf90 [kap_switch_string] myprog.f90 -cmp=myprog.cmp.f90 -freeformat 

The kapf90 command assumes that the source file input is fixed format by default. Use the Compaq KAP Fortran/OpenMP -freeformat switch to cause KAP to treat source files as free format, as shown in the previous code example. For more information about the -freeformat switch, see Section 5.4.7.

After preprocessing your program, give myprog.cmp.f90 to the Compaq Fortran compiler, as follows:


f90 -fast -tune host -non_shared myprog.cmp.f90 

Note

When you use kapf90 to process a file, you must set the Compaq Fortran compiler and linker switches appropriately. For this reason, Compaq recommends that you use kf90 whenever possible, because kf90 automatically sets the compiler and linker switches correctly.
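The two commands in Section 2.7 can be chained in a small wrapper script. The following is a minimal sketch, not part of the KAP distribution: the names run, kap_then_compile, and DRY_RUN are hypothetical, and the compiler switches are simply the ones shown above. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: run KAP as a standalone preprocessor, then compile its output.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# kap_then_compile SRC [kap switches...]
kap_then_compile() {
    src=$1; shift
    out="${src%.f90}.cmp.f90"        # myprog.f90 -> myprog.cmp.f90
    # Compile only if the KAP step succeeded.
    run kapf90 "$@" "$src" "-cmp=$out" -freeformat &&
    run f90 -fast -tune host -non_shared "$out"
}

kap_then_compile myprog.f90
```

Any additional KAP switches are passed after the source file name, for example kap_then_compile myprog.f90 -optimize=4. Because the compiler and linker switches are still set by hand here, the recommendation to prefer kf90 continues to apply.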

2.8 Compiling a Program Containing C Preprocessor Directives Using kapf90

If a Fortran 90 program contains C preprocessor directives, preprocess it with cpp before you process it with kapf90. For example, if your program has C include statements, process it as follows:


cpp -P myprog.f > myprog.i 
kapf90 myprog.i -cmp=myprog.f90 
f90 myprog.f90 
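Each stage in this sequence must read the file that the previous stage wrote. The sketch below derives all three file names from one base name so that they cannot drift apart. The function name cpp_kap_compile and the DRY_RUN variable are hypothetical, and the script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch: cpp -> kapf90 -> f90, with each stage consuming the previous stage's output.
# Prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# cpp_kap_compile FILE.f
cpp_kap_compile() {
    base=${1%.f}                     # myprog.f -> myprog
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "cpp -P $base.f > $base.i"
        printf '%s\n' "kapf90 $base.i -cmp=$base.f90"
        printf '%s\n' "f90 $base.f90"
    else
        # Stop at the first stage that fails.
        cpp -P "$base.f" > "$base.i" &&
        kapf90 "$base.i" "-cmp=$base.f90" &&
        f90 "$base.f90"
    fi
}

cpp_kap_compile myprog.f
```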

2.9 Using KAP Syntax

Specify switches in lowercase with the syntax -switch[=value]. Do not leave spaces between the switch name and the value. Switches can appear before or after the input file as follows:


kapf90 -inm  myprog.f90  -roundoff=2 -freeformat 

KAP recognizes standard abbreviations for switches. Switches that take a list of names must have the names separated by commas and with no spaces, for example:


-inff=besl.f90,util.f90 

To pass KAP command switches through kf90, enclose them in single quotation marks as the value of the -fkapargs switch, as follows:


kf90 -fkapargs='-optimize=5  -roundoff=3  -scalaropt=3' -w myprog.f90 

Compaq Fortran compiler switches, for example, -w, do not require quotation marks.
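When command lines are generated by a script, the two formatting rules in this section (comma-separated lists with no spaces, and single quotation marks around the -fkapargs value) are easy to get wrong. The following sketch shows one way to assemble such strings. The helper names join_commas and kapargs are hypothetical, and the script only prints the resulting text.

```shell
#!/bin/sh
# Sketch: assemble KAP switch strings that follow the syntax rules above.

# join_commas a b c -> a,b,c   (no spaces, as list-valued switches require)
join_commas() {
    # The subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "$*" )
}

# kapargs sw1 sw2 ... -> -fkapargs='sw1 sw2 ...'
kapargs() {
    printf '%s\n' "-fkapargs='$*'"
}

printf '%s\n' "-inff=$(join_commas besl.f90 util.f90)"
printf '%s\n' "kf90 $(kapargs -optimize=5 -roundoff=3 -scalaropt=3) -w myprog.f90"
```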

2.10 Using File Naming Conventions

Any input file name is valid. If the file name does not have an extension, the extension .f90 is assumed. As KAP processes a Fortran 90 file, it generates three output files: the optimized program file, the optional listing file, and the executable file.

The default output file names are as follows:

Other output file names can be specified with the -cmp and -list switches.

When KAP detects an error condition, KAP writes a message to standard error.

2.11 Optimization Hints and Tips

This information can be used with both multiprocessor and single-processor systems, and with both Fortran and C versions of all KAP products. Therefore, the information may contain references to command-line switches or settings that are unavailable or that are different from those in the KAP that you are using.

This section provides separate protocols for small and large programs. Small programs are defined as those that can be compiled and run quickly. Because the cost of each iteration is small, you can take risks. The information presented here further assumes that small programs have a small number of program units.

Large programs are defined as those that take more time to compile and run than it takes for you to check the results. A program can be large either because the source code is very large or because the execution time is long.

2.11.1 Optimizing Small Programs with KAP

Follow these guidelines to optimize small programs:

  1. Compile the program without KAP, with minimum compiler optimization and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is little optimization you can do.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action may not be feasible for handling problems in large programs, but it might work for isolated portability problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections.
    You can also refer to Section 2.15. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. If the program compiles and runs correctly with minimum compiler optimization enabled, turn on all optimization except inlining by invoking -optimize=4.
  3. If step 2 succeeds and the results are correct, try the suggestions in Section 2.14 about additional performance improvement techniques.
    If step 2 fails, try reducing one optimization at a time (-roundoff=0, -scalaropt=1, -optimize=3), along with any compiler optimizations, until the program runs correctly. Use the -lo=k switch setting to create a listing of the KAP command-line switches and settings.
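The fallback at the end of this procedure amounts to trying progressively more conservative switch sets until one produces correct results. The sketch below automates that search. It assumes the reductions are applied cumulatively, which the text does not state, and the names attempts, first_passing, and demo_check are hypothetical. The demo_check function is a placeholder: in practice it would build the program with the given switches, run it, and compare the results against the reference run from step 1.

```shell
#!/bin/sh
# Sketch: find the most aggressive KAP switch set that still gives correct results.

# Candidate switch sets, from most to least aggressive (cumulative order is an assumption).
attempts() {
    printf '%s\n' \
        "-optimize=4" \
        "-optimize=4 -roundoff=0" \
        "-optimize=4 -roundoff=0 -scalaropt=1" \
        "-optimize=3 -roundoff=0 -scalaropt=1"
}

# first_passing CHECK: print the first switch set for which CHECK succeeds.
first_passing() {
    check=$1
    attempts | while IFS= read -r switches; do
        # </dev/null keeps the checker from consuming the remaining candidates.
        if "$check" "$switches" </dev/null; then
            printf '%s\n' "$switches"
            break
        fi
    done
}

# Placeholder checker: pretends the program is correct only once -roundoff=0 is set.
demo_check() {
    case $1 in
        *-roundoff=0*) return 0 ;;
        *)             return 1 ;;
    esac
}

first_passing demo_check
```

If no switch set passes, first_passing prints nothing, which signals that compiler optimizations also need to be reduced or that the problem lies elsewhere (see Section 2.15).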

2.11.2 Optimizing Large Programs with KAP

Follow these guidelines to optimize large programs:

  1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is not much optimization you can do.
    Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections. You can also refer to Section 2.15 on KAP problems. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. Compile without KAP but with maximum compiler optimization. Note the execution time and verify the results. If the program fails, reduce compiler optimization and try again.
  3. Rebuild the fastest correct non-KAP configuration with profiling enabled and run it again (for example, with gprof) to identify the program units that take the most time to run.
    Time-intensive units that have many iterative loops and arrays are good candidates for KAP loop optimizations. Go to step 4.
    If these units are not good candidates, lower-payoff optimizations such as inlining may still provide some performance improvement, especially where inlining inside loop nests also allows KAP to perform vectorization optimizations. In this case, go to step 6.
  4. If time-intensive routines were identified as good candidates, run KAP on them with modest KAP optimization (-optimize=2), compile the whole program with the other switches used in the best run from step 2, note the execution time, and verify the results.
    If the program fails, try again with the KAP switch -roundoff=0. If that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with -roundoff=0, try -scalaropt=1.
  5. If step 4 works, repeat with full KAP optimization, with full compiler optimization, and with -roundoff=0 or -scalaropt=1, if needed.
    If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again. If you have success at this step, you can also try the suggestions found in Section 2.14.
  6. If there are no routines with arrays and loops, run the whole program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,.., where aaa, bbb, and so forth, are the most frequently called routines from the profiling run in step 3.
    If this action succeeds, repeat with the -optimize=4 and -inline_and_copy=... switches. If this action fails, try rerunning with -roundoff=0 or -scalaropt=1 or with fewer routines inlined. (See Section 2.15 for an explanation of binary chop.) Also, if you have success at this step, try the suggestions in Section 2.14.
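The two passes in step 6 differ only in the -optimize level, so the command lines can be generated from a single list of routine names. The following sketch prints both. The helper name inline_switch is hypothetical, the routine names are the placeholders used in the text, and the file names follow the example in Section 2.7.

```shell
#!/bin/sh
# Sketch: build the step 6 command lines from a list of frequently called routines.

# inline_switch r1 r2 ... -> -inline_and_copy=r1,r2,...
inline_switch() {
    # The subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "-inline_and_copy=$*" )
}

hot="aaa bbb ccc"   # placeholder names; take the real ones from the profiling run

# First pass: inlining only. Second pass: repeat with -optimize=4 if the first succeeds.
for level in 0 4; do
    # $hot is intentionally unquoted so that it splits into one argument per routine.
    printf '%s\n' "kapf90 -optimize=$level $(inline_switch $hot) myprog.f90 -cmp=myprog.cmp.f90"
done
```

If a pass fails, dropping names from the list is the "fewer routines inlined" fallback described above.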

2.12 General Optimization Tips

2.13 Improving and Customizing KAP Performance

After you have used the KAP protocol for either small or large programs, you can find ways to fine-tune KAP to fit your application.

This section helps you discover which KAP command-line switches, directives, or assertions can improve KAP performance for a particular application program. It lists common goals and program situations that KAP users encounter and suggests a possible improvement for each.

Remember that KAP is a tool to optimize Compaq Fortran code. Like any tool, it performs best when you are familiar with the details of how it works and are able to use its switches correctly and advantageously.

Although KAP default switch settings will achieve performance improvement, you can often achieve greater improvement if you understand and use alternate switch settings. Moreover, you can often insert directives or assertions to achieve improved performance.

See Table 2-2 for user actions and specific goals.

Table 2-2 User Actions for Specific Goals

Goal: Have a more informative listing to help answer your questions.
User Action: Use -lo=otkl or other listing switches under -listoptions command-line switch.

Goal: Recognize more reductions.
User Action: Increase -roundoff switch setting.

Goal: Answer a KAP generated question.
User Action: Use appropriate assertion.

Goal: Eliminate unnecessary last-value assignment.
User Action: Use !*$* assert no last value needed or -assume without the l switch; or try -save=manual.

Goal: Spend less time optimizing deeply nested loops.
User Action: Reduce -limit and -arclimit or their directives.

Goal: Disable inner loop unrolling.
User Action: Use -unroll=1 or -scalaropt < 2.

Goal: Disable outer loop unrolling.
User Action: Use -roundoff < 3 or -scalaropt < 3.

Goal: Prevent a given loop from being optimized.
User Action: Use !*$* assert do (serial), !*$* assert do prefer (serial), !*$* noconcurrent, or !*$* optimize (0). (Remember to reenable optimization after the serial loop.)

Goal: Disable some data dependence checking.
User Action: Use !*$* assert no recurrence for one loop nest.

Goal: Expand (inline) subroutine calls within DO loops.
User Action: Use -inline, -inline_from_files, or -inline_create and -inline_from_libraries. Or, if the goal is to execute the subroutine body concurrently, try -ipa or !*$* assert concurrent call.

Goal: Inline more routines.
User Action: Increase -inline_depth and -inline_looplevel. (See also the !*$* inline directive.)

Goal: Turn off directives and assertions.
User Action: Use the -nodirectives switch.

Goal: Process a program that uses intentional array bounds violation.
User Action: Use !*$* assert bounds violations.

Goal: Use STATIC storage.
User Action: Insert SAVE statements or use -save=all_adjust.

2.14 Using Additional Performance Improvement Techniques

After you have successfully run KAP on a working program by using either the protocol for small programs or that for large programs, you can try the following procedures to find additional opportunities for optimization within your program:

2.15 Correcting KAP Problems

The following are some problems you may encounter when using KAP and possible fixes and workarounds:

