Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide



2.10.2 Optimizing Large Programs with KAP

Follow these guidelines to optimize large programs:

  1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is not much optimization you can do.
    Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections.
    You can also refer to Section 2.13. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. Compile without KAP but with maximum compiler optimization, note the execution time, and verify the results. If the program fails, reduce compiler optimization and try again.
  3. Take the fastest correct build obtained without KAP (from steps 1 and 2), recompile it with profiling enabled (for example, for gprof), and run it again to identify the program units that take the most time to run.
    If some time-intensive units have many iterative loops and arrays, then those units are good candidates for KAP loop optimizations. Go to step 4. If not, then the lower-payoff optimizations, such as inlining, may provide some performance improvement, especially if there are places where inlining inside loop nests may also allow KAP to perform vectorization optimizations. Go to step 6.
  4. If time-intensive routines were identified as good candidates above, run KAP on them with modest KAP optimization (-optimize=2), compile the whole program with the other switches used in the best run from step 2, note the execution time, and verify the results.
    If the program fails, try again with the KAP switch -roundoff=0; if that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with -roundoff=0, try -scalaropt=1.
  5. If step 4 works, repeat with full KAP optimization, with full compiler optimization, and with -roundoff=0 or -scalaropt=1, if needed. If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again.
    If things are still going well after this step, try the suggestions in Section 2.12.
  6. If there are no routines with arrays and loops, run the whole program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,.., where aaa, bbb, and so on, are the most frequently called routines from the profiling run in step 3.
    If this action succeeds, repeat with -optimize=4 and -inline_and_copy=... If this action fails, try rerunning with -roundoff=0 or -scalaropt=1 or with fewer routines inlined. See Section 2.13 for an explanation of "binary chop."
    If things are still going well after this step, try the suggestions in Section 2.12.

2.10.3 General Optimization Tips

2.11 Improving and Customizing KAP Performance

After you have used the KAP protocol for either small or large programs, you can find ways to fine-tune KAP to fit your application.

This section helps you discover which KAP command-line switches, directives, or assertions you can use to improve KAP performance for a particular application program. It lists goals and program situations that KAP users commonly have and suggests possible improvements for each.

Remember that KAP is a tool to optimize Fortran and C code. Like any tool, it performs best when you are familiar with the details of how it works and are able to use its switches correctly and advantageously.

Although the KAP default switch settings will achieve performance improvement, you can often achieve greater improvement if you understand and use alternate switch settings. Moreover, you can often insert directives or assertions to improve performance further.

See Table 2-1 for details about goals and user actions.

Table 2-1 User Actions for Specific Goals

Goal                                        User Action
------------------------------------------  ------------------------------------------
Have a more informative listing to help     Use -lo=kl or other listing switches
answer your questions                       under the -listoptions command-line
                                            switch.

Recognize more reductions                   Increase the -roundoff switch setting.

Spend less time optimizing deeply nested    Reduce -limit and -arclimit or their
loops                                       directives.

Disable inner FOR loop unrolling            Use -unroll=1 or -scalaropt<2.

Disable outer FOR loop unrolling            Use -roundoff<3 or -scalaropt<3.

Expand (inline) function calls within FOR   Use -inline, -inline_from_files, or
loops                                       -inline_from_libraries. Or, if the goal is
                                            to execute the function body concurrently,
                                            try -ipa or #pragma _KAP concurrent call.

Inline more routines                        Increase -inline_depth and
                                            -inline_looplevel. (See also the
                                            #pragma _KAP inline directive.)

2.12 Using Additional Performance Improvement Techniques

After you have successfully run KAP on a working program by using either the protocol for small programs or the protocol for large programs, you can try the following procedures to find additional opportunities for optimization within your program:

2.13 Correcting KAP Problems

The following are some problems you may encounter when using KAP and possible fixes and workarounds:


Chapter 3
KAP Parallel Processing

KAP does parallel decomposition of programs so they run on symmetric multiprocessor (SMP) systems. This chapter describes how to compile and run a program for parallel execution using the kcc driver and kapc. Review Chapter 2 for general information on KAP syntax, file naming conventions, and optimizing programs.

3.1 Compaq KAP Parallel Processing

Compaq KAP transforms C source programs so that, when compiled and linked, they execute as multithreaded processes. These threads can run simultaneously --- that is, in parallel --- on symmetric multiprocessor systems. The result is a program whose start-to-finish time is less than that of a C program that does not execute as a multithreaded process. More specifically, at run time the instructions from FOR loops in a transformed C program execute in parallel mode. Parallelization is the process that transforms FOR loops into instructions in an executable file that execute as multithreaded processes.

Compaq KAP considers all FOR loops in a program as candidates for parallelization. Each loop is or is not parallelized according to:

This chapter describes the three basic methods of controlling parallel processing (automatic, directed, and combination). It explains, for each method, how to:

3.1.1 Parallel Processing Methods

Compaq KAP provides three methods for programmers to control parallel processing. Their summaries follow:

Note

KAP/C will not perform automatic parallel decomposition or serial optimization on files that contain OpenMP directives.

When using any of these three methods, you must be aware of the values of the following environment variables, because they affect the run-time behavior of your program.

Environment Variables


OMP_SCHEDULE     (static,dynamic,guided,runtime) 
OMP_DYNAMIC      (true,false) default is false. 
OMP_NESTED       (true,false) default is false. 
OMP_NUM_THREADS  (number) default value is the number of 
                        processors on the current system. 

For further information on environment variables read by the C compiler, see your Compaq C user's guide.

3.1.2 Parallel Processing Controls --- Summary

KAP provides the following parallel command switches, directives, and assertions for use with automatic parallel processing. Refer to the appropriate sections for explanations and code examples as follows:

Two types of command lines, kcc and kapc, invoke Compaq KAP software:

3.1.3 Parallel Processing Controls --- Interaction

As a programmer, you should always remember that you implement a parallel processing method (automatic, directed, or combination) by making choices from the previous command line options, directives, and assertions. Your choices affect the following actions:

For example, suppose you choose combination detection and parallelization for source programs openmp.c and no_openmp.c. These programs contain some or none of the parallel processing directives, parallel processing assertions, and OpenMP directives. Consider the following command:


   kcc -ckapargs='-concurrent -minconcurrent=1000' \
   openmp.c no_openmp.c 

This command tells Compaq KAP to:

Compaq KAP parallel processing options, such as -concurrent, are enclosed in single quotation marks and are values of the -ckapargs option. The kcc driver responds to the options enclosed in these single quotation marks by passing them as arguments to the kapc preprocessor (which actually transforms the source program file).

The default values of the parallel processing options also control Compaq KAP loop detections, loop transformations, calling of the compiler and linker, and runtime scheduling. They are:


  -minconcurrent=1000 
  -scheduling=e 
  -chunk=1 

Read the explanations of each of the three methods of parallelization in light of how your choices of options, directives, and assertions affect Compaq KAP detection of loops, changes to loops, compiler and linker behavior, and the run-time behavior of the executable file a.out.

3.2 Automatic Parallelization Using the kcc Driver

To compile and run your program with parallel processing, use the -concurrentize switch, abbreviated -conc, as follows:


kcc -ckapargs='-conc' myprog.c 

For information on running a parallel program, see Section 3.6.

3.2.1 Preprocessing a Program for Parallel Execution Using kapc

To execute KAP as a standalone preprocessor, use the following commands, depending on your version of UNIX:

An explanation of the remaining switches follows:

Note

When you use kapc to preprocess a file, you must set the Compaq C compiler and linker switches appropriately. For this reason, Compaq recommends that you use kcc whenever possible, because kcc automatically sets the compiler and linker switches correctly.

