Compaq Fortran
User Manual for
Tru64 UNIX and Linux Alpha Systems



5.1.3 Process Shell Environment and Related Influences on Performance

Certain shell commands and system tuning can improve run-time performance.

For More Information:

On system tuning and cc options related to performance, see your operating system documentation and the appropriate reference pages.

5.2 Analyzing Program Performance

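This section describes how you can use the time command to measure performance (Section 5.2.1), use profiling tools (Section 5.2.2), create and use feedback files and optionally cord (Section 5.2.3), and use the Atom toolkit (Section 5.2.4).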

Before you analyze program performance, make sure any errors you might have encountered during the early stages of program development have been corrected.

For information about parallel profiling techniques and the pprof profiler on Tru64 UNIX systems, see the Compaq Parallel Software Environment documentation.

5.2.1 Use the time Command to Measure Performance

Use the time command to provide information about program performance.

Run program timings when other users are not active. CPU-intensive processes that run at the same time can distort your timing results.

Try to run the program under the same conditions each time to obtain the most accurate results, especially when comparing execution times with those of a previous version of the same program. Use the same system (CPU model, amount of memory, version of the operating system, and so on) if possible.

If you do need to change systems, you should measure the time using the same version of the program on both systems, so you know each system's effect on your timings.

For programs that run for less than a few seconds, run several timings to ensure that the results are not misleading. Overhead functions like loading shared libraries might influence short timings considerably.
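One way to compare several timings of a short program is to summarize them. The following is a minimal sketch, not part of the product: the function name summarize_timings is hypothetical, and the sample values 2.51 and 2.40 are made up for illustration (2.46 is the elapsed time used in the example timings in this section). It assumes you have noted the elapsed time, in seconds, from each run of the time command.

```shell
# Hypothetical helper: summarize elapsed times (in seconds) from repeated runs.
# Each argument is one elapsed time copied from the output of the time command.
summarize_timings() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '
        NR == 1 { min = max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END { printf "runs=%d min=%.2f mean=%.2f max=%.2f\n", NR, min, sum / NR, max }'
}

# Illustrative values only.
summarize_timings 2.46 2.51 2.40
```

A wide gap between the minimum and maximum suggests that startup overhead or other activity on the system is influencing the timings.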

The form of the time command that specifies the name of the executable program reports the elapsed (real) time, the user CPU time, and the system CPU time. The C shell form reports additional information.

In the following example timings, the sample program being timed displays the following line:


Average of all the numbers is:    4368488960.000000

Using the Bourne shell, the following program timing reports that the program uses 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use) and 2.46 seconds of elapsed time:


$ time a.out
Average of all the numbers is:    4368488960.000000 
real    0m2.46s 
user    0m0.61s 
sys     0m0.58s 

Using the C shell, the following program timing reports 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use), about 4 seconds (0:04) of elapsed time, the use of 28% of available CPU time, and other information:


% time a.out
Average of all the numbers is:    4368488960.000000 
0.61u 0.58s 0:04 28% 78+424k 9+5io 0pf+0w

Using the bash shell (L*X ONLY), the following program timing reports that the program uses 1.19 seconds of total actual CPU time (0.61 seconds in actual CPU time for user program use and 0.58 seconds of actual CPU time for system use) and 2.46 seconds of elapsed time:


[user@system user]$ time ./a.out
Average of all the numbers is:    4368488960.000000 
elapsed  0m2.46s 
user     0m0.61s 
sys      0m0.58s 

Timings that show a large amount of system time may indicate a lot of time spent doing I/O, which might be worth investigating.
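The arithmetic behind these timing reports can be sketched as follows. This is an illustration only: the function name cpu_summary is hypothetical, and the input values are the user, system, and elapsed times from the Bourne shell example above. The percentage that the C shell prints is presumably computed from unrounded values, so it may differ slightly from a figure derived from the rounded times shown.

```shell
# Hypothetical helper: derive total CPU time, the system-time share of that
# total, and CPU utilization from the three values reported by time.
#   $1 = user seconds, $2 = system seconds, $3 = elapsed (real) seconds
cpu_summary() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN {
        total = u + s
        if (total <= 0 || e <= 0) exit 1
        printf "cpu=%.2f sys_share=%.0f%% utilization=%.0f%%\n",
               total, 100 * s / total, 100 * total / e
    }'
}

# Values from the Bourne shell example: user 0.61, sys 0.58, real 2.46.
cpu_summary 0.61 0.58 2.46
```

A high system share, as in this example, is the kind of result that might point to time spent doing I/O.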

If your program displays a lot of text, you can redirect the output from the program on the time command line (see Section 5.1.3). Redirecting output from the program will change the times reported because of reduced screen I/O.

For more information, see time(1).

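In addition to the time command, you might consider modifying the program to call timing routines that measure execution time from within the program. For example, the standard Fortran intrinsic subroutines SYSTEM_CLOCK and DATE_AND_TIME can be called before and after the code being measured.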

5.2.2 Use Profiling Tools

To generate profiling information, use the f90 compiler and the prof, gprof, and pixie (TU*X ONLY) tools.

(TU*X ONLY) If you have installed the Parallel Software Environment (PSE) and need to profile a parallel HPF program, you can use the pprof profiler. For information about parallel profiling techniques and pprof, see the Compaq Parallel Software Environment documentation. The remainder of this section discusses nonparallel profiling.

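Profiling identifies areas of code where significant program execution time is spent. Along with the f90 command, use the prof, gprof, and pixie (TU*X ONLY) tools to generate the following profile information: program counter sampling (Section 5.2.2.1), call graph information (Section 5.2.2.2), basic block counting (Section 5.2.2.3), and source line CPU cycle use (Section 5.2.2.4).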

Once you have determined those sections of code where most of the program execution time is spent, examine these sections for coding efficiency. Suggested guidelines for improving source code efficiency are provided in Section 5.6.

You can also use the profiler facility provided by the optional DEC FUSE product, which provides an integrated development environment and windowing interface to many Compaq Tru64 UNIX program development facilities (see the DEC FUSE Handbook).

5.2.2.1 Program Counter Sampling (prof)

To obtain program counter sampling data, perform the following steps:

  1. Use the f90 command option -p to compile and link the program:


    % f90 -p -O3 -o profsample profsample.f90
    

    If you specify the -c option to prevent linking, you must specify the -p option when you link the program:


    % f90 -c -O3 profsample.f90
    % f90 -p -O3 -o profsample profsample.o
    

    Consider specifying optimization level -O3 or -inline manual to minimize the inlining of procedures. Once inlined, procedures are not listed as separate routines but as part of the routine into which they have been inlined. Allowing full inlining would result in program counter sampling for a small number of (usually) large routines. This might not help you locate areas of the program where significant program execution time is spent.

  2. Execute the profiled program:


    % profsample
    

    During program execution, profiling data is written to a profile data file, whose default name is mon.out. You can execute the program multiple times to generate multiple profile data files, which can be averaged. Use the PROFDIR environment variable to request a different profile data file name.

  3. Run the prof command, which formats the profiling data and displays it in a readable format:


    % prof profsample mon.out
    

You can limit the report created by prof by using prof command options, such as -only, -exclude, or -quit.

For example, if you only want reports on procedures calc_max and calc_min, you could use the following command line to read the profile data file named mon.out:


% prof -only calc_max -only calc_min profsample

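The time spent in particular areas of code is reported by prof as a percentage of the total CPU time spent by the program. To reduce the size of the report, you can either restrict it to (or exclude) named procedures with the -only and -exclude options, or truncate it with the -quit option.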

When you use the -only or -exclude options, the percentages are still based on all procedures of the application. To obtain percentages calculated by prof that are based on only those procedures included in the report, use the -Only and -Exclude options (use an uppercase initial letter in the option name).
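The difference between the two percentage bases can be sketched with a small calculation. This does not run prof or reproduce its report format. The function name profile_percentages, the procedure read_data, and all of the times are hypothetical values chosen for illustration. The base "all" mimics the lowercase options, and the base "listed" mimics the options with an uppercase initial letter.

```shell
# Hypothetical illustration of the two percentage bases.
#   $1     = "all"    (percent of the whole program, as with -only)
#            "listed" (percent of the listed procedures, as with -Only)
#   $2 ... = names of the procedures to report
profile_percentages() {
    base=$1
    shift
    awk -v base="$base" -v keep="$*" '
        BEGIN {
            n = split(keep, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        {
            secs[$1] = $2
            order[NR] = $1
            all += $2
            if ($1 in want) listed += $2
        }
        END {
            denom = (base == "listed") ? listed : all
            if (denom <= 0) exit 1
            for (i = 1; i <= NR; i++)
                if (order[i] in want)
                    printf "%s %.1f%%\n", order[i], 100 * secs[order[i]] / denom
        }' <<'EOF'
calc_max 0.30
calc_min 0.10
read_data 0.60
EOF
}

profile_percentages all calc_max calc_min
profile_percentages listed calc_max calc_min
```

With these made-up times, calc_max accounts for 30.0% of the whole program but 75.0% of the time spent in the two listed procedures.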

You can use the -quit option to reduce the amount of information reported. For example, the following command prints information on only the five most time-consuming procedures:


% prof -quit 5 profsample 

The following command limits information only to those procedures using 10% or more of the total execution time:


% prof -quit 10% profsample

For more information on prof, see prof(1) and the Compaq Tru64 UNIX Programmer's Guide.

5.2.2.2 Call Graph Sampling (gprof)

To obtain call graph information, use the gprof tool. Perform the following steps:

  1. Use the f90 command option -pg when you compile and link the program:


    % f90 -pg -O3 -o profsample profsample.f90
    

    If you specify the -c option to prevent linking, you must specify the -pg option both when you compile and when you link the program:


    % f90 -pg -c -O3 profsample.f90
    % f90 -pg -O3 -o profsample profsample.o
    

  2. Execute the profiled program:


    % profsample
    

    During execution, profiling data is saved to the file gmon.out, unless the environment variable PROFDIR is set.

  3. Run the formatting program gprof:


    % gprof profsample gmon.out
    

For more information on using gprof and on the output it produces, see gprof(1) and the Compaq Tru64 UNIX Programmer's Guide.

5.2.2.3 Basic Block Counting (pixie and prof)

To obtain basic block counting information, perform the following steps:

  1. Compile and link the program without the -p option:


    % f90 -O3 -o profsample profsample.f90
    

    Consider specifying optimization level -O3 or -inline manual to minimize the inlining of procedures (once inlined, procedures are not listed as separate routines but as part of the routine into which they are inlined).

  2. Run the profiling command pixie (TU*X ONLY):


    % atom -tool pixie profsample
    

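    The pixie command creates the profiled program profsample.pixie and the file profsample.Addrs, which prof later reads together with the basic block counts (TU*X ONLY).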

  3. Execute the profiled program profsample.pixie generated by pixie:


    % profsample.pixie
    

    This program creates the file profsample.Counts, which contains the basic block counts.

  4. Run prof with the -pixie option to extract and display information from the profsample.Addrs and profsample.Counts files:


    % prof -pixie profsample 
    

    When you specify the -pixie option (TU*X ONLY), the prof command searches for files with a suffix of .Addrs and .Counts (in this case profsample.Addrs and profsample.Counts).
    You can reduce the amount of information in the report created by prof by using the -only, -exclude, -quit, and related options.

To create multiple profile data files, run the program multiple times.

For more information on prof, gprof, and pixie (TU*X ONLY), see prof(1), gprof(1), pixie(1), and the Compaq Tru64 UNIX Programmer's Guide.

5.2.2.4 Source Line CPU Cycle Use (prof and pixie)

You use the same files created by the pixie command (see Section 5.2.2.3) for basic block counting to estimate the number of CPU cycles used to execute each source file line.

To view a report of the number of CPU cycles estimated for each source file line, use the -pixie and -heavy options with the prof command.

Depending on the level of optimization chosen, certain source lines might be optimized away.

The CPU cycle use estimates are based primarily on the instruction type and its operands and do not include memory effects such as cache misses or translation buffer fills.

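For example, the following command sequence compiles and links the program, instruments it with pixie, runs the instrumented program, and reports the estimated CPU cycle use for the source lines of procedure calc_max: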


% f90 -o profsample profsample.f90
% atom -tool pixie profsample
% profsample.pixie
% prof -pixie -heavy -only calc_max profsample

5.2.3 Creating and Using Feedback Files and Optionally cord

You can create a feedback file by using a series of commands. Once the feedback file is created, you can specify it in a subsequent compilation with the f90 command option -feedback. You can also request that cord use the feedback file to rearrange procedures by specifying the -cord option on the f90 command line.

To create the feedback file, complete these steps:

  1. Compile and link the program. Omit the -p option, but specify the -gen_feedback option:


    % f90 -o profsample -gen_feedback profsample.f90
    

    The -gen_feedback option changes the default optimization level to -O0.
    To include libraries in the profiling output, specify -non_shared .

  2. Execute the profiling command pixie (TU*X ONLY):


    % pixie profsample
    

    The pixie command creates the profiled program profsample.pixie and the file profsample.Addrs.

  3. Execute the profiled program profsample.pixie generated by pixie:


    % profsample.pixie
    

    This program creates the file profsample.Counts, which contains the basic block counts.

  4. Run prof with the -pixie and -feedback options:


    % prof -pixie -feedback profsample.feedback profsample
    

    This prof command creates the feedback file profsample.feedback.

You can use the feedback file as input to the f90 compiler:


% f90 -feedback profsample.feedback -o profsample profsample.f90

The feedback file provides the compiler with actual execution information, which the compiler can use to improve such optimizations as inlining function calls.

Specify the desired optimization level (-On option) for the f90 command along with the -feedback name option (in this example, the default is -O4).

You can use the feedback file as input to the f90 compiler and cord, as follows:


% f90 -cord -feedback profsample.feedback -o profsample profsample.f90

The -cord option invokes cord, which reorders the procedures in an executable program to improve program execution, using the information in the specified feedback file. Specify the desired optimization level (-On option) for the f90 command along with the -feedback name option (in this example, -O4).
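The feedback steps in this section can be collected into a script, sketched below. The function names run and feedback_build and the DRY_RUN variable are hypothetical. The commands and options are those shown in this section, except that the instrumented program is invoked with a leading ./ on the assumption that the current directory might not be in your path. With DRY_RUN set to 1 the commands are only printed, which lets you review the sequence on a system without the compiler installed.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build with feedback and cord.
#   $1 = executable name (for example, profsample)
#   $2 = source file     (for example, profsample.f90)
feedback_build() {
    prog=$1
    src=$2
    run f90 -o "$prog" -gen_feedback "$src" &&           # step 1: compile and link
    run pixie "$prog" &&                                 # step 2: instrument (TU*X ONLY)
    run "./$prog.pixie" &&                               # step 3: create the .Counts file
    run prof -pixie -feedback "$prog.feedback" "$prog" && # step 4: write the feedback file
    run f90 -cord -feedback "$prog.feedback" -o "$prog" "$src"  # rebuild with feedback and cord
}

# Print the command sequence without executing it.
( DRY_RUN=1; feedback_build profsample profsample.f90 )
```

Because each step is chained with &&, the sequence stops at the first command that fails, so a stale feedback file is not used by accident.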

5.2.4 Atom Toolkit

(TU*X ONLY) The Atom toolkit includes a programmable instrumentation tool and several prepackaged tools. The prepackaged tools include hiprof, pixie, and third.

To invoke atom tools, use the following general command syntax:


% atom -tool tool-name ...

For more information, see the Compaq Tru64 UNIX Programmer's Guide, atom(1), hiprof(5), pixie(5), and third(5).

Atom does not work on programs built with the -om option.

5.3 Data Alignment Considerations

For optimal performance on Alpha systems, make sure your data is aligned naturally.

A natural boundary is a memory address that is a multiple of the data item's size (data type sizes are described in Table 9-1). For example, a REAL (KIND=8) data item aligned on natural boundaries has an address that is a multiple of 8. An array is aligned on natural boundaries if all of its elements are.

All data items whose starting address is on a natural boundary are naturally aligned. Data not aligned on a natural boundary is called unaligned data.

Although the Compaq Fortran compiler naturally aligns individual data items when it can, certain Compaq Fortran statements (such as EQUIVALENCE) can cause data items to become unaligned (see Section 5.3.1).

Although you can use the f90 command -align keyword options to ensure naturally aligned data, you should check and consider reordering the declarations of data items within common blocks and structures. Within each common block, derived type, or record structure, carefully specify the order and sizes of data declarations to ensure naturally aligned data. Start with the largest numeric items, followed by smaller numeric items, and then nonnumeric (character) data.
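The effect of declaration order can be sketched with simple offset arithmetic. This is a simplified model, not a description of what the compiler does: it assumes the block starts on a suitably aligned address and that items are placed one after another with no padding. The function name layout_report is hypothetical. The sizes 8, 4, and 2 correspond to items such as REAL (KIND=8), INTEGER (KIND=4), and INTEGER (KIND=2).

```shell
# Hypothetical helper: report whether each item in a packed layout falls on
# its natural boundary (an offset that is a multiple of the item's size).
# Each argument is the size in bytes of one item, in declaration order.
layout_report() {
    offset=0
    for size in "$@"; do
        if [ $((offset % size)) -eq 0 ]; then
            status=aligned
        else
            status=unaligned
        fi
        echo "size=$size offset=$offset $status"
        offset=$((offset + size))
    done
}

echo "Declared as 4, 8, 2 bytes:"
layout_report 4 8 2
echo "Declared largest first (8, 4, 2 bytes):"
layout_report 8 4 2
```

In the first order the 8-byte item lands at offset 4 and is unaligned. In the second order every item falls on a natural boundary.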

