Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide



9.5 Loop Rerolling

Many programs contain loops that were unrolled by hand over several iterations to amortize the cost of the branch at the bottom of the DO loop, for example:


      DO 10 I = 1,N,3
         X (I) = A (I) + B (I)
         X (I+1) = A (I+1) + B (I+1)
         X (I+2) = A (I+2) + B (I+2)
   10 CONTINUE

KAP recognizes this loop as unrolled and, before looking for optimization opportunities, rerolls it as follows:


      DO 2 I=1,(N+2)/3*3
         X (I) = A (I) + B (I)
    2 CONTINUE
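The rerolled upper bound works because Fortran integer division truncates: for a nonnegative N, (N+2)/3*3 rounds N up to the nearest multiple of 3, which is the last element the unrolled loop writes. The following sketch runs both forms of the loop on the same data so they can be compared. It is a hypothetical free-form module (the names reroll_demo, add_unrolled, and add_rerolled are illustrative, not part of KAP), and it assumes the arrays hold at least (N+2)/3*3 elements.

```fortran
! Hypothetical sketch (not KAP output): the hand-unrolled loop from the
! text and the rerolled form, written as two routines for comparison.
module reroll_demo
  implicit none
contains
  ! Hand-unrolled by 3: each trip writes X(I), X(I+1) and X(I+2).
  subroutine add_unrolled(n, a, b, x)
    integer, intent(in) :: n
    real, intent(in)    :: a(:), b(:)
    real, intent(inout) :: x(:)
    integer :: i
    do i = 1, n, 3
      x(i)   = a(i)   + b(i)
      x(i+1) = a(i+1) + b(i+1)
      x(i+2) = a(i+2) + b(i+2)
    end do
  end subroutine add_unrolled

  ! Rerolled: (n+2)/3*3 rounds a nonnegative n up to a multiple of 3,
  ! the last element the unrolled loop writes.
  subroutine add_rerolled(n, a, b, x)
    integer, intent(in) :: n
    real, intent(in)    :: a(:), b(:)
    real, intent(inout) :: x(:)
    integer :: i
    do i = 1, (n + 2) / 3 * 3
      x(i) = a(i) + b(i)
    end do
  end subroutine add_rerolled
end module reroll_demo
```

With N = 4, for example, both routines write X(1) through X(6): the unrolled loop runs for I = 1 and I = 4, and the rerolled loop runs to (4+2)/3*3 = 6.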

Unrolled summations are also recognized, for example:


      DO 10 I = 1,N,5
   10 S = S + A(I) + A(I+1) + A(I+2) + A(I+3) + A(I+4)

Becomes:


      DO 2 I=1,(N+4)/5*5
         S = S + A(I)
    2 CONTINUE
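A similar hypothetical sketch compares the two forms of the summation. The names are illustrative, and S starts at zero here rather than carrying in an earlier value as in the text. If the unrolled expression is evaluated left to right, both versions add the elements in the same order. Fortran does, however, allow a compiler to evaluate any mathematically equivalent expression, so comparing the two forms is most reliable with integer-valued data, whose sums are exact either way.

```fortran
! Hypothetical sketch (not KAP output): the hand-unrolled summation from
! the text and its rerolled form. S starts at zero in this sketch.
module sum_reroll_demo
  implicit none
contains
  ! Hand-unrolled by 5: each trip adds A(I) through A(I+4).
  real function sum_unrolled(n, a) result(s)
    integer, intent(in) :: n
    real, intent(in)    :: a(:)
    integer :: i
    s = 0.0
    do i = 1, n, 5
      s = s + a(i) + a(i+1) + a(i+2) + a(i+3) + a(i+4)
    end do
  end function sum_unrolled

  ! Rerolled: (n+4)/5*5 rounds a nonnegative n up to a multiple of 5.
  real function sum_rerolled(n, a) result(s)
    integer, intent(in) :: n
    real, intent(in)    :: a(:)
    integer :: i
    s = 0.0
    do i = 1, (n + 4) / 5 * 5
      s = s + a(i)
    end do
  end function sum_rerolled
end module sum_reroll_demo
```

As with the array example, the array must hold at least (N+4)/5*5 elements. With N = 12, for instance, both forms read A(1) through A(15).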


Chapter 10
KAP Listing File

This chapter describes the information found in the optional KAP Fortran/OpenMP listing file and the messages KAP produces. To help you understand its actions, KAP lists the optimizations it performed and provides explanations for the places where no optimization was done.

For example, if three loops could have been optimized but KAP optimized only the one it determined to be most profitable, the listing file contains notes giving the reasons for those choices. Similarly, a small DO loop is often left unchanged because it runs faster in that form. Such situations can produce unexpected but correct code, so KAP produces an annotated listing to explain its output. The listing may also identify places where directives or assertions could make KAP more effective.

Section 10.1 presents the optional information selected by the -listoptions command switch. Sections 10.1.1 and 10.1.8 show annotated listings of the original and transformed programs, respectively. Section 10.2 explains the format of these annotated listings. An introduction to the diagnostic messages that KAP can generate ends the chapter. Appendix F contains the main listing of KAP diagnostic messages.

10.1 Listing Switches

The -listoptions command switch tells KAP what information to include in the listing and error files. The listing and error files can contain any combination of the following information about the optimizations performed. Each item is selected by a single-letter value of -listoptions:

  c   Calling tree
  k   KAP switches
  l   Loop table
  n   Program unit names
  o   Original program listing
  p   Compilation performance statistics
  s   Summary table
  t   Transformed program listing

See the -cmpoptions command switch for the optional information in the transformed code file.

The following examples used the Compaq KAP Fortran/OpenMP for Tru64 UNIX default switch values, except for -listoptions=cklnopst.

The following sections explain the format of these listings.

10.1.1 Original Program Listing (O)

The o switch requests an annotated listing of the original program, for example:


 KAP/Tru64_U_F90   4.2 k3105107 990825  ATIMESB  Source   01-Sep-1999  09:31:22 
 Page 1 
 
 Footnotes  Actions   DO Loops  Line 
 
                                 1    
                                 2   C Simple Matrix Multiply example. 
                                 3    
                                 4     PROGRAM ATIMESB 
                                 5    
                                 6     PARAMETER M=500, N=400, P=500 
                                 7    
                                 8     DIMENSION A(1:M,1:N),B(1:N,1:P),C(1:M,1:P) 
                                 9    
                                10    
                                11   C Initialize the matrices 
                                12    
 1                    +---------13    DO 10 J=1,N 
 2          SO        !+--------14      DO 10 I=1,M 
            SO        !*        15      A(I,J) = 1.5 
                      !*________16   10   CONTINUE 
                                17    
 1                    +---------18    DO 20 J=1,P 
 2          SO        !+--------19      DO 20 I=1,N 
            SO        !*        20      B(I,J) = 3.0 
                      !*________21   20   CONTINUE 
                                22    
                                23   C Compute C = A * B 
                                24    
            SO                  25    CALL MATMUL(A, M, B, N, C, P) 
                                26    END 
 
 
 Abbreviations Used 
  SO       scalar optimization     
 
 
 Footnote List 
   1: not vectorized             Not an inner loop. 
   2: scalar optimization        Loop unrolled 4 times to improve scalar performance. 
 
 KAP/Tru64_U_F90   4.2 k3105107 990825  ATIMESB  Source   01-Sep-1999  09:31:22 
 Page 2 
 
 Footnotes     Actions     DO    Line 
                           Loops 
                                 27    
                                 28    
                                 29   SUBROUTINE MATMUL(A, LDA, B, LDB, C, LL) 
                                 30   REAL A(LDA,LDB), B(LDB,LL), C(LDA,LL) 
                                 31   INTEGER LDA,LDB,LL 
                                 32    
 1 2 3 4 5 6   NO SO        +----33   DO 20 J=1,LL 
 1 3 4 5 6     NO LR SO     !+---34     DO 20 I=1,LDA 
               SO           !*   35     C(I,J) =0.0 
 1 5 6 7       NO LR SO INF !*+--36       DO 20 K=1,LDB 
 8             DD SO        !*!  37       C(I,J) = C(I,J) + ( A(I,K) * B(K,J) ) 
                            !*!__38 20   CONTINUE 
                                 39    
                                 40    RETURN 
                                 41    
                                 42    END 
 
 
 Abbreviations Used 
  NO       not optimized           
  LR       loop reordering         
  DD       data dependence         
  SO       scalar optimization     
  INF      informational           
 
 
 Footnote List 
   1: not optimized          Loop was asserted serial by directive. 
   2: not vectorized         Not an inner loop. 
   3: scalar optimization    Cleanup-loop for loop unrolling added. 
   4: scalar optimization    Loop unrolled 3 times to improve scalar performance. 
   5: scalar optimization    Strip loop for strip mining with block size 24. 
   6: scalar optimization    Block loop for strip mining with block size 24. 
   7: informational          Unrolling of this loop was not done because 
                             heuristic says size is ok as is. 
   8: data dependence        Data dependence involving this line due to variable C. 
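Footnotes 3 and 4 refer to loop unrolling with a cleanup loop, the reverse of the rerolling described in Section 9.5. The sketch below shows the general idea on a simple loop that scales a vector. The main loop handles three elements per trip, and a cleanup loop finishes the remaining MOD(N,3) elements. It is a generic illustration with hypothetical names (unroll_demo, scale_unrolled), not the code KAP generated for MATMUL, and it assumes N is not negative.

```fortran
! Generic sketch of unrolling by 3 with a cleanup loop.
! Hypothetical names; not KAP output. Assumes n >= 0.
module unroll_demo
  implicit none
contains
  subroutine scale_unrolled(n, s, x)
    integer, intent(in) :: n
    real, intent(in)    :: s
    real, intent(inout) :: x(:)
    integer :: i, nmain
    ! Largest multiple of 3 not exceeding n: handled by the unrolled loop.
    nmain = n - mod(n, 3)
    do i = 1, nmain, 3
      x(i)   = s * x(i)
      x(i+1) = s * x(i+1)
      x(i+2) = s * x(i+2)
    end do
    ! Cleanup loop: the remaining mod(n, 3) elements, if any.
    do i = nmain + 1, n
      x(i) = s * x(i)
    end do
  end subroutine scale_unrolled
end module unroll_demo
```

Here the cleanup loop runs after the unrolled loop. The MOD computations in the transformed listing in Section 10.1.8 show that KAP can instead handle the partial piece first.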

10.1.2 Calling Tree (C)

The calling tree is listed after all program units have been compiled. Each program unit's calling tree consists of the SUBROUTINEs and FUNCTIONs called in that program unit. A cross-reference table of the variables and arrays used, both those from the original source program and those in code that KAP added, precedes the calling information.

After the cross-reference and calling tree information for the last program unit in the file, the calling tree information for the entire source file is summarized, for example:


      CALL SUMMARY TABLE 
 
      CROSS REFERENCE TABLE 
 
  Name            Type      Class     Storage 
----------------------------------------------------------------------------- 
A                 s.REAL   Array      
LDA               s.INT    Var        
B                 s.REAL   Array      
LDB               s.INT    Var        
C                 s.REAL   Array      
LL                s.INT    Var        
J                 s.INT    Var        
I                 s.INT    Var        
K                 s.INT    Var        
II1               s.INT    Var        
II2               s.INT    Var        
II3               s.INT    Var        
II4               s.INT    Var        
II5               s.INT    Var        
   .
   .
   .
RR1               s.REAL   Var        
RR2               s.REAL   Var        
RR3               s.REAL   Var        
RR4               s.REAL   Var        
RR5               s.REAL   Var        
RR6               s.REAL   Var        
RR7               s.REAL   Var        
   .
   .
   .
 
Abbreviations used in Source Program References 
 
 A = used as actual argument 
 D = Declared or Defined 
 M = Contents may get modified 
 U = Its value is used 
 
 CALL SUMMARY TABLE 
 
16-May-1996 15:02:20     
 
 Calling Tree 
 
line#         routines          at nest    max. aggregate nest 
 
 4           program ATIMESB 
25              call MATMUL     0          0 
 
29           subroutine MATMUL 
 
 Calling Tree 
 
 
ATIMESB 
    MATMUL 
 
 
Code Modules 
 
 ATIMESB   called from 
 MATMUL    called from ATIMESB  

10.1.3 KAP Switches (K)

The KAP switches table lists the settings of the optimization-related command switches used for this program unit. Directives can change some of these values within the program unit, and some of the switches cannot be changed by the user at all. An example of a KAP switches table follows.


KAP/Tru64_U_F90   4.2 k3105107 990825  ATIMESB  Source   01-Sep-1999  09:31:22 
 
                     Page   1 
 
Switches Used for this Program Unit 
 
        no aggressive                    no library_calls 
           align_common=1 
           align_struct=4 
           arclimit=5000                    limit=10 
           assume=cel                       lines=55 
                                            list=mat.out 
                                            listingwidth=132 
           cacheline=32,32                  listoptions=klo 
           cache_prefetch_line_count=0      logical=4 
           cachesize=8                      machine=s 
           chunk=1 
           cmp=mat.cmp                      miifg=500 
        no cmpoptions                       minconcurrent=1000 
           complex=8 
        no concurrent                 
                                         no namepartitioning  
           datasave 
           directives=akpv                  optimize=5 
        no dlines 
           dpregisters=32                no onetrip 
                                    
           eiifg=20                      no parallelio     
        no escape                           psyntax=kap 
 
           fpregisters=32                
        no freeformat 
        no fuse                             
           fuselevel=0 
 
        no generateh 
 
        no hdir                             real=4 
           heaplimit=116 
           hli=1                            roundoff=3 
                                      
        no ignoreoptions 
        no include                          save=manual 
        no inline                           scalaroptimize=3 
        no inline_and_copy                  scan=72 
        no inline_create                    scheduling=e 
           inline_depth=2                   setassociativity=1 
        no inline_from_files             no skip 
        no inline_from_libraries         no srlcd 
           inline_looplevel=2            no suppress  
        no inline_manual 
           inline_optimize=0             no syntax 
           input=mat.f 
           integer=4                        
           interchange                      tune=EV4 
           interleave                    no type 
           intlog 
        no ipa 
        no ipa_create                    
           ipa_depth=2                      unroll=4 
        no ipa_from_files                   unroll2=160 
        no ipa_from_libraries               unroll3=1 
           ipa_looplevel=2                  useh 
        no ipa_manual 
           ipa_optimize=0 

10.1.4 Loop Table (L)

The loop table shows what KAP did with each DO loop in the program unit. If a loop could not be optimized, the Status column gives the reason. The possible Status entries and brief explanations are listed in the Loop Table Messages. An example of a loop table follows:


KAP/Tru64_U_F90  4.2 k3105107 990825 ATIMESB Loop Summary   01-Sep-1999  09:31:22 
 
Loop Summary 
 
        From To   Loop  Loop   at   Unroll Unroll Iteration 
Loop#   line line label index  nest weight factor workload  Status 
1       13   16   Do 10 J      1                            not inner loop 
2       14   16   Do 10 I      2    3      4                left as DO loop 
3       18   21   Do 20 J      1                            not inner loop 
4       19   21   Do 20 I      2    3      4                left as DO loop 
   .
   .
   .

10.1.5 Name (N)

As KAP processes each program unit, it prints the program unit's name to the standard error file (stderr), preceded by the source file name. If the source is read from standard input, the source file name is left blank.

10.1.6 Compilation Performance Statistics (P)

The compilation performance statistics list the number of lines in the program unit, the CPU time used, the compilation rate in lines per minute, and temporary file use (for large program units or inlining). After all program units have been compiled, KAP gives the cumulative totals, along with the final number of lines in the transformed code file. The following example shows the statistics for each routine, followed by the cumulative values:


 KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Compilation Statistics  
 01-Sep-1999  09:31:22 
 
Compilation Statistics For the Routine ATIMESB   
   26  Lines in Program Unit 
   23  Noncomment Lines in Program Unit 
 0.13  CPU Time 
12000  Lines Per Minute 
10615  Non Comment Lines Per Minute 
    0  Symbol Cache File Writes 
    0  Symbol Cache File Reads 
  110  Source Saver File Reads 
   26  Source Saver File Writes 
    1  Source Saver File Opens 
    0  Name Table File Writes 
    0  Name Table File Reads 
 
Compilation Statistics For the Routine MATMUL 
  16  Lines in Program Unit 
  16  Noncomment Lines in Program Unit 
0.90  CPU Time 
1066  Lines Per Minute 
1066  Non Comment Lines Per Minute 
   0  Symbol Cache File Writes 
   0  Symbol Cache File Reads 
 165  Source Saver File Reads 
  16  Source Saver File Writes 
   0  Source Saver File Opens 
   0  Name Table File Writes 
   0  Name Table File Reads 
 
Cumulative Compilation Statistics 
  42  Lines in Source File 
  39  Noncomment Lines in Source File 
   2  Program Units in Source File 
1.03  CPU Time 
2446  Lines Per Minute 
2271  Non Comment Lines Per Minute 
   0  Symbol Cache File Writes 
   0  Symbol Cache File Reads 
 275  Source Saver File Reads 
  42  Source Saver File Writes 
   1  Source Saver File Opens 
   0  Name Table File Writes 
   0  Name Table File Reads 
 259  Lines in Compile File 
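The rates in this example are consistent with CPU time reported in seconds and rates truncated, not rounded, to whole lines per minute. For the MATMUL routine, for example:

$$
\text{Lines Per Minute} = \left\lfloor \frac{60 \times \text{Lines in Program Unit}}{\text{CPU Time}} \right\rfloor = \left\lfloor \frac{60 \times 16}{0.90} \right\rfloor = \lfloor 1066.\overline{6} \rfloor = 1066
$$

Rounding would give 1067. Likewise, the cumulative rate of 60 × 42 / 1.03 ≈ 2446.6 is reported as 2446. The cumulative counts are the sums of the per-routine values, for example 26 + 16 = 42 lines and 0.13 + 0.90 = 1.03 seconds of CPU time.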

10.1.7 Summary Table (S)

The summary table shows how many loops appeared in the program unit and how many loops were optimized in different ways, for example:


 KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Optimization Summary 
 01-Sep-1999  09:31:22 
 
4 loops total 
2 loops vectorized 
2 with inner loop 
 
 KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Optimization Summary 
 01-Sep-1999  09:31:22 
 
3 loops total 
1 loops vectorized 
2 with scalar directive 

10.1.8 Transformed Program Listing (T)

The following example shows the annotated transformed program in the listing file. Much of this information is always recorded in the transformed code file, whether or not -listoptions=t is specified.


 KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Transformed  01-Sep-1999  09:31:22 
 Page   1 
 
 Footnotes   Actions    DO        Line 
                        Loops 
                                   1    
                                   2 C Simple Matrix Multiply example. 
                                   3    
                                   4  PROGRAM ATIMESB 
                                   5    
                                   6    PARAMETER M = 200, N = 300, P = 200 
                                   7    
                                   8    DIMENSION A(1:200,1:300), B(1:300,1:200), C(1:200,1:200) 
            I                      8    INTEGER II2, II1 
            I                      8    PARAMETER (II2 = 300, II1 = 200) 
                                   9    
                                  10    
                                  11 C Initialize the matrices 
                                  12    
1           LM             +------13    DO 2 J=1,300 
2           LM INF         !+-----14      DO 2 I=1,200 
                           !!     15        A(I,J) = 1.5 
            I              !!_____16 2      CONTINUE 
                                  17    
1           LM             +------18    DO 3 J=1,200 
2           LM INF         !+-----19      DO 3 I=1,300 
                           !!     20        B(I,J) = 3.0 
            I              !!_____21 3      CONTINUE 
                                  22    
                                  23 C Compute C = A * B 
                                  24    
            SO                    25    CALL MATMUL (A,(II1),B,(II2),C,(II1)) 
                                  26  END 
 
 Abbreviations Used 
  LM       label modification      
  SO       scalar optimization     
  I        inserted                
  INF      informational           
 
 
 Footnote List 
   1: not vectorized   Not an inner loop. 
   2: informational    Unrolling of this loop was not done because heuristic 
                       says size is ok as is. 


KAP/Tru64_U_F90  4.2 k3105107 990825  MATMUL  Transformed  01-Sep-1999  09:31:22 
 
 Footnotes Actions DO Loops    Line 
 
                                  27    
                                  28    
                                  29  SUBROUTINE MATMUL (A, LDA, B, LDB, C, LL ) 
                                  30    REAL A(LDA,LDB), B(LDB,LL), C(LDA,LL) 
                                  31    INTEGER LDA, LDB, LL 
            I                     31    INTEGER II17, II16 
            I                     31    PARAMETER (II17 = 25, II16 = 1) 
            I                           INTEGER II1,II2,II3,II4,II5,II6,II7,II8 
                                     X     ,II9,II10,II11, II12,II13,II14,II15 
            I                           REAL RR1 
            I                     33    II1 = MOD (LL - II16, II17) + II16 
            I                     36    II5 = MOD (LDB - II16, II17) + II16 
            I                     34    II9 = MOD (LDA - II16, II17) + II16 
                                  32    
1           LM          +---------33    DO 2 J=1,LL 
2           LM INF      !+--------34      DO 2 I=1,LDA 
                        !!        35        C(I,J) = 0. 
            I           !!________38 2      CONTINUE 
            I                     33    II3 = II16 
            I                     33    II2 = II1 
3 4 5       I NO SO     +---------33    DO 7 II4=II16,LL,II17 
            I           !         36      II7 = II16 
            I           !         36      II6 = II5 
            I           !         36      II15 = II3 + II2 - II16 
3 4 5       I NO SO     !+--------36      DO 6 II8=II16,LDB,II17 
            I           !!        34        II11 = II16 
            I           !!        34        II10 = II9 
            I           !!        34        II13 = II7 + II6 - II16 
3 4 5       I NO SO     !!+-------34        DO 5 II12=II16,LDA,II17 
                        !!!       32    
            I           !!!       33          II14 = II11 + II10 - II16 
6           LM SO       !!!+------33          DO 4 J=II3,II15,II16 
6           LM LR SO    !!!!+-----34            DO 4 I=II11,II14,II16 
            I           !!!!!     34              RR1 = C(I,J) 
2 6         LM LR SO INF!!!!!+----36              DO 3 K=II7,II13,II16 
7                       !!!!!!    37                RR1 = RR1+(A(I,K) * B(K,J)) 
            I           !!!!!!____38 3              CONTINUE 
            I           !!!!!     38              C(I,J) = RR1 
            I           !!!!!_____38 4            CONTINUE 
            I           !!!       38          II11 = II11 + II10 
            I           !!!       38          II10 = II17 
            I           !!!_______38 5        CONTINUE 
            I           !!        38        II7 = II7 + II6 
            I           !!        38        II6 = II17 
            I           !!________38 6      CONTINUE 
            I           !         38      II3 = II3 + II2 
            I           !         38      II2 = II17 
            I           !_________38 7    CONTINUE 
                                  39    
                                  40    RETURN 
                                  41    
                                  42   END 
 
 Abbreviations Used 
  LM       label modification      
  NO       not optimized           
  LR       loop reordering         
  SO       scalar optimization     
  I        inserted                
  INF      informational           
 
 
 Footnote List 
   1: not vectorized       Not an inner loop. 
   2: informational        Unrolling of this loop was not done because 
                           heuristic says size is ok as is. 
   3: inserted             DO loop was inserted here. 
   4: scalar optimization  Block loop for strip mining with block size 25. 
   5: not optimized        Loop was asserted serial by directive. 
   6: scalar optimization  Strip loop for strip mining with block size 25. 
   7: data dependence      Data dependence involving this line due to variable C. 
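The inserted code in MATMUL strip-mines the J, K, and I loops into blocks of II17 = 25 iterations. Because II1 = MOD(LL - II16, II17) + II16 with II16 = 1, the first block of the J loop covers MOD(LL-1,25)+1 iterations (between 1 and 25) and every later block covers exactly 25. The blocks therefore tile 1..LL without a separate cleanup loop. II5 and II9 play the same role for the K and I loops. Inside the blocks, C(I,J) is held in the scalar RR1 while the K loop accumulates. The free-form sketch below mirrors that structure, based on this reading of the listing. The module and procedure names are hypothetical, and the block size nb is an argument rather than a value chosen by KAP.

```fortran
! Hypothetical sketch mirroring the blocked MATMUL in the listing above.
! Not KAP output; names and the nb (block size) argument are illustrative.
module blocked_matmul_demo
  implicit none
contains
  ! Length of the first (partial) block when 1..n is split into blocks
  ! of nb, as in II1 = MOD(LL - 1, 25) + 1 in the listing.
  integer function first_block(n, nb)
    integer, intent(in) :: n, nb
    first_block = mod(n - 1, nb) + 1
  end function first_block

  subroutine matmul_blocked(a, lda, b, ldb, c, ll, nb)
    integer, intent(in) :: lda, ldb, ll, nb
    real, intent(in)    :: a(lda, ldb), b(ldb, ll)
    real, intent(out)   :: c(lda, ll)
    integer :: i, j, k, jj, kk, ii
    integer :: j0, jlen, k0, klen, i0, ilen
    real :: rr1

    c = 0.0
    j0 = 1
    jlen = first_block(ll, nb)
    do jj = 1, ll, nb                   ! block loop over J
      k0 = 1
      klen = first_block(ldb, nb)
      do kk = 1, ldb, nb                ! block loop over K
        i0 = 1
        ilen = first_block(lda, nb)
        do ii = 1, lda, nb              ! block loop over I
          do j = j0, j0 + jlen - 1      ! strip loops within the blocks
            do i = i0, i0 + ilen - 1
              rr1 = c(i, j)             ! keep C(I,J) in a scalar
              do k = k0, k0 + klen - 1
                rr1 = rr1 + a(i, k) * b(k, j)
              end do
              c(i, j) = rr1
            end do
          end do
          i0 = i0 + ilen
          ilen = nb                     ! every later block is full
        end do
        k0 = k0 + klen
        klen = nb
      end do
      j0 = j0 + jlen
      jlen = nb
    end do
  end subroutine matmul_blocked
end module blocked_matmul_demo
```

For LL = 53 and a block size of 25, first_block gives 3, so the J blocks cover 1..3, 4..28, and 29..53. Handling the partial block first keeps every block loop in the same form and avoids a separate cleanup loop.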

10.2 Listing Information

This section describes the information in KAP annotated listings. Unless overridden with the -suppress command switch, KAP includes this information in every KAP source (-listoptions=o) or transformed (-listoptions=t) program listing. The following sections explain the format of these entries.

10.2.1 Line Numbers

For example, a statement labeled with line number 21 in the KAP listing is either identical to line 21 of the original program or derived from it. These line numbers are useful when inspecting the KAP transformed program listing. When KAP generates several lines of code from a single line of the original program, each new line is labeled with the number of that original line, so lines of the transformed program listing can easily be related to lines of the original program listing. Lines from an INCLUDE file are numbered starting at 1 for the first line of the included file.

10.2.2 DO Loop Markings

DO loops are graphically displayed in a column headed DO Loops. Brackets mark the extent of each DO loop (up to nest level 10), as shown in the following example:


DO Loops     Line 
 +---------  5  DO 99  I = 1,1000 
 *           6    A(I,1) = B(1) 
 *+--------  7    DO 95 J = 2,1000 
 *!          8       A(I,J) = B(J)*A(I,J-1) 
 *!________  9    95 CONTINUE 
 *_________  10   99  CONTINUE 

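A statement enclosed by n DO loops has n marks on its line, one for each enclosing loop. The marks are exclamation points (!), except that in the source listing a loop that KAP optimized in a major way is marked with asterisks (*) instead. A plus sign (+) marks the DO statement that begins each loop, and underscores (_) mark the statement that ends it.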

Note

Compaq KAP Fortran/OpenMP for Tru64 UNIX recognizes certain operations, such as matrix multiplication, as single basic operations. The loops that make up such operations are frequently not marked.

