Many codes contain loops that were manually unrolled over several iterations to amortize the cost of the branch at the bottom of the DO loop. For example:
      DO 10 I = 1,N,3
         X (I) = A (I) + B (I)
         X (I+1) = A (I+1) + B (I+1)
         X (I+2) = A (I+2) + B (I+2)
   10 CONTINUE
KAP recognizes this example as an unrolled loop and, before looking for optimization opportunities, rerolls it as follows:
      DO 2 I=1,(N+2)/3*3
         X (I) = A (I) + B (I)
    2 CONTINUE
Unrolled summations are also recognized, for example:
      DO 10 I = 1,N,5
   10 S = S + A(I) + A(I+1) + A(I+2) + A(I+3) + A(I+4)
Becomes:
      DO 2 I=1,(N+4)/5*5
         S = S + A(I)
    2 CONTINUE
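The rerolled upper bounds above, (N+2)/3*3 and (N+4)/5*5, use integer division to round N up to the next multiple of the unroll factor, so the unit-stride loop touches the same elements as the hand-unrolled loop. The following free-form Fortran program is a minimal sketch that checks this for the summation example. The program name, array size, and test data are illustrative assumptions and are not KAP output.

```fortran
program reroll_check
  implicit none
  integer, parameter :: nmax = 20
  real :: a(nmax + 4)            ! padded: the unrolled form reads up to A(I+4)
  real :: s_unrolled, s_rerolled
  integer :: n, i
  logical :: all_match

  ! Small integer-valued data so both sums are exact in REAL arithmetic.
  do i = 1, nmax + 4
     a(i) = real(i)
  end do

  all_match = .true.
  do n = 1, nmax
     ! Hand-unrolled form from the text: stride 5, five terms per trip.
     s_unrolled = 0.0
     do i = 1, n, 5
        s_unrolled = s_unrolled + a(i) + a(i+1) + a(i+2) + a(i+3) + a(i+4)
     end do

     ! Rerolled form: unit stride, upper bound rounded up to a multiple of 5.
     s_rerolled = 0.0
     do i = 1, (n + 4) / 5 * 5
        s_rerolled = s_rerolled + a(i)
     end do

     if (s_unrolled /= s_rerolled) all_match = .false.
  end do

  if (all_match) then
     print *, 'rerolled and unrolled sums agree for N = 1 ..', nmax
  else
     print *, 'mismatch found'
     stop 1
  end if
end program reroll_check
```

When N is not a multiple of 5, both forms read a few elements past A(N). That behavior comes from the hand-unrolled source and is preserved by the rerolled bound, which is why the sketch pads the array.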
This chapter describes the information found in the optional KAP Fortran/OpenMP listing file and the messages KAP produces. To help you understand its actions, KAP lists the optimizations it performed and provides explanations for the places where no optimization was done.
For example, if three loops could have been optimized but KAP optimized only the one it determined to be most profitable, the listing file contains notes giving the reasons for the choice. In addition, a small DO loop is often left unchanged because it is faster to process in that form. Such situations can produce unexpected but correct code, so KAP produces an annotated listing to explain its output. The listing may also identify places where directives or assertions could improve KAP's effectiveness.
Section 10.1 presents the optional information selected by the
-listoptions command switch. Section 10.1.1 shows an annotated
listing of the original and transformed program. An introduction to the
diagnostic messages that KAP can generate ends the chapter.
Appendix F contains the main listing of KAP diagnostic messages.
10.1 Listing Switches
The -listoptions command switch tells KAP what information to include in the listing and error files. The listing and error files can contain any combination of the following messages about the optimizations performed, identified by the single-letter switches listed. The following sections present examples of the output selected by these switches.
See the -cmpoptions command switch for the optional information in the transformed code file.
The following examples use the Compaq KAP Fortran/OpenMP for Tru64 UNIX default switch values, except for -listoptions=cklnopst.
The following sections explain the format of these listings.
10.1.1 Original Program Listing (O)
The o switch requests an annotated listing of the original program, for example:
KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Source  01-Sep-1999 09:31:22  Page 1

Footnotes    Actions        DO Loops    Line
                                           1
                                           2  C     Simple Matrix Multiply example.
                                           3
                                           4        PROGRAM ATIMESB
                                           5
                                           6        PARAMETER M=500, N=400, P=500
                                           7
                                           8        DIMENSION A(1:M,1:N),B(1:N,1:P),C(1:M,1:P)
                                           9
                                          10
                                          11  C     Initialize the matrices
                                          12
1                           +---------    13        DO 10 J=1,N
2            SO             !+--------    14           DO 10 I=1,M
             SO             !*            15              A(I,J) = 1.5
                            !*________    16     10 CONTINUE
                                          17
1                           +---------    18        DO 20 J=1,P
2            SO             !+--------    19           DO 20 I=1,N
             SO             !*            20              B(I,J) = 3.0
                            !*________    21     20 CONTINUE
                                          22
                                          23  C     Compute C = A * B
                                          24
             SO                           25        CALL MATMUL(A, M, B, N, C, P)
                                          26        END

Abbreviations Used
  SO     scalar optimization

Footnote List
  1: not vectorized       Not an inner loop.
  2: scalar optimization  Loop unrolled 4 times to improve scalar performance.

KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Source  01-Sep-1999 09:31:22  Page 2

Footnotes    Actions        DO Loops    Line
                                          27
                                          28
                                          29        SUBROUTINE MATMUL(A, LDA, B, LDB, C, LL)
                                          30        REAL A(LDA,LDB), B(LDB,LL), C(LDA,LL)
                                          31        INTEGER LDA,LDB,LL
                                          32
1 2 3 4 5 6  NO SO          +----         33        DO 20 J=1,LL
1 3 4 5 6    NO LR SO       !+---         34           DO 20 I=1,LDA
             SO             !*            35              C(I,J) =0.0
1 5 6 7      NO LR SO INF   !*+--         36              DO 20 K=1,LDB
8            DD SO          !*!           37                 C(I,J) = C(I,J) + ( A(I,K) * B(K,J) )
                            !*!__         38     20 CONTINUE
                                          39
                                          40        RETURN
                                          41
                                          42        END

Abbreviations Used
  NO     not optimized
  LR     loop reordering
  DD     data dependence
  SO     scalar optimization
  INF    informational

Footnote List
  1: not optimized        Loop was asserted serial by directive.
  2: not vectorized       Not an inner loop.
  3: scalar optimization  Cleanup-loop for loop unrolling added.
  4: scalar optimization  Loop unrolled 3 times to improve scalar performance.
  5: scalar optimization  Strip loop for strip mining with block size 24.
  6: scalar optimization  Block loop for strip mining with block size 24.
  7: informational        Unrolling of this loop was not done because heuristic says size is ok as is.
  8: data dependence      Data dependence involving this line due to variable C.
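The footnotes in the listing above report a loop "unrolled 4 times" and, for MATMUL, a "cleanup-loop for loop unrolling". As a rough illustration of what those two notes mean, the following hand-written free-form Fortran sketch unrolls a simple initialization loop by 4 and adds a cleanup loop for the leftover iterations. The program name, the array length, and the choice to run the cleanup loop before the main loop are assumptions made for the example; the code KAP actually generates may be arranged differently.

```fortran
program unroll_sketch
  implicit none
  integer, parameter :: m = 10   ! deliberately not a multiple of 4
  real :: a(m), b(m)
  integer :: i, nclean

  a = 0.0
  b = 0.0

  ! Reference: the loop as it would appear in the original source.
  do i = 1, m
     a(i) = 1.5 * real(i)
  end do

  ! Cleanup loop: handles the MOD(m,4) leftover iterations.
  nclean = mod(m, 4)
  do i = 1, nclean
     b(i) = 1.5 * real(i)
  end do

  ! Main loop: body replicated 4 times, so one branch serves 4 elements.
  do i = nclean + 1, m, 4
     b(i)     = 1.5 * real(i)
     b(i + 1) = 1.5 * real(i + 1)
     b(i + 2) = 1.5 * real(i + 2)
     b(i + 3) = 1.5 * real(i + 3)
  end do

  if (all(a == b)) then
     print *, 'unrolled loop matches the original loop'
  else
     print *, 'mismatch'
     stop 1
  end if
end program unroll_sketch
```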
The calling tree is listed after all program units have been compiled. Each program unit's calling tree consists of the SUBROUTINEs and FUNCTIONs called in that program unit. A listing of variables and arrays used (both from the original source program and in code which KAP added) precedes the calling information.
After the cross-reference and calling tree information for the last program unit in the file, the calling tree information for the entire source file is summarized, for example:
CALL SUMMARY TABLE CROSS REFERENCE TABLE Name Type Class Storage ----------------------------------------------------------------------------- A s.REAL Array LDA s.INT Var B s.REAL Array LDB s.INT Var C s.REAL Array LL s.INT Var J s.INT Var I s.INT Var K s.INT Var II1 s.INT Var II2 s.INT Var II3 s.INT Var II4 s.INT Var II5 s.INT Var . . . RR1 s.REAL Var RR2 s.REAL Var RR3 s.REAL Var RR4 s.REAL Var RR5 s.REAL Var RR6 s.REAL Var RR7 s.REAL Var . . . Abbreviations used in Source Program References A = used as actual argument D = Declared or Defined M = Contents may get modified U = Its value is used CALL SUMMARY TABLE 16-May-1996 15:02:20 Calling Tree line# routines at nest max. aggregate nest 4 program ATIMESB 25 call MATMUL 0 0 29 subroutine MATMUL Calling Tree ATIMESB MATMUL Code Modules ATIMESB called from MATMUL called from ATIMESB |
The KAP switches table lists the settings of the command switches related to optimization used for this program unit. Some of the values may be changed within the program unit by using directives. Not all of these switches can be changed by the user. An example of a KAP switches table follows.
KAP/Tru64_U_F90 4.2 k3105107 990825 ATIMESB Source 01-Sep-1999 09:31:22 Page 1 Switches Used for this Program Unit no aggressive no library_calls align_common=1 align_struct=4 arclimit=5000 limit=10 assume=cel lines=55 list=mat.out listingwidth=132 cacheline=32,32 listoptions=klo cache_prefetch_line_count=0 logical=4 cachesize=8 machine=s chunk=1 cmp=mat.cmp miifg=500 no cmpoptions minconcurrent=1000 complex=8 no concurrent no namepartitioning datasave directives=akpv optimize=5 no dlines dpregisters=32 no onetrip eiifg=20 no parallelio no escape psyntax=kap fpregisters=32 no freeformat no fuse fuselevel=0 no generateh no hdir real=4 heaplimit=116 hli=1 roundoff=3 no ignoreoptions no include save=manual no inline scalaroptimize=3 no inline_and_copy scan=72 no inline_create scheduling=e inline_depth=2 setassociativity=1 no inline_from_files no skip no inline_from_libraries no srlcd inline_looplevel=2 no suppress no inline_manual inline_optimize=0 no syntax input=mat.f integer=4 interchange tune=EV4 interleave no type intlog no ipa no ipa_create ipa_depth=2 unroll=4 no ipa_from_files unroll2=160 no ipa_from_libraries unroll3=1 ipa_looplevel=2 useh no ipa_manual ipa_optimize=0 |
The loop table shows what KAP did for each DO loop in the program unit. If the loop could not be optimized, a reason is listed. The possible Status entries and brief explanations are listed in the Loop Table Messages. The following example shows a loop table:
KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Loop Summary  01-Sep-1999 09:31:22

                              Loop Summary

       From  To    Loop    Loop   at    Unroll  Unroll  Iteration
Loop#  line  line  label   index  nest  weight  factor  workload   Status
1      13    16    Do 10   J      1                                not inner loop
2      14    16    Do 10   I      2     3       4                  left as DO loop
3      18    21    Do 20   J      1                                not inner loop
4      19    21    Do 20   I      2     3       4                  left as DO loop
.
.
.
The program unit names, as processed, are printed to the standard error
file (stderr), preceded by the source file name. If the source
is read from standard input, the source file name is left blank.
10.1.6 Compilation Performance Statistics (P)
The compilation performance statistics list the number of lines in the program unit, the compilation time, the compilation rate in lines per minute, and temporary file use (for large program units or inlining). After all program units have been compiled, the cumulative totals are given, along with the final number of lines in the transformed code file. The following example shows the statistics for each program unit, followed by the cumulative totals:
KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Compilation Statistics  01-Sep-1999 09:31:22

Compilation Statistics For the Routine ATIMESB

       26 Lines in Program Unit
       23 Noncomment Lines in Program Unit
     0.13 CPU Time
    12000 Lines Per Minute
    10615 Non Comment Lines Per Minute
        0 Symbol Cache File Writes
        0 Symbol Cache File Reads
      110 Source Saver File Reads
       26 Source Saver File Writes
        1 Source Saver File Opens
        0 Name Table File Writes
        0 Name Table File Reads

Compilation Statistics For the Routine MATMUL

       16 Lines in Program Unit
       16 Noncomment Lines in Program Unit
     0.90 CPU Time
     1066 Lines Per Minute
     1066 Non Comment Lines Per Minute
        0 Symbol Cache File Writes
        0 Symbol Cache File Reads
      165 Source Saver File Reads
       16 Source Saver File Writes
        0 Source Saver File Opens
        0 Name Table File Writes
        0 Name Table File Reads

Cumulative Compilation Statistics

       42 Lines in Source File
       39 Noncomment Lines in Source File
        2 Program Units in Source File
     1.03 CPU Time
     2446 Lines Per Minute
     2271 Non Comment Lines Per Minute
        0 Symbol Cache File Writes
        0 Symbol Cache File Reads
      275 Source Saver File Reads
       42 Source Saver File Writes
        1 Source Saver File Opens
        0 Name Table File Writes
        0 Name Table File Reads
      259 Lines in Compile File
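The Lines Per Minute figures in the sample statistics above are consistent with dividing the line count by the CPU time, scaling to one minute, and truncating the result. The following free-form Fortran sketch reproduces the five rates from the sample under that assumption. Truncation is inferred from the sample values rather than documented behavior, and the program and variable names are illustrative.

```fortran
program lines_per_minute
  implicit none
  integer, parameter :: k = 5
  integer :: nlines(k), centisec(k), listed(k), computed(k)
  integer :: i

  ! Line counts from the sample: ATIMESB (all, noncomment), MATMUL,
  ! cumulative (all, noncomment).
  nlines   = (/ 26, 23, 16, 42, 39 /)
  ! CPU times in hundredths of a second (0.13, 0.13, 0.90, 1.03, 1.03)
  ! so that the arithmetic stays in exact integers.
  centisec = (/ 13, 13, 90, 103, 103 /)
  ! Rates printed in the sample listing.
  listed   = (/ 12000, 10615, 1066, 2446, 2271 /)

  do i = 1, k
     ! lines / (centisec / 100) seconds * 60 seconds per minute,
     ! truncated by integer division.
     computed(i) = nlines(i) * 6000 / centisec(i)
     print *, nlines(i), ' lines ->', computed(i), ' per minute'
  end do

  if (any(computed /= listed)) then
     print *, 'computed rates differ from the listing'
     stop 1
  end if
end program lines_per_minute
```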
The summary table shows how many loops appeared in the program unit and how many loops were optimized in different ways, for example:
KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Optimization Summary  01-Sep-1999 09:31:22

        4 loops total
        2 loops vectorized
        2 with inner loop

KAP/Tru64_U_F90  4.2 k3105107 990825  ATIMESB  Optimization Summary  01-Sep-1999 09:31:22

        3 loops total
        1 loops vectorized
        2 with scalar directive
The following example shows the annotated transformed program in the listing file. Much of this information is always recorded in the transformed code file regardless of whether the user specifies -listoptions=t.
KAP/Tru64_U_F90 4.2 k3105107 990825 ATIMESB Transformed 01-Sep-1999 09:31:22 Page 1 Footnotes Actions DO Line Loops 1 2 C Simple Matrix Multiply example. 3 4 PROGRAM ATIMESB 5 6 PARAMETER M = 200, N = 300, P = 200 7 8 DIMENSION A(1:200,1:300), B(1:300,1:200), C(1:200,1:200) I 8 INTEGER II2, II1 I 8 PARAMETER (II2 = 300, II1 = 200) 9 10 11 C Initialize the matrices 12 1 LM +------13 DO 2 J=1,300 2 LM INF !+-----14 DO 2 I=1,200 !! 15 A(I,J) = 1.5 I !!_____16 2 CONTINUE 17 1 LM +------18 DO 3 J=1,200 2 LM INF !+-----19 DO 3 I=1,300 !! 20 B(I,J) = 3.0 I !!_____21 3 CONTINUE 22 23 C Compute C = A * B 24 SO 25 CALL MATMUL (A,(II1),B,(II2),C,(II1)) 26 END Abbreviations Used LM label modification SO scalar optimization I inserted INF informational Footnote List 1: not vectorized Not an inner loop. 2: informational Unrolling of this loop was not done because heuristic says size is ok as is. |
KAP/Tru64_U_F90 4.2 k3105107 990825 MATMUL Transformed 01-Sep-1999 09:31:22 Footnotes Actions DO Loops Line 27 28 29 SUBROUTINE MATMUL (A, LDA, B, LDB, C, LL ) 30 REAL A(LDA,LDB), B(LDB,LL), C(LDA,LL) 31 INTEGER LDA, LDB, LL I 31 INTEGER II17, II16 I 31 PARAMETER (II17 = 25, II16 = 1) I INTEGER II1,II2,II3,II4,II5,II6,II7,II8 X ,II9,II10,II11, II12,II13,II14,II15 I REAL RR1 I 33 II1 = MOD (LL - II16, II17) + II16 I 36 II5 = MOD (LDB - II16, II17) + II16 I 34 II9 = MOD (LDA - II16, II17) + II16 32 1 LM +---------33 DO 2 J=1,LL 2 LM INF !+--------34 DO 2 I=1,LDA !! 35 C(I,J) = 0. I !!________38 2 CONTINUE I 33 II3 = II16 I 33 II2 = II1 3 4 5 I NO SO +---------33 DO 7 II4=II16,LL,II17 I ! 36 II7 = II16 I ! 36 II6 = II5 I ! 36 II15 = II3 + II2 - II16 3 4 5 I NO SO !+--------36 DO 6 II8=II16,LDB,II17 I !! 34 II11 = II16 I !! 34 II10 = II9 I !! 34 II13 = II7 + II6 - II16 3 4 5 I NO SO !!+-------34 DO 5 II12=II16,LDA,II17 !!! 32 I !!! 33 II14 = II11 + II10 - II16 6 LM SO !!!+------33 DO 4 J=II3,II15,II16 6 LM LR SO !!!!+-----34 DO 4 I=II11,II14,II16 I !!!!! 34 RR1 = C(I,J) 2 6 LM LR SO INF!!!!!+----36 DO 3 K=II7,II13,II16 7 !!!!!! 37 RR1 = RR1+(A(I,K) * B(K,J)) I !!!!!!____38 3 CONTINUE I !!!!! 38 C(I,J) = RR1 I !!!!!_____38 4 CONTINUE I !!! 38 II11 = II11 + II10 I !!! 38 II10 = II17 I !!!_______38 5 CONTINUE I !! 38 II7 = II7 + II6 I !! 38 II6 = II17 I !!________38 6 CONTINUE I ! 38 II3 = II3 + II2 I ! 38 II2 = II17 I !_________38 7 CONTINUE 39 40 RETURN 41 42 END Abbreviations Used LM label modification NO not optimized LR loop reordering SO scalar optimization I inserted INF informational Footnote List 1: not vectorized Not an inner loop. 2: informational Unrolling of this loop was not done because heuristic says size is ok as is. 3: inserted DO loop was inserted here. 4: scalar optimization Block loop for strip mining with block size 25. 5: not optimized Loop was asserted serial by directive. 6: scalar optimization Strip loop for strip mining with block size 25. 
7: data dependence Data dependence involving this line due to variable C. |
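The transformed MATMUL listing above strip-mines each loop with a block size of 25: an inserted block loop steps through the index range in blocks, and the original loop becomes a strip loop over one block. The first strip has length MOD(n - 1, 25) + 1 and every later strip has the full block length, which mirrors the II1/II2/II3/II15 bookkeeping in the listing. The following free-form Fortran sketch applies the same pattern to a single loop and checks that every index is visited exactly once. The variable names and the trip count of 60 are illustrative assumptions, not KAP output.

```fortran
program strip_mine_sketch
  implicit none
  integer, parameter :: n = 60, blk = 25
  integer :: visits(n)
  integer :: jblock, j, first, length, last, nstrips

  visits = 0
  nstrips = 0

  ! Length of the first (possibly short) strip, as in
  ! II1 = MOD (LL - II16, II17) + II16 with II16 = 1 and II17 = 25.
  length = mod(n - 1, blk) + 1
  first = 1

  ! Block loop: one trip per strip; jblock only counts the blocks.
  do jblock = 1, n, blk
     last = first + length - 1
     ! Strip loop: the original loop body runs over one strip.
     do j = first, last
        visits(j) = visits(j) + 1
     end do
     nstrips = nstrips + 1
     first = first + length
     length = blk              ! every later strip is a full block
  end do

  print *, 'strips executed:', nstrips
  if (all(visits == 1)) then
     print *, 'each index visited exactly once'
  else
     print *, 'strip bounds are wrong'
     stop 1
  end if
end program strip_mine_sketch
```

With n = 60 the strips cover indices 1 to 10, 11 to 35, and 36 to 60, so the short strip comes first and the remaining strips are full blocks.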
This section presents KAP annotated listing information. Unless overridden with the -suppress command switch, KAP presents the following information in every KAP source (-listoptions=o) or transformed (-listoptions=t) program listing:
The following sections explain the format of these entries.
10.2.1 Line Numbers
A statement in the KAP listing labeled with a line number of 21, for
example, is either the same as line 21 from the original program, or is
derived from line 21. These line numbers are useful when inspecting the
KAP transformed program listing. KAP sometimes generates several lines
of code from a single line of the original program; in that case, each
of the new lines of code is labeled with the same number as that of the
original program. Consequently, lines of the KAP transformed program
listing may be easily related to the lines of the original program
listing. Lines from an INCLUDE file are numbered starting from 1 for
the first line in the included file.
10.2.2 DO Loop Markings
DO loops are graphically displayed in a column headed DO Loops. Brackets mark the extent of each DO loop (up to nest level 10), as shown in the following example:
DO Loops    Line
+---------     5        DO 99 I = 1,1000
*              6           A(I,1) = B(1)
*+--------     7           DO 95 J = 2,1000
*!             8              A(I,J) = B(J)*A(I,J-1)
*!________     9     95    CONTINUE
*_________    10     99 CONTINUE
A statement that is enclosed by n DO loops has n exclamation marks (!) on that line. Loops that have been optimized in a major way have asterisks (*) instead of exclamation points in the source listing.
Compaq KAP Fortran/OpenMP for Tru64 UNIX recognizes certain operations, such as matrix multiplication, as basic entities. Frequently, the loops forming such operations will not be marked.