This example demonstrates three coding styles for temporary storage: one uses the external routine omp_get_thread_num(), and the other two use only directives.
      subroutine local_1a (n)
      dimension a(100)
      common /cmn/ t( 100, 0:7 )    ! assume 8 processors max.
c$omp parallel do
c$omp&  shared(a,t)
c$omp&  private(i)
      do i = 1, n
        do j = 1, n
          t(j, omp_get_thread_num()) = a(i) ** 2
        enddo
        call work( t(1,omp_get_thread_num()) )
      enddo
      end
If t is not global, then the above could be accomplished by putting t in the private clause:
      subroutine local_1b (n)
      dimension a(100), t(100)
c$omp parallel do
c$omp&  shared(a)
c$omp&  private(i,t)
      do i = 1, n
        do j = 1, n
          t(j) = a(i) ** 2
        enddo
        call work( t )
      enddo
      end
If t is global, then the instance parallel and new directives can be used instead.
      subroutine local_1c (n)
      dimension a(100), t(100)
      common /cmn/ t
c$omp instance parallel (/cmn/)
c$omp parallel do
c$omp&  shared(a)
c$omp&  private(i)
c$omp new (/cmn/)
      do i = 1, n
        do j = 1, n
          t(j) = a(i) ** 2
        enddo
        call work        ! access t from common /cmn/
      enddo
      end
Not all of the values of a and b are initialized in the loop before they are used (the rest of the values are produced by init_a and init_b). Using firstprivate for a and b causes the initialization values produced by init_a and init_b to be copied into private copies of a and b for use in the loops.
      subroutine dsq3_b (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y) firstprivate(a,b)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Similar to Section C.17 except using threadprivate common blocks. For threadprivate, copyin is used instead of firstprivate to copy initialization values from the shared (master) copy of /blk/ to the private copies.
      subroutine dsq3_b_tc (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      common /blk/ a,b
c$omp threadprivate (/blk/)
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y) copyin(a,b)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Similar to Section C.17 except using instance parallel privatizable common blocks. For instance parallel, copy new is used instead of firstprivate to privatize the common block and to copy initialization values from the shared (master) copy of /blk/ to the private copies.
      subroutine dsq3_b_ip (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      common /blk/ a,b
c$omp instance parallel (/blk/)
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y)
c$omp copy new (/blk/)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Use Parallel Computing Forum (PCF) directives only with loops that are safe to parallelize. When Compaq KAP sees loops prefaced with PCF directives, it does not perform data dependence analysis and does not prevent you from using a parallel directive incorrectly.
Observe the rules described in the following sections.
The PARALLEL REGION and END PARALLEL REGION directives delineate where parallelism exists in the program. The following example shows the PARALLEL REGION directive syntax:
C*KAP* PARALLEL REGION
C*KAP*& [ IF(logical expression) ]
C*KAP*& [ SHARED(shared_name,...) ]
C*KAP*& [ LOCAL(local_name,...) ]

C*KAP* END PARALLEL REGION
In the syntax example, local_name and shared_name are references to a variable or an array. If the IF clause logical expression evaluates to .FALSE., all of the code between PARALLEL REGION and END PARALLEL REGION executes on a single processor. If the logical expression evaluates to .TRUE., the code between the PARALLEL REGION and the corresponding END PARALLEL REGION may execute on multiple processors.
The SHARED and LOCAL lists on the PARALLEL REGION directive state the explicit forms of data sharing among the processors that execute the code inside the parallel region. When distinct processors reference the same variable or array from the SHARED list, they reference the same storage location. When distinct processors reference the same variable or array from the LOCAL list, they reference distinct storage locations.
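The sharing rules can be sketched outside Fortran. The following Python sketch (illustrative only; the names and the worker model are assumptions, not part of KAP) models a SHARED variable as one storage cell visible to every worker and a LOCAL variable as a separate cell per worker:

```python
# Model: SHARED = one storage cell for all workers;
# LOCAL = a distinct cell per worker (illustrative only).

NUM_WORKERS = 4

shared = {"total": 0}                                  # one location for everyone
local = [{"scratch": 0} for _ in range(NUM_WORKERS)]   # one location per worker

def worker(wid, iterations):
    for i in iterations:
        local[wid]["scratch"] = i * i                  # LOCAL: invisible to other workers
        shared["total"] += local[wid]["scratch"]       # SHARED: all updates hit one cell

# Hand iterations 0..7 to the 4 workers round-robin and run them.
for wid in range(NUM_WORKERS):
    worker(wid, range(wid, 8, NUM_WORKERS))

print(shared["total"])                 # 140 = 0+1+4+9+16+25+36+49
print([d["scratch"] for d in local])   # each worker kept only its own last square
```

After the run, every worker's contribution has landed in the single shared cell, while each worker's scratch cell still holds only the value that worker last wrote.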
D.2 PARALLEL DO Directive
The PARALLEL DO directive tells KAP that the next statement begins an iterative DO loop that can be executed using multiple processors. Each processor applied to the DO loop can execute one or more iterations. The following syntax example shows the PARALLEL DO directive inside a PARALLEL REGION:
C*KAP* PARALLEL REGION
C*KAP*& [ IF(logical expression) ]
C*KAP*& [ SHARED(shared_name,...) ]
C*KAP*& [ LOCAL(local_name,...) ]
C*KAP* PARALLEL DO
C*KAP*& [ STATIC ]
C*KAP*& [ LAST LOCAL(local_name,...) ]
C*KAP*& [ BLOCKED [ (integer constant expression) ] ]

C*KAP* END PARALLEL REGION
In the syntax example, LAST LOCAL(local_name) declares local_name as a private variable for use during execution of the PARALLEL DO loop. During the last PARALLEL DO loop iteration, the final value of local_name is copied into the identically named variable declared by SHARED(shared_name) in the enclosing PARALLEL REGION. For example, the final value of the LAST LOCAL(x) variable would be copied into the SHARED(x) variable, as follows:
C*KAP* PARALLEL REGION
C*KAP*& SHARED(x)
C*KAP* PARALLEL DO
C*KAP*& LAST LOCAL(x)
      DO 10 . . .
 10   CONTINUE
C*KAP* END PARALLEL REGION
Because the LAST LOCAL() variable inside the PARALLEL DO implies the SHARED() variable inside the PARALLEL REGION, it is legal to use both PCF directives for the same name.
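The copy-out behavior can be modeled in a few lines of Python (a sketch using a sequential model of the loop; parallel_do_last_local is an illustrative name, not a KAP routine):

```python
# Sketch of LAST LOCAL semantics (illustrative, not KAP itself):
# each iteration computes with its own private x; after the loop, the
# private value from the sequentially *last* iteration is copied to
# the shared x.

def parallel_do_last_local(iterations, body):
    """Run body(i) for each i; return the private value from the
    sequentially last iteration, as LAST LOCAL would."""
    last_value = None
    for i in iterations:          # sequential model of the parallel loop
        last_value = body(i)      # each call's result plays the private x
    return last_value             # copied out to the shared x

shared_x = parallel_do_last_local(range(1, 11), lambda i: i * i)
print(shared_x)   # value of x from the last iteration, i=10: 100
```

Whichever worker happens to execute the last iteration, it is that iteration's private value that defines the shared variable after the loop.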
If BLOCKED [ (integer constant expression) ] is specified in a PARALLEL DO, loop iterations are assigned to run-time "workers" in blocks of that size. (A "worker" is a logical processor: a processor, a process, or a thread.) If BLOCKED is omitted from a PARALLEL DO directive, the default is even scheduling, where loop iterations are divided evenly among the run-time workers. If BLOCKED is specified without a number, the default block size is 1.
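The two policies can be sketched as iteration-assignment functions. The following Python sketch is an illustrative model only (the function names are made up, and the round-robin block placement is an assumption about how a blocked scheduler typically deals out chunks):

```python
# Sketch of the two scheduling policies (illustrative model, not KAP):
# BLOCKED(b): hand out iterations to workers round-robin in chunks of b.
# Default (even): give each worker one contiguous, nearly equal piece.

def blocked_schedule(n_iters, n_workers, block):
    """Deal chunks of `block` consecutive iterations to workers round-robin."""
    assign = [[] for _ in range(n_workers)]
    for chunk_start in range(0, n_iters, block):
        w = (chunk_start // block) % n_workers
        assign[w].extend(range(chunk_start, min(chunk_start + block, n_iters)))
    return assign

def even_schedule(n_iters, n_workers):
    """Split the iteration space into n_workers contiguous, nearly equal pieces."""
    base, extra = divmod(n_iters, n_workers)
    assign, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        assign.append(list(range(start, start + size)))
        start += size
    return assign

print(blocked_schedule(8, 2, 2))   # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(even_schedule(8, 2))         # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 8 iterations and 2 workers, BLOCKED(2) interleaves pairs of iterations across the workers, while even scheduling gives each worker one contiguous half of the iteration space.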
D.3 DO Loop Example with PCF Directives
The following example shows the use of the PARALLEL REGION and the PARALLEL DO directives in a simple loop:
C*KAP* PARALLEL REGION
C*KAP*& SHARED(A,B,C) LOCAL(I)
C*KAP* PARALLEL DO
      do 10 i=1,n
        a(i) = b(i) * c(i)
 10   continue
C*KAP* END PARALLEL REGION
The following program example shows the use of the PARALLEL REGION and the PARALLEL DO directives:
      PROGRAM ATIMESB
      PARAMETER (M=512, N=512, P=512)
      REAL time1,time2
      REAL*8 A,B,C
      DIMENSION A(1:M,1:N), B(1:N,1:P), C(1:M,1:P)
C     Initialize the matrices
C*KAP* PARALLEL REGION SHARED (A,B) LOCAL (J,I)
C*KAP* PARALLEL DO
      DO 10 J=1,N
      DO 10 I=1,M
        A(I,J) = 1.5
 10   CONTINUE
C*KAP* PARALLEL DO
      DO 20 J=1,P
      DO 20 I=1,N
        B(I,J) = 3.0
 20   CONTINUE
C*KAP* END PARALLEL REGION
C     Compute C = A * B
      CALL CSETTIME()
      time1 = CTIMEC()
      CALL MATMUL(A, M, B, N, C, P)
      time2 = CTIMEC()
      write(*,*) 'elapsed time in seconds is:', (time2-time1)
      END

      SUBROUTINE MATMUL(A, LDA, B, LDB, C, LL)
      INTEGER LDA,LDB,LL
      REAL*8 A(LDA,LDB), B(LDB,LL), C(LDA,LL)
C*KAP* PARALLEL REGION SHARED (A,LDA,B,LDB,C,LL) LOCAL (J,K,I)
C*KAP* PARALLEL DO
      DO 20 J=1,LL
      DO 20 I=1,LDA
        C(I,J) = 0.0
        DO 20 K=1,LDB
          C(I,J) = C(I,J) + ( A(I,K) * B(K,J) )
 20   CONTINUE
C*KAP* END PARALLEL REGION
      RETURN
      END
The CRITICAL SECTION and END CRITICAL SECTION directives define the scope of a critical section. Exactly one logical processor at a time is allowed inside a critical section. This construct must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION pair.
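The guarantee a critical section provides can be sketched with Python threads, where a lock plays the role of the directive pair (an illustrative analogy, not code KAP generates):

```python
# Sketch of a critical section: the lock admits exactly one thread
# at a time to the protected update, so no increments are lost.
import threading

counter = 0
lock = threading.Lock()          # plays the role of CRITICAL SECTION

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # exactly one thread at a time in here
            counter += 1         # the protected shared update

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000: all 4 x 10000 updates survive
```

Without the lock, concurrent read-modify-write sequences could interleave and drop updates; the critical section serializes just the protected statements while the rest of the loop runs in parallel.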
D.6 ONE PROCESSOR SECTION Directive
The ONE PROCESSOR SECTION and END ONE PROCESSOR SECTION directives define the scope of a section of code that exactly one processor is allowed to execute. This directive must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION pair.
D.7 Comparison of KAP PCF and Cray Autotasking Directives
If you formerly used Cray autotasking to perform parallel decomposition, you can substitute KAP PCF directives, as shown in the following table.
| KAP Parallel Computing Forum | Cray Autotasking |
|---|---|
| Specifying Regions of Parallel Execution | |
| C*KAP* PARALLEL REGION | CMIC$ PARALLEL |
| C*KAP* END PARALLEL REGION | CMIC$ END PARALLEL |
| Specifying Parallel Loops | |
| C*KAP* PARALLEL DO | CMIC$ DO PARALLEL |
| End defined by loop scope | CMIC$ END DO |
| Specifying Synchronized Code Sections | |
| C*KAP* CRITICAL SECTION | CMIC$ GUARD |
| C*KAP* END CRITICAL SECTION | CMIC$ END GUARD |
| C*KAP* ONE PROCESSOR SECTION | |
| C*KAP* END ONE PROCESSOR SECTION | |
| Specifying Code Sections for Parallel Execution | |
| Equivalent coded with PARALLEL DO | CMIC$ CASE / CMIC$ END CASE |
| Controlling Subroutines Called Within Parallel Regions | |
| !*$* ASSERT CONCURRENT CALL | CMIC$ CONTINUE |
| Unstructured Exits from Parallel Region | |
| Not currently available | CMIC$ SOFT EXIT |
| Equivalent coded with a PARALLEL REGION with one loop (optimization performed by KAP) | CMIC$ DO ALL (end defined by loop) |
KAP assumes that the Fortran 90 code it processes conforms to the Fortran 90 rules. Programs that violate these rules may behave differently after KAP transforms them.
As an example, the following is a program in which the loop control variable is modified during execution of the loop:
      DIMENSION A(200)
      COMMON I
      J = 10
      K = 100
      L = 10
      DO I=J,K,L
        A(I) = I
        CALL SETI
      ENDDO
      WRITE (6,*) I,A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END
The output of this program is as follows:
30 30.00000
When the Compaq Fortran compiler does not know the value of the "increment" parameter of a DO loop, it always computes a loop iteration count, even if the value is subsequently determined by value propagation. Thus, if the program illegally modifies the loop control variable I, the loop will still complete in the correct number of iterations. In this case, no compilation time or run-time error message is given, even though the program violates the Fortran 77 standard and the Fortran 90 language rules.
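This behavior can be modeled with a short sketch (a hypothetical model of the generated code, not the output of any compiler, using the same J, K, L values as the example): the trip count is fixed before the loop starts, and I is incremented from whatever value it currently holds at the bottom of each iteration.

```python
# Model of the compiled loop: the trip count is computed once from the
# DO parameters, so modifying I inside the loop cannot change how many
# iterations run -- only what value I holds.
J, K, L = 10, 100, 10
A = [0.0] * 201                        # models A(200), 1-based like Fortran

trip_count = max(0, (K - J + L) // L)  # 10 iterations, fixed up front
I = J
for _ in range(trip_count):
    A[I] = float(I)                    # A(I) = I
    I = 20                             # CALL SETI clobbers I
    I += L                             # compiler-generated increment

print(I, A[I])   # 30 30.0 -- the same values the program prints
```

From the second iteration on, I is always 20 at the bottom of the loop and 30 at the top, yet the loop still executes exactly ten times, which is why the program prints 30 and A(30).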
KAP produces the following when this program is processed:
      DIMENSION A(200)
      COMMON I
      DO I=10,100,10
        A(I) = I
        CALL SETI
      END DO
      WRITE (6, *) I, A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END
This program incurs an access violation at run time. Because the Compaq Fortran compiler sees that the increment value is a constant, it uses I directly to test for loop termination. Because I is modified in the loop just before the test, the loop does not terminate, but at the top of the loop I is given a new value appropriate for the iteration number the loop has reached. Eventually I exceeds 200 and the assignment statement accesses past A(200), resulting in the access violation.
A wide variety of incorrect behaviors can result when KAP transforms "illegal" programs. This possibility (and others like it) must be considered when evaluating programs whose run-time behavior changes when KAP is used.
KAP can generate a listing file. By default, this file contains the original program with notes describing what KAP did with the program. Other information can be selected with the -listoptions command switch. See Chapter 10 for examples of the available information.
Source and transformed code listings include an Actions section noting what KAP did, or could not do, with each loop. The notations indicate which class of message was issued for the marked statement. The following section explains these classes.
The list of error and diagnostic messages that appears later in this appendix is organized by class, and within each class is sorted alphabetically.
These messages are written only to the listing file, and only when a program listing is selected (by default, or explicitly with -listoptions=o or =t). Syntax error and warning messages are written only as part of the original (-listoptions=o) code listing; if syntax errors or warnings are found, a brief note stating so is written to the error file.

The Loop Table requested with -listoptions=l has a status column with a short description of what KAP did with each loop, or why it could not optimize it. Section 10.1.4 contains the possible messages that can appear there, with a longer explanation of each.

In addition to the messages written to the listing file, a small number of messages can appear in the error file. Most of these are issued for conditions such as command switch errors or missing files, which prevent KAP from running at all. These messages are brief and self-explanatory.
The following lists classes of messages: