This example demonstrates three coding styles for temporary storage: one uses the external routine omp_get_thread_num(), and the other two use only directives.
      subroutine local_1a (n)
      dimension a(100)
      common /cmn/ t( 100, 0:7 )    ! assume 8 processors max.
c$omp parallel do
c$omp&  shared(a,t)
c$omp&  private(i)
      do i = 1, n
        do j = 1, n
          t(j, omp_get_thread_num()) = a(i) ** 2
        enddo
        call work( t(1,omp_get_thread_num()) )
      enddo
      end
If t is not global, then the above could be accomplished by putting t in the private clause:
      subroutine local_1b (n)
      dimension a(100), t(100)
c$omp parallel do
c$omp&  shared(a)
c$omp&  private(i,t)
      do i = 1, n
        do j = 1, n
          t(j) = a(i) ** 2
        enddo
        call work( t )
      enddo
      end
If t is global, then the instance parallel and new directives can be used instead.
      subroutine local_1c (n)
      dimension a(100), t(100)
      common /cmn/ t
c$omp instance parallel (/cmn/)
c$omp parallel do
c$omp&  shared(a)
c$omp&  private(i)
c$omp new (/cmn/)
      do i = 1, n
        do j = 1, n
          t(j) = a(i) ** 2
        enddo
        call work        ! access t from common /cmn/
      enddo
      end
Not all of the values of a and b are initialized in the loop before they are used (the rest of the values are produced by init_a and init_b). Using firstprivate for a and b causes the initialization values produced by init_a and init_b to be copied into private copies of a and b for use in the loops.
      subroutine dsq3_b (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y) firstprivate(a,b)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Similar to Section C.17 except using threadprivate common blocks. For threadprivate, copyin is used instead of firstprivate to copy initialization values from the shared (master) copy of /blk/ to the private copies.
      subroutine dsq3_b_tc (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      common /blk/ a,b
c$omp threadprivate (/blk/)
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y) copyin(a,b)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Similar to Section C.17 except using instance parallel privatizable common blocks. For instance parallel, copy new is used instead of firstprivate to privatize the common block and to copy initialization values from the shared (master) copy of /blk/ to the private copies.
      subroutine dsq3_b_ip (c,n)
      integer n
      real a(100), b(100), c(n,n), x, y
      common /blk/ a,b
c$omp instance parallel (/blk/)
      call init_a( a, n )
      call init_b( b, n )
c$omp parallel do shared(c,n) private(i,j,x,y)
c$omp copy new (/blk/)
      do i = 1, n
        do j = 1, i
          a(j) = calc_a(i)
          b(j) = calc_b(i)
        enddo
        do j = 1, n
          x = a(i) - b(i)
          y = b(i) + a(i)
          c(j,i) = x * y
        enddo
      enddo
c$omp end parallel do
      print *, x, y
      end
Use Parallel Computing Forum (PCF) directives only with loops that are safe to parallelize. When Compaq KAP sees loops prefaced with PCF directives, it does not perform data dependence analysis and does not prevent you from using a parallel directive incorrectly.
Observe the rules described in the following sections.
The PARALLEL REGION and END PARALLEL REGION directives delineate where parallelism exists in the program. The following example shows the PARALLEL REGION directive syntax:
C*KAP* PARALLEL REGION
C*KAP*& [ IF(logical expression) ]
C*KAP*& [ SHARED(shared_name,...) ]
C*KAP*& [ LOCAL(local_name,...) ]

C*KAP* END PARALLEL REGION
In the syntax example, local_name and shared_name are references to a variable or an array. If the IF clause logical expression evaluates to .FALSE., all of the code between PARALLEL REGION and END PARALLEL REGION executes on a single processor. If the logical expression evaluates to .TRUE., the code between the PARALLEL REGION and the corresponding END PARALLEL REGION may execute on multiple processors.
The SHARED and LOCAL lists on the PARALLEL REGION directive state the explicit forms of data sharing among the processors that execute the code inside the parallel region. When distinct processors reference the same variable or array from the SHARED list, they reference the same storage location. When distinct processors reference the same variable or array from the LOCAL list, they reference distinct storage locations.
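The sharing rules can be sketched outside Fortran. The following Python sketch (illustrative only; the names and the worker model are assumptions, not part of KAP) models a SHARED variable as one storage cell visible to every worker and a LOCAL variable as a separate cell per worker:

```python
# Model: SHARED = one storage cell for all workers;
# LOCAL = a distinct cell per worker (illustrative only).

NUM_WORKERS = 4

shared = {"total": 0}                                  # one location for everyone
local = [{"scratch": 0} for _ in range(NUM_WORKERS)]   # one location per worker

def worker(wid, iterations):
    for i in iterations:
        local[wid]["scratch"] = i * i                  # LOCAL: invisible to other workers
        shared["total"] += local[wid]["scratch"]       # SHARED: all updates hit one cell

# Hand iterations 0..7 to the 4 workers round-robin and run them.
for wid in range(NUM_WORKERS):
    worker(wid, range(wid, 8, NUM_WORKERS))

print(shared["total"])                 # 140 = 0+1+4+9+16+25+36+49
print([d["scratch"] for d in local])   # each worker kept only its own last square
```

After the run, every worker's contribution has landed in the single shared cell, while each worker's scratch cell still holds only the value that worker last wrote.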
D.2 PARALLEL DO Directive
The PARALLEL DO directive tells KAP that the next statement begins an iterative DO loop that can be executed using multiple processors. Each processor applied to the DO loop can execute one or more iterations. The following syntax example shows the PARALLEL DO directive inside a PARALLEL REGION:
C*KAP* PARALLEL REGION
C*KAP*& [ IF(logical expression) ]
C*KAP*& [ SHARED(shared_name,...) ]
C*KAP*& [ LOCAL(local_name,...) ]
C*KAP* PARALLEL DO
C*KAP*& [ STATIC ]
C*KAP*& [ LAST LOCAL(local_name,...) ]
C*KAP*& [ BLOCKED [ (integer constant expression) ] ]

C*KAP* END PARALLEL REGION
In the syntax example, LAST LOCAL(local_name) declares local_name as a private variable for use during execution of the PARALLEL DO loop. During the last PARALLEL DO loop iteration, the final value of local_name is copied into the identically named variable declared by SHARED(shared_name) in the enclosing PARALLEL REGION. For example, the final value of the LAST LOCAL(x) variable would be copied into the SHARED(x) variable, as follows:
C*KAP* PARALLEL REGION
C*KAP*& SHARED(x)
C*KAP* PARALLEL DO
C*KAP*& LAST LOCAL(x)
      DO 10 . . .
 10   CONTINUE
C*KAP* END PARALLEL REGION
Because the LAST LOCAL() variable inside the PARALLEL DO implies the SHARED() variable inside the PARALLEL REGION, it is legal to use both PCF directives for the same name.
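The copy-out behavior can be modeled in a few lines of Python (a sketch using a sequential model of the loop; parallel_do_last_local is an illustrative name, not a KAP routine):

```python
# Sketch of LAST LOCAL semantics (illustrative, not KAP itself):
# each iteration computes with its own private x; after the loop, the
# private value from the sequentially *last* iteration is copied to
# the shared x.

def parallel_do_last_local(iterations, body):
    """Run body(i) for each i; return the private value from the
    sequentially last iteration, as LAST LOCAL would."""
    last_value = None
    for i in iterations:          # sequential model of the parallel loop
        last_value = body(i)      # each call's result plays the private x
    return last_value             # copied out to the shared x

shared_x = parallel_do_last_local(range(1, 11), lambda i: i * i)
print(shared_x)   # value of x from the last iteration, i=10: 100
```

Whichever worker happens to execute the last iteration, it is that iteration's private value that defines the shared variable after the loop.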
If BLOCKED [ (integer constant expression) ] is specified in a PARALLEL DO, loop iterations are assigned to run-time "workers" in blocks of that size. (A "worker" is a logical processor: a processor, a process, or a thread.) If BLOCKED is omitted from a PARALLEL DO directive, the default is even scheduling, where loop iterations are divided evenly among the run-time workers. If BLOCKED is specified without a number, the default block size is 1.
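The two policies can be sketched as iteration-assignment functions. The following Python sketch is an illustrative model only (the function names are made up, and the round-robin block placement is an assumption about how a blocked scheduler typically deals out chunks):

```python
# Sketch of the two scheduling policies (illustrative model, not KAP):
# BLOCKED(b): hand out iterations to workers round-robin in chunks of b.
# Default (even): give each worker one contiguous, nearly equal piece.

def blocked_schedule(n_iters, n_workers, block):
    """Deal chunks of `block` consecutive iterations to workers round-robin."""
    assign = [[] for _ in range(n_workers)]
    for chunk_start in range(0, n_iters, block):
        w = (chunk_start // block) % n_workers
        assign[w].extend(range(chunk_start, min(chunk_start + block, n_iters)))
    return assign

def even_schedule(n_iters, n_workers):
    """Split the iteration space into n_workers contiguous, nearly equal pieces."""
    base, extra = divmod(n_iters, n_workers)
    assign, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        assign.append(list(range(start, start + size)))
        start += size
    return assign

print(blocked_schedule(8, 2, 2))   # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(even_schedule(8, 2))         # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With 8 iterations and 2 workers, BLOCKED(2) interleaves pairs of iterations across the workers, while even scheduling gives each worker one contiguous half of the iteration space.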
D.3 DO Loop Example with PCF Directives
The following example shows the use of the PARALLEL REGION and the PARALLEL DO directives in a simple loop:
C*KAP* PARALLEL REGION
C*KAP*& SHARED(A,B,C) LOCAL(I)
C*KAP* PARALLEL DO
      do 10 i=1,n
        a(i) = b(i) * c(i)
 10   continue
C*KAP* END PARALLEL REGION
The following program example shows the use of the PARALLEL REGION and the PARALLEL DO directives:
      PROGRAM ATIMESB
      PARAMETER (M=512, N=512, P=512)
      REAL time1,time2
      REAL*8 A,B,C
      DIMENSION A(1:M,1:N), B(1:N,1:P), C(1:M,1:P)
C     Initialize the matrices
C*KAP* PARALLEL REGION SHARED (A,B) LOCAL (J,I)
C*KAP* PARALLEL DO
      DO 10 J=1,N
      DO 10 I=1,M
        A(I,J) = 1.5
 10   CONTINUE
C*KAP* PARALLEL DO
      DO 20 J=1,P
      DO 20 I=1,N
        B(I,J) = 3.0
 20   CONTINUE
C*KAP* END PARALLEL REGION
C     Compute C = A * B
      CALL CSETTIME()
      time1 = CTIMEC()
      CALL MATMUL(A, M, B, N, C, P)
      time2 = CTIMEC()
      write(*,*) 'elapsed time in seconds is:', (time2-time1)
      END

      SUBROUTINE MATMUL(A, LDA, B, LDB, C, LL)
      INTEGER LDA,LDB,LL
      REAL*8 A(LDA,LDB), B(LDB,LL), C(LDA,LL)
C*KAP* PARALLEL REGION SHARED (A,LDA,B,LDB,C,LL) LOCAL (J,K,I)
C*KAP* PARALLEL DO
      DO 20 J=1,LL
      DO 20 I=1,LDA
        C(I,J) = 0.0
        DO 20 K=1,LDB
          C(I,J) = C(I,J) + ( A(I,K) * B(K,J) )
 20   CONTINUE
C*KAP* END PARALLEL REGION
      RETURN
      END
The CRITICAL SECTION and END CRITICAL SECTION directives define the scope of a critical section. Exactly one logical processor at a time is allowed inside a critical section. This construct must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION pair.
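The guarantee a critical section provides can be sketched with Python threads, where a lock plays the role of the directive pair (an illustrative analogy, not code KAP generates):

```python
# Sketch of a critical section: the lock admits exactly one thread
# at a time to the protected update, so no increments are lost.
import threading

counter = 0
lock = threading.Lock()          # plays the role of CRITICAL SECTION

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # exactly one thread at a time in here
            counter += 1         # the protected shared update

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000: all 4 x 10000 updates survive
```

Without the lock, concurrent read-modify-write sequences could interleave and drop updates; the critical section serializes just the protected statements while the rest of the loop runs in parallel.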
D.6 ONE PROCESSOR SECTION Directive
The ONE PROCESSOR SECTION and END ONE PROCESSOR SECTION directives define the scope of a section of code that exactly one processor is allowed to execute. This directive must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION pair.
D.7 Comparison of KAP PCF and Cray Autotasking Directives
If you formerly used Cray autotasking to perform parallel decomposition, you can substitute KAP PCF directives, as shown in the following table.
| KAP Parallel Computing Forum | Cray Autotasking |
|---|---|
| Specifying Regions of Parallel Execution | |
| C*KAP* PARALLEL REGION | CMIC$ PARALLEL |
| C*KAP* END PARALLEL REGION | CMIC$ END PARALLEL |
| Specifying Parallel Loops | |
| C*KAP* PARALLEL DO | CMIC$ DO PARALLEL |
| End defined by loop scope | CMIC$ END DO |
| Specifying Synchronized Code Sections | |
| C*KAP* CRITICAL SECTION | CMIC$ GUARD |
| C*KAP* END CRITICAL SECTION | CMIC$ END GUARD |
| C*KAP* ONE PROCESSOR SECTION | |
| C*KAP* END ONE PROCESSOR SECTION | |
| Specifying Code Sections for Parallel Execution | |
| Equivalent coded with PARALLEL DO | CMIC$ CASE / CMIC$ END CASE |
| Controlling Subroutines Called Within Parallel Regions | |
| !*$* ASSERT CONCURRENT CALL | CMIC$ CONTINUE |
| Unstructured Exits from Parallel Region | |
| Not currently available | CMIC$ SOFT EXIT |
| Equivalent coded with a PARALLEL REGION with one loop (optimization performed by KAP) | CMIC$ DO ALL (end defined by loop) |
KAP assumes that the Fortran 90 code it processes conforms to the Fortran 90 rules. Programs that violate these rules may behave differently after KAP transforms them.
As an example, the following is a program in which the loop control variable is modified during execution of the loop:
      DIMENSION A(200)
      COMMON I
      J = 10
      K = 100
      L = 10
      DO I=J,K,L
        A(I) = I
        CALL SETI
      ENDDO
      WRITE (6,*) I,A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END
The output of this program is as follows:
30 30.00000
When the Compaq Fortran compiler does not know the value of the "increment" parameter of a DO loop, it always computes a loop iteration count, even if the value is subsequently determined by value propagation. Thus, if the program illegally modifies the loop control variable I, the loop will still complete in the correct number of iterations. In this case, no compilation time or run-time error message is given, even though the program violates the Fortran 77 standard and the Fortran 90 language rules.
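This behavior can be modeled with a short sketch (a hypothetical model of the generated code, not the output of any compiler, using the same J, K, L values as the example): the trip count is fixed before the loop starts, and I is incremented from whatever value it currently holds at the bottom of each iteration.

```python
# Model of the compiled loop: the trip count is computed once from the
# DO parameters, so modifying I inside the loop cannot change how many
# iterations run -- only what value I holds.
J, K, L = 10, 100, 10
A = [0.0] * 201                        # models A(200), 1-based like Fortran

trip_count = max(0, (K - J + L) // L)  # 10 iterations, fixed up front
I = J
for _ in range(trip_count):
    A[I] = float(I)                    # A(I) = I
    I = 20                             # CALL SETI clobbers I
    I += L                             # compiler-generated increment

print(I, A[I])   # 30 30.0 -- the same values the program prints
```

From the second iteration on, I is always 20 at the bottom of the loop and 30 at the top, yet the loop still executes exactly ten times, which is why the program prints 30 and A(30).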
KAP produces the following when this program is processed:
      DIMENSION A(200)
      COMMON I
      DO I=10,100,10
        A(I) = I
        CALL SETI
      END DO
      WRITE (6, *) I, A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END
This program incurs an access violation at run time. Because the Compaq Fortran compiler sees that the increment value is a constant, it uses I directly to test for loop termination. Because I is modified in the loop just before the test, the loop does not terminate, but at the top of the loop I is given a new value appropriate for the iteration number the loop has reached. Eventually I exceeds 200 and the assignment statement accesses past A(200), resulting in the access violation.
A wide variety of incorrect behaviors can result when KAP transforms "illegal" programs. This possibility (and others like it) must be considered when evaluating programs whose run-time behavior changes when KAP is used.
KAP can generate a listing file. By default, this file contains the original program with notes describing what KAP did with the program. Other information can be selected with the -listoptions command switch. See Chapter 10 for examples of the available information.
Source and transformed code listings include an Actions section noting what KAP did, or could not do, with each loop. The notations indicate which class of message was issued for the marked statement. The following section explains these classes.
The list of error and diagnostic messages that appears later in this appendix is organized by class, and within each class is sorted alphabetically.
These messages are written only to the listing file, and only when a program listing is selected (by default, or explicitly with -listoptions=o or =t). Syntax error and warning messages are written only as part of the original (-listoptions=o) code listing; if syntax errors or warnings are found, a brief note stating so is written to the error file.

The Loop Table requested with -listoptions=l has a status column with a short description of what KAP did with each loop, or why it could not optimize it. Section 10.1.4 contains the possible messages that can appear there, with a longer explanation of each.

In addition to the messages written to the listing file, a small number of messages can appear in the error file. Most of these are issued for conditions such as command switch errors or missing files, which prevent KAP from running at all. These messages are brief and self-explanatory.
The following lists classes of messages: