Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide



C.16 Avoiding External Routines: Temporary Storage

This example demonstrates three coding styles for temporary storage, one using the external routine omp_get_thread_num() and the other two using only directives.


        subroutine local_1a (n) 
           dimension a(100) 
           integer omp_get_thread_num   ! implicit typing would make it real 
           common /cmn/ t( 100, 0:7 )  ! assume 8 processors max. 
   c$omp parallel do 
   c$omp&   shared(a,t) 
   c$omp&   private(i) 
           do i = 1, n 
               do j = 1, n 
                   t(j, omp_get_thread_num()) = a(i) ** 2 
               enddo 
               call work( t(1,omp_get_thread_num()) ) 
           enddo 
           end 

If t is not global, then the above could be accomplished by putting t in the private clause:


           subroutine local_1b (n) 
           dimension t(100), a(100) 
 
   c$omp parallel do 
   c$omp&   shared(a) 
   c$omp&   private(i,t) 
           do i = 1, n 
               do j = 1, n 
                   t(j) = a(i) ** 2 
               enddo 
               call work( t ) 
           enddo 
           end 

If t is global, then the instance parallel and new directives can be used instead.


           subroutine local_1c (n) 
           dimension t(100), a(100) 
           common /cmn/ t 
   c$omp instance parallel (/cmn/) 
 
   c$omp parallel do 
   c$omp&   shared(a) 
   c$omp&   private(i) 
   c$omp new (/cmn/) 
           do i = 1, n 
               do j = 1, n 
                   t(j) = a(i) ** 2 
               enddo 
               call work   ! access t from common /cmn/ 
           enddo 
           end 

C.17 FIRSTPRIVATE: Copying in Initialization Values

Not all of the values of a and b are initialized in the loop before they are used (the rest of the values are produced by init_a and init_b). Using firstprivate for a and b causes the initialization values produced by init_a and init_b to be copied into private copies of a and b for use in the loops.


           subroutine dsq3_b (c,n) 
           integer n 
           real a(100), b(100), c(n,n), x, y 
           call init_a( a, n ) 
           call init_b( b, n ) 
   c$omp parallel do shared(c,n) private(i,j,x,y) firstprivate(a,b) 
           do i = 1, n 
               do j = 1, i 
                   a(j) = calc_a(i) 
                   b(j) = calc_b(i) 
               enddo 
               do j = 1, n 
                   x = a(i) - b(i) 
                   y = b(i) + a(i) 
                   c(j,i) = x * y 
               enddo 
           enddo 
   c$omp end parallel do 
           print *, x, y 
           end 
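
The copy-in behavior can be seen in isolation with a much smaller program. The following is a minimal sketch, not part of the original example: the program name fpdemo, the array size, and the initial values are all assumptions chosen for illustration, and the first loop stands in for init_a.

```fortran
      program fpdemo
      integer i, n
      parameter (n = 4)
      real a(n), c(n)
c     Fill a before the loop; this stands in for init_a.
      do i = 1, n
         a(i) = 10.0 * i
      enddo
c$omp parallel do shared(c) private(i) firstprivate(a)
      do i = 1, n
c        Overwrite one element of the private copy, then read it
c        together with an element that still holds its copied-in value.
         a(1) = real(i)
         c(i) = a(1) + a(n)
      enddo
c$omp end parallel do
      print *, c
      end
```

Each c(i) should come out as i + 40.0, because a(n) in every private copy starts with the value 40.0 assigned before the loop. With private(a) instead of firstprivate(a), a(n) would be undefined inside the loop.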

C.18 THREADPRIVATE: Copying in Initialization Values

This example is similar to the one in Section C.17, except that it uses threadprivate common blocks. With threadprivate, copyin is used instead of firstprivate to copy initialization values from the shared (master) copy of /blk/ to the private copies.


           subroutine dsq3_b_tc (c,n) 
           integer n 
           real a(100), b(100), c(n,n), x, y 
           common /blk/ a,b 
   c$omp threadprivate (/blk/) 
 
           call init_a( a, n ) 
           call init_b( b, n ) 
   c$omp parallel do shared(c,n) private(i,j,x,y) copyin(a,b) 
           do i = 1, n 
               do j = 1, i 
                   a(j) = calc_a(i) 
                   b(j) = calc_b(i) 
               enddo 
               do j = 1, n 
                   x = a(i) - b(i) 
                   y = b(i) + a(i) 
                   c(j,i) = x * y 
               enddo 
           enddo 
   c$omp end parallel do 
           print *, x, y 
           end 
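
A reduced version of the same idea is sketched below. It is an illustration only: the program name tpdemo, the array size, and the values are assumptions, and the whole common block is named in copyin rather than the individual variables.

```fortran
      program tpdemo
      integer i, n
      parameter (n = 4)
      real a(n), c(n)
      common /blk/ a
c$omp threadprivate (/blk/)
c     The serial part initializes the master copy of /blk/.
      do i = 1, n
         a(i) = 10.0 * i
      enddo
c$omp parallel do shared(c) private(i) copyin(/blk/)
      do i = 1, n
c        Each thread works on its own copy of /blk/, preloaded
c        from the master copy by copyin.
         a(1) = real(i)
         c(i) = a(1) + a(n)
      enddo
c$omp end parallel do
      print *, c
      end
```

As in the firstprivate case, each c(i) should equal i + 40.0, because a(n) in every thread's copy of /blk/ is loaded from the master copy on entry to the parallel loop.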

C.19 INSTANCE PARALLEL: Copying in Initialization Values

This example is similar to the one in Section C.17, except that it uses instance parallel privatizable common blocks. With instance parallel, copy new is used instead of firstprivate to privatize the common block and to copy initialization values from the shared (master) copy of /blk/ to the private copies.


           subroutine dsq3_b_ip (c,n) 
           integer n 
           real a(100), b(100), c(n,n), x, y 
           common /blk/ a,b 
   c$omp instance parallel (/blk/) 
 
           call init_a( a, n ) 
           call init_b( b, n ) 
   c$omp parallel do shared(c,n) private(i,j,x,y) 
   c$omp copy new (/blk/) 
           do i = 1, n 
               do j = 1, i 
                   a(j) = calc_a(i) 
                   b(j) = calc_b(i) 
               enddo 
               do j = 1, n 
                   x = a(i) - b(i) 
                   y = b(i) + a(i) 
                   c(j,i) = x * y 
               enddo 
           enddo 
   c$omp end parallel do 
           print *, x, y 
           end 


Appendix D
PCF Directives

Use Parallel Computing Forum (PCF) directives only with loops that are safe to parallelize. When Compaq KAP sees loops prefaced with PCF directives, it does not perform data dependence analysis and does not prevent you from using a parallel directive incorrectly.

Observe the following rules:

D.1 PARALLEL REGION Directive

The PARALLEL REGION and END PARALLEL REGION directives delineate where parallelism exists in the program. The following example shows the PARALLEL REGION directive syntax:


  C*KAP*          PARALLEL REGION 
  C*KAP*&            [ IF(logical expression) ] 
  C*KAP*&            [ SHARED(shared_name,...) ] 
  C*KAP*&            [ LOCAL(local_name,...) ] 
  C*KAP*          END PARALLEL REGION 

In the syntax example, local_name and shared_name are references to a variable or an array. If the IF clause logical expression evaluates to .FALSE., all of the code between PARALLEL REGION and END PARALLEL REGION executes on a single processor. If the logical expression evaluates to .TRUE., the code between the PARALLEL REGION and the corresponding END PARALLEL REGION may execute on multiple processors.

The SHARED and LOCAL lists on the PARALLEL REGION directive state the explicit forms of data sharing among the processors that execute the code inside the parallel region. When distinct processors reference the same variable or array from the SHARED list, the processors reference the same storage location. When distinct processors reference the same variable or array from the LOCAL list, the processors reference distinct storage locations.
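
The clauses described above might be combined as in the following sketch. The program name, the array sizes, and the threshold of 1000 in the IF clause are assumptions made for illustration; the clause layout follows the syntax summary. Because the directives are comment lines, the program also compiles and runs serially without KAP.

```fortran
      PROGRAM PRDEMO
      INTEGER NMAX, N, I
      PARAMETER (NMAX = 5000)
      REAL A(NMAX), B(NMAX)
      N = 2000
      DO 5 I = 1, N
         B(I) = REAL(I)
    5 CONTINUE
C     Go parallel only when the trip count is large (threshold assumed).
C*KAP* PARALLEL REGION
C*KAP*&   IF(N .GT. 1000)
C*KAP*&   SHARED(A,B,N)
C*KAP*&   LOCAL(I)
C*KAP* PARALLEL DO
      DO 10 I = 1, N
         A(I) = 2.0 * B(I)
   10 CONTINUE
C*KAP* END PARALLEL REGION
      WRITE (6,*) A(1), A(N)
      END
```

Here A, B, and N are shared because every processor reads or writes the same storage, while the loop index I is local so that each processor has its own copy.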

D.2 PARALLEL DO Directive

The PARALLEL DO directive tells KAP the next statement begins an iterative DO loop that can be executed using multiple processors. Each processor applied to the DO loop can execute one or more iterations. The following syntax example shows the PARALLEL DO directive inside a PARALLEL REGION.


  C*KAP*  PARALLEL REGION 
  C*KAP*&    [ IF(logical expression) ] 
  C*KAP*&    [ SHARED(shared_name,...) ] 
  C*KAP*&    [ LOCAL(local_name,...) ] 
  C*KAP*          PARALLEL DO 
  C*KAP*&          [ STATIC ] 
  C*KAP*&          [ LAST LOCAL(local_name,...) ] 
  C*KAP*&          [ BLOCKED [ (integer constant expression) ] ] 
  C*KAP* END PARALLEL REGION 

In the syntax example, LAST LOCAL(local_name) creates a local copy of the variable local_name for use during execution of the PARALLEL DO loop. During the last PARALLEL DO loop iteration, the final value of local_name is copied into the identically named variable listed in SHARED(shared_name) on the enclosing PARALLEL REGION. For example, the final value of the x in LAST LOCAL(x) is copied into the x in SHARED(x), as follows:


  C*KAP*  PARALLEL REGION 
  C*KAP*&    SHARED(x) 
  C*KAP*          PARALLEL DO 
  C*KAP*&          LAST LOCAL(x) 
                   DO 10 
                   . 
                   . 
                   . 
     10            CONTINUE 
  C*KAP*  END PARALLEL REGION 

Because the LAST LOCAL() variable in the PARALLEL DO implies the SHARED() variable in the PARALLEL REGION, it is legal to list the same variable in both PCF directives.

If BLOCKED(integer constant expression) is specified in a PARALLEL DO, loop iterations are assigned to run-time "workers" in blocks of that size. A "worker" is a logical processor, that is, a processor, a process, or a thread. If BLOCKED is omitted in a PARALLEL DO directive, the default is even scheduling, where loop iterations are evenly divided among run-time "workers." If BLOCKED is specified without a number, the default block size is 1.
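
The following sketch shows BLOCKED with an explicit block size. The program name, the loop length of 16, and the block size of 4 are assumptions. The loop body records a block number for each iteration on the further assumption that blocks are contiguous runs of iterations; the text above does not say which worker receives which block, so the sketch does not try to show that.

```fortran
      PROGRAM BLKDEM
      INTEGER N, NB, I
      PARAMETER (N = 16, NB = 4)
      INTEGER IBLK(N)
C*KAP* PARALLEL REGION SHARED(IBLK) LOCAL(I)
C*KAP* PARALLEL DO
C*KAP*&   BLOCKED(4)
      DO 10 I = 1, N
C        Assumed contiguous blocks: 1-4 -> 1, 5-8 -> 2, and so on.
         IBLK(I) = (I - 1) / NB + 1
   10 CONTINUE
C*KAP* END PARALLEL REGION
      WRITE (6,*) IBLK
      END
```

With 16 iterations and a block size of 4 there are four blocks to hand out. A smaller block size gives finer-grained distribution of work at the cost of more scheduling steps.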

D.3 DO Loop Example with PCF Directives

The following example shows the use of the PARALLEL REGION and the PARALLEL DO directives in a simple loop:


  C*KAP* PARALLEL REGION 
  C*KAP*& SHARED(A,B,C) LOCAL(I) 
  C*KAP* PARALLEL DO 
        do 10 i=1,n 
            a(i) = b(i) * c(i) 
     10 continue 
  C*KAP* END PARALLEL REGION 

D.4 Program Example with PCF Directives

The following program example shows the use of the PARALLEL REGION and the PARALLEL DO directives:


           PROGRAM ATIMESB 
 
           PARAMETER M=512, N=512, P=512 
 
           REAL time1,time2 
           REAL*8 A,B,C 
           DIMENSION A(1:M,1:N), B(1:N,1:P), C(1:M,1:P) 
 
   C Initialize the matrices 
 
   C*KAP* PARALLEL REGION SHARED (A,B) LOCAL (J,I) 
   C*KAP* PARALLEL DO 
 
           DO 10 J=1,N 
             DO 10 I=1,M 
             A(I,J) = 1.5 
      10   CONTINUE 
 
   C*KAP* PARALLEL DO 
 
           DO 20 J=1,P 
             DO 20 I=1,N 
             B(I,J) = 3.0 
      20   CONTINUE 
 
   C*KAP* END PARALLEL REGION 
 
   C Compute C = A * B 
 
           CALL CSETTIME() 
           time1 = CTIMEC() 
           CALL MATMUL(A, M, B, N, C, P) 
           time2 = CTIMEC() 
           write(*,*)'elapsed time in seconds is:',(time2-time1) 
           END 
 
           SUBROUTINE MATMUL(A, LDA, B, LDB, C, LL) 
           REAL*8 A(LDA,LDB), B(LDB,LL), C(LDA,LL) 
           INTEGER LDA,LDB,LL 
 
   C*KAP* PARALLEL REGION SHARED (A,LDA,B,LDB,C,LL) LOCAL (J,K,I) 
 
   C*KAP* PARALLEL DO 
           DO 20 J=1,LL 
             DO 20 I=1,LDA 
             C(I,J) =0.0 
               DO 20 K=1,LDB 
               C(I,J) = C(I,J) + ( A(I,K) * B(K,J) ) 
      20   CONTINUE 
 
   C*KAP* END PARALLEL REGION 
           RETURN 
 
           END 

D.5 CRITICAL SECTION Directive

The CRITICAL SECTION and END CRITICAL SECTION directives define the scope of a critical section. Exactly one logical processor at a time is allowed inside a CRITICAL SECTION. This construct must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION.
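
A typical use is protecting an update to a shared variable. The sketch below is illustrative only: the program and variable names are assumptions, and it assumes that a CRITICAL SECTION may be placed in the body of a PARALLEL DO loop inside the region.

```fortran
      PROGRAM CRITDM
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), TOTAL
      DO 5 I = 1, N
         A(I) = 1.0
    5 CONTINUE
      TOTAL = 0.0
C*KAP* PARALLEL REGION SHARED(A,TOTAL) LOCAL(I)
C*KAP* PARALLEL DO
      DO 10 I = 1, N
C        Only one logical processor at a time updates TOTAL.
C*KAP* CRITICAL SECTION
         TOTAL = TOTAL + A(I)
C*KAP* END CRITICAL SECTION
   10 CONTINUE
C*KAP* END PARALLEL REGION
      WRITE (6,*) TOTAL
      END
```

Without the critical section, two processors could read the same old value of TOTAL and one of the updates would be lost.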

D.6 ONE PROCESSOR SECTION Directive

The ONE PROCESSOR SECTION and END ONE PROCESSOR SECTION directives define the scope of a section of code where exactly one processor is allowed to execute the code. This directive must be coded lexically inside a PARALLEL REGION and END PARALLEL REGION.
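
The sketch below uses the section for a one-time step inside a region. The program name and the counter NSETUP are assumptions. The loop that follows deliberately does not depend on the section's result, because this guide does not state whether the other processors wait at END ONE PROCESSOR SECTION.

```fortran
      PROGRAM ONEPDM
      INTEGER N, I, NSETUP
      PARAMETER (N = 8)
      REAL A(N)
      NSETUP = 0
C*KAP* PARALLEL REGION SHARED(A,NSETUP) LOCAL(I)
C*KAP* ONE PROCESSOR SECTION
C     Runs once, no matter how many processors enter the region.
      NSETUP = NSETUP + 1
C*KAP* END ONE PROCESSOR SECTION
C*KAP* PARALLEL DO
      DO 10 I = 1, N
         A(I) = REAL(I)
   10 CONTINUE
C*KAP* END PARALLEL REGION
      WRITE (6,*) NSETUP, A(N)
      END
```

After the region, NSETUP should be 1 regardless of how many processors executed the region.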

D.7 Comparison of KAP PCF and Cray Autotasking Directives

If you formerly used Cray autotasking to perform parallel decomposition, you can substitute KAP PCF directives, as shown in the following table.

KAP Parallel Computing Forum            Cray Autotasking
--------------------------------------  ------------------------------------

Specifying Regions of Parallel Execution
  C*KAP* PARALLEL REGION                CMIC$ PARALLEL
  C*KAP* END PARALLEL REGION            CMIC$ END PARALLEL

Specifying Parallel Loops
  C*KAP* PARALLEL DO                    CMIC$ DO PARALLEL
  End defined by loop scope             CMIC$ END DO

Specifying Synchronized Code Sections
  C*KAP* CRITICAL SECTION               CMIC$ GUARD
  C*KAP* END CRITICAL SECTION           CMIC$ END GUARD
  C*KAP* ONE PROCESSOR SECTION
  C*KAP* END ONE PROCESSOR SECTION

Specifying Code Sections for Parallel Execution
  Equivalent coded with PARALLEL DO     CMIC$ END CASE

Controlling Subroutines Called Within Parallel Regions
  !*$* ASSERT CONCURRENT CALL           CMIC$ CONTINUE

Unstructured Exits from Parallel Region
  Not available currently               CMIC$ SOFT EXIT

  Equivalent coded with PARALLEL        CMIC$ DO ALL
  REGION with one loop optimization     (End defined by loop.)
  performed by KAP
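
As a rough illustration of the substitution, the following sketch writes the same loop twice, once with Cray Autotasking directives and once with KAP PCF directives. The program name and data are assumptions, and the SHARED and PRIVATE clauses on CMIC$ PARALLEL are written from memory of the Cray syntax rather than taken from this guide, so check them against the Cray documentation. Both sets of directives are comment lines to a compiler that does not recognize them.

```fortran
      PROGRAM XLATE
      INTEGER N, I
      PARAMETER (N = 8)
      REAL A(N), B(N)
      DO 5 I = 1, N
         B(I) = REAL(I)
    5 CONTINUE
C     Cray Autotasking form (clause spelling is an assumption).
CMIC$ PARALLEL SHARED(A,B) PRIVATE(I)
CMIC$ DO PARALLEL
      DO 10 I = 1, N
         A(I) = B(I) + 1.0
   10 CONTINUE
CMIC$ END DO
CMIC$ END PARALLEL
C     KAP PCF form of the same loop; the loop end closes PARALLEL DO.
C*KAP* PARALLEL REGION SHARED(A,B) LOCAL(I)
C*KAP* PARALLEL DO
      DO 20 I = 1, N
         A(I) = B(I) + 1.0
   20 CONTINUE
C*KAP* END PARALLEL REGION
      WRITE (6,*) A
      END
```

The main structural difference is that the KAP PARALLEL DO has no closing directive of its own; the end of the DO loop serves that purpose.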


Appendix E
KAP and Incorrect Programs

KAP users should be aware that KAP assumes that the Fortran 90 code it processes conforms to the Fortran 90 rules. Programs that violate these rules may behave differently after KAP transforms them.

As an example, the following is a program in which the loop control variable is modified during execution of the loop:


      DIMENSION A(200)
      COMMON I
      J = 10
      K = 100
      L = 10

      DO I=J,K,L
         A(I) = I
         CALL SETI
      ENDDO

      WRITE (6,*) I,A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END
 

The output of this program is as follows:


30  30.00000 

When the Compaq Fortran compiler does not know the value of the "increment" parameter of a DO loop, it always computes a loop iteration count, even if the value is subsequently determined by value propagation. Thus, if the program illegally modifies the loop control variable I, the loop will still complete in the correct number of iterations. In this case, no compilation time or run-time error message is given, even though the program violates the Fortran 77 standard and the Fortran 90 language rules.
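
The count-driven behavior can be reproduced with a legal program. The sketch below is an assumption-laden emulation, not compiler output: it computes the iteration count once on entry using the Fortran standard's formula, MAX(INT((m2 - m1 + m3) / m3), 0), and then lets a separate counter ITER control the loop while I is overwritten in the body the way SETI does. The names TRIPCT, NITER, and ITER are hypothetical.

```fortran
      PROGRAM TRIPCT
      INTEGER I, J, K, L, NITER, ITER
      REAL A(200)
      J = 10
      K = 100
      L = 10
C     Iteration count is fixed on entry: (100 - 10 + 10) / 10.
      NITER = MAX((K - J + L) / L, 0)
      I = J
      DO ITER = 1, NITER
         A(I) = I
C        Stands in for the illegal CALL SETI.
         I = 20
C        The loop increment is still applied to I.
         I = I + L
      ENDDO
      WRITE (6,*) NITER, I, A(I)
      END
```

The loop runs NITER = 10 times no matter what happens to I, and it finishes with I equal to 30 and A(30) equal to 30.0, which matches the output shown above for the original program.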

KAP produces the following when this program is processed:


      DIMENSION A(200)
      COMMON I

      DO I=10,100,10
         A(I) = I
         CALL SETI
      END DO

      WRITE (6, *) I, A(I)
      END

      SUBROUTINE SETI
      COMMON I
      I = 20
      END

This program incurs an access violation at run time. Because the Compaq Fortran compiler sees that the increment value is a constant, it uses I directly to test for loop termination. Because I is modified in the loop just before the test, the loop does not terminate, but at the top of the loop I is given a new value appropriate for the iteration number the loop has reached. Eventually I exceeds 200 and the assignment statement accesses past A(200), resulting in the access violation.

A wide variety of incorrect behaviors can result from KAP transformations with "illegal" programs. This possibility (and others of like nature) will need to be considered when evaluating programs whose run-time behavior changes when KAP is used.


Appendix F
Listing File Messages

KAP can generate a listing file. By default, this file contains the original program with notes concerning what KAP did with the program. Other information can be selected with the -listoptions command switch. See Chapter 10 for examples of the available information.

Source and transformed code listings include an Actions section noting what KAP did, or could not do, with each loop. The notations indicate which class of message was issued for the marked statement. The following section explains these classes.

The list of error and diagnostic messages that appears later in this appendix is organized by class, and within each class is sorted alphabetically.

Note

These messages are written only to the listing file, and only when a program listing is selected (by default, or explicitly with -listoptions=o or =t). In addition, syntax error and warning messages are written only as part of the original (-listoptions=o) code listing. If syntax errors or warnings are found, a brief note stating that syntax errors or warnings were found will be written to the error file.

The Loop Table requested with -listoptions=l has a status column with a short description of what KAP did with the loop, or why it could not optimize it. Section 10.1.4 contains the possible messages that can appear there, with a longer explanation of each.

In addition to the messages written to the listing file, there is a small number of messages that can appear in the error file. Most of these are issued for conditions like command switch errors or missing files, which prevent KAP from running at all. These messages are brief and self-explanatory.

F.1 Classes of Messages

The following lists classes of messages:

