Compaq Fortran
Release Notes for Compaq Tru64 UNIX Systems

1.7.2 Version 5.2 New Features

The following new features are now supported:
- The f90 compiler now gives "uninitialized variable" warnings at optimization levels lower than -O4.
- The RTL now supports handling units *, 5, and 6 as separate units. Use of this feature requires both RTL and compiler support: programs must be compiled with a version of the compiler that implements this support and must be linked with, or use, a shareable RTL that implements it. Older existing images will continue to work with the newer RTL. As a consequence of separating the units, if you connect unit 6 to a file and then write to unit *, that write produces output on the console (or stdout device). Previously, a write to unit * would go to the same file connected to unit 6. This new behavior is consistent with that of VMS and MS-FPS.
- For F90, a Namelist input group can start with either an ampersand (&) or a dollar sign ($) in any column and can be terminated by a slash (/), an ampersand (&), or a dollar sign ($) in any column.
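The separation of units * and 6 described above can be illustrated with a minimal sketch. The program name and the file name unit6.out are hypothetical, chosen only for this example; the behavior noted in the comments is the one these release notes describe for a Version 5.2 compiler and RTL, and other compilers or older RTLs may send both writes to the same place.

```fortran
PROGRAM unit_star_demo
  IMPLICIT NONE
  ! 'unit6.out' is a hypothetical file name used only in this sketch.
  OPEN (UNIT=6, FILE='unit6.out', STATUS='REPLACE')
  ! Unit 6 is now connected to the file, so this record goes to unit6.out.
  WRITE (6, *) 'written to unit 6'
  ! With units * and 6 handled separately (as described above), this
  ! record goes to the console (stdout) rather than to unit6.out.
  WRITE (*, *) 'written to unit *'
  CLOSE (UNIT=6)
END PROGRAM unit_star_demo
```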
The DIGITAL Extended Math Library (DXML) routines are now included in the Compaq (DIGITAL) Fortran kit.
The following new f90 command options are now supported:
- -assume gfullpath causes the full source file path to be included in the debug information. The default is -assume nogfullpath .
- -assume [no]pthreads_lock allows the user to select the kind of locking used for an unnamed critical section (when parallel processing is requested with -mp or -omp ). Using the default, -assume nopthreads_lock , provides the fastest performance by providing a single lock for all unnamed critical sections (but does not lock out other process threads).
  To request more restrictive locking, specify -assume pthreads_lock . This locks out all other process threads in addition to all critical sections, which slows application performance.
  When using -assume nopthreads_lock (default), enter critical is used with the _OtsGlobalLock argument. With -assume pthreads_lock , enter critical is used with the _OtsPthreadLock argument.
- -arch ev6 generates instructions for ev6 processors (21264 chips). This option permits the compiler to generate any EV6 instruction, including instructions contained in the BWX (Byte/Word manipulation instructions) or MAX (Multimedia instructions) extension, square root and floating-point convert, and count extension. Applications compiled with this option may incur emulation overhead on ev4, ev5, ev56, and pca56 processors, but will still run correctly.
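As an illustration of the kind of unnamed critical section that -assume [no]pthreads_lock governs, here is a minimal sketch, assuming the program is compiled with -omp. The program and variable names are hypothetical. The source is the same under either setting; only the lock used to enter the critical section differs.

```fortran
PROGRAM critical_demo
  IMPLICIT NONE
  INTEGER :: i, total
  total = 0
!$OMP PARALLEL DO
  DO i = 1, 100
    ! Unnamed critical section: only one thread at a time updates total.
!$OMP CRITICAL
    total = total + i
!$OMP END CRITICAL
  END DO
!$OMP END PARALLEL DO
  ! The sum of 1..100 is 5050 regardless of the number of threads.
  PRINT *, 'total =', total
END PROGRAM critical_demo
```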

1.7.3 Version 5.2 Important Information

Some important information to note about this release:

UNIX Virtual Memory from the Compaq Tru64 UNIX docset
There is a new manual in V4.0D of the docset: "System Configuration and Tuning". Section 4.7.3 from that book is "Increasing the Available Address Space".
If your applications are memory-intensive, you may want to increase the available address space. Increasing the address space will cause only a small increase in the demand for memory. However, you may not want to increase the address space if your applications use many forked processes.
The following attributes determine the available address space for processes:
vm-maxvas

This attribute controls the maximum amount of virtual address space available to a process. The default value is 1 GB (1073741824). For Internet servers, you may want to increase this value to 10 GB.
per-proc-address-space
max-per-proc-address-space

These attributes control the maximum amount of user process address space, which is the maximum number of valid virtual regions. The default value for both attributes is 1 GB.
per-proc-stack-size
max-per-proc-stack-size

These attributes control the maximum size of a user process stack. The default value of the per-proc-stack-size attribute is 2097152 bytes. The default value of the max-per-proc-stack-size attribute is 33554432 bytes. You may need to increase these values if you receive "cannot grow stack" messages.
per-proc-data-size
max-per-proc-data-size

These attributes control the maximum size of a user process data segment. The default value of the per-proc-data-size attribute is 134217728 bytes. The default value of the max-per-proc-data-size is 1 GB. You can use the setrlimit function to control the consumption of system resources by a parent process and its child processes. See setrlimit(2) for information.
If you link a parallel application that uses -mp or -omp with the -non_shared option, you must explicitly add -lpset in addition to the libraries that f90 links in.
The -noD command switch is now available to allow symbol definitions (using -D ) to be passed to fpp but not to the conditional compilation facility inside the f90 compiler.
When -arch ev6 is used, the f90 driver will add -qlm_ev6 before -lm on the cc command so ld will look for the EV6-tuned math library.
Please note the behavior of NOWAIT reductions: each thread contributes its part, and proceeds without waiting for the final value of the reduction variable. The reduction variable's value is undefined until a synchronization operation has occurred, or the parallel region is left.
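The NOWAIT reduction behavior described above can be sketched as follows, assuming compilation with -omp. The program and variable names are hypothetical. The point of the sketch is that the reduction variable is read only after an explicit barrier.

```fortran
PROGRAM nowait_reduction_demo
  IMPLICIT NONE
  INTEGER :: i
  REAL :: s
  s = 0.0
!$OMP PARALLEL
!$OMP DO REDUCTION(+:s)
  DO i = 1, 1000
    s = s + 1.0
  END DO
!$OMP END DO NOWAIT
  ! Threads arrive here without waiting for one another, so the value
  ! of s is undefined at this point and must not be used yet.
!$OMP BARRIER
  ! After the barrier every thread has contributed its part, and s
  ! holds the complete sum (1000.0).
!$OMP MASTER
  PRINT *, 's =', s
!$OMP END MASTER
!$OMP END PARALLEL
END PROGRAM nowait_reduction_demo
```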
UNIX v4.0D contains ld switches to restrict library searches to shared and archived libraries. See -no_so , -no_archive , and -so_archive in the ld(1) man page.
Use the setld -d option to install the software to another root directory. Everything in the installation then hangs off that root. Commands like f90 can be pointed to by PATH, the DECF90 environment variable can point to where the compiler is, -l can tell f90 where the RTL is, and the LD_LIBRARY_PATH environment variable can be used to ensure that the desired version of shareable libraries are picked up at run time.

1.7.4 Version 5.2 Corrections

From version V5.1-594-3882K to FT1 T5.2-682-4289P, the following corrections have been made:

Don't create stack temporary for character operands to ALL except when absolutely necessary.
Add -warn argument_checking warning for mismatch between INTEGER kinds with explicit interface.
Add -warn argument_checking warning for insufficient arguments.
Improve display of various diagnostic messages so that the "pointer" is more appropriate.
Fix internal compiler error when compiling a -mp or -omp program with any COMMON or EQUIVALENCED data declared in a PRIVATE, LASTPRIVATE, FIRSTPRIVATE, or REDUCTION list.
Fix problem with TRANSFER of CHARACTER items using non-1 substring offset.
Don't give use-before-defined warning for pointer structure assignment.
Allow LOC(intrinsic_name).
Allow RECORDs of empty STRUCTUREs.
Allow repeat counts in FORMATs to be up to 2147483647.
Always quadword-align EQUIVALENCE groups.
Prevent internal compiler error with very long list of -D definitions.
Correct problem relating to use of an AUTOMATIC array in a parallel region.
Allow contained function result to have dimension bounds depend upon size of one of its array arguments.
Eliminate inappropriate argument mismatch warning with record structures when -wsf is specified.
Add support for -assume gfullpath, which causes the full source file path to be included in the debug information.
If -check bounds is in effect, don't optimize implied-DO in I/O as this can prevent bounds checking from occurring.
Eliminate inappropriate use-before-defined warnings when passing array slices.
Improve generated code when calling routines with INTENT(IN).
Prevent an output statement (WRITE, etc.) from inhibiting use-before-defined warnings.
Improve generated code when calling intrinsic functions.
-fast or -math_library fast implies -check nopower
Fortran 90 interpretation 100 - ASSOCIATED of two zero-sized arrays always returns .FALSE..
Eliminate internal compiler error for LOC(character-parameter-constant)
Eliminate "text handle table overflow" errors for certain programs that had very large and complicated single statements (e.g., DATA).
Allow structure field names which are the same as relational operators.
In pointer assignment, where the right-hand-side is a structure constructor, enforce the standard's requirement that the constructor expression be an allowable target.
Allow a module procedure as an actual argument.
Eliminate inappropriate error about use of PRIVATE type declared later in the module.
Eliminate parsing error where a KIND specifier is continued across multiple source lines.
Eliminate parsing error involving an assignment to a variable whose name begins with "PARAMETER".
When passing an element of a named array constant as an actual argument, make sure that sequence association works as if it had been a variable.
Correct problem with visibility of inherited identifier.
Eliminate internal compiler error for PARAMETER declaration where the constant value is an undefined identifier.
Eliminate internal compiler error involving a statement function having the same name as another routine in the same compilation.
Make severity of -warnings declarations diagnostics warning instead of error.
Eliminate internal compiler error when all source is conditionalized away.
Eliminate internal compiler error for certain programs which use TRANSFER in a PARAMETER declaration.
Allow a tab character in a FORMAT.
Assume INTEGER type for bit constants where required.
Don't sign extend result of ICHAR in a PARAMETER definition.
Eliminate internal compiler error for certain programs using functions with mask arguments.
Make !DEC$ATTRIBUTES (no space) work in any column in fixed-form.
Give proper error instead of internal compiler error when QFLOAT used on platforms that don't support REAL*16.
Don't consider a DECODE to modify the buffer argument for purposes of INTENT.
Eliminate internal compiler error for certain programs when -assume dummy_aliases is in effect.
Correct problem with certain programs using STRUCTUREs with %FILL fields.
When -real_size 64 is in effect, intrinsics with explicitly REAL*4 or COMPLEX*8 arguments are no longer inappropriately promoted to REAL*8/COMPLEX*16.
Do not cause internal compiler error for reference to undefined user operator.
Allow use of an array-constructor's implied DO variable in a specification expression.
Allow SIZE argument to be omitted to IISHFTC, JISHFTC, KISHFTC.
Make result type of IBSET, IBCLR, IBITS, etc. be type of the first argument.
Allow up to 256 arguments to an intrinsic function (e.g., MAX, MIN) in a specification expression; the previous limit was 8.
Give error for passing an array section with vector subscript to INTENT(INOUT) or INTENT(OUT) argument.
Fix internal compiler error for use in the length specification expression for a function LEN(concatenation) where one of the concatenation arguments is a passed-length argument to the function being declared.
Fix internal compiler error for use in the length specification expression for a function LEN(TRIM(arg)) where arg is a passed-length argument to the function being declared.
Treat a negative declared length for a CHARACTER variable as if it were zero.
Properly parse "ELSE IFCONSTRUCT" where CONSTRUCT is a construct name.
Give an error when an AUTOMATIC variable is DATA initialized.
Properly propagate (or not) PRIVATE attribute for nested USE.
Eliminate undeserved argument conformance error in certain cases involving WHERE masks.
Ensure that the return kind of ICHAR is "default integer", no matter what kind that is (due to integer_size switch).
Fix internal compiler error for type constructor with string argument for numeric element.
Fix internal compiler error when an INTERFACE TO block has certain syntax errors.
Correctly parse non-standard 'n syntax for REC= in I/O statement when the I/O list contains a quoted literal.
Fix problem relating to ONLY and nested USE.
Make variables whose names begin with $ have implicit INTEGER type.
Allow $ in the range for IMPLICIT (sorts after Z).
If a program has multiple USE statements where the module files cannot be found, give error messages for each of them.
Allow SIZEOF in EQUIVALENCE array index.
Fix internal compiler error with certain array initializers containing an implied DO.
Accept F95-style reference to MAXVAL, MINVAL, MAXLOC, MINLOC with a mask as a second non-keyword argument.
Accept F95-style reference to PRODUCT and SUM with a mask as a second non-keyword argument.
Don't give inappropriate alignment warnings for REAL*16 variables in COMMON.
Don't give error message for empty FORALL statement body.
Allow FORALL to be nested 7 deep (previous limit was 3).
Correctly parse certain complex instances of named FORALL.
Allow RESULT of ENTRY to have same name as host FUNCTION.
Demote diagnostic for not using all active combinations of FORALL index names from error to warning.
Eliminate inappropriate error for certain uses of intrinsic functions in a specification expression.
Eliminate internal compiler error for a peculiar (and erroneous) case of a USE of a NAMELIST whose group contains a variable inherited from another module but which isn't visible due to an ONLY list.
Make OPTIONS /EXTEND_SOURCE persistent across an INCLUDE.
Add support for defined assignment statement from within a WHERE statement.
Allow a function result length to be computed using a field of an array element, where the array is a derived type passed as a dummy argument.
Fix problem with functions returning complex/doublecomplex.
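As a small illustration of the F95-style references accepted per the list above, the following sketch passes the mask as a second non-keyword argument to MAXVAL, SUM, and MAXLOC. The program name and data values are hypothetical.

```fortran
PROGRAM mask_arg_demo
  IMPLICIT NONE
  INTEGER :: a(5)
  a = (/ -3, 7, -1, 4, 2 /)
  ! The second positional argument is a logical mask, not DIM.
  PRINT *, MAXVAL(a, a < 0)   ! largest negative element: -1
  PRINT *, SUM(a, a > 0)      ! 7 + 4 + 2 = 13
  PRINT *, MAXLOC(a, a < 5)   ! position of the largest element below 5: 4
END PROGRAM mask_arg_demo
```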

From version FT1 T5.2-682-4289P to FT2 T5.2-695-428AU, the following corrections have been made:

Allow an ALLOCATABLE variable to be PRIVATE in a parallel scope.
Support ISHC for INTEGER*8.
Correct problem with overlapping CHARACTER assignment in FORALL.
Correct debug information for CHARACTER POINTERs.
Correct problems with ISHFTC which can cause alignment errors.
Correct problem with FORALL and WHERE with non-default integer size.
Don't issue spurious UNUSED warning for argument whose interface comes from a MODULE.
Fix internal compiler error for invalid IMPLICIT syntax.
Eliminate inappropriate type mismatch error for certain cases of references to a generic procedure with a procedure argument.
Allow use of . field separator in addition to % in ALLOCATE/DEALLOCATE.
Give warning of unused variable in module procedure when appropriate.
Do not allow a non-integer/logical expression in a logical IF.
Fix another case of recognizing a RECORD field that has the same name as a relational operator.
Correct compiler failure for CMPLX(R8,R8) when real_size=64 is in effect.
Allow gaps in keyword names in MAX/MIN, for example MAX(A1=x,A4=y).
Correct compiler failure when a COMPLEX array is initialized with a REAL array constructor.
Correct compiler failure when the CHAR intrinsic is used in an initialization expression.
Correct compiler failure ("possible out of order or missing USE") in certain uses of nested MODULEs and ONLY.
Show correct source pointer for syntax error in declaration.

From version FT2 T5.2-695-428AU to V5.2-705-428BH, the following corrections have been made:

The compiler now accepts a new DEFAULT keyword on the !DEC$ ATTRIBUTES directive. This tells the compiler to ignore any compiler switches that change external routine or COMMON block naming or argument passing conventions, and uses just the other attributes specified (if any). The switches which this affects are -names and -assume underscore.
Avoid giving a spurious "Inconsistent THREADPRIVATE declaration of common block" error if one COMMON block has a name which is an initial substring of another and one of them is named in a THREADPRIVATE directive.
Prevent FUSE XREF from dying when !DEC$ ATTRIBUTES is used.
Add support for -source_listing switch. The listing file has the extension .lis.
The f66 switch now establishes OPEN defaults of STATUS='NEW' and BLANK='ZERO'.
Correct compiler failure with RESHAPE and SHAPE used in an initialization expression.
Eliminate spurious error when a defined operator is used in a specification expression.
Correct compiler failure when undefined user-defined operator is seen.
Eliminate spurious error when component of derived type named constant is used in a context where a constant is required.
Correct problem with host association and contained procedure.
Correct compiler failure with WHERE when non-default integer_size is in effect.
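The DEFAULT keyword on the !DEC$ ATTRIBUTES directive, listed among the corrections above, might be used as in the following sketch. The routine name ext_sub is hypothetical, and the placement of the directive in both the interface body and the routine itself is an assumption about typical usage rather than a requirement stated in these notes. Compilers that do not recognize the directive treat it as a comment.

```fortran
PROGRAM default_attr_demo
  IMPLICIT NONE
  INTERFACE
    SUBROUTINE ext_sub(n)
      ! Ask the compiler to ignore switches such as -names and
      ! -assume underscore for this external routine.
      !DEC$ ATTRIBUTES DEFAULT :: ext_sub
      INTEGER, INTENT(IN) :: n
    END SUBROUTINE ext_sub
  END INTERFACE
  CALL ext_sub(3)
END PROGRAM default_attr_demo

SUBROUTINE ext_sub(n)
  ! The same attribute on the definition keeps caller and callee
  ! naming conventions consistent.
  !DEC$ ATTRIBUTES DEFAULT :: ext_sub
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: n
  PRINT *, 'n =', n
END SUBROUTINE ext_sub
```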

1.8 High Performance Fortran (HPF) Support in Version 5.2

Compaq Fortran (DIGITAL Fortran 90) Version 5.2 supports the entire High Performance Fortran (HPF) Version 2.0 specification with the following exceptions:

Nested FORALL statements
WHERE statements within FORALL statements
Passing CYCLIC(N) arguments to EXTRINSIC (HPF_LOCAL) routines. See Section 1.8.5.3.
Accessing non-local data (other than arguments) within PURE functions in FORALL statements
SORT_UP library procedure
SORT_DOWN library procedure

In addition, the compiler supports many HPF Version 2.0 approved extensions including:

Extrinsic (HPF_LOCAL) routines
Extrinsic (HPF_SERIAL) routines
Mapping of derived type components
Pointers to mapped objects
Shadow-width declarations
All HPF_LOCAL_LIBRARY routines (except LOCAL_BLKCNT, LOCAL_LINDEX, and LOCAL_UINDEX). Other exceptions are the approved extensions to HPF_LOCAL_LIBRARY routines.
ON directive within INDEPENDENT loops
RESIDENT directive used with INDEPENDENT loops

1.8.1 Optimization

This section contains release notes relevant to increasing code performance. You should also refer to Chapter 7 of the DIGITAL High Performance Fortran 90 HPF and PSE Manual for more detail.

1.8.1.1 The -fast Compile-Time Option

To get optimal performance from the compiler, use the -fast option if possible.

Use of the -fast option is not permitted in certain cases, such as programs with zero-sized data objects or with very small nearest-neighbor arrays.

For More Information:

On the cases where use of -fast is not permitted, see the "Optimizing" and "Compiling" chapters of the DIGITAL High Performance Fortran 90 HPF and PSE Manual.

1.8.1.2 Non-Parallel Execution of Code

The following constructs are not handled in parallel:

Reductions with non-constant DIM argument.
CSHIFT, EOSHIFT and SPREAD with non-constant DIM argument.
Some array-constructors
PACK, UNPACK, RESHAPE
xxx_PREFIX, xxx_SUFFIX, GRADE_UP, GRADE_DOWN
In the current implementation of Compaq Fortran 95/90, all I/O operations are serialized through a single processor; see Chapter 7 of the DIGITAL High Performance Fortran 90 HPF and PSE Manual for more details.
Date and time intrinsics, including DATE_AND_TIME, SYSTEM_CLOCK, DATE, IDATE, TIME, and SECNDS

If an expression contains a non-parallel construct, the entire statement containing the expression is executed in a nonparallel fashion. The use of such constructs can cause degradation of performance. Compaq recommends avoiding the use of constructs to which the above conditions apply in the computationally intensive kernel of a routine or program.
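To make the first item in the list above concrete, the following sketch contrasts a reduction with a constant DIM argument against one whose DIM is a variable. The program and variable names are hypothetical. Both statements compute the same result; according to the list, only the form with a constant DIM is handled in parallel.

```fortran
PROGRAM dim_arg_demo
  IMPLICIT NONE
  REAL :: a(4,3), col_sums(3)
  INTEGER :: d
!HPF$ DISTRIBUTE a(*,BLOCK)
  a = 1.0
  ! DIM is a constant: the form the list above does not exclude.
  col_sums = SUM(a, DIM=1)
  PRINT *, col_sums
  ! DIM is a variable: per the list above, this reduction (and the
  ! whole statement containing it) is executed in a nonparallel fashion.
  d = 1
  col_sums = SUM(a, DIM=d)
  PRINT *, col_sums
END PROGRAM dim_arg_demo
```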

1.8.1.3 INDEPENDENT DO Loops Currently Parallelized

Not all INDEPENDENT DO loops are currently parallelized. It is important to use the -show hpf or -show hpf_indep compile-time option, which will give a message whenever a loop marked INDEPENDENT is not parallelized.

Currently, a nest of INDEPENDENT DO loops is parallelized whenever the following conditions are met:

When INDEPENDENT DO loops are nested, the NEW keyword must be used to assert that all loop variables (except the outer loop variable) are NEW. It is recommended that the outer DO loop variable be in the NEW list, as well.
The loop does not contain any of the constructs listed in Section 1.8.1.2 that cause non-parallel execution.
Each subscript of each array reference must either
- contain no references to INDEPENDENT DO loop variables, or
- contain one reference to an INDEPENDENT DO loop variable and the subscript expression is an affine function of that DO loop variable.
At least one array reference must reference all the independent loops in a nest of independent loops.
The compiler must be able to prove that the loop nest either
- requires no inter-processor communication, or
- can be made to require no inter-processor communication with compiler-generated copyin/copyout code around the loop nest.
Any reductions in an interior (i.e. any but the outer) loop may use an INDEPENDENT DO index as a subscript only if that index represents a serially distributed dimension of the array. An exception to this is the index of the outermost DO loop, which may be used as a subscript even if it represents a non-serially distributed array dimension.
There must not be any assignments to scalars, except for NEW or reduction variables.
Any procedure call inside an INDEPENDENT DO loop must either be PURE, or be encapsulated in an ON HOME RESIDENT region (see Section 1.8.5.6).

When the entire loop nest is encapsulated in an ON HOME RESIDENT region, then only the first two restrictions apply.
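A minimal sketch of a loop nest written with the conditions above in mind follows. The program name, array names, and sizes are hypothetical, and whether a given compiler version actually parallelizes it should be confirmed with -show hpf. Each subscript is a single loop variable (an affine function of it), the array reference uses both loop variables, the inner loop variable is named in NEW, and both arrays are mapped identically so that no inter-processor communication is needed.

```fortran
PROGRAM independent_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 64
  REAL, DIMENSION(n,n) :: a, b
  INTEGER :: i, j
!HPF$ DISTRIBUTE (*,BLOCK) :: a, b
  b = 1.0
!HPF$ INDEPENDENT, NEW(i)
  DO j = 1, n
    DO i = 1, n
      ! No scalar assignments, no procedure calls, and no constructs
      ! from the non-parallel list appear in the loop body.
      a(i,j) = 2.0 * b(i,j)
    END DO
  END DO
  PRINT *, SUM(a)
END PROGRAM independent_demo
```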

For More Information:

On enclosing INDEPENDENT DO loops in an ON HOME RESIDENT region, see Section 1.8.5.6

1.8.1.4 Nearest-Neighbor Optimization

The following is a list of conditions that must be satisfied in an array assignment, FORALL statement, or INDEPENDENT DO loop in order to take advantage of the nearest-neighbor optimization:

Relevant arrays with the POINTER or TARGET attributes must have shadow edges explicitly declared with the SHADOW directive.
The arrays involved in the nearest-neighbor style assignment statements should not be module variables or variables assigned by USE association. However, if both the actual and all associated dummies are assigned a shadow-edge width with the SHADOW directive, this restriction is lifted.
A value must be specified for the -wsf option on the command line.
Some interprocessor communication must be necessary in the statement.
Corresponding dimensions of an array must be distributed in the same way (though they can be offset using an ALIGN directive).
If the -nearest_neighbor flag's optional nn field is used to specify a maximum shadow-edge width, only constructs with a subscript difference (adjusted for any ALIGN offset) less than or equal to the value specified by nn will be recognized as nearest neighbor. For example, the assignment statement ( FORALL (i=1:n) A(i) = B(i-3) ) has a subscript difference of 3 . In a program compiled with the flag -nearest_neighbor 2 , this assignment statement would not be eligible for the nearest neighbor optimization.
The left-hand side array must be distributed BLOCK in at least one dimension.
The arrays must not have complicated subscripts (no vector-valued subscripts, and any subscripts containing a FORALL index must be affine functions of one FORALL index; further, that FORALL index must not be repeated in any other subscript of a particular array reference).
Statements with scalar subscripts are eligible only if that array dimension is (effectively) mapped serially.
Subscript triplet strides must be known at compile time and be greater than 0.
The arrays must be distributed BLOCK or serial (*) in each dimension.

Compile with the -show hpf or -show hpf_nearest switch to see which lines are treated as nearest-neighbor.
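The following sketch shows a statement shaped to satisfy the conditions above: both arrays are distributed BLOCK in the same way, the subscripts are affine functions of a single FORALL index, and the subscript difference is 1. The program and array names are hypothetical, and eligibility on a particular compiler version should be checked with -show hpf_nearest.

```fortran
PROGRAM nearest_neighbor_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 16
  REAL, DIMENSION(n) :: a, b
  INTEGER :: i
!HPF$ DISTRIBUTE (BLOCK) :: a, b
  b = (/ (REAL(i), i = 1, n) /)
  a = 0.0
  ! Each element needs only its immediate left and right neighbors,
  ! so the subscript difference is 1 in both directions.
  FORALL (i = 2:n-1) a(i) = b(i-1) + b(i+1)
  ! a(2) = b(1) + b(3) = 4.0 and a(n-1) = b(14) + b(16) = 30.0
  PRINT *, a(2), a(n-1)
END PROGRAM nearest_neighbor_demo
```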

Nearest-neighbor communications are not profiled by the pprof profiler. See the section about the pprof Profile Analysis Tool in the Parallel Software Environment (PSE) Version 1.6 release notes.

For More Information:

On profiling nearest-neighbor computations, see the section about the pprof Profile Analysis Tool in the Parallel Software Environment (PSE) Version 1.6 release notes.
On using EOSHIFT for nearest-neighbor computations, see Section 1.8.1.6

1.8.1.5 Widths Given with the SHADOW Directive Agree with Automatically Generated Widths

When compiler-determined shadow widths don't agree with the widths given with the SHADOW directive, less efficient code will usually be generated.

To avoid this problem, create a version of your program without the SHADOW directive, and compile with the -show hpf or -show hpf_near option. The compiler will generate messages that include the sizes of the compiler-determined shadow widths. Make sure that any widths you specify with the SHADOW directive match the compiler-generated widths.

1.8.1.6 Using EOSHIFT Intrinsic for Nearest Neighbor Calculations

In the current compiler version, the compiler does not always recognize nearest-neighbor calculations coded using EOSHIFT. Also, EOSHIFT is sometimes converted into a series of statements, only some of which may be eligible for the nearest neighbor optimization.

To avoid these problems, Compaq recommends using CSHIFT or FORALL instead of EOSHIFT if these alternatives meet the needs of your program.
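As a sketch of the recommended substitution, the following program computes the same end-off shift once with EOSHIFT and once with FORALL, then checks that the results agree. The program and array names are hypothetical. The FORALL version must set the vacated element explicitly, because EOSHIFT fills it with zero by default.

```fortran
PROGRAM eoshift_alternative_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 8
  REAL, DIMENSION(n) :: b, c_eoshift, c_forall
  INTEGER :: i
!HPF$ DISTRIBUTE (BLOCK) :: b, c_eoshift, c_forall
  b = (/ (REAL(i), i = 1, n) /)
  ! End-off shift by one position; the last element becomes 0.0.
  c_eoshift = EOSHIFT(b, SHIFT=1)
  ! Equivalent FORALL formulation with the boundary value set first.
  c_forall = 0.0
  FORALL (i = 1:n-1) c_forall(i) = b(i+1)
  PRINT *, ALL(c_eoshift == c_forall)   ! prints T
END PROGRAM eoshift_alternative_demo
```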

1.8.2 New Features

This section describes the new HPF features in this release of Compaq Fortran.

1.8.2.1 RANDOM_NUMBER Executes in Parallel

The RANDOM_NUMBER intrinsic subroutine now executes in parallel for mapped data. The result is a significant decrease in execution time.

1.8.2.2 Improved Performance of TRANSPOSE Intrinsic

The TRANSPOSE intrinsic will execute faster for most arrays that are mapped either * or BLOCK in all dimensions.

1.8.2.3 Improved Performance of DO Loops Marked as INDEPENDENT

Certain induction variables are now recognized as affine functions of the INDEPENDENT DO loop indices, thus meeting the requirements listed in Section 1.8.1.3. Now, the compiler can parallelize array references containing such variables as subscripts. An example is next.

    ! Compiler now recognizes a loop as INDEPENDENT because it
    ! knows that variable k1 is k+1.
    PROGRAM gauss
      INTEGER, PARAMETER :: n = 1024
      REAL, DIMENSION (n,n) :: A
    !HPF$ DISTRIBUTE A(*,CYCLIC)
      DO k = 1, n-1
        k1 = k+1
    !HPF$ INDEPENDENT, NEW(i)
        DO j = k1, n
          DO i = k1, n
            A(i,j) = A(i,j) - A(i,k) * A(k,j)
          ENDDO
        ENDDO
      ENDDO
    END PROGRAM gauss

1.8.3 Corrections

This section lists problems in previous versions that have been fixed in this version.

In programs compiled with the -wsf option, pointer assignments inside a FORALL did not work reliably. In many cases, incorrect program results occurred.
The ASSOCIATED intrinsic sometimes returned incorrect results in programs compiled with the -wsf compile-time option.
GRADE_UP and GRADE_DOWN were not stable sorts.
