This is one of a series of papers by Numrich on using Fortran co-array syntax and tensor notation for numerical linear algebra. It works through computational examples in tensor notation, gives co-array Fortran programs for them, and presents run data showing near-linear speedup on a parallel processor.
Co-array Fortran is an extension of Fortran in which both the code and the data space are replicated; each replica is called an image. Contravariant vectors are represented by conventional Fortran arrays in local memory (indices in curly brackets), and covariant vectors by co-arrays in remote memory (indices in square brackets). For an orthonormal basis, the two are related by a transpose. The author also defines linear operators that partition vectors and matrices into blocks, which supports domain decomposition.
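To make the block decomposition and the local/remote distinction more concrete, here is a minimal serial sketch in Python. It is not the author's co-array Fortran and does not model the covariant/contravariant distinction. It assumes p simulated images stored as a Python list, so that reading another image's block (written `x(:)[q]` in co-array syntax) becomes a list lookup. The helpers `block_bounds`, `decompose`, `gather`, and `matvec_block_rows` are hypothetical names chosen for illustration, and the near-equal block split is one convenient choice rather than the paper's operators.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def block_bounds(n: int, p: int, k: int) -> range:
    """Indices of block k when n items are split into p nearly equal blocks."""
    base, extra = divmod(n, p)
    start = k * base + min(k, extra)
    return range(start, start + base + (1 if k < extra else 0))


def decompose(x: Vector, p: int) -> List[Vector]:
    """Split a global vector into p local blocks, one per simulated image."""
    return [[x[i] for i in block_bounds(len(x), p, k)] for k in range(p)]


def gather(blocks: List[Vector]) -> Vector:
    """Concatenate every image's block; stands in for reading x(:)[q], q = 1..p."""
    return [v for block in blocks for v in block]


def matvec_block_rows(a: Matrix, x: Vector, p: int) -> Vector:
    """y = A x with x and y distributed by blocks and A by block rows."""
    n = len(x)
    x_blocks = decompose(x, p)           # image k holds only x_blocks[k]
    y_blocks: List[Vector] = []
    for k in range(p):                   # serial stand-in for image k's work
        x_full = gather(x_blocks)        # reads the blocks held by all images
        rows = block_bounds(len(a), p, k)
        y_blocks.append([sum(a[i][j] * x_full[j] for j in range(n)) for i in rows])
    return gather(y_blocks)
```

For example, `decompose([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)` yields blocks of sizes 4, 3, and 3, and `gather` reassembles them in order. In actual co-array code, each image would run only its own iteration of the loop over `k`, concurrently, and the gather would be a set of remote reads.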
The overall technology, consisting of tensor algebra, Fortran arrays and co-arrays, and vector/matrix blocking with block indices, amounts to a powerful new way of efficiently representing concurrent algorithms and mapping them onto parallel processors. To illustrate it, the author presents seven algorithms for matrix multiplication in tensor notation and gives co-array Fortran code for two of them. The run data show nearly linear speedup as the number of processors increases. As a more sophisticated example, the author presents the LU (lower-upper) decomposition in tensor notation, its co-array program, and runtime data. The LU example shows less-than-linear speedup, but it runs faster than ScaLAPACK, even on a single processor. The paper is an elegant model of clarity, brevity, and utility.
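The paper's seven matrix-multiplication algorithms are not reproduced here. As a rough illustration of how block indices assign work to images, the following Python sketch simulates one block-row scheme under stated assumptions: image k owns block row k of A and of C, image q owns block row q of B, and image k accumulates C_k as the sum over q of A_kq B_q, reading each B_q from image q. The names `block_bounds` and `matmul_block_rows` are hypothetical, and the loop over images runs serially.

```python
from typing import List

Matrix = List[List[float]]


def block_bounds(n: int, p: int, k: int) -> range:
    """Indices of block k when n items are split into p nearly equal blocks."""
    base, extra = divmod(n, p)
    start = k * base + min(k, extra)
    return range(start, start + base + (1 if k < extra else 0))


def matmul_block_rows(a: Matrix, b: Matrix, p: int) -> Matrix:
    """C = A B with block rows of A, B and C distributed over p simulated images.

    Image k computes C_k = sum over q of A_kq B_q, where A_kq is the part of
    its own block row of A whose columns fall in block q, and B_q is block
    row q of B, owned by (and read from) image q. Runs serially.
    """
    r, n, m = len(a), len(b), len(b[0])
    # Block row q of B lives on image q.
    b_blocks = [[b[i] for i in block_bounds(n, p, q)] for q in range(p)]
    c: Matrix = []
    for k in range(p):                      # the work image k would do
        for i in block_bounds(r, p, k):     # rows of C owned by image k
            row = [0.0] * m
            for q in range(p):              # accumulate A_kq B_q
                for jj, j in enumerate(block_bounds(n, p, q)):
                    b_row = b_blocks[q][jj]  # remote read from image q
                    for col in range(m):
                        row[col] += a[i][j] * b_row[col]
            c.append(row)
    return c
```

With `a = [[1, 2], [3, 4]]`, `b = [[5, 6], [7, 8]]`, and `p = 2`, the result is `[[19.0, 22.0], [43.0, 50.0]]`. In this scheme each image reads every other image's block of B once per row it owns. Other blockings trade that communication pattern for different ones, which is the kind of choice the tensor notation makes explicit.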