Chapter 13. THE VULCAN PARALLELIZATION ALGORITHM





The VULCAN CFD code was parallelized in a data-parallel fashion using generic MPI (Message Passing Interface) libraries. The structured multi-block formulation of the software offers a natural domain decomposition, which the parallel algorithm exploits directly. Function-parallel capabilities based on shared-memory directives (OpenMP, etc.) could be combined with the current data-parallel algorithm at a later time, but this level of parallelism was beyond the scope of the present effort. In the current data-parallel paradigm, MPI library calls handle inter-processor communication and control the flow of the program.

The serial version of the VULCAN software stored data globally via arrays of the form:

V(nj,nk,ni,nv,nblk,nlev) -> (npts*nv*nblk*nlev)

where nj, nk, ni are the grid dimensions in the "j", "k", and "i" directions, nv is the number of variables in the array, nblk is the number of grid blocks, and nlev is the number of grid levels present. At first glance, this 1-D array storage ordering appears less than ideal (from a load balancing perspective) for parallel computing because the grid level is the outermost dimension rather than the block number. However, this shortcoming can be overcome (and actually turned into an advantage) by redefining what "nblk" represents.
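The storage ordering described above can be illustrated with a short sketch. It is not taken from the VULCAN source: the function name flat_offset is hypothetical, indices are zero-based for convenience, and every block and grid level is assumed to have the same dimensions (which a real multi-block, multi-level grid generally does not). It only shows that, with the first index varying fastest, the block index strides over npts*nv entries and the grid level strides over npts*nv*nblk entries.

```python
def flat_offset(j, k, i, v, b, lev, dims):
    """Zero-based offset of V(j,k,i,v,b,lev) in a 1-D array.

    Assumes column-major ordering (first index varies fastest) and,
    purely for illustration, identical dimensions for every block and level.
    dims = (nj, nk, ni, nv, nblk, nlev)
    """
    nj, nk, ni, nv, nblk, nlev = dims
    npts = nj * nk * ni                      # grid points per block
    point = j + nj * (k + nk * i)            # offset within one variable of one block
    return point + npts * (v + nv * (b + nblk * lev))


# Hypothetical sizes, chosen only to make the strides visible.
dims = (4, 3, 2, 5, 6, 2)
block_stride = flat_offset(0, 0, 0, 0, 1, 0, dims) - flat_offset(0, 0, 0, 0, 0, 0, dims)
level_stride = flat_offset(0, 0, 0, 0, 0, 1, dims) - flat_offset(0, 0, 0, 0, 0, 0, dims)
print("block stride:", block_stride)
print("level stride:", level_stride)
```

Because the level stride contains nblk as a factor, the data for one grid level on all blocks is contiguous, which is why the meaning of nblk matters for the parallel layout.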

The idea behind the parallelization strategy was to redefine nblk as the number of blocks stored on a given processor. A new parameter, nbtot, was introduced to represent the total number of grid blocks, i.e.

nbtot = nblk(1) + nblk(2) + ... + nblk(np)

where nblk(p) is the number of blocks stored on processor p and np is the number of processors used in the simulation. Because the block number is defined per processor, the core solver did not have to be altered. The mapping from the processor-dependent block ordering to the global block ordering is required only when swapping data between processors and when performing data input/output. This greatly reduced the parallelization effort while maintaining all of the advantages of the original array ordering at the processor (computational) level.
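A minimal sketch of such a local-to-global block mapping follows. It is an illustration only: the round-robin assignment, the function names distribute_blocks and global_to_local, and the parameter name nproc (standing in for np) are assumptions, not the scheme VULCAN actually uses to place blocks on processors.

```python
def distribute_blocks(nbtot, nproc):
    """Assign global blocks 1..nbtot to processors 0..nproc-1 round-robin.

    Returns local_to_global, where local_to_global[p][n - 1] is the global
    number of local block n on processor p. (Hypothetical assignment rule.)
    """
    local_to_global = [[] for _ in range(nproc)]
    for gblk in range(1, nbtot + 1):
        local_to_global[(gblk - 1) % nproc].append(gblk)
    return local_to_global


def global_to_local(local_to_global):
    """Invert the map: global block -> (processor, local block number)."""
    owner = {}
    for p, blocks in enumerate(local_to_global):
        for n, gblk in enumerate(blocks, start=1):
            owner[gblk] = (p, n)
    return owner


l2g = distribute_blocks(nbtot=7, nproc=3)
nblk = [len(blocks) for blocks in l2g]   # processor-dependent nblk
owner = global_to_local(l2g)

# The per-processor block counts must add up to the total block count.
assert sum(nblk) == 7
```

In this picture the solver on each processor loops only over its local block numbers 1..nblk, and the two maps are consulted only for inter-processor data exchange and for input/output, as described above.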

An outline of how this strategy was implemented in the VULCAN software is given below:

The following input parameters were added to the VULCAN input file to control the parallel version of the code.

