ARCHIVED: How do I combine several serial jobs into one MPI parallel job?
Note: Although this document assumes you are using the Research SP at Indiana University, it applies to other platforms as well.
An MPI job consists of multiple processes executing the
same piece of code in parallel. Each process is uniquely identified by
a "rank" variable. Depending on this rank, different processes execute
different parts of the code at the same time. The following Fortran 90
program, mytest.F, illustrates this idea:
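      program mytest
      include "mpif.h"
      integer myrank
      integer ierr
      ! Start MPI and find out which process this is.
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
      if (myrank==0) then
         ! Rank 0 lists the current directory, then prints a blank line.
         call system("ls")
         write(*,*)
      else
         ! Every other rank prints a fixed message.
         call system("echo ""This is a test""")
      end if
      call mpi_finalize(ierr)
      end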
The above MPI program uses two processes to execute two different
shell commands in parallel. One lists the contents of the current
directory, and the other prints the text "This is a
test" to standard output. Because mpi_init()
initializes an MPI execution environment and creates a group
(MPI_COMM_WORLD) that contains all the processes this MPI
job has, it must be called before any other MPI call in the
program. The mpi_comm_rank() call retrieves the rank
(myrank) of the calling process from this
group. Different processes get different rank values (starting at
0 and ascending) when they perform this call. Therefore,
if two processes are used, one process gets 0 for
myrank, and the other gets 1. Next, the
process with rank 0 will execute the ls
command, while the process with rank 1 executes the
echo command. The commands are issued via the
system subroutine, which passes its string argument to
the sh command as input. The sh command will
interpret that input as a command and run it. To terminate the MPI
execution environment, mpi_finalize() is used. You
would compile the program using:
The LoadLeveler job submission script for this program would be:
#@ class = pb
#@ job_type = parallel
#@ network.MPI = css0,shared,IP
#@ node = 1,1
#@ tasks_per_node = 2
#@ wall_clock_limit = 0:10:00
#@ initialdir = .
#@ executable = /bin/poe
#@ arguments = mytest -stdoutmode ordered -labelio yes
#@ environment = COPY_ALL; MP_EUILIB=ip; MP_INFO_LEVEL=2;
#@ output = mytest.out
#@ error = mytest.error
#@ queue
The output of the program would be:
0:hostlist
0:mytest
0:script
0:test.F
0:test.error
0:test.out
0:
1:This is a test
In the job submission script, the -stdoutmode ordered
argument tells POE to order the output by process; otherwise, output
from different processes executing in parallel may be interleaved.
The -labelio yes argument tells POE to label each line of output with
the rank of the process that produced it.
As another example, suppose there is a program called
myprog that takes input from an input file, performs
certain operations, and then outputs the result into an output
file. All these operations are done in one directory. Assume that you
want to apply myprog to several input files that are
distributed into several directories (/N/u/someuser/dir1,
/N/u/someuser/dir2, /N/u/someuser/dir3), and
in each directory there is a copy of myprog. Then the
following Fortran 90 program, myprogmpi.F, bundles these
serial jobs into one parallel job:
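A minimal sketch of such a program, modeled on mytest.F above, might look like the following. The command strings mirror the one quoted later in this article; the exact way myprog is invoked (here "myprog input") is an assumption, so adjust it to match how you run myprog by hand.

```fortran
      program myprogmpi
      implicit none
      include "mpif.h"
      integer :: myrank, ierr
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
      ! Each rank changes into its own directory and runs the copy
      ! of myprog found there. "myprog input" is an assumed
      ! invocation, not a confirmed one.
      if (myrank == 0) then
         call system("cd /N/u/someuser/dir1; myprog input")
      else if (myrank == 1) then
         call system("cd /N/u/someuser/dir2; myprog input")
      else if (myrank == 2) then
         call system("cd /N/u/someuser/dir3; myprog input")
      end if
      ! Ranks above 2, if any, do nothing and simply finalize.
      call mpi_finalize(ierr)
      end
```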
It's best to request three processors in the job submission script, one for each process. You can also instruct each process to run a different executable.
Using this idea, you could combine several serial jobs into one
parallel job. (For example, instead of submitting several jobs to the
class a or class b queue on the Research SP, you can submit one MPI
job to the pb queue.) Suppose you have n serial jobs; you can create n MPI
processes. Each process is in charge of executing one of the serial
jobs (e.g., using the system() call). As long as these
serial jobs do not depend on or interfere with each other, they will
execute in parallel and yield correct results.
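As a rough illustration of the n-jobs-to-n-processes idea, the sketch below keeps the serial commands in a table and lets each process pick the entry that matches its rank. The program name bundle, the constant njobs, and the command strings are all hypothetical placeholders; the system() call is used the same way as in mytest.F.

```fortran
      program bundle
      implicit none
      include "mpif.h"
      integer, parameter :: njobs = 3
      character(len=64) :: jobs(njobs)
      integer :: myrank, nprocs, ierr
      ! Hypothetical serial jobs, one per MPI process. Each entry
      ! could just as well name a different executable.
      jobs(1) = "cd /N/u/someuser/dir1; myprog input"
      jobs(2) = "cd /N/u/someuser/dir2; myprog input"
      jobs(3) = "cd /N/u/someuser/dir3; myprog input"
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
      ! With fewer processes than jobs, some jobs would never run.
      if (myrank == 0 .and. nprocs < njobs) then
         write(*,*) "Warning: fewer processes than jobs"
      end if
      ! Ranks start at 0, array indices at 1.
      if (myrank < njobs) then
         call system(trim(jobs(myrank + 1)))
      end if
      call mpi_finalize(ierr)
      end
```

Keeping the commands in one table means that adding a job only requires a new entry and a matching increase in the number of processes requested.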
Dependency occurs when the correct execution of one serial program
depends on the result of another one. In this case, you should
consider using job steps (keywords step_name and
dependency) in the LoadLeveler job submission script. For
example, you can write a script that indicates step1 will run
with the executable called myprogram1, and step2 will run
only if LoadLeveler removes step1 from the system. If step2 does run,
the executable called myprogram2 is run. Following is
an example:
# @ class = a
# @ job_type = serial
# @ node_usage=shared
# @ environment = COPY_ALL; MP_EUILIB=IP; MP_SHARED_MEMORY=yes;
# @ input = /N/u/someuser/$(step_name).in
# @ output = /N/u/someuser/$(step_name).out
# @ error = /N/u/someuser/$(step_name).err
# Beginning of step1
# @ step_name = step1
# @ executable = myprogram1
# @ arguments=arg1
# @ queue
# Beginning of step2
# @ step_name = step2
# @ dependency = (step1 == CC_REMOVED)
# @ executable = myprogram2
# @ arguments=arg2
# @ queue
CC_REMOVED is a job return code indicating that the job step
has been removed from the system. For more information on dependency and job
return codes and their applications in job steps, visit:
You will need to enter your Indiana University Network ID username and passphrase to access this resource.
An example of interference among serial programs is two serial
programs both writing to the same output file (e.g., standard output).
For example, if myprog in the above example writes to
standard output, then several instances of myprog running
concurrently will garble the standard output file. For this specific
example, changing the system call for the first directory to
call system("cd /N/u/someuser/dir1; myprog input > stdout1")
(and likewise for the other directories, each with its own output
file) will solve this problem.
If you do not resolve the problem, the output file will contain interleaved output from all of the programs, and you will have to work out which lines came from which program.
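One way to give every process its own output file is to derive the file name from the rank. The sketch below is a standalone Fortran program (no MPI needed) that only prints the command each of three ranks would pass to system(). The function name jobcmd and the dirN/stdoutN numbering pattern are illustrative assumptions extrapolated from the dir1/stdout1 example above.

```fortran
      program showcmds
      implicit none
      integer :: rank
      character(len=128) :: line
      ! Print, rather than execute, the command for ranks 0 to 2.
      do rank = 0, 2
         line = jobcmd(rank)
         write(*, '(a)') trim(line)
      end do
      contains
      function jobcmd(rank) result(cmd)
      ! Build the shell command for one rank: rank r works in
      ! directory dir<r+1> and writes to file stdout<r+1>.
      integer, intent(in) :: rank
      character(len=128) :: cmd
      character(len=8) :: tag
      write(tag, '(i0)') rank + 1
      cmd = "cd /N/u/someuser/dir" // trim(tag)
      cmd = trim(cmd) // "; myprog input > stdout" // trim(tag)
      end function jobcmd
      end program showcmds
```

In the MPI program, each process would call jobcmd with its own myrank and pass the trimmed result to system(), so no two processes share an output file.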
For a tutorial on MPI programming, refer to the Parallel Programming Tutorial at:
http://www.iu.edu/~rac/hpc/mpi_tutorial/index.html
Last modified on July 17, 2006.







