Use MPI I/O on IU research supercomputers
MPI I/O is an API standard for parallel I/O that allows multiple processes of a parallel program to access data in a common file simultaneously. MPI I/O maps I/O reads and writes to message-passing sends and receives, and implementing parallel I/O can improve the performance of your parallel application.
Ideally, MPI I/O should be used on a parallel file system, since conventional file systems (for example, NFS and ext3) are not designed for efficient concurrent access by many processes. However, ROMIO, a popular MPI I/O implementation, does allow MPI I/O to work on NFS.
The following example uses MPI I/O functions to copy files. Short explanations for each step are provided below:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h> /* Include the MPI definitions */
void ErrorMessage(int error, int rank, char* string)
{
fprintf(stderr, "Process %d: Error %d in %s\n", rank, error, string);
MPI_Finalize();
exit(-1);
}
int main(int argc, char *argv[])
{
int start, end;
int length;
int error;
char* buffer;
int nprocs;
int myrank;
MPI_Status status;
MPI_File fh;
MPI_Offset filesize;
if (argc != 3)
{
fprintf(stderr, "Usage: %s FileToRead FileToWrite\n", argv[0]);
exit(-1);
}
/* Initialize MPI */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
/* Open file to read */
error = MPI_File_open(MPI_COMM_WORLD, argv[1],
MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");
/* Get the size of file */
error = MPI_File_get_size(fh, &filesize);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_get_size");
/* calculate the range for each process to read */
length = filesize / nprocs;
start = length * myrank;
if (myrank == nprocs-1)
end = filesize;
else
end = start + length;
fprintf(stdout, "Proc %d: range = [%d, %d)\n", myrank, start, end);
/* Allocate space */
buffer = (char *)malloc((end - start) * sizeof(char));
if (buffer == NULL) ErrorMessage(-1, myrank, "malloc");
/* Each process read in data from the file */
MPI_File_seek(fh, start, MPI_SEEK_SET);
error = MPI_File_read(fh, buffer, end-start, MPI_BYTE, &status);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_read");
/* close the file */
MPI_File_close(&fh);
/* Open file to write */
error = MPI_File_open(MPI_COMM_WORLD, argv[2],
MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");
error = MPI_File_write_at(fh, start, buffer, end-start, MPI_BYTE, &status);
if(error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_write_at");
/* close the file */
MPI_File_close(&fh);
free(buffer);
/* Finalize MPI */
MPI_Finalize();
return 0;
}
In the above example:
- The first step is to establish the MPI environment: MPI_Init must be the first MPI call in every MPI program.
- The MPI_File_open function opens a file on all processes in the communicator. Several access modes are supported; the one used first in the example, MPI_MODE_RDONLY, is for reading only.
- The MPI_File_get_size function returns the size of the file, which is used later to determine the offset for each process.
- The MPI_File_seek function moves each process's individual file pointer to the position where that process will start reading data.
- The MPI_File_read function reads data into the buffer given as the second parameter; the amount to read is given by the count and datatype in the third and fourth parameters.
- The MPI_File_write_at function writes data from the buffer (the third parameter) to the position in the file given by the offset in the second parameter.
- The MPI_File_close function closes the file opened by MPI_File_open.
- The MPI environment in every process must be terminated by the MPI_Finalize function. No MPI calls may be made after MPI_Finalize.
Fortran examples
Following are two Fortran examples:
Example 1:
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program create_file
!**************************************************************************
! This is a Fortran 90 program to write data directly to a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
USE MPI
integer, parameter :: kind_val = 4
integer, parameter :: filesize = 40
integer :: realsize = 4
integer :: rank, ierr, fh, nprocs, num_reals
integer :: i, region
real (kind = kind_val) :: datum
integer, dimension (MPI_STATUS_SIZE) :: status
integer (kind = MPI_OFFSET_KIND) :: offset, empty
! Set filename to output datafile
character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
real (kind = kind_val), dimension ( : ), allocatable :: bucket
! Basic MPI set-up
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
! Sanity print
print*, "myid is ", rank
! Carve out a piece of the output file and create a data bucket
empty = 0
region = filesize / (nprocs )
offset = ( region * rank )
allocate (bucket(region))
! There is no guarantee that an old file will be clobbered, so wipe out any previous output file
if (rank .eq. 0) then
call MPI_File_delete(filename, MPI_INFO_NULL, ierr)
endif
! Set the file handle to an initial value (this should not be required)
fh = 0
! Open the output file
call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, MPI_MODE_CREATE+MPI_MODE_RDWR, MPI_INFO_NULL, fh, ierr)
! Wait on everyone to catch up.
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
! Do some work and fill up the data bucket
call random_seed()
do i = 1, region
call random_number(datum)
bucket(i) = datum * 1000000. * (rank + 1)
print *, " bucket ",i ,"= ", bucket(i)
enddo
! Basic "belt and suspenders" insurance that everyone's file pointer is at the beginning of the output file.
call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', MPI_INFO_NULL, ierr)
! Send the data bucket to the output file in the proper place
call MPI_FILE_WRITE_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)
! Wait on everyone to finish and close up shop
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FILE_CLOSE(fh, ierr)
call MPI_FINALIZE(ierr)
end program create_file
Example 2:
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program read_file
!**************************************************************************
! This is a Fortran 90 program to read data directly from a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
USE MPI
integer, parameter :: kind_val = 4
integer, parameter :: filesize = 40
integer :: realsize = 4
integer :: rank, ierr, fh, nprocs, num_reals
integer :: i, region
integer, dimension (MPI_STATUS_SIZE) :: status
integer (kind = MPI_OFFSET_KIND) :: offset, empty
! Set filename to input datafile
character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
real (kind = kind_val), dimension ( : ), allocatable :: bucket
! Basic MPI set-up
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
! Carve out a piece of the input file and create a data bucket
empty = 0
region = filesize / (nprocs )
offset = (region * rank )
allocate (bucket(region))
! Sanity print
print*, "myid is ", rank
! Set the file handle to an initial value (this should not be required)
fh = 0
! Open the input file
call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL, fh, ierr)
! Wait on everyone to catch up.
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
! Basic "belt and suspenders" insurance that everyone's file pointer is at the beginning of the input file.
call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', MPI_INFO_NULL, ierr)
! Read only the section of the data file each process needs and put data in the data bucket.
call MPI_FILE_READ_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)
! We could check the values received in the bucket (debug hint)
!
! do i = 1, region
! print *, "my id is ", rank, " and my ", i, "number is ", bucket(i)
! enddo
! Wait on everyone to finish and close up shop
call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FILE_CLOSE(fh, ierr)
call MPI_FINALIZE(ierr)
end program read_file
Research computing support at IU is provided by the Research Technologies division of UITS. To ask a question or get help regarding Research Technologies services, including IU's research supercomputers and research storage systems, and the scientific, statistical, and mathematical applications available on those systems, contact UITS Research Technologies. For service-specific support contact information, see Research computing support at IU.
This is document aqpe in the Knowledge Base.
Last modified on 2023-10-04 14:21:35.