On IU's research systems, how do I use MPI I/O?
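MPI I/O is a feature of the MPI-2 standard that allows multiple processes of a parallel program to access data in a common file simultaneously. When many processes read or write disjoint parts of a large file, this can deliver much higher throughput than funneling all I/O through a single process.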
Ideally, MPI I/O should be used on a parallel file system, because common file systems (e.g., NFS, ext3) are not designed for concurrent access from many processes. MPI I/O is not limited to parallel file systems, however: an MPI I/O implementation such as ROMIO also allows it to work on NFS.
The following example uses MPI I/O functions to copy files. Short explanations for each step follow the example:
/**********************************************************************
 Copyright 2005, The Trustees of Indiana University.
 All rights reserved.

 To compile on IU's Quarry machine, say the file name is mpiio_demo.c,
 type:
   soft add @openmpi   (if openmpi is not already in your .soft file)
   mpicc -o mpiio_demo mpiio_demo.c
**********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>    /* Include the MPI definitions */

void ErrorMessage(int error, int rank, char* string)
{
  fprintf(stderr, "Process %d: Error %d in %s\n", rank, error, string);
  MPI_Finalize();
  exit(-1);
}

int main(int argc, char *argv[])
{
  int start, end;
  int length;
  int error;
  char* buffer;
  int nprocs;
  int myrank;
  MPI_Status status;
  MPI_File fh;
  MPI_Offset filesize;

  if (argc != 3) {
    fprintf(stderr, "Usage: %s FileToRead FileToWrite\n", argv[0]);
    exit(-1);
  }

  /* Initialize MPI */
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

  /* Open file to read */
  error = MPI_File_open(MPI_COMM_WORLD, argv[1],
                        MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
  if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");

  /* Get the size of file */
  error = MPI_File_get_size(fh, &filesize);
  if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_get_size");

  /* Calculate the range for each process to read;
     the last process also takes any remainder */
  length = filesize / nprocs;
  start = length * myrank;
  if (myrank == nprocs-1)
    end = filesize;
  else
    end = start + length;
  fprintf(stdout, "Proc %d: range = [%d, %d)\n", myrank, start, end);

  /* Allocate space */
  buffer = (char *)malloc((end - start) * sizeof(char));
  if (buffer == NULL) ErrorMessage(-1, myrank, "malloc");

  /* Each process reads its part of the data from the file */
  MPI_File_seek(fh, start, MPI_SEEK_SET);
  error = MPI_File_read(fh, buffer, end-start, MPI_BYTE, &status);
  if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_read");

  /* Close the file */
  MPI_File_close(&fh);

  /* Open file to write */
  error = MPI_File_open(MPI_COMM_WORLD, argv[2],
                        MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
  if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");

  /* Each process writes its part at the same offset it read from */
  error = MPI_File_write_at(fh, start, buffer, end-start, MPI_BYTE, &status);
  if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_write_at");

  /* Close the file and release the buffer */
  MPI_File_close(&fh);
  free(buffer);

  /* Finalize MPI */
  MPI_Finalize();
  return 0;
}

- The very first step is to establish the MPI environment, so MPI_Init (C version) is required and must be called before any other MPI routine.
- The function MPI_File_open opens a file on all processes. Several access modes are supported. The one used first in the example, MPI_MODE_RDONLY, is for read only.
- The function MPI_File_get_size gives the file size, which is used later to determine the offset for each process.
- The function MPI_File_seek moves each process's file pointer to the position in the file where that process will start reading data.
- The function MPI_File_read reads data into the buffer specified in the second parameter. The number of elements to read is given in the third parameter, and their datatype in the fourth.
- The function MPI_File_write_at writes data from the buffer (the third parameter) to a specific position in the file, given by the offset in the second parameter.
- The function MPI_File_close closes the file opened by the function MPI_File_open.
- The MPI environment in every process must be terminated by the function MPI_Finalize. No MPI calls may be made after MPI_Finalize.
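The way the file size is turned into a per-process range can be checked without running MPI at all. The following is a minimal sketch of that arithmetic in plain C. The names ByteRange, rank_range, and print_ranges are hypothetical helpers introduced here for illustration, not part of MPI or of the example above, and long long is assumed as a stand-in for MPI_Offset:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type (not part of MPI): the half-open byte
   range [start, end) that one process copies. */
typedef struct {
    long long start;
    long long end;
} ByteRange;

/* Same scheme as the copy example: every rank gets filesize / nprocs
   bytes (rounded down), and the last rank also takes the remainder. */
ByteRange rank_range(long long filesize, int nprocs, int rank)
{
    ByteRange r;
    long long length;

    assert(filesize >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);

    length  = filesize / nprocs;
    r.start = length * rank;
    r.end   = (rank == nprocs - 1) ? filesize : r.start + length;
    return r;
}

/* Print the range of every rank, to inspect a decomposition by hand. */
void print_ranges(long long filesize, int nprocs)
{
    int rank;

    for (rank = 0; rank < nprocs; rank++) {
        ByteRange r = rank_range(filesize, nprocs, rank);
        printf("Proc %d: range = [%lld, %lld)\n", rank, r.start, r.end);
    }
}
```

For a 10-byte file and 4 processes, calling print_ranges(10, 4) shows ranks 0 through 2 with 2 bytes each and rank 3 with the range [6, 10). The ranges are disjoint and together cover the whole file, which is what lets every process write with MPI_File_write_at at its own offset without coordinating with the others.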
Fortran examples
Following are two Fortran examples:
Example 1
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program create_file
!**************************************************************************
! This is a Fortran 90 program to write data directly to a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
  use mpi
  implicit none

  integer, parameter :: kind_val = 4
  integer, parameter :: filesize = 40     ! total number of reals in the file
  integer :: realsize = 4
  integer :: rank, ierr, fh, nprocs, num_reals
  integer :: i, region
  real (kind = kind_val) :: datum
  integer, dimension (MPI_STATUS_SIZE) :: status
  integer (kind = MPI_OFFSET_KIND) :: offset, empty

  ! Set filename to output datafile
  character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
  real (kind = kind_val), dimension ( : ), allocatable :: bucket

  ! Basic MPI set-up
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Sanity print
  print*, "myid is ", rank

  ! Carve out a piece of the output file and create a data bucket
  empty = 0
  region = filesize / (nprocs )
  offset = ( region * rank )
  allocate (bucket(region))

  ! There is no guarantee that an old file will be clobbered,
  ! so wipe out any previous output file
  if (rank .eq. 0) then
    call MPI_File_delete(filename, MPI_INFO_NULL, ierr)
  endif

  ! Set the file handle to an initial value (this should not be required)
  fh = 0

  ! Open the output file
  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, &
                     MPI_MODE_CREATE+MPI_MODE_RDWR, MPI_INFO_NULL, fh, ierr)

  ! Wait on everyone to catch up.
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  ! Do some work and fill up the data bucket
  call random_seed()
  do i = 1, region
    call random_number(datum)
    bucket(i) = datum * 1000000. * (rank + 1)
    print *, " bucket ",i ,"= ", bucket(i)
  enddo

  ! Basic "belt and suspenders" insurance that everyone's file pointer
  ! is at the beginning of the output file. With MPI_REAL4 as the etype,
  ! offsets below are counted in reals, not bytes.
  call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', &
                         MPI_INFO_NULL, ierr)

  ! Send the data bucket to the output file in the proper place
  call MPI_FILE_WRITE_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)

  ! Wait on everyone to finish and close up shop
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)
end program create_file
!******************************************************
! Ray Sheppard, HPCST, RAC, UITS, Indiana University *
!******************************************************

Example 2
!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program read_file
!**************************************************************************
! This is a Fortran 90 program to read data directly from a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
  use mpi
  implicit none

  integer, parameter :: kind_val = 4
  integer, parameter :: filesize = 40     ! total number of reals in the file
  integer :: realsize = 4
  integer :: rank, ierr, fh, nprocs, num_reals
  integer :: i, region
  integer, dimension (MPI_STATUS_SIZE) :: status
  integer (kind = MPI_OFFSET_KIND) :: offset, empty

  ! Set filename to input datafile
  character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
  real (kind = kind_val), dimension ( : ), allocatable :: bucket

  ! Basic MPI set-up
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Carve out a piece of the data file and create a data bucket
  empty = 0
  region = filesize / (nprocs )
  offset = (region * rank )
  allocate (bucket(region))

  ! Sanity print
  print*, "myid is ", rank

  ! Set the file handle to an initial value (this should not be required)
  fh = 0

  ! Open the input file
  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)

  ! Wait on everyone to catch up.
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  ! Basic "belt and suspenders" insurance that everyone's file pointer
  ! is at the beginning of the input file. The displacement must be of
  ! kind MPI_OFFSET_KIND, so pass the variable "empty", not a literal 0.
  call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', &
                         MPI_INFO_NULL, ierr)

  ! Read only the section of the data file each process needs and
  ! put data in the data bucket.
  call MPI_FILE_READ_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)

  ! We could check the values received in the bucket (debug hint)
  !
  ! do i = 1, region
  !   print *, "my id is ", rank, " and my ", i, "number is ", bucket(i)
  ! enddo

  ! Wait on everyone to finish and close up shop
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)
end program read_file
!******************************************************
! Ray Sheppard, HPCST, RAC, UITS, Indiana University *
!******************************************************

For a detailed MPI I/O description, see the MPI-2 standard document.
Last modified on November 23, 2011.







