Indiana University
University Information Technology Services
  
On IU's research systems, how do I use MPI I/O?

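MPI I/O is an important feature of the MPI-2 standard. It allows multiple processes of a parallel program to access data in a common file simultaneously. Because the processes read and write in parallel, MPI I/O can deliver higher performance than routing all file access through a single process.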

Ideally, MPI I/O should be used on a parallel file system, such as GPFS, because common file systems (e.g., NFS, EXT3FS) do not themselves provide the MPI I/O API. However, an MPI I/O implementation such as ROMIO allows MPI I/O to work on NFS.

The following example uses MPI I/O functions to copy files. Short explanations for each step follow the example:

/**********************************************************************
  Copyright 2005, The Trustees of Indiana University.
  All rights reserved.

  To compile on IU's Quarry machine, say the file name is mpiio_demo.c,
  type:

    soft add @openmpi   (if openmpi is not already in your .soft file)
    mpicc -o mpiio_demo mpiio_demo.c
 **********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>   /* Include the MPI definitions */

void ErrorMessage(int error, int rank, const char* string)
{
    fprintf(stderr, "Process %d: Error %d in %s\n", rank, error, string);
    MPI_Finalize();
    exit(-1);
}

int main(int argc, char *argv[])
{
    MPI_Offset start, end;   /* byte range handled by this process */
    MPI_Offset length;
    int error;
    char* buffer;
    int nprocs;
    int myrank;
    MPI_Status status;
    MPI_File fh;
    MPI_Offset filesize;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s FileToRead FileToWrite\n", argv[0]);
        exit(-1);
    }

    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Open file to read */
    error = MPI_File_open(MPI_COMM_WORLD, argv[1],
                          MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");

    /* Get the size of file */
    error = MPI_File_get_size(fh, &filesize);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_get_size");

    /* Calculate the range for each process to read;
       the last process also takes any remainder */
    length = filesize / nprocs;
    start = length * myrank;
    if (myrank == nprocs-1)
        end = filesize;
    else
        end = start + length;
    fprintf(stdout, "Proc %d: range = [%lld, %lld)\n",
            myrank, (long long)start, (long long)end);

    /* Allocate space */
    buffer = (char *)malloc((size_t)(end - start) * sizeof(char));
    if (buffer == NULL) ErrorMessage(-1, myrank, "malloc");

    /* Each process reads in data from the file */
    error = MPI_File_seek(fh, start, MPI_SEEK_SET);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_seek");
    error = MPI_File_read(fh, buffer, (int)(end - start), MPI_BYTE, &status);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_read");

    /* Close the file */
    MPI_File_close(&fh);

    /* Open file to write */
    error = MPI_File_open(MPI_COMM_WORLD, argv[2],
                          MPI_MODE_WRONLY | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_open");

    /* Each process writes its data at its own offset */
    error = MPI_File_write_at(fh, start, buffer, (int)(end - start),
                              MPI_BYTE, &status);
    if (error != MPI_SUCCESS) ErrorMessage(error, myrank, "MPI_File_write_at");

    /* Close the file */
    MPI_File_close(&fh);

    free(buffer);

    /* Finalize MPI */
    MPI_Finalize();
    return 0;
}

  • The first step is to establish the MPI environment. MPI_Init (in the C version) is required and must be called before almost any other MPI function in every MPI program.

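  • The function MPI_File_open is a collective call that opens a file on all processes in the communicator. Several access modes are supported. The example uses MPI_MODE_RDONLY (read only) for the input file and MPI_MODE_WRONLY | MPI_MODE_CREATE (write only, creating the file if it does not exist) for the output file.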

  • The function MPI_File_get_size gives the file size, which will be used later on to determine the offset for each process.

  • The function MPI_File_seek moves each process's individual file pointer to the position in the file where that process will start reading data.

  • The function MPI_File_read reads data into the buffer specified in the second parameter. The third parameter gives the number of elements to read, and the fourth gives their datatype (MPI_BYTE in the example, so the count is in bytes).

  • The function MPI_File_write_at will write data from buffer (the third parameter) into a specific position in the file given by the second parameter.

  • The function MPI_File_close closes the file opened by the function MPI_File_open.

  • The MPI environment in every process must be terminated by the function MPI_Finalize. No MPI calls may be made after MPI_Finalize.

Fortran examples

Following are two Fortran examples:

Example 1

!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program create_file
!**************************************************************************
! This is a Fortran 90 program to write data directly to a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
  USE MPI
  implicit none

  integer, parameter :: kind_val = 4
  integer, parameter :: filesize = 40
  integer :: realsize = 4
  integer :: rank, ierr, fh, nprocs, num_reals
  integer :: i, region
  real (kind = kind_val) :: datum
  integer, dimension (MPI_STATUS_SIZE) :: status
  integer (kind = MPI_OFFSET_KIND) :: offset, empty

  ! Set filename to output datafile
  character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
  real (kind = kind_val), dimension ( : ), allocatable :: bucket

  ! Basic MPI set-up
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Sanity print
  print*, "myid is ", rank

  ! Carve out a piece of the output file and create a data bucket
  empty = 0
  region = filesize / (nprocs )
  offset = ( region * rank )
  allocate (bucket(region))

  ! There is no guarantee that an old file will be clobbered,
  ! so wipe out any previous output file
  if (rank .eq. 0) then
    call MPI_File_delete(filename, MPI_INFO_NULL, ierr)
  endif

  ! Set the file handle to an initial value (this should not be required)
  fh = 0

  ! Open the output file
  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, MPI_MODE_CREATE+MPI_MODE_RDWR, &
                     MPI_INFO_NULL, fh, ierr)

  ! Wait on everyone to catch up.
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  ! Do some work and fill up the data bucket
  call random_seed()
  do i = 1, region
    call random_number(datum)
    bucket(i) = datum * 1000000. * (rank + 1)
    print *, " bucket ",i ,"= ", bucket(i)
  enddo

  ! Basic "belt and suspenders" insurance that everyone's file pointer is at
  ! the beginning of the output file.
  call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', &
                         MPI_INFO_NULL, ierr)

  ! Send the data bucket to the output file in the proper place
  call MPI_FILE_WRITE_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)

  ! Wait on everyone to finish and close up shop
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)

end program create_file
!******************************************************
! Ray Sheppard, HPCST, RAC, UITS, Indiana University *
!******************************************************

Example 2

!^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
program read_file
!**************************************************************************
! This is a Fortran 90 program to read data directly from a file by each
! member of an MPI group. It is suitable for large jobs which will not
! fit into core memory (such as "out of core" solvers)
!
! Copyright by the Trustees of Indiana University 2005
!**************************************************************************
  USE MPI
  implicit none

  integer, parameter :: kind_val = 4
  integer, parameter :: filesize = 40
  integer :: realsize = 4
  integer :: rank, ierr, fh, nprocs, num_reals
  integer :: i, region
  integer, dimension (MPI_STATUS_SIZE) :: status
  integer (kind = MPI_OFFSET_KIND) :: offset, empty

  ! Set filename to input datafile
  character (len = *), parameter :: filename = "/u/ac/rays/new_data.dat"
  real (kind = kind_val), dimension ( : ), allocatable :: bucket

  ! Basic MPI set-up
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Carve out a piece of the input file and create a data bucket
  empty = 0
  region = filesize / (nprocs )
  offset = (region * rank )
  allocate (bucket(region))

  ! Sanity print
  print*, "myid is ", rank

  ! Set the file handle to an initial value (this should not be required)
  fh = 0

  ! Open the input file
  call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)

  ! Wait on everyone to catch up.
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)

  ! Basic "belt and suspenders" insurance that everyone's file pointer is at
  ! the beginning of the input file. The displacement must be of kind
  ! MPI_OFFSET_KIND, so pass the variable "empty" rather than a literal 0.
  call MPI_FILE_SET_VIEW(fh, empty, MPI_REAL4, MPI_REAL4, 'native', &
                         MPI_INFO_NULL, ierr)

  ! Read only the section of the data file each process needs and put data
  ! in the data bucket.
  call MPI_FILE_READ_AT(fh, offset, bucket, region, MPI_REAL4, status, ierr)

  ! We could check the values received in the bucket (debug hint)
  !
  ! do i = 1, region
  !   print *, "my id is ", rank, " and my ", i, "number is ", bucket(i)
  ! enddo

  ! Wait on everyone to finish and close up shop
  call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  call MPI_FILE_CLOSE(fh, ierr)
  call MPI_FINALIZE(ierr)

end program read_file
!******************************************************
! Ray Sheppard, HPCST, RAC, UITS, Indiana University *
!******************************************************

You can find a detailed description of MPI I/O in the MPI-2 standard document.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document aqpe in domains all and tgrid-all.
Last modified on July 15, 2009.
