Indiana University
University Information Technology Services
  
Using MPI-HMMER on Big Red at IU

Note: Big Red is scheduled to be retired from service in June 2013. Indiana University is replacing it with Big Red II, the fastest university-owned supercomputer in the nation, capable of performing one quadrillion floating-point operations per second (1 petaflop). Based on Cray XE/XK technology, Big Red II has 676 XK nodes (each containing one AMD "Interlagos" processor and one NVIDIA "Kepler" GPU) and 344 XE nodes (each containing two AMD "Abu Dhabi" processors). For more, see Big Red II at Indiana University.

Introduction

HMMER is a suite of programs that you can use to create and query hidden Markov models that describe molecular sequences. A parallel port of HMMER known as MPI-HMMER is available on Big Red at Indiana University. It contains all the HMMER programs, but only hmmpfam and hmmsearch have been parallelized.

MPI-HMMER is installed in the directory /N/soft/linux-sles9-ppc64/hmmer-2.3.2-MPI-0.92. Documentation for HMMER programs is available as man pages. You can also visit the MPI-HMMER page for more information.

Using the hmmerjob script to submit parallel hmmsearch jobs

The hmmerjob script submits both hmmpfam and hmmsearch jobs. Pass hmmpfam and hmmsearch options to hmmerjob just as you would to the serial versions of these programs. If you supply only hmmpfam or hmmsearch options, hmmerjob submits a job that uses four processes for up to two hours in the NORMAL queue on Big Red. Use the additional hmmerjob options described below to change those settings.

The form of the hmmerjob command using hmmsearch is:

hmmerjob hmmsearch --mpi options_to_hmmsearch -CPUS [count] -wallhours [n] -queue [queue_name]

Replace items in brackets with your chosen values. The -CPUS option specifies the number of processes to start, -wallhours the length of time the job may run, and -queue the name of the queue that is to receive the job. The SERIAL, NORMAL, LONG, and DEBUG queues are available; see Big Red usage policies.

To run a simple hmmsearch with models in models.hmm and sequences in experiment56.fa in the NORMAL queue using 512 processes for 8 hours, the command is:

hmmerjob hmmsearch --mpi models.hmm experiment56.fa -CPUS 512 -wallhours 8 -queue NORMAL
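
If you submit many searches, it can help to assemble the command line from shell variables and inspect it before submitting. The following is a minimal sketch that only prints the command and does not submit anything. The function name build_hmmsearch_cmd is hypothetical and not part of hmmerjob; the option names and the default values (four processes, two hours, NORMAL queue) are taken from the description above.

```shell
# Dry-run helper: prints an hmmerjob command line instead of submitting it.
# build_hmmsearch_cmd is a hypothetical name used for illustration only.
build_hmmsearch_cmd() {
    models=$1           # HMM file, e.g. models.hmm
    seqs=$2             # sequence file, e.g. experiment56.fa
    cpus=${3:-4}        # number of processes (default described above: 4)
    hours=${4:-2}       # wall-clock limit in hours (default described above: 2)
    queue=${5:-NORMAL}  # queue name (default described above: NORMAL)
    echo "hmmerjob hmmsearch --mpi $models $seqs -CPUS $cpus -wallhours $hours -queue $queue"
}

# Print the command for the 512-process, 8-hour example.
build_hmmsearch_cmd models.hmm experiment56.fa 512 8 NORMAL
```

Once the printed line looks right, you can copy it to the prompt and run it.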

When you run hmmerjob, you'll receive a message that your job has been submitted to the queue. You will receive mail when the job finishes. You can check the status of your job by using the llq command.

Output from the job is stored in a file with a name of the form hmmerjob.999999.0.out, where the nines are replaced by other digits that represent the job ID. Errors and debugging output are stored in a separate file with a name of the form hmmerjob.999999.err.
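
As an illustration of the naming pattern described above, this short sketch prints the two file names to look for once a job finishes. The helper name job_output_files and the job ID 123456 are made up for the example; substitute the ID reported when your job was submitted.

```shell
# Print the expected output and error file names for a given job ID.
# job_output_files is a hypothetical helper; the patterns come from the text above.
job_output_files() {
    jobid=$1
    echo "hmmerjob.$jobid.0.out"   # results
    echo "hmmerjob.$jobid.err"     # errors and debugging output
}

job_output_files 123456   # 123456 is a placeholder job ID
```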

Using the hmmerjob script to submit parallel hmmpfam jobs

Because hmmpfam is more I/O intensive than hmmsearch, it is better to index the HMM database first. This step is required only the first time you use a particular database. You can reuse the resulting .ssi file in subsequent searches.

First, add the SoftEnv key (softkey):

soft add +mpi_hmmer-0.92-mpich-ibm-32

Then index the HMM database:

serialjob hmmindex /path/to/HMM/database

Replace /path/to/HMM/database with the path to the database you will be using.
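
Because indexing is needed only once per database, you may want to check for an existing index before submitting the indexing job. The sketch below assumes that hmmindex writes the index next to the database as a file with the same name plus the .ssi suffix; confirm this against the hmmindex man page. The function name needs_index is hypothetical.

```shell
# Succeeds (exit status 0) when no index exists yet for the given database.
# Assumption: the index is stored as <database>.ssi beside the database file.
needs_index() {
    db=$1
    if [ -f "$db.ssi" ]; then
        return 1   # index already present, nothing to do
    fi
    return 0       # no index found
}

db=/path/to/HMM/database   # replace with the path to your database
if needs_index "$db"; then
    echo "index missing: run serialjob hmmindex $db"
else
    echo "index found: $db.ssi"
fi
```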

The form of the hmmerjob command using hmmpfam is:

hmmerjob hmmpfam --mpi /path/to/HMM/database /path/to/search/sequence [other_options_to_hmmpfam] -CPUS [count] -wallhours [n] -queue [queue_name]

The rules for using hmmpfam are the same as those described above for hmmsearch. For example, suppose you would like to compare all the sequences in a file named unknowns.fa with all the models in models.hmm and select matches that have an E-value of 1 or lower, using the default four processes for up to two hours. The command would be:

hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa

To run the same job using 64 processes for up to 72 hours, you would use:

hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa -CPUS 64 -wallhours 72

Using non-parallel HMMER programs

The serial (single-process) HMMER programs are also available on Big Red. The simplest way to use them is to put them on your path by using the +mpi-hmmer softkey. To permanently make HMMER available at the command prompt, run the commands:

echo +mpi-hmmer-0.92-mpich-ibm-32 >> ~/.soft
resoft

You should then be able to run serial HMMER programs, and all HMMER manual pages should be available through the man command (e.g., man hmmbuild). If you need to run serial HMMER programs in batch jobs, the simplest way to do so is to use the serialjob script. A manual page for it is available on Big Red.
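
To confirm that the programs are on your path, a quick check such as the following may help. It is only a sketch: the helper name have_program is hypothetical, and it relies on the standard shell builtin command -v rather than on anything specific to SoftEnv.

```shell
# Succeeds when the named program can be found on the current PATH.
# have_program is a hypothetical helper built on the POSIX "command -v" builtin.
have_program() {
    command -v "$1" >/dev/null 2>&1
}

for prog in hmmbuild hmmsearch hmmpfam; do
    if have_program "$prog"; then
        echo "$prog: found"
    else
        echo "$prog: not on PATH (check ~/.soft and run resoft)"
    fi
done
```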

This is document awwb in domain all.
Last modified on October 12, 2012.
