Indiana University
University Information Technology Services
  

FASTA programs on Big Red

General introduction and usage policy

The FASTA programs search sequence databases for sequences that are similar to a set of query sequences. Both serial and parallel versions are available on Big Red at Indiana University. If you are familiar with the FASTA package and have a large amount of data to analyze, try the parallel version.

By default, jobs using the parallel version are submitted to the LONG queue, which lets you request up to 128 processors for up to 336 hours (14 days). Also available are the NORMAL queue, which lets you request up to 1,024 processors for up to 48 hours (two days), and the DEBUG queue, which lets you request up to 16 processors for up to 15 minutes. For more, see the "Computational resources" section of Big Red usage policies.
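The queue limits above can be turned into a quick check before you submit. The following is only a sketch: suggest_queue is a hypothetical helper, not part of Big Red or the FASTA package, and it assumes you prefer DEBUG for very small jobs and LONG over NORMAL whenever both fit.

```shell
#!/bin/sh
# Hypothetical helper (not a Big Red command): suggest a queue from the
# limits quoted in this document.
# Usage: suggest_queue PROCESSORS WALL_MINUTES
suggest_queue() {
    cpus=$1
    minutes=$2
    if [ "$cpus" -le 16 ] && [ "$minutes" -le 15 ]; then
        echo DEBUG      # up to 16 processors for 15 minutes
    elif [ "$cpus" -le 128 ] && [ "$minutes" -le $((336 * 60)) ]; then
        echo LONG       # up to 128 processors for 336 hours
    elif [ "$cpus" -le 1024 ] && [ "$minutes" -le $((48 * 60)) ]; then
        echo NORMAL     # up to 1,024 processors for 48 hours
    else
        echo "no listed queue fits $cpus processors for $minutes minutes" >&2
        return 1
    fi
}

# 64 processors for 10 hours (600 minutes) fits the LONG queue.
suggest_queue 64 600
```

The preference order is an assumption made for illustration; the usage policies remain the authority on which queue to use.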

Parallel version

To run parallel FASTA programs, use the fastajob shell script. The script is necessary because the parallel programs in the FASTA package have been compiled to run as multiple processes. The fastajob script should be in your path by default, and its manual page should be in your MANPATH. The syntax for fastajob is:

fastajob program_to_run program_options [-CPUS n] [-wallhours h] [-queue queue_name]

Replace program_to_run with the name of the FASTA program to run, program_options with its command line options, n with the number of processors to use, h with the number of wall clock hours to request, and queue_name with the queue to which you're submitting the job. If you omit the -CPUS option, four processors will be used. To request more than four processors, specify an integer that is a multiple of 4; a value that is not a multiple of 4 will be increased to the next multiple of 4. The maximum number of processors is 128, unless you use the -queue option to specify a queue with a higher limit. If you omit the -wallhours option, your job will be allowed to run for two hours. Use the -wallhours option to request more time.
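To see what processor count a given -CPUS value would turn into, the rounding rule described above can be reproduced with integer arithmetic. This is a sketch of the documented rule only; round_cpus is a hypothetical name, and the code is not taken from the fastajob script itself.

```shell
#!/bin/sh
# Hypothetical helper: apply the documented -CPUS rule.
# With no argument it uses the documented default of 4; otherwise it
# rounds the request up to the next multiple of 4.
round_cpus() {
    requested=${1:-4}
    echo $(( (requested + 3) / 4 * 4 ))
}

round_cpus 10    # a request for 10 processors becomes 12
```

Adding 3 before the integer division is what makes the result round up rather than down; values that are already multiples of 4 are left unchanged.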

For example, to submit a batch job that runs tfastx to query sequences in file myseqs.aa against the FASTA-formatted database refseq.nt, use:

fastajob tfastx myseqs.aa refseq.nt

To run the same program with the same data using 16 processors, use:

fastajob tfastx myseqs.aa refseq.nt -CPUS 16

To run the same program on 16 processors with up to 10 hours of wall clock time, use:

fastajob tfastx myseqs.aa refseq.nt -CPUS 16 -wallhours 10
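If you have several query files, the same command can be generated once per file. The sketch below only prints the fastajob commands so you can review them before submitting anything; build_fastajob_cmd is a hypothetical helper, and the query file names sample1.aa and sample2.aa are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a fastajob command line using the syntax
# shown above. Nothing is submitted; the command is only echoed.
# Usage: build_fastajob_cmd PROGRAM QUERY DATABASE PROCESSORS HOURS
build_fastajob_cmd() {
    program=$1; query=$2; database=$3; cpus=$4; hours=$5
    echo "fastajob $program $query $database -CPUS $cpus -wallhours $hours"
}

# One command per (placeholder) query file, all against refseq.nt.
for query in sample1.aa sample2.aa; do
    build_fastajob_cmd tfastx "$query" refseq.nt 16 10
done
```

Once the printed commands look right, you could run them by hand or pipe the output to sh.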

When you run fastajob, you'll receive a message when your job is submitted to the queue and another when the job finishes. To check the status of your job, use the llq command.

The FASTA package contains many programs with names that include the package version number and the parallel programming library used to build them. The fastajob script will accept either a nickname or the name of the actual binary. Nicknames for most programs are below:

Nickname(s)         Binary        Notes
fasta, mpfasta      mp35compfa    The fasta program compares nucleotide sequences against nucleotide databases, or protein sequences against protein databases.
fastx, mpfastx      mp35compfx    The fastx program compares a DNA sequence to a protein database in three forward or reverse frames. It uses a simpler algorithm than fasty and runs faster.
fasty, mpfasty      mp35compfy    The fasty program compares a DNA sequence to a protein database in three forward or reverse frames. It uses a more complex algorithm than fastx and runs more slowly.
tfastx, mptfastx    mp35comptfx   The tfastx program compares a protein sequence to a nucleotide database. It uses a simpler and faster algorithm than tfasty.
tfasty, mptfasty    mp35comptfy   The tfasty program compares a protein sequence to a nucleotide database. It uses a more complex and slower algorithm than tfastx.
fasts, mpfasts      mp35compfs    The fasts program compares a short peptide sequence to a protein database.
tfasts, mptfasts    mp35comptfs   The tfasts program compares a short peptide sequence to a nucleotide database.
ssearch, mpcompsw   mp35compsw    The ssearch program uses the Smith-Waterman algorithm to compare nucleotide sequences against nucleotide databases, or protein sequences against protein databases.
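The nickname table can also be written as a lookup, which is handy if you want to record which binary a job will use. This is an illustration of the mapping listed above, not the code fastajob uses internally; binary_for is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: map a nickname from the table above to its binary.
# Names that already look like binaries (mp35comp...) are passed through.
binary_for() {
    case "$1" in
        fasta|mpfasta)     echo mp35compfa ;;
        fastx|mpfastx)     echo mp35compfx ;;
        fasty|mpfasty)     echo mp35compfy ;;
        tfastx|mptfastx)   echo mp35comptfx ;;
        tfasty|mptfasty)   echo mp35comptfy ;;
        fasts|mpfasts)     echo mp35compfs ;;
        tfasts|mptfasts)   echo mp35comptfs ;;
        ssearch|mpcompsw)  echo mp35compsw ;;
        mp35comp*)         echo "$1" ;;
        *) echo "unknown FASTA program: $1" >&2; return 1 ;;
    esac
}

binary_for tfastx    # prints mp35comptfx
```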

Output from FASTA is stored in files named after your job number. The fastajob script produces two files, fastajob.99999.out and fastajob.99999.err, where 99999 is the number of your job. The .out file contains the results (Unix standard output), unless you direct them to another file with the -O option on the command line. The .err file contains debugging output and error messages (Unix standard error). FASTA usually prints some processing information to standard error, so content in the .err file does not necessarily indicate a problem.
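After a job finishes, a short check like the following can tell you whether its files have content. It is a sketch that relies only on the file naming described above; check_outputs is a hypothetical helper, and the optional second argument (the directory to look in) is an assumption for jobs submitted from somewhere other than the current directory.

```shell
#!/bin/sh
# Hypothetical helper: report on the two files fastajob writes for a job.
# Usage: check_outputs JOB_NUMBER [DIRECTORY]
check_outputs() {
    job=$1
    dir=${2:-.}
    out="$dir/fastajob.$job.out"
    err="$dir/fastajob.$job.err"
    if [ -s "$out" ]; then
        echo "results: $out"
    else
        echo "no results yet in $out"
    fi
    # The .err file often holds routine processing information.
    if [ -s "$err" ]; then
        echo "messages: $err (not necessarily errors)"
    fi
}
```

For example, check_outputs 12345 looks for fastajob.12345.out and fastajob.12345.err in the current directory. If you used the -O option, the results will be in the file you named instead.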

Serial version

A serial version of the FASTA package is in:

/N/soft/whatami/fasta35/bin

To temporarily add that directory to your path and the manual pages to your MANPATH, use:

soft add +fasta

To have them added automatically each time you log in, use:

echo '+fasta' >> ~/.soft
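Running the echo command more than once appends the key again each time. If you want to guard against duplicates, a check along these lines is one option; add_softenv_key is a hypothetical helper, not part of SoftEnv, and its optional second argument exists only so it can be tried on a file other than ~/.soft.

```shell
#!/bin/sh
# Hypothetical helper: append a key to ~/.soft only if it is not already
# there as a whole line.
# Usage: add_softenv_key KEY [FILE]
add_softenv_key() {
    key=$1
    file=${2:-$HOME/.soft}
    # -q: quiet, -x: match the whole line, -F: treat the key as a literal string
    if ! grep -qxF -e "$key" "$file" 2>/dev/null; then
        echo "$key" >> "$file"
    fi
}

# Example (commented out so that reading this file changes nothing):
# add_softenv_key '+fasta'
```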

Help documents and manual pages

Documentation for both the parallel and serial versions is available at:

/N/soft/whatami/fasta35/doc/fasta3x.doc

No manual pages are available for the parallel programs, which use the MPI parallel programming interface and the PVM parallel programming library. Sample datasets are available in:

/N/soft/whatami/fasta35/data

For the serial version, see also the README and documentation files in:

/N/soft/whatami/fasta35/doc

Manual pages for the serial version are in:

/N/soft/whatami/fasta35/man/man1

For more information about the availability of software on the Indiana University shared central systems, see "At IU, what software is available on the research computing systems, and how may I request that software be added?"

For more information about TeraGrid software, see the TeraGrid User Support documentation.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document awvy in domains all and tgrid-all.
Last modified on July 02, 2008.
