FASTA programs on Big Red
On this page:
- General introduction and usage policy
- Parallel version
- Serial version
- Help documents and manual pages
General introduction and usage policy
The FASTA programs search sequence databases for sequences that are similar to a set of query sequences. Both serial and parallel versions are available on Big Red at Indiana University. If you are familiar with the FASTA package and have a large amount of data to analyze, try the parallel version.
Jobs using the parallel version are submitted to the LONG queue, which lets you request up to 128 processors for 336 hours (14 days). Also available are the NORMAL queue, which lets you request up to 1,024 processors for 48 hours (two days), and the DEBUG queue, which lets you request up to 16 processors for up to 15 minutes. For details, see the "Computational resources" section of the Big Red usage policies.
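The queue limits above can be expressed as a small lookup. The following is a minimal sketch, assuming you want the first queue (checked in the order DEBUG, NORMAL, LONG) whose processor and time limits cover your request. The function name pick_queue and that checking order are illustrative assumptions; they are not part of fastajob or the Big Red scheduler. Wall-clock time is given in minutes so the 15-minute DEBUG limit can be compared with integer arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: print the first queue whose limits (copied from the
# Big Red queue descriptions) cover a request of CPUS processors for
# MINUTES minutes of wall-clock time. Returns 1 if no queue fits.
pick_queue() {
    cpus=$1
    minutes=$2
    # DEBUG: up to 16 processors for up to 15 minutes.
    if [ "$cpus" -le 16 ] && [ "$minutes" -le 15 ]; then
        echo DEBUG
    # NORMAL: up to 1,024 processors for up to 48 hours.
    elif [ "$cpus" -le 1024 ] && [ "$minutes" -le $((48 * 60)) ]; then
        echo NORMAL
    # LONG: up to 128 processors for up to 336 hours (14 days).
    elif [ "$cpus" -le 128 ] && [ "$minutes" -le $((336 * 60)) ]; then
        echo LONG
    else
        echo "no queue allows $cpus processors for $minutes minutes" >&2
        return 1
    fi
}
```

For example, pick_queue 64 600 prints NORMAL, and pick_queue 128 4320 (three days) prints LONG, because only the LONG queue allows more than 48 hours.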
Parallel version
To run parallel FASTA programs, use the fastajob shell
script. The script is necessary because the programs in the
FASTA package have been compiled to use multiple processes.
The fastajob script should be in your path by default, and
its manual page should be in your manual page path (MANPATH).
The syntax for fastajob is:
Replace program_to_run with the name of the FASTA
program to run, program_options with its command-line
options, n with the number of processors to use, and
queue_name with the queue to which you're submitting the
job. If you omit the -CPUS option, 4 processors are used.
To request more than four processors, specify an integer that
is a multiple of 4; any other value is rounded up to the next
multiple of 4. The maximum is 128 processors unless you also
specify a queue that allows more. If you omit the -wallhours
option, your job may run for up to 2 hours. Use the -wallhours
option to request more time.
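The processor-count rule can be checked with plain integer arithmetic. This is a minimal sketch, assuming the rules exactly as stated: a default of 4, rounding up to a multiple of 4, and a cap of 128 unless a larger queue is chosen. The name effective_cpus and its optional second argument (a larger cap) are hypothetical. The text does not say what fastajob does when the cap is exceeded, so the sketch simply reports an error in that case.

```shell
#!/bin/sh
# Hypothetical helper: print the processor count fastajob would use for a
# requested -CPUS value, following the rules described in the text.
effective_cpus() {
    requested=${1:-4}   # omitting -CPUS means 4 processors
    max=${2:-128}       # default cap; pass a larger value for a larger queue
    # Round up to the next multiple of 4 using integer division.
    rounded=$(( (requested + 3) / 4 * 4 ))
    if [ "$rounded" -lt 4 ]; then
        rounded=4
    fi
    # Assumption: treat a request above the cap as an error.
    if [ "$rounded" -gt "$max" ]; then
        echo "error: $rounded processors exceeds the maximum of $max" >&2
        return 1
    fi
    echo "$rounded"
}
```

For example, effective_cpus 10 prints 12, and effective_cpus with no argument prints 4.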
For example, to submit a batch job that runs tfastx to
query sequences in file myseqs.aa against the
FASTA-formatted database refseq.nt, use:
fastajob tfastx myseqs.aa refseq.nt
To run the same program with the same data using 16 processors, use:
fastajob tfastx myseqs.aa refseq.nt -CPUS 16
To run the same program using up to 10 hours of wall clock time, use:
fastajob tfastx myseqs.aa refseq.nt -CPUS 16 -wallhours 10
When you run fastajob, you'll receive a message when
your job is submitted to the queue and another when the job
finishes. To check the status of your job, use the llq
command.
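If you submit many similar jobs, it can help to assemble the command line first and inspect it before submitting. The sketch below is a convenience under stated assumptions, not part of the FASTA package: build_fastajob_cmd is a hypothetical name, it handles only the -CPUS and -wallhours options described above, and it leaves out each option when the value equals fastajob's stated default (4 processors, 2 hours).

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (but do not run) a fastajob command.
# Arguments: PROGRAM QUERY DATABASE [CPUS] [WALLHOURS]
build_fastajob_cmd() {
    program=$1 query=$2 db=$3 cpus=${4:-} hours=${5:-}
    cmd="fastajob $program $query $db"
    # Add -CPUS only when it differs from the default of 4 processors.
    if [ -n "$cpus" ] && [ "$cpus" != 4 ]; then
        cmd="$cmd -CPUS $cpus"
    fi
    # Add -wallhours only when it differs from the default of 2 hours.
    if [ -n "$hours" ] && [ "$hours" != 2 ]; then
        cmd="$cmd -wallhours $hours"
    fi
    printf '%s\n' "$cmd"
}
```

For example, build_fastajob_cmd tfastx myseqs.aa refseq.nt 16 10 prints the same command as the 10-hour example; once the printed line looks right, run it directly.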
The FASTA package contains many programs with names that include
the package version number and the parallel programming library used
to build them. The fastajob script will accept either a
nickname or the name of the actual binary. Nicknames for most programs
are listed below:
| Nickname(s) | Binary | Notes |
|---|---|---|
| fasta, mpfasta | mp35compfa | The fasta program compares nucleotide sequences against nucleotide databases, or protein sequences against protein databases. |
| fastx, mpfastx | mp35compfx | The fastx program compares a DNA sequence to a protein database in three forward or reverse frames. It uses a simpler algorithm than fasty and runs faster. |
| fasty, mpfasty | mp35compfy | The fasty program compares a DNA sequence to a protein database in three forward or reverse frames. It uses a more complex algorithm than fastx and runs more slowly. |
| tfastx, mptfastx | mp35comptfx | The tfastx program compares a protein sequence to a nucleotide database. It uses a simpler and faster algorithm than tfasty. |
| tfasty, mptfasty | mp35comptfy | The tfasty program compares a protein sequence to a nucleotide database. It uses a more complex and slower algorithm than tfastx. |
| fasts, mpfasts | mp35compfs | The fasts program compares a short peptide sequence to a protein database. |
| tfasts, mptfasts | mp35comptfs | The tfasts program compares a short protein sequence to a nucleotide database. |
| ssearch, mpcompsw | mp35compsw | The ssearch program uses the Smith-Waterman algorithm to query nucleotide sequences against nucleotide databases, or protein sequences against protein databases. |
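The table amounts to a simple nickname-to-binary mapping. Because fastajob already accepts either form, this sketch is only useful if you want the binary name yourself, for example to look it up with command -v. The function name fasta_binary is hypothetical, and the mapping is copied from the table.

```shell
#!/bin/sh
# Hypothetical helper: print the parallel FASTA binary for a nickname.
# Binary names (mp35comp...) are passed through unchanged.
fasta_binary() {
    case $1 in
        fasta|mpfasta)     echo mp35compfa ;;
        fastx|mpfastx)     echo mp35compfx ;;
        fasty|mpfasty)     echo mp35compfy ;;
        tfastx|mptfastx)   echo mp35comptfx ;;
        tfasty|mptfasty)   echo mp35comptfy ;;
        fasts|mpfasts)     echo mp35compfs ;;
        tfasts|mptfasts)   echo mp35comptfs ;;
        ssearch|mpcompsw)  echo mp35compsw ;;
        mp35comp*)         echo "$1" ;;   # already a binary name
        *)
            echo "unknown FASTA program: $1" >&2
            return 1
            ;;
    esac
}
```

For example, fasta_binary tfastx prints mp35comptfx.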
Output from FASTA is stored in files named after your job
number. The fastajob script produces
fastajob.99999.out and fastajob.99999.err,
where 99999 is the number of your job. The
.out file contains results (Unix standard output), and
the .err file contains debugging output and error
messages (Unix standard error). FASTA usually prints some processing
information to the .err file even when nothing has gone
wrong. To write results to a different file, use the -O
option on the command line.
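A small helper can compute the two file names for a given job number. This sketch assumes the files are in the current directory unless you pass another one, since the text does not say where fastajob writes them; fastajob_outputs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the results file and the standard error file
# for a fastajob job number, one per line.
# Arguments: JOB_NUMBER [DIRECTORY]
fastajob_outputs() {
    job=${1:-}
    dir=${2:-.}
    if [ -z "$job" ]; then
        echo "usage: fastajob_outputs JOB_NUMBER [DIRECTORY]" >&2
        return 1
    fi
    # Results (standard output) first, then diagnostics (standard error).
    printf '%s\n' "$dir/fastajob.$job.out" "$dir/fastajob.$job.err"
}
```

For example, less "$(fastajob_outputs 99999 | head -n 1)" opens the results for job 99999. Remember that a non-empty .err file does not by itself mean the job failed.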
Serial version
A serial version of the FASTA package is in:
/N/soft/whatami/fasta35/bin
To temporarily add that directory to your path and the manual pages to your MANPATH, use:
soft add +fasta
To have them added automatically every time you log in, use:
echo '+fasta' >> ~/.soft
Help documents and manual pages
Documentation for both the parallel and serial versions is available at:
/N/soft/whatami/fasta35/doc/fasta3x.doc
The parallel version uses the MPI parallel programming interface and the PVM parallel programming library; no manual pages are available for it. Sample datasets are available in:
/N/soft/whatami/fasta35/data
For the serial version, see the README and
documentation files in:
Also for the serial version, see the manual pages in:
/N/soft/whatami/fasta35/man/man1
For more information about the availability of software on the Indiana University shared central systems, see "At IU, what software is available on the research computing systems, and how may I request that software be added?"
For more information about TeraGrid software, see:
- How can I find out what software is available on the TeraGrid?
- How can I see what site-specific software is installed on a TeraGrid resource?
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on July 02, 2008.






