Running CLUSTAL W on Big Red
On Big Red, you can use CLUSTAL W to align multiple sequences. A parallel implementation of CLUSTAL W 1.82 (ClustalW-MPI) is installed at:
/N/soft/whatami/clustalw-mpi-1.14
The README files are in:
To run clustalw, use the clustalwjob script, which
submits a batch job that runs CLUSTAL W for you. The
clustalwjob script should be in your path by default, and
its manual page should be in your default path for manual
pages. Syntax for clustalwjob is:
Replace options_to_clustalw with command line options,
n with the number of processors to use, and h
with the maximum amount of time the job should be allowed to run. If
you omit the CPUS option, four processors will be used. To
request more than four processors, specify an integer value that is a
multiple of 4. If you specify a value that is not a multiple of 4, the
value will be increased to the next multiple of 4. The maximum number
of processors is 128 (unless you also specify a larger queue; see the
clustalwjob man page). For example, to use 16 processors
to align amino acid sequences in file aaseqs, run:
If you omit the -wallhours option, your job will be
allowed to run for two hours. Use the -wallhours option
to request more time, up to 336 hours (14 days). (Queues other than the
default have lower time limits; see the clustalwjob man
page.)
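The processor-count rule described above (four by default, rounded up to the next multiple of 4, at most 128 on the default queue) can be sketched as a small shell function. This is only an illustration for working out how many processors a request will turn into: normalize_cpus is a hypothetical name, it is not part of clustalwjob, and treating a request above 128 as an error is an assumption rather than documented behavior.

```shell
# Hypothetical helper, not part of clustalwjob: applies the processor-count
# rule described above to a requested value.
normalize_cpus() {
    nc_requested="${1:-4}"      # four processors when no count is given
    nc_max=128                  # maximum for the default queue
    # Round up to the next multiple of 4.
    nc_rounded=$(( (nc_requested + 3) / 4 * 4 ))
    if [ "$nc_rounded" -lt 4 ]; then
        nc_rounded=4
    fi
    # Assumption: a request above the maximum is reported as an error here;
    # how clustalwjob itself reacts is described in its man page.
    if [ "$nc_rounded" -gt "$nc_max" ]; then
        echo "requested $nc_requested processors; the default-queue maximum is $nc_max" >&2
        return 1
    fi
    echo "$nc_rounded"
}

normalize_cpus 10    # prints 12
normalize_cpus 16    # prints 16
```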
Options to CLUSTAL W are usually available by entering the command with no argument, but this feature is not available in the parallel version. Options are listed in:
/N/soft/whatami/clustalw-mpi-1.14/README.OPTIONS
Options are listed here for your convenience:

CLUSTAL W (1.82) Multiple Sequence Alignments
clustalw option list:-
    -help
    -check
    -options
    -align
    -newtree=filename
    -usetree=filename
    -newtree1=filename
    -usetree1=filename
    -newtree2=filename
    -usetree2=filename
    -bootstrap
    -tree
    -quicktree
    -convert
    -interactive
    -batch
    -infile=filename
    -profile1=filename
    -profile2=filename
    -type=protein OR dna
    -profile
    -sequences
    -matrix=filename
    -dnamatrix=filename
    -negative
    -noweights
    -gapopen=f
    -gapext=f
    -endgaps
    -nopgap
    -nohgap
    -novgap
    -hgapresidues=string
    -maxdiv=n
    -gapdist=n
    -pwmatrix=filename
    -pwdnamatrix=filename
    -pwgapopen=f
    -pwgapext=f
    -ktuple=n
    -window=n
    -pairgap=n
    -topdiags=n
    -score=percent OR absolute
    -transweight=f
    -seed=n
    -kimura
    -tossgaps
    -bootlabels=node OR branch
    -debug=n
    -output=gcg OR gde OR pir OR phylip OR nexus
    -outputtree=nj OR phylip OR dist OR nexus
    -outfile=filename
    -outorder=input OR aligned
    -case=lower OR upper
    -seqnos=off OR on
    -nosecstr1
    -nosecstr2
    -secstrout=structure OR mask OR both OR none
    -helixgap=n
    -strandgap=n
    -loopgap=n
    -terminalgap=n
    -helixendin=n
    -helixendout=n
    -strandendin=n
    -strandendout=n

When you run clustalwjob, you'll receive a message
when your job is submitted to the queue, and another when the job
finishes. To check the status of your job, use the llq
command.
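Run with no arguments, llq lists every job in the queue, so it is often easier to restrict the listing to your own jobs. The sketch below wraps that in a function; my_jobs is a hypothetical name, and it assumes the standard LoadLeveler -u option of llq is available on Big Red and that the USER environment variable holds your username.

```shell
# Hypothetical wrapper around llq, not a command provided on Big Red.
my_jobs() {
    # Stop early on a machine where LoadLeveler's llq is not installed.
    if ! command -v llq >/dev/null 2>&1; then
        echo "llq not found in PATH" >&2
        return 127
    fi
    # Assumption: -u restricts the listing to the named user's jobs.
    llq -u "${USER}"
}
```

On a Big Red login node, running my_jobs would then show only the jobs submitted under your account.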
In addition to output files that clustalwjob produces,
such as .aln files, clustalwjob will produce
files with filenames similar to clustalwjob.9999.err and
clustalwjob.9999.out, where 9999 is the
number of your job. Such files contain information that
clustalwjob would print to the screen if you were running
it interactively from the command line.
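Once a job has finished, a quick look at its .err file tells you whether CLUSTAL W reported any problems. The following is a minimal sketch, assuming the files are in the current directory and follow exactly the clustalwjob.9999.err naming pattern described above; show_job_errors is a hypothetical helper, not something installed on Big Red.

```shell
# Hypothetical helper: prints the error log for a job number, if it has content.
show_job_errors() {
    sj_err="clustalwjob.$1.err"
    if [ ! -e "$sj_err" ]; then
        echo "no such file: $sj_err" >&2
        return 2
    fi
    # -s is true when the file exists and is not empty.
    if [ -s "$sj_err" ]; then
        cat "$sj_err"
        return 1
    fi
    echo "no messages in $sj_err"
    return 0
}
```

For a job numbered 9999, you would call show_job_errors 9999 from the directory in which you ran clustalwjob.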
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on May 16, 2008.






