ARCHIVED: Run AlphaFold 2 on Carbonate at IU
Overview
An implementation of the inference pipeline of AlphaFold v2.0, an application that predicts the 3D structure of arbitrary proteins, is available on Carbonate at Indiana University. AlphaFold 2 is the model that was entered in CASP14 and published in Nature.
The Indiana University research supercomputers use the Slurm workload manager for resource management and job scheduling; see Use Slurm to submit and manage jobs on IU's research computing systems.
In Slurm, compute resources are grouped into logical sets called partitions, which are essentially job queues. To view details about available partitions and nodes, use the sinfo command; for more about using sinfo, see the "View partition and node information" section of Use Slurm to submit and manage jobs on IU's research computing systems.
To take advantage of the package's GPU capabilities, you should run AlphaFold on Carbonate's GPU partition. For instructions on setting up a project that provides access to Carbonate's GPU partition, see Use RT Projects to request and manage access to specialized Research Technologies resources. For more about running GPU jobs, see Run GPU-accelerated jobs on Quartz or Big Red 200 at IU.
Set up your user environment
On the research supercomputers at Indiana University, the Modules environment management system provides a convenient method for dynamically customizing your software environment.
To use AlphaFold on Carbonate, you first must add the Anaconda and AlphaFold modules to your user environment; on the command line, enter:
module load anaconda/python3.8/2020.07 alphafold
Submit an AlphaFold job
To submit an AlphaFold batch job on Carbonate, create a Slurm submission script (for example, my_alphafold_job.script) that specifies the application you want to run and the resources required to run it, and then submit it with the sbatch command (for example, sbatch my_alphafold_job.script). A Slurm submission script for running a batch AlphaFold job on Carbonate may look similar to the following:
#!/bin/bash
#SBATCH -J alphafold_example
#SBATCH -p gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --gpus-per-node=1
#SBATCH --time=04:00:00
#SBATCH -A slurm-account-name
module load anaconda/python3.8/2020.07
module load alphafold
export TF_FORCE_UNIFIED_MEMORY='1'
export XLA_PYTHON_CLIENT_MEM_FRACTION='4.0'
python /N/soft/rhel7/alphafold/alphafold/run_alphafold.py \
--fasta_paths=T1050.fasta \
--output_dir=/N/slate/$USER \
--model_names=model_1,model_2,model_3,model_4,model_5 \
--max_template_date=2020-05-14 \
--preset=full_dbs \
--benchmark=False \
--logtostderr \
--flagfile=alphafold_flags
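Before submitting the script with sbatch, you may want to confirm that the input files it references are present in the submit directory. The following is a minimal sketch (plain shell, no Slurm commands needed); the file names are the ones used in the example script:

```shell
# Sanity check before submission: verify that the FASTA input and the
# flag file referenced by the example script exist in the current
# directory. File names match the example script above.
missing=0
for f in T1050.fasta alphafold_flags; do
    if [ -e "$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f"
        missing=$((missing + 1))
    fi
done
echo "$missing file(s) missing"
```

If both files are present, submit the job with sbatch my_alphafold_job.script.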
In the above example:
- The first line indicates that the script should be read using the Bash command interpreter.
- The next lines are #SBATCH directives used to pass options to the sbatch command:
  - -J alphafold_example names the job alphafold_example.
  - -p gpu specifies that the job should run in the GPU partition.
  - --nodes=1 requests that a minimum of one node be allocated to this job.
  - --ntasks-per-node=1 specifies that one task should be launched per node.
  - --cpus-per-task=24 allots 24 processors to the task.
  - --gpus-per-node=1 allots one GPU to the task.
  - --time=04:00:00 allots a maximum of four hours for the job to run.
  - -A slurm-account-name indicates the Slurm Account Name to which resources used by this job should be charged. Users belonging to projects approved through RT Projects can find their allocation's Slurm Account Name on the "Home" page in RT Projects; look under "Submitting Slurm Jobs with your Project's Account". Alternatively, on the "Home" page, under "Allocations", select an allocation and look in the table under "Allocation Attributes".
- The final command calls the AlphaFold script:
  - --fasta_paths=T1050.fasta specifies the protein that should be folded.
  - --output_dir=/N/slate/$USER specifies where the folded protein should be placed. UITS Research Technologies recommends using a directory on a large-capacity drive, such as Slate, rather than a directory in your home directory space.
For descriptions of the other flags, refer to the AlphaFold documentation.
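The --fasta_paths input is a plain-text FASTA file: a header line beginning with > followed by the amino-acid sequence. As an illustration (the sequence below is a made-up placeholder, not an actual CASP target), a minimal single-sequence FASTA file can be created like this:

```shell
# Create a minimal FASTA file: a '>' header line followed by the
# amino-acid sequence. The sequence here is a hypothetical placeholder.
cat > example.fasta <<'EOF'
>example_protein hypothetical placeholder sequence
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF
grep -c '^>' example.fasta   # prints the number of sequences: 1
```

Pass the file to AlphaFold with --fasta_paths=example.fasta.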
Notes
- AlphaFold uses large databases, which are over 2 TB in total size, to process the proteins. These databases are available on high-speed flash storage on Slate-Scratch and should not be downloaded separately by users. The locations of the databases are specified in the alphafold_flags file, which should look like this:
--jackhmmer_binary_path=/N/soft/rhel7/alphafold/conda/bin/jackhmmer
--uniref90_database_path=/N/scratch/afdb/nvme/13AUG2021/uniref90/uniref90.fasta
--mgnify_database_path=/N/scratch/afdb/nvme/13AUG2021/mgnify/mgy_clusters_2018_12.fa
--pdb70_database_path=/N/scratch/afdb/nvme/13AUG2021/pdb70/pdb70
--data_dir=/N/scratch/afdb/nvme/13AUG2021
--template_mmcif_dir=/N/scratch/afdb/nvme/13AUG2021/pdb_mmcif/mmcif_files
--obsolete_pdbs_path=/N/scratch/afdb/nvme/13AUG2021/pdb_mmcif/obsolete.dat
--uniclust30_database_path=/N/scratch/afdb/nvme/13AUG2021/uniclust30/uniclust30_2018_08/uniclust30_2018_08
--bfd_database_path=/N/scratch/afdb/nvme/13AUG2021/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
--hhblits_binary_path=/N/soft/rhel7/alphafold/conda/bin/hhblits
--hhsearch_binary_path=/N/soft/rhel7/alphafold/conda/bin/hhsearch
--kalign_binary_path=/N/soft/rhel7/alphafold/conda/bin/kalign
- If you choose to run AlphaFold from your own directory, note that a quirk in the application requires stereo_chemical_props.txt to be available in your directory. You can copy it from the /N/soft/rhel7/alphafold/example/alphafold/common directory.
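The --flagfile option reads additional command-line flags from a file, one flag per line, so you can keep your own copy of the flag file and edit it as needed. A small sketch (the two paths are taken from the alphafold_flags listing above; a real file would include all of the flags shown there):

```shell
# Write a flag file with one flag per line, then count the flags.
# A complete file would contain every flag from the listing above.
cat > my_alphafold_flags <<'EOF'
--data_dir=/N/scratch/afdb/nvme/13AUG2021
--jackhmmer_binary_path=/N/soft/rhel7/alphafold/conda/bin/jackhmmer
EOF
grep -c '^--' my_alphafold_flags   # prints 2
```

You would then point run_alphafold.py at your copy with --flagfile=my_alphafold_flags.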
Get help
If you need help or have a question about using AlphaFold on Carbonate, contact the UITS Research Applications and Deep Learning team.
This is document bhwm in the Knowledge Base.
Last modified on 2023-12-17 07:04:55.