Prepare your environment and copy datasets
This section covers what you need to complete this exercise: copying the data to your personal scratch directory and loading the appropriate modules.
This exercise assumes that you are working on NYUAD's HPC platform (Dalma). If you are running your analysis elsewhere, links to the tools used in this exercise are provided at the end of this section; the commands and parameters should be the same.
Log in to Dalma and copy the data
ssh [email protected]
Change directory and copy the data
cd /scratch/$USER
cp -r /scratch/gencore/January_workshop/de_novo_genome_assembly .
-r means copy recursively (needed when copying directories rather than files).
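Running the cp command a second time copies into the existing folder and can overwrite files you have already modified there. The following is a minimal sketch of a guarded copy, assuming a POSIX-compatible shell; copy_dataset is a hypothetical helper name, not part of the workshop material.

```shell
# Sketch: copy a dataset directory into a destination directory, but refuse
# to touch an existing copy. copy_dataset is a hypothetical helper name.
copy_dataset() {
    cd_src="$1"
    cd_dest="$2"
    cd_target="$cd_dest/$(basename "$cd_src")"

    # The source must be an existing directory.
    if [ ! -d "$cd_src" ]; then
        echo "Source directory not found: $cd_src" >&2
        return 1
    fi
    # Do not copy over a folder (or file) that is already there.
    if [ -e "$cd_target" ]; then
        echo "Already present, not copying again: $cd_target" >&2
        return 2
    fi
    # -r copies the directory and everything below it.
    cp -r "$cd_src" "$cd_dest"/
}

# Usage on Dalma, with the paths from the steps above:
# copy_dataset /scratch/gencore/January_workshop/de_novo_genome_assembly "/scratch/$USER"
```

The function returns 0 after a successful copy, 1 if the source is missing, and 2 if the destination already holds a copy.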
Explore the contents of the folder
tree -L 4 de_novo_genome_assembly
-L 4 means display or "drill down" 4 levels in the directory tree.
Expected output of the tree command
de_novo_genome_assembly
├── conf
│   └── de_novo_sequencing_training.yml
└── data
    ├── analysis
    │   ├── gapclose.config
    │   └── Sample_test
    │       └── spades
    ├── precomputed_results
    │   ├── gapclose.config
    │   └── Sample_test
    │       ├── abyss_pe
    │       ├── gapcloser_abyss
    │       ├── gapcloser_spades
    │       ├── quast
    │       └── spades
    └── raw
        └── Sample_test
            ├── Sample_test_trimmed_R1_PE.fastq.gz
            ├── Sample_test_trimmed_R1_SE.fastq.gz
            ├── Sample_test_trimmed_R2_PE.fastq.gz
            └── Sample_test_trimmed_R2_SE.fastq.gz

14 directories, 7 files
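If the tree command is not installed on your system, the summary line can be approximated with find. This is a rough sketch, assuming a find that supports -mindepth and -maxdepth (GNU and BSD find both do); summarize_tree is a hypothetical helper name. Symbolic links are not counted here, so the numbers may differ from tree's if the folder contains any.

```shell
# Sketch: print a "N directories, M files" summary like `tree -L <depth>`,
# using only find. summarize_tree is a hypothetical helper name.
summarize_tree() {
    st_root="$1"
    st_depth="$2"
    # -mindepth 1 leaves the top-level directory itself out of the count.
    st_dirs=$(find "$st_root" -mindepth 1 -maxdepth "$st_depth" -type d | wc -l | tr -d ' ')
    st_files=$(find "$st_root" -mindepth 1 -maxdepth "$st_depth" -type f | wc -l | tr -d ' ')
    echo "$st_dirs directories, $st_files files"
}

# Usage, from /scratch/$USER after copying the data:
# summarize_tree de_novo_genome_assembly 4
```

On a fresh, complete copy the counts should agree with the summary line of the tree output above; once you start running analyses, new files and folders will change them.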
Required modules (NYUAD-Dalma)
The software modules on NYUAD's HPC (Dalma) are grouped by analysis discipline. For this tutorial, you will need the following modules.
module load gencore/1
module load gencore_dev
module load gencore_de_novo_genomic/1.0
This ensures that all the required software is available in your environment, along with the biox-workflow.pl and hpcrunner.pl scripts, which are used to execute and submit analysis workflows.
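To confirm that the modules loaded as expected, you can run module list and then check that the workflow scripts are on your PATH. The following is a minimal sketch, assuming a POSIX-compatible shell; check_commands is a hypothetical helper name, and the script names are the two mentioned above.

```shell
# Sketch: report which of the given commands can be found on PATH.
# check_commands is a hypothetical helper name.
check_commands() {
    cc_missing=0
    for cc_cmd in "$@"; do
        if command -v "$cc_cmd" >/dev/null 2>&1; then
            echo "found:   $cc_cmd"
        else
            echo "missing: $cc_cmd" >&2
            cc_missing=$((cc_missing + 1))
        fi
    done
    # Succeed only if every command was found.
    [ "$cc_missing" -eq 0 ]
}

# Usage, after the module load commands:
# check_commands biox-workflow.pl hpcrunner.pl
```

The function exits with a non-zero status if any command is missing, so it can also be used as a guard at the top of a job script. You can pass the names of other executables you expect the modules to provide.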
Required software
Once the modules are loaded, all the required software will be available for you. Below is a specific list of the software that we will be using for this tutorial and the links to the software pages.
Trimmomatic - http://www.usadellab.org/cms/?page=trimmomatic
ABySS - http://www.bcgsc.ca/platform/bioinfo/software/abyss
SPAdes - http://bioinf.spbau.ru/spades
SOAPdenovo-GapCloser - https://github.com/aquaskyline/SOAPdenovo2
QUAST - http://bioinf.spbau.ru/quast
PATRIC - https://www.patricbrc.org/