Prepare your environment and copy datasets

This section covers what you need to complete this exercise: loading the appropriate modules and copying the data to your personal scratch directory.

This exercise assumes that you are working on NYUAD's HPC platform (Dalma). If you are running your analysis elsewhere, links to the tools used in this exercise are listed under Required software. The commands and parameters should be the same.

Log in to Dalma

ssh [email protected]

Change directory and copy the data

cd /scratch/$USER

cp -r /scratch/gencore/January_workshop/de_novo_genome_assembly .

The -r option copies recursively, which is required when the source is a directory rather than a file.
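If you rerun the copy step, it can help to guard against copying into a directory that already exists, since cp -r would then merge into it and overwrite files of the same name. The following is a minimal sketch and not part of the workshop material; copy_dataset is a hypothetical helper name, and the paths in the usage comment are the ones from the commands above.

```shell
# Hypothetical helper: copy a directory recursively into a parent directory,
# but skip the copy if a directory of the same name is already there.
copy_dataset() {
    src=$1
    dest_parent=$2
    name=$(basename "$src")

    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest_parent/$name" ]; then
        echo "already present, not copying: $dest_parent/$name" >&2
        return 0
    fi
    # -r copies the directory and everything below it.
    cp -r "$src" "$dest_parent/"
}

# Usage on Dalma (assumed paths, taken from the commands above):
# copy_dataset /scratch/gencore/January_workshop/de_novo_genome_assembly "/scratch/$USER"
```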

Explore the contents of the folder

tree -L 4 de_novo_genome_assembly

The -L 4 option limits the display to 4 levels of the directory tree.

Expected output of the tree command

de_novo_genome_assembly
├── conf
│   └── de_novo_sequencing_training.yml
└── data
    ├── analysis
    │   ├── gapclose.config
    │   └── Sample_test
    │       └── spades
    ├── precomputed_results
    │   ├── gapclose.config
    │   └── Sample_test
    │       ├── abyss_pe
    │       ├── gapcloser_abyss
    │       ├── gapcloser_spades
    │       ├── quast
    │       └── spades
    └── raw
        └── Sample_test
            ├── Sample_test_trimmed_R1_PE.fastq.gz
            ├── Sample_test_trimmed_R1_SE.fastq.gz
            ├── Sample_test_trimmed_R2_PE.fastq.gz
            └── Sample_test_trimmed_R2_SE.fastq.gz

14 directories, 7 files
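If tree is not installed on your system, find can produce a comparable summary. This is a minimal sketch under the assumption that your find supports -mindepth and -maxdepth (GNU and BSD find both do); summarise_tree is a hypothetical helper name.

```shell
# Hypothetical helper: count directories and files down to a given depth,
# printing a summary line in the same form as the last line of tree's output.
summarise_tree() {
    root=$1
    depth=${2:-4}
    # -mindepth 1 leaves the root directory itself out of the count, as tree does.
    dirs=$(find "$root" -mindepth 1 -maxdepth "$depth" -type d | wc -l)
    files=$(find "$root" -mindepth 1 -maxdepth "$depth" -type f | wc -l)
    # Arithmetic expansion strips any padding that wc adds on some systems.
    echo "$((dirs)) directories, $((files)) files"
}

# Usage, from the directory that contains the copied folder:
# summarise_tree de_novo_genome_assembly 4
```

If the copy is complete, the printed line should match the summary at the end of the expected tree output.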

Required modules (NYUAD-Dalma)

The software modules on NYUAD's HPC (Dalma) are grouped by analysis discipline. For this tutorial, you will need the following modules.

module load gencore/1
module load gencore_dev 
module load gencore_de_novo_genomic/1.0

Loading these modules adds the software they provide to your environment, along with the biox-workflow.pl and hpcrunner.pl scripts, which are used to execute and submit analysis workflows.
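To confirm that the modules loaded as expected, you can run module list to see what is currently loaded, and check that the two scripts are on your PATH. The following is a minimal sketch, not part of the workshop material; check_tools is a hypothetical helper name.

```shell
# Hypothetical helper: report which of the named commands can be found on PATH.
# Returns 0 if all were found and 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage after loading the modules:
# check_tools biox-workflow.pl hpcrunner.pl
```

A "missing" line suggests that one of the module load commands failed or was skipped in the current shell session.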

Required software

Once the modules are loaded, all the required software is available to you. The tools used in this tutorial are listed below, with links to their home pages.

Trimmomatic - http://www.usadellab.org/cms/?page=trimmomatic

ABySS - http://www.bcgsc.ca/platform/bioinfo/software/abyss

SPAdes - http://bioinf.spbau.ru/spades

SOAPdenovo-GapCloser - https://github.com/aquaskyline/SOAPdenovo2

QUAST - http://bioinf.spbau.ru/quast

PATRIC - https://www.patricbrc.org/
