Comparing assemblies to the reference

The Quast program can be used to generate metrics similar to those from the assemblathon_stat.pl script, plus some additional metrics and visualisations.

Program  Options          Explanation
Quast                     Evaluating genome assemblies
         -o               name of output folder
         -R               reference genome
         -G               file with positions of genes in the reference (see manual)
         -T               number of threads (CPUs) to use
         sequences.fasta  one or more files with assembled sequences
         -l               comma-separated list of names for the assemblies, e.g. "assembly 1, assembly 2" (in the same order as the sequence files)
         --scaffolds      input sequences are scaffolds, not contigs. They will be split at runs of 10 or more N's to analyse contigs ('broken' assembly)
         --est-ref-size   estimated reference genome size (when no reference is provided)
         --gene-finding   apply GeneMarkS for gene finding
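
To make the 'broken' assembly idea concrete, here is a minimal sketch of splitting one scaffold at runs of 10 or more N's. It only illustrates the rule described in the table and is not Quast's own code; the function name split_scaffold is made up for this example, and it assumes the sequence is on a single line with upper-case N's.

```shell
# Illustration only: split one scaffold sequence into contigs at runs of
# 10 or more N's. split_scaffold is a hypothetical helper, not part of Quast.
split_scaffold() {
    # Turn every run of >=10 N's into a space, then put each piece on its own line.
    printf '%s\n' "$1" | sed -E 's/N{10,}/ /g' | tr ' ' '\n'
}

gap3=NNN                                  # short gap: stays inside the contig
gap12=$(printf '%12s' '' | tr ' ' 'N')    # twelve N's: long enough to split

# Prints two contigs: ACGTNNNACGT and GGCC
split_scaffold "ACGT${gap3}ACGT${gap12}GGCC"
```

Shorter runs of N's are left in place, so only gaps of at least 10 N's separate the contigs.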

See the manual for information on the output of Quast: http://quast.bioinf.spbau.ru/manual.html#sec3

Running Quast

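TIP: log in to the cod3 server using the -Y flag with ssh, which enables X11 forwarding so that graphical programs on the server can open windows on your screen: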

ssh -Y username@cod3.hpc.uio.no

This becomes useful at the end.

Set up quast:

module load quast/3.0

On the server, make a folder called quast and move into it. Then run:

quast.py -T 2 \
-o out_folder_name \
-R /data/assembly/NC_000913_K12_MG1655.fasta \
-G /data/assembly/e.coli_genes.gff \
../path/to/assembly1.fasta \
../path/to/assembly2.fasta \
-l "Assembly 1, Assembly 2"

Note that the --scaffolds option is not used here, for simplicity. Also, make sure you name the assemblies (-l) in the same order as you give them to quast!
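
One way to keep labels and files in the same order is to derive the labels from the file names. The sketch below is only a suggestion: labels_from_files is a hypothetical helper (not part of Quast), and it assumes the assembly files end in .fasta.

```shell
# Build a comma-separated label list from assembly file names, in the order
# given, so that labels and files cannot get out of step.
# labels_from_files is a hypothetical helper, not a Quast command.
labels_from_files() {
    labels=""
    for f in "$@"; do
        name=$(basename "$f" .fasta)      # strip the directory and the .fasta suffix
        if [ -z "$labels" ]; then
            labels="$name"
        else
            labels="$labels, $name"
        fi
    done
    printf '%s\n' "$labels"
}

# Prints: assembly1, assembly2
labels_from_files ../path/to/assembly1.fasta ../path/to/assembly2.fasta
```

The result could then be passed to quast as -l "$(labels_from_files ../path/to/assembly1.fasta ../path/to/assembly2.fasta)", with the same files listed in the same order on the command line.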

Quast output

Quast will produce an HTML report file, report.html. If you have logged in to the cod3 server using ssh -Y, you can now type

cd out_folder_name
firefox report.html

Otherwise, download the report and the report_html_aux folder to your PC and open the html file in your browser.
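
If you go the download route, it may be convenient to pack the report and its auxiliary folder into a single archive first. This is a sketch under assumptions: bundle_report and the archive name quast_report.tar.gz are made up for this example, and the scp path in the comment assumes the quast folder sits directly in your home directory on the server.

```shell
# Hypothetical helper: pack report.html and report_html_aux from a Quast
# output folder into one archive inside that folder.
bundle_report() {
    out_dir=$1
    tar -czf "$out_dir/quast_report.tar.gz" -C "$out_dir" report.html report_html_aux
}

# On the server (from inside the quast folder):
#   bundle_report out_folder_name
# On your own PC (path is an assumption, adjust to where your quast folder is):
#   scp username@cod3.hpc.uio.no:quast/out_folder_name/quast_report.tar.gz .
#   tar -xzf quast_report.tar.gz
```

After unpacking, report.html and report_html_aux end up side by side, which is what the HTML report expects.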

Hover over the row names to get a description. Also have a look at the ‘Extended report’.

Alternatively, have a look at the report.pdf file (it has a few more plots).