Assembly using miniasm+racon¶
A recent preprint described a fast approach for assembling and correcting PacBio and MinION data. The principle is:
- using minimap for fast all-against-all overlap of raw reads
- using miniasm, this “simply concatenates pieces of read sequences to generate the final sequences. Thus the per-base error rate is similar to the raw input reads.”
- mapping the raw reads back to the assembly using minimap again
- using racon (‘rapid consensus’) for consensus calling
- perform the racon step at least twice
Running miniasm and racon on MinION data¶
All-against-all overlap with minimap¶
Note how the reads are used twice here, as we map the reads against themselves:
minimap -Sw5 -L100 -m0 \
-t 2 \
/data/assembly/MAP006-1_2D_pass.fastq \
/data/assembly/MAP006-1_2D_pass.fastq \
| gzip -1 >racon_MAP006-1_2D_1.paf.gz
The output is in the so-called PAF (Pairwise mApping) Format, and is compressed ‘on the fly’.
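To get a feeling for what ended up in the PAF file, a quick summary can help. The following is a minimal sketch, not part of the original workflow: the function name paf_summary is made up for illustration, and it relies only on the PAF column layout (column 1 is the query name, column 11 the alignment block length).

```shell
# Summarise a PAF stream read from stdin.
# paf_summary is a hypothetical helper, not a minimap or miniasm command.
paf_summary() {
    awk -F'\t' '
        {
            n++                       # one overlap record per line
            total += $11              # column 11: alignment block length
            if (!($1 in seen)) {      # column 1: query read name
                seen[$1] = 1
                q++
            }
        }
        END {
            printf "records\t%d\n", n
            printf "query_reads\t%d\n", q
            if (n > 0) printf "mean_block_length\t%.1f\n", total / n
        }'
}
```

It could be used on the compressed file like this: gzip -dc racon_MAP006-1_2D_1.paf.gz | paf_summary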
Assembly with miniasm¶
miniasm takes the paf file and produces an assembly in GFA (Graphical Fragment Assembly) format.
miniasm -f /data/assembly/MAP006-1_2D_pass.fastq \
racon_MAP006-1_2D_1.paf.gz \
>racon_MAP006-1_2D_1.gfa
Since we have only one sequence in the GFA file (at least for this assembly), we can use a simple set of unix commands to turn it into a fasta file:
head -n 1 racon_MAP006-1_2D_1.gfa | awk '{print ">"$2; print $3}' > racon_MAP006-1_2D_1.raw_assembly.fasta
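The one-liner above assumes that the sequence sits on the first line of the GFA file. If an assembly yields several sequences, or if other record types come first, a variant that selects every segment record (lines with S in the first column, as defined by the GFA format) may be safer. This is only a sketch; the function name gfa_to_fasta is an illustrative assumption.

```shell
# Convert every GFA segment record ("S" lines) to a FASTA entry.
# Reads from the named files, or from stdin when no file is given.
# gfa_to_fasta is a hypothetical helper name used for illustration.
gfa_to_fasta() {
    awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$@"
}
```

For example: gfa_to_fasta racon_MAP006-1_2D_1.gfa > racon_MAP006-1_2D_1.raw_assembly.fasta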
Correction with racon, round 1¶
We first use minimap again, this time with the original reads mapped against the ‘raw’ assembly:
minimap racon_MAP006-1_2D_1.raw_assembly.fasta \
/data/assembly/MAP006-1_2D_pass.fastq \
>racon_MAP006-1_2D_1.raw_assembly.reads_mapped.paf
racon is basically run as racon -t num_threads reads.fastq mapped_reads.paf assembly.fasta consensus.fasta:
racon -t 2 \
/data/assembly/MAP006-1_2D_pass.fastq \
racon_MAP006-1_2D_1.raw_assembly.reads_mapped.paf \
racon_MAP006-1_2D_1.raw_assembly.fasta \
racon_MAP006-1_2D_1.racon1.fasta
This will take some time.
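Once racon has finished, it can be reassuring to compare the number of sequences and the total length before and after correction. A minimal sketch, assuming plain (uncompressed) fasta files; the helper name fasta_stats is made up for this example.

```shell
# Print the number of sequences and the total number of bases in FASTA input.
# Reads from the named files, or from stdin when no file is given.
# fasta_stats is a hypothetical helper name used for illustration.
fasta_stats() {
    awk '
        /^>/ { nseq++; next }        # header line: count one sequence
        { bases += length($0) }      # sequence line: add its length
        END { printf "sequences\t%d\nbases\t%d\n", nseq, bases }
    ' "$@"
}
```

For example: fasta_stats racon_MAP006-1_2D_1.raw_assembly.fasta and then fasta_stats racon_MAP006-1_2D_1.racon1.fasta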
Correction with racon, round 2¶
Run the mapping with minimap and the correction with racon again, but now with the results of the first round of correction as input. Please be careful when naming files!
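To keep track of which file feeds into which round, it can help to print the commands before running them. The sketch below only echoes the two commands for a given round and does not execute anything. The function name print_racon_round and the .racon1.reads_mapped.paf naming for round 2 are assumptions that simply extend the naming pattern used in round 1.

```shell
# Print (do not run) the minimap and racon commands for correction round N.
# Arguments: reads file, output prefix, round number (1, 2, ...).
# print_racon_round is a hypothetical helper; file names follow the
# pattern used in round 1 of this tutorial.
print_racon_round() {
    local reads=$1 prefix=$2 round=$3 input paf output
    if [ "$round" -eq 1 ]; then
        input="${prefix}.raw_assembly.fasta"            # uncorrected miniasm assembly
    else
        input="${prefix}.racon$((round - 1)).fasta"     # result of the previous round
    fi
    paf="${input%.fasta}.reads_mapped.paf"
    output="${prefix}.racon${round}.fasta"
    echo "minimap $input $reads > $paf"
    echo "racon -t 2 $reads $paf $input $output"
}
```

For example, print_racon_round /data/assembly/MAP006-1_2D_pass.fastq racon_MAP006-1_2D_1 2 prints the two commands for the second round, which can be checked by eye before they are copied to the command line.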
Running miniasm and racon on PacBio data¶
Use all available reads from the P6C4 run, i.e.:
/data/assembly/pacbio/Analysis_Results/m141013_011508_sherri_c100709962550000001823135904221533_s1_p0.filtered_subreads.fastq
The commands are the same as for the MinION data. Again, please be careful when naming files!
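Because the same commands are repeated with different inputs, it is easy to overwrite the MinION results by accident. One possible safeguard, shown here only as a sketch, is a small check that stops if an output file already exists. The function name require_new and the prefix racon_pacbio_1 in the usage line are hypothetical and not part of the tutorial.

```shell
# Return non-zero (with a message on stderr) if any named file already exists.
# require_new is a hypothetical helper name used for illustration.
require_new() {
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "refusing to overwrite existing file: $f" >&2
            return 1
        fi
    done
}
```

It can be placed in front of a command with &&, for example: require_new racon_pacbio_1.gfa && miniasm -f reads.fastq racon_pacbio_1.paf.gz > racon_pacbio_1.gfa (here reads.fastq stands for the PacBio file given above).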