16S rRNA Analysis (1): QC with FastQC & NanoPlot

Racoon doing FastQC 

 

I got a small set of 16S rRNA sequence data from my professor.
(I said small, but it's still 1.8GB!!)


As a first step, I installed the necessary packages and ran QC with FastQC and NanoPlot.
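For reference, here is a rough sketch of how the two QC runs could be launched from Python. The file name `reads.fastq.gz`, the output folder `qc_out`, and the thread count are placeholders I made up for illustration, not my actual paths. The block only builds and prints the commands so they can be checked before anything runs.

```python
import shlex


def build_qc_commands(fastq, outdir, threads=4):
    """Return the FastQC and NanoPlot command lines as argument lists.

    `fastq`, `outdir`, and `threads` are illustrative placeholders.
    """
    fastqc_cmd = [
        "fastqc", fastq,
        "--outdir", f"{outdir}/fastqc",
        "--threads", str(threads),
    ]
    nanoplot_cmd = [
        "NanoPlot",
        "--fastq", fastq,
        "--outdir", f"{outdir}/nanoplot",
        "--threads", str(threads),
    ]
    return [fastqc_cmd, nanoplot_cmd]


# Print the commands for review instead of running them right away.
for cmd in build_qc_commands("reads.fastq.gz", "qc_out"):
    print(shlex.join(cmd))
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)`. Note that FastQC expects its output folder to exist already, so create it first (for example with `os.makedirs`).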

I'm running everything on my personal laptop.
(AMD Ryzen 5 4500U … & 16+4GB RAM)

It took a long time. I watched a whole movie and it was still running,
so I just went to bed, and it was done by morning.


What do I look for in the results, and why?


1) Length Information

  • a. Mean Length

  • b. Median Length

  • c. N50

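    For reads, this is the length such that 50% of all sequenced bases are in reads of this length or longer. (Wikipedia defines it for assemblies: "a weighted median statistic such that 50% of the entire assembly is contained in contigs or scaffolds equal to or larger than this value.")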
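To make the three length statistics concrete, here is a minimal sketch that computes them from a list of read lengths. The `toy_lengths` values are invented for illustration and are not taken from my data.

```python
from statistics import mean, median


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


# Invented example: five near-full-length reads and one short fragment.
toy_lengths = [1450, 1480, 1500, 1510, 1520, 300]

print("mean:  ", mean(toy_lengths))
print("median:", median(toy_lengths))
print("N50:   ", n50(toy_lengths))
```

In this toy set, the single short read pulls the mean down noticeably, while the median and N50 stay close to 1500. That is why it helps to look at all three values together.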

We use these length statistics to choose the --min_length value for Filtlong.

Also, a full-length 16S amplicon should be approx. 1,500 bp long, so my result (all three values are around 1,500) looks good!

Google search result for 16S amplicon length

 


2) Q-Score (Read Quality)

  • Q10: 90% accuracy

  • Q20: 99% accuracy

  • Q30: 99.9% accuracy
    ...
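The values in this list come from the standard Phred relation, accuracy = 1 - 10^(-Q/10). Here is a small sketch of the conversion in both directions. The function names are my own.

```python
import math


def phred_to_accuracy(q):
    """Convert a Phred quality score to per-base accuracy (0..1)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (0..1, below 1) back to a Phred score."""
    return -10 * math.log10(1 - accuracy)


for q in (10, 15, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2%} accuracy")
```

The scale is logarithmic, so every 10 points of Q means ten times fewer errors.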

My result says:

  • Reads above Q10: only 39%

  • Reads above Q15: 0%
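To show what a "reads above Q10" percentage means, here is a sketch that computes a per-read mean quality and the fraction of reads above a cutoff. I am assuming the mean is taken over error probabilities rather than over raw Q values, which as far as I know is how NanoPlot does it, but I have not verified this. The `toy_reads` scores are invented.

```python
import math


def mean_read_quality(phred_scores):
    """Mean quality of one read, averaged over error probabilities."""
    error_probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def fraction_above(read_qualities, cutoff):
    """Fraction of reads whose mean quality is strictly above the cutoff."""
    passing = sum(1 for q in read_qualities if q > cutoff)
    return passing / len(read_qualities)


# Invented per-base Phred scores for four very short "reads".
toy_reads = [[12, 12, 12], [8, 9, 7], [20, 5, 20], [11, 13, 12]]
qualities = [mean_read_quality(read) for read in toy_reads]

print(f">Q10: {fraction_above(qualities, 10):.0%} of reads")
print(f">Q15: {fraction_above(qualities, 15):.0%} of reads")
```

The third toy read shows why the averaging method matters: its plain average is Q15, but a single Q5 base drags its mean quality below Q10.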

So it’s low-quality data. We need to make Filtlong's filtering more lenient (by setting --keep_percent).


In the next article, I’ll proceed to Filtlong with the data I got today!

 



 

 
