Data processing
vignette-data-processing.Rmd
Pipeline
Raw fastq files were downloaded from NCBI or from author web site if not deposited to NCBI
All datasets were processed with cutapdapt (Martin et al. 2011) to remove primers and the dada2 R package (Callahan et al. 2016) to compute ASVs.
Samples with less than 1,000 reads after processing not considered.
Assignment was done with dada2 assignTaxa using the 18S PR2 4.14.0 as reference
ASVs with less 100 reads total and with bootstrap value at the supergroup level < 75 were not considered.
It is possible to use either all ASVs or clustered ASVs (default option). ASVs are clustered at 100% identity with
VSEARCH --cluster_fast --id 1.00
. We used the centroid of each cluster as the reference sequence for this cluster. See the metaPR2 paper for more information.Total read number per sample has been normalized to 100 with 3 decimals so that the value displayed in the different panels correspond to % of total eukaryotic reads. For this the number of reads for a given ASV in a given sample was divided by the total number of reads in this sample multiplied by 100.
References
Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., Holmes, S.P., 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13, 581–583. https://doi.org/10.1038/nmeth.3869
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10. https://doi.org/10.14806/ej.17.1.200