Pipeline

  • Raw fastq files were downloaded from NCBI, or from the authors' website when they had not been deposited at NCBI.

  • All datasets were processed with cutadapt (Martin 2011) to remove primers and with the dada2 R package (Callahan et al. 2016) to compute amplicon sequence variants (ASVs).

  • Samples with fewer than 1,000 reads after processing were not considered.

  • Taxonomic assignment was done with the dada2 assignTaxonomy function, using version 4.14.0 of the 18S PR2 database as reference.

  • ASVs with fewer than 100 reads in total and with a bootstrap value at the supergroup level < 75 were not considered.

  • It is possible to use either all ASVs or clustered ASVs (default option). ASVs are clustered at 100% identity with VSEARCH --cluster_fast --id 1.00. The centroid of each cluster is used as the reference sequence for that cluster. See the metaPR2 paper for more information.

  • The total read number per sample has been normalized to 100, with 3 decimals, so that the values displayed in the different panels correspond to the % of total eukaryotic reads. To do this, the number of reads for a given ASV in a given sample was divided by the total number of reads in that sample, and the result was multiplied by 100.
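
The read-count thresholds and the normalization described above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the database: the function name, the dictionary layout, and the order of the steps are assumptions, and the sketch treats an ASV as excluded when it fails either criterion (fewer than 100 reads in total, or supergroup bootstrap < 75), which is one possible reading of the rule.

```python
from collections import defaultdict

# Thresholds quoted in the pipeline description.
MIN_READS_PER_SAMPLE = 1000
MIN_READS_PER_ASV = 100
MIN_SUPERGROUP_BOOTSTRAP = 75


def _totals(counts, index):
    """Sum reads by ASV (index 0) or by sample (index 1)."""
    totals = defaultdict(int)
    for key, reads in counts.items():
        totals[key[index]] += reads
    return totals


def filter_and_normalize(counts, bootstrap):
    """Hypothetical helper: apply the thresholds, then convert reads to %.

    counts:    {(asv_id, sample_id): number of reads}
    bootstrap: {asv_id: bootstrap value at the supergroup level}
    Returns    {(asv_id, sample_id): % of the sample's reads, 3 decimals}
    """
    # 1. Drop samples with fewer than 1,000 reads.
    per_sample = _totals(counts, 1)
    kept = {k: n for k, n in counts.items()
            if per_sample[k[1]] >= MIN_READS_PER_SAMPLE}

    # 2. Drop ASVs with too few reads or a low supergroup bootstrap
    #    (assumption: failing either criterion excludes the ASV).
    per_asv = _totals(kept, 0)
    kept = {k: n for k, n in kept.items()
            if per_asv[k[0]] >= MIN_READS_PER_ASV
            and bootstrap[k[0]] >= MIN_SUPERGROUP_BOOTSTRAP}

    # 3. Normalize each sample to 100: reads / sample total * 100.
    per_sample = _totals(kept, 1)
    return {k: round(100 * n / per_sample[k[1]], 3) for k, n in kept.items()}


if __name__ == "__main__":
    counts = {("asv1", "s1"): 1000, ("asv2", "s1"): 2000}
    bootstrap = {"asv1": 100, "asv2": 90}
    print(filter_and_normalize(counts, bootstrap))
```

Because the percentages are computed after filtering, the values within each retained sample add up to 100 (up to rounding at the third decimal).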





References

  • Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., Holmes, S.P., 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13, 581–583. https://doi.org/10.1038/nmeth.3869

  • Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10. https://doi.org/10.14806/ej.17.1.200