## Pipeline

• Raw fastq files were downloaded from NCBI or from author web site if not deposited to NCBI

• All datasets were processed with cutapdapt (Martin et al. 2011) to remove primers and the dada2 R package (Callahan et al. 2016) to compute ASVs.

• Samples with less than 1,000 reads after processing not considered.

• Assignment was done with dada2 assignTaxa using the 18S PR2 4.14.0 as reference

• ASVs with less 100 reads total and with bootstrap value at the supergroup level < 75 were not considered.

• It is possible to use either all ASVs or clustered ASVs (default option). ASVs are clustered at 100% identity with VSEARCH --cluster_fast --id 1.00. We used the centroid of each cluster as the reference sequence for this cluster. See the metaPR2 paper for more information.

• Total read number per sample has been normalized to 100 with 3 decimals so that the value displayed in the different panels correspond to % of total eukaryotic reads. For this the number of reads for a given ASV in a given sample was divided by the total number of reads in this sample multiplied by 100.

## References

• Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., Holmes, S.P., 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13, 581–583. https://doi.org/10.1038/nmeth.3869