PR2 version 4.12.0
Apicomplexa from J. del Campo

1 Init

Load the variables common to the different scripts and the necessary libraries

3 Apicomplexa - Round 1 - 2019-05

3.3 Read the reformatted data api_02

  • The api_01.tsv file is imported into Excel into sheet api_02
  • Taxonomy is manually edited…
  • Columns taxo01 to taxo5 correspond to the ranks kingdom to order
  • Family is either taxo6 or taxo7, depending on the group.
  • For genera that are split across different taxo8 levels, I created clades based on taxo8 (e.g. for Babesia, Theileria, etc.)
  • Species corresponds to the value in eukref_name when possible, except when this leads to incoherent taxonomy (e.g. the same species in 2 different families). In that case I kept the species name for the clade from which it was defined and renamed the other occurrences “Genus_sp.”.
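
The species rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used (the editing was done by hand in Excel). The row layout, the function names and the reference_family mapping are assumptions, and the rows are made up.

```python
from collections import defaultdict


def find_incoherent_species(rows):
    """Return {species: sorted families} for species found in more than one family."""
    families = defaultdict(set)
    for row in rows:
        families[row["species"]].add(row["family"])
    return {sp: sorted(fams) for sp, fams in families.items() if len(fams) > 1}


def rename_outside_reference(rows, reference_family):
    """Rename a species to 'Genus_sp.' wherever it sits outside its reference family.

    reference_family maps species -> family from which the species was defined
    (a manual decision; hypothetical input for this sketch).
    """
    renamed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        keep = reference_family.get(row["species"])
        if keep is not None and row["family"] != keep:
            new_row["species"] = row["genus"] + "_sp."
        renamed.append(new_row)
    return renamed


# Made-up rows, for illustration only
rows = [
    {"genus": "GenusA", "species": "GenusA_alpha", "family": "Family1"},
    {"genus": "GenusA", "species": "GenusA_alpha", "family": "Family2"},
    {"genus": "GenusB", "species": "GenusB_beta", "family": "Family2"},
]
conflicts = find_incoherent_species(rows)
fixed = rename_outside_reference(rows, {"GenusA_alpha": "Family1"})
```

The first function only flags conflicts. Choosing which family keeps the species name remains a manual step, which is why it is passed in as an argument.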

3.5 Add new metadata

  • The metadata were extracted previously by a batch script (PR2 genbank download.R): 2535 new metadata records.
  • Metadata for 92 sequences could not be retrieved because the sequences have been removed from GenBank.
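
A minimal sketch of how retrieved and unretrievable accessions might be separated, assuming the download step yields a dictionary keyed by accession. The function name and the accessions are hypothetical, and this is not the content of the R batch script.

```python
def split_by_retrieval(requested, retrieved):
    """Return (found, missing) accession lists, preserving the requested order."""
    found = [acc for acc in requested if acc in retrieved]
    missing = [acc for acc in requested if acc not in retrieved]
    return found, missing


# Hypothetical accessions and metadata, for illustration only
requested = ["ACC0001", "ACC0002", "ACC0003"]
retrieved = {
    "ACC0001": {"organism": "example"},
    "ACC0003": {"organism": "example"},
}
found, missing = split_by_retrieval(requested, retrieved)
```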

4 Apicomplexa - Round 3 - 2019-07-29

From Javier

    I am done with the Apicomplexa.
    All the changes I made to your annotation are highlighted in red; you will see that almost everything is red... Sadly, the paper has been submitted and is under review, but I will try my best to align the DB we are going to release with the paper with the one I am sending you, which I think makes more sense.
    Apart from the changes in the taxonomy, I have also looked at the sequences that are in PR2 but not in my DB, and I have classified them into 5 categories.
    1. Missed and New. Sequences that are new (most of them), plus 20 that for some reason I missed.
    2. No 18S. Sequences that contain more than the 18S or genomes that I did not add to my DB.
    3. Quarantine. Sequences that, while not in principle chimeras, are too divergent or too weird to be assigned to the Apis. We need to build a fairly general tree in order to place or discard them. For example, by BLAST the best hit may be an Api but the second best hit a plant, both with low similarity.
    4. No apis. Sequences that I know for sure are not Apis; most of them come from a single study.
    5. Chimeras. Chimeras.
    
    BONUS: I don't know what to do with the sequences related to Toxoplasma (Toxoplasma 1 and Toxoplasma 2). It is pure chaos; any suggestion would be welcome.

4.6 Add and update sequences

5 Final

5.1 Corrections

  • Besnoitia_besnoiti split into 2 species because its sequences belong to different Toxoplasma groups (Toxoplasma 1 and Toxoplasma 2)
    • Besnoitia1_besnoiti
    • Besnoitia2_besnoiti
  • Neospora_caninum split into 2 species because its sequences belong to different Toxoplasma groups
    • Neospora1_caninum
    • Neospora2_caninum
  • Toxoplasma_gondii split into 2 species because its sequences belong to different Toxoplasma groups
    • Toxoplasma1_gondii
    • Toxoplasma2_gondii
  • For every genus that has been split into several numbered genera, the species names carry the same number as the genus (e.g. Babesia1_xxx).
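
The numbering convention can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that species names always follow the "Genus_epithet" pattern used in the examples above.

```python
def number_species(species, genus_number):
    """Insert the genus number after the genus part: 'Genus_epithet' -> 'Genus<N>_epithet'."""
    genus, sep, epithet = species.partition("_")
    if not sep:
        # No underscore: the name does not follow the assumed pattern
        raise ValueError(f"expected 'Genus_epithet', got {species!r}")
    return f"{genus}{genus_number}_{epithet}"


names = [number_species("Toxoplasma_gondii", n) for n in (1, 2)]
# names == ['Toxoplasma1_gondii', 'Toxoplasma2_gondii']
```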

5.2 Actions

  • Sequences not in Javier's table
    • Quarantine: removed from active PR2 database (they can be re-added later)
    • Not 18S: removed from active PR2 database (note that HQ219405, HG328237, HG328236, HG328235 seem to be OK)
    • Not Apicomplexa: removed from active PR2 database
    • Chimera: removed from active PR2 database and tagged as chimeras
    • Missed/New: Kept in PR2 - species updated with new taxonomy
  • Apicoplast sequences updated with new taxonomy
  • Updates
    • New: 2619
    • Updated: 5889+239
    • Removed: 89
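
The category-to-action rules listed above amount to a small lookup table. The sketch below encodes them in Python as an illustration only. The record fields ("category", "active", "tag"), the function name and the accessions are assumptions, not the PR2 database schema.

```python
# category -> (kept in the active database?, tag to apply or None)
ACTIONS = {
    "Quarantine": (False, None),
    "Not 18S": (False, None),
    "Not Apicomplexa": (False, None),
    "Chimera": (False, "chimera"),
    "Missed/New": (True, None),
}


def apply_action(record):
    """Return a copy of the record with the action for its category applied."""
    keep, tag = ACTIONS[record["category"]]
    updated = dict(record, active=keep)
    if tag is not None:
        updated["tag"] = tag
    return updated


# Hypothetical records, for illustration only
records = [
    {"accession": "ACC0001", "category": "Chimera"},
    {"accession": "ACC0002", "category": "Missed/New"},
    {"accession": "ACC0003", "category": "Quarantine"},
]
processed = [apply_action(r) for r in records]
removed = [r["accession"] for r in processed if not r["active"]]
```

An unknown category raises a KeyError, so a sequence cannot be silently kept or dropped without a rule. Exceptions noted by hand, such as the four Not 18S accessions that seem to be OK, are deliberately not encoded here.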

Daniel Vaulot

08 08 2019