FAQ

About Spart Explorer and taxonomy tools

What is Spart Explorer?

Spart Explorer is a user-friendly web platform in which species hypotheses are inferred from DNA sequences using two popular species delimitation tools, ABGD and ASAP. The platform also implements LIMES, which can compare different partitions. Species partitions obtained on the platform or uploaded from local files can then be compared in a dynamic and intuitive graphic representation, associated with a phylogenetic tree, to facilitate result interpretation and comparison in an integrative taxonomy framework.

With Spart Explorer, you can start with a species delimitation analysis using ABGD and ASAP simply by uploading your DNA sequence alignment. The resulting species partitions can then be compared with LIMES, which calculates statistics describing the similarity or difference between each pair of partitions, and visualized in the graphic interface, the spart viewer. Alternatively, you can directly upload species partitions in the spart format, whether obtained independently with other tools or prepared de novo (e.g. from morphospecies hypotheses), and visualize them in the spart viewer. There, you can visually compare all the partitions alongside a phylogenetic tree, reorder them by drag and drop, and hide or show any of them, making it easier to decide which partition(s) is/are the most likely in an integrative taxonomy context.

Which of the different species partitions inferred is the “true” one?

In brief, possibly none of them.

The analysis of single-locus DNA barcodes is probably the simplest, fastest and cheapest approach for a preliminary exploration of a large unpartitioned multispecific dataset (i.e. to produce a partition of Primary Species Hypotheses). Nevertheless, keep in mind that these approaches cannot reliably distinguish deep conspecific lineages from species, and therefore tend to overestimate the number of species. Their multiple outputs should therefore be seen as alternative primary species partitions, composed of primary species hypotheses (PSHs) that are more or less likely and that require further testing.

For these reasons, do not use DNA barcoding as the sole decision tool from which to derive taxonomic and nomenclatural actions, but rather as a remarkably efficient screening tool, especially for large datasets.

To conclude, we recommend that users interpret their results within an integrative taxonomy framework, in order to turn PSHs into secondary species hypotheses, and eventually into nominal species, by analyzing additional characters and applying additional criteria of species delimitation.

What is ABGD?

ABGD (Automatic Barcode Gap Discovery) is a fast, simple method to split a sequence alignment data set into species hypotheses. The method uses a distance threshold estimated from the distribution of genetic distances between specimens from the same species (intraspecific diversity) and from different species (interspecific diversity).
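To make the idea of a distribution of genetic distances more concrete, here is a minimal Python sketch that computes uncorrected pairwise distances (p-distances) for a tiny made-up alignment and looks for the widest jump between consecutive sorted distances. It is only an illustration of the barcode gap concept, not ABGD's actual algorithm; the function names, the sequences and the handling of gaps and N are assumptions made for the example.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites with a gap or N in either sequence."""
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0

def widest_gap(distances):
    """Return the (lower, upper) bounds of the largest jump between sorted distances."""
    ordered = sorted(distances)
    best = (ordered[0], ordered[0])
    for low, high in zip(ordered, ordered[1:]):
        if high - low > best[1] - best[0]:
            best = (low, high)
    return best

# Hypothetical alignment: two specimens for each of two putative species.
alignment = {
    "sp1_a": "ACGTACGTAC",
    "sp1_b": "ACGTACGTAT",
    "sp2_a": "TTGTACGGAA",
    "sp2_b": "TTGTACGGAC",
}
distances = [
    p_distance(alignment[x], alignment[y]) for x, y in combinations(alignment, 2)
]
gap = widest_gap(distances)
print("pairwise distances:", sorted(distances))
print("widest gap lies between", gap[0], "and", gap[1])
```

In this toy dataset the within-species distances are small and the between-species distances are larger, so the widest jump separates the two groups. Real datasets are rarely this clean, which is why ABGD tests a range of prior values.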

What is ASAP?

ASAP (Assemble Species by Automatic Partitioning) implements a hierarchical clustering algorithm that only uses pairwise genetic distances, thus avoiding the computational burden of phylogenetic reconstruction, to propose species hypotheses. ASAP proposes species partitions ranked by a scoring system that requires no prior biological knowledge of intraspecific diversity.

What is LIMES?

LIMES is a free automated tool for performing exact comparisons of alternative taxonomies, whatever the methods or types of dataset used to infer them. It is especially suited to comparing taxonomic datasets composed of several species (typically at the genus level). To this end, LIMES calculates four indexes: Ctax, mCtax, Rtax (Miralles & Vences 2013) and the Match Ratio (Ahrens et al. 2016). LIMES relies on a cladistic conceptual approach, and therefore assumes that the species delimited by different methods represent monophyletic units.

LIMES is also the first tool that can merge, extract and export partition files (.SPART).
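As an illustration of what such an index measures, the sketch below computes a match ratio between two partitions, using the formula as it is usually stated from Ahrens et al. (2016): twice the number of species that are delimited identically in both partitions, divided by the total number of species in the two partitions. This is a hedged, minimal Python example and not LIMES code; the function names and the sample labels are hypothetical.

```python
def subsets(partition):
    """Turn {individual: species_label} into a set of frozensets of individuals."""
    groups = {}
    for individual, label in partition.items():
        groups.setdefault(label, set()).add(individual)
    return {frozenset(members) for members in groups.values()}

def match_ratio(partition_a, partition_b):
    """2 * (number of identical subsets) / (subsets in A + subsets in B)."""
    a = subsets(partition_a)
    b = subsets(partition_b)
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical partitions of the same five individuals.
morphology = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C"}
barcoding = {"s1": "1", "s2": "1", "s3": "2", "s4": "3", "s5": "4"}

# Two subsets match exactly ({s1, s2} and {s5}); A has 3 subsets, B has 4.
ratio = match_ratio(morphology, barcoding)
print(f"match ratio = {ratio:.3f}")
```

A value of 1 means that the two partitions delimit exactly the same species, while values close to 0 indicate that few species are recovered identically.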

What is a SPART file?

SPART stands for species partition. This standardized format ensures compatibility between species delimitation tools that export or import partitions (e.g. ABGD and ASAP, but also some recent versions of DELINEATE, GMYC, PTP, TR2 and SpartMapper). It reports the partitions and describes, for each of them, the assignment of individuals to the “inferred species”. The syntax also allows support values to be optionally reported, as well as original trees and the full command lines used in the respective species delimitation analyses. Two variants of the format exist. They use the same terminology overall but present the data either optimized for human readability (matricial SPART) or in a format in which each partition forms a separate block (SPART.XML).

How can I integrate morphological data with my molecular results?

Although Spart Explorer doesn't yet offer delimitation based on morphological data, once the molecular delimitation results have been displayed, it is possible to upload hand-crafted partitions (as a complementary SPART file) based on other types of data (e.g. morphological data, geographical distribution, etc.) and thus to compare the results as a whole. See the section about complementary SPART files below.

About ABGD and ASAP

How many sequences can I load?

ABGD and ASAP can handle more than 10,000 sequences, but the computation can then take a long time (several hours). If you need to work with a very large dataset, we recommend downloading the command-line versions and running them on your own computer.

Which data formats are accepted?

The FASTA format is the most convenient format and the only one implemented in Spart Explorer for now.

How to prepare a FASTA sequence alignment for Spart Explorer?

The FASTA format is a near-universal text-based format for representing nucleotide sequence alignments. To be used as input for species delimitation in Spart Explorer:

- Your sequence alignment must comply with the FASTA format specification. This also means that your file must have a correct extension (.fasta or .fas). If necessary, re-save your sequence file using another tool (e.g. MEGA).

- The sequences must be aligned beforehand, i.e. be of identical length (same number of positions, counting gaps).

- In the description of each sequence (sequence title), only digits, upper- and lower-case letters and underscores are allowed (no spaces, accented letters or other signs): 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz. We recommend replacing any forbidden character with an underscore.

- In the sequence itself, only letters representing nucleotides (A, T, G, C, plus the IUPAC ambiguity codes R, K, S, Y, M, W, N) and the gap symbol (-) are allowed.
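The rules above can be checked before uploading with a short script. The following Python sketch is one possible way to do it, not a tool provided by Spart Explorer: the function names and the example data are hypothetical, and the set of accepted sequence symbols is limited to those listed above.

```python
import re

# Characters allowed in sequence titles and in sequences, as listed above.
TITLE_OK = re.compile(r"[0-9A-Za-z_]+")
SEQUENCE_OK = re.compile(r"[ACGTRKSYMWN-]+", re.IGNORECASE)

def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records

def sanitize_title(title):
    """Replace every forbidden character in a title with an underscore."""
    return re.sub(r"[^0-9A-Za-z_]", "_", title)

def check_alignment(records):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) > 1:
        problems.append(f"sequences have different lengths: {sorted(lengths)}")
    for title, sequence in records:
        if not TITLE_OK.fullmatch(title):
            problems.append(f"forbidden character in title: {title!r}")
        if not SEQUENCE_OK.fullmatch(sequence):
            problems.append(f"empty sequence or unexpected symbol in: {title!r}")
    return problems

# Hypothetical input with two deliberate problems.
example = """>sample_1
ACGT-ACGTN
>sample 2 (Madagascar)
ACGTTACG
"""
records = parse_fasta(example)
problems = check_alignment(records)
for problem in problems:
    print(problem)
print(sanitize_title("sample 2 (Madagascar)"))
```

On this example the script reports the unequal sequence lengths and the title containing spaces and parentheses, and shows how that title could be rewritten with underscores.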

What is an ideal sampling for ABGD and ASAP?

Logically, as species delimitation tools, ABGD and ASAP won't give any usable results if your dataset contains only one sample per species, or if all the samples belong to the same species.

As far as possible, sampling should include multiple sequences per nominal species (based on their a priori definitions), covering the geographic ranges of the focal species, with multiple individuals per site. In practice, species delimitation tools may perform poorly when too few individuals are sampled per species, and the distinctness of the barcode gap is often exaggerated by insufficient sampling.

Can I run ABGD and ASAP from the command line?

Yes, you can download the source code for ABGD here: http://bioinfo.mnhn.fr/abi/public/abgd/last.tgz

And for ASAP here: https://bioinfo.mnhn.fr/abi/public/asap/last.tgz

I have my own hypothesis concerning the number of species included in my dataset, but my partition is not proposed/ranked by ABGD/ASAP.

Remember that ABGD and ASAP are exploratory tools designed to identify the best partitions of species, given the criteria used by ABGD and ASAP (in particular the genetic distances). Your own species hypotheses might be based on other data, methods or criteria of species delimitation, and might thus be different from the (best) ABGD/ASAP partitions. Combining all these results in an integrative taxonomy approach is generally a good idea.

What are the available options for ABGD?

Substitution model: ABGD offers three different models for calculating genetic distances; choose the one you prefer.
X: Proxy for the minimum relative gap width. ABGD will look for a barcode gap whose width is X times larger than any gap in the prior intraspecific distribution. The default value (1.5) can be decreased if the analysis results in a partition with a single species hypothesis (no barcode gap found).
Pmin and Pmax: minimal and maximal a priori threshold values (by default 0.001 and 0.1, respectively), used by ABGD to infer the barcode gap. The values should be smaller and larger, respectively, than the expected threshold of genetic distances for species delimitation, corresponding to the barcode gap (the default values generally fulfill this condition).
Number of steps: number of prior values tested by ABGD in the interval between Pmin and Pmax. The default value is 10.
Number of bins: number of bins used in the pairwise distribution of genetic distances; this has no impact on the ABGD analysis, only on the visual representation of this distribution.

What are the available options for ASAP?

Substitution model: ASAP offers three different models for calculating genetic distances; choose the one you prefer.
Probability: At each step of the process, ASAP clusters objects within the same distance range into a node. An object is either a node or a specimen. A probability is calculated for each node at each step. If the probability of a node is below the value given here, ASAP readjusts the number of putative species by splitting each node whose probability is below that value. The default value is 0.01.
Number of best scores: number of results with the highest scores to be displayed in the table and on the curve. The default value is 10.
Fixed seed value: ASAP runs simulations that rely on a seeded random number generator. If the seed changes, the probabilities may differ slightly from one run to the next (leave -1 if you don't want to use a fixed seed value).

What is the meaning of the column titles in the ASAP best partitions table?

Number of subsets is the number of species identified by ASAP in the corresponding partition.
ASAP score: ASAP identifies several partitions, and the score indicates which ones deserve attention. It combines the two following parameters (probability and slope). The lower the score, the better the partition.
P-val is the probability that the partition at step n is different from the partition at step n-1. Please refer to the publication for more details.
W is the slope of the blue curve "Ranked distances" displayed above at a given genetic distance value. A high value means that the neighbouring distances (the next smaller and the next larger) are far apart.
Threshold distance is the value of the "jump" distance used to calculate the slope. is ...
Partition details presents the different partitions in two formats:
- .csv file: each line is a sequence label followed by the group number, separated by a semicolon.
- .txt file: each line is a group, listing all the sequences belonging to that group.
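If you want to reuse the .csv output in your own scripts, the layout described above (sequence label, semicolon, group number) can be read with Python's standard csv module. The sketch below is only an example under that assumption: the function name and the sample content are hypothetical, and group numbers are kept as text so that an unexpected header line would not make the script fail.

```python
import csv
import io
from collections import defaultdict

def read_partition_csv(handle):
    """Map each group number to the list of sequence labels assigned to it."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label, group = row[0].strip(), row[1].strip()
        groups[group].append(label)
    return dict(groups)

# Hypothetical file content, provided in memory for the example.
example = io.StringIO("seq_A1;1\nseq_A2;1\nseq_B1;2\n")
groups = read_partition_csv(example)
for group, labels in groups.items():
    print(group, labels)
```

With a real download, the in-memory example can be replaced by a file opened with open("partition.csv", newline=""), where partition.csv stands for whatever name your downloaded file has.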

Why can results be slightly different if I re-run ASAP with the same data?

ASAP uses a seed to generate random partitions in order to estimate the probability of a partition. A new seed can slightly change the probabilities.
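The effect of a seed can be illustrated with a generic toy simulation in Python. This is not ASAP's code and the quantity estimated here is arbitrary; the function name and parameters are assumptions chosen for the example. It only shows that an estimate based on random draws is reproduced exactly when the seed is fixed and may vary slightly otherwise.

```python
import random

def simulated_probability(seed=None, draws=1000):
    """Toy Monte Carlo estimate: fraction of random draws that fall below 0.05."""
    rng = random.Random(seed)  # seed=None lets Python choose a new seed each time
    hits = sum(1 for _ in range(draws) if rng.random() < 0.05)
    return hits / draws

# Same seed, same sequence of random numbers, same estimate.
fixed_1 = simulated_probability(seed=42)
fixed_2 = simulated_probability(seed=42)
print("fixed seed:", fixed_1, fixed_2)

# No fixed seed: the two estimates are close but not necessarily identical.
print("free seed: ", simulated_probability(), simulated_probability())
```

In the same spirit, setting a fixed seed value in the ASAP options should give the same probabilities when the analysis is repeated on the same data.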

Where can I get an example input FASTA file?

Right here!

Example of alignment FASTA file for molecular species delimitation (exemple-sequences.fas).

About complementary SPART files

What is a complementary SPART file?

In an integrative framework, it is highly recommended to compare the partitions resulting from molecular analyses to other types of data (e.g. morphology, biogeography, other genes).
The present version of Spart Explorer infers partitions from single-locus molecular data, but it also allows them to be compared graphically to any other type of data (whatever its nature), as long as hypothetical partitions of individuals can be built from those data (e.g. clusters based on phenotype, geographic distribution across various regions, behavior, etc.). A complementary SPART file is the file in which such additional partitions are provided.

How to visualise alternative kinds of data, using a complementary SPART file?

First, create an additional partition in .spart format. For more details, see "How to prepare a complementary SPART file?" below and the original publication of Spart Explorer (in preparation).

Second, load it into Spart Explorer, either (1) directly, by clicking on the "Visualisation and Compare" icon, or (2) after an online species delimitation: once the "visualise & compare" page appears, click on "other features", then "+ add file".

How to prepare a complementary SPART file?

- For a small dataset, it can be easier to prepare the file by hand, based on the example files (XXX LINK XXX). The SPART format is described in detail in the appendices of this open-access article (Miralles et al. (2021), https://doi.org/10.1111/1755-0998.13470). Always make sure your file extension is correct (.spart).

- For larger datasets, we recommend using the GUI version of LIMES 2.0, available at itaxotools.org (https://itaxotools.org/download.html#hyperlinkDataInt). In that case, first prepare your partitions in a spreadsheet editor, load them into LIMES in .csv or .xls format, and save them as a .spart file.

Where can I get an example of a complementary SPART file?

Example of complementary SPART file (exemple-complementary.spart), which can be used in conjunction with the FASTA file indicated above (exemple-sequences.fas).