SYSTEMATIC BIOLOGY

Rapid Evolution of Host Repertoire and Geographic Range in a Young and Diverse Genus of Montane Butterflies
Mo S, Zhu Y, Braga MP, Lohman DJ, Nylin S, Moumou A, Wheat CW, Wahlberg N, Wang M, Ma F, Zhang P and Wang H
Evolutionary changes in geographic distribution and larval host plants may promote the rapid diversification of montane insects, but this scenario has rarely been investigated. We studied the rapid radiation of the butterfly genus Colias, which has diversified in mountain ecosystems in Eurasia, Africa, and the Americas. Based on a dataset of 150 nuclear protein-coding genetic loci and mitochondrial genomes, we constructed a time-calibrated phylogenetic tree of Colias species with broad taxon sampling. We then inferred their ancestral geographic ranges, historical diversification rates, and the evolution of host use. We found that the most recent common ancestor of Colias was likely geographically widespread and originated ~3.5 Ma. The group subsequently diversified in different regions across the world, often in tandem with geographic expansion events. No aspect of elevation was found to have a direct effect on diversification. The genus underwent a burst of diversification soon after the divergence of the Neotropical lineage, followed by an exponential decline in diversification rate toward the present. The ancestral host repertoire included the legume genera Astragalus and Trifolium but later expanded to include a wide range of Fabaceae genera and plants in more distantly related families, punctuated with periods of host range expansion and contraction. We suggest that the widespread distribution of the ancestor of all extant Colias lineages set the stage for diversification by isolation of populations that locally adapted to the various environments they encountered, including different host plants. In this scenario, elevation is not the main driver but might have accelerated diversification by isolating populations.
Are Modern Cryptic Species Detectable in the Fossil Record? A Case Study on Agamid Lizards
Ramm T, Gray JA, Hipsley CA, Hocknull S, Melville J and Müller J
Comparisons of extant and extinct biodiversity are often dependent on objective morphology-based identifications of fossils and assume a well-established and comparable taxonomy for both fossil and modern taxa. However, since many modern (cryptic) species are delimited mainly via external morphology and/or molecular data, it is often unclear to what degree fossilized (osteological) remains allow classification to a similar level. When intraspecific morphological variation in extant taxa is poorly known, the definition of extinct species as well as the referral of fossils to extant species can be heavily biased, particularly if fossils are represented by incomplete isolated skeletal elements. This problem is especially pronounced in squamates (lizards and snakes) owing to a lack of osteological comparative knowledge for many lower taxonomic groups, concomitant with a recent increase of molecular studies revealing great cryptic diversity. Here, we apply a quantitative approach using 3D geometric morphometrics on 238 individuals of 14 genera of extant Australian and Papua New Guinean agamid lizards to test the value of two isolated skull bones (frontals and maxillae) for inferring taxonomic and ecological affinities. We further test for the consistency of intra- and interspecific morphological variability of these elements as a proxy for extinct taxonomic richness. We show that both bones are diagnostic at the generic level, and both can infer microhabitat and are of palaeoecological utility. However, species-level diversity is likely underestimated by both elements, with ~30-40% of species pairs showing no significant differences in shape. Mean intraspecific morphological variability is largely consistent across species and bones and thus a useful proxy for extinct species diversity. Reducing sample size and landmark completeness to approximate fossil specimens led to decreased classification accuracy and increased variance of morphological disparity, raising further doubts about the transferability of modern species borders to the fossil record of agamids. Our results highlight the need to establish appropriate levels of morphology-based taxonomic or ecological groupings prior to comparing extant and extinct biodiversity.
How to validate a Bayesian evolutionary model
Mendes FK, Bouckaert R, Carvalho LM and Drummond AJ
Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe, illustrate and introduce new good practices for assessing the correctness of a model implementation, with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help sharpen the focus of discussions on (as well as elevate) expected standards of statistical software for biology.
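The abstract above does not spell out its validation protocols, so the following is only a minimal sketch of one widely used correctness check, a coverage test: draw a parameter from the prior, simulate data under it, compute the posterior, and record how often the 95% credible interval contains the true value. The toy normal model, the function name `coverage_check`, and all parameter values are assumptions made for illustration; an exact conjugate posterior stands in for the output of the inference software being validated.

```python
import random


def coverage_check(n_reps=2000, n_obs=10, prior_mean=0.0, prior_sd=1.0,
                   noise_sd=1.0, seed=1):
    """Return the fraction of replicates whose 95% credible interval
    contains the parameter value that generated the data."""
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% quantile of the standard normal
    hits = 0
    for _ in range(n_reps):
        # 1. Draw the "true" parameter from the prior.
        mu_true = rng.gauss(prior_mean, prior_sd)
        # 2. Simulate data from the model given that parameter.
        data = [rng.gauss(mu_true, noise_sd) for _ in range(n_obs)]
        # 3. Posterior for a normal mean with known noise_sd (conjugate case).
        #    In a real validation this step is the software under test.
        post_prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        post_mean = (prior_mean / prior_sd**2 + sum(data) / noise_sd**2) / post_prec
        post_sd = post_prec ** -0.5
        # 4. Check whether the interval covers the truth.
        if post_mean - z * post_sd <= mu_true <= post_mean + z * post_sd:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(f"Empirical coverage of 95% intervals: {coverage_check():.3f}")
```

If the implementation is correct, the empirical coverage should be close to 0.95, within Monte Carlo error; a value far from 0.95 points to a problem in the simulator, the inference code, or both.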
Evolution of Large Eyes in Stromboidea (Gastropoda): Impact of Photic Environment and Life History Traits
Irwin AR, Roberts NW, Strong EE, Kano Y, Speiser DI, Harper EM and Williams ST
Eyes within the marine gastropod superfamily Stromboidea range widely in size, from 0.2 to 2.3 mm - the largest eyes known in any gastropod. Despite this interesting variation, the underlying evolutionary pressures remain unknown. Here, we use the wealth of material available in museum collections to explore the evolution of stromboid eye size and structure. Our results suggest that depth is a key light-limiting factor in stromboid eye evolution; here, increasing water depth is correlated with increasing aperture width relative to lens diameter, and therefore an increasing investment in sensitivity in dim light environments. In the major clade containing all large-eyed stromboid families, species observed active during the day and the night had wider eye apertures relative to lens sizes than species observed active during the day only, thereby prioritising sensitivity over resolution. Species with no consistent diel activity pattern also had smaller body sizes than exclusively day-active species, which may suggest that smaller animals are more vulnerable to shell-crushing predators, and avoid the higher predation pressure experienced by animals active during the day. Within the same major clade, ancestral state reconstruction suggests that absolute eye size increased above 1 mm twice. The unresolved position of Varicospira, however, weakens this hypothesis and further work with additional markers is needed to confirm this result.
Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)
Höhna S, Lower SE, Duchen P and Catalán A
Genomic data have become ubiquitous in phylogenomic studies, including divergence time estimation, but pose new challenges. These challenges include, amongst others, biological gene tree discordance, methodological gene tree estimation error, and computational limitations on performing full Bayesian inference under complex models. In this study, we use a recently published firefly (Coleoptera: Lampyridae) anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) as a case study to explore gene tree estimation error and the robustness of divergence time estimation. First, we explored the amount of model violation using posterior predictive simulations because model violations are likely to bias phylogenetic inferences and produce gene tree estimation error. We specifically focused on missing data (either uniformly distributed or systematically) and the distribution of highly variable and conserved sites (either uniformly distributed or clustered). Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci. We tested whether the model violations and alignment errors indeed resulted in gene tree estimation error by comparing the observed gene tree discordance to simulated gene tree discordance under the multispecies coalescent model. Thus, we show that the inferred gene tree discordance is not only due to biological mechanisms but primarily due to inference errors. Lastly, we explored whether divergence time estimation is robust despite the observed gene tree estimation error. We selected four subsets of the full AHE dataset, concatenated each subset and performed a Bayesian relaxed clock divergence estimation in RevBayes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust using any well-selected data subset as long as the topology inference is robust.
Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal
Swiston SK and Landis MJ
The spatial and environmental features of regions where clades are evolving are expected to impact biogeographic processes such as speciation, extinction, and dispersal. Any number of regional features (such as elevation, distance, and area) may be directly or indirectly related to these processes. For example, distances, differences in elevation, or both may limit dispersal rates. However, it is difficult to disentangle which features are most strongly related to rates of different processes. Here, we present an extensible Multi-feature Feature-Informed GeoSSE (MultiFIG) model that allows for the simultaneous investigation of any number of regional features. MultiFIG provides a conceptual framework for incorporating large numbers of features of different types, including categorical, quantitative, within-region, and between-region features, along with a mathematical framework for translating those features into biogeographic rates for statistical hypothesis testing. Using traditional Bayesian parameter estimation and reversible-jump Markov chain Monte Carlo, MultiFIG allows for the exploration of models with different numbers and combinations of feature-effect parameters, and generates estimates for the strengths of relationships between each regional feature and core process. We validate this model with a simulation study covering a range of scenarios with different numbers of regions, tree sizes, and feature values. We also demonstrate the application of MultiFIG with an empirical case study of the South American lizard genus Liolaemus, investigating sixteen regional features related to area, distance, and elevation. Our results show two important feature-process relationships: a negative distance/dispersal relationship, and a negative area/extinction relationship. Interestingly, although speciation rates were found to be higher in Andean versus non-Andean regions, the model did not assign significance to Andean- or elevation-related parameters. These results highlight the need to consider multiple regional features in biogeographic hypothesis testing.
Bayesian Selection of Relaxed-clock Models: Distinguishing Between Independent and Autocorrelated Rates
Panchaksaram M, Freitas L and Dos Reis M
In Bayesian molecular-clock dating of species divergences, rate models are used to construct the prior on the molecular evolutionary rates for branches in the phylogeny, with independent and autocorrelated rate models being commonly used. The two classes of models, however, can result in markedly different divergence time estimates for the same dataset, and thus selecting the best rate model appears important for obtaining reliable inferences of divergence times. However, the properties of Bayesian rate model selection are not well understood, in particular when the number of sequence partitions analysed increases and when age calibrations (such as fossil calibrations) are misspecified. Furthermore, Bayesian rate model selection is computationally expensive as it requires calculation of marginal likelihoods by MCMC sampling, and therefore methods that can speed up the model selection procedure without compromising its accuracy are desirable. In this study, we use a combination of computer simulations and real data analysis to investigate the statistical behaviour of Bayesian rate model selection and we also explore approximations of the likelihood to improve computational efficiency in large phylogenomic datasets. Our simulations demonstrate that the posterior probability for the correct rate model converges to one as more molecular sequence partitions are analysed and when no calibrations are used, as expected due to asymptotic Bayesian model selection theory. Furthermore, we also show the model selection procedure is robust to slight misspecification of calibrations, and reliable inference of the correct rate model is possible in this case. However, we show that when calibrations are seriously misspecified, calculated model probabilities are completely wrong and may converge to one for the wrong rate model. Finally, we demonstrate that approximating the phylogenetic likelihood under an arcsine branch-length transform can dramatically reduce the computational cost of rate model selection without compromising accuracy. We test the approximate procedure on two large phylogenies of primates (372 species) and flowering plants (644 species), replicating results obtained on smaller datasets using exact likelihood. Our findings and methodology can assist users in selecting the optimal rate model for estimating times and rates along the Tree of Life.
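The abstract above refers to posterior model probabilities obtained from marginal likelihoods. As a minimal sketch of that final step only (not of the authors' pipeline), the function below converts log marginal likelihoods into posterior model probabilities via Bayes' theorem, using a log-sum-exp shift for numerical stability. The function name, the default of equal prior model probabilities, and the numbers in the usage example are hypothetical.

```python
import math


def posterior_model_probs(log_marginal_liks, prior_probs=None):
    """Map {model_name: log marginal likelihood} to posterior model
    probabilities. Equal prior probabilities are assumed unless given."""
    names = list(log_marginal_liks)
    if prior_probs is None:
        prior_probs = {name: 1.0 / len(names) for name in names}
    # Log of (marginal likelihood x prior probability) for each model.
    log_weights = {
        name: log_marginal_liks[name] + math.log(prior_probs[name])
        for name in names
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_weights.values())
    unnormalised = {name: math.exp(w - top) for name, w in log_weights.items()}
    total = sum(unnormalised.values())
    return {name: u / total for name, u in unnormalised.items()}


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods, not values from the study.
    probs = posterior_model_probs({"independent": -10000.0,
                                   "autocorrelated": -10002.0})
    for model, p in probs.items():
        print(f"{model}: {p:.3f}")
```

A difference of two log units between the models corresponds to a Bayes factor of about 7.4, which under equal priors gives the better model a posterior probability of roughly 0.88.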
A Double-edged Sword: Evolutionary Novelty along Deep-time Diversity Oscillation in An Iconic Group of Predatory Insects (Neuroptera: Mantispoidea)
Li H, Zhuo D, Wang B, Nakamine H, Yamamoto S, Zhang W, Jepson JE, Ohl M, Aspöck U, Aspöck H, Tin Nyunt T, Engel MS, Benton MJ, Donoghue P and Liu X
Evolutionary novelties are commonly identified as drivers of lineage diversification, with key innovations potentially triggering adaptive radiation. Nevertheless, testing hypotheses on the role of evolutionary novelties in promoting diversification through deep time has proven challenging. Here we unravel the role of the raptorial appendages, with evolutionary novelties for predation, in the macroevolution of a predatory insect lineage, the Superfamily Mantispoidea (mantidflies, beaded lacewings, thorny lacewings, and dipteromantispids), based on a new dated phylogeny and quantitative evolutionary analyses on modern and fossil species. We demonstrate a single origin of the raptorial foreleg and its associated novelties as key innovations triggering an early radiation of raptorial mantispoids from the Late Triassic to the Early Jurassic. Subsequently, the evolution of the raptorial foreleg influenced the diversification in different modes among lineages. At times, it might have limited the morphological diversity of other body parts and led to lineage constraint by intensifying competition and lowering environmental resilience, e.g., in thorny lacewings, whose extant diversity is meagre. Conversely, in mantidflies, reduced emphasis on foreleg novelties and increased plasticity in other body parts may lead to better adaptation to predator-prey interactions and environmental shifts, thus maintaining a stable or accelerated level of diversification. We also reveal how major environmental change and lineage interactions interplayed with raptorial novelties in shaping the significant oscillations of mantispoid diversification over deep time, especially the abrupt shift near the mid-Cretaceous. However, when a substantial portion of samples from the mid-Cretaceous of Myanmar was excluded, these shifts in some evolutionary parameters, such as morphological disparity, body size, and diversification rates, became inconspicuous and might be overestimated due to sampling bias. Our results uncover the intricate evolutionary patterns and profound significance of raptorial specializations, providing new insights into the role of novelties in forming evolutionary trajectories, both for the better and worse. [evolutionary novelty; macroevolution; diversification; raptorial foreleg; fossil; insect; Mantispoidea].
Inference of Phylogenetic Networks from Sequence Data using Composite Likelihood
Kong S, Swofford DL and Kubatko LS
While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference keeps phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
Complex Models of Sequence Evolution Improve Fit, but not Gene Tree Discordance, for Tetrapod Mitogenomes
Toups BS, Thomson RC and Brown JM
Variation in gene tree estimates is widely observed in empirical phylogenomic data and is often assumed to be the result of biological processes. However, a recent study using tetrapod mitochondrial genomes to control for biological sources of variation due to their haploid, uniparentally inherited, and non-recombining nature found that levels of discordance among mitochondrial gene trees were comparable to those found in studies that assume only biological sources of variation. Additionally, they found that several of the models of sequence evolution chosen to infer gene trees were doing an inadequate job fitting the sequence data. These results indicated that significant amounts of gene tree discordance in empirical data may be due to poor fit of sequence evolution models, and that more complex and biologically realistic models may be needed. To test how the fit of sequence evolution models relates to gene tree discordance, we analyzed the same mitochondrial datasets as the previous study using two additional, more complex models of sequence evolution that each includes a different biologically realistic aspect of the evolutionary process: a covarion model to incorporate site-specific rate variation across lineages (heterotachy), and a partitioned model to incorporate variable evolutionary patterns by codon position. Our results show that both additional models fit the data better than the models used in the previous study, with the covarion being consistently and strongly preferred as tree size increases. However, even these more preferred models still inferred highly discordant mitochondrial gene trees, thus deepening the mystery around what we label the "Mito-Phylo Paradox" and leading us to ask whether the observed variation could, in fact, be biological in nature after all.
The Fossilised Birth-Death Model is Identifiable
Truman K, Vaughan TG, Gavryushkin A and Gavryushkina AS
Time-dependent birth-death sampling models have been used in numerous studies for inferring past evolutionary dynamics in different biological contexts, e.g. speciation and extinction rates in macroevolutionary studies, or effective reproductive number in epidemiological studies. These models are branching processes where lineages can bifurcate, die, or be sampled with time-dependent birth, death, and sampling rates, generating phylogenetic trees. It has been shown that in some subclasses of such models, different sets of rates can result in the same distributions of reconstructed phylogenetic trees, and therefore the rates become unidentifiable from the trees regardless of their size. Here we show that widely used time-dependent fossilised birth-death (FBD) models are identifiable. This subclass of models makes more realistic assumptions about the fossilisation process and certain infectious disease transmission processes than the unidentifiable birth-death sampling models. Namely, FBD models assume that sampled lineages stay in the process rather than being immediately removed upon sampling. Identifiability of the time-dependent FBD model justifies using statistical methods that implement this model to infer the underlying temporal diversification or epidemiological dynamics from phylogenetic trees or directly from molecular or other comparative data. We further show that the time-dependent fossilised-birth-death model with an extra parameter, the removal after sampling probability, is unidentifiable. This implies that in scenarios where we do not know how sampling affects lineages we are unable to infer this extra parameter together with birth, death, and sampling rates solely from trees.
Phylogenetic tree instability after taxon addition: empirical frequency, predictability, and consequences for online inference
Collienne L, Barker M, Suchard MA and Matsen IV FA
Online phylogenetic inference methods add sequentially arriving sequences to an inferred phylogeny without the need to recompute the entire tree from scratch. Some online method implementations exist already, but there remains concern that additional sequences may change the topological relationship among the original set of taxa. We call such a change in tree topology a lack of stability for the inferred tree. In this paper, we analyze the stability of single taxon addition in a Maximum Likelihood framework across 1,000 empirical datasets. We find that instability occurs in almost 90% of our examples, although observed topological differences do not always reach significance under the AU-test. Changes in tree topology after addition of a taxon rarely occur close to its attachment location, and are more frequently observed in more distant tree locations carrying low bootstrap support. To investigate whether instability is predictable, we hypothesize sources of instability and design summary statistics addressing these hypotheses. Using these summary statistics as input features for machine learning under random forests, we are able to predict instability and can identify the most influential features. In summary, it does not appear that a strict insertion-only online inference method will deliver globally optimal trees, although relaxing insertion strictness by allowing for a small number of final tree rearrangements or accepting slightly suboptimal solutions appears feasible.
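The notion of stability defined in the abstract above can be illustrated with a purely topological check: prune the added taxon from the larger tree and test whether the remaining unrooted topology matches the original tree. This is a minimal sketch under stated assumptions, not the authors' workflow: trees are represented as nested tuples of taxon-name strings, all function names are hypothetical, and neither likelihoods nor the AU-test are involved.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, taxon):
    """Remove one leaf and collapse any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kept = [k for k in (prune(child, taxon) for child in tree) if k is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def splits(tree):
    """Return the non-trivial bipartitions of the unrooted tree, each stored
    as the side that does not contain a fixed reference taxon."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if reference in below else below
        # Skip trivial splits (a single taxon versus everything else).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def is_stable(old_tree, new_tree, added_taxon):
    """True if removing added_taxon from new_tree recovers old_tree's topology."""
    return splits(old_tree) == splits(prune(new_tree, added_taxon))


if __name__ == "__main__":
    old = (("A", "B"), ("C", "D"), "E")
    inserted_only = (("A", "B"), (("C", "F"), "D"), "E")
    rearranged = (("A", "C"), (("B", "F"), "D"), "E")
    print(is_stable(old, inserted_only, "F"))
    print(is_stable(old, rearranged, "F"))
```

Storing each bipartition as the side lacking a reference taxon makes the comparison independent of where the tuple representation happens to be rooted, which matters because maximum likelihood trees are typically unrooted.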
Complex Hybridization in a Clade of Polytypic Salamanders (Plethodontidae: Desmognathus) Uncovered by Estimating Higher-Level Phylogenetic Networks
Pyron RA, O'Connell KA, Myers EA, Beamer DA and Baños H
Reticulation between incipient lineages is a common feature of diversification. We examine this phenomenon in the Pisgah clade of Desmognathus salamanders from the southern Appalachian Mountains of the eastern United States. The group contains four to seven species exhibiting two discrete phenotypes, aquatic "shovel-nosed" and semi-aquatic "black-bellied" forms. These ecomorphologies are ancient and have apparently been transmitted repeatedly between lineages through introgression. Geographically proximate populations of both phenotypes exhibit admixture, and at least two black-bellied lineages have been produced via reticulations between shovel-nosed parentals, suggesting potential hybrid speciation dynamics. However, computational constraints currently limit our ability to reconstruct network radiations from gene-tree data. Available methods are limited to level-1 networks wherein reticulations do not share edges, and higher-level networks may be non-identifiable in many cases. We present a heuristic approach to recover information from higher-level networks across a range of potentially identifiable empirical scenarios, supported by theory and simulation. When extrinsic information indicates the location and direction of reticulations, our method can successfully estimate a reduced possible set of non-level-1 networks. Phylogenomic data support a single backbone topology with up to five overlapping hybrid edges in the Pisgah clade. These results suggest an unusual mechanism of ecomorphological hybrid speciation, wherein a binary threshold trait causes some hybrid populations to shift between microhabitat niches, promoting ecological divergence between sympatric hybrids and parentals. This contrasts with other well-known systems in which hybrids exhibit intermediate, novel, or transgressive phenotypes. The genetic basis of these phenotypes is unclear and further data are needed to clarify the evolutionary basis of morphological changes with ecological consequences.
Assessing the Adequacy of Morphological Models using Posterior Predictive Simulations
Mulvey LPA, May MR, Brown JM, Höhna S, Wright AM and Warnock RCM
Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics, a major step in estimating a tree is choosing an appropriate model of character evolution. While the most common character data used are molecular sequence data, morphological data remain a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple data sets, indicating that there is no 'one size fits all' when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
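To make concrete how the number of character states enters the Mk model discussed above, the sketch below computes the closed-form transition probabilities of an equal-rates k-state model, with branch length measured in expected changes per character. It is an illustration only: it omits rate variation across characters and ascertainment-bias correction, and the function name is hypothetical.

```python
import math


def mk_transition_probs(k, t):
    """Return (p_same, p_diff) for a k-state equal-rates Mk model.

    p_same is the probability of ending in the starting state after branch
    length t, and p_diff is the probability of ending in any one particular
    other state. t is in expected changes per character.
    """
    if k < 2:
        raise ValueError("the Mk model needs at least two states")
    if t < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff


if __name__ == "__main__":
    for k in (2, 3, 4):
        p_same, p_diff = mk_transition_probs(k, t=0.5)
        print(f"k={k}: P(same)={p_same:.4f}, P(specific change)={p_diff:.4f}")
```

With k = 4 the expressions reduce to the Jukes-Cantor probabilities, and for very long branches every state approaches probability 1/k, so the same branch length implies different probabilities for characters assigned to partitions with different state counts.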
A Phylogenomic Backbone for Acoelomorpha Inferred from Transcriptomic Data
Abalde S and Jondelius U
Xenacoelomorpha are mostly microscopic, morphologically simple worms, lacking many structures typical of other bilaterians. Xenacoelomorphs (which include three main groups: Acoela, Nemertodermatida, and Xenoturbella) have been proposed to be an early diverging Bilateria, sister to protostomes and deuterostomes, but other phylogenomic analyses have recovered this clade nested within the deuterostomes, as sister to Ambulacraria. The position of Xenacoelomorpha within the metazoan tree has understandably attracted a lot of attention, overshadowing the study of phylogenetic relationships within this group. Given that Xenoturbella includes only six species whose relationships are well understood, we decided to focus on the more speciose Acoelomorpha (Acoela + Nemertodermatida). Here, we have sequenced 29 transcriptomes, doubling the number of sequenced species, to infer a backbone tree for Acoelomorpha based on genomic data. The recovered topology is mostly congruent with previous studies. The most important difference is the recovery of Paratomella as the first off-shoot within Acoela, dramatically changing the reconstruction of the ancestral acoel. In addition, we have detected incongruence between the gene trees and the species tree, likely linked to incomplete lineage sorting, and some signal of introgression between the families Dakuidae and Mecynostomidae, which hampers inferring the correct placement of this family and, particularly, of the genus Notocelis. We have also used this dataset to infer for the first time diversification times within Acoelomorpha, which coincide with known bilaterian diversification and extinction events. Given the importance of morphological data in acoelomorph phylogenetics, we tested several partitions and models. Although morphological data failed to recover a robust phylogeny, phylogenetic placement has proven to be a suitable alternative when a reference phylogeny is available.
The limits of the metapopulation: Lineage fragmentation in a widespread terrestrial salamander (Plethodon cinereus)
Waldron BP, Watts EF, Morgan DJ, Hantak MM, Lemmon AR, Lemmon EM and Kuchta SR
In vicariant species formation, divergence results primarily from periods of allopatry and restricted gene flow. Widespread species harboring differentiated, geographically distinct sublineages offer a window into what may be a common mode of species formation, whereby a species originates, spreads across the landscape, then fragments into multiple units. However, incipient lineages usually lack reproductive barriers that prevent their fusion upon secondary contact, blurring the boundaries between a single, large metapopulation-level lineage and multiple independent species. Here we explore this model of species formation in the Eastern Red-backed Salamander (Plethodon cinereus), a widespread terrestrial vertebrate with at least six divergent mitochondrial clades throughout its range. Using anchored hybrid enrichment data, we applied phylogenomic and population genomic approaches to investigate patterns of divergence, gene flow, and secondary contact. Genomic data broadly match most mitochondrial groups but reveal mitochondrial introgression and extensive admixture at several contact zones. While species delimitation analyses in BPP supported five lineages of P. cinereus, genealogical divergence indices (gdi) were highly sensitive to the inclusion of admixed samples and the geographic representation of candidate species, with increasing support for multiple species when removing admixed samples or limiting sampling to a single locality per group. An analysis of morphometric data revealed differences in body size and limb proportions among groups, with a reduction of forelimb length among warmer and drier localities consistent with increased fossoriality. We conclude that P. cinereus is a single species, but one with highly structured component lineages of various degrees of independence.
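The genealogical divergence index (gdi) mentioned above is commonly computed from multispecies coalescent parameter estimates as gdi = 1 - exp(-2τ/θ), where τ is the population divergence time and θ the population size parameter of the focal lineage, both in expected mutations per site. The sketch below assumes that standard definition and the frequently cited heuristic cut-offs of 0.2 and 0.7; the function names and the example values are hypothetical and are not taken from the study.

```python
import math


def gdi(tau, theta):
    """Genealogical divergence index: the probability that two sequences
    from the focal population coalesce before the divergence time tau."""
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def interpret_gdi(value, lower=0.2, upper=0.7):
    """Apply heuristic thresholds (assumed here, adjustable by the caller)."""
    if value < lower:
        return "single species"
    if value > upper:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical (tau, theta) pairs for illustration only.
    for tau, theta in [(0.001, 0.01), (0.005, 0.01), (0.01, 0.01)]:
        value = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={value:.3f} ({interpret_gdi(value)})")
```

Because θ is estimated per candidate lineage, pooling or removing admixed samples changes the estimate and hence the index, which is consistent with the sensitivity the abstract reports.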
Phylogenomics of Bivalvia Using Ultraconserved Elements (UCEs) Reveal New Topologies for Pteriomorphia and Imparidentia
Li YX, Ip JC, Chen C, Xu T, Zhang Q, Sun Y, Ma PZ and Qiu JW
Despite significant advances in phylogenetics over the past decades, the deep relationships within Bivalvia (phylum Mollusca) remain inconclusive. Previous efforts based on morphology or several genes have failed to resolve many key nodes in the phylogeny of Bivalvia. Advances have been made recently using transcriptome data, but the phylogenetic relationships within Bivalvia have historically lacked consensus, especially within Pteriomorphia and Imparidentia. Here, we inferred the relationships of key lineages within Bivalvia using matrices generated from specifically designed ultraconserved elements (UCEs) with 16 available genomic resources and 85 newly sequenced specimens from 55 families. Our new probes (Bivalve UCE 2k v.1) for target sequencing captured an average of 849 UCEs with a mean length of 1085 bp in in vitro experiments. Our results introduced novel schemes from six major clades (Protobranchina, Pteriomorphia, Palaeoheterodonta, Archiheterodonta, Anomalodesmata and Imparidentia), though some inner nodes were poorly resolved, such as paraphyletic Heterodonta in some topologies, potentially due to insufficient taxon sampling. The resolution increased when analyzing specific matrices for Pteriomorphia and Imparidentia. We recovered three Pteriomorphia topologies different from previously published trees, with the strongest support for ((Ostreida + (Arcida + Mytilida)) + (Pectinida + (Limida + Pectinida))). Limida were nested within Pectinida, warranting further studies. For Imparidentia, our results strongly supported the new hypothesis of (Galeommatida + (Adapedonta + Cardiida)), while the possible non-monophyly of Lucinida was inferred but poorly supported. Overall, our results provide important insights into the phylogeny of Bivalvia and show that target enrichment sequencing of UCEs can be broadly applied to study both deep and shallow phylogenetic relationships.
Persistent Gene Flow Suggests an Absence of Reproductive Isolation in an African Antelope Speciation Model
Wang X, Pedersen CT, Athanasiadis G, Garcia-Erill G, Hanghøj K, Bertola LD, Rasmussen MS, Schubert M, Liu X, Li Z, Lin L, Balboa RF, Jørsboe E, Nursyifa C, Liu S, Muwanika V, Masembe C, Chen L, Wang W, Moltke I, Siegismund HR, Albrechtsen A and Heller R
African antelope diversity is a globally unique vestige of a much richer world-wide Pleistocene megafauna. Despite this, the evolutionary processes leading to the prolific radiation of African antelopes are not well understood. Here, we sequenced 145 whole genomes from both subspecies of the waterbuck (Kobus ellipsiprymnus), an African antelope believed to be in the process of speciation. We investigated genetic structure and population divergence and found evidence of a mid-Pleistocene separation on either side of the eastern Great Rift Valley, consistent with vicariance caused by a rain shadow along the so-called 'Kingdon's Line'. However, we also found pervasive evidence of both recent and widespread historical gene flow across the Rift Valley barrier. By inferring the genome-wide landscape of variation among subspecies, we found 14 genomic regions of elevated differentiation, including a locus that may be related to each subspecies' distinctive coat pigmentation pattern. We investigated these regions as candidate speciation islands. However, we observed no significant reduction in gene flow in these regions, nor any indications of selection against hybrids. Altogether, these results suggest a pattern whereby climatically driven vicariance is the most important process driving the African antelope radiation, and suggest that reproductive isolation may not set in until very late in the divergence process. This has a significant impact on taxonomic inference, as many taxa will be in a gray area of ambiguous systematic status, possibly explaining why it has been hard to achieve consensus regarding the species status of many African antelopes. Our analyses demonstrate how population genetics based on low-depth whole genome sequencing can provide new insights that can help resolve how far lineages have gone along the path to speciation.
Phylogenomics and pervasive genome-wide phylogenetic discordance among fin whales (Balaenoptera physalus)
Furni F, Secchi ER, Speller C, DenDanto D, Ramp C, Larsen F, Mizroch S, Robbins J, Sears R, Urbán R J, Bérubé M and Palsbøll PJ
Phylogenomics has the power to uncover complex phylogenetic scenarios across the genome. In most cases, no single topology is reflected across the entire genome as the phylogenetic signal differs among genomic regions due to processes, such as introgression and incomplete lineage sorting. Baleen whales are among the largest vertebrates on Earth with a high dispersal potential in a relatively unrestricted habitat, the oceans. The fin whale (Balaenoptera physalus) is one of the most enigmatic baleen whale species, currently divided into four subspecies. It has been a matter of debate whether phylogeographic patterns explain taxonomic variation in fin whales. Here we present a chromosome-level whole genome analysis of the phylogenetic relationships among fin whales from multiple ocean basins. First, we estimated concatenated and consensus phylogenies for both the mitochondrial and nuclear genomes. The consensus phylogenies based upon the autosomal genome uncovered monophyletic clades associated with each ocean basin, aligning with the current understanding of subspecies division. Nevertheless, discordances were detected in the phylogenies based on the Y chromosome, mitochondrial genome, autosomal genome and X chromosome. Furthermore, we detected signs of introgression and pervasive phylogenetic discordance across the autosomal genome. This complex phylogenetic scenario could be explained by a puzzle of introgressive events, not yet documented in fin whales. Similarly, incomplete lineage sorting and low phylogenetic signal could lead to such phylogenetic discordances. Our study reinforces the pitfalls of relying on concatenated or single locus phylogenies to determine taxonomic relationships below the species level by illustrating the underlying nuances which some phylogenetic approaches may fail to capture. We emphasize the significance of accurate taxonomic delineation in fin whales by exploring crucial information revealed through genome-wide assessments.
Phylogenetic biogeography inference using dynamic paleogeography models and explicit geographic ranges
Arias JS
To model distribution ranges, the most popular methods of phylogenetic biogeography divide Earth into a handful of predefined areas. Other methods use explicit geographic ranges, but unfortunately, these methods assume a static Earth, ignoring the effects of plate tectonics and the changes in the landscape. To address this limitation, I propose a method that uses explicit geographic ranges and incorporates a plate motion model and a paleolandscape model directly derived from the models used by geologists in their tectonic and paleogeographic reconstructions. The underlying geographic model is a high-resolution pixelation of a spherical Earth. Biogeographic inference is based on diffusion, approximates the effects of the landscape, uses a time-stratified model to take into account the geographic changes, and directly integrates over all probable histories. By using a simplified stochastic mapping algorithm, it is possible to infer the ancestral locations as well as the distance traveled by the ancestral lineages. For illustration, I applied the method to an empirical phylogeny of the Sapindaceae plants. This example shows that methods based on explicit geographic data, coupled with high-resolution paleogeographic models, can provide detailed reconstructions of the ancestral areas but also include inferences about the probable dispersal paths and diffusion speed across the taxon history. The method is implemented in the program PhyGeo.
Hierarchical Heuristic Species Delimitation under the Multispecies Coalescent Model with Migration
Kornai D, Jiao X, Ji J, Flouri T and Yang Z
The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
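As a rough illustration of the heuristic criterion mentioned in this abstract, the sketch below computes the gdi in the simplest case of no gene flow, where it is commonly written as gdi = 1 - exp(-2*tau/theta), with tau the population split time and theta the population size parameter of the focal population. The abstract does not state this formula or any cutoffs, so both are assumptions here: the thresholds of 0.2 and 0.7 are the rule-of-thumb values often used in the gdi literature, and the function names gdi_no_migration and classify_gdi are hypothetical. This is not the hhsd pipeline, which also handles migration and hierarchical merging and splitting.

```python
import math


def gdi_no_migration(tau: float, theta: float) -> float:
    """Genealogical divergence index without gene flow (assumed form).

    tau   -- population split time, in expected substitutions per site
    theta -- population size parameter (4*N*mu) of the focal population
    """
    if tau < 0 or theta <= 0:
        raise ValueError("tau must be >= 0 and theta must be > 0")
    # Probability that two sequences from the focal population coalesce
    # before reaching the split time, looking backwards in time.
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify_gdi(value: float, lower: float = 0.2, upper: float = 0.7) -> str:
    """Apply rule-of-thumb cutoffs (assumed, not taken from the abstract)."""
    if value < lower:
        return "single species"
    if value > upper:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for tau, theta in [(0.0005, 0.01), (0.005, 0.01), (0.02, 0.01)]:
        g = gdi_no_migration(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={g:.3f} ({classify_gdi(g)})")
```

A recent split relative to population size gives a gdi near 0, and an old split gives a gdi near 1. The wide intermediate zone is where a heuristic criterion declines to make a call.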