Nuclear and Mitochondrial Genome Assemblies for the Endangered Wood-Decaying Fungus Somion occarium
Somion occarium is a wood-decaying bracket fungus belonging to an order known to be rich in useful chemical compounds. Despite its widespread distribution, S. occarium has been assessed as endangered on at least 1 national Red List, presumably due to loss of old-growth forest habitat. Here, we present a near-complete, annotated nuclear genome assembly for S. occarium consisting of 31 Mbp arranged in 11 pseudochromosomes, 9 of which are telomere-to-telomere, as well as a complete mitochondrial genome assembly of 112.9 Kbp. We additionally performed phylogenomic analysis and annotated carbohydrate-active enzymes (CAZymes) to compare gene and CAZyme content across closely related species. This genome was sequenced as the representative for Kingdom Fungi in the European Reference Genome Atlas Pilot Project.
Horizontal Transfer and Recombination Fuel Ty4 Retrotransposon Evolution in Saccharomyces
Horizontal transposon transfer (HTT) plays an important role in the evolution of eukaryotic genomes; however, the detailed evolutionary history and impact of most HTT events remain to be elucidated. To better understand the process of HTT in closely related microbial eukaryotes, we studied Ty4 retrotransposon subfamily content and sequence evolution across the genus Saccharomyces using short- and long-read whole genome sequence data, including new PacBio genome assemblies for two Saccharomyces mikatae strains. We find evidence for multiple independent HTT events introducing the Tsu4 subfamily into specific lineages of Saccharomyces paradoxus, Saccharomyces cerevisiae, Saccharomyces eubayanus, Saccharomyces kudriavzevii and the ancestor of the S. mikatae/Saccharomyces jurei species pair. In both S. mikatae and S. kudriavzevii, we identified novel Ty4 clades that were independently generated through recombination between resident and horizontally transferred subfamilies. Our results reveal that recurrent HTT and lineage-specific extinction events lead to a complex pattern of Ty4 subfamily content across the genus Saccharomyces. Moreover, our results demonstrate how HTT can lead to coexistence of related retrotransposon subfamilies in the same genome that can fuel evolution of new retrotransposon clades via recombination.
The chromosome-level genome of the ctenophore Mnemiopsis leidyi A. Agassiz, 1865 reveals a unique immune gene repertoire
Ctenophora are basal marine metazoans, the sister group of all other animals. Mnemiopsis leidyi is one of the most successful invasive species worldwide and attracts intense ecological and evolutionary research interest. Here, we generated a chromosome-level genome assembly of M. leidyi with a focus on its immune gene repertoire. The genome was 247.97 Mb, with an N50 of 16.84 Mb and 84.7% completeness. Its karyotype comprised 13 chromosomes. In this genome and that of two other ctenophores, Bolinopsis microptera and Hormiphora californensis, we detected a high number of protein domains related to potential immune receptors. Among those, proteins containing a Toll/interleukin-1 receptor (TIR2) domain, NACHT domain, Scavenger Receptor Cysteine-Rich (SRCR) domain, or C-type Lectin domain (CTLD) were abundant and presented unique domain architectures in M. leidyi. M. leidyi seems to lack bona fide Toll-like receptors, but it does possess a repertoire of 15 TIR2-domain-containing genes. In addition, we detected a bona fide NOD-like receptor and 38 NACHT-domain-containing genes. To verify the function of these domain-containing genes, we exposed M. leidyi to the pathogen Vibrio coralliilyticus. Among the differentially expressed genes, we identified potential immune receptors, including four TIR2-domain-containing genes, all of which were upregulated in response to pathogen exposure. To conclude, many common immune receptor domains, highly conserved across metazoans, are already present in Ctenophora. These domains show large expansions and unique architectures in M. leidyi, consistent with the basal evolutionary position of this group, but they might still have conserved functions in immunity and host-microbe interaction.
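The N50 reported above is a standard contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. As a minimal sketch of that definition only (the function name and the toy lengths are illustrative assumptions, not the authors' pipeline or the M. leidyi data):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total assembly length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the abstract.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Because N50 depends only on the length distribution, it says nothing about correctness or gene completeness, which is why it is reported here alongside a separate completeness score.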
Phylogenetic signal in primate tooth enamel proteins and its relevance for paleoproteomics
Ancient tooth enamel, and to some extent dentin and bone, contain characteristic peptides that persist for long periods of time. In particular, peptides from the enamel proteome (enamelome) have been used to reconstruct the phylogenetic relationships of fossil taxa. However, the enamelome is based on only about 10 genes, whose protein products undergo fragmentation in vivo and post mortem. This raises the question as to whether the enamelome alone provides enough information for reliable phylogenetic inference. We address these considerations on a selection of enamel-associated proteins that has been computationally predicted from genomic data from 232 primate species. We created multiple sequence alignments for each protein and estimated the evolutionary rate for each site. We examined which sites overlap with the parts of the protein sequences that are typically isolated from fossils. Based on this, we simulated ancient data with different degrees of sequence fragmentation, followed by phylogenetic analysis. We compared these trees to a reference species tree. Up to a degree of fragmentation that is similar to that of fossil samples from 1-2 million years ago, the phylogenetic placements of most nodes at family level are consistent with the reference species tree. We tested phylogenetic analysis on combinations of different enamel proteins and found that the composition of the proteome can influence deep splits in the phylogeny. With our methods, we provide guidance for researchers on how to evaluate the potential of paleoproteomics for phylogenetic studies before sampling valuable ancient specimens.
Population Genomics of Japanese Macaques (Macaca fuscata): Insights Into Deep Population Divergence and Multiple Merging Histories
The influence of long-term climatic changes such as glacial cycles on the history of living organisms has been a subject of research for decades, but the detailed population dynamics during environmental fluctuations and their effects on genetic diversity and genetic load are not well understood on a genome-wide scale. The Japanese macaque (Macaca fuscata) is a unique primate adapted to the cold environments of the Japanese archipelago. Despite intensive past research on Japanese macaque population genetics, whole-genome data have been limited to a few individuals, and the comprehensive demographic history and genetic differentiation of Japanese macaques have been underexplored. We conducted whole-genome sequencing of 64 Japanese macaque individuals from 5 different regions, revealing significant genetic differentiation and functional variant diversity across populations. In particular, Japanese macaques have low genetic diversity and harbor many shared and population-specific gene losses, which might contribute to population-specific phenotypes. Our estimation of population demography using phased haplotypes suggested that, after a strong population bottleneck shared among all populations around 400 to 500 kya, divergence among populations began around 150 to 200 kya, but there were periods of strong gene flow between some populations after the split, indicating multiple population split and merge events, probably due to habitat fragmentation and fusion during glacial cycles. These findings not only present a complex population history of Japanese macaques but also enhance their value as research models, particularly in neuroscience and behavioral studies. This comprehensive genomic analysis sheds light on the adaptation and evolution of Japanese macaques, contributing valuable insights to both evolutionary biology and biomedical research.
The B Chromosome of Pseudococcus viburni: A Selfish Chromosome that Exploits Whole-Genome Meiotic Drive
Meiosis is generally a fair process: each chromosome has a 50% chance of being included in each gamete. However, meiosis can become aberrant, with some chromosomes having a higher chance of making it into gametes than others. Yet, why and how such systems evolve remains unclear. Here, we study the unusual reproductive genetics of mealybugs, where only maternal-origin chromosomes are included in gametes during male meiosis, while paternal chromosomes are eliminated. One species, Pseudococcus viburni, has a segregating B chromosome that drives by escaping paternal genome elimination. We present whole genome and gene expression data from lines with and without B chromosomes. We identify B-linked sequences including 204 protein-coding genes and a satellite repeat that makes up a significant proportion of the chromosome. The few paralogs between the B and the core genome are distributed throughout the genome, arguing against a simple, or at least recent, chromosomal duplication of one of the autosomes to create the B. We do, however, find one 373 kb region containing 146 genes that appears to be a recent translocation. Finally, we show that while many B-linked genes are expressed during meiosis, most of these are encoded on the recently translocated region. Only a small number of B-exclusive genes are expressed during meiosis. Of these, only one was overexpressed during male meiosis, which is when the drive occurs: an acetyltransferase involved in H3K56Ac, which has a putative role in meiosis and is, therefore, a promising candidate for further studies.
Convergent evolution and predictability of gene copy numbers associated with diets in mammals
Convergent evolution, the evolution of the same or similar phenotypes in phylogenetically independent lineages, is a widespread phenomenon in nature. If the genetic basis for convergent evolution is predictable to some extent, it may be possible to infer organismic phenotypes and the capability of organisms to utilize new ecological resources based on genome sequence data. While repeated amino acid changes have been studied in association with convergent evolution, relatively little is known about the potential contribution of repeated gene copy number changes. In this study, we explore whether gene copy number changes of particular gene families are linked to diet shifts in mammals and assess if trophic ecology can be inferred from the copy numbers of a specific set of gene families. Using 86 mammalian genome sequences, we identified 24 gene families with a trend toward higher copy numbers in herbivores, carnivores, and omnivores, even after phylogenetic corrections. We were able to confirm previous findings on genes such as amylase, olfactory receptors, and xenobiotic metabolism genes, and identify novel gene families whose copy numbers correlate with dietary patterns. For example, omnivores exhibited higher copy numbers of genes encoding regulators of translation. We also established a discriminant function based on the copy numbers of 13 gene families that can help predict trophic ecology to some extent. These findings highlight a possible association between convergent evolution and repeated copy number changes in specific gene families, suggesting the potential to develop a method for predicting animal ecology from genome sequence data.
The Complex Epigenetic Panorama in the Multipartite Genome of the Nitrogen-Fixing Bacterium Sinorhizobium meliloti
In prokaryotes, DNA methylation plays roles in DNA repair, gene expression, cell cycle progression, and immune recognition of foreign DNA. Genome-wide methylation patterns can vary between strains, influencing phenotype and gene transfer. However, broader evolutionary studies on bacterial epigenomic variation remain limited. In this study, we conducted an epigenomic analysis using single-molecule real-time sequencing on 21 strains of Sinorhizobium meliloti, a facultative plant nitrogen-fixing alphaproteobacterium. This species is notable for its multipartite genome structure, consisting of a chromosome, chromid, and megaplasmid, leading to significant genomic and phenotypic diversity. We identified 16 palindromic and nonpalindromic methylated DNA motifs, including N4-methylcytosine and N6-methyladenine modifications, and analyzed their associated methyltransferases. Some motifs were methylated across all strains, forming a core set of epigenomic signatures, while others exhibited variable methylation frequencies, indicating a dispensable (shell) epigenome. Additionally, we observed differences in methylation frequency between replicons and within coding sequences versus regulatory regions, suggesting that methylation patterns may reflect multipartite genome evolution and influence gene regulation. Overall, our findings reveal extensive epigenomic diversity in S. meliloti, with complex epigenomic signatures varying across replicons and genomic regions. These results enhance our understanding of multipartite genome evolution and highlight the potential role of epigenomic diversity in phenotypic variation.
Functional carbohydrate-active enzymes acquired by horizontal gene transfer from plants in the whitefly Bemisia tabaci
Carbohydrate-active enzymes (CAZymes) involved in the degradation of plant cell walls and/or the assimilation of plant carbohydrates for energy uptake are widely distributed in microorganisms. In contrast, they are less frequent in animals, although there are exceptions, including examples of CAZymes acquired by horizontal gene transfer (HGT) from bacteria or fungi in several phytophagous arthropods and plant-parasitic nematodes. Although the whitefly Bemisia tabaci is a major agricultural pest, knowledge of HGT-acquired CAZymes in this phloem-feeding insect of the Hemiptera order (subfamily Aleyrodinae) is still lacking. We performed a comprehensive and accurate detection of HGT candidates in B. tabaci and identified 136 HGT events, 14 of which correspond to CAZymes. The B. tabaci HGT-acquired CAZymes were not only of bacterial or fungal origin; some were also acquired from plants. Biochemical analysis revealed that members of the glycoside hydrolase families 17 (GH17) and 152 (GH152) acquired from plants are functional beta-glucanases with different substrate specificities, suggesting distinct roles. These two CAZymes are the first characterized GH17 and GH152 glucanases in an animal. We identified a lower number of HGT events in the related Aleyrodinae Trialeurodes vaporariorum, with only three HGT-acquired CAZymes, including a GH152 glucanase, with phylogenetic analysis suggesting a unique HGT event in the ancestor of the Aleyrodinae. Another GH152 CAZyme, most likely independently acquired from plants, was also identified in two plant cell-feeding insects of the Thysanoptera order, highlighting the importance of plant-acquired CAZymes in the biology of piercing-sucking insects.
Tandem repeats provide evidence for convergent evolution to similar protein structures
Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment. In this regime it is possible to observe nearly identical protein structures lacking detectable sequence similarity. In the absence of a robust statistical framework for structure comparison, it is largely assumed similar structures are homologous. However, it is conceivable that matching structures could arise through convergent evolution, resulting in analogous proteins without shared ancestry. Large databases of predicted structures offer a means of determining whether analogs are present among structure matches. Here, I find that a small subset (∼2.6%) of Foldseek clusters lack sequence-level support for homology, including ∼1% of strong structure matches with TM-score ≥ 0.5. This result by itself does not imply these structure pairs are non-homologous, since their sequences could have diverged beyond the limits of recognition. Yet, strong matches without sequence-level support for homology are enriched in structures with predicted repeats that could induce spurious matches. Some of these structural repeats are underpinned by sequence-level tandem repeats in both matching structures. I show that many of these tandem repeat units have genealogies inconsistent with their corresponding structures sharing a common ancestor, implying these highly similar structure pairs are analogous rather than homologous. This result suggests caution is warranted when inferring homology from structural resemblance alone in the absence of sequence-level support for homology.
Assembly and Annotation of the Tetraploid Salsola tragus (Russian thistle) Genome
This report presents two phased chromosome-scale genome assemblies of allotetraploid Salsola tragus (2n=4x=36) and fills the current genomics resource gap for this species. The 1C genome size estimated by flow cytometry was 1.319 Gbp. PacBio HiFi reads were assembled and scaffolded with Hi-C chromatin contact mapping and Bionano optical mapping data. For annotation, a PacBio Iso-Seq library was generated from root, stem, leaf, and floral tissues, followed by annotation using a modified Maker pipeline. The assembled haploid S. tragus genomes contained 18 chromosomes each, with 9 chromosomes assigned to subgenome A and 9 chromosomes to subgenome B. Each haplome assembly represented 95% of the total genome size estimated by flow cytometry. Haplome 1 and haplome 2 contained 43,354 and 42,221 annotated genes, respectively. The availability of high-quality reference genomes for this economically important weed will facilitate future omics analysis of S. tragus and a better understanding of chenopod plants.
Massive gene loss in the fungus Sporothrix epigloea accompanied a shift to life in a glucuronoxylomannan-based gel matrix
Fungi are well known for their ability to both produce and catabolize complex carbohydrates to acquire carbon, often in the most extreme of environments. Glucuronoxylomannan (GXM)-based gel matrices are widely produced by fungi in nature, and though they are of key interest in medicine and pharmaceuticals, their biodegradation is poorly understood. Though some organisms, including other fungi, are adapted to life in and on GXM-like matrices in nature, they are almost entirely unstudied, and it is unknown if they are involved in matrix degradation. Sporothrix epigloea is an ascomycete fungus that completes its life cycle entirely in the short-lived secreted polysaccharide matrix of a white jelly fungus, Tremella fuciformis. To gain insight into how S. epigloea adapted to life in this unusual microhabitat, we compared the predicted protein composition of S. epigloea to that of 21 other Sporothrix species. We found that the genome of S. epigloea is smaller than that of any other sampled Sporothrix, with widespread functional gene loss, including genes coding for serine proteases and biotin synthesis. In addition, many predicted CAZymes degrading both plant and fungal cell wall components were lost, while a lytic polysaccharide monooxygenase (LPMO) with no previously established activity or substrate specificity appears to have been gained. Phenotype assays suggest narrow use of mannans and other oligosaccharides as carbon sources. Taken together, the results suggest a streamlined machinery, including potential carbon sourcing from GXM building blocks, facilitates the hyperspecialized ecology of S. epigloea in the GXM-like milieu.
Plasmodium falciparum CyRPA glycan binding does not explain adaptation to humans
The human malaria parasite Plasmodium falciparum evolved from a parasite that infects gorillas, termed Plasmodium praefalciparum. The sialic acids on glycans on the surface of erythrocytes differ between humans and other apes. It has recently been shown that the P. falciparum cysteine-rich protective antigen (PfCyRPA) binds human sialoglycans as an essential step in the erythrocyte invasion pathway, while that of the chimpanzee parasite Plasmodium reichenowi has affinities matching ape glycans. Two amino acid changes, at sites 154 and 209, were shown to be sufficient to switch glycan binding preferences and inferred to reflect adaptation of P. falciparum to humans. However, we show that sites 154 and 209 are identical in P. falciparum and P. praefalciparum, with no other differences located in or near the CyRPA glycan binding sites. Thus, the gorilla precursor appears to have already been preadapted to bind human sialoglycans.
Adaptation in the Alleyways: Candidate Genes Under Potential Selection in Urban Coyotes
In the context of evolutionary time, cities are an extremely recent development. Although our understanding of how urbanization alters ecosystems is well developed, empirical work examining the consequences of urbanization on adaptive evolution remains limited. To facilitate future work, we offer candidate genes for one of the most prominent urban carnivores across North America. The coyote (Canis latrans) is a highly adaptable carnivore distributed throughout urban and nonurban regions in North America. As such, the coyote can serve as a blueprint for understanding the various pathways by which urbanization can influence the genomes of wildlife via comparisons along urban-rural gradients, as well as between metropolitan areas. Given the close evolutionary relationship between coyotes and domestic dogs, we leverage the well-annotated dog genome and highly conserved mammalian genes from model species to outline how urbanization may alter coyote genotypes and shape coyote phenotypes. We identify variables that may alter selection pressure for urban coyotes and offer suggestions of candidate genes to explore. Specifically, we focus on pathways related to diet, health, behavior, cognition, and reproduction. In a rapidly urbanizing world, understanding how species cope and adapt to anthropogenic change can facilitate the persistence of, and coexistence with, these species.
ZW sex chromosome differentiation in paleognathous birds is associated with mitochondrial effective population size but not mitochondrial genome size or mutation rate
Eukaryotic genome size varies considerably, even among closely related species. The causes of this variation are unclear, but weak selection against supposedly costly "extra" genomic sequences has been central to the debate for over 50 years. The mutational hazard hypothesis, which focuses on the increased mutation rate to null alleles in superfluous sequences, is particularly influential, though challenging to test. This study examines the sex chromosomes and mitochondrial genomes of 15 flightless or semi-flighted paleognathous bird species. In this clade, the non-recombining portion of the W chromosome has independently expanded stepwise in multiple lineages. Given the shared maternal inheritance of the W chromosome and mitochondria, theory predicts that mitochondrial effective population size (Ne) should decrease due to increased Hill-Robertson Interference in lineages with expanded non-recombining W regions. Our findings support an association between the extent of the non-recombining W region and three indicators of reduced selective efficiency: (1) the ratio of non-synonymous to synonymous nucleotide changes in the mitochondrion, (2) the probability of radical amino acid changes, and (3) the number of ancient, W-linked genes lost through evolution. Next, we tested whether reduced Ne affects mitochondrial genome size, as predicted by weak selection against genome expansion. We find no support for a relationship between mitochondrial genome size and expanded non-recombining W regions, nor with increased mitochondrial mutation rates (predicted to modulate selective costs). These results highlight the utility of non-recombining regions and mitochondrial genomes for studying genome evolution and challenge the general idea of a negative relation between the efficacy of selection and genome size.
A Complete Assembly and Annotation of the American Shad Genome Yields Insights into the Origins of Diadromy
Transitions across ecological boundaries, such as those separating freshwater from the sea, are major drivers of phenotypic innovation and biodiversity. Despite their importance to evolutionary history, we know little about the mechanisms by which such transitions are accomplished. To help shed light on these mechanisms, we generated the first high-quality, near-complete assembly and annotation of the genome of the American shad (Alosa sapidissima), an ancestrally diadromous (migratory between salinities) fish in the order Clupeiformes of major cultural and historical significance. Among the Clupeiformes, there is a large amount of variation in salinity habitat and many independent instances of salinity boundary crossing, making this taxon well-suited for studies of mechanisms underlying ecological transitions. Our initial analysis of the American shad genome reveals several unique insights for future study including: (i) that genomic repeat content is among the highest of any fish studied to date; (ii) that genome-wide heterozygosity is low and may be associated with range-wide population collapses since the 19th century; and (iii) that natural selection has acted on the branch leading to the diadromous genus Alosa. Our analysis suggests that functional targets of natural selection may include diet, particularly lipid metabolism, as well as cytoskeletal remodeling and sensing of salinity changes. Natural selection on these functions is expected in the transition from a marine to diadromous life history, particularly in the tolerance of nutrient- and ion-devoid freshwater. We anticipate that our assembly of the American shad genome will be used to test future hypotheses on adaptation to novel environments, the origins of diadromy, and adaptive variation in life history strategies, among others.
Protein quality control is a master modulator of molecular evolution in bacteria
The bacterial protein quality control (PQC) network comprises a set of genes that promote proteostasis (proteome homeostasis) through proper protein folding and function via chaperones, proteases, and a protein translational machinery. It participates in vital cellular processes and influences organismal development and evolution. In this review, we examine the mechanistic bases for how the bacterial PQC network influences molecular evolution. We discuss the relevance of PQC components to contemporary issues in evolutionary biology including epistasis, evolvability, and the navigability of protein space. We examine other areas where proteostasis affects aspects of evolution and physiology, including host-parasite interactions. More generally, we demonstrate that the study of bacterial systems can aid in broader efforts to understand the relationship between genotype and phenotype across the biosphere.
Transcriptomic data reveal divergent paths of chitinase evolution underlying dietary convergence in anteaters and pangolins
Ant-eating mammals represent a textbook example of convergent evolution. Among them, anteaters and pangolins exhibit the most extreme convergent phenotypes with complete tooth loss, elongated skulls, protruding tongues, and hypertrophied salivary glands producing large amounts of saliva. However, comparative genomic analyses have shown that anteaters and pangolins differ in their chitinase acidic gene (CHIA) repertoires, which potentially degrade the chitinous exoskeletons of ingested ants and termites. While the southern tamandua (Tamandua tetradactyla) harbors four functional CHIA paralogs (CHIA1-4), Asian pangolins (Manis spp.) have only one functional paralog (CHIA5). Here, we performed a comparative transcriptomic analysis of salivary glands in 33 placental species, including 16 novel transcriptomes from ant-eating species and close relatives. Our results suggest that salivary glands play an important role in adaptation to an insect-based diet, as expression of different CHIA paralogs is observed in insectivorous species. Furthermore, convergently-evolved pangolins and anteaters express different chitinases in their digestive tracts. In the Malayan pangolin, CHIA5 is overexpressed in all major digestive organs, whereas in the southern tamandua, all four functional paralogs are expressed, at very high levels for CHIA1 and CHIA2 in the pancreas, and for CHIA3 and CHIA4 in the salivary glands, stomach, liver, and pancreas. Overall, our results demonstrate that divergent molecular mechanisms within the chitinase acidic gene family underlie convergent adaptation to the ant-eating diet in pangolins and anteaters. This study highlights the role of historical contingency and molecular tinkering of the chitin-digestive enzyme toolkit in this classic example of convergent evolution.
Evolutionary Genomics of Two Co-occurring Congeneric Fore Reef Coral Species on Guam (Mariana Islands)
Population structure provides essential information for developing meaningful conservation plans. This is especially important in remote places, such as oceanic islands, where limited population sizes and genetic isolation can make populations more susceptible and self-dependent. In this study, we assess and compare the relatedness, population genetics and molecular ecology of two sympatric Acropora species, A. surculosa sensu Randall & Myers (1983) and A. cf. verweyi Veron & Wallace, 1984 around Guam, using genome-wide sequence data (ddRAD). We further contrast our findings with the results of a recent study on back reef A. cf. pulchra (Brook, 1891) to assess the impact of habitat, colony morphology, and phylogenetic relatedness on these basic population genetic characteristics and generate testable hypotheses for future studies. Both target species were found to have small effective population sizes, low levels of genetic diversity, and minimal population structure around Guam. Nonetheless, A. cf. verweyi had significantly higher levels of genetic diversity, some population structure as well as more clones, close relatives and putative loci under selection. Comparisons with A. cf. pulchra indicate a potentially significant impact by habitat on population structure and genetic diversity while colony morphology seems to significantly impact clonality. This study revealed significant differences in the basic population genetic makeup of two sympatric Acropora species on Guam. Our results suggest that colony morphology and habitat/ecology may have a significant impact on the population genetic makeup in reef corals, which could offer valuable insights for future management decisions in the absence of genetic data.
Novel High-Quality Amoeba Genomes Reveal Widespread Codon Usage Mismatch Between Giant Viruses and Their Hosts
The need for high-quality protist genomes has prevented in-depth computational and experimental studies of giant virus-host interactions. In addition, our current knowledge of host range is highly biased due to the few hosts used to isolate novel giant viruses. This study presents 6 high-quality amoeba genomes from known and potential giant virus hosts belonging to 2 distinct eukaryotic clades: Amoebozoa and Discoba. We employ their genomic data to investigate the predictability of giant virus host range. Using a combination of long- and short-read sequencing, we obtained highly contiguous and complete genomes of Acanthamoeba castellanii, Acanthamoeba griffini, Acanthamoeba terricola, Naegleria clarki, Vermamoeba vermiformis, and Willaertia magna, contributing to the collection of sequences for the eukaryotic tree of life. We found that the 6 amoebae have distinct codon usage patterns and that, contrary to other virus groups, giant viruses often have codon usage that differs from, and is even opposite to, that of their known hosts. Conversely, giant viruses with matching codon usage are frequently not known to infect or replicate in these hosts. Interestingly, analyses of integrated viral sequences in the amoeba host genomes reveal potential novel virus-host associations. Matching of codon usage preferences is often used to predict virus-host pairs. However, with the broad-scale analyses performed in this study, we demonstrate that codon usage alone appears to be a poor predictor of host range for giant viruses infecting amoebae. We discuss the potential strategies that giant viruses employ to ensure high viral fitness in nonmatching hosts. Moreover, this study emphasizes the need for more high-quality protist genomes. Finally, the amoeba genomes presented in this study set the stage for future experimental studies to better understand how giant viruses interact with different host species.
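To make the idea of codon usage matching concrete, the comparison can be sketched as building a relative codon frequency profile for each genome's coding sequences and measuring how far apart two profiles are. The sketch below is illustrative only: the function names, the Euclidean distance, and the toy sequences are assumptions, and the abstract does not state which codon usage measure the authors used.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons containing ambiguous bases such as N.
    codons = [codon for codon in codons if set(codon) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete unambiguous codons found")
    return {codon: n / total for codon, n in counts.items()}


def codon_distance(freq_a, freq_b):
    """Euclidean distance between two codon frequency profiles."""
    codons = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) ** 2
                    for c in codons))


# Hypothetical toy sequences, not taken from any genome in the study.
host = codon_frequencies("GCTGCTGCTGAA")
virus_matching = codon_frequencies("GCTGAAGCTGCT")
virus_mismatched = codon_frequencies("GCAGCAGCAGAG")

print(codon_distance(host, virus_matching))    # identical profiles
print(codon_distance(host, virus_mismatched))  # same amino acids, other codons
```

In the toy example the mismatched virus encodes the same amino acids as the host but with different synonymous codons, which is the kind of divergence a codon usage comparison is meant to pick up. The abstract's conclusion is that such a distance alone appears to be a poor predictor of host range for giant viruses.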
Chromosome-Scale Assembly of Capsella orientalis, Maternal Progenitor of Cosmopolitan Allotetraploid C. bursa-pastoris
The genus Capsella serves as a model for understanding speciation, hybridization, and genome evolution in plants. Here, we present a chromosome-scale genome assembly of Capsella orientalis, the maternal progenitor of the cosmopolitan allotetraploid C. bursa-pastoris. Using nanopore sequencing and data on chromatin contacts (Hi-C), we assembled the genome into eight pseudo-chromosomes with high contiguity, evidenced by a benchmarking universal single-copy orthologs (BUSCO) completeness score of 99.3%. Comparative analysis with C. rubella and C. bursa-pastoris revealed overall synteny, except for a 2 Mb inversion on chromosome 4 of C. rubella. Comparative genome analysis highlighted the conservation of gene content and structural integrity in the C. orientalis-derived subgenome of C. bursa-pastoris, with the exception of a 1.8 Mb region absent in the O subgenome but present in C. orientalis. The genome annotation includes 27,675 protein-coding genes, with most exhibiting one-to-one orthology with Arabidopsis thaliana. Notably, 2,155 genes showed no similarity to A. thaliana genes. These results establish a robust genomic resource for C. orientalis, facilitating future studies on polyploid evolution, gene regulation, and species divergence within Capsella.