GENETICS

Leveraging a new data resource to define the response of C. neoformans to environmental signals
Kang YS, Jung J, Brown H, Mateusiak C, Doering TL and Brent MR
Cryptococcus neoformans is an opportunistic fungal pathogen with a polysaccharide capsule that becomes greatly enlarged in the mammalian host and during in vitro growth under host-like conditions. To understand how individual environmental signals affect capsule size and gene expression, we grew cells in all combinations of five signals implicated in capsule size and systematically measured cell and capsule sizes. We also sampled these cultures over time and performed RNA-Seq in quadruplicate, yielding 881 RNA-Seq samples. Analysis of the resulting data sets showed that capsule induction in tissue culture medium, typically used to represent host-like conditions, requires the presence of either CO2 or exogenous cyclic AMP (cAMP). Surprisingly, adding either of these pushes overall gene expression in the opposite direction from tissue culture media alone, even though both are required for capsule development. Another unexpected finding was that rich medium blocks capsule growth completely. Statistical analysis further revealed many genes whose expression is associated with capsule thickness; deletion of one of these significantly reduced capsule size. Beyond illuminating capsule induction, our massive, uniformly collected dataset will be a significant resource for the research community.
A review of multimodal deep learning methods for genomic-enabled prediction in plant breeding
Montesinos-López OA, Chavira-Flores M, Kiasmiantini, Crespo-Herrera L, Saint Piere C, Li H, Fritsche-Neto R, Al-Nowibet K, Montesinos-López A and Crossa J
Deep learning methods have been applied to enhance the prediction accuracy of traditional statistical methods in the field of plant breeding. Although deep learning seems to be a promising approach for genomic prediction, it has proven to have some limitations, since its conventional methods fail to leverage all available information. Multimodal deep learning methods aim to improve the predictive power of their unimodal counterparts by introducing several modalities (sources) of input information. In this review, we introduce some basic theoretical concepts of multimodal deep learning and provide a list of the most widely used neural network architectures in deep learning, as well as the available strategies to fuse data from different modalities. We mention some of the available computational resources for the practical implementation of multimodal deep learning problems. Finally, we review applications of multimodal deep learning to genomic selection in plant breeding and other related fields. We present a meta-picture of the practical performance of multimodal deep learning methods to highlight how these tools can help address complex problems in the field of plant breeding. We discuss some relevant considerations that researchers should keep in mind when applying multimodal deep learning methods. Multimodal deep learning holds significant potential for various fields, including genomic selection. While multimodal deep learning displays enhanced prediction capabilities over unimodal deep learning and other machine learning methods, it demands more computational resources. Multimodal deep learning effectively captures intermodal interactions, especially when integrating data from different sources. To apply multimodal deep learning in genomic selection, suitable architectures and fusion strategies must be chosen.
It is relevant to keep in mind that multimodal deep learning, like unimodal deep learning, is a powerful tool but should be carefully applied. Given its predictive edge over traditional methods, multimodal deep learning is valuable in addressing challenges in plant breeding and food security amid a growing global population.
Megavariate Methods Capture Complex Genotype-by-Environment Interactions
Xavier A, Runcie D and Habier D
Genomic prediction models that capture genotype-by-environment interaction are useful for predicting site-specific performance by leveraging information among related individuals and correlated environments, but implementing such models is computationally challenging. This study describes the algorithms behind scalable approaches to this problem, including two models with latent representations of genotype-by-environment interactions, namely MegaLMM and MegaSEM, and an efficient multivariate mixed model solver, namely PEGS, fitting different covariance structures (unstructured, XFA, HCS). Accuracy and runtime are benchmarked on simulated scenarios with varying numbers of genotypes and environments. MegaLMM and PEGS-based XFA and HCS models provided the highest accuracy under sparse testing with 100 testing environments. The PEGS-based unstructured model was orders of magnitude faster than REML-based multivariate GBLUP while providing the same accuracy. MegaSEM provided the lowest runtime, fitting a model with 200 traits and 20,000 individuals in approximately 5 minutes, and a model with 2,000 traits and 2,000 individuals in less than 3 minutes. With the G2F data, the most accurate predictions were attained with the univariate model fitted across environments and by averaging environment-level GEBVs from models with HCS and XFA covariance structures.
A single mutation G454A in the P450 CYP9K1 drives pyrethroid resistance in the major malaria vector Anopheles funestus reducing bed net efficacy
Djoko Tagne CS, Kouamo MFM, Tchouakui M, Muhammad A, Mugenzi LJL, Tatchou-Nebangwa NMT, Thiomela RF, Gadji M, Wondji MJ, Hearn J, Desire MH, Ibrahim SS and Wondji CS
Metabolic mechanisms conferring pyrethroid resistance in malaria vectors are jeopardizing the effectiveness of insecticide-based interventions, and identification of their markers is a key requirement for robust resistance management. Here, using a field-lab-field approach, we demonstrated that a single mutation G454A in the P450 CYP9K1 is driving pyrethroid resistance in the major malaria vector Anopheles funestus in East and Central Africa. Drastic reduction in CYP9K1 diversity was observed in Ugandan samples collected in 2014, with selection of a predominant haplotype (G454A mutation at 90%), which was completely absent in the other African regions. However, six years later (2020) the Ugandan 454A-CYP9K1 haplotype was found to be predominant in Cameroon (84.6%), but absent in Malawi (Southern Africa) and Ghana (West Africa). Comparative in vitro heterologous expression and metabolism assays revealed that the mutant 454A-CYP9K1 (R) allele metabolises significantly more type II pyrethroid (deltamethrin) than the wild G454-CYP9K1 (S) allele. Transgenic Drosophila melanogaster flies expressing the 454A-CYP9K1 (R) allele exhibited significantly higher resistance to type I and II pyrethroids than flies expressing the wild G454-CYP9K1 (S) allele. Furthermore, laboratory testing and field experimental hut trials in Cameroon demonstrated that mosquitoes harbouring the resistant 454A-CYP9K1 allele survived pyrethroid exposure significantly more often (Odds ratio = 567, p < 0.0001). This study highlights the rapid spread of the pyrethroid-resistant CYP9K1 allele, under directional selection in East and Central Africa, contributing to reduced bed net efficacy. The DNA-based assay designed here will add to the toolbox for resistance monitoring and help improve management strategies.
Population size rescaling significantly biases outcomes of forward-in-time population genetic simulations
Dabi A and Schrider DR
Simulations are an essential tool in all areas of population genetic research, used in tasks such as the validation of theoretical analysis and the study of complex evolutionary models. Forward-in-time simulations are especially flexible, allowing for various types of natural selection, complex genetic architectures, and non-Wright-Fisher dynamics. However, their intense computational requirements can be prohibitive to simulating large populations and genomes. A popular method to alleviate this burden is to scale down the population size by some scaling factor while scaling up the mutation rate, selection coefficients, and recombination rate by the same factor. However, this rescaling approach may in some cases bias simulation results. To investigate the manner and degree to which rescaling impacts simulation outcomes, we carried out simulations with different demographic histories and distributions of fitness effects using several values of the rescaling factor, Q, and compared the deviation of key outcomes (fixation times, allele frequencies, linkage disequilibrium, and the fraction of mutations that fix during the simulation) between the scaled and unscaled simulations. Our results indicate that scaling introduces substantial biases to each of these measured outcomes, even at small values of Q. Moreover, the nature of these effects depends on the evolutionary model and scaling factor being examined. While increasing the scaling factor tends to increase the observed biases, this relationship is not always straightforward, thus it may be difficult to know the impact of scaling on simulation outcomes a priori. However, it appears that for most models, only a small number of replicates was needed to accurately quantify the bias produced by rescaling for a given Q.
In summary, while rescaling forward-in-time simulations may be necessary in many cases, researchers should be aware of the rescaling procedure's impact on simulation outcomes and consider investigating its magnitude in smaller scale simulations of the desired model(s) before selecting an appropriate value of Q.
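The rescaling procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `SimParams` container, the `rescale` function, and the example parameter values are hypothetical, and the scaling factor is written `q`. The sketch divides the population size by the factor and multiplies the mutation rate, recombination rate, and selection coefficient by the same factor, so that the population-scaled products (N times each rate) are preserved.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical container for the parameters named in the abstract."""
    pop_size: int          # N, number of individuals
    mutation_rate: float   # per-site, per-generation mutation rate
    recomb_rate: float     # per-site, per-generation recombination rate
    sel_coeff: float       # selection coefficient


def rescale(params: SimParams, q: float) -> SimParams:
    """Scale N down by q and scale the rates and selection coefficient up by q."""
    if q < 1:
        raise ValueError("scaling factor must be >= 1")
    return SimParams(
        pop_size=max(1, round(params.pop_size / q)),
        mutation_rate=params.mutation_rate * q,
        recomb_rate=params.recomb_rate * q,
        sel_coeff=params.sel_coeff * q,
    )


# Illustrative values only; they are assumptions, not taken from the study.
unscaled = SimParams(pop_size=10_000, mutation_rate=1e-8,
                     recomb_rate=1e-8, sel_coeff=0.001)
scaled = rescale(unscaled, q=10)

# The population-scaled products are unchanged by rescaling.
for name in ("mutation_rate", "recomb_rate", "sel_coeff"):
    before = unscaled.pop_size * getattr(unscaled, name)
    after = scaled.pop_size * getattr(scaled, name)
    print(name, math.isclose(before, after))
```

Preserving these products is the rationale for rescaling; the abstract's point is that outcomes such as fixation times and linkage disequilibrium can still deviate between the scaled and unscaled simulations.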
A path integral approach for allele frequency dynamics under polygenic selection
Anderson NW, Kirk L, Schraiber JG and Ragsdale AP
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change. Predicting allele frequency changes under drift and selection, even for alleles contributing to simple, monogenic traits, has remained a challenging problem. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. We derive analytic expressions for the transition probability (i.e., the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of allele frequency change to test for selection, as well as explore optimal design choices for evolve-and-resequence experiments to uncover the genetic architecture of polygenic traits under selection.
Bayesian hierarchical hypothesis testing in large-scale genome-wide association analysis
Samaddar A, Maiti T and de Los Campos G
Variable selection and large-scale hypothesis testing are techniques commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. For instance, collinearity poses a great challenge in genome-wide association studies involving millions of variants, many of which may be in high linkage disequilibrium. In such settings, collinearity can significantly reduce the power of variable selection methods to identify individual variants associated with an outcome. To address such challenges, we developed Bayesian hierarchical hypothesis testing (BHHT), a novel multiresolution testing procedure that offers high power with adequate error control and fine-mapping resolution. We demonstrate through simulations that the proposed methodology has a power-FDR performance that is competitive with (and in many scenarios better than) state-of-the-art methods. Finally, we demonstrate the feasibility of using BHHT with large sample size (n ∼ 300,000) and ultra-high-dimensional genotypes (∼ 15 million single-nucleotide polymorphisms or SNPs) by applying it to eight complex traits using data from the UK-Biobank. Our results show that the proposed methodology leads to many more discoveries than those obtained using traditional SNP-centered inference procedures. The article is accompanied by open-source software that implements the methods described in this study using algorithms that scale to biobank-size ultra-high-dimensional data.
The recombination landscape of the barn owl, from families to populations
Topaloudis A, Cumer T, Lavanchy E, Ducrest AL, Simon C, Machado AP, Paposhvili N, Roulin A and Goudet J
Homologous recombination is a meiotic process that generates diversity along the genome and interacts with all evolutionary forces. Despite its importance, studies of recombination landscapes are lacking due to methodological limitations and limited data. Frequently used approaches include linkage mapping based on familial data, which provides sex-specific broad-scale estimates of realized recombination, and inferences based on population LD, which reveal a more fine-scale resolution of the recombination landscape, albeit dependent on the effective population size and the selective forces acting on the population. In this study, we use a combination of these two methods to elucidate the recombination landscape for the Afro-European barn owl (Tyto alba). We find subtle differences in crossover placement between sexes that lead to differential effective shuffling of alleles. LD-based estimates of recombination are concordant with family-based estimates and identify large variation in recombination rates within and among linkage groups. Larger chromosomes show variation in recombination rates, while smaller chromosomes have a universally high rate which shapes the diversity landscape. We find that recombination rates are correlated with gene content, genetic diversity and GC content. We find no conclusive differences in the recombination landscapes between populations. Overall, this comprehensive analysis enhances our understanding of recombination dynamics, genomic architecture, and sex-specific variation in the barn owl, contributing valuable insights to the broader field of avian genomics.
Network hub gene detection using the entire solution path information
Kuismin M and Sillanpää MJ
Gene co-expression networks typically comprise modules and their associated hub genes, which regulate numerous downstream interactions within the network. Methods for hub screening, as well as data-driven estimation of hub co-expression networks using graphical models, can serve as useful tools for identifying these hubs. Graphical model-based penalization methods typically have one or multiple regularization terms, each of which encourages some favorable characteristics (e.g., sparsity, hubs, power-law) in the estimated complex gene network. It is common practice to find a single optimal graphical model corresponding to a specific value of the regularization parameter(s). However, instead of doing this, one could aggregate information across several graphical models, all of which depend on the same data set, along the solution path in the hub gene detection process. We propose a novel method for detecting hub genes that utilizes the information available in the solution path. Our procedure is related to stability selection, but we replace resampling with a simple statistic. This procedure amalgamates information from each node of the data-driven graphical models into a single influence statistic, similar to Cook's distance. We call this statistic the Mean Degree Squared Distance (MDSD). Our simulation and empirical studies demonstrate that the MDSD statistic maintains a good balance between false positive and true positive hubs. An R package, MDSD, is publicly available on GitHub under the General Public License at https://github.com/markkukuismin/MDSD.
Balancing selfing and outcrossing: the genetics and cell biology of nematodes with three sexual morphs
Adams S, Tandonnet S and Pires-daSilva A
Trioecy, a rare reproductive system where hermaphrodites, females, and males coexist, is found in certain algae, plants, and animals. Though it has evolved independently multiple times, its rarity suggests it may be an unstable or transitory evolutionary strategy. In the well-studied Caenorhabditis elegans, attempts to engineer a trioecious strain have reverted to the hermaphrodite/male system, reinforcing this view. However, these studies did not consider the sex-determination systems of naturally stable trioecious species. The discovery of free-living nematodes of the Auanema genus, which have naturally stable trioecy, provides an opportunity to study these systems. In Auanema, females produce only oocytes, while hermaphrodites produce both oocytes and sperm for self-fertilization. Crosses between males and females primarily produce daughters (XX hermaphrodites and females), while male-hermaphrodite crosses result in sons only. These skewed sex ratios are due to X-chromosome drive during spermatogenesis, where males produce only X-bearing sperm through asymmetric cell division. The stability of trioecy in Auanema is influenced by maternal control over sex determination and environmental cues. These factors offer insights into the genetic and environmental dynamics that maintain trioecy, potentially explaining its evolutionary stability in certain species.
Allopolyploidy expanded gene content but not pangenomic variation in the hexaploid oilseed Camelina sativa
Bird KA, Brock JR, Grabowski PP, Harder AM, Healy A, Shu S, Barry K, Boston L, Daum C, Guo J, Lipzen A, Walstead R, Grimwood J, Schmutz J, Lu C, Comai L, McKay JK, Pires JC, Edger PP, Lovell JT and Kliebenstein DJ
Ancient whole-genome duplications (WGDs) are believed to facilitate novelty and adaptation by providing the raw fuel for new genes. However, it is unclear how recent WGDs may contribute to evolvability within recent polyploids. Hybridization accompanying some WGDs may combine divergent gene content among diploid species. Some theory and evidence suggest that polyploids have a greater accumulation and tolerance of gene presence-absence and genomic structural variation, but it is unclear to what extent either is true. To test how recent polyploidy may influence pangenomic variation, we sequenced, assembled, and annotated twelve complete, chromosome-scale genomes of Camelina sativa, an allohexaploid biofuel crop with three distinct subgenomes. Using pangenomic comparative analyses, we characterized gene presence-absence and genomic structural variation both within and between the subgenomes. We found over 75% of ortholog gene clusters are core in Camelina sativa and <10% of sequence space was affected by genomic structural rearrangements. In contrast, 19% of gene clusters were unique to one subgenome, and the majority of these were Camelina-specific (no ortholog in Arabidopsis). We identified an inversion that may contribute to vernalization requirements in winter-type Camelina, and an enrichment of Camelina-specific genes with enzymatic processes related to seed oil quality and Camelina's unique glucosinolate profile. Genes related to these traits exhibited little presence-absence variation. Our results reveal minimal pangenomic variation in this species, and instead show how hybridization accompanied by WGD may benefit polyploids by merging diverged gene content of different species.
Acentric chromosome congression and alignment on the metaphase plate via kinetochore-independent forces in Drosophila
Vicars H, Mills A, Karg T and Sullivan W
Chromosome congression and alignment on the metaphase plate involve lateral and microtubule plus-end interactions with the kinetochore. Here we take advantage of our ability to efficiently generate a GFP-marked acentric X chromosome fragment in Drosophila neuroblasts to identify forces acting on chromosome arms that drive congression and alignment. We find acentrics efficiently congress and align on the metaphase plate, often more rapidly than kinetochore-bearing chromosomes. Unlike intact chromosomes, the paired sister acentrics oscillate as they move to and reside on the metaphase plate in a plane distinct and significantly further from the main mass of intact chromosomes. Consequently, at anaphase onset acentrics are oriented either parallel or perpendicular to the spindle. Parallel-oriented sisters separate by sliding while those oriented perpendicularly separate via unzipping. This oscillation, together with the fact that in the presence of spindles with disrupted interpolar microtubules acentrics are rapidly shunted away from the poles, indicates that distributed plus-end directed forces are primarily responsible for acentric migration. This conclusion is supported by the observation that reduction of EB1 preferentially disrupts acentric alignment. Taken together, these studies suggest that plus-end forces mediated by the outer interpolar microtubules contribute significantly to acentric congression and alignment. Surprisingly, we observe disrupted telomere pairing and alignment of sister acentrics, indicating that the kinetochore is required to ensure proper gene-to-gene alignment of sister chromatids. Finally, we demonstrate that, as in mammalian cells, congressed Drosophila chromosomes on occasion exhibit a toroid configuration.
Genomic prediction of heterosis, inbreeding control, and mate allocation in outbred diploid and tetraploid populations
Endelman JB
Breeders have long appreciated the need to balance selection for short-term genetic gain with maintaining genetic variance for long-term gain. For outbred populations, the method called Optimum Contribution Selection (OCS) chooses parental contributions to maximize the average breeding value at a prescribed inbreeding rate. With Optimum Mate Allocation (OMA), the contribution of each mating is optimized, which allows for specific combining ability due to dominance. To enable OCS and OMA in polyploid species, new theoretical results were derived to (1) predict mid-parent heterosis due to dominance and (2) control inbreeding in a population of arbitrary ploidy. A new Convex optimization framework for OMA, named COMA, was developed and released as public software. Under stochastic simulation of a genomic selection program, COMA maintained a target inbreeding rate of 0.5% using either pedigree or genomic IBD kinship. Significantly more genetic gain was realized with pedigree kinship, which is consistent with previous studies showing the selective advantage of an individual under OCS is dominated by its Mendelian sampling term. Despite the higher accuracy (+0.2-0.3) when predicting mate performance with OMA compared to OCS, there was little long-term gain advantage. The sparsity of the COMA mating design and flexibility to incorporate mating constraints offer practical incentives over OCS. In a potato breeding case study with 170 candidates, the optimal solution at 0.5% inbreeding involved 43 parents but only 43 of the 903 possible matings.
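The mating counts in the potato case study can be checked with a short calculation. This sketch assumes that the 903 "possible matings" are the unordered pairs among the 43 selected parents, with selfing excluded and reciprocal crosses not counted separately; the abstract does not state this, but the arithmetic is consistent with it.

```python
from math import comb

n_candidates = 170   # candidates in the case study (from the abstract)
n_parents = 43       # parents in the optimal solution (from the abstract)
used_matings = 43    # matings actually selected (from the abstract)

# Unordered pairs of distinct parents (assumption: no selfing, no reciprocals).
possible_matings = comb(n_parents, 2)
print(possible_matings)  # 903

# For comparison, unordered pairs among all candidates under the same assumption.
print(comb(n_candidates, 2))  # 14365

# Fraction of the possible matings among selected parents that were used.
print(used_matings / possible_matings)
```

Under this reading, the selected design uses 1 in 21 of the pairings available among the chosen parents, which illustrates the sparsity of the COMA mating design mentioned in the abstract.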
A tale of two serines: the effects of histone H2A mutations S122A and S129A on chromosome non-disjunction in Saccharomyces cerevisiae
Kozmin SG, Dominska M, Kokoska RJ and Petes TD
Near the C-terminus of histone H2A in the yeast S. cerevisiae, there are two serines (S122 and S129) that are targets of phosphorylation. The phosphorylation of Serine 129 in response to DNA damage is dependent on the Tel1 and Mec1 kinases. In S. pombe and S. cerevisiae, the phosphorylation of Serine 122 is dependent on the Bub1 kinase, and S. pombe strains with an alanine mutation of this serine have elevated levels of lagging chromosomes in mitosis. Strains that lack both Tel1 and Mec1 in S. cerevisiae have very elevated rates of non-disjunction. To clarify the functional importance of phosphorylation of serines 122 and 129 in H2A, we measured chromosome loss rates in single mutant strains and double mutant combinations. We also examined the interaction of mutations of BUB1, TEL1, and MEC1 in combination with mutations of serines 122 and 129 in H2A. We conclude that the phosphorylation state of S129 has no effect on chromosome disjunction, whereas mutations that inactivate Bub1 or an S122A mutation in histone H2A greatly elevate the rate of chromosome non-disjunction. Based on this analysis, we suggest that Bub1 exerts its primary effect on chromosome disjunction by phosphorylating S122 of histone H2A. However, Tel1, Mec1 and Bub1 are also functionally redundant in a second pathway affecting chromosome disjunction that is at least partially independent of phosphorylation of S122 of H2A.
Dominant myosin storage myopathy mutations disrupt striated muscles in Drosophila and the myosin tail-tail interactome of human cardiac thick filaments
Viswanathan MC, Dutta D, Kronert WA, Chitre K, Padron R, Craig R, Bernstein SI and Cammarato A
Myosin storage myopathy (MSM) is a rare skeletal muscle disorder caused by mutations in the slow muscle/β-cardiac myosin heavy chain (MHC) gene. MSM missense mutations frequently disrupt the tail's stabilizing heptad repeat motif. Disease hallmarks include subsarcolemmal hyaline-like β-MHC aggregates, muscle weakness and, occasionally, cardiomyopathy. We generated transgenic, heterozygous Drosophila to examine the dominant physiological and structural effects of the L1793P, R1845W, and E1883K MHC MSM mutations on diverse muscles. The MHC variants reduced lifespan and flight and jump abilities. Moreover, confocal and electron microscopy revealed that they provoked indirect flight muscle breaks and myofibrillar disarray/degeneration with filamentous inclusions. Incorporation of GFP-myosin enabled in situ determination of thick filament lengths, which were significantly reduced in all mutants. Semi-automated heartbeat analysis uncovered aberrant cardiac function, which worsened with age. Thus, our fly models phenocopied traits observed among MSM patients. We additionally mapped the mutations onto a recently-determined, 6Å resolution, cryo-EM structure of the human cardiac thick filament. The R1845W mutation replaces a basic arginine with a polar-neutral, bulkier tryptophan, while E1883K reverses charge at critical filament loci. Both would be expected to disrupt the core and the outer shell of the backbone structure. Replacing L1793 with a proline, a potent breaker of alpha-helices, could disturb the coiled-coil of the myosin rod and alter the tail-tail interactome. Hence, all mutations likely destabilize and weaken the filament backbone. This may trigger disease in humans, while potentially analogous perturbations are likely to yield the observed thick filament and muscle disruption in our fly models.
Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements
Engel SR, Aleksander S, Nash RS, Wong ED, Weng S, Miyasato SR, Sherlock G and Cherry JM
Budding yeast (Saccharomyces cerevisiae) is the most extensively characterized eukaryotic model organism and has long been used to gain insight into the fundamentals of genetics, cellular biology, and the functions of specific genes and proteins. The Saccharomyces Genome Database (SGD) is a scientific resource that provides information about the genome and biology of S. cerevisiae. For more than 30 years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation for budding yeast along with search and analysis tools to explore these data. Here we describe recent updates at SGD, including the two most recent reference genome annotation updates, expanded biochemical pathways representation, changes to SGD search and data files, and other enhancements to the SGD website and user interface. These activities are part of our continuing effort to promote insights gained from yeast to enable the discovery of functional relationships between sequence and gene products in fungi and higher eukaryotes.
Editor's Note: Ribosome Association and Stability of the Nascent Polypeptide-Associated Complex Is Dependent Upon Its Own Ubiquitination
Higher-order epistasis within Pol II trigger loop haplotypes
Duan B, Qiu C, Lockless SW, Sze SH and Kaplan CD
RNA polymerase II (Pol II) has a highly conserved domain, the trigger loop (TL), that controls transcription fidelity and speed. We previously probed pairwise genetic interactions between residues within and surrounding the TL to understand functional interactions between residues and how individual mutants might alter TL function. We identified widespread incompatibility between TLs of different species when placed in the Saccharomyces cerevisiae Pol II context, indicating species-specific interactions between otherwise highly conserved TLs and their surroundings. These interactions represent epistasis between TL residues and the rest of Pol II. We sought to understand why certain TL sequences are incompatible with S. cerevisiae Pol II and to dissect the nature of genetic interactions within multiply substituted TLs as a window on higher-order epistasis in this system. We identified both positive and negative higher-order residue interactions within example TL haplotypes. Intricate higher-order epistasis formed by TL residues was sometimes only apparent from analysis of intermediate genotypes, emphasizing the complexity of epistatic interactions. Furthermore, we distinguished TL substitutions with distinct classes of epistatic patterns, suggesting specific TL residues that potentially influence TL evolution. Our examples of complex residue interactions suggest possible pathways for epistasis to facilitate Pol II evolution.
MegaLMM improves genomic predictions in new environments using environmental covariates
Hu H, Rincent R and Runcie DE
Multi-environment trials (METs) are crucial for identifying varieties that perform well across a target population of environments (TPE). However, METs are typically too small to sufficiently represent all relevant environment-types, and face challenges from changing environment-types due to climate change. Statistical methods that enable prediction of variety performance for new environments beyond the METs are needed. We recently developed MegaLMM, a statistical model that can leverage hundreds of trials to significantly improve genetic value prediction accuracy within METs. Here, we extend MegaLMM to enable genomic prediction in new environments by learning regressions of latent factor loadings on Environmental Covariates (ECs) across trials. We evaluated the extended MegaLMM using the maize Genome-To-Fields dataset, consisting of 4402 varieties cultivated in 195 trials with 87.1% of phenotypic values missing, and demonstrated its high accuracy in genomic prediction under various breeding scenarios. Furthermore, we showcased MegaLMM's superiority over univariate GBLUP in predicting trait performance of experimental genotypes in new environments. Finally, we explored the use of higher-dimensional quantitative ECs and discussed when and how detailed environmental data can be leveraged for genomic prediction from METs. We propose that MegaLMM can be applied to plant breeding of diverse crops and different fields of genetics where large-scale linear mixed models are utilized.
Drosophila ring chromosomes interact with sisters and homologs to produce anaphase bridges in mitosis
Lin HC, Golic MM, Hill HJ, Lemons KF, Vuong TT, Smith M, Golic F and Golic KG
Ring chromosomes are known in many eukaryotic organisms, including humans. They are typically associated with a variety of maladies, including abnormal development and lethality. Underlying these phenotypes are anaphase chromatin bridges that can lead to chromosome loss, nondisjunction and breakage. By cytological examination of ring chromosomes in Drosophila melanogaster, we identified five causes of anaphase bridges produced by ring chromosomes. Catenation of sister chromatids appears to be the most common cause, and these bridges frequently resolve during anaphase, presumably by the action of topoisomerase II. Sister chromatid exchange and chromosome breakage followed by sister chromatid union also produce anaphase bridges. Mitotic recombination with the homolog was rare, but was another route to generation of anaphase bridges. Most surprising was the discovery of homolog capture, where the ring chromosome was connected to its linear homolog in anaphase. We hypothesize that this is a remnant of mitotic pairing and that the linear chromosome is connected to the ring by multiple wraps produced through the action of topoisomerase II during establishment of homolog pairing. In support, we showed that in a ring/ring homozygote the two rings are frequently catenated in mitotic metaphase, a configuration that requires breaking and rejoining of at least one chromosome.
Parental-effect gene-drive elements under partial selfing, or why do Caenorhabditis genomes have hyperdivergent regions?
Rockman MV
Self-fertile Caenorhabditis nematodes carry a surprising number of Medea elements, alleles that act in heterozygous mothers and cause death or developmental delay in offspring that don't inherit them. At some loci, both alleles in a cross operate as independent Medeas, affecting all the homozygous progeny of a selfing heterozygote. The genomic coincidence of Medea elements and ancient, deeply coalescing haplotypes, which pepper the otherwise homogeneous genomes of these animals, raises questions about how these apparent gene-drive elements persist for long periods of time. Here I investigate how mating system affects the evolution of Medeas, and their paternal-effect counterparts, peels. Despite an intuition that antagonistic alleles should induce balancing selection by killing homozygotes, models show that, under partial selfing, antagonistic elements experience positive frequency dependence: the common allele drives the rare one extinct, even if the rare one is more penetrant. Analytical results for the threshold frequency required for one allele to invade a population show that a very weakly penetrant allele, one whose effects would escape laboratory detection, could nevertheless prevent a much more penetrant allele from invading under high rates of selfing. Ubiquitous weak antagonistic Medeas and peels could then act as localized barriers to gene flow between populations, generating genomic islands of deep coalescence. Analysis of gene expression data, however, suggests that this cannot be the whole story. A complementary explanation is that ordinary ecological balancing selection generates ancient haplotypes on which Medeas can evolve, while high homozygosity in these selfers minimizes the role of gene drive in their evolution.