Biothermodynamics of Hemoglobin and Red Blood Cells: Analysis of Structure and Evolution of Hemoglobin and Red Blood Cells, Based on Molecular and Empirical Formulas, Biosynthesis Reactions, and Thermodynamic Properties of Formation and Biosynthesis
Hemoglobin and red blood cells (erythrocytes) have been studied extensively from the perspective of life and biomedical sciences. However, no analysis of hemoglobin and red blood cells from the perspective of chemical thermodynamics has been reported in the literature. Such an analysis would provide insight into their structure and turnover from the standpoint of biothermodynamics and bioenergetics. In this paper, a biothermodynamic analysis of hemoglobin and red blood cells was performed. Molecular formulas, empirical formulas, biosynthesis reactions, and thermodynamic properties of formation and biosynthesis were determined for the alpha chain, beta chain, heme B, hemoglobin, and red blood cells. Empirical formulas and thermodynamic properties of hemoglobin were compared to those of other biological macromolecules, including proteins and nucleic acids. Moreover, the energetic requirements of biosynthesis of hemoglobin and red blood cells were analyzed. On this basis, the specific structure of red blood cells (i.e., no nuclei or organelles) is discussed, along with its role as an evolutionary adaptation for more energetically efficient biosynthesis needed for the turnover of red blood cells.
In Silico Investigation of the Interactions Between Cotton Leaf Curl Multan Virus Proteins and the Transcriptional Gene Silencing Factors of Gossypium hirsutum L
The highly dynamic nature of the Cotton leaf curl virus (CLCuV) complex, which causes Cotton leaf curl disease, a significant global threat to cotton, presents a formidable challenge in unraveling the precise molecular mechanisms governing viral-host interactions. To address this challenge, the present study investigated the molecular interactions of 6 viral proteins (Rep, TrAP, C4, C5, V2, and βC1) with 18 cotton Transcriptional Gene Silencing (TGS) proteins. Protein-protein docking of the different viral-host protein pairs using Clustered Protein Docking (ClusPro) and Global RAnge Molecular Matching (GRAMM) (216 docking runs) revealed variable binding energies. The interacting pairs with the highest binding affinities were further scrutinized using the bioCOmplexes COntact MAPS (COCOMAPS) server, which revealed robust binding of three viral proteins (TrAP, C4, and C5) with 14 TGS proteins and identified several novel interactions not previously reported, such as TrAP targeting DCL3, HDA6, and SUVH6; C4 targeting RAV2, CMT2, and DMT1; and C5 targeting CLSY1, RDR1, RDR2, AGO4, SAMS, and SAHH. Visualizing these interactions in PyMOL provided detailed insight into the interacting regions. Further assessment of the impact of 18 variants of the C4 protein on interaction with CMT2 revealed no correlation between sequence variation and docking energies. However, conserved residues in the C4 binding regions emerged as potential targets for disrupting viral integrity. Hence, this study provides valuable insights into the viral-host interplay, advancing our understanding of Cotton leaf curl Multan virus pathogenicity and opening novel avenues for devising antiviral strategies that target the host-viral interacting regions, once these are experimentally validated.
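The reported total of 216 docking runs is consistent with docking every viral-host pair once with each of the two tools (6 × 18 × 2 = 216). The following is a minimal Python sketch of that enumeration. The pairing scheme is an assumption inferred from the count, not a procedure stated in the abstract, and the host protein labels TGS_01 to TGS_18 are hypothetical placeholders because the abstract does not name all 18 TGS proteins.

```python
from itertools import product

# Viral proteins and docking tools named in the abstract.
viral_proteins = ["Rep", "TrAP", "C4", "C5", "V2", "betaC1"]
docking_tools = ["ClusPro", "GRAMM"]

# Hypothetical placeholder labels for the 18 cotton TGS proteins;
# the abstract does not list all of them by name.
tgs_proteins = [f"TGS_{i:02d}" for i in range(1, 19)]

# Assumption: each viral-host pair is docked once with each tool.
docking_runs = list(product(viral_proteins, tgs_proteins, docking_tools))

print(len(docking_runs))
```

Each element of `docking_runs` is a (viral protein, host protein, tool) triple, so the list could serve as a job table for batch submission.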
Models of Fluctuating Selection Between Generations: A Solution for the Theoretical Inconsistency
The theory of fluctuating selection between generations was an active topic in population genetics and molecular evolution in the 1970s. Most studies suggested that, as a result of fluctuating selection between generations, the frequency of an (on average) neutral mutation may fluctuate around 0.5 during long-term evolution before it is ultimately fixed or lost. However, this pattern can only be derived from a specific type of Wright-Fisher additive model, an inconsistency termed the Nei-Yokoyama puzzle. In this commentary, I revisit this issue and identify a theoretical assumption that has never been stated explicitly: the notion of a reference genotype. Consider one locus with two alleles: A is the wildtype allele and A' is the mutation. The fluctuating selection model actually requires the constraint that one of the three genotypes (AA, AA', or A'A') maintains a constant fitness that does not fluctuate between generations. It appears that balancing selection at a frequency of 0.5 emerges only when the heterozygote (AA') is the reference genotype. Because it is difficult to determine which genotype is the reference genotype in a real population, a desirable population genetics model should take all three possibilities into account. To this end, I propose a mixture model in which each genotype has a certain chance of being the reference genotype. My analysis shows that the emergence of balancing selection depends on the relative proportions of the three different reference genotypes.
Structural Insights into Cold-Active Lipase from Glaciozyma antarctica PI12: Alphafold2 Prediction and Molecular Dynamics Simulation
Cold-active enzymes have recently gained popularity because of their high activity at lower temperatures than their mesophilic and thermophilic counterparts, enabling them to withstand harsh reaction conditions and enhance industrial processes. Cold-active lipases are enzymes produced by psychrophiles that live and thrive in extremely cold conditions. Applications of cold-active lipases are now growing in the detergent, fine chemical synthesis, food processing, bioremediation, and pharmaceutical industries. The cold adaptation mechanisms exhibited by these enzymes are yet to be fully understood. Using phylogenetic analysis and the deep learning-based protein structure prediction tool AlphaFold2, we identified an evolutionary process in which a conserved cold-active-like motif is present in a distinct subclade of the tree, and we further predicted and simulated the three-dimensional structure of a putative cold-active lipase carrying this motif, Glalip03, from Glaciozyma antarctica PI12. Molecular dynamics simulations at low temperatures revealed global stability over a wide range of temperatures, flexibility, and the ability to cope with changes in water and solvent entropy. The knowledge uncovered here will therefore be crucial for future research into how these low-temperature-adapted enzymes maintain their overall flexibility and function at lower temperatures.
Cryptic Diversity in Scorpaenodes xyris (Jordan & Gilbert 1882) (Scorpaeniformes: Scorpaenidae) Throughout the Tropical Eastern Pacific
The tropical eastern Pacific (TEP) is a biogeographic region with a substantial set of isolated oceanic islands and mainland shoreline habitat barriers, as well as complex oceanographic dynamics due to major ocean currents, upwelling areas, eddies, and thermal instabilities. These characteristics have shaped spatial patterns of biodiversity between and within species of reef and shore fishes of the region, which has a very high rate of endemism. Scorpaenodes xyris, a small ecologically cryptic reef-dwelling scorpionfish, is widely distributed throughout the TEP, including all the mainland reef areas and all the oceanic islands. This wide distribution and its ecological characteristics make this species a good model to study the evolutionary history of this type of reef fish across the breadth of a tropical biogeographical region. Our evaluation of geographic patterns of genetic (mitochondrial and nuclear) variation shows that S. xyris comprises two highly differentiated clades (A and B), one of which contains four independent evolutionary subunits. Clade A includes four sub-clades: 1. The Cortez mainland Province; 2. The Revillagigedo Islands; 3. Clipperton Atoll; and 4. The Galapagos Islands. Clade B, in contrast, comprises a single unit that includes the Mexican and Panamic mainland provinces, plus Cocos Island. This geographical arrangement largely corresponds to previously indicated regionalization of the TEP. Oceanic distances isolating the islands have produced much of that evolutionary pattern, although oceanographic processes likely have also contributed.
Stochastic Epigenetic Modification and Evolution of Sex Determination in Vertebrates
In this report, we propose a novel mathematical model of the origin and evolution of sex determination in vertebrates that is based on the stochastic epigenetic modification (SEM) mechanism. We have previously shown that SEM, with rates consistent with experimental observation, can both increase the rate of gene fixation and decrease pseudogenization, thus dramatically improving the efficacy of evolution. Here, we present a conjectural model of the origin and evolution of sex determination wherein the SEM mechanism alone is sufficient to parsimoniously trigger and guide the evolution of heteromorphic sex chromosomes from the initial homomorphic chromosome configuration, without presupposing any allele frequency differences. Under this theoretical model, the SEM mechanism (i) predated the origin and evolution of vertebrate sex determination, (ii) has been conveniently and parsimoniously co-opted by the vertebrate sex determination systems during the evolutionary transition to the extant vertebrate sex determination, likely acting "on top" of these systems, and (iii) continues to exist, alongside all known vertebrate sex determination systems, as a universal pan-vertebrate sex determination modulation mechanism.
Putative MutS2 Homologs in Algae: More Goods in Shopping Bag?
MutS2 proteins are presumably involved in either control of recombination or translation quality control in bacteria. MutS2 homologs have been found in plants and some algae; however, their actual diversity in eukaryotes remains unknown. We found putative MutS2 homologs in various species of photosynthetic eukaryotes and performed a detailed analysis of the revealed amino acid sequences. Three groups of homologs were distinguished depending on their domain composition: MutS2 homologs with a full set of specific domains, MutS2-like sequences without the endonuclease Smr domain, and MutS2-like homologs lacking both Smr and the clamp in domain IV, the extreme form of which are proteins with only a complete ATPase domain. We clarified the amino acid composition and the set of specific motifs in the conserved domains of MutS2 and MutS2-like sequences. Models of the predicted tertiary structure were obtained for each group of homologs. The phylogenetic analysis demonstrated that all eukaryotic sequences split into two large groups. The first group included homologs belonging to species of Archaeplastida and a subset of haptophyte homologs, while the second included sequences of organisms from the CASH groups (cryptophytes, alveolates, stramenopiles, haptophytes) and chlorarachniophytes. The cyanobacterial MutS2 clustered together with the first group, and proteins belonging to Deltaproteobacteria (orders Myxococcales and Bradymonadales) showed phylogenetic affinity to the CASH-including group with strong support. The observed tree pattern did not support a clear differentiation of eukaryotes into lineages with red and green algae-derived plastids. The results are discussed in the context of current conceptions of serial endosymbioses and genetic mosaicism in algae with complex plastids.
Correction: Analysis of Cancer-Resisting Evolutionary Adaptations in Wild Animals and Applications for Human Oncology
Stress-Induced Constraint on Expression Noise of Essential Genes in E. coli
Gene expression is an inherently noisy process that is constrained by natural selection. Yet the condition dependence of constraint on expression noise remains unclear. Here, we address this problem by studying constraint on expression noise of E. coli genes in eight diverse growth conditions. In particular, we use variation in expression noise as an analog for constraint, examining its relationships to expression level and to the number of regulatory inputs from transcription factors across and within conditions. We show that variation in expression noise is negatively associated with expression level, implicating constraint to minimize expression noise of highly expressed genes. However, this relationship is condition dependent, with the strongest constraint observed when E. coli are grown in the presence of glycerol or ciprofloxacin, which result in carbon or antibiotic stress, respectively. In contrast, we do not observe evidence of constraint on expression noise of highly regulated genes, suggesting that highly expressed and highly regulated genes represent distinct classes of genes. Indeed, we find that essential genes are often highly expressed but not highly regulated, with elevated expression noise in glycerol and ciprofloxacin conditions. Thus, our findings support the hypothesis that selective constraint on expression noise is condition dependent in E. coli, illustrating how it may play a critical role in ensuring expression stability of essential genes in unstable environments.
Perspective: Protocells and the Path to Minimal Life
The path to minimal life involves a series of stages that can be understood in terms of incremental, stepwise additions of complexity ranging from simple solutions of organic compounds to systems of encapsulated polymers capable of capturing nutrients and energy to grow and reproduce. This brief review will describe the initial stages that lead to populations of protocells capable of undergoing selection and evolution. The stages incorporate knowledge of chemical and physical properties of organic compounds, self-assembly of membranous compartments, non-enzymatic polymerization of amino acids and nucleotides followed by encapsulation of polymers to produce protocell populations. The results are based on laboratory simulations related to cyclic hydrothermal conditions on the prebiotic Earth. The final portion of the review looks ahead to what remains to be discovered about this process in order to understand the evolutionary path to minimal life.
Stem Life: A Framework for Understanding the Prebiotic-Biotic Transition
Abiogenesis is frequently envisioned as a linear, ladder-like progression of increasingly complex chemical systems, eventually leading to the ancestors of extant cellular life. This "pre-cladistics" view is in stark contrast to the well-accepted principles of organismal evolutionary biology, as informed by paleontology and phylogenetics. Applying this perspective to origins, I explore the paradigm of "Stem Life," which embeds abiogenesis within a broader continuity of diversification and extinction of both hereditary lineages and chemical systems. In this new paradigm, extant life's ancestral lineage emerged alongside and was dependent upon many other complex prebiotic chemical systems, as part of a diverse and fecund prebiosphere. Drawing from several natural history analogies, I show how this shift in perspective enriches our understanding of Origins and directly informs debates on defining Life, the emergence of the Last Universal Common Ancestor (LUCA), and the implications of prebiotic chemical experiments.
On the Nature of the Last Common Ancestor: A Story from its Translation Machinery
The Last Common Ancestor (LCA) is understood as a hypothetical population of organisms from which all extant living creatures are thought to have descended. Its biology and environment have been and continue to be the subject of discussions within the scientific community. Since the first bacterial genomes were obtained, multiple attempts to reconstruct the genetic content of the LCA have been made. In this review, we compare 10 of the most extensive reconstructions of the gene content possessed by the LCA as they relate to aspects of the translation machinery. Although each reconstruction has its own methodological biases and many disagree on the metabolic nature of the LCA, all indicate, to some extent, that several components of the translation machinery are among the most conserved genetic elements. The datasets from each reconstruction clearly show that the LCA already had a largely complete translational system with a genetic code already in place and therefore was not a progenote. Among these features, several ribosomal proteins, translation factors like IF2, EF-G, and EF-Tu, and both class I and class II aminoacyl tRNA synthetases were found in essentially all reconstructions. Due to the limitations of the various methodologies, some features, such as the occurrence of post-transcriptionally modified rRNA bases, are not fully addressed. However, conserved as the machinery is, non-universal ribosomal features found in various reconstructions indicate that the LCA's translation machinery was still evolving, acquiring its domain-specific features in the process. Although progenotes from the pre-LCA likely no longer exist, recent results obtained by unraveling the early history of the ribosome and other genetic processes can provide insight into the nature of the pre-LCA world.
Introduction to the Special Issue on Early Evolution and the Last Common Ancestor
The early evolution of life spans an extensive period preceding the emergence of the first eukaryotic cell. This epoch, which transpired from 4.5 to 2.5 billion years ago, marked the advent of many fundamental cellular attributes and witnessed the existence of the Last Common Ancestor (LCA) of all life forms. Uncovering and reconstructing this elusive LCA's characteristics and genetic makeup represents a formidable challenge and a pivotal pursuit in early evolution. While most scientific accounts concur that the LCA resembles contemporary prokaryotes, its precise definition, genome composition, metabolic capabilities, and ecological niche remain subjects of contentious debate.
High Nucleotide Skew Palindromic DNA Sequences Function as Potential Replication Origins due to their Unzipping Propensity
Locations of DNA replication initiation in prokaryotes, called "origins of replication", are well-characterized. However, a mechanistic understanding of the sequence dependence of the local unzipping of double-stranded DNA, the first step towards replication initiation, is lacking. Here, utilizing a Markov chain model that was created to address the directional nature of DNA unzipping and replication, we model the sequence dependence of local melting of double-stranded linear DNA segments. We show that generalized palindromic sequences with high nucleotide skews have a low kinetic barrier for local melting near melting temperatures. This allows for such sequences to function as potential replication origins. We support our claim with evidence for high-skew palindromic sequences within the replication origins of mitochondrial DNA, bacteria, archaea and plasmids.
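As an illustration of the "nucleotide skew" quantity mentioned above, the following minimal Python sketch computes the conventional single-strand skews, GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T). It assumes these standard definitions; it does not implement the authors' Markov chain model or their definition of generalized palindromes, and the example sequence is made up.

```python
def nucleotide_skew(seq, a, b):
    """Return (count(a) - count(b)) / (count(a) + count(b)) on one strand.

    Returns 0.0 when neither nucleotide occurs, to avoid dividing by zero.
    """
    seq = seq.upper()
    n_a, n_b = seq.count(a), seq.count(b)
    total = n_a + n_b
    if total == 0:
        return 0.0
    return (n_a - n_b) / total


def gc_skew(seq):
    """Conventional GC skew of a single strand."""
    return nucleotide_skew(seq, "G", "C")


def at_skew(seq):
    """Conventional AT skew of a single strand."""
    return nucleotide_skew(seq, "A", "T")


# Made-up example strand, strongly enriched in G over C.
example = "GGGAGGTGGA"
print(gc_skew(example), at_skew(example))
```

A skew of +1 or −1 means only one member of the complementary pair occurs on the strand, while 0 means both occur equally often.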
Analysis of Cancer-Resisting Evolutionary Adaptations in Wild Animals and Applications for Human Oncology
This literature review presents a new direction for developing better cancer treatments and preventive measures. The larger the body of an organism, the more numerous its cells, which theoretically leads to a higher risk of cancer. However, observational studies suggest a lack of correlation between body size and cancer risk, which is known as Peto's paradox. The corollary of Peto's paradox is that large organisms must be cancer-resistant. Further investigation of the anti-cancer mechanisms in each species could be rewarding, and the main focus of this literature review is how the anti-cancer mechanisms found in wild animals can inform the development of more effective cancer treatments in humans. Due to a lack of research into and understanding of the exact molecular mechanisms in most of the species studied, only the few that have been extensively researched (elephants and rodents) have made substantive contributions to human oncology. A new research direction is to investigate the positively selected genes that are related to cancer resistance and determine whether homologous genes are present in humans. Despite the great obstacle of applying anti-cancer mechanisms from phylogenetically distant species to the human body, gaining insights by investigating cancer-resisting evolutionary adaptations in wild animals has great potential in human oncology research.
Unveiling the Genomic Symphony: Identification Cultivar-Specific Genes and Enhanced Insights on Sweet Sorghum Genomes Through Comprehensive superTranscriptomic Analysis
Sorghum (Sorghum bicolor (L.) Moench) is a multipurpose crop grown for food, fodder, and bioenergy production. Its cultivated varieties, along with their wild counterparts, contribute to the core genetic pool. Despite the availability of several re-sequenced sorghum genomes, a variable portion of sorghum genomes is not reported during reference genome assembly and annotation. The present analysis used 223 publicly available RNA-seq datasets from seven sweet sorghum cultivars to construct a superTranscriptome. This approach yielded 45,864 Representative Transcript Assemblies (RTAs) that showed Presence/Absence Variation (PAV) across 15 published sorghum genomes. We found that 301 superTranscripts were exclusive to sweet sorghum, including 58 de novo genes encoding core and linker histones, zinc finger domains, glucosyl transferases, cellulose synthase, etc. The superTranscriptome added 2,802 new protein-coding genes to the Sweet Sorghum Reference Genome (SSRG), of which 559 code for different transcription factors (TFs). Our analysis revealed that MULE-like transposases were abundant in the sweet sorghum genome and could play a hidden role in the evolution of sweet sorghum. We observed large deletions in the D locus and terminal deletions in four other NAC-encoding loci in the SSRG compared to its wild progenitor (353), suggesting that non-functional NAC genes contributed to trait development in sweet sorghum. Moreover, superTranscript-based methods for Differential Exon Usage (DEU) and Differential Gene Expression (DGE) analyses were more accurate than those based on the SSRG. This study demonstrates that the superTranscriptome can enhance our understanding of fundamental sorghum mechanisms, improve genome annotations, and potentially even replace the reference genome.
Recurrent Independent Pseudogenization Events of the Sperm Fertilization Gene ZP3r in Apes and Monkeys
Many reproductive proteins show signatures of rapid evolution through sequence divergence and duplication. These features of reproductive genes may complicate the detection of orthologs across taxa, making it difficult to connect studies in model systems to human biology. In mice, ZP3r/sp56 is a binding partner to the egg coat protein ZP3 and may mediate induction of the acrosome reaction, a crucial step in fertilization. In rodents, ZP3r, as a member of the Regulators of Complement Activation cluster, is surrounded by paralogs, some of which have been shown to be evolving under positive selection. Although primate egg coats also contain ZP3, sequence divergence paired with paralogous relationships with neighboring genes has complicated the accurate identification of the human ZP3r ortholog. Here, we phylogenetically and syntenically resolve that the human ortholog of ZP3r is the pseudogene C4BPAP1. We investigate the evolution of this gene within primates. We observe independent pseudogenization events of ZP3r in all apes with the exception of orangutans, and independent pseudogenization events in many monkey species. ZP3r contains positively selected sites both in the primates that retain it and in rodents. We hypothesize that redundant mechanisms mediate ZP3 recognition in mammals and that ZP3r's relative importance to ZP recognition varies across species.
Correction: G:U-Independent RNA Minihelix Aminoacylation by Nanoarchaeum equitans Alanyl-tRNA Synthetase: An Insight into the Evolution of Aminoacyl-tRNA Synthetases
Untangling Zebrafish Genetic Annotation: Addressing Complexities and Nomenclature Issues in Orthologous Evaluation of TCOF1 and NOLC1
Treacher Collins syndrome (TCS) is a genetic disorder affecting facial development, primarily caused by mutations in the TCOF1 gene. TCOF1, along with NOLC1, plays an important role in ribosomal RNA transcription and processing. Previously, a zebrafish model of TCS successfully recapitulated the main characteristics of the syndrome by knocking down the expression of a gene on chromosome 13 (coding for Uniprot ID B8JIY2), which was identified as the TCOF1 orthologue. However, database updates renamed this gene as nolc1, and the zebrafish database (ZFIN) identified a different gene on chromosome 14 as the TCOF1 orthologue (coding for Uniprot ID E7F9D9). NOLC1 and TCOF1 are large proteins with unstructured regions and repetitive sequences that complicate alignments and comparisons. The additional whole genome duplication of teleosts adds further difficulty. In this study, we present evidence supporting that NOLC1 and TCOF1 are paralogs, and that the zebrafish gene on chromosome 14 is a low-complexity LisH domain-containing factor that displays homology to NOLC1 but lacks the sequence features essential for TCOF1 nucleolar functions. Our analysis also supports the idea that zebrafish, as has been suggested for other non-tetrapod vertebrates, lack the TCOF1 gene that is associated with the tripartite nucleolus. Using BLAST searches in a group of teleost genomes, we identified fish-specific sequences similar to the zebrafish E7F9D9 protein. We propose naming them "LisH-containing Low Complexity Proteins" (LLCP). Interestingly, the gene on chromosome 13 (nolc1) displays the sequence features, developmental expression patterns, and phenotypic impact of depletion that are characteristic of TCOF1 functions. These findings suggest that in teleost fish, the nucleolar functions described for both NOLC1 and TCOF1, mediated by their repeated motifs, are carried out by a single gene, nolc1. Our study, which is mainly based on computational tools available as free web-based algorithms, could help to resolve similar conflicts regarding gene orthology in zebrafish.
Structural and Evolutionary Analysis of Proteins Endowed with a Nucleotidyltransferase, or Non-canonical Palm, Catalytic Domain
Many polymerases and other proteins are endowed with a catalytic domain belonging to the nucleotidyltransferase fold, also termed the non-canonical palm domain, in which three conserved acidic residues coordinate two divalent metal ions. Tertiary structure-based evolutionary analyses provide valuable information when the phylogenetic signal contained in the primary structure is blurry or has been lost, as is the case with these proteins. Pairwise structural comparisons of proteins with a nucleotidyltransferase fold were performed in the PDBeFold web server: the RMSD, the number of superimposed residues, and the Qscore were obtained. The structural alignment score (RMSD × 100/number of superimposed residues) and the 1 − Qscore were calculated, and distance matrices were constructed, from which a dendrogram and a phylogenetic network were drawn for each score. The dendrograms and the phylogenetic networks display well-defined clades, reflecting high levels of structural conservation within each clade that are not mirrored by the primary sequence. The conserved structural core shared by all these proteins consists of the catalytic nucleotidyltransferase fold, which is surrounded by different functional domains. Hence, many of the clades include proteins that bind different substrates or partake in unrelated functions. Enzymes endowed with a nucleotidyltransferase fold are present in all domains of life and participate in essential cellular and viral functions, which suggests that this domain is very ancient. Despite the loss of evolutionary traces in their primary structure, tertiary structure-based analyses allow us to delve into the evolution and functional diversification of the nucleotidyltransferase (NT) fold.
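The two distance measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the protein labels (protA, protB, protC), and all numeric values are hypothetical, and the step of drawing a dendrogram or network from the matrices is left out.

```python
def structural_alignment_score(rmsd, n_superimposed):
    """Structural alignment score: RMSD x 100 / number of superimposed residues."""
    if n_superimposed <= 0:
        raise ValueError("number of superimposed residues must be positive")
    return rmsd * 100.0 / n_superimposed


def q_distance(q_score):
    """Distance derived from the Qscore: 1 - Qscore."""
    return 1.0 - q_score


def build_distance_matrix(names, comparisons, metric):
    """Fill a symmetric matrix from pairwise (rmsd, n_superimposed, q_score) tuples."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), (rmsd, n_sup, q) in comparisons.items():
        d = metric(rmsd, n_sup, q)
        matrix[index[a]][index[b]] = d
        matrix[index[b]][index[a]] = d
    return matrix


# Hypothetical pairwise results: (RMSD in angstroms, superimposed residues, Qscore).
names = ["protA", "protB", "protC"]
comparisons = {
    ("protA", "protB"): (2.0, 100, 0.40),
    ("protA", "protC"): (3.0, 60, 0.20),
    ("protB", "protC"): (2.5, 50, 0.25),
}

sas_matrix = build_distance_matrix(
    names, comparisons, lambda rmsd, n_sup, q: structural_alignment_score(rmsd, n_sup)
)
q_matrix = build_distance_matrix(
    names, comparisons, lambda rmsd, n_sup, q: q_distance(q)
)

for row in sas_matrix:
    print(row)
```

Either matrix could then be passed to a clustering or network-building tool of choice. Both scores grow as structural similarity decreases, which is what allows them to be used as distances.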
Correction: Perspectives on the Origin of Biological Homochirality on Earth