Evolutionary Bioinformatics

Genomic Characterization of IS6110 Insertions in
Refaya AK, Vetrivel U and Palaniyandi K
, a subspecies of the Mycobacterium tuberculosis complex (MTBC), has emerged as a significant concern in the context of One Health, with implications for zoonosis or zooanthroponosis or both. MTBC strains are characterized by the unique insertion element IS6110, which is widely used as a diagnostic marker. IS6110 transposition drives genetic modifications in MTBC, imparting genome plasticity and profound biological consequences. While IS6110 insertions are customarily found in the MTBC genomes, the evolutionary trajectory of strains seems to correlate with the number of IS6110 copies, indicating enhanced adaptability with increasing copy numbers. Here, we present a comprehensive analysis of IS6110 insertions in the genome, utilizing ISMapper, and elucidate their genetic consequences in promoting successful host adaptation. Our study encompasses a panel of 67 paired-end reads, comprising 11 isolates from our laboratory and 56 sequences downloaded from public databases. Among these sequences, 91% exhibited high-copy, 4.5% low-copy, and 4.5% lacked IS6110 insertions. We identified 255 insertion loci, including 141 intragenic and 114 intergenic insertions. Most of these loci were either unique or shared among a limited number of isolates, potentially influencing strain behavior. Furthermore, we conducted gene ontology and pathway analysis, using eggNOG-mapper 5.0, on the protein sequences disrupted by IS6110 insertions, revealing 63 genes involved in diverse functions of Gene Ontology and 45 genes participating in various KEGG pathways. Our findings offer novel insights into IS6110 insertions, their preferential insertion regions, and their impact on metabolic processes and pathways, providing valuable knowledge on the genetic changes underpinning IS6110 transposition in .
Large-scale Pan Genomic Analysis of Mycobacterium tuberculosis Reveals Key Insights Into Molecular Evolutionary Rate of Specific Processes and Functions
Bundhoo E, Ghoorah AW and Jaufeerally-Fakim Y
Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis (TB), an infectious disease that is a major killer worldwide. Due to selection pressure caused by the use of antibacterial drugs, Mtb is characterised by mutational events that have given rise to multidrug-resistant (MDR) and extensively drug-resistant (XDR) phenotypes. The rate at which mutations occur is an important factor in the study of molecular evolution, and it helps to understand gene evolution. Within the same species, different protein-coding genes evolve at different rates. To estimate the rates of molecular evolution of protein-coding genes, a commonly used parameter is the ratio dN/dS, where dN is the rate of non-synonymous substitutions and dS is the rate of synonymous substitutions. Here, we estimated the rates of molecular evolution of select biological processes and molecular functions across 264 strains of Mtb. We also investigated the molecular evolutionary rates of core genes of Mtb by computing the dN/dS values, and estimated the pan genome of the 264 strains of Mtb. Our results show that the cellular amino acid metabolic process and the kinase activity function evolve at a significantly higher rate, while the carbohydrate metabolic process evolves at a significantly lower rate in Mtb. These high rates of evolution correlate well with Mtb physiology and pathogenicity. We further propose that the core genome of Mtb likely experiences varying rates of molecular evolution, which may drive an interplay between core genome and accessory genome during evolution.
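As a minimal sketch of how the substitution-rate ratio described in the preceding abstract is usually read (commonly written dN/dS, or omega), the snippet below divides a non-synonymous rate by a synonymous rate and labels the result. The function names, the tolerance band, and the example rates are illustrative assumptions, not values or methods taken from the study; in practice the rates are estimated by codon-model software from aligned coding sequences.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Return omega = dN / dS for one gene (hypothetical helper)."""
    if ds <= 0:
        raise ValueError("dS must be positive; the ratio is undefined otherwise")
    return dn / ds


def selection_regime(omega: float, tol: float = 0.05) -> str:
    """Conventional reading of omega; `tol` is an arbitrary illustrative band."""
    if omega > 1 + tol:
        return "positive (diversifying) selection"
    if omega < 1 - tol:
        return "purifying (negative) selection"
    return "approximately neutral"


# Made-up rates for two hypothetical genes, for illustration only.
example_rates = {"gene_a": (0.02, 0.04), "gene_b": (0.06, 0.03)}
for gene, (dn, ds) in example_rates.items():
    omega = dn_ds_ratio(dn, ds)
    print(gene, round(omega, 2), selection_regime(omega))
```

Comparing such ratios across functional categories, rather than gene by gene, is the kind of summary the abstract reports.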
An Integrated Framework for Analysis and Prediction of Impact of Single Nucleotide Polymorphism Associated with Human Diseases
Muhammad SS, Shoaib M and Pervez MT
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Analyzing genetic variants can help us better understand the genetic basis of diseases and develop predictive models that identify individuals at increased risk for certain diseases. Several SNP analysis tools have already been developed, but to run them the user needs to collect data from various databases. In addition, researchers often have to use multiple variant analysis tools to cross-validate their results and increase confidence in their findings. Extracting data from multiple databases and running multiple tools at a time increases both the complexity of the analysis and the time it requires. Some web-based tools integrate multiple genetic variant databases and provide variant annotations for a few tools, but these approaches have limitations in retrieving annotation information and in filtering common pathogenic variants. The proposed web-based tool, IPSNP: An Integrated Platform for Predicting Impact of SNPs, is written in Django, a Python-based framework. It uses the RESTful API of MyVariant.info to extract, for 29 tools, annotation information on variants associated with a given gene, rsID, or HGVS-format variants specified in a VCF file. The results comprise (1) a CSV file of predictions derived from the consensus decision, (2) a file with annotations for the variants associated with the given gene, (3) a file showing variants commonly declared pathogenic by the selected tools, and (4) a CSV file containing chromosome coordinates based on the GRCh37 and GRCh38 genome assemblies, rsIDs and proteomic data, so that users may run tools of their choice and avoid manual parameter collection for each tool. IPSNP is a valuable resource for researchers and clinicians, and it can help save time and effort in discovering novel disease-associated variants and developing personalized treatments.
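The abstract above does not say how the consensus decision is formed, so the following is only a sketch under one plausible assumption: a simple majority vote over per-tool labels, with ties reported as uncertain. The tool names, variant identifiers, and label strings are hypothetical placeholders and are not IPSNP's actual data model.

```python
from collections import Counter


def consensus_call(predictions: dict[str, str]) -> str:
    """Majority vote over per-tool labels; ties are reported as 'uncertain'."""
    if not predictions:
        raise ValueError("at least one tool prediction is required")
    ranked = Counter(predictions.values()).most_common()
    top_label, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return "uncertain"
    return top_label


def commonly_pathogenic(variants: dict[str, dict[str, str]]) -> list[str]:
    """Return the variants that every listed tool labels 'pathogenic'."""
    return [
        variant_id
        for variant_id, preds in variants.items()
        if preds and all(label == "pathogenic" for label in preds.values())
    ]


# Hypothetical predictions for two placeholder variants.
variants = {
    "variant_1": {"tool_a": "pathogenic", "tool_b": "pathogenic", "tool_c": "pathogenic"},
    "variant_2": {"tool_a": "pathogenic", "tool_b": "benign", "tool_c": "benign"},
}
for variant_id, preds in variants.items():
    print(variant_id, consensus_call(preds))
print(commonly_pathogenic(variants))
```

A weighted vote or a per-tool score threshold would be equally reasonable choices; the unanimous filter mirrors the "declared pathogenic commonly by the selected tools" output only loosely.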
HNF4A-Bridging the Gap Between Intestinal Metaplasia and Gastric Cancer
Zhao Y, Tang H, Xu J, Sun F, Zhao Y and Li Y
Intestinal metaplasia (IM) of gastric epithelium has traditionally been regarded as an irreversible stage in the process of the Correa cascade. Exploring the potential molecular mechanism of IM is significant for effective gastric cancer prevention.
Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms
Vannutelli A, Ouangraoua A and Perreault JP
G-quadruplexes (G4s) are secondary structures in DNA and RNA that impact various cellular processes, such as transcription, splicing, and translation. Because of their numerous functions, G4s are involved in many diseases, which makes their study important. Yet G4 evolution remains largely unknown, owing to the low sequence similarity of G4s and the poor quality of their sequence alignments across species. To address this, we designed a strategy that avoids direct G4 alignment to study G4 evolution in the 3 living kingdoms. We also explored the coevolution between RNA-binding proteins (RBPs) and G4s.
Characterization of a Hypothetical Protein (PBJ89160.1) from Neisseria meningitidis Exhibits a New Insight on Nutritional Virulence and Molecular Docking to Uncover a Therapeutic Target
Asha IJ, Gupta SD, Hossain MM, Islam MN, Akter NN, Islam MM, Das SC and Barman DN
Neisseria meningitidis is an encapsulated, kidney bean-shaped diplococcus that causes bacterial meningitis. Our study aims to advance our understanding of disease progression, the frequency with which the bacterium spreads among people, and the interactions between the bacterium and the human body by identifying a functional protein that could serve as a target for meningococcal medicine in the future.
Recombination Events Among SARS-CoV-2 Omicron Subvariants: Impact on Spike Interaction With ACE2 Receptor and Neutralizing Antibodies
Arbi M, Khedhiri M, Ayouni K, Souiai O, Dhouib S, Ghanmi N, Benkahla A, Triki H and Haddad-Boubaker S
Recombination plays a key role in promoting the evolution of RNA viruses and the emergence of potentially epidemic variants. Some studies have investigated the occurrence of recombination in SARS-CoV-2 without exploring its impact on virus-host interaction. To investigate the burden of recombination in terms of frequency and distribution, the occurrence of recombination was first explored in 44 230 Omicron sequences among BQ subvariants and the under-investigation sequences denoted "ML" (Multiple Lineages), using 3seq software. Second, the impact of recombination on the interaction between the Spike protein and the ACE2 receptor, as well as neutralizing antibodies (nAbs), was analyzed using docking tools. Recombination was detected in 56.91% and 82.20% of BQ and ML strains, respectively. It took place mainly in the spike and ORF1a genes. For BQ recombinant strains, the docking analysis showed that the spike interacted strongly with ACE2 and weakly with nAbs. The mutations S373P, S375F and T376A constitute a residue network that enhances the RBD interaction with ACE2. Thirteen mutations in the RBD (S373P, S375F, T376A, D405N, R408S, K417N, N440K, S477N, P494S, Q498R, N501Y, and Y505H) and NTD (Y240H) seem to be implicated in immune evasion of recombinants by altering spike interaction with nAbs. In conclusion, this "in silico" study demonstrated that recombination is frequent among Omicron BQ and ML variants. It highlights new key mutations that are potentially implicated in enhancement of spike binding to ACE2 (F376A) and escape from nAbs (RBD: F376A, D405N, R408S, N440K, S477N, P494S, and Y505H; NTD: Y240H). Our findings offer considerable insights for the elaboration of effective prophylactic and therapeutic strategies against future SARS-CoV-2 waves.
Screening and Validation of Key Genes of Autophagy in Acute Myocardial Infarction Based on Bioinformatics
Geng Y, Han Y, Wang S, Qi J and Bi X
Autophagy plays a significant role in the development of acute myocardial infarction (AMI), and cardiomyocyte autophagy is of major importance in maintaining cardiac function. We aimed to identify key genes associated with autophagy in AMI through bioinformatics analysis and verify them through clinical validation.
Label Transfer for Drug Disease Association in Three Meta-Paths
Dao NA, Le MH and Dang XT
The identification of potential interactions and relationships between diseases and drugs is significant in public health care and drug discovery. Determining drug-disease interactions experimentally is very expensive in both time and money, and many potential drug-disease associations remain undiscovered. Therefore, the development of computational methods to explore the relationship between drugs and diseases is essential. Many computational methods for predicting drug-disease associations have been developed that learn from known interactions to infer potential interactions of unknown drug-disease pairs. In this paper, we propose 3 new main groups of meta-paths based on the heterogeneous biological network of drug-protein-disease objects. For each meta-path, we design a machine learning model, and these models are then combined into an integrated learning method. We evaluated our approach on 3 standard datasets: DrugBank, OMIM, and Gottlieb's dataset. Experimental results demonstrate that the proposed method outperforms some recent methods, such as EMP-SVD, LRSSL, MBiRW, MPG-DDA, and SCMFDD, on some measures, such as AUC, AUPR, and F1-score.
Single-cell RNA Sequencing Identifies Natural Kill Cell-Related Transcription Factors Associated With Age-Related Macular Degeneration
Luo Y, Liu J, Feng W, Lin D, Chen M and Zheng H
Age-related Macular Degeneration (AMD) poses a growing global health concern as the leading cause of central vision loss in elderly people.
An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix
Chu DH, An JY and Nie XM
Predicting self-interacting proteins (SIPs) is crucial for inferring protein functions and for understanding gene-disease and disease-drug associations. These interactions are integral to numerous cellular processes and play pivotal roles within cells. However, traditional methods for identifying SIPs through biological experiments are often expensive and time-consuming, with long experimental cycles. Therefore, the development of effective computational methods for accurately predicting SIPs is not only necessary but also presents a significant challenge.
Comprehensive Profiling of Transcriptome and m6A Epitranscriptome Uncovers the Neurotoxic Effects of Yunaconitine on HT22 Cells
Lin B, Zhang J, Chen M, Gao X, Wen J, Tian K, Wu Y, Chen Z, Yang Q, Zhu A and Du C
To explore different mRNA transcriptome patterns and RNA N6-methyladenosine (m6A) alterations in yunaconitine (YA)-treated HT22 mouse hippocampal neurons, and to uncover the role of abnormal mRNA expression and RNA m6A modification in YA-induced neurotoxicity.
Comparative Phylogenetic Analysis and Protein Prediction Reveal the Taxonomy and Diverse Distribution of Virulence Factors in Foodborne Strains
Zhang M and Yin Z
and , 2 major foodborne pathogenic fusobacteria, have a variety of virulent protein types with nervous and enterotoxic pathogenic potential, respectively.
Draft Genome Sequence of Strain MHSD4, a Bacterial Endophyte With Bioremediation Potential
Morobane DM, Tshishonga K and Serepa-Dlamini MH
sp. strain MHSD4 is a bacterial endophyte isolated from the leaves of the medicinal plant Here, we report on strain MHSD4 draft whole genome sequence and annotation. The draft genome size of sp. strain MHSD4 is 4 647 677 bp with a G+C content of 54.2% and 41 contigs. The National Center for Biotechnology Information Prokaryotic Genome Annotation Pipeline tool predicted a total of 4395 genes inclusive of 4235 protein-coding genes, 87 total RNA genes, 14 non-coding (nc) RNAs and 70 tRNAs, and 73 pseudogenes. Biosynthesis pathways for naphthalene and anthracene degradation were identified. Putative genes involved in bioremediation such as , and were identified. Putative genes involved in copper homeostasis and tolerance were identified which may suggest that sp. strain MHSD4 has biotechnological potential for bioremediation of heavy metals.
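The summary statistics in the abstract above can be illustrated with a short sketch. The `gc_content` helper and its toy sequence are assumptions for illustration only (the assembly itself is not reproduced here); the gene counts are the ones quoted in the abstract, and the check merely shows that protein-coding genes, RNA genes and pseudogenes add up to the reported total.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in bases if base in "GC")
    return gc / len(bases)


# Toy sequence, not taken from the MHSD4 assembly.
print(f"{gc_content('GGCCAATT'):.1%}")

# Gene counts as quoted in the abstract.
reported_counts = {"protein_coding": 4235, "rna_genes": 87, "pseudogenes": 73}
reported_total = 4395
print(sum(reported_counts.values()) == reported_total)
```

For a multi-contig draft genome, the same function would be applied to the concatenated contigs, or computed per contig and weighted by length.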
MicroRNA Transcriptomes Reveal Prevalence of Rare and Species-Specific Arm Switching Events During Zebrafish Ontogenesis
de Oliveira AC, Bovolenta LA, Figueiredo L, Ribeiro AO, Pereira BJA, de Almeida TRA, Campos VF, Patton JG and Pinhal D
In metazoans, microRNAs (miRNAs) are essential regulators of gene expression, affecting critical cellular processes from differentiation and proliferation to homeostasis. During miRNA biogenesis, the miRNA strand that loads onto the RNA-induced Silencing Complex (RISC) can vary, leading to changes in gene targeting and modulation of biological pathways. To investigate the impact of these "arm switching" events on gene regulation, we analyzed a diverse range of tissues and developmental stages in zebrafish by comparing the accumulation dynamics of 5p and 3p arms between embryonic developmental stages, adult tissues, and sexes. We also compared the variable arm usage patterns observed in zebrafish with those of other vertebrates, including arm switching data from fish, birds, and mammals. Our comprehensive analysis revealed that variable arm usage events predominantly take place during embryonic development. It is also noteworthy that isomiR occurrence correlates with changes in arm selection, indicating an important role for distinct microRNA isoforms in reinforcing and modifying gene regulation by promoting dynamic switches in the accumulation of miRNA 5p and 3p arms. Our results shed new light on the emergence and coordination of gene expression regulation and pave the way for future investigations in this field.
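To make the notion of arm switching in the abstract above concrete, here is a minimal sketch under a simplifying assumption: the dominant arm in a sample is whichever of 5p or 3p has more reads, and a switch is called when the dominant arm differs between two samples. The function names and read counts are hypothetical, and the study's own criteria (for example, expression thresholds or fold-change cut-offs) are not stated in the abstract.

```python
def dominant_arm(count_5p: int, count_3p: int) -> str:
    """Return '5p', '3p', 'balanced', or 'undetected' from raw read counts."""
    if count_5p < 0 or count_3p < 0:
        raise ValueError("read counts must be non-negative")
    if count_5p + count_3p == 0:
        return "undetected"
    if count_5p > count_3p:
        return "5p"
    if count_3p > count_5p:
        return "3p"
    return "balanced"


def is_arm_switch(sample_a: tuple[int, int], sample_b: tuple[int, int]) -> bool:
    """True when one sample is 5p-dominant and the other is 3p-dominant."""
    arms = {dominant_arm(*sample_a), dominant_arm(*sample_b)}
    return arms == {"5p", "3p"}


# Hypothetical (5p, 3p) read counts for one miRNA in two samples.
embryo = (900, 100)
adult_tissue = (150, 850)
print(dominant_arm(*embryo), dominant_arm(*adult_tissue))
print(is_arm_switch(embryo, adult_tissue))
```

Real analyses typically work on normalized counts and require a minimum expression level before calling a switch, which this sketch omits.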
A Comprehensive Analysis of 3 Moroccan Genomes Revealed Contributions From Both African and European Ancestries
Boumajdi N, Bendani H, Kartti S, Alouane T, Belyamani L and Ibrahimi A
Genetic variations in the human genome represent the differences in DNA sequence among individuals. This highlights the important role of whole human genome sequencing, which has become the keystone for precision medicine and disease prediction. Morocco is an important hub for studying human population migration and mixing history. This study presents the analysis of 3 Moroccan genomes; the variant analysis revealed 6 379 606 single nucleotide variants (SNVs) and 1 050 577 small InDels. Of the identified SNVs, 219 152 were novel, with 1233 occurring in coding regions, and 5580 non-synonymous single nucleotide variants (nsSNPs) were predicted to affect protein functions. The InDels produced 1055 coding variants and 454 non-3n length variants, and their size ranged from -49 to 49 bp. We further analysed the gene pathways of 8 novel coding variants found in the 3 genomes and revealed 5 genes involved in various diseases and biological pathways. We found that the Moroccan genomes share 92.78% of African ancestry, and 92.86% of Non-Finnish European ancestry, according to the gnomAD database. Population structure inference, by admixture analysis and a network-based approach, then revealed that the studied genomes form a mixed population structure, highlighting the increased genetic diversity in Morocco.
The Spatio-Temporal Expression Profiles of Silkworm Pseudogenes Provide Valuable Insights into Their Biological Roles
Wan L, Su S, Liu J, Zou B, Jiang Y, Jiao B, Tang S, Zhang Y, Deng C and Xiao W
Pseudogenes are sequences that have lost the ability to transcribe RNA molecules or encode truncated but possibly functional proteins. While they were once considered to be meaningless remnants of evolution, recent research has shown that pseudogenes play important roles in various biological processes. However, studies of pseudogenes in the silkworm, an important model organism, are limited and have focused on single genes or only a few specific genes.
Phylogenetic Analysis Provides Insight Into the Molecular Evolution of Nociception and Pain-Related Proteins
Zhai R and Wang Q
Nociception and pain sensation are important neural processes that help humans avoid injury. Many proteins are involved in nociception and pain sensation in humans; however, the evolution of these proteins in animals is unknown. Here, we chose nociception- and pain-related proteins, including G protein-coupled receptors (GPCRs), ion channels (ICs), and neuropeptides (NPs), which are reportedly associated with nociception and pain in humans, and identified their homologs in various animals by BLAST, phylogenetic analysis and protein architecture comparison to reveal their evolution from protozoans to humans. We found that the homologs of transient receptor potential channel A 1 (TRPA1), TRPM, acid-sensing IC (ASIC), and voltage-dependent calcium channel (VDCC) first appear in Porifera. Substance-P receptor 1 (TACR1) emerged in Coelenterata. Somatostatin receptor type 2 (SSTR2), TRPV1 and voltage-dependent sodium channels (VDSC) appear in Platyhelminthes. Calcitonin gene-related peptide receptor (CGRPR) was first identified in Nematoda. However, opioid receptors (OPRs) and most NPs were found only in vertebrates and exist from Agnatha to humans. The results demonstrate that homologs of nociception- and pain-related ICs exist from lower to higher animal phyla, and that most of the GPCRs originate sequentially from lower to higher phyla, whereas OPRs and NPs are newly evolved in vertebrates, which provides hints about the evolution of nociception- and pain-related proteins in animals and humans.
Study on Allele Specific Expression of Long-Term Residents in High Altitude Areas
He C, Zhu B, Gao W, Wu Q and Zhang C
In diploid organisms, half of the chromosomes in each cell come from the father and half from the mother. Previous studies have found that the paternal and maternal chromosomes can be regulated and expressed independently, leading to allele specific expression (ASE). In this study, we analyzed the differential expression of alleles in the high-altitude population and the normal population based on RNA sequencing data. Through gene cluster analysis and protein interaction network analysis, we found that some changes had occurred at the gene level, along with some negative effects. During the study, we found that the calmodulin homology domain may be correlated with long-term survival at high altitude. The plateau environment is characterized by hypoxia, low air pressure, strong ultraviolet radiation, and low temperature. Accordingly, the genetic changes in the process of adaptation are mainly reflected in these characteristics. Long-term living at high altitude is also highly related to cancer, immune disease, cardiovascular disease, neurological disease, endocrine disease, and other diseases. Therefore, the medical system in high altitude areas should pay more attention to these diseases.
Computer-Assisted Drug Discovery of a Novel Theobromine Derivative as an EGFR Protein-Targeted Apoptosis Inducer
Eissa IH, Yousef RG, Elkaeed EB, Alsfouk AA, Husein DZ, Ibrahim IM, El-Mahdy HA, Elkady H and Metwaly AM
The overexpression of the Epidermal Growth Factor Receptor (EGFR) marks it as a pivotal target in cancer treatment, with the aim of reducing cancer cell proliferation and inducing apoptosis. This study aimed at the computer-assisted drug discovery (CADD) of a new apoptotic EGFR inhibitor. The natural alkaloid, theobromine, was used as a starting point to obtain a new semisynthetic (di-ortho-chloro acetamide) derivative (). Firstly, 's total electron density, energy gap, reactivity indices, and electrostatic surface potential were determined by DFT calculations. Then, molecular docking studies were carried out to predict the potential of against wild-type and mutant EGFR proteins. 's correct binding was further confirmed by molecular dynamics (MD) over 100 ns, MM-GBSA, and PLIP experiments. In vitro, showed noticeable efficacy compared to erlotinib by suppressing wild-type and mutant EGFR with IC50 values of 56.94 and 269.01 nM, respectively. also inhibited the proliferation of H1975 and HCT-116 malignant cell lines, exhibiting IC50 values of 14.12 and 23.39 µM, with selectivity indices of 6.8 and 4.1, respectively, indicating its anticancer potential and general safety. The apoptotic effects of were indicated by flow cytometric analysis and were further confirmed through its potential to increase the levels of BAX, Casp3, and Casp9, and decrease Bcl-2 levels. In conclusion, , a new apoptotic EGFR inhibitor, was designed and evaluated both computationally and experimentally. The results suggest that is a promising candidate for further development as an anti-cancer drug.
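The selectivity indices in the abstract above are not defined there. A common convention, assumed here, is the ratio of the half-maximal inhibitory concentration (IC50) in a normal cell line to the IC50 in the cancer cell line. Under that assumption only, the sketch below back-calculates the normal-cell IC50 implied by each reported pair; these implied values are arithmetic consequences of the assumption and are not figures reported by the authors. The function names are hypothetical.

```python
def selectivity_index(ic50_normal_um: float, ic50_cancer_um: float) -> float:
    """SI = IC50(normal cells) / IC50(cancer cells), both in micromolar."""
    if ic50_normal_um <= 0 or ic50_cancer_um <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_normal_um / ic50_cancer_um


def implied_normal_ic50(ic50_cancer_um: float, si: float) -> float:
    """Invert the assumed SI definition to get the implied normal-cell IC50."""
    return ic50_cancer_um * si


# (cancer-cell IC50 in micromolar, selectivity index) as quoted in the abstract.
reported = {"H1975": (14.12, 6.8), "HCT-116": (23.39, 4.1)}
implied = {
    cell_line: implied_normal_ic50(ic50, si)
    for cell_line, (ic50, si) in reported.items()
}
for cell_line, value in implied.items():
    print(cell_line, round(value, 1))
```

Under this definition, a higher index means the compound is proportionally less toxic to normal cells than to the cancer line, which is how the abstract reads the two values.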
Phylodynamic Investigation of Yellow Fever Virus Sheds New Insight on Geographic Dispersal Across Africa
Motayo BO, Opayele A, Akinduti PA, Faneye AO and Omoregie IP
Although molecular epidemiology has shown the presence of four genotypes circulating across Africa, a paucity of data exists regarding the phylogeography of the African yellow fever (YF) genotypes. The need to fill this gap with spatiotemporal data from continuous YF outbreaks in Africa motivated this study, which aims to investigate the most recent transmission events and directional spread of yellow fever virus (YFV) using updated genomic sequence data.