NanoTube Construct: A web tool for the digital construction of nanotubes of single-layer materials and the calculation of their atomistic descriptors powered by Enalos Cloud Platform
NanoTube Construct is a web tool for the digital construction of nanotubes based on real and hypothetical single-layer materials, including carbon-based materials such as graphene, graphane, graphyne polymorphs and graphdiyne, and non-carbon materials such as silicene, germanene, boron nitride, hexagonal bilayer silica, haeckelite silica, molybdenum disulfide and tungsten disulfide. Contrary to other available tools, NanoTube Construct has the following features: a) it is not limited to zero-thickness materials with specific symmetry, b) it applies energy minimisation to the geometrically constructed nanotubes to generate realistic ones, c) it derives atomistic descriptors (e.g., the average potential energy per atom, the average coordination number, etc.), d) it provides the primitive unit cell of the constructed nanotube which corresponds to the selected rolling vector (i.e., the direction in which the starting nanosheet is rolled to form a tube), e) it calculates whether the nanotube or its corresponding nanosheet is more energetically stable and f) it allows negative chirality indexes. The application of NanoTube Construct to the construction of energy-minimised graphane and molybdenum disulfide nanotubes is presented, showcasing the tool's capability. NanoTube Construct is freely accessible through the Enalos Cloud Platform (https://enaloscloud.novamechanics.com/diagonal/nanotube/).
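The role of the rolling vector and of negative chirality indexes can be illustrated with a minimal sketch. It is not NanoTube Construct's code: it assumes the simplest case of a flat hexagonal sheet (lattice vectors of equal length at 60 degrees, with graphene's lattice constant of 2.46 Å as the default), whereas the tool itself is not restricted to such sheets. The function name and parameters are hypothetical.

```python
import math

def rolled_tube_geometry(n, m, lattice_constant=2.46):
    """Geometry of an ideal tube rolled along C = n*a1 + m*a2.

    Assumes a flat hexagonal sheet with |a1| = |a2| = lattice_constant
    (in angstrom) and a 60 degree angle between a1 and a2.
    Returns (circumference, diameter, chiral_angle_in_degrees).
    """
    if n == 0 and m == 0:
        raise ValueError("the rolling vector must be non-zero")
    # |C|^2 = a^2 (n^2 + n*m + m^2) because a1 . a2 = a^2 / 2
    circumference = lattice_constant * math.sqrt(n * n + n * m + m * m)
    diameter = circumference / math.pi
    # Signed angle between C and a1; negative m gives a negative angle
    chiral_angle = math.degrees(math.atan2(math.sqrt(3) * m, 2 * n + m))
    return circumference, diameter, chiral_angle

if __name__ == "__main__":
    for indices in [(10, 10), (10, 0), (10, -5)]:
        c, d, angle = rolled_tube_geometry(*indices)
        print(indices, round(d, 2), round(angle, 1))
```

The sketch covers only the geometric rolling step. Energy minimisation and descriptor calculation, which the abstract lists as distinguishing features, are outside its scope.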
Machine learning and artificial intelligence: Enabling the clinical translation of atomic force microscopy-based biomarkers for cancer diagnosis
The influence of biomechanics on cell function has become increasingly defined over recent years. Biomechanical changes are known to affect oncogenesis; however, these effects are not yet fully understood. Atomic force microscopy (AFM) is the gold standard method for measuring tissue mechanics on the micro- or nano-scale. Due to its complexity, however, AFM has yet to become integrated in routine clinical diagnosis. Artificial intelligence (AI) and machine learning (ML) have the potential to make AFM more accessible, principally through automation of analysis. In this review, AFM and its use for the assessment of cell and tissue mechanics in cancer are described. Research relating to the application of AI and ML in the analysis of AFM topography and force spectroscopy of cancer tissue and cells is reviewed. The application of ML and AI to AFM has the potential to enable the widespread use of nanoscale morphologic and biomechanical features as diagnostic and prognostic biomarkers in cancer treatment.
An AI-driven multiscale methodology to develop transparent wood as sustainable functional material by using the SSbD concept
Efficient design, production, and optimization of new safe and sustainable by design materials for various industrial sectors is an ongoing challenge for our society, poised to escalate in the future. Wood-based composite materials offer an attractive sustainable alternative to high-impact materials such as glass and polymers and have been the focus of experimental research and development for years. Computational and AI-based materials design significantly speeds up the development of these materials compared to traditional methods of development. However, reliable numerical models are essential for achieving this goal. The AI-TranspWood project, recently funded by the European Commission, has the ambition to develop such computational and AI-based tools in the context of transparent wood (TW), a promising composite with potential applications in various industrial fields. In this project we advance the development specifically by using an Artificial Intelligence (AI)-driven multiscale methodology.
A systematic review on the state-of-the-art and research gaps regarding inorganic and carbon-based multicomponent and high-aspect ratio nanomaterials
This review explores the state-of-the-art with respect to multicomponent nanomaterials (MCNMs) and high aspect ratio nanomaterials (HARNs), with a focus on their physicochemical characterisation, applications, and hazard, fate, and risk assessment. Utilising the PRISMA approach, this study investigates specific MCNMs including cerium-zirconium mixtures (CeZrO) and ZnO nanomaterials doped with transition metals and rare earth elements, as well as Titanium Carbide (TiC) nanomaterials contained in Ti-6Al-4V alloy powders. HARNs of interest include graphene, carbon-derived nanotubes (CNTs), and metallic nanowires, specifically Ag-based nanowires. The review reveals a significant shift in research and innovation (R&I) efforts towards these advanced nanomaterials due to their unique properties and functionalities that promise enhanced performance across various applications including photocatalysis, antibacterial and biomedical uses, and advanced manufacturing. Despite the commercial potential of MCNMs and HARNs, the review identifies critical gaps in our understanding of their environmental fate and transformations upon exposure to new environments, and their potential adverse effects on organisms and the environment. The findings underscore the necessity for further research focused on the environmental transformations and toxicological profiles of these nanomaterials to inform Safe and Sustainable by Design (SSbD) strategies. This review contributes to the body of knowledge by cataloguing current research, identifying research gaps, and highlighting future directions for the development of MCNMs and HARNs, facilitating their safe and effective integration into industry.
Determining key residues of engineered scFv antibody variants with improved MMP-9 binding using deep sequencing and machine learning
Given the crucial role of specific matrix metalloproteinases (MMPs) in the extracellular matrix, an imbalance between activation of the matrix metalloproteinase-9 (MMP-9) zymogen and inhibition of the enzyme can result in various diseases, such as cancer, neurodegenerative, and gynecological diseases. Thus, developing novel therapeutics that target MMP-9 with single-chain antibody fragments (scFvs) is a promising approach. We used fluorescence-activated cell sorting (FACS) to screen a synthetic scFv antibody library displayed on yeast for enhanced binding to MMP-9. The screened scFv mutants demonstrated improved binding to MMP-9 compared to the natural inhibitors of MMPs, the tissue inhibitors of metalloproteinases (TIMPs). To identify the molecular determinants of these engineered scFv variants that affect binding to MMP-9, we used next-generation DNA sequencing and computational protein structure analysis. Additionally, a deep-learning language model was trained on the screened scFv library of variants to predict the binding affinities of scFv variants based on their CDR-H3 sequences.
Corrigendum to 'Downregulation of ABLIM3 confers to the metastasis of neuroblastoma via regulating the cell adhesion molecules pathway' [Comput. Struct. Biotechnol. J., Vol. 23 (2024) 1547-1561]
[This corrects the article DOI: 10.1016/j.csbj.2024.04.024.].
Easy-MODA: Simplifying standardised registration of scientific simulation workflows through MODA template guidelines powered by the Enalos Cloud Platform
Modelling Data (MODA) reporting guidelines have been proposed for common terminology and for recording metadata for physics-based materials modelling and simulations in a CEN Workshop Agreement (CWA 17284:2018). Their purpose is similar to that of the Quantitative Structure-Activity Relationship (QSAR) model report form (QMRF) that aims to increase industry and regulatory confidence in QSAR models, but for a wider range of model types. Recently, the WorldFAIR project's nanomaterials case study suggested that both QMRF and MODA templates are an important means to enhance compliance of nanoinformatics models, and their underpinning datasets, with the FAIR principles (Findable, Accessible, Interoperable, Reusable). Despite the advances in computational modelling of materials properties and phenomena, regulatory uptake of predictive models has been slow. This is, in part, due to concerns about lack of validation of complex models and lack of documentation of scientific simulations. The models are often complex, output can be hardware- and software-dependent, and there is a lack of shared standards. Despite advocating for standardised and transparent documentation of simulation protocols through its templates, the MODA guidelines are rarely used in practice by modellers because of a lack of tools for automating their creation, sharing, and storage. They also suffer from a paucity of user guidance on their use to document different types of models and systems. Such tools exist for the more well-established QMRF and have aided widespread implementation of QMRFs. To address this gap, a simplified procedure and online tool, Easy-MODA, has been developed to guide users through MODA creation for physics-based and data-based models, and their various combinations. Easy-MODA is available as a web-tool on the Enalos Cloud Platform (https://www.enaloscloud.novamechanics.com/insight/moda/). 
The tool streamlines the creation of detailed MODA documentation, even for complex multi-model workflows, and facilitates the registration of MODA workflows and documentation in a database, thereby increasing their Findability and thus Re-usability. This enhances communication, interoperability, and reproducibility in multiscale materials modelling and improves trust in the models through improved documentation. The use of the Easy-MODA tool is exemplified by a case study for nanotoxicity evaluation, involving interlinked models and data transformation, to demonstrate the effectiveness of the tool in integrating complex computational methodologies and its significant role in improving the FAIRness of scientific simulations.
Biased activation of the vasopressin V2 receptor probed by molecular dynamics simulations, NMR and pharmacological studies
G protein-coupled receptors (GPCRs) control critical cell signaling. Their response to extracellular stimuli involves conformational changes to convey signals to intracellular effectors, among which the most important are G proteins and β-arrestins (βArrs). Biased activation of one pathway is a field of intense research in GPCR pharmacology. Combining NMR, site-directed mutagenesis, molecular pharmacology, and molecular dynamics (MD) simulations, we studied the conformational diversity of the vasopressin V2 receptor (V2R) bound to different types of ligands: the antagonist Tolvaptan, the endogenous unbiased agonist arginine-vasopressin, and MCF14, a partial Gs protein-biased agonist. A double-labeling NMR scheme was developed to study the receptor conformational changes and ligand binding: V2R was subjected to lysine CH methylation for complementary NMR studies, whereas the agonists were tagged with a paramagnetic probe. Paramagnetic relaxation enhancements and site-directed mutagenesis validated the ligand binding modes in the MD simulations. We found that the bias for the Gs protein over the βArr pathway involves interactions between the conserved NPxxY motif in the transmembrane helix 7 (TM7) and TM3, compacting helix 8 (H8) toward TM1 and likely inhibiting βArr signaling. A similar mechanism was elicited for the pathogenic mutation I130N, which constitutively activates the Gs proteins without concomitant βArr recruitment. The findings suggest common patterns of biased signaling in class A GPCRs, as well as a rationale for the design of G protein-biased V2R agonists.
PENGUIN: A rapid and efficient image preprocessing tool for multiplexed spatial proteomics
Multiplex spatial proteomic methodologies can provide a unique perspective on the molecular and cellular composition of complex biological systems. Several challenges are associated with the analysis of imaging data, specifically with regard to normalizing signal-to-noise ratios across images and subtracting background noise. However, there is a lack of user-friendly solutions for denoising multiplex imaging data that can be applied to large datasets. We have developed PENGUIN (Percentile Normalization GUI Image deNoising), a straightforward image preprocessing tool for multiplexed spatial proteomics data. Compared to existing approaches, PENGUIN distinguishes itself by eliminating the need for manual annotation or machine learning models. It effectively preserves signal intensity differences while reducing noise, improving downstream tasks such as cell segmentation and phenotyping. PENGUIN's simplicity, speed, and intuitive interface, available as both a script and a Jupyter notebook, make it easy to adjust image processing parameters, providing a user-friendly experience. We further demonstrate the effectiveness of PENGUIN by comparing it to conventional image processing techniques and solutions tailored for multiplex imaging data.
Optimizing fountain codes for DNA data storage
Fountain codes, originally developed for reliable multicasting in communication networks, are effectively applied in various data transmission and storage systems. Their recent use in DNA data storage systems has unique challenges, since the DNA storage channel deviates from the traditional Gaussian white noise erasure model considered in communication networks and has several restrictions as well as special properties. Thus, optimizing fountain codes to address these challenges promises to improve their overall usability in DNA data storage systems. In this article, we present several methods for optimizing fountain codes for DNA data storage. Apart from generally applicable optimizations for fountain codes, we propose optimization algorithms to create tailored distribution functions of fountain codes, which is novel in the context of DNA data storage. We evaluate the proposed methods in terms of various metrics related to the DNA storage channel. Our evaluation shows that optimizing fountain codes for DNA data storage can significantly enhance the reliability and capacity of DNA data storage systems. The developed methods represent a step forward in harnessing the full potential of fountain codes for DNA-based data storage applications. The new coding schemes and all developed methods are available under a free and open-source software license.
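For readers unfamiliar with fountain codes, the following minimal sketch shows where a degree distribution enters an LT-style encoder. It uses the textbook ideal soliton distribution as a baseline, not the tailored distributions proposed in the article, and all function names are hypothetical. A practical DNA storage encoder would typically store a seed in each droplet rather than the index list and would also screen droplets against sequence constraints; neither step is shown.

```python
import random

def ideal_soliton(k):
    """Ideal soliton probabilities for degrees 1..k (index 0 is unused)."""
    probs = [0.0] * (k + 1)
    probs[1] = 1.0 / k
    for degree in range(2, k + 1):
        probs[degree] = 1.0 / (degree * (degree - 1))
    return probs

def make_droplet(chunks, degree_probs, rng):
    """XOR a randomly chosen subset of equally sized chunks into one droplet.

    degree_probs[d] is the probability of combining d chunks; a tailored
    distribution would be passed in here instead of the ideal soliton.
    """
    k = len(chunks)
    degree = rng.choices(range(1, k + 1), weights=degree_probs[1:], k=1)[0]
    indices = sorted(rng.sample(range(k), degree))
    payload = bytes(len(chunks[0]))  # all-zero start value for the XOR
    for i in indices:
        payload = bytes(x ^ y for x, y in zip(payload, chunks[i]))
    return indices, payload

if __name__ == "__main__":
    data_chunks = [b"ACGT", b"TTGA", b"CCAT", b"GGGG"]  # toy data
    distribution = ideal_soliton(len(data_chunks))
    print(make_droplet(data_chunks, distribution, random.Random(0)))
```

Keeping the distribution as a plain argument makes it easy to compare alternative distributions on the same data.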
Corrigendum to "Cryo-EM reveals architectural diversity in active rotavirus particles" [Comput. Struct. Biotechnol. J. 17 (2019) 1178-1183]
[This corrects the article DOI: 10.1016/j.csbj.2019.07.019.].
Protein allosteric site identification using machine learning and per amino acid residue reported internal protein nanoenvironment descriptors
Allosteric regulation plays a crucial role in modulating protein functions and represents a promising strategy in drug development, offering enhanced specificity and reduced toxicity compared to traditional active site inhibition. Existing computational methods for predicting allosteric sites on proteins often rely on static protein surface pocket features, normal mode analysis or extensive molecular dynamics simulations encompassing both the protein function modulator and the protein itself. In this study, we introduce an innovative methodology that employs a per amino acid residue classifier to distinguish allosteric site-forming residues (AFRs) from non-allosteric, or free, residues (FRs). Our model, STINGAllo, exhibits robust performance, achieving a Distance Center Center (DCC) success rate of 78% when all AFRs were predicted within pockets identified by FPocket, an overall DCC success rate of 60%, an F1 score of 64% and a Matthews correlation coefficient (MCC) of 64%. Furthermore, we identified key descriptors that characterize the internal protein nanoenvironment of AFRs, setting them apart from FRs. These descriptors include the sponge effect, distance to the protein centre of geometry (cg), hydrophobic interactions, electrostatic potentials, eccentricity, and graph bottleneck features.
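The F1 score and MCC quoted above are standard metrics for a binary per-residue classifier. The sketch below shows how both follow from a confusion matrix. The counts in the usage example are invented for illustration and are not STINGAllo results, and the function name is hypothetical.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) from the counts of a binary confusion matrix.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Both metrics fall back to 0.0 when undefined.
    """
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return f1, mcc

if __name__ == "__main__":
    # Hypothetical counts: 10 site-forming residues among 100 residues
    print(f1_and_mcc(tp=8, fp=2, fn=2, tn=88))
```

Unlike F1, the MCC also accounts for true negatives. This matters when one class is much rarer than the other, as is typically the case when site-forming residues are a small fraction of a protein.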
Molecular dynamics simulations unveil the aggregation patterns and salting out of polyarginines at zwitterionic POPC bilayers in solutions of various ionic strengths
This study employs molecular dynamics (MD) simulations to investigate the adsorption and aggregation behavior of simple polyarginine cell-penetrating peptides (CPPs), specifically modeled as R peptides, at zwitterionic phosphocholine POPC membranes under varying ionic strengths, using two peptide concentrations and two concentrations of NaCl and CaCl2. The results reveal an intriguing phenomenon of R aggregation at the membrane, which is dependent on the ionic strength, indicating a salting-out effect. As the peptide concentration and ionic strength increase, peptide aggregation also increases, with aggregate lifetimes and sizes showing a corresponding rise, accompanied by a decrease in the total number of peptides adsorbed at the membrane surface. Notably, in high ionic strength environments, large R aggregates, such as octamers, are also observed occasionally. The salting-out, typically uncommon for short positively charged peptides, is attributed to the unique properties of the amino acid arginine, specifically to its side chain, which contains an amphiphilic guanidinium (Gdm) ion that forms both intermolecular hydrophobic like-charge Gdm - Gdm interactions and salt-bridge Gdm - C-terminus interactions. The former increase with the ionic strength, whereas the latter decrease due to electrostatic screening. The aggregation behavior of R peptides at membranes can also be linked to their CPP translocation properties, suggesting that aggregation may aid in translocation across cellular membranes.
Prediction of conformational states in a coronavirus channel using Alphafold-2 and DeepMSA2: Strengths and limitations
The envelope (E) protein is present in all coronavirus genera. This protein can form pentameric oligomers with ion channel activity which have been proposed as a possible therapeutic target. However, high resolution structures of E channels are limited to those of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), responsible for the recent COVID-19 pandemic. In the present work, we used Alphafold-2 (AF2), in ColabFold without templates, to predict the transmembrane domain (TMD) structure of six E-channels representative of genera alpha-, beta- and gamma-coronaviruses in the Coronaviridae family. High-confidence models were produced in all cases when combining multiple sequence alignments (MSAs) obtained from DeepMSA2. Overall, AF2 predicted at least two possible orientations of the α-helices in E-TMD channels: one where a conserved polar residue (Asn-15 in the SARS sequence) is oriented towards the center of the channel, 'polar-in', and one where this residue is in an interhelical orientation, 'polar-inter'. For the SARS models, the comparison with the two experimental models 'closed' (PDB: 7K3G) and 'open' (PDB: 8SUZ) is described, and suggests a ~60° α-helix rotation mechanism involving either the full TMD or only its N-terminal half, to allow the passage of ions. While the results obtained are not identical to the two high resolution models available, they suggest various conformational states with striking similarities to those models. We believe these results can be further optimized by means of MSA subsampling, and can guide future high resolution structural studies in these and other viral channels.
NaCTR: Natural product-derived compound-based drug discovery pipeline from traditional oriental medicine by search space reduction
Drug discovery pipelines require enormous time and cost, yet they carry an infamously high risk of failure. Reducing this risk has therefore been a primary goal of the process. Recently, natural products (NPs) in traditional oriental medicine (TOM) have come into the spotlight for their efficacy and safety, supported by a long history of use. In addition, with the ever-increasing repository of biological datasets, many data-driven approaches have been extensively studied for more efficient search and testing. However, TOM-based datasets lack information on recently prevalent diseases, while experimental datasets tend to yield target spaces that are too large. An adequate combination of both approaches can therefore fill in each other's blanks. In this study, we introduce NaCTR, a discovery pipeline that achieves such integration to suggest NP-derived drug candidates for a given disease. First, phenotypes and disease genes for the disease are identified from the literature and public databases. Second, a pool of potentially therapeutic NPs is identified based on both TOM-based phenotype records and compound-gene interaction datasets. Lastly, the compounds contained in the NPs are further screened for toxicity and pharmacokinetic properties. We use Parkinson's disease as a case study to test the NaCTR pipeline. Through the pipeline, we propose glutathione and four other compounds as novel drug candidates. We further support these findings with evidence from the literature. As the first pipeline to effectively combine data from ancient and recent repositories, NaCTR offers a novel approach that can be applied to other diseases.
Microsatellites explorer: A database of short tandem repeats across genomes
Short tandem repeats (STRs) are widespread, repetitive elements with a number of biological functions and are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups in the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies coupled with decreasing costs have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated the generation of Telomere-to-Telomere (T2T) assemblies of complete genomes. Nevertheless, there is no comprehensive database that encompasses the STRs found per genome across different organisms and for different human genomes across diverse ancestries. Here we introduce Microsatellites Explorer, a database of STRs found in the genomes of 117,253 organisms across all major taxonomic groups, 15 T2T genome assemblies of different organisms, and 94 human haplotypes from the human pangenome. The database currently hosts 406,758,798 STR sequences, serving as a centralized, user-friendly repository for performing searches, creating interactive visualizations, and downloading existing STR data for independent analysis. Microsatellites Explorer is implemented as a web portal for browsing, analyzing and downloading STR data. Microsatellites Explorer is publicly available at https://www.microsatellitesexplorer.com.
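To make concrete what an STR record looks like, the following minimal sketch finds perfect tandem repeats in a DNA string with a regular expression. It is not the method used to build Microsatellites Explorer. The unit-length range (1 to 6 bp), the minimum repeat count and the function name are assumptions chosen for illustration.

```python
import re

def find_strs(sequence, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest unit first,
    so "AAAAAA" is reported with unit "A" rather than "AA" or "AAA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

if __name__ == "__main__":
    # Toy sequence containing a CAG trinucleotide repeat
    print(find_strs("TTCAGCAGCAGCAGTT"))
```

Only exact repeats are detected. Interrupted or imperfect repeats would need a dedicated tool.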
Constructing phylogenetic trees for microbiome data analysis: A mini-review
As next-generation sequencing technologies advance rapidly and the cost of metagenomic sequencing continues to decrease, researchers now face an unprecedented volume of microbiome data. This surge has stimulated the development of scalable microbiome data analysis methods and necessitated the incorporation of phylogenetic information into microbiome analysis for improved accuracy. Tools for constructing phylogenetic trees from 16S rRNA sequencing data are well-established, as the highly conserved regions of the 16S gene are limited, simplifying the identification of marker genes. In contrast, metagenomic and whole genome shotgun (WGS) sequencing involve sequencing random fragments of the entire genome, making identification of consistent marker genes challenging owing to the vast diversity of genomic regions and resulting in a scarcity of robust tools for constructing phylogenetic trees. Although bacterial sequence tree construction tools exist for upstream bioinformatics, many downstream researchers (those integrating these trees into statistical models or machine learning) are either unaware of these tools or find them difficult to use due to the steep learning curve of processing raw sequences. This is compounded by the fact that public datasets often lack phylogenetic trees, providing only abundance tables and taxonomic classifications. To address this, we present a comprehensive review of phylogenetic tree construction techniques for microbiome data (16S rRNA or whole-genome shotgun sequencing). We outline the strengths and limitations of current methods, offering expert insights and step-by-step guidance to make these tools more accessible and widely applicable in quantitative microbiome data analysis.
Comparative genomics reveals carbohydrate enzymatic fluctuations and herbivorous adaptations in arthropods
Arthropods represent the largest and most diverse phylum on Earth, playing a pivotal role in the biosphere. One key to their evolutionary success is their ability to feed on plant material. However, their endogenous enzymatic repertoire, which contributes to plant digestion, remains largely unexplored and poorly understood.
Landscape of intrinsically disordered proteins in mental disorder diseases
Disrupted genes linked to mental disorders sometimes exhibit characteristics of Intrinsically Disordered Proteins (IDPs). However, few studies have comprehensively explored the functional associations between protein disorder properties and different psychiatric disorders. In this study, we collected disrupted proteins for seven mental diseases (MDD, SCZ, BP, ID, AD, ADHD, ASD) and a control dataset from normal brains. After calculating the disorder scores for each protein, we thoroughly compared the proportions and functions of IDPs between differentially expressed proteins in each disease and healthy controls. Our findings revealed that disrupted proteins, particularly in ASD and ADHD, contain more IDPs than controls from normal brains. Distinct patterns in disorder properties were observed among different mental disorders. Functional enrichment analysis indicated that IDPs in mental disorders were associated with neurodevelopment, synaptic signaling, and gene expression regulatory pathways. In addition, we analyzed the proportion and function of proteins that undergo liquid-liquid phase separation (LLPS) in psychiatric disorders, finding that LLPS proteins are mainly enriched in pathways related to neurodevelopment and inter-synaptic signaling. Furthermore, to validate our findings, we conducted an analysis of differentially expressed genes in an ASD cohort, revealing that the encoded proteins also exhibit a higher proportion of IDPs. Notably, these IDPs were particularly enriched in pathways related to neurodevelopment, including head development, a process known to be disrupted in ASD. Our study sheds light on the crucial role of IDPs in psychiatric disorders, enhancing our understanding of their molecular mechanisms.
Improving compound-protein interaction prediction by focusing on intra-modality and inter-modality dynamics with a multimodal tensor fusion strategy
Identifying novel compound-protein interactions (CPIs) plays a pivotal role in target identification and drug discovery. Although the recent multimodal methods have achieved outstanding advances in CPI prediction, they fail to effectively learn both intra-modality and inter-modality dynamics, which limits their prediction performance. To address the limitation, we propose a novel multimodal tensor fusion CPI prediction framework, named MMTF-CPI, which contains three unimodal learning modules for structure, heterogeneous network and transcriptional profiling modalities, a tensor fusion module and a prediction module. MMTF-CPI is capable of focusing on both intra-modality and inter-modality dynamics with the tensor fusion module. We demonstrated that MMTF-CPI is superior to multiple state-of-the-art multimodal methods across seven datasets. The prediction performance of MMTF-CPI is significantly improved with the tensor fusion module compared to other fusion methods. Moreover, our case studies confirmed the practical value of MMTF-CPI in target identification. Via MMTF-CPI, we also discovered several candidate compounds for the therapy of breast cancer and non-small cell lung cancer.
Advancing titanium dioxide coated photocatalytic depolluting surfaces: Leveraging ASINA's roadmap for safer and sustainable solutions
This report, the second of its kind from the ASINA project, aims to provide a roadmap with quantitative metrics for Safe(r) and (more) Sustainable by Design (SSbD) solutions for titanium dioxide (TiO2) nanomaterials (NMs). We begin with a brief description of ASINA's methodology across the product lifecycle, highlighting the quantitative elements, such as the Key Performance Indicators (KPIs). We then propose a decision support tool for implementing SSbD objectives across various dimensions: functionality, cost, environment, and human health safety. This is followed by the main innovative findings: a consolidation of the technical processes involved, design rationales, experimental procedures, and the tools and models used and developed to deliver photocatalytic depolluting surfaces by spray-finishing techniques based on TiO2 NM formulations. The roadmap is thoroughly described to inform similar projects through the integration of KPIs into SSbD methodologies, fostering data-driven decision-making. While specific results are beyond this report's scope, its primary aim is to demonstrate the roadmap (SSbD know-how) and promote SSbD-oriented innovation in nanotechnology. Finally, we provide a comparison of the approaches followed in two case studies that target different industrial sectors. These case-specific SSbD assessments provide a concrete exemplification of the addressed methodology, which contributes to the efforts towards attaining a common roadmap for implementing SSbD solutions aligned with the EU's Green Deal objectives.