IEEE-ACM Transactions on Computational Biology and Bioinformatics

Enhancing Single-Cell RNA-seq Data Completeness with a Graph Learning Framework

Lall S, Ray S and Bandyopadhyay S

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for capturing gene expression snapshots in individual cells. However, the low amount of RNA in individual cells results in dropout events, which introduce a large number of zero counts into the single-cell expression matrix. We have developed VAImpute, a variational graph autoencoder-based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data by leveraging copula correlation (Ccor) among cells/genes. The trained model is used to predict dropout events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets and compared with established single-cell imputation methods. VAImpute yields significant improvements in dropout detection, thereby achieving superior performance in cell clustering, rare cell detection, and differential expression analysis. All code and datasets are available at the GitHub link: https://github.com/sumantaray/VAImputeAvailability.
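
As a rough illustration of scoring non-edges, the sketch below uses the inner-product decoder that is standard for variational graph autoencoders: the probability of a cell-gene edge is the sigmoid of the dot product of the two latent embeddings. Whether VAImpute uses exactly this decoder is an assumption, and the function names, the toy embeddings, and the 0.5 threshold are all hypothetical.

```python
import math

def edge_probability(z_cell, z_gene):
    """Sigmoid of the dot product of two latent vectors (standard VGAE decoder)."""
    score = sum(c * g for c, g in zip(z_cell, z_gene))
    return 1.0 / (1.0 + math.exp(-score))

def candidate_dropouts(cell_emb, gene_emb, observed_edges, threshold=0.5):
    """Return (cell, gene, probability) for non-edges scoring above the threshold."""
    hits = []
    for cell, zc in cell_emb.items():
        for gene, zg in gene_emb.items():
            if (cell, gene) in observed_edges:
                continue  # expression was observed, so this pair is not a candidate
            p = edge_probability(zc, zg)
            if p > threshold:
                hits.append((cell, gene, p))
    return hits

# Toy, made-up embeddings (hypothetical values, not taken from the paper).
cell_emb = {"cell1": [1.0, 0.5], "cell2": [-1.0, 0.2]}
gene_emb = {"geneA": [0.8, 0.4], "geneB": [-0.5, -1.0]}
observed = {("cell1", "geneB")}
dropouts = candidate_dropouts(cell_emb, gene_emb, observed)
```

In this toy case the pairs (cell1, geneA) and (cell2, geneB) are flagged, while (cell2, geneA) scores below the threshold and (cell1, geneB) is skipped because it is an observed edge.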

Performance Comparison between Deep Neural Network and Machine Learning based Classifiers for Huntington Disease Prediction from Human DNA Sequence

Vishnuppriya C and Tamilpavai G

Huntington Disease (HD) is a neurodegenerative disorder that causes psychiatric disturbances, movement problems, weight loss, and sleep problems. It needs to be addressed at an early stage of life. Deep Learning (DL) based systems can now help physicians by providing a second opinion when treating a patient's disease. In this work, human deoxyribonucleic acid (DNA) sequences are analyzed using a Deep Neural Network (DNN) algorithm to predict HD. The main objective is to identify whether human DNA is affected by HD. Human DNA sequences are collected from the National Center for Biotechnology Information (NCBI), and synthetic human DNA data are also constructed for processing. The DNA sequence data are then converted to numerical form using the Chaos Game Representation (CGR) method, and the numerical values are used for feature extraction: mean, median, standard deviation, entropy, contrast, correlation, energy, and homogeneity. Additionally, the counts of adenine, thymine, guanine, and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and to other machine learning based classifiers: Neural Network (NN), Support Vector Machine (SVM), Random Forest (RF), and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used: Accuracy, Sensitivity, Specificity, Precision, F1 score, and Matthews Correlation Coefficient (MCC). The study concludes that DNN, NN, SVM, and RF achieve 100% accuracy and CTWFP achieves an accuracy of 87%.
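
The numerical conversion step can be illustrated with a minimal Chaos Game Representation sketch: starting from the centre of the unit square, each nucleotide moves the current point halfway towards that nucleotide's corner. The corner assignment below is one common convention and is an assumption here, as are the function names; the abstract does not state which variant the authors used.

```python
# Corner assignment is a common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(sequence):
    """Map a DNA string to a list of (x, y) points in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

def base_counts(sequence):
    """Count adenine, thymine, guanine and cytosine in the raw sequence."""
    seq = sequence.upper()
    return {base: seq.count(base) for base in "ATGC"}
```

Statistics such as the mean or standard deviation could then be computed over the resulting coordinates, or over an image binned from them, alongside the raw base counts.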

AI-based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey

Rajaei F, Minoccheri C, Wittrup E, Wilson RC, Athey BD, Omenn GS and Najarian K

Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.

DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model

Orhun V, Leon J and Lurong P

The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with a deep learning model, the Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to predict five distinct ligand types. We classified protein-ligand binding sites into five categories according to the type of response ligands cause when they bind to their target proteins: antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model. The code implementation of this research is available in our GitHub repository at https://github.com/drorhunvural/DeepLigType.

iAnOxPep: a machine learning model for the identification of anti-oxidative peptides using ensemble learning

Hassan MT, Tayara H and Chong KT

Due to their safety, high activity, and plentiful sources, antioxidant peptides, particularly those produced from food, are thought to be prospective competitors to synthetic antioxidants in the fight against free radical-mediated illnesses. The lengthy and laborious trial-and-error method for identifying antioxidative peptides (AOP) has raised interest in creating computational-based methods. There exist two state-of-the-art AOP predictors; however, the restriction on peptide sequence length makes them inviable. By overcoming the aforementioned problem, a novel predictor might be useful in the context of AOP prediction. The method has been trained, tested, and evaluated on two datasets: a balanced one and an unbalanced one. We used seven different descriptors and five machine-learning (ML) classifiers to construct 35 baseline models. Five ML classifiers were further trained to create five meta-models using the combined output of 35 baseline models. Finally, these five meta-models were aggregated together through ensemble learning to create a robust predictive model named iAnOxPep. On both datasets, our proposed model demonstrated good prediction performance when compared to baseline models and meta-models, demonstrating the superiority of our approach in the identification of AOPs. For the purpose of screening and identifying possible AOPs, we anticipate that the iAnOxPep method will be an invaluable tool.

A comprehensive evaluation framework for benchmarking multi-objective feature selection in omics-based biomarker discovery

Cattelani L, Ghosh A, Rintala T and Fortino V

Machine learning algorithms have been extensively used for accurate classification of cancer subtypes driven by gene expression-based biomarkers. However, biomarker models combining multiple gene expression signatures are often not reproducible in external validation datasets and their feature set size is often not optimized, jeopardizing their translatability into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and set size by applying seven algorithms for machine learning-driven feature subset selection, and analysed how they perform in a benchmark with eight large-scale transcriptome datasets of cancer, covering both training and external validation sets. The benchmark includes evaluation metrics assessing the performance of the individual biomarkers and the solution sets, according to their accuracy, diversity, and stability of the composing genes. Moreover, a new evaluation metric for cross-validation studies is proposed that generalizes the hypervolume, which is commonly used to assess the performance of multi-objective optimization algorithms. Biomarkers with a balanced accuracy of 0.8 on the external dataset were obtained for breast, kidney, and ovarian cancer using 4, 2, and 7 features, respectively. Genetic algorithms often provided better performance than the other algorithms considered, and the recently proposed NSGA2-CH and NSGA2-CHS were the best performing methods in most cases.
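
For readers unfamiliar with the hypervolume, the following minimal sketch computes the standard two-objective version (not the generalized cross-validation metric the paper proposes). It assumes both objectives are maximized, so feature set size is first rescaled into a "smaller is better" score; the function names, the rescaling, and the example solutions are hypothetical.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded by `ref`, with both objectives maximized."""
    rx, ry = ref
    # Keep only points that strictly improve on the reference in both objectives.
    kept = [(x, y) for x, y in points if x > rx and y > ry]
    kept.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ry
    for x, y in kept:
        if y > best_y:  # dominated points add no new area
            area += (x - rx) * (y - best_y)
            best_y = y
    return area

def to_maximization(balanced_accuracy, n_features, max_features):
    """Turn (accuracy, set size) into two scores in [0, 1] that are both maximized."""
    return (balanced_accuracy, 1.0 - n_features / max_features)

# Made-up solution set: (balanced accuracy, number of features).
solutions = [(0.80, 4), (0.75, 2), (0.70, 5)]
front = [to_maximization(acc, k, max_features=10) for acc, k in solutions]
score = hypervolume_2d(front)
```

A larger hypervolume indicates a solution set that covers better accuracy/size trade-offs; here the third solution is dominated and contributes nothing to the score.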

Generative Biomedical Event Extraction with Constrained Decoding Strategy

Su F, Teng C, Li F, Li B, Zhou J and Ji D

Biomedical event extraction has received considerable attention in various fields, including natural language processing, bioinformatics, and computational biomedicine. This has led to numerous machine learning and deep learning models being proposed and applied to this complex task. However, existing models typically adopt an extraction-based approach, which breaks the extraction of biomedical events into multiple subtasks for sequential processing and is therefore prone to cascading errors. This paper presents a novel approach by constructing a biomedical event generation model based on the pre-trained language model T5. We employ a sequence-to-sequence generation paradigm to obtain events; the model uses a constrained decoding algorithm to guide sequence generation and a curriculum learning algorithm for efficient model learning. To demonstrate the effectiveness of our model, we evaluate it on two public benchmark datasets, Genia 2011 and Genia 2013. Our model achieves superior performance, illustrating the effectiveness of generative modeling of biomedical events.
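
The idea of constrained decoding can be sketched with a toy example: at every step the decoder may only choose among tokens that keep the output inside a set of valid sequences, here stored in a prefix trie. This is a generic illustration with greedy search, not the paper's algorithm; the event linearization, the token scores, and all names are hypothetical, and a real system would take the scores from the T5 decoder.

```python
def build_trie(sequences):
    """Store valid token sequences in a nested-dict prefix trie."""
    trie = {}
    for seq in sequences:
        node = trie
        for token in seq:
            node = node.setdefault(token, {})
        node["<eos>"] = {}  # mark that a valid sequence may end here
    return trie

def constrained_greedy_decode(score_fn, trie, max_len=10):
    """Greedy decoding where each step may only pick a token allowed by the trie."""
    output, node = [], trie
    for _ in range(max_len):
        allowed = list(node.keys())
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score_fn(output, tok))
        if best == "<eos>":
            break
        output.append(best)
        node = node[best]
    return output

# Hypothetical linearized events and made-up scores standing in for model logits.
valid = [["Phosphorylation", "Theme", "STAT3"], ["Binding", "Theme", "STAT3"]]
scores = {"banana": 9.0, "Phosphorylation": 3.0, "Binding": 2.0,
          "Theme": 1.0, "STAT3": 1.0, "<eos>": 0.5}
decoded = constrained_greedy_decode(lambda prefix, tok: scores.get(tok, 0.0),
                                    build_trie(valid))
```

Although "banana" has the highest raw score, it is never offered as a candidate, so the decoder can only emit a structurally valid event.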

An End-to-end Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction

Yang J, Li Y, Wang G, Chen Z and Wu D

Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes, and drug development, as they represent the physical contacts and functional associations between proteins. Recent years have witnessed advances in artificial intelligence (AI) methods aimed at predicting PPIs. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA), and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion, thereby compromising the prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). This model comprises three integral components: (1) Protein Associated Network (PAN) Construction: We begin by constructing a PAN that extensively captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA, and protein structures. (2) Graph Neural Network for Feature Extraction: A Graph Neural Network (GNN) is then employed to distill both topological and semantic features from the PAN, alongside another GNN designed to extract topological features directly from observed PPI networks. (3) Multi-layer Perceptron for Feature Fusion: Finally, a multi-layer perceptron integrates these varied features through end-to-end learning, ensuring that the feature extraction and fusion processes are both comprehensive and optimized for PPI prediction. Extensive experiments conducted on real-world PPI datasets validate the effectiveness of our proposed KGF-GNN approach, which not only achieves high accuracy in predicting PPIs but also significantly surpasses existing state-of-the-art models. This work not only enhances our ability to predict PPIs with higher precision but also contributes to the broader application of AI in bioinformatics, offering profound implications for biological research and therapeutic development.

MLW-BFECF: a multi-weighted dynamic cascade forest based on bilinear feature extraction for predicting the stage of Kidney Renal Clear Cell Carcinoma on multi-modal gene data

Jia L, Jiang L, Yue J, Hao F, Wu Y and Liu X

The stage prediction of kidney renal clear cell carcinoma (KIRC) is important for the diagnosis, personalized treatment, and prognosis of patients. Many prediction methods have been proposed, but most of them are based on unimodal gene data, and their accuracy is difficult to improve further. Therefore, we propose a novel multi-weighted dynamic cascade forest based on bilinear feature extraction (MLW-BFECF) for stage prediction of KIRC using multimodal gene datasets (RNA-seq, CNA, and methylation). The proposed model utilizes a dynamic cascade framework with shuffle layers to prevent early degradation of the model. In each cascade layer, a voting technique based on three gene selection algorithms is first employed to effectively retain gene features more relevant to KIRC and eliminate redundant information in gene features. Then, two new bilinear models based on the gated attention mechanism are proposed to better extract new intra-modal and inter-modal gene features. Finally, based on the idea of bagging, a multi-weighted ensemble forest classifier module is proposed to extract and fuse probabilistic features of the three-modal gene data. A series of experiments demonstrates that the MLW-BFECF model achieves the highest prediction performance on the three-modal KIRC dataset, with an accuracy of 88.92%.

Improving Molecule Generation and Drug Discovery with a Knowledge-enhanced Generative Model

Malusare A and Aggarwal V

Recent advancements in generative models have established state-of-the-art benchmarks in the generation of molecules and novel drug candidates. Despite these successes, a significant gap persists between generative models and the utilization of extensive biomedical knowledge, often systematized within knowledge graphs, whose potential to inform and enhance generative processes has not been realized. In this paper, we present a novel approach that bridges this divide by developing a framework for knowledge-enhanced generative models called KARL. We develop a scalable methodology to extend the functionality of knowledge graphs while preserving semantic integrity, and incorporate this contextual information into a generative framework to guide a diffusion-based model. The integration of knowledge graph embeddings with our generative model furnishes a robust mechanism for producing novel drug candidates possessing specific characteristics while ensuring validity and synthesizability. KARL outperforms state-of-the-art generative models on both unconditional and targeted generation tasks.

A knowledge graph-based method for drug-drug interaction prediction with contrastive learning

Zhong J, Zhao H, Zhao Q and Wang J

Precisely predicting Drug-Drug Interactions (DDIs) has the potential to improve the quality and safety of drug therapies, protect the well-being of patients, and provide essential guidance and decision support at every stage of the drug development process. In recent years, leveraging large-scale biomedical knowledge graphs has improved DDI prediction performance. However, the feature extraction procedures in these methods are still coarse, and more refined features may further improve the quality of predictions. To overcome these limitations, we develop a knowledge graph-based method for multi-typed DDI prediction with contrastive learning (KG-CLDDI). In KG-CLDDI, we combine drug knowledge aggregation features from the knowledge graph with drug topological aggregation features from the DDI graph. Additionally, we build a contrastive learning module that uses horizontal reversal and dropout operations to produce high-quality embeddings for drug-drug pairs. The comparison results indicate that KG-CLDDI is superior to state-of-the-art models in both the transductive and inductive settings. Notably, for the inductive setting, KG-CLDDI outperforms the previous best method by 17.49% and 24.97% in terms of AUC and AUPR, respectively. Furthermore, we conduct an ablation analysis and a case study to show the effectiveness of KG-CLDDI. These findings illustrate the potential significance of KG-CLDDI in advancing DDI research and its clinical applications. The code of KG-CLDDI is available at https://github.com/jianzhong123/KG-CLDDI.

Hierarchical hypergraph learning in association-weighted heterogeneous network for miRNA-disease association identification

Ning Q, Zhao Y, Gao J, Chen C and Yin M

MicroRNAs (miRNAs) play a significant role in cell differentiation, biological development as well as the occurrence and growth of diseases. Although many computational methods contribute to predicting the association between miRNAs and diseases, they do not fully explore the attribute information contained in associated edges between miRNAs and diseases. In this study, we propose a new method, Hierarchical Hypergraph learning in Association-Weighted heterogeneous network for MiRNA-Disease association identification (HHAWMD). HHAWMD first adaptively fuses multi-view similarities based on channel attention and distinguishes the relevance of different associated relationships according to changes in expression levels of disease-related miRNAs, miRNA similarity information, and disease similarity information. Then, HHAWMD assigns edge weights and attribute features according to the association level to construct an association-weighted heterogeneous graph. Next, HHAWMD extracts the subgraph of the miRNA-disease node pair from the heterogeneous graph and builds the hyperedge (a kind of virtual edge) between the node pair to generate the hypergraph. Finally, HHAWMD proposes a hierarchical hypergraph learning approach, including node-aware attention and hyperedge-aware attention, which aggregates the abundant semantic information contained in deep and shallow neighborhoods to the hyperedge in the hypergraph. Our experiment results suggest that HHAWMD has better performance and can be used as a powerful tool for miRNA-disease association identification. The source code and data of HHAWMD are available at https://github.com/ningq669/HHAWMD/.

LHPre: Phage Host Prediction with VAE-based Class Imbalance Correction and Lyase Sequence Embedding

Wang J, Yu Z and Li J

The escalation of antibiotic resistance underscores the need for innovative approaches to combat bacterial infections. Phage therapy has emerged as a promising solution, wherein host determination plays an important role. Phage lysins, characterized by their specificity in targeting and cleaving corresponding host bacteria, serve as key players in this paradigm. In this study, we present a novel approach that leverages genes of phage-encoded lytic enzymes for host prediction, culminating in the development of LHPre. First, gene fragments of phage-encoded lytic enzymes and their respective hosts were collected from the database. Second, DNA sequences were encoded using the Frequency Chaos Game Representation (FCGR) method, and pseudo samples were generated with a Variational Autoencoder (VAE) model to address class imbalance. Finally, a prediction model was constructed using the Vision Transformer (ViT) model. Five-fold cross-validation results demonstrated that LHPre surpassed other state-of-the-art phage host prediction methods, achieving accuracies of 85.04%, 90.01%, and 93.39% at the species, genus, and family levels, respectively.

circ2DGNN: circRNA-disease Association Prediction via Transformer-based Graph Neural Network

Cen K, Xing Z, Wang X, Wang Y and Li J

Investigating the associations between circRNAs and diseases is vital for comprehending the underlying mechanisms of diseases and formulating effective therapies. Computational prediction methods often rely solely on known circRNA-disease data, indirectly incorporating other biomolecules' effects by computing circRNA and disease similarities based on these molecules. However, this approach is limited, as other biomolecules also play significant roles in circRNA-disease interactions. To address this, we construct a comprehensive heterogeneous network incorporating data on human circRNAs, diseases, and other biomolecule interactions to develop a novel computational model, circ2DGNN, which is built upon a heterogeneous graph neural network. circ2DGNN directly takes heterogeneous networks as inputs and obtains the embedded representation of each node for downstream link prediction through graph representation learning. circ2DGNN employs a Transformer-like architecture, which computes a heterogeneous attention score for each edge and performs message propagation and aggregation, using a residual connection to enhance the representation vector. It applies the same parameter matrix only to identical meta-relationships, reflecting diverse parameter spaces for different relationship types. After fine-tuning hyperparameters via five-fold cross-validation, evaluation on a test dataset shows that circ2DGNN outperforms existing state-of-the-art (SOTA) methods.

GrapHiC: An integrative graph based approach for imputing missing Hi-C reads

Murtaza G, Wagner J, Zook JM and Singh R

Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types. Availability: https://github.com/rsinghlab/GrapHiC.

De Novo Drug Design by Multi-Objective Path Consistency Learning with Beam A∗ Search

Zhao D, Zhou J, Tu S and Xu L

Generating high-quality and drug-like molecules from scratch within the expansive chemical space presents a significant challenge in the field of drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to generate molecules with multiple desired properties iteratively. The immediate reward was defined as the evaluation of intermediate-state molecules at each step, and the learning objective would be maximizing the expected cumulative evaluation scores for all molecules along the generative path. However, this definition of the reward was misleading, as in reality, the optimization target should be the evaluation score of only the final generated molecule. Furthermore, in previous works, randomness was introduced into the decision-making process, enabling the generation of diverse molecules but no longer pursuing the maximum future rewards. In this paper, immediate reward is defined as the improvement achieved through the modification of the molecule to maximize the evaluation score of the final generated molecule exclusively. Originating from the A∗ search, path consistency (PC), i.e., f values on one optimal path should be identical, is employed as the objective function in the update of the f value estimator to train a multi-objective de novo drug designer. By incorporating the f value into the decision-making process of beam search, the DrugBA∗ algorithm is proposed to enable the large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial enhancement over the state-of-the-art algorithm QADD in multiple molecular properties of the generated molecules.
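
Path consistency can be made concrete with a small sketch. In A∗-style search, f(n) = g(n) + h(n), where g is the amount accumulated up to node n and h is the estimate of what remains; on an optimal path with a perfect estimator, every node has the same f. The loss below (mean squared deviation of f from its path average) is one plausible way to penalize inconsistency and is an assumption, not the objective used in the paper; the function names and numbers are hypothetical.

```python
def f_values(step_values, remaining_estimates):
    """f(n) = g(n) + h(n) per node; g is the running sum of step values."""
    g = 0.0
    values = []
    for step, h in zip(step_values, remaining_estimates):
        g += step
        values.append(g + h)
    return values

def path_consistency_loss(values):
    """Mean squared deviation of the f values from their average along one path."""
    mean_f = sum(values) / len(values)
    return sum((v - mean_f) ** 2 for v in values) / len(values)

# Made-up path: a perfect estimator gives identical f values and zero loss.
consistent = f_values([0.0, 1.0, 2.0], [3.0, 2.0, 0.0])    # f = [3, 3, 3]
inconsistent = f_values([0.0, 1.0, 2.0], [3.0, 1.0, 0.0])  # f = [3, 2, 3]
```

Minimizing such a loss over sampled paths pushes the estimator towards f values that agree along each path, which is the property the abstract describes.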

Detecting Boolean Asymmetric Relationships with a Loop Counting Technique and its Implications for Analyzing Heterogeneity within Gene Expression Datasets

Zhou H, Lin W, Labra SR, Lipton SA, Elman JA, Schork NJ and Rangan AV

Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper, we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BAR-biclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric and boolean-symmetric signals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.
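
To show what an 'if-then' relationship looks like in binarized expression data, the sketch below counts samples in each of the four (gene a, gene b) quadrants. The relationship "if a is on then b is on" is violated only by samples with a on and b off, so that quadrant should be nearly empty in one direction but not the other. This is a plain contingency count for illustration, not the loop counting technique of the paper; the names and data are hypothetical.

```python
def quadrant_counts(a, b):
    """Count samples in each (a, b) quadrant for two binarized expression vectors."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for x, y in zip(a, b):
        counts[(int(bool(x)), int(bool(y)))] += 1
    return counts

def implication_violation_rate(a, b):
    """Fraction of samples with a on and b off, i.e. violating 'if a then b'."""
    counts = quadrant_counts(a, b)
    total = sum(counts.values())
    return counts[(1, 0)] / total if total else 0.0

# Made-up binarized expression of two genes across six cells.
gene_a = [1, 1, 0, 0, 0, 1]
gene_b = [1, 1, 1, 0, 1, 1]
forward = implication_violation_rate(gene_a, gene_b)   # 'if a then b'
backward = implication_violation_rate(gene_b, gene_a)  # 'if b then a'
```

Here 'if a then b' is never violated while 'if b then a' is violated in two of six cells, which is exactly the asymmetry that a correlation coefficient would not separate.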

Orientation Determination of Cryo-EM Projection Images Using Reliable Common Lines and Spherical Embeddings

Wang X, Jin Q, Zou L, Lin X and Lu Y

Three-dimensional (3D) reconstruction in single-particle cryo-electron microscopy (cryo-EM) is a critical technique for recovering and studying the fine 3D structure of proteins and other biological macromolecules, where the primary issue is to determine the orientations of projection images with high levels of noise. This paper proposes a method to determine the orientations of cryo-EM projection images using reliable common lines and spherical embeddings. First, the reliability of common lines between projection images is evaluated using a weighted voting algorithm based on an iterative improvement technique and binarized weighting. Then, the reliable common lines are used to calculate the normal vectors and local X-axis vectors of projection images after two spherical embeddings. Finally, the orientations of projection images are determined by aligning the results of the two spherical embeddings using an orthogonal constraint. Experimental results on both synthetic and real cryo-EM projection image datasets demonstrate that the proposed method can achieve higher accuracy in estimating the orientations of projection images and higher resolution in reconstructing preliminary 3D structures than some common line-based methods, indicating that the proposed method is effective in single-particle cryo-EM 3D reconstruction.

Discriminative Domain Adaption Network for Simultaneously Removing Batch Effects and Annotating Cell Types in Single-Cell RNA-Seq

Zhu Q, Li A, Zhang Z, Zheng C, Zhao J, Liu JX, Zhang D and Shao W

Machine learning techniques have become increasingly important in analyzing single-cell RNA-seq (scRNA-seq) data and identifying cell types, providing valuable insights into cellular development and disease mechanisms. However, the presence of batch effects poses major challenges in scRNA-seq analysis due to data distribution variation across batches. Although several batch effect mitigation algorithms have been proposed, most of them focus only on the correlation of local structure embeddings, ignoring global distribution matching and discriminative feature representation in batch correction. In this paper, we propose the discriminative domain adaption network (D2AN) for joint batch effect correction and cell type annotation with single-cell RNA-seq. Specifically, we first capture the global low-dimensional embeddings of samples from the source and target domains using an adversarial domain adaption strategy. Second, a contrastive loss is developed to preliminarily align the source domain samples. Moreover, the semantic alignment of class centroids in the source and target domains is achieved for further local alignment. Finally, a self-paced learning mechanism based on inter-domain loss is adopted to gradually select samples with high similarity to the target domain for training, which improves the robustness of the model. Experimental results on multiple real datasets demonstrate that the proposed method outperforms several state-of-the-art methods.

RFLP-inator: interactive web platform for in silico simulation and complementary tools of the PCR-RFLP technique

Bedoya Benites KA and Garcia-Quispes WA

Polymerase chain reaction - Restriction Fragment Length Polymorphism (PCR-RFLP) is an established molecular biology technique leveraging DNA sequence variability for organism identification, genetic disease detection, biodiversity analysis, etc. Traditional PCR-RFLP requires wet-laboratory procedures that can result in technical errors, procedural challenges, and financial costs. With the aim of providing an accessible and efficient PCR-RFLP technique complement, we introduce RFLP-inator. This is a comprehensive web-based platform developed in R using the package Shiny, which simulates the PCR-RFLP technique, integrates analysis capabilities, and offers complementary tools for both pre- and post-evaluation of in vitro results. We developed the RFLP-inator's algorithm independently and our platform offers seven dynamic tools: RFLP simulator, Pattern identifier, Enzyme selector, RFLP analyzer, Multiplex PCR, Restriction map maker, and Gel plotter. Moreover, the software includes a restriction pattern database of more than 250,000 sequences of the bacterial 16S rRNA gene. We successfully validated the core tools against published research findings. This new platform is open access and user-friendly, offering a valuable resource for researchers, educators, and students specializing in molecular genetics. RFLP-inator not only streamlines RFLP technique application but also supports pedagogical efforts in genetics, illustrating its utility and reliability. The software is available for free at https://kodebio.shinyapps.io/RFLP-inator/.
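
The core of an in silico RFLP simulation is a restriction digest: locate every recognition site in an amplicon and report the resulting fragment lengths. The sketch below is a minimal illustration in Python and is not the RFLP-inator algorithm (the platform itself is written in R); it assumes a linear sequence, searches only the given strand, and uses EcoRI (recognition site GAATTC, cut after the first G) as the example enzyme. The function name and test sequence are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the cut.
    """
    seq, site = sequence.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(cut - previous)
        previous = cut
    fragments.append(len(seq) - previous)  # remainder after the last cut
    return fragments

# EcoRI cuts G^AATTC, so the cut lies one base into the site.
lengths = digest("AAGAATTCTT", "GAATTC", 1)
band_pattern = sorted(lengths, reverse=True)  # order as bands would appear on a gel
```

Comparing such band patterns across sequences is what allows a restriction pattern to distinguish one organism or allele from another.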

ESGC-MDA: Identifying miRNA-disease associations using enhanced Simple Graph Convolutional Networks

Bi X, Jiang C, Yan C, Zhao K, Zhang L and Wang J

MiRNAs play an important role in the occurrence and development of human disease. Identifying potential miRNA-disease associations is valuable for disease diagnosis and treatment. Therefore, efficient computational methods for predicting potential miRNA-disease associations are urgently needed to reduce the cost and time associated with biological wet experiments. In addition, high-quality feature representation remains a challenge for miRNA-disease association prediction using graph neural network methods. In this paper, we propose a method named ESGC-MDA, which employs an enhanced Simple Graph Convolution Network to identify miRNA-disease associations. We first construct a bipartite attributed graph for miRNAs and diseases by computing multi-source similarity. Then, we enhance the feature representations of miRNA and disease nodes by applying two strategies in the simple convolution network: randomly dropping messages during propagation, so that the model learns more reliable feature representations, and using adaptive weighting to aggregate features from different layers. Finally, we calculate the prediction scores of miRNA-disease pairs using a fully connected neural network decoder. We conduct 5-fold cross-validation and 10-fold cross-validation on HMDD v2.0 and HMDD v3.2, respectively, and ESGC-MDA achieves better performance than state-of-the-art baseline methods. Case studies for cardiovascular disease, lung cancer, and colon cancer further confirm the effectiveness of ESGC-MDA. The source code is available at https://github.com/bixuehua/ESGC-MDA.
