JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY

Feature-Preserving Mesh Denoising via Anisotropic Surface Fitting
Wang J and Yu Z
In this paper we propose a robust surface mesh denoising method that effectively removes noise while faithfully preserving sharp features. The method combines surface fitting and projection techniques. Sharp features are preserved in the surface fitting algorithm by considering an anisotropic neighborhood of each vertex, detected via a normal-weighted distance. In addition, to handle meshes with a high level of noise, we pre-filter the surface normals prior to the neighborhood search. A number of experimental results and comparisons demonstrate the excellent performance of our method in preserving important surface geometry while filtering mesh noise.
New Generations: Sequencing Machines and Their Computational Challenges
Schwartz DC and Waterman MS
New generation sequencing systems are changing how molecular biology is practiced. The widely promoted $1000 genome will be a reality, with attendant changes for healthcare, including personalized medicine. More broadly, the genomes of many new organisms, along with large samplings from populations, will be commonplace. What is less appreciated is the explosive demand on computation, both for CPU cycles and storage, as well as the need for new computational methods. In this article we survey some of these developments and demands.
Computational Cellular Dynamics Based on the Chemical Master Equation: A Challenge for Understanding Complexity
Liang J and Qian H
Modern molecular biology has always been a great source of inspiration for computational science. Half a century ago, the challenge of understanding macromolecular dynamics led the way for computation to become part of the tool set for studying molecular biology. Twenty-five years ago, the demands of genome science inspired an entire generation of computer scientists with an interest in discrete mathematics to join the field that is now called bioinformatics. In this paper, we shall lay out a new mathematical theory for the dynamics of biochemical reaction systems in a small volume (i.e., mesoscopic), in terms of a stochastic, discrete-state, continuous-time formulation called the chemical master equation (CME). Similar to the wavefunction in quantum mechanics, the dynamically changing probability landscape associated with the state space provides a fundamental characterization of the biochemical reaction system. The stochastic trajectories of the dynamics are best known through simulations using the Gillespie algorithm. In contrast to the Metropolis algorithm, this Monte Carlo sampling technique does not follow a process with detailed balance. We shall show several examples of how CMEs are used to model cellular biochemical systems. We shall also illustrate the computational challenges involved: multiscale phenomena, the interplay between stochasticity and nonlinearity, and how macroscopic determinism arises from mesoscopic dynamics. We point out recent advances in computing solutions to the CME, including exact solution of the steady-state landscape and stochastic differential equations that offer alternatives to the Gillespie algorithm. We argue that the CME is an ideal system from which one can learn to understand "complex behavior" and complexity theory, and from which important biological insight can be gained.
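The Gillespie algorithm mentioned above is simple to state: given propensities a_j(x), draw the waiting time from an exponential with total rate a0 = Σ a_j, then fire reaction j with probability a_j/a0. A minimal Python sketch of one such CME trajectory follows; the birth-death example, rate constants, and function names are illustrative, not taken from the paper:

```python
import random

def gillespie_ssa(x, propensities, stoich, t_max, rng=random.Random(0)):
    """Minimal Gillespie stochastic simulation of one CME trajectory.

    x            -- list of current molecule counts
    propensities -- list of functions a_j(x) giving reaction rates
    stoich       -- list of state-change vectors nu_j
    """
    t, trajectory = 0.0, [(0.0, list(x))]
    while t < t_max:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 == 0:                      # no reaction can fire anymore
            break
        t += rng.expovariate(a0)         # waiting time ~ Exp(a0)
        r = rng.uniform(0, a0)           # pick reaction j with prob a_j/a0
        for j, aj in enumerate(a):
            r -= aj
            if r <= 0:
                break
        x = [xi + d for xi, d in zip(x, stoich[j])]
        trajectory.append((t, list(x)))
    return trajectory

# toy birth-death process: 0 -> A at rate 2.0, A -> 0 at rate 0.1 * #A
traj = gillespie_ssa(
    x=[10],
    propensities=[lambda s: 2.0, lambda s: 0.1 * s[0]],
    stoich=[[1], [-1]],
    t_max=50.0,
)
```

Because each jump is sampled from the exact waiting-time distribution, the simulation follows the CME's probability landscape without any time-discretization error, at the cost of one event per loop iteration.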
Metagenomics: Facts and Artifacts, and Computational Challenges
Wooley JC and Ye Y
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
Bio-molecule Surfaces Construction via a Higher-Order Level-Set Method
Bajaj CL, Xu GL and Zhang Q
We present a general framework for a higher-order spline level-set (HLS) method and apply it to the construction of bio-molecular surfaces. Starting from a first-order energy functional, we obtain a general level-set formulation of the geometric partial differential equation and provide an efficient approach to solving this equation using a C² spline basis. We also present a fast cubic spline interpolation algorithm based on convolution and the Z-transform, which exploits the local relationship of interpolatory cubic spline coefficients with respect to the given function data values. We demonstrate one example of our HLS method: the construction of bio-molecular surfaces (an implicit solvation interface) from their individual atomic coordinates and solvated radii.
Seg-CapNet: A Capsule-Based Neural Network for the Segmentation of Left Ventricle from Cardiac Magnetic Resonance Imaging
Cao YJ, Wu S, Liu C, Lin N, Wang Y, Yang C and Li J
Deep neural networks (DNNs) have been extensively studied in medical image segmentation. However, existing DNNs often need to train shape models for each object to be segmented, which may yield results that violate cardiac anatomical structure when segmenting cardiac magnetic resonance imaging (MRI). In this paper, we propose a capsule-based neural network, named Seg-CapNet, to model multiple regions simultaneously within a single training process. The Seg-CapNet model consists of an encoder and a decoder. The encoder transforms the input image into feature vectors that represent the objects to be segmented, using convolutional layers, capsule layers, and fully-connected layers. The decoder transforms the feature vectors into segmentation masks by up-sampling. Feature maps of each down-sampling layer in the encoder are connected to the corresponding up-sampling layers, which aids the backpropagation of the model. The output vectors of Seg-CapNet contain low-level image features such as grayscale and texture, as well as semantic features including the position and size of the objects, which benefits segmentation accuracy. The proposed model is validated on the open dataset of the Automated Cardiac Diagnosis Challenge 2017 (ACDC 2017) and the Sunnybrook Cardiac MRI Segmentation Challenge. Experimental results show that the mean Dice coefficient of Seg-CapNet is increased by 4.7% and the average Hausdorff distance is reduced by 22%. The proposed model also reduces the number of model parameters and improves the training speed while obtaining accurate segmentation of multiple regions.
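The two reported metrics are standard in segmentation work: the Dice coefficient measures region overlap, and the Hausdorff distance measures the worst-case contour deviation. A minimal pure-Python sketch of both (the binary-mask and 2-D point-set encodings here are illustrative conventions, not the challenge's evaluation code):

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two equal-length binary masks (1.0 = perfect)."""
    inter = sum(p and t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two 2-D point sets."""
    def d(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    def directed(xs, ys):           # farthest point of xs from the set ys
        return max(min(d(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))
```

A higher Dice and a lower Hausdorff distance together indicate both good overlap and well-placed boundaries, which is why the paper reports the pair.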
SMRI: A New Method for siRNA Design for COVID-19 Therapy
Chen MX, Zhu XD, Zhang H, Liu Z and Liu YN
First discovered in Wuhan, China, SARS-CoV-2 is a highly pathogenic novel coronavirus that rapidly spread globally and became a pandemic, with no vaccine and few distinctive clinical drugs available as of March 13th, 2020. Ribonucleic acid interference (RNAi), a gene-silencing technology that targets mRNA, can effectively damage RNA viruses. Here, we report a new efficient small interfering RNA (siRNA) design method, named the Simple Multiple Rules Intelligent Method (SMRI), as a new approach to the treatment of COVID-19. Specifically, this study proposes a new model, the Base Preference and Thermodynamic Characteristic (BPTC) model, which indicates siRNA silencing efficiency, and a new index, the siRNA Extended Rules (SER) index, built on the BPTC model. As part of the SMRI method, the SER index screens high-efficiency siRNAs and filters out those that are difficult to take effect or to synthesize, and it is more robust and efficient than traditional statistical indicators under the same circumstances. Furthermore, to silence the spike protein that SARS-CoV-2 uses to invade cells, this study applies the SMRI method to search for candidate high-efficiency siRNAs on the S gene of SARS-CoV-2. This study is among the early studies applying RNAi therapy to COVID-19 treatment. According to our analysis, the average predicted interference efficiency of the candidate siRNAs designed by the SMRI method is comparable to that of mainstream siRNA design algorithms. Moreover, the SMRI method ensures that the designed siRNAs have more than three base mismatches with human genes, thus avoiding silencing normal human genes, a safeguard not considered by other mainstream methods. In this way, we obtain five candidate high-efficiency siRNAs that are easy to take effect or to synthesize and are much safer for the human body, providing a new solution for the treatment of COVID-19 that is safer, requires a small dosage, and offers long efficacy.
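The mismatch safeguard has a simple algorithmic core: slide the candidate siRNA along a transcript and count position-wise base mismatches in every same-length window. The sketch below illustrates that idea only; the function names and the `min_mismatch` threshold are assumptions for illustration, not the SMRI implementation:

```python
def mismatches(a, b):
    """Number of position-wise base mismatches between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def is_off_target_safe(sirna, transcript, min_mismatch=3):
    """True if the siRNA differs from every same-length window of the
    transcript by at least `min_mismatch` bases (hypothetical filter
    mirroring the paper's base-mismatch safety rule)."""
    n = len(sirna)
    return all(mismatches(sirna, transcript[i:i + n]) >= min_mismatch
               for i in range(len(transcript) - n + 1))
```

In practice such a scan would run against the full set of human transcripts, discarding any candidate whose best match is too close to a human gene.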
Diagnosis of COVID-19 Pneumonia via a Novel Deep Learning Architecture
Zhang X, Lu S, Wang SH, Yu X, Wang SJ, Yao L, Pan Y and Zhang YD
COVID-19 is a contagious infection that has severe effects on the global economy and our daily life. Accurate diagnosis of COVID-19 is of importance for consultants, patients, and radiologists. In this study, we use the deep learning network AlexNet as the backbone and enhance it in the following two aspects: 1) adding batch normalization to help accelerate the training and reduce the internal covariate shift; 2) replacing the fully connected layer in AlexNet with three classifiers: SNN, ELM, and RVFL. We thus obtain three novel models from the deep COVID network (DC-Net) framework, named DC-Net-S, DC-Net-E, and DC-Net-R, respectively. After comparison, we find that the proposed DC-Net-R achieves an average accuracy of 90.91% on a private dataset (available upon email request) comprising 296 images, while its specificity reaches 96.13%, the best performance among the three proposed classifiers. In addition, we show that DC-Net-R also performs much better than other existing algorithms in the literature.
Towards Exploring Large Molecular Space: An Efficient Chemical Genetic Algorithm
Zhu JF, Hao ZK, Liu Q, Yin Y, Lu CQ, Huang ZY and Chen EH
Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient method may have a positive impact on finding drugs to treat diseases like COVID-19. Data mining and artificial intelligence may be good ways to find such a method. Recently, both generative models based on deep learning and work based on genetic algorithms have made progress in generating molecules and optimizing their properties. However, existing methods still need improvement in efficiency and performance. To address these problems, we propose a method named the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called the molecular matrix. We then design corresponding crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks related to molecular property optimization and constrained molecular optimization. The results show that our approach outperforms state-of-the-art deep learning and genetic algorithm methods; z-tests performed on the results of several experiments show with more than 99% confidence that our method's improvement is significant. At the same time, based on the experimental results, we point out shortcomings in the experimental evaluation standard that affect the fair evaluation of previous work.
Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing
Sun S, Ma S, Song JH, Yue WH, Lin XL and Ma T
With the advance of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate in the journal Science on the reidentifiability of individuals: the main finding of Sánchez et al. is exactly opposite to that of De Montjoye et al., which raises the first question: "what is the true situation of the privacy preservation for trajectories in terms of reidentification?" Furthermore, it is known that anonymization typically causes a decline in data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "what is the true situation of the utility of anonymized trajectories?" To answer these two questions, we conduct a systematic experimental study using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity, and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluations and analyses of anonymized trajectories on individual privacy in terms of unicity and on utility in terms of practical applications.
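The unicity measure at the heart of the reidentification debate can be estimated directly: sample p points from a trajectory and count how many trajectories in the dataset contain all of them; a count of one means that individual is re-identifiable. A small illustrative sketch, where trajectories are sets of location points and the sampling scheme and `trials` default are assumptions of this example, not the paper's protocol:

```python
import random

def unicity(trajectories, p, trials=200, rng=random.Random(7)):
    """Estimated unicity: fraction of sampled trajectories uniquely pinned
    down by p of their own points (an illustrative reidentification metric
    in the spirit of the De Montjoye et al. / Sánchez et al. debate)."""
    unique = 0
    for _ in range(trials):
        traj = rng.choice(trajectories)
        points = set(rng.sample(sorted(traj), min(p, len(traj))))
        matches = sum(1 for t in trajectories if points <= t)
        unique += (matches == 1)
    return unique / trials

# toy dataset: three trajectories as sets of (x, y) points
trajs = [{(0, 0), (1, 1)}, {(0, 0), (2, 2)}, {(5, 5), (6, 6)}]
```

Higher unicity means fewer points suffice to single a person out, so stronger anonymization is needed before publishing.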
Improving Friend Recommendation for Online Learning with Fine-Grained Evolving Interest
Shao MM, Jiang WJ, Wu J, Shi YQ, Yum T and Zhang J
Friend recommendation plays a key role in promoting user experience in online social networks (OSNs). However, existing studies usually neglect users' fine-grained interest as well as the evolving nature of interest, which may lead to unsuitable recommendations. In particular, some OSNs, such as the online learning community, have seen little work on friend recommendation. To this end, we strive to improve friend recommendation with fine-grained evolving interest in this paper. We take the online learning community, a special type of OSN in which people learn courses online, as our application scenario. Learning partners can improve learners' learning outcomes and the attractiveness of platforms. We propose a learning partner recommendation framework based on the evolution of fine-grained learning interest (LPRF-E for short). We extract a sequence of learning interest tags that changes over time. Then, we exploit the time feature to predict evolving learning interest. Next, we recommend learning partners by fine-grained interest similarity. We also refine the framework with users' social influence (denoted as LPRF-F for differentiation). Extensive experiments on two real datasets crawled from Chinese University MOOC and Douban Book validate that the proposed LPRF-E and LPRF-F models achieve high accuracy (i.e., approximately 50% improvements in precision and recall) and can recommend learning partners of high quality (e.g., more experienced and helpful).
I/O Efficient Early Bursting Cohesive Subgraph Discovery in Massive Temporal Networks
Li Y, Dai J, Fan XL, Zhao YH and Wang GR
Temporal networks are an effective way to encode temporal information into graph data losslessly. Finding the bursting cohesive subgraph (BCS), which accumulates its cohesiveness at the fastest rate, is an important problem in temporal networks. The BCS has many applications, such as representing emergency events in social media, traffic congestion in road networks, and epidemic outbreaks in communities. Nevertheless, existing methods require the BCS to last for a time interval, which neglects its timeliness. In this paper, we design an early bursting cohesive subgraph (EBCS) model based on the k-core to identify burstiness as early as possible. To find the EBCS, we first construct a time weight graph (TWG) to measure the bursting level by integrating topological and temporal information. Then, we propose a global search algorithm, called GS-EBCS, which finds the exact EBCS by iteratively removing nodes from the TWG. Further, we propose a local search algorithm, named LS-EBCS, which finds the EBCS by first expanding from a seed node until obtaining a candidate k-core and then refining the k-core to the result subgraph with optimal time complexity. Subsequently, considering that massive temporal networks may not fit completely in memory, we first design an I/O method to build the TWG and then develop I/O-efficient global and local search algorithms, namely I/O-GS and I/O-LS respectively, to find the EBCS under the semi-external model. Extensive experiments on four real temporal networks demonstrate the efficiency and effectiveness of our proposed algorithms. For example, on the DBLP dataset, I/O-LS and LS-EBCS have comparable running time, while the maximum memory usage of I/O-LS is only 6.5 MB, much smaller than the 308.7 MB used by LS-EBCS.
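The k-core underlying the EBCS model is a standard notion: the maximal subgraph in which every vertex has degree at least k, computed by iteratively peeling low-degree vertices. A minimal sketch of that peeling routine follows; it illustrates the building block only, not the GS-EBCS algorithm itself, which peels the time weight graph rather than a plain graph:

```python
from collections import defaultdict

def k_core(edges, k):
    """Vertex set of the k-core, found by iteratively deleting any vertex
    whose degree drops below k (classic peeling)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    queue = [v for v in adj if len(adj[v]) < k]   # initially under-degree
    while queue:
        v = queue.pop()
        for u in adj.pop(v, set()):               # remove v, update neighbors
            adj[u].discard(v)
            if len(adj[u]) < k:
                queue.append(u)
    return set(adj)

# toy graph: a triangle {1,2,3} with a pendant vertex 4
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```

Because every deletion can only lower neighboring degrees, each vertex is processed a constant number of times, giving linear time in the number of edges.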
HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
Guo J, Peng J, Yuan H and Ni LM
A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as handling outliers caused by stock suspensions in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as streaming algorithms, vectorized instruction sets, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. From MiB- to GiB-scale data, HXPY significantly outperforms other in-memory dataframe computing rivals, even by up to hundreds of times.
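The streaming-algorithm idea can be illustrated with a rolling mean: keep a running window sum so each new tick costs O(1), instead of re-summing the whole window on every update as a naive implementation would. A minimal Python sketch (the function name and the None-padding convention are assumptions of this example, not the HXPY API):

```python
from collections import deque

def rolling_mean(values, window):
    """Single-pass fixed-window rolling mean, O(1) per tick.

    Emits None until the first full window is available.
    """
    buf, total, out = deque(), 0.0, []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:       # evict the tick leaving the window
            total -= buf.popleft()
        out.append(total / window if len(buf) == window else None)
    return out
```

The same incremental-update pattern extends to rolling sums, variances, and many of the time window functions listed above, which is what makes streaming engines fast on tick-level data.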
Ubiquitous WiFi and Acoustic Sensing: Principles, Technologies, and Applications
Huang JL, Wang YS, Zou YP, Wu KS and Ni LM
With the increasing pervasiveness of mobile devices such as smartphones, smart TVs, and wearables, smart sensing, which transforms the physical world into digital information based on various sensing media, has drawn great attention from researchers. Among the different sensing media, WiFi and acoustic signals stand out due to their ubiquity and zero hardware cost. Based on different underlying principles, researchers have proposed various technologies for sensing applications with WiFi and acoustic signals, covering human activity recognition, motion tracking, indoor localization, health monitoring, and the like. To give readers a comprehensive understanding of ubiquitous wireless sensing, we survey existing work and introduce the underlying principles, proposed technologies, and practical applications. Besides, we also discuss some open issues of this research area. Our survey reveals that, as a promising research direction, WiFi and acoustic sensing technologies can enable compelling applications, but still face limitations in hardware restrictions, robustness, and applicability.