DATA & KNOWLEDGE ENGINEERING

A new approach to COVID-19 data mining: A deep spatial-temporal prediction model based on tree structure for traffic revitalization index
Lv Z, Wang X, Cheng Z, Li J, Li H and Xu Z
The COVID-19 epidemic has had an enormous impact on a global scale, affecting almost every industry. In early 2020, the Chinese government enacted a series of policies restricting the transportation industry in order to slow the spread of the virus. As the epidemic was gradually brought under control and confirmed cases declined, the Chinese transportation industry gradually recovered. The traffic revitalization index is the main indicator for evaluating the degree to which urban transportation has recovered from the epidemic. Research on predicting the traffic revitalization index can help government departments understand the state of urban traffic at the macro level and formulate relevant policies. This study therefore proposes a deep spatial-temporal prediction model based on a tree structure for the traffic revitalization index. The model comprises a spatial convolution module, a temporal convolution module and a matrix data fusion module. The spatial convolution module builds a tree convolution process on a tree structure that captures the directional and hierarchical features of urban nodes. The temporal convolution module constructs a deep network with a multi-layer residual structure to capture the temporal dependencies in the data. The matrix data fusion module performs multi-scale fusion of COVID-19 epidemic data and traffic revitalization index data to further improve prediction accuracy. Experiments on real datasets compare our model with multiple baseline models. The results show average improvements of 21%, 18%, and 23% in the MAE, RMSE and MAPE metrics, respectively.
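The abstract gives no code, but the bottom-up tree convolution idea can be made concrete with a minimal PyTorch sketch. All class and variable names below are our own hypothetical choices, not the authors' implementation:

    import torch
    import torch.nn as nn

    class TreeConv(nn.Module):
        """Bottom-up tree convolution: each urban node aggregates its
        children's features, preserving direction (child -> parent) and
        hierarchy. An illustrative sketch, not the authors' exact layer."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.w_self = nn.Linear(in_dim, out_dim)   # node's own features
            self.w_child = nn.Linear(in_dim, out_dim)  # aggregated children

        def forward(self, feats, children):
            # feats: (num_nodes, in_dim); children: dict node -> list of child ids
            out = self.w_self(feats)
            for node, kids in children.items():
                if kids:
                    out[node] = out[node] + self.w_child(feats[kids].mean(dim=0))
            return torch.relu(out)

Stacking such layers from the leaves toward the root would let hierarchy information flow upward before the temporal module processes the sequence.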
An automated multi-web platform voting framework to predict misleading information proliferated during COVID-19 outbreak using ensemble method
Varshney D and Vishwakarma DK
The spread of misleading information on social web platforms has fuelled massive panic and confusion among the public regarding the coronavirus disease, so detecting such misinformation is of paramount importance. Previous studies mainly relied on a single web platform to collect the evidence needed to detect fake content. Our analysis shows that retrieving clues from two or more sources/web platforms yields more reliable predictions and greater confidence concerning a specific claim. This study proposes a novel multi-web-platform voting framework that incorporates four sets of novel features: content, linguistic, similarity, and sentiment. The features are gathered from each web platform to validate the news. To validate a fact/claim, a unified source platform is designed that collects relevant clues/headlines from two web platforms (YouTube and Google) based on specific queries and extracts features for each clue/headline. The proposed platform is intended to assist researchers in gathering relevant and vital evidence from diverse web platforms. After evaluation and validation, the built model proves effective, gives promising results, and reliably predicts misleading information, correctly detecting about 98% of the COVID misinformation in the Constraint COVID-19 fake news dataset. The results further confirm that gathering clues from multiple web platforms yields more reliable predictions when validating news. The work has numerous practical applications for health policy-makers and practitioners in safeguarding society and raising awareness of misleading information disseminated during this pandemic.
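The voting step can be sketched as one classifier per platform whose predictions are combined by majority vote. This is a simplified stand-in for the paper's ensemble, assuming per-claim feature vectors (content, linguistic, similarity, sentiment) have already been extracted for each platform; all function names are hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_platform_models(features_by_platform, labels):
        """Fit one classifier per platform's feature matrix (one row of
        stacked features per claim)."""
        return {platform: LogisticRegression(max_iter=1000).fit(X, labels)
                for platform, X in features_by_platform.items()}

    def vote(models, claim_features):
        """Majority vote across per-platform predictions for one claim;
        returns 1 for misleading, 0 for genuine."""
        preds = [models[p].predict(x.reshape(1, -1))[0]
                 for p, x in claim_features.items()]
        return int(np.mean(preds) >= 0.5)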
Deep learning in the COVID-19 epidemic: A deep model for urban traffic revitalization index
Lv Z, Li J, Dong C, Li H and Xu Z
Research on the traffic revitalization index can support the formulation and adjustment of policies related to urban management, epidemic prevention and the resumption of work and production. This paper proposes a deep model for predicting the urban Traffic Revitalization Index (DeepTRI). DeepTRI models COVID-19 epidemic data and traffic revitalization index data for major cities in China, with the locations of 29 cities forming the topological structure of the graph. The Spatial Convolution Layer proposed in this paper captures the spatial correlation features of the graph structure. A dedicated Graph Data Fusion module distributes and fuses the two kinds of data in different proportions to strengthen the spatial correlation trend in the data. To reduce computational complexity, the Temporal Convolution Layer replaces the gated recursive mechanism of traditional recurrent neural networks with a multi-level residual structure. It uses dilated convolution, whose dilation factor grows according to a convex function, to control the dynamic change of the receptive field, and causal convolution to fully mine the historical information in the data and optimize long-term prediction. Comparative experiments against three baselines (a traditional recurrent neural network, an ordinary spatial-temporal model and a graph spatial-temporal model) show the advantages of DeepTRI on the evaluation metrics and in resolving two under-fitting problems (under-fitting of edge values and under-fitting of local peaks).
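A dilated causal convolution stack with residual connections, as described here, can be sketched in a few lines of PyTorch. The abstract does not specify the convex dilation schedule, so d(i) = i^2 + 1 below is our own illustrative assumption:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedCausalStack(nn.Module):
        """Stack of causal convolutions whose dilation grows with a convex
        function of depth, with a residual connection around each layer."""
        def __init__(self, channels, kernel_size=2, depth=4):
            super().__init__()
            self.layers = nn.ModuleList()
            self.pads = []
            for i in range(depth):
                d = i * i + 1  # convex growth of the receptive field (assumed)
                self.pads.append((kernel_size - 1) * d)
                self.layers.append(
                    nn.Conv1d(channels, channels, kernel_size, dilation=d))

        def forward(self, x):                  # x: (batch, channels, time)
            for pad, conv in zip(self.pads, self.layers):
                h = conv(F.pad(x, (pad, 0)))   # left-only padding keeps it causal
                x = x + torch.relu(h)          # multi-level residual structure
            return x

Left-only padding guarantees that the output at time t depends only on inputs up to t, which is what lets the causal convolution "fully mine" history without leaking future values.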
Leveraging output term co-occurrence frequencies and latent associations in predicting medical subject headings
Kavuluru R and Lu Y
Trained indexers at the National Library of Medicine (NLM) manually tag each biomedical abstract with the most suitable terms from the Medical Subject Headings (MeSH) terminology so that it can be indexed by the PubMed information system. MeSH has over 26,000 terms, and indexers consult each article's full text when assigning them. Recent automated attempts have focused on using the article title and abstract text to identify MeSH terms for the corresponding article, mostly with supervised machine learning techniques trained on already indexed articles and their MeSH terms. In this paper, we present a new indexing approach that leverages term co-occurrence frequencies and latent term associations computed from the MeSH term sets of nearly 18 million articles already indexed at NLM. The main goal of our study is to gauge the potential of output label co-occurrences, latent associations, and relationships extracted from free text in both unsupervised and supervised indexing approaches. Using a novel and purely unsupervised approach, we achieve a micro-F-score comparable to those obtained with supervised machine learning techniques. By incorporating term co-occurrence and latent association features into a supervised learning framework, we also improve on the best results published on two public datasets.
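The core unsupervised signal, output-label co-occurrence, can be illustrated with a short sketch: count how often MeSH terms are assigned together, then rank candidate terms by their normalized co-occurrence with a set of seed terms. The scoring rule and function names are our own simplified assumptions, not the paper's exact method:

    from collections import Counter
    from itertools import combinations

    def cooccurrence_counts(indexed_term_sets):
        """Count how often each MeSH term, and each pair of terms,
        appears across already indexed articles."""
        pair_counts, term_counts = Counter(), Counter()
        for terms in indexed_term_sets:
            term_counts.update(terms)
            pair_counts.update(frozenset(p)
                               for p in combinations(sorted(terms), 2))
        return pair_counts, term_counts

    def rank_candidates(seed_terms, candidates, pair_counts, term_counts):
        """Score each candidate by its normalized co-occurrence with the
        seed terms already suggested for the article."""
        def score(c):
            return sum(pair_counts[frozenset((c, s))] / term_counts[s]
                       for s in seed_terms if term_counts[s])
        return sorted(candidates, key=score, reverse=True)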
Interaction mining and skill-dependent recommendations for multi-objective team composition
Dorn C, Skopik F, Schall D and Dustdar S
Web-based collaboration and virtual environments supported by various Web 2.0 concepts enable the application of numerous monitoring, mining and analysis tools to study human interactions and team formation processes. Composing an effective team requires a balance between adequate skill fulfillment and sufficient team connectivity. The underlying interaction structure reflects the social behavior and relations of individuals and determines to a large degree how well people can be expected to collaborate. In this paper we address an extended team formation problem that determines team connectivity not only from direct interactions but also from implicit recommendations of collaboration partners, which supports even sparsely connected networks. We provide two heuristics, based on Genetic Algorithms and Simulated Annealing, for discovering efficient team configurations that yield the best trade-off between skill coverage and team connectivity. A self-adjusting mechanism discovers the best combination of direct interactions and recommendations when deriving connectivity. We evaluate our approach on multiple configurations of a simulated collaboration network that closely resembles real-world expert networks, and demonstrate that our algorithm successfully identifies efficient team configurations even when up to 40% of experts are removed from various social network configurations.
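Of the two heuristics, the Simulated Annealing variant is the easier to sketch. The cost function below is a deliberately simplified stand-in for the paper's skill/connectivity trade-off (alpha, the cooling rate, and all names are our assumptions), and links is assumed to map each expert to the set of direct collaborators:

    import math, random

    def team_cost(team, skills_of, required, links, alpha=0.5):
        """Weighted trade-off between uncovered skills and missing links."""
        covered = set().union(*(skills_of[m] for m in team))
        skill_gap = len(required - covered) / len(required)
        pairs = [(a, b) for a in team for b in team if a < b]
        missing = sum(1 for a, b in pairs if b not in links.get(a, set()))
        conn_gap = missing / len(pairs) if pairs else 0.0
        return alpha * skill_gap + (1 - alpha) * conn_gap

    def anneal(experts, skills_of, required, links, size=5, steps=2000):
        """Simulated annealing: swap one member for an outsider, accepting
        worse teams with a probability that shrinks as the system cools."""
        team = set(random.sample(experts, size))
        cost = team_cost(team, skills_of, required, links)
        best, best_cost, temp = set(team), cost, 1.0
        for _ in range(steps):
            swap_in = random.choice(experts)
            if swap_in in team:
                continue
            candidate = (team - {random.choice(tuple(team))}) | {swap_in}
            c = team_cost(candidate, skills_of, required, links)
            if c < cost or random.random() < math.exp((cost - c) / temp):
                team, cost = candidate, c
                if cost < best_cost:
                    best, best_cost = set(team), cost
            temp *= 0.999  # geometric cooling schedule
        return best, best_cost

Incorporating the paper's implicit recommendations would amount to adding recommended partners to links with a tunable weight, which is exactly the combination their self-adjusting mechanism searches over.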
Extracting hot spots of topics from time-stamped documents
Chen W and Chundi P
Identifying time periods with a burst of activity related to a topic is an important problem in analyzing time-stamped documents. In this paper, we propose an approach to extract a hot spot of a given topic in a time-stamped document set. Topics can be basic, consisting of a simple list of keywords, or complex, built from basic topics using the logical relationships and, or, and not. A presence measure of a topic, based on fuzzy set theory, is introduced to compute the amount of information related to the topic in the document set. Each interval in the time period of the document set is assigned a numeric value which we call the discrepancy score. A high discrepancy score indicates that the documents in the time interval are more focused on the topic than those outside of it. A hot spot of a given topic is defined as a time interval with the highest discrepancy score. We first describe a naive implementation for extracting hot spots, and then construct an algorithm called EHE (Efficient Hot Spot Extraction) that uses several strategies to improve performance. We also introduce the notion of a topic DAG to facilitate efficient computation of presence measures of complex topics. The proposed approach is illustrated by several experiments on a subset of the TDT-Pilot Corpus and the DBLP conference dataset. The experiments show that the proposed EHE algorithm significantly outperforms the naive one, and that the extracted hot spots of given topics are meaningful.
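The naive implementation can be sketched directly: score each document's topic presence, then scan all intervals for the one most focused on the topic. The crisp keyword-fraction presence measure and the inside-minus-outside discrepancy below are illustrative stand-ins for the paper's fuzzy presence measure and scoring function:

    def presence(doc_text, keywords):
        """Fraction of topic keywords appearing in a document (a crude
        stand-in for the paper's fuzzy presence measure)."""
        words = set(doc_text.lower().split())
        return sum(1 for k in keywords if k in words) / len(keywords)

    def hot_spot(docs, keywords):
        """Naive O(n^2) scan over a list of docs sorted by timestamp:
        return the interval [i, j] whose documents are most focused on
        the topic relative to the documents outside it."""
        scores = [presence(d, keywords) for d in docs]
        total, n = sum(scores), len(scores)
        best, best_d = (0, n - 1), float("-inf")
        for i in range(n):
            run = 0.0
            for j in range(i, n):
                run += scores[j]
                width = j - i + 1
                inside = run / width
                outside = (total - run) / (n - width) if n > width else 0.0
                d = inside - outside
                if d > best_d:
                    best, best_d = (i, j), d
        return best, best_d

The quadratic number of intervals scanned here is precisely the cost that the EHE strategies and the topic DAG are designed to avoid.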