Future Generation Computer Systems-The International Journal of eScience

MAG-D: A multivariate attention network based approach for cloud workload forecasting
Patel YS and Bedi J
The coronavirus pandemic and the shift to working from home have drastically changed working styles and forced a rapid move towards cloud-based platforms and services for seamless functioning. The pandemic has accelerated a permanent shift in cloud migration. It is estimated that over 95% of digital workloads will reside in cloud-native platforms. Real-time workload forecasting and efficient resource management are two critical challenges for cloud service providers. Because cloud workloads are highly volatile and chaotic owing to their time-varying nature, classical machine learning-based prediction models fail to forecast them accurately. Recent advances in deep learning have gained massive popularity for forecasting highly nonlinear cloud workloads; however, they have not achieved excellent forecasting outcomes. Consequently, there is demand for more accurate forecasting algorithms. Therefore, in this work, we propose 'MAG-D', a Multivariate Attention and Gated recurrent unit based Deep learning approach for cloud workload forecasting in data centers. We performed an extensive set of experiments on the Google cluster traces, and we confirm that MAG-DL exploits the long-range nonlinear dependencies of cloud workload and improves the prediction accuracy on average compared to recent techniques applying hybrid methods using Long Short Term Memory Network (LSTM), Convolutional Neural Network (CNN), Gated Recurrent Units (GRU), and Bidirectional Long Short Term Memory Network (BiLSTM).
Exploratory Machine Learning Modeling of Adaptive and Maladaptive Personality Traits from Passively Sensed Behavior
Yan R, Ringwald WR, Hernandez JV, Kehl M, Bae SW, Dey AK, Low C, Wright AGC and Doryab A
Continuous passive sensing of daily behavior from mobile devices has the potential to identify behavioral patterns associated with different aspects of human characteristics. This paper presents novel analytic approaches to extract and understand these behavioral patterns and their impact on predicting adaptive and maladaptive personality traits. Our machine learning analysis extends previous research by showing that both adaptive and maladaptive traits are associated with passively sensed behavior providing initial evidence for the utility of this type of data to study personality and its pathology. The analysis also suggests directions for future confirmatory studies into the underlying behavior patterns that link adaptive and maladaptive variants consistent with contemporary models of personality pathology.
Scenario prediction of public health emergencies using infectious disease dynamics model and dynamic Bayes
Gao S and Wang H
This study aimed to assess the predictive value of the infectious disease dynamics model (IDD model) and the dynamic Bayesian network (DBN) for scenario deduction of public health emergencies (PHEs). Based on the evolution law of PHEs and the meta-scenario representation of basic knowledge, this study established a DBN scenario deduction model for scenario deduction and evolution path analysis of PHEs. At the same time, based on the mean-field dynamics model of the SIR network, a dimensionality reduction process was performed to calculate the epidemic scale and epidemic time based on the IDD model, so as to determine the calculation methods of the threshold value and epidemic time under emergency measures (quarantine). The Corona Virus Disease (COVID) epidemic was taken as an example to analyze the results of DBN scenario deduction, and the infectious disease dynamics model was used to analyze the reproduction number, peak arrival time, epidemic time, and latency time of the COVID epidemic. It was found that after the M1 measure was used to process the S1 state, the state probability and the probability of being true (T) were the highest, at 91.05 and 90.21, respectively. In the sixth stage of the development of the epidemic, the epidemic had developed to level 5, the number of infected people was about 26, and the estimated loss was about 220 million yuan. The comprehensive cumulative foreground (CF) values of the O1 to O3 schemes were -1.34, -1.21, and -0.77, respectively, and the final CF values were -1.35, 0.01, and -0.08, respectively. The final CF value of O2 was significantly higher than those of the other two options. The household infection probability was the highest, at 0.37 and 0.35 in Wuhan and China, respectively.
Under the measures of home quarantine, the numbers of confirmed cases of COVID in China and Wuhan were 1.503 (95% confidence interval (CI) = 1.328-1.518) and 1.729 (95% CI = 1.107-1.264), respectively, showing good fits with the real data. On the 21st day after the quarantine measures were taken, the number of COVID cases across the country showed an obvious peak, with 24495 confirmed cases, and the model prediction value was 24085 (95% CI = 23988-25056). The incubation period 1/q was shortened from 8 days to 3 days, and the number of confirmed cases showed an upward trend. The peak period of confirmed cases was advanced, shortening the overall epidemic time. These results showed that the predictions of scenario deduction based on DBN were basically consistent with the actual development scenario and development status of the epidemic. The relevant parameters of the infectious disease dynamics model could inform decisions for the prevention and control of COVID, which verified the rationality and feasibility of the scenario deduction method proposed in this study.
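The SIR dynamics and the quarantine threshold mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' mean-field network model: it is the textbook well-mixed SIR model stepped with forward Euler, and the names `simulate_sir`, `peak_day`, and `quarantine_factor`, as well as all parameter values, are illustrative assumptions rather than quantities taken from the study.

```python
def simulate_sir(population, infected0, beta, gamma, days,
                 quarantine_factor=1.0, dt=0.1):
    """Forward-Euler integration of the classic well-mixed SIR model.

    quarantine_factor scales the contact rate beta (1.0 means no quarantine).
    It is a hypothetical knob for illustration, not the paper's formulation.
    Returns a list of (S, I, R) tuples, one entry per day including day 0.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    steps_per_day = int(round(1 / dt))
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = quarantine_factor * beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


def peak_day(history):
    """Index of the day on which the infected compartment is largest."""
    return max(range(len(history)), key=lambda day: history[day][1])


if __name__ == "__main__":
    free = simulate_sir(1_000_000, 10, beta=0.5, gamma=0.2, days=200)
    quarantined = simulate_sir(1_000_000, 10, beta=0.5, gamma=0.2, days=200,
                               quarantine_factor=0.6)
    print("peak day without quarantine:", peak_day(free))
    print("peak day with quarantine:", peak_day(quarantined))
```

In this simplified model the epidemic grows only while `quarantine_factor * beta / gamma` exceeds 1, which is the kind of threshold condition the abstract refers to; the study's actual threshold expression may differ.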
Assessing vulnerability to psychological distress during the COVID-19 pandemic through the analysis of microblogging content
Viviani M, Crocamo C, Mazzola M, Bartoli F, Carrà G and Pasi G
In recent years we have witnessed a growing interest in the analysis of social media data under different perspectives, since these online platforms have become the preferred tool for generating and sharing content across different users organized into virtual communities, based on their common interests, needs, and perceptions. In the current study, by considering a collection of social textual contents related to COVID-19 gathered on the Twitter microblogging platform in the period between August and December 2020, we aimed at evaluating the possible effects of some critical factors related to the pandemic on the mental well-being of the population. In particular, we aimed at investigating potential lexicon identifiers of vulnerability to psychological distress in digital social interactions with respect to distinct COVID-related scenarios, which could be "at risk" from a psychological discomfort point of view. Such scenarios have been associated with peculiar topics discussed on Twitter. For this purpose, two approaches based on a "top-down" and a "bottom-up" strategy were adopted. In the top-down approach, three potential scenarios were initially selected by medical experts, and associated with topics extracted from the Twitter dataset in a hybrid unsupervised-supervised way. On the other hand, in the bottom-up approach, three topics were extracted in a totally unsupervised way capitalizing on a Twitter dataset filtered according to the presence of keywords related to vulnerability to psychological distress, and associated with at-risk scenarios. The identification of such scenarios with both approaches made it possible to capture and analyze the potential psychological vulnerability in critical situations.
Artificial intelligence-enabled Internet of Things-based system for COVID-19 screening using aerial thermal imaging
Barnawi A, Chhikara P, Tekchandani R, Kumar N and Alzahrani B
The Internet of Things (IoT) has recently become an influential research and analysis platform in a broad diversity of academic and industrial disciplines, particularly in healthcare. The IoT revolution is reshaping current healthcare practices by consolidating technological, economic, and social views. Since December 2019, the spread of COVID-19 across the world has impacted the world's economy. IoT technology integrated with Artificial Intelligence (AI) can help to address COVID-19. UAVs equipped with IoT devices can collect raw data that demands computing and analysis to make intelligent decisions without human intervention. To mitigate the effect of COVID-19, in this paper, we propose an IoT-UAV-based scheme to collect raw data using onboard thermal sensors. The thermal image captured by the thermal camera is used to identify people in the image (of a massive crowd in a city) who may have COVID-19, based on the temperature recorded. An efficient hybrid approach for a face recognition system is proposed to detect the people in the image having high body temperature from infrared images captured in a real-time scenario. Also, a face mask detection scheme is introduced, which detects whether a person is wearing a mask or not. The schemes' performance is evaluated using various machine learning and deep learning classifiers. We use the edge computing infrastructure (onboard sensors and actuators) for data processing to reduce the response time for real-time analytics and prediction. The proposed scheme has an average accuracy of 99.5% using various performance evaluation metrics, indicating its practical applicability in real-time scenarios.
An AI-enabled lightweight data fusion and load optimization approach for Internet of Things
Jan MA, Zakarya M, Khan M, Mastorakis S, Menon VG, Balasubramaniam V and Ur Rehman A
In densely populated Internet of Things (IoT) applications, the sensing ranges of the nodes may overlap frequently. In these applications, the nodes gather highly correlated and redundant data in their vicinity. Processing these data depletes the energy of the nodes, and their upstream transmission towards remote datacentres, in the fog infrastructure, may result in an unbalanced load at the network gateways and edge servers. Due to the heterogeneity of edge servers, a few of them might be overwhelmed while others remain underutilized. As a result, time-critical and delay-sensitive applications may experience excessive delays, packet loss, and degradation in their Quality of Service (QoS). To ensure QoS of IoT applications, in this paper, we eliminate correlation in the gathered data via a lightweight data fusion approach. The buffer of each node is partitioned into strata that broadcast only non-correlated data to edge servers via the network gateways. Furthermore, we propose a dynamic service migration technique to reconfigure the load across various edge servers. We formulate this as an optimization problem and use two meta-heuristic algorithms, along with a migration approach, to maintain an optimal Gateway-Edge configuration in the network. These algorithms monitor the load at each server, and once it surpasses a threshold value (which is dynamically computed with a simple machine learning method), an exhaustive search is performed for an optimal and balanced periodic reconfiguration. The experimental results of our approach justify its efficiency for large-scale and densely populated IoT applications.
Susceptible user search for defending opinion manipulation
Tang W, Tian L, Zheng X, Luo G and He Z
The development of cyberspace offers unprecedentedly convenient access to online communication, which in turn tempts malicious individuals to subtly manipulate user opinions for their own benefit. Such malicious manipulations usually target influential and susceptible users in order to mislead and control public opinion, posing numerous threats to public security. Therefore, an intelligent and efficient strategy for searching for targeted users is a prominent and critical approach to defending against malicious manipulation. However, most current studies either provide solutions under ideal scenarios or offer inefficient solutions without guaranteed performance. To address this, this work combines unsupervised learning and heuristic search to discover susceptible and key users for defense. We first propose a greedy algorithm that fully considers the susceptibilities of different users, then adopt unsupervised learning and exploit the community property to design an accelerated algorithm. Moreover, the approximation guarantees of both greedy and community-based algorithms are systematically analyzed for some practical circumstances. Extensive experiments on real-world datasets demonstrate that our algorithms significantly outperform the state-of-the-art algorithm.
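The idea of a susceptibility-aware greedy search can be sketched as follows. The abstract does not state the objective function, so this example assumes a susceptibility-weighted coverage objective: each candidate user "reaches" a set of users, and the goal is to pick k candidates whose combined reach has the largest total susceptibility. The function name `greedy_select` and the input structures are hypothetical and are not taken from the paper.

```python
def greedy_select(reach, susceptibility, k):
    """Greedily pick up to k users maximising susceptibility-weighted coverage.

    reach: dict mapping each candidate user to the set of users it reaches.
    susceptibility: dict mapping user -> non-negative weight.
    Both inputs and the objective are assumptions made for illustration.
    Returns (chosen users in selection order, set of covered users).
    """
    covered = set()
    chosen = []
    for _ in range(k):
        best_user, best_gain = None, 0.0
        # Iterate in sorted order so that ties are broken deterministically.
        for user in sorted(reach):
            if user in chosen:
                continue
            # Marginal gain: weight of the users not yet covered.
            gain = sum(susceptibility.get(v, 0.0) for v in reach[user] - covered)
            if gain > best_gain:
                best_user, best_gain = user, gain
        if best_user is None:
            break  # no remaining candidate adds any weight
        chosen.append(best_user)
        covered |= reach[best_user]
    return chosen, covered


if __name__ == "__main__":
    reach = {"a": {"a", "b", "c"}, "b": {"b", "d"}, "c": {"c"}, "d": {"d", "e"}}
    weights = {"a": 0.1, "b": 0.2, "c": 0.1, "d": 0.9, "e": 0.8}
    print(greedy_select(reach, weights, k=2))
```

Weighted coverage is monotone and submodular, so this kind of greedy selection carries the classical (1 - 1/e) approximation guarantee; whether the paper's guarantees take the same form is not stated in the abstract.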
Ontology based recommender system using social network data
Arafeh M, Ceravolo P, Mourad A, Damiani E and Bellini E
Online Social Networks (OSNs) are considered a key source of information for real-time decision making. However, several constraints reduce the amount of information that a researcher can obtain while increasing the time required by social network mining procedures. In this context, this paper proposes a new framework for sampling OSNs. Domain knowledge is used to define tailored strategies that can decrease the budget and time required for mining while increasing the recall. An ontology supports our filtering layer in evaluating the relatedness of nodes. Our approach demonstrates that the same mechanism can be advanced to prompt recommendations to users. Our test cases and experimental results emphasize the importance of the strategy definition step in our social miner and the application of ontologies on the knowledge graph in the domain of recommendation analysis.
Estimation of laryngeal closure duration during swallowing without invasive X-rays
Mao S, Sabry A, Khalifa Y, Coyle JL and Sejdic E
Laryngeal vestibule (LV) closure is a critical physiologic event during swallowing, since it is the first line of defense against a food bolus entering the airway. Identifying the laryngeal vestibule status, including closure, reopening and closure duration, provides indispensable references for assessing the risk of dysphagia and neuromuscular function. However, commonly used radiographic examinations, known as videofluoroscopy swallowing studies, are highly constrained by their radiation exposure and cost. Here, we introduce a non-invasive sensor-based system that acquires high-resolution cervical auscultation signals from the neck and accommodates advanced deep learning techniques for the detection of LV behaviors. The deep learning algorithm, which combined convolutional and recurrent neural networks, was developed with a dataset of 588 swallows from 120 patients with suspected dysphagia and further clinically tested on 45 samples from 16 healthy participants. For classifying the LV closure and opening statuses, our method achieved 78.94% and 74.89% accuracies for these two datasets, suggesting the feasibility of using sensor signals for LV prediction without traditional videofluoroscopy screening methods. The sensor-supported system offers a broadly applicable computational approach for clinical diagnosis and biofeedback purposes in patients with swallowing disorders without the use of radiographic examination.
A drone-based networked system and methods for combating coronavirus disease (COVID-19) pandemic
Kumar A, Sharma K, Singh H, Naugriya SG, Gill SS and Buyya R
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. It is similar to influenza viruses and raises concerns through alarming levels of spread and severity, resulting in an ongoing pandemic worldwide. Within eight months (by August 2020), it had infected 24.0 million persons worldwide, and over 824 thousand had died. Drones or Unmanned Aerial Vehicles (UAVs) are very helpful in handling the COVID-19 pandemic. This work investigates drone-based systems and COVID-19 pandemic situations, and proposes an architecture for handling pandemic situations in different scenarios using real-time and simulation-based scenarios. The proposed architecture uses wearable sensors to record the observations in Body Area Networks (BANs) in a push-pull data fetching mechanism. The proposed architecture is found to be useful in remote and highly congested pandemic areas where either wireless or Internet connectivity is a major issue or the chances of COVID-19 spreading are high. It collects and stores a substantial amount of data in a stipulated period and helps to take appropriate action as and when required. In the real-time drone-based healthcare system implementation for COVID-19 operations, it is observed that a large area can be covered for sanitization, thermal image collection, and patient identification within a short period (approximately 2 km within 10 min) through the aerial route. In the simulation, the same statistics are observed, with the addition of collision-resistant strategies working successfully for indoor and outdoor healthcare operations. Further, open challenges are identified and promising research directions are highlighted.
Ontology-driven aspect-based sentiment analysis classification: An infodemiological case study regarding infectious diseases in Latin America
García-Díaz JA, Cánovas-García M and Valencia-García R
Infodemiology is the process of mining unstructured and textual data so as to provide public health officials and policymakers with valuable information regarding public health. The appearance of this new data source, which was previously unimaginable, has opened up a new way in which to improve public health systems, resulting in better communication policies and better detection systems. However, the unstructured nature of the Internet, along with the complexity of the infectious disease domain, prevents the information extracted from being easily understood. Moreover, when dealing with languages other than English, for which some of the most common Natural Language Processing resources are not available, the correct exploitation of this data becomes even more difficult. We intend to fill these gaps proposing an ontology-driven aspect-based sentiment analysis with which to measure the general public's opinions as regards infectious diseases when expressed in Spanish by employing a case study of tweets concerning the Zika, Dengue and Chikungunya viruses in Latin America. Our proposal is based on two technologies. We first use ontologies in order to model the infectious disease domain with concepts such as risks, symptoms, transmission methods or drugs, among other concepts. We then measure the relationship between these concepts in order to determine the degree to which one concept influences other concepts. This new information is subsequently applied in order to build an aspect-based sentiment analysis model based on statistical and linguistic features. This is done by applying deep-learning models. Our proposal is available on a web platform, where users can see the sentiment for each concept at a glance and analyse how each concept influences the sentiment of the others.
A network-based method with privacy-preserving for identifying influential providers in large healthcare service systems
Qi X, Mei G, Cuomo S and Xiao L
In data science, networks provide a useful abstraction of the structure of many complex systems, ranging from social systems and computer networks to biological networks and physical systems. Healthcare service systems are one of the main social systems that can also be understood using network-based approaches, for example, to identify and evaluate influential providers. In this paper, we propose a network-based method with privacy-preserving for identifying influential providers in large healthcare service systems. First, the provider-interacting network is constructed by employing publicly available information on locations and types of healthcare services of providers. Second, the ranking of nodes in the generated provider-interacting network is conducted in parallel on the basis of four nodal influence metrics. Third, the impact of the top-ranked influential nodes in the provider-interacting network is evaluated using three indicators. Compared with other research work based on patient-sharing networks, in this paper, the provider-interacting network of healthcare service providers can be roughly created according to the locations and the publicly available types of healthcare services, without the need for personally private electronic medical claims, thus protecting the privacy of patients. The proposed method is demonstrated by employing Physician and Other Supplier Data CY 2017, and can be applied to other similar datasets to help make decisions for the optimization of healthcare resources in the response to public health emergencies.
Simultaneous left atrium anatomy and scar segmentations via deep learning in multiview information with attention
Yang G, Chen J, Gao Z, Li S, Ni H, Angelini E, Wong T, Mohiaddin R, Nyktari E, Wage R, Xu L, Zhang Y, Du X, Zhang H, Firmin D and Keegan J
Three-dimensional late gadolinium enhanced (LGE) cardiac MR (CMR) of left atrial scar in patients with atrial fibrillation (AF) has recently emerged as a promising technique to stratify patients, to guide ablation therapy and to predict treatment success. This requires a segmentation of the high intensity scar tissue and also a segmentation of the left atrium (LA) anatomy, the latter usually being derived from a separate bright-blood acquisition. Performing both segmentations automatically from a single 3D LGE CMR acquisition would eliminate the need for an additional acquisition and avoid subsequent registration issues. In this paper, we propose a joint segmentation method based on multiview two-task (MVTT) recursive attention model working directly on 3D LGE CMR images to segment the LA (and proximal pulmonary veins) and to delineate the scar on the same dataset. Using our MVTT recursive attention model, both the LA anatomy and scar can be segmented accurately (mean Dice score of 93% for the LA anatomy and 87% for the scar segmentations) and efficiently (0.27 s to simultaneously segment the LA anatomy and scars directly from the 3D LGE CMR dataset with 60-68 2D slices). Compared to conventional unsupervised learning and other state-of-the-art deep learning based methods, the proposed MVTT model achieved excellent results, leading to an automatic generation of a patient-specific anatomical model combined with scar segmentation for patients in AF.
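For readers unfamiliar with the Dice score reported in this abstract, the following minimal sketch shows the standard definition for two binary masks. The function name `dice_score` and the flat 0/1 sequence representation are assumptions for illustration; the abstract does not describe how the authors computed the metric or how they handled empty masks.

```python
def dice_score(pred, truth):
    """Dice coefficient: 2 * |P and T| / (|P| + |T|).

    pred and truth are equal-length flat sequences of 0/1 (or booleans),
    for example a flattened segmentation mask.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_score(predicted, reference))
```

A score of 1 means the predicted and reference masks coincide exactly, and 0 means they share no foreground element.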
I-TASSER gateway: A protein structure and function prediction server powered by XSEDE
Zheng W, Zhang C, Bell EW and Zhang Y
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments. Functional insights are then derived from comparative matching of the predicted model with a library of proteins with known function. The I-TASSER pipeline has been recently integrated with the XSEDE Gateway system to accommodate pressing demand from the user community and increasing computing costs. This report summarizes the configuration of the I-TASSER Gateway with the XSEDE-Comet supercomputer cluster, together with an overview of the I-TASSER method and milestones of its development.
Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
Du X, Zhu R, Li Y and Anjum A
In the biomedical domain, abbreviations appear more and more frequently in various data sets, which poses significant obstacles to biomedical big data analysis. The dictionary-based approach has been adopted to process abbreviations, but it cannot handle ad hoc abbreviations, and no dictionary can cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; then, expansion candidates are generated by restoring each block; and finally, the expansion candidates are filtered and clustered to obtain the final expansion result according to the language model and clustering method. By restricting the abbreviations to prefix abbreviations, the search space of expansion is reduced sharply. The search space is then further reduced by constraining the effectiveness and the length of the partition. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions in the dictionary; therefore, the method has high precision. For ad hoc abbreviations, the precision of schema matching and knowledge fusion is increased by using this method to handle the abbreviations. Although the recall for standard abbreviations needs to be improved, this does not diminish the method's value as a complement to the dictionary method.
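The block-partition and candidate-generation steps described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the LMAAE implementation: the vocabulary, the bigram-count table standing in for the language model, the names `partitions` and `expand`, and the `max_blocks` and `min_block_len` limits are all hypothetical, and the clustering step is omitted.

```python
from itertools import product


def partitions(abbr, max_blocks):
    """Yield every split of abbr into 1..max_blocks contiguous non-empty blocks."""
    def rec(rest, budget):
        if not rest:
            yield []
            return
        if budget == 0:
            return
        for cut in range(1, len(rest) + 1):
            for tail in rec(rest[cut:], budget - 1):
                yield [rest[:cut]] + tail
    yield from rec(abbr, max_blocks)


def expand(abbr, vocabulary, bigram_counts, max_blocks=3, min_block_len=2):
    """Rank expansions in which every block is a prefix of a vocabulary word.

    bigram_counts maps (word, next_word) -> count and acts as a stand-in
    language model; it is an assumption made for this sketch.
    """
    scores = {}
    for blocks in partitions(abbr.lower(), max_blocks):
        # Limiting block length prunes the search space.
        if any(len(block) < min_block_len for block in blocks):
            continue
        options = [[w for w in vocabulary if w.startswith(block)] for block in blocks]
        if any(not opts for opts in options):
            continue  # some block cannot be restored to any word
        for words in product(*options):
            score = sum(bigram_counts.get(pair, 0) for pair in zip(words, words[1:]))
            phrase = " ".join(words)
            scores[phrase] = max(score, scores.get(phrase, score))
    # Highest score first, alphabetical order among ties.
    return sorted(scores, key=lambda phrase: (-scores[phrase], phrase))


if __name__ == "__main__":
    vocab = ["customer", "custom", "address", "addition"]
    counts = {("customer", "address"): 12, ("custom", "address"): 1}
    print(expand("custaddr", vocab, counts))
```

Restricting each block to be a word prefix is what keeps the candidate set small here: only the partition into "cust" and "addr" survives, and the bigram counts then prefer "customer address" over "custom address".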
CloudLaunch: Discover and Deploy Cloud Applications
Afgan E, Lonie A, Taylor J and Goonasekera N
Cloud computing is a common platform for delivering software to end users. However, the process of making complex-to-deploy applications available across different cloud providers requires isolated and uncoordinated application-specific solutions, often locking-in developers to a particular cloud provider. Here, we present the CloudLaunch application as a uniform platform for discovering and deploying applications for different cloud providers. CloudLaunch allows arbitrary applications to be added to a catalog with each application having its own customizable user interface and control over the launch process, while preserving cloud-agnosticism so that authors can easily make their applications available on multiple clouds with minimal effort. It then provides a uniform interface for launching available applications by end users across different cloud providers. Architecture details are presented along with examples of different deployable applications that highlight architectural features.
Globus Nexus: A Platform-as-a-Service Provider of Research Identity, Profile, and Group Management
Chard K, Lidman M, McCollam B, Bryan J, Ananthakrishnan R, Tuecke S and Foster I
Globus Nexus is a professionally hosted Platform-as-a-Service that provides identity, profile and group management functionality for the research community. Many collaborative e-Science applications need to manage large numbers of user identities, profiles, and groups. However, developing and maintaining such capabilities is often challenging given the complexity of modern security protocols and requirements for scalable, robust, and highly available implementations. By outsourcing this functionality to Globus Nexus, developers can leverage best-practice implementations without incurring development and operations overhead. Users benefit from enhanced capabilities such as identity federation, flexible profile management, and user-oriented group management. In this paper we present Globus Nexus, describe its capabilities and architecture, summarize how several e-Science applications leverage these capabilities, and present results that characterize its scalability, reliability, and availability.
SOAs for Scientific Applications: Experiences and Challenges
Krishnan S and Bhatia K
Over the past several years, with the advent of the Open Grid Services Architecture (OGSA) (19) and the Web Services Resource Framework (WSRF) (25), Service-oriented Architectures (SOA) and Web service technologies have been embraced in the field of scientific and Grid computing. These new principles promise to help make scientific infrastructures simpler to use, more cost effective to implement, and easier to maintain. However, understanding how to leverage these developments to actually design and build a system remains more of an art than a science. In this paper, we present some positions learned through experience that provide guidance in leveraging SOA technologies to build scientific infrastructures. In addition, we present the technical challenges that need to be addressed in building an SOA, and as a case study, we present the SOA that we have designed for the National Biomedical Computation Resource (NBCR) (9) community. We discuss how we have addressed these technical challenges, and present the overall architecture, the individual software toolkits developed, the client interfaces, and the usage scenarios. We hope that our experiences prove to be useful in building similar infrastructures for other scientific applications.