Information Fusion

NAPS Fusion: A framework to overcome experimental data limitations to predict human performance and cognitive task outcomes
Napoli NJ, Stephens CL, Kennedy KD, Barnes LE, Juarez Garcia E and Harrivel AR
In the area of human performance and cognitive research, machine learning (ML) problems become increasingly complex due to limitations in the experimental design, resulting in the development of poor predictive models. More specifically, experimental study designs produce very few data instances, have large class imbalances and conflicting ground truth labels, and generate wide data sets due to the diverse range of sensors. From an ML perspective these problems are further exacerbated in anomaly detection cases where class imbalances occur and there are almost always more features than samples. Typically, dimensionality reduction methods (e.g., PCA, autoencoders) are utilized to handle these issues from wide data sets. However, these dimensionality reduction methods do not always map to a lower dimensional space appropriately, and they capture noise or irrelevant information. In addition, when new sensor modalities are incorporated, the entire ML paradigm has to be remodeled because of new dependencies introduced by the new information. Remodeling these ML paradigms is time-consuming and costly due to lack of modularity in the paradigm design, which is not ideal. Furthermore, human performance research experiments, at times, create ambiguous class labels because the subject-matter experts' annotations do not agree on the ground truth, making the ML paradigm nearly impossible to model. This work pulls insights from Dempster-Shafer theory (DST), stacking of ML models, and bagging to address uncertainty and ignorance for multi-classification ML problems caused by ambiguous ground truth, low samples, subject-to-subject variability, class imbalances, and wide data sets.
Based on these insights, we propose a probabilistic model fusion approach, Naive Adaptive Probabilistic Sensor (NAPS), which combines ML paradigms built around bagging algorithms to overcome these experimental data concerns while maintaining a modular design for future sensor (new feature integration) and conflicting ground truth data. We demonstrate significant overall performance improvements using NAPS (an accuracy of 95.29%) in detecting human task errors (a four class problem) caused by impaired cognitive states and a negligible drop in performance with the case of ambiguous ground truth labels (an accuracy of 93.93%), when compared to other methodologies (an accuracy of 64.91%). This work potentially sets the foundation for other human-centric modeling systems that rely on human state prediction modeling.
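The abstract above names Dempster-Shafer theory as one source of insight for handling uncertainty and ignorance. As a rough illustration of that idea only, and not of the NAPS algorithm itself, the following minimal sketch applies the standard Dempster rule of combination to two mass functions. The sensor names, the four class labels, and all mass values are hypothetical.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of class labels to a mass in [0, 1],
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to disjoint hypotheses counts as conflict.
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {subset: mass / norm for subset, mass in combined.items()}


# Hypothetical four-class frame of discernment.
ALL_CLASSES = frozenset({"A", "B", "C", "D"})

# Hypothetical evidence from two sensor-specific models. Mass placed on
# ALL_CLASSES expresses ignorance rather than support for any single class.
m_sensor_1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.3, ALL_CLASSES: 0.1}
m_sensor_2 = {frozenset({"B"}): 0.5, ALL_CLASSES: 0.5}

fused = combine_dempster(m_sensor_1, m_sensor_2)
for subset, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(sorted(subset), round(mass, 4))
```

Because each source is represented only by its own mass function, a further sensor can be folded in by calling `combine_dempster` again on the fused result, which is one way to picture the modularity the abstract asks for.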
UncertaintyFuseNet: Robust uncertainty-aware hierarchical feature fusion model with Ensemble Monte Carlo Dropout for COVID-19 detection
Abdar M, Salari S, Qahremani S, Lam HK, Karray F, Hussain S, Khosravi A, Acharya UR, Makarenkov V and Nahavandi S
The COVID-19 (Coronavirus disease 2019) pandemic has become a major global threat to human health and well-being. Thus, the development of computer-aided detection (CAD) systems that are capable of accurately distinguishing COVID-19 from other diseases using chest computed tomography (CT) and X-ray data is of immediate priority. Such automatic systems are usually based on traditional machine learning or deep learning methods. Differently from most of the existing studies, which used either CT scan or X-ray images in COVID-19-case classification, we present a new, simple but efficient deep learning feature fusion model, called UncertaintyFuseNet, which is able to accurately classify large datasets of both of these types of images. We argue that the uncertainty of the model's predictions should be taken into account in the learning process, even though most of the existing studies have overlooked it. We quantify the prediction uncertainty in our feature fusion model using the effective Ensemble Monte Carlo Dropout (EMCD) technique. A comprehensive simulation study has been conducted to compare the results of our new model to the existing approaches, evaluating the performance of competing models in terms of Precision, Recall, F-Measure, Accuracy and ROC curves. The obtained results prove the efficiency of our model, which provided a prediction accuracy of 99.08% and 96.35% for the considered CT scan and X-ray datasets, respectively. Moreover, our model was generally robust to noise and performed well with previously unseen data. The source code of our implementation is freely available at: https://github.com/moloud1987/UncertaintyFuseNet-for-COVID-19-Classification.
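Monte Carlo Dropout estimates uncertainty by keeping dropout active at test time and running several stochastic forward passes on the same input. The sketch below shows one generic way to summarise such passes (mean prediction plus predictive entropy). It is not the authors' EMCD implementation, and the probability vectors are made-up stand-ins for the outputs of a real network.

```python
import math


def mc_summary(prob_samples):
    """Summarise T stochastic forward passes for one input.

    prob_samples is a list of T class-probability vectors. Returns the mean
    prediction and its entropy (in nats) as a simple uncertainty score.
    """
    n_passes = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean = [sum(sample[c] for sample in prob_samples) / n_passes for c in range(n_classes)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Hypothetical outputs of three dropout passes for two different images.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
uncertain = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]

confident_mean, confident_entropy = mc_summary(confident)
uncertain_mean, uncertain_entropy = mc_summary(uncertain)
print("confident:", confident_mean, confident_entropy)
print("uncertain:", uncertain_mean, uncertain_entropy)
```

When the passes disagree, the mean prediction flattens and the entropy rises, so a high value can be used to flag a case for manual review.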
Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions
Nan Y, Ser JD, Walsh S, Schönlieb C, Roberts M, Selby I, Howard K, Owen J, Neville J, Guiot J, Ernst B, Pastor A, Alberich-Bayarri A, Menzel MI, Walsh S, Vos W, Flerin N, Charbonnier JP, van Rikxoort E, Chatterjee A, Woodruff H, Lambin P, Cerdá-Alberich L, Martí-Bonmatí L, Herrera F and Yang G
Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed and the limitations of different methods have been surveyed for future research.
Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond
Yang G, Ye Q and Xia J
Explainable Artificial Intelligence (XAI) is an emerging research topic of machine learning aimed at explaining how AI systems' choices are made. This research field inspects the measures and models involved in decision-making and seeks solutions to explain them explicitly. Many machine learning algorithms cannot show how and why a decision has been made. This is particularly true of the most popular deep neural network approaches currently in use. Consequently, our confidence in AI systems can be hindered by the lack of explainability in these models. XAI is becoming more and more crucial for deep learning powered applications, especially for medical and healthcare studies, although in general these deep neural networks can return an arresting dividend in performance. The insufficient explainability and transparency in most existing AI systems can be one of the major reasons that successful implementation and integration of AI tools into routine clinical practice are uncommon. In this study, we first surveyed the current progress of XAI and in particular its advances in healthcare applications. We then introduced our solutions for XAI leveraging multi-modal and multi-centre data fusion, and subsequently validated them in two showcases following real clinical scenarios. Comprehensive quantitative and qualitative analyses can prove the efficacy of our proposed XAI solutions, from which we can envisage successful applications in a broader range of clinical questions.
A critic evaluation of methods for COVID-19 automatic detection from X-ray images
Maguolo G and Nanni L
In this paper, we compare and evaluate different testing protocols used for automatic COVID-19 diagnosis from X-Ray images in the recent literature. We show that similar results can be obtained using X-Ray images that do not contain most of the lungs. We are able to remove the lungs from the images by turning the center of the X-Ray scan black and training our classifiers only on the outer part of the images. Hence, we deduce that several testing protocols for the recognition are not fair and that the neural networks are learning patterns in the dataset that are not correlated to the presence of COVID-19. Finally, we show that creating a fair testing protocol is a challenging task, and we provide a method to measure how fair a specific testing protocol is. For future research, we suggest checking the fairness of a testing protocol using our tools, and we encourage researchers to look for better techniques than the ones that we propose.
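The masking step described in this abstract can be pictured with a small sketch: everything except an outer border of the image is set to black, so a classifier trained on the result cannot see the lung region. The abstract does not state how large the blacked-out area is, so the `keep_border` parameter and the toy 6x6 image below are assumptions made purely for illustration.

```python
def mask_center(image, keep_border):
    """Return a copy of a 2D image with its central region set to 0 (black).

    image is a list of rows of pixel intensities. Only a frame of width
    keep_border pixels around the edge is left untouched.
    """
    height = len(image)
    width = len(image[0])
    masked = [row[:] for row in image]  # copy so the input stays unchanged
    for r in range(keep_border, height - keep_border):
        for c in range(keep_border, width - keep_border):
            masked[r][c] = 0
    return masked


# Toy 6x6 "scan" with uniform intensity 255 (hypothetical data).
image = [[255] * 6 for _ in range(6)]
masked = mask_center(image, keep_border=1)
for row in masked:
    print(row)
```

If a classifier trained on such masked images still separates the classes well, that suggests it is relying on dataset-specific cues outside the lungs, which is the concern the abstract raises.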
Editorial: Advances in multi-source information fusion for epidemic diseases
Zhang Y, Al-Fuqaha A, Humar I and Pace P
Diagnosis of multiple sclerosis using multifocal ERG data feature fusion
López-Dorado A, Pérez J, Rodrigo MJ, Miguel-Jiménez JM, Ortiz M, de Santiago L, López-Guillén E, Blanco R, Cavalliere C, Morla EMS, Boquete L and Garcia-Martin E
The purpose of this paper is to implement a computer-aided diagnosis (CAD) system for multiple sclerosis (MS) based on analysing the outer retina as assessed by multifocal electroretinograms (mfERGs). MfERG recordings taken with the RETI-port/scan 21 (Roland Consult) device from 15 eyes of patients diagnosed with incipient relapsing-remitting MS and without prior optic neuritis, and from 6 eyes of control subjects, are selected. The mfERG recordings are grouped (whole macular visual field, five rings, and four quadrants). For each group, the correlation with a normative database of adaptively filtered signals, based on empirical mode decomposition (EMD) and three features from the continuous wavelet transform (CWT) domain, are obtained. Of the initial 40 features, the 4 most relevant are selected in two stages: a) using a filter method and b) using a wrapper-feature selection method. The Support Vector Machine (SVM) is used as a classifier. With the optimal CAD configuration, a Matthews correlation coefficient value of 0.89 (accuracy = 0.95, specificity = 1.0 and sensitivity = 0.93) is obtained. This study identified an outer retina dysfunction in patients with recent MS by analysing the outer retina responses in the mfERG and employing an SVM as a classifier. In conclusion, a promising new electrophysiological-biomarker method based on feature fusion for MS diagnosis was identified.
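The reported metrics can be checked against each other with a short worked example. The abstract does not give a confusion matrix, so the counts below are an assumption: with 15 MS eyes and 6 control eyes, one missed MS eye and no false alarms is one configuration that matches the reported figures after rounding.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Assumed counts (not stated in the abstract): 15 MS eyes, 6 control eyes.
tp, fn = 14, 1  # MS eyes classified correctly / incorrectly
tn, fp = 6, 0   # control eyes classified correctly / incorrectly

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)

print(round(sensitivity, 2), round(specificity, 2), round(accuracy, 2), round(mcc, 2))
```

Under this assumption the four values round to 0.93, 1.0, 0.95 and 0.89, in line with the abstract. With such a small and imbalanced sample, the Matthews coefficient is a more conservative summary than accuracy alone.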
Pay attention to doctor-patient dialogues: Multi-modal knowledge graph attention image-text embedding for COVID-19 diagnosis
Zheng W, Yan L, Gou C, Zhang ZC, Jason Zhang J, Hu M and Wang FY
The sudden increase in coronavirus disease 2019 (COVID-19) cases puts high pressure on healthcare services worldwide. At this stage, fast, accurate, and early clinical assessment of the disease severity is vital. In general, there are two issues to overcome: (1) Current deep learning-based works suffer from multimodal data adequacy issues; (2) In this scenario, multimodal (e.g., text, image) information should be taken into account together to make accurate inferences. To address these challenges, we propose a multi-modal knowledge graph attention embedding for COVID-19 diagnosis. Our method not only learns the relational embedding from nodes in a constituted knowledge graph but also has access to medical knowledge, aiming at improving the performance of the classifier through the mechanism of medical knowledge attention. The experimental results show that our approach significantly improves classification performance compared to other state-of-the-art techniques and possesses robustness for each modality from multi-modal data. Moreover, we construct a new COVID-19 multi-modal dataset based on text mining, consisting of 1393 doctor-patient dialogues and their 3706 images (347 X-ray, 2598 CT, 761 ultrasound) about COVID-19 patients and 607 non-COVID-19 patient dialogues and their 10754 images (9658 X-ray, 494 CT, 761 ultrasound), and the fine-grained labels of all. We hope this work can provide insights to the researchers working in this area to shift the attention from only medical images to the doctor-patient dialogue and its corresponding medical images.
SPICE-IT: Smart COVID-19 pandemic controlled eradication over NDN-IoT
Khan MTR, Saad MM, Tariq MA, Akram J and Kim D
Internet of things (IoT) applications in e-health can play a vital role in countering rapidly spreading diseases and can help manage health emergency scenarios like pandemics effectively. Efficient disease control also requires monitoring of Standard operating procedure (SOP) follow-up of the population in the disease-prone area with a cost-effective reporting and responding mechanism to register any violation. However, the IoT devices have limited resources and the application requires delay-sensitive data transmission. Named Data Networking (NDN) can significantly reduce content retrieval delays but inherits cache overflow and network congestion challenges. Therefore, we are motivated to present a novel smart COVID-19 pandemic-controlled eradication over NDN-IoT (SPICE-IT) mechanism. SPICE-IT introduces autonomous monitoring in indoor environments with an efficient pull-based reporting mechanism that records violations at local servers and a cloud server. An intelligent face mask detection and temperature monitoring mechanism examines every person. The cloud server controls the response action from the centre with an adaptive decision-making mechanism. A long short-term memory (LSTM) based caching mechanism reduces the cache overflow and overall network congestion problem.
An analysis model of diagnosis and treatment for COVID-19 pandemic based on medical information fusion
Hu F, Huang M, Sun J, Zhang X and Liu J
Exploring the complicated relationships underlying the clinical information is essential for the diagnosis and treatment of the Coronavirus Disease 2019 (COVID-19). Currently, few approaches are mature enough to show operational impact. Based on electronic medical records (EMRs) of 570 COVID-19 inpatients, we proposed an analysis model of diagnosis and treatment for COVID-19 based on machine learning algorithms and complex networks. Introducing medical information fusion, we constructed a heterogeneous information network to discover the complex relationships among the syndromes, symptoms, and medicines. We generated the numerical symptom (medicine) embeddings and divided them into seven communities (syndromes) using the combination of the Skip-Gram model and the Spectral Clustering (SC) algorithm. After analyzing the symptom and medicine networks, we identified the key factors using six evaluation metrics of node centrality. The experimental results indicate that the proposed analysis model is capable of discovering the critical symptoms and symptom distribution for diagnosis, as well as the key medicines and medicine combinations for treatment. Based on the latest COVID-19 clinical guidelines, this model achieves higher accuracy than other representative clustering algorithms. Furthermore, the proposed model is able to provide valuable guidance and help physicians to combat COVID-19.
COVID-19 and Non-COVID-19 Classification using Multi-layers Fusion From Lung Ultrasound Images
Muhammad G and Shamim Hossain M
COVID-19 or related viral pandemics should be detected and managed without hesitation, since the virus spreads very rapidly. Often with insufficient human and electronic resources, patients need to be distinguished from stable patients using vital signs, radiographic images, or ultrasound images. Vital signs do not often offer the right outcome, and radiographic images have a variety of other problems. Lung ultrasound (LUS) images can provide good screening without a lot of complications. This paper proposes a convolutional neural network (CNN) model that has fewer learning parameters but can achieve strong accuracy. The model has five main blocks or layers of convolution connectors. A multi-layer fusion functionality of each block is proposed to improve the efficiency of the COVID-19 screening method utilizing the proposed model. Experiments are conducted using freely accessible LUS image and video datasets. The proposed fusion method achieves 92.5% precision, 91.8% accuracy, and 93.2% recall on these data. These metric values are considerably higher than those achieved by any of the state-of-the-art CNN models.
COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis
Wang SH, Nayak DR, Guttery DS, Zhang X and Zhang YD
COVID-19 is a disease caused by a new strain of coronavirus. Up to 18th October 2020, worldwide there have been 39.6 million confirmed cases resulting in more than 1.1 million deaths. To improve diagnosis, we aimed to design and develop a novel advanced AI system for COVID-19 classification based on chest CT (CCT) images.
Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network
Wang SH, Govindaraj VV, Górriz JM, Zhang X and Zhang YD
COVID-19 is an infectious disease spreading to the world this year. In this study, we plan to develop an artificial intelligence based tool to diagnose it on chest CT images. On one hand, we extract features from a self-created convolutional neural network (CNN) to learn individual image-level representations. The proposed CNN employed several new techniques such as rank-based average pooling and multiple-way data augmentation. On the other hand, relation-aware representations were learnt from a graph convolutional network (GCN). Deep feature fusion (DFF) was developed in this work to fuse the individual image-level features from the CNN and the relation-aware features from the GCN. The best model was named FGCNet. The experiment first chose the best model from eight proposed network models, and then compared it with 15 state-of-the-art approaches. The proposed FGCNet model is effective and gives better performance than all 15 state-of-the-art methods. Thus, our proposed FGCNet model can assist radiologists to rapidly detect COVID-19 from chest CT images.
DiCyc: GAN-based deformation invariant cross-domain information fusion for medical image synthesis
Wang C, Yang G, Papanastasiou G, Tsaftaris SA, Newby DE, Gray C, Macnaught G and MacGillivray TJ
Cycle-consistent generative adversarial network (CycleGAN) has been widely used for cross-domain medical image synthesis tasks particularly due to its ability to deal with unpaired data. However, most CycleGAN-based synthesis methods cannot achieve good alignment between the synthesized images and data from the source domain, even with additional image alignment losses. This is because the CycleGAN generator network can encode the relative deformations and noise associated with different domains. This can be detrimental for the downstream applications that rely on the synthesized images, such as generating pseudo-CT for PET-MR attenuation correction. In this paper, we present a deformation invariant cycle-consistency model that can filter out these domain-specific deformations. The deformation is globally parameterized by thin-plate-spline (TPS), and locally learned by modified deformable convolutional layers. Robustness to domain-specific deformations has been evaluated through experiments on multi-sequence brain MR data and multi-modality abdominal CT and MR data. Experiment results demonstrated that our method can achieve better alignment between the source and target data while maintaining superior image quality of signal compared to several state-of-the-art CycleGAN-based methods.
Fusion in stock market prediction: A decade survey on the necessity, recent developments, and potential future directions
Thakkar A and Chaudhari K
Investment in a financial market is aimed at getting higher benefits; this complex market is influenced by a large number of events, which makes the prediction of future market dynamics challenging. The investors' etiquettes towards the stock market may demand studying various associated factors and extracting the useful information for reliable forecasting. Fusion can be considered as an approach to integrate data or characteristics, in general, and enhance the prediction based on a combinational approach in which the components can aid each other. We conduct a systematic survey for the years 2011-2020 by considering articles that have used fusion techniques for various stock market applications and broadly categorize them into information fusion, feature fusion, and model fusion. The major applications of the stock market include stock price and trend prediction, risk analysis and return forecasting, index prediction, as well as portfolio management. We also provide an infographic overview of fusion in stock market prediction and extend our survey for other finely addressed financial prediction problems. Based on our surveyed articles, we provide potential future directions and concluding remarks on the significance of applying fusion in the stock market.
Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation
Zhang YD, Dong Z, Wang SH, Yu X, Yao X, Zhou Q, Hu H, Li M, Jiménez-Mesa C, Ramirez J, Martinez FJ and Gorriz JM
Multimodal fusion in neuroimaging combines data from multiple imaging modalities to overcome the fundamental limitations of individual modalities. Neuroimaging fusion can achieve higher temporal and spatial resolution, enhance contrast, correct imaging distortions, and bridge physiological and cognitive information. In this study, we analyzed over 450 references from PubMed, Google Scholar, IEEE, ScienceDirect, Web of Science, and various sources published from 1978 to 2020. We provide a review that encompasses (1) an overview of current challenges in multimodal fusion, (2) the current medical applications of fusion for specific neurological diseases, (3) strengths and limitations of available imaging modalities, (4) fundamental fusion rules, (5) fusion quality assessment methods, and (6) the applications of fusion for atlas-based segmentation and quantification. Overall, multimodal fusion shows significant benefits in clinical diagnosis and neuroscience research. Widespread education and further research amongst engineers, researchers and clinicians will benefit the field of multimodal neuroimaging.
The introduction of population migration to SEIAR for COVID-19 epidemic modeling with an efficient intervention strategy
Chen M, Li M, Hao Y, Liu Z, Hu L and Wang L
In this paper, we present a mathematical model of an infectious disease according to the characteristics of the COVID-19 pandemic. The proposed enhanced model, which will be referred to as the SEIR (Susceptible-Exposed-Infectious-Recovered) model with population migration, is inspired by the crucial role that asymptomatic infected individuals, as well as population movements, can play in spreading the virus. In the model, the number of infections and the basic reproduction number are compared under the influence of intervention policies. The experimental simulation results show the impact of social distancing and migration-in rates on reducing the total number of infections and the basic reproduction number. Furthermore, the importance of controlling the number of migration-in people and of the policy of restricting residents' movements in preventing the spread of the COVID-19 pandemic is verified.
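The abstract does not list the model equations, so the following is only a textbook SEIR system with a constant inflow of susceptible people added, integrated with a simple Euler step. It leaves out the asymptomatic compartment mentioned in the title, and every parameter value is hypothetical. It is meant to show how a lower contact rate, standing in for social distancing, changes the cumulative number of infections.

```python
def simulate_seir(beta, sigma, gamma, migration_in, s0, e0, i0, r0, days, dt=0.1):
    """Euler integration of an SEIR model with a constant migration-in rate.

    beta: transmission rate, sigma: incubation rate, gamma: recovery rate,
    migration_in: susceptible people arriving per day (all per-day rates).
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    cumulative_infections = float(i0)
    for _ in range(int(round(days / dt))):
        n = s + e + i + r
        new_exposed = beta * s * i / n
        new_infectious = sigma * e
        new_recovered = gamma * i
        s += dt * (migration_in - new_exposed)
        e += dt * (new_exposed - new_infectious)
        i += dt * (new_infectious - new_recovered)
        r += dt * new_recovered
        cumulative_infections += dt * new_infectious
    return {"S": s, "E": e, "I": i, "R": r, "cumulative": cumulative_infections}


# Hypothetical scenario: one million residents, ten initial cases, 120 days.
common = dict(sigma=0.2, gamma=0.1, migration_in=500.0,
              s0=999_990, e0=0, i0=10, r0=0, days=120)
baseline = simulate_seir(beta=0.5, **common)
distanced = simulate_seir(beta=0.25, **common)  # halved contact rate

print("baseline cumulative infections: ", round(baseline["cumulative"]))
print("distanced cumulative infections:", round(distanced["cumulative"]))
```

Changing `migration_in` in the same way gives a quick feel for the second intervention the abstract discusses, although the published model may treat migration differently.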
Revisiting crowd behaviour analysis through deep learning: Taxonomy, anomaly detection, crowd emotions, datasets, opportunities and prospects
Luque Sánchez F, Hupont I, Tabik S and Herrera F
Crowd behaviour analysis is an emerging research area. Due to its novelty, a proper taxonomy to organise its different sub-tasks is still missing. This paper proposes a taxonomic organisation of existing works following a pipeline, where sub-problems in later stages benefit from the results of earlier ones. Models that employ Deep Learning to solve crowd anomaly detection, one of the proposed stages, are reviewed in depth, and the few works that address emotional aspects of crowds are outlined. The importance of bringing emotional aspects into the study of crowd behaviour is highlighted, together with the necessity of producing real-world, challenging datasets in order to improve the current solutions. Opportunities for fusing these models into already functioning video analytics systems are proposed.
Autosomal Dominantly Inherited Alzheimer Disease: Analysis of genetic subgroups by Machine Learning
Castillo-Barnes D, Su L, Ramírez J, Salas-Gonzalez D, Martinez-Murcia FJ, Illan IA, Segovia F, Ortiz A, Cruchaga C, Farlow MR, Xiong C, Graff-Radford NR, Schofield PR, Masters CL, Salloway S, Jucker M, Mori H, Levin J, Gorriz JM and
Although subjects with Dominantly-Inherited Alzheimer's Disease (DIAD) represent less than 1% of all Alzheimer's Disease (AD) cases, the Dominantly Inherited Alzheimer Network (DIAN) initiative has had a strong impact on the understanding of the AD disease course, with special emphasis on the presymptomatic disease phase. Until now, the 3 genes involved in DIAD pathogenesis (PSEN1, PSEN2 and APP) have been commonly merged into one group (Mutation Carriers, MC) and studied using conventional statistical analysis. Comparisons between groups using null-hypothesis testing or longitudinal regression procedures, such as the linear-mixed-effects models, have been assessed in the extant literature. Within this context, the work presented here performs a comparison between different groups of subjects by considering the 3 genes, either jointly or separately, and using tools based on Machine Learning (ML). This involves a feature selection step which makes use of ANOVA followed by Principal Component Analysis (PCA) to determine which features would be reliable for further comparison purposes. Then, the selected predictors are classified using a Support-Vector-Machine (SVM) in a nested k-Fold cross-validation resulting in maximum classification rates of 72-74% using PiB PET features, especially when comparing asymptomatic Non-Carriers (NC) subjects with asymptomatic PSEN1 Mutation-Carriers (PSEN1-MC). Results obtained from these experiments led to the idea that PSEN1-MC might be considered as a mixture of two different subgroups: a first group whose patterns were very close to NC subjects, and a second group that differed much more in terms of imaging patterns. Thus, both subgroups were determined using a k-Means clustering algorithm, and a new classification scenario was conducted to validate this process. The comparison between each subgroup vs. NC subjects resulted in classification rates around 80%, underscoring the importance of considering DIAN as a heterogeneous entity.
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A and Hoffman MM
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Social big data: Recent achievements and new challenges
Bello-Orgaz G, Jung JJ and Camacho D
Big data has become an important issue for a large number of research areas such as data mining, machine learning, computational intelligence, information fusion, the semantic Web, and social networks. The rise of different big data frameworks such as Apache Hadoop and, more recently, Spark, for massive data processing based on the MapReduce paradigm has allowed for the efficient utilisation of data mining methods and machine learning algorithms in different domains. A number of libraries such as Mahout and Spark MLlib have been designed to develop new efficient applications based on machine learning algorithms. The combination of big data technologies and traditional machine learning algorithms has generated new and interesting challenges in other areas such as social media and social networks. These new challenges are focused mainly on problems such as data processing, data storage, data representation, and how data can be used for pattern mining, analysing user behaviours, and visualizing and tracking data, among others. In this paper, we present a review of the new methodologies that are designed to allow for efficient data mining and information fusion from social media, and of the new applications and frameworks that are currently appearing under the "umbrella" of the social networks, social media and big data paradigms.