ARTIFICIAL INTELLIGENCE IN MEDICINE

Design and use of a Denoising Convolutional Autoencoder for reconstructing electrocardiogram signals at super resolution
Lomoio U, Veltri P, Guzzi PH and Liò P
Electrocardiogram signals play a pivotal role in cardiovascular diagnostics, providing essential information on the electrical activity of the heart. However, inherent noise and limited resolution can hinder accurate interpretation of the recordings. This paper proposes an advanced Denoising Convolutional Autoencoder designed to process electrocardiogram signals and generate super-resolution reconstructions, followed by an in-depth analysis of the enhanced signals. The autoencoder receives a 5 s signal window sampled at 50 Hz (low resolution) as input and reconstructs a denoised super-resolution signal at 500 Hz. The proposed autoencoder is applied to publicly available datasets and shows strong performance in reconstructing high-resolution signals from very low-resolution inputs sampled at 50 Hz. The results are compared with the current state of the art for electrocardiogram super-resolution, demonstrating the effectiveness of the proposed method: it achieves a signal-to-noise ratio of 12.20 dB, a mean squared error of 0.0044, and a root mean squared error of 4.86%, significantly outperforming current state-of-the-art alternatives. This framework can effectively reveal hidden information within the signals, aiding the detection of heart-related diseases.
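For readers who want a concrete picture of the pipeline described above, here is a minimal PyTorch sketch of a denoising convolutional autoencoder mapping a 5 s window at 50 Hz (250 samples) to 500 Hz (2,500 samples). Layer counts, channel widths, and kernel sizes are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch of a denoising convolutional autoencoder for ECG
# super-resolution (50 Hz -> 500 Hz over a 5 s window); not the authors' model.
import torch
import torch.nn as nn

class ECGSuperResAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: extract features from the noisy 250-sample (5 s @ 50 Hz) input.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        # Decoder: upsample by 10x to 2,500 samples (5 s @ 500 Hz).
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=10, mode="linear", align_corners=False),
            nn.Conv1d(32, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=7, padding=3),
        )

    def forward(self, x):                      # x: (batch, 1, 250)
        return self.decoder(self.encoder(x))   # (batch, 1, 2500)

model = ECGSuperResAE()
low_res = torch.randn(8, 1, 250)               # noisy low-resolution windows
high_res = model(low_res)                      # denoised super-resolution output
loss = nn.MSELoss()(high_res, torch.randn(8, 1, 2500))  # train against clean 500 Hz targets
```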
Computer model for gait assessments in Parkinson's patients using a fuzzy inference model and inertial sensors
Sánchez-Fernández LP, Sánchez-Pérez LA and Martínez-Hernández JM
Patients with Parkinson's disease (PD) in the moderate and severe stages can present several gait alterations: slow movements; difficulty initiating, varying, or interrupting gait; freezing; short steps; speed changes; shuffling; reduced arm swing; and festinating gait. The Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) has a good reputation for uniformly evaluating motor and non-motor aspects of PD. However, its motor clinical assessment depends on visual observation, the results are qualitative, and subtle differences are not identified. This study presents a fuzzy inference model for gait assessment in PD patients, with detailed descriptions of the signal processing and the computation of eight biomechanical indicators so that other authors can replicate the presented methods. The computer model uses 334 bilateral measurements from 58 Parkinson's patients and 15 healthy control subjects collected over one year. The model is validated against physician evaluations performed in real time and in post-analysis using an extensive database of videos and signals. The assessment results are explainable, quantitative, and qualitative, increasing their acceptance and use in clinical environments. The system design considers three expert motor evaluations, including the PD patients' evolution; this facilitates correlation with medication doses and appropriate intervals for follow-up medical consultations. The assessments include three qualitative gait conditions of the MDS-UPDRS (normal, slight, and mild) as well as a numerical evaluation to two decimal places.
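As a toy illustration of the fuzzy-inference step, the sketch below scores a single hypothetical indicator (normalized step length) with triangular membership functions and labels mirroring the MDS-UPDRS categories mentioned above; the published model uses eight biomechanical indicators and expert-tuned rules.

```python
# Toy Mamdani-style fuzzy scoring for one hypothetical gait indicator
# (normalized step length in [0, 1.4]); not the authors' full eight-indicator model.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def gait_severity(step_length):
    # Degrees of membership in each fuzzy set.
    memberships = {
        "normal": tri(step_length, 0.6, 1.0, 1.4),   # long steps -> normal
        "slight": tri(step_length, 0.3, 0.55, 0.8),  # moderately short steps
        "mild":   tri(step_length, 0.0, 0.25, 0.5),  # very short, shuffling steps
    }
    # Defuzzify with a weighted average of representative severity scores.
    scores = {"normal": 0.0, "slight": 1.0, "mild": 2.0}
    total = sum(memberships.values())
    crisp = sum(memberships[k] * scores[k] for k in memberships) / total if total else 0.0
    label = max(memberships, key=memberships.get)
    return label, round(crisp, 2)   # qualitative label + numeric score to two decimals

print(gait_severity(0.45))          # e.g. ('slight', 1.25)
```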
Training and validating a treatment recommender with partial verification evidence
Unnikrishnan V, Puga C, Schleicher M, Niemann U, Langguth B, Schoisswohl S, Mazurek B, Cima R, Lopez-Escamez JA, Kikidis D, Vellidou E, Pryss R, Schlee W and Spiliopoulou M
Current clinical decision support systems (DSS) are trained and validated on observational data from the clinic in which the DSS is going to be applied. This is problematic for treatments that have already been validated in a randomized clinical trial (RCT) but have not yet been introduced in any clinic. In this work, we report on a method for training and validating the DSS core before its introduction to a clinic, using the RCT data themselves. The key challenges we address concern missingness: foremost, the missing rationale when assigning a treatment to a patient (the assignment is made at random), and the missing verification evidence, since the effectiveness of a treatment for a patient can only be verified (ground truth) if the treatment was indeed assigned to that patient, yet that assignment was made at random.
Fraud detection in healthcare claims using machine learning: A systematic review
du Preez A, Bhattacharya S, Beling P and Bowen E
Identifying fraud in healthcare programs is crucial, as an estimated 3%-10% of the total healthcare expenditures are lost to fraudulent activities. This study presents a systematic literature review of machine learning techniques applied to fraud detection in health insurance claims. We aim to analyze the data and methodologies documented in the literature over the past two decades, providing insights into research challenges and opportunities.
Neural Architecture Search for biomedical image classification: A comparative study across data modalities
Kuş Z, Aydin M, Kiraz B and Kiraz A
Deep neural networks have significantly advanced medical image classification across various modalities and tasks. However, manually designing these networks is often time-consuming and suboptimal. Neural Architecture Search (NAS) automates this process, potentially finding more efficient and effective models. This study provides a comprehensive comparative analysis of our two NAS methods, PBC-NAS and BioNAS, across multiple biomedical image classification tasks using the MedMNIST dataset. Our experiments evaluate these methods based on classification performance (Accuracy (ACC) and Area Under the Curve (AUC)) and computational complexity (floating point operation counts, FLOPs). Results demonstrate that BioNAS models slightly outperform PBC-NAS models in accuracy, with BioNAS-2 achieving the highest average accuracy of 0.848. However, PBC-NAS models exhibit superior computational efficiency, with PBC-NAS-2 achieving the lowest average cost of 0.82 GFLOPs. Both methods outperform state-of-the-art architectures like ResNet-18 and ResNet-50 and AutoML frameworks such as auto-sklearn, AutoKeras, and Google AutoML. Additionally, PBC-NAS and BioNAS outperform other NAS studies in average ACC results (except MSTF-NAS) and show highly competitive results in average AUC. We conduct extensive ablation studies to investigate the impact of architectural parameters, the effectiveness of fine-tuning, search space efficiency, and the discriminative performance of the generated architectures. These studies reveal that larger filter sizes and specific numbers of stacks or modules enhance performance, and that fine-tuning existing architectures can achieve nearly optimal results without running a separate NAS for each dataset. Furthermore, we analyze search space efficiency, uncovering patterns in frequently selected operations and architectural choices. This study highlights the strengths and efficiencies of PBC-NAS and BioNAS, providing valuable insights and guidance for future research and practical applications in biomedical image classification.
CircWaveDL: Modeling of optical coherence tomography images based on a new supervised tensor-based dictionary learning for classification of macular abnormalities
Arian R, Vard A, Kafieh R, Plonka G and Rabbani H
Modeling Optical Coherence Tomography (OCT) images is crucial for numerous image processing applications and aids ophthalmologists in the early detection of macular abnormalities. Sparse representation-based models, particularly dictionary learning (DL), play a pivotal role in image modeling. Traditional DL methods often transform higher-order tensors into vectors and then aggregate them into a matrix, which overlooks the inherent multi-dimensional structure of the data. To address this limitation, tensor-based DL approaches have been introduced. In this study, we present a novel tensor-based DL algorithm, CircWaveDL, for OCT classification, where both the training data and the dictionary are modeled as higher-order tensors. We named our approach CircWaveDL to reflect the use of CircWave atoms for dictionary initialization, rather than random initialization. CircWave has previously shown effectiveness in OCT classification, making it a fitting basis function for our DL method. The algorithm employs CANDECOMP/PARAFAC (CP) decomposition to factorize each tensor into lower dimensions. We then learn a sub-dictionary for each class using its respective training tensor. For testing, a test tensor is reconstructed with each sub-dictionary, and each test B-scan is assigned to the class that yields the minimal residual error. To evaluate the model's generalizability, we tested it across three distinct databases. Additionally, we introduce a new heatmap generation technique based on averaging the most significant atoms of the learned sub-dictionaries. This approach highlights that selecting an appropriate sub-dictionary for reconstructing test B-scans improves reconstructions, emphasizing the distinctive features of different classes. CircWaveDL demonstrated strong generalizability across external validation datasets, outperforming previous classification methods. It achieved accuracies of 92.5 %, 86.1 %, and 89.3 % on datasets 1, 2, and 3, respectively, showcasing its efficacy in OCT image classification.
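The classification rule (assign a test sample to the class whose sub-dictionary reconstructs it with minimal residual) can be sketched as follows; for brevity the sketch works on flattened matrices with an SVD basis as a stand-in sub-dictionary, whereas CircWaveDL keeps the tensor structure and uses CP decomposition with CircWave initialization.

```python
# Minimal sketch of per-class sub-dictionary classification by minimal
# reconstruction residual, shown on flattened (matrix) data; illustrative only.
import numpy as np

def fit_subdictionaries(X_train, y_train, n_atoms=20):
    """Learn one sub-dictionary per class (here: a simple SVD basis as a stand-in)."""
    dictionaries = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]                  # (n_c, n_features)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        dictionaries[c] = Vt[:n_atoms]              # (n_atoms, n_features)
    return dictionaries

def classify(x, dictionaries):
    """Assign x to the class whose sub-dictionary reconstructs it best."""
    residuals = {}
    for c, D in dictionaries.items():
        coeffs, *_ = np.linalg.lstsq(D.T, x, rcond=None)
        residuals[c] = np.linalg.norm(x - D.T @ coeffs)
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 100)), np.repeat([0, 1, 2], 20)   # synthetic B-scan features
dicts = fit_subdictionaries(X, y)
pred = classify(X[0], dicts)
```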
ECGEFNet: A two-branch deep learning model for calculating left ventricular ejection fraction using electrocardiogram
Qi Y, Li G, Yang J, Li H, Yu Q, Qu M, Ning H and Wang Y
Left ventricular systolic dysfunction (LVSD) and its severity are correlated with the prognosis of cardiovascular diseases, so early detection and monitoring of LVSD are of utmost importance. Left ventricular ejection fraction (LVEF) is an essential indicator for evaluating left ventricular function in clinical practice; however, the current echocardiography-based evaluation method is not available in primary care and cannot provide real-time monitoring of cardiac dysfunction. We propose a two-branch deep learning model (ECGEFNet) for calculating LVEF from the electrocardiogram (ECG), which holds the potential to serve as a primary medical screening tool and to facilitate long-term dynamic monitoring of cardiac functional impairment. It integrates the original numerical signal and waveform plots derived from the signal in an innovative manner, enabling joint calculation of LVEF by incorporating diverse temporal, spatial, and phase information. To address the inadequate information interaction between the two branches and the lack of efficiency in feature fusion, we propose a fusion attention mechanism (FAT) and a two-branch feature fusion module (BFF) to guide the learning, alignment, and fusion of features from both branches. We assemble a large internal dataset and perform experimental validation on it. The accuracy of cardiac dysfunction screening is 92.3% and the mean absolute error (MAE) in LVEF calculation is 4.57%. The proposed model outperforms existing baseline models and is of great significance for real-time monitoring of the degree of cardiac dysfunction.
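A hedged sketch of the two-branch idea follows: one branch ingests the raw multi-lead signal, the other a rendered waveform image, and a simple gated fusion stands in for the paper's FAT/BFF modules. All layer choices are illustrative assumptions.

```python
# Illustrative two-branch regressor for LVEF: one branch for raw ECG signals,
# one for rendered waveform images, fused by a simple gated concatenation.
# A sketch of the general idea, not the published ECGEFNet/FAT/BFF design.
import torch
import torch.nn as nn

class TwoBranchLVEF(nn.Module):
    def __init__(self):
        super().__init__()
        self.signal_branch = nn.Sequential(          # temporal features from raw ECG
            nn.Conv1d(12, 32, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.image_branch = nn.Sequential(           # spatial features from waveform plots
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gate = nn.Sequential(nn.Linear(48, 48), nn.Sigmoid())  # simple attention-like gate
        self.head = nn.Linear(48, 1)                 # regress LVEF (%)

    def forward(self, sig, img):
        fused = torch.cat([self.signal_branch(sig), self.image_branch(img)], dim=1)
        return self.head(fused * self.gate(fused))

model = TwoBranchLVEF()
lvef = model(torch.randn(4, 12, 5000), torch.randn(4, 1, 128, 128))  # (4, 1)
```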
Concordance-based Predictive Uncertainty (CPU)-Index: Proof-of-concept with application towards improved specificity of lung cancers on low dose screening CT
Wang Y, Gupta A, Tushar FI, Riley B, Wang A, Tailor TD, Tantum S, Liu JG, Bashir MR, Lo JY and Lafata KJ
In this paper, we introduce a novel concordance-based predictive uncertainty (CPU)-Index, which integrates insights from subgroup analysis and personalized AI time-to-event models. We demonstrate its effectiveness by using it to refine lung cancer screening (LCS) predictions generated by an individualized AI time-to-event model trained by fusing low-dose CT (LDCT) radiomics with patient demographics, resulting in improved risk assessment compared to the Lung CT Screening Reporting & Data System (Lung-RADS). Subgroup-based Lung-RADS faces challenges in representing individual variations and relies on a limited set of predefined characteristics, resulting in variable predictions. Conversely, personalized AI time-to-event models are hindered by transparency issues and biases from censored data. By measuring the prediction consistency between subgroup analysis and AI time-to-event models, the CPU-Index framework offers a nuanced evaluation of the bias-variance trade-off and improves the transparency and reliability of predictions. Consistency was estimated as the concordance index between the subgroup-analysis-based similarity rank and the model-prediction similarity rank. Subgroup-analysis-based similarity loss was defined as the sum of differences between Lung-RADS and feature-level 0-1 loss, and model-prediction similarity loss as squared loss. To test our approach, we identified 3,326 patients who underwent LDCT for LCS from 1/1/2015 to 6/30/2020 with confirmation of lung cancer on pathology within one year. For each LDCT image, the lesion associated with a Lung-RADS score was detected using a pretrained deep learning model from the Medical Open Network for AI (MONAI), from which radiomic features were extracted. Radiomics were optimally fused with patient demographics via a positional encoding scheme and used to train a neural multi-task logistic regression time-to-event model that predicts malignancy. Performance was maximized when radiomic features were fused with positionally encoded demographic features: in this configuration, our algorithm raised the AUC from 0.81 ± 0.04 to 0.89 ± 0.02. Compared to standard Lung-RADS, our approach reduced the false-positive rate from 0.41 ± 0.02 to 0.30 ± 0.12 while maintaining the same false-negative rate. Our methodology enhances lung cancer risk assessment by estimating prediction uncertainty and adjusting accordingly. Furthermore, the optimal integration of radiomics and patient demographics improved overall diagnostic performance, indicating their complementary nature.
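The central consistency measure can be illustrated as a concordance index between two similarity rankings of reference patients with respect to a query patient; the similarity scores below are simplified placeholders rather than the exact loss formulation in the paper.

```python
# Sketch of the core CPU-Index idea: a concordance index between a
# subgroup-analysis-based similarity ranking and a model-prediction
# similarity ranking for one query patient. Illustrative values only.
from itertools import combinations

def concordance(sim_a, sim_b):
    """Fraction of comparable patient pairs ordered the same way by both measures."""
    pairs = list(combinations(range(len(sim_a)), 2))
    agree = sum(
        1 for i, j in pairs
        if (sim_a[i] - sim_a[j]) * (sim_b[i] - sim_b[j]) > 0
    )
    comparable = sum(1 for i, j in pairs
                     if sim_a[i] != sim_a[j] and sim_b[i] != sim_b[j])
    return agree / comparable if comparable else 0.0

# Example: similarity of 5 reference patients to one query patient.
subgroup_similarity = [0.9, 0.4, 0.7, 0.2, 0.6]   # e.g. from Lung-RADS / feature agreement
model_similarity    = [0.8, 0.5, 0.9, 0.1, 0.4]   # e.g. from closeness of predicted risks
cpu_index = concordance(subgroup_similarity, model_similarity)
print(round(cpu_index, 3))                         # 0.8 for this toy example
```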
A systematic review on the roles of remote diagnosis in telemedicine system: Coherent taxonomy, insights, recommendations, and open research directions for intelligent healthcare solutions
Mohsin SS, Salman OH, Jasim AA, Al-Nouman MA and Kairaldeen AR
The term 'remote diagnosis' in telemedicine describes the procedure whereby medical practitioners diagnose patients remotely using telecommunications technology. With this method, patients can obtain medical care without having to physically visit a hospital, which can be helpful for people who live in distant places or have restricted mobility. In the past, people with health issues were usually sent to hospital, where they received clinical examinations, diagnoses, and treatment on site. Hospitals therefore became overcrowded as patient numbers increased, and because completing medical procedures required a significant amount of time, some very ill patients died.
AI-enabled clinical decision support tools for mental healthcare: A product review
Kleine AK, Kokje E, Hummelsberger P, Lermer E, Schaffernak I and Gaube S
The review seeks to promote transparency in the availability of regulated AI-enabled Clinical Decision Support Systems (AI-CDSS) for mental healthcare. From 84 potential products, seven fulfilled the inclusion criteria. The products can be categorized into three major areas: diagnosis of autism spectrum disorder (ASD) based on clinical history, behavioral, and eye-tracking data; diagnosis of multiple disorders based on conversational data; and medication selection based on clinical history and genetic data. We found five scientific articles evaluating the devices' performance and external validity. The average completeness of reporting, indicated by 52 % adherence to the Consolidated Standards of Reporting Trials Artificial Intelligence (CONSORT-AI) checklist, was modest, signaling room for improvement in reporting quality. Our findings stress the importance of obtaining regulatory approval, adhering to scientific standards, and staying up-to-date with the latest changes in the regulatory landscape. Refining regulatory guidelines and implementing effective tracking systems for AI-CDSS could enhance transparency and oversight in the field.
LCDL: Classification of ICD codes based on disease label co-occurrence dependency and LongFormer with medical knowledge
Yang Y, Lin H, Yang Z, Zhang Y, Zhao D and Luo L
Medical coding involves assigning codes to clinical free-text documents, specifically medical records that average over 3,000 markers, in order to track patient diagnoses and treatments. This is typically done manually by healthcare professionals. To improve efficiency and accuracy while reducing their workload, researchers have framed the task as multi-label classification. Because the long-tail phenomenon affects the tens of thousands of ICD codes, whereby only a few codes (representing common diseases) are frequently assigned while the majority (representing rare diseases) are rarely assigned, this paper presents LCDL, a model that addresses this challenge by combining the LongFormer pre-trained language model with a disease label co-occurrence map. To further enhance automated medical coding in the biomedical domain, hierarchies with medical knowledge, synonyms, and abbreviations are introduced, improving the medical knowledge representation. Extensive evaluations on the benchmark MIMIC-III dataset show competitive performance compared with previous state-of-the-art methods.
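A small sketch of how a label co-occurrence map might refine per-label logits on top of a document encoder is shown below; the encoder (e.g., a LongFormer) is abstracted to a fixed-size embedding, and the propagation layer is an illustrative assumption rather than the published LCDL design.

```python
# Illustrative multi-label ICD head that refines per-label logits with a
# row-normalized label co-occurrence matrix; not the published LCDL model.
import torch
import torch.nn as nn

class CooccurrenceLabelHead(nn.Module):
    def __init__(self, hidden_dim, num_labels, cooccurrence):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, num_labels)
        # Row-normalized co-occurrence counts act as a label-graph propagation step.
        co = cooccurrence / cooccurrence.sum(dim=1, keepdim=True).clamp(min=1e-8)
        self.register_buffer("co", co)
        self.mix = nn.Parameter(torch.tensor(0.5))   # balance direct vs. propagated scores

    def forward(self, doc_embedding):                # (batch, hidden_dim)
        logits = self.scorer(doc_embedding)          # (batch, num_labels)
        propagated = logits @ self.co.T              # share evidence among co-occurring codes
        return self.mix * logits + (1 - self.mix) * propagated

num_labels, hidden = 50, 768
cooc = torch.rand(num_labels, num_labels)            # stand-in for counts from training data
head = CooccurrenceLabelHead(hidden, num_labels, cooc)
probs = torch.sigmoid(head(torch.randn(2, hidden)))  # multi-label code probabilities
```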
Rough hypervolume-driven feature selection with groupwise intelligent sampling for detecting clinical characterization of lupus nephritis
Zhou X, Chen Y, Heidari AA, Chen H and Chen X
Systemic lupus erythematosus (SLE) is an autoimmune inflammatory disease, and lupus nephritis (LN) is a major risk factor for morbidity and mortality in SLE. Proliferative and pure membranous LN have different prognoses and may require different treatments. This study proposes a binary rough hypervolume-driven spherical evolution algorithm with groupwise intelligent sampling (bRGSE). The efficient dimensionality reduction capability of bRGSE is verified across twelve public datasets with feature dimensions ranging from seven hundred to fifty thousand. The experimental results indicate that bRGSE performs better than seven high-performing alternatives. bRGSE was then combined with adaptive boosting (AdaBoost) to form a new model (bRGSE_AdaBoost), which was used to analyze clinical records collected from 110 patients with LN. Experimental results show that the proposed bRGSE_AdaBoost can identify the most critical indicators, including urine occult blood, white blood cells, endogenous creatinine clearance rate, and age; these indicators may help differentiate proliferative from membranous LN. The proposed bRGSE algorithm is an efficient dimensionality reduction method, and the developed bRGSE_AdaBoost model, a computer-aided model, achieved an accuracy of 96.687 % and is expected to provide early warning for the diagnosis and treatment of LN.
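Although the spherical-evolution search itself is beyond a short example, the downstream step (scoring a candidate binary feature mask with AdaBoost) can be sketched as follows on synthetic data of comparable shape.

```python
# Sketch of how a candidate binary feature mask (as produced by a wrapper
# method like bRGSE) can be scored with AdaBoost; the optimizer itself is
# omitted and the data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=110, n_features=700, n_informative=10, random_state=0)

def score_mask(mask):
    """Fitness of a binary feature mask: cross-validated AdaBoost accuracy."""
    if mask.sum() == 0:
        return 0.0
    clf = AdaBoostClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(0)
candidate = rng.random(X.shape[1]) < 0.05          # a sparse random candidate mask
print(candidate.sum(), "features selected, accuracy =", round(score_mask(candidate), 3))
```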
Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing
Gliozzo J, Soto-Gomez M, Guarino V, Bonometti A, Cabri A, Cavalleri E, Reese J, Robinson PN, Mesiti M, Valentini G and Casiraghi E
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the choice of projection-space dimension is typically heuristic and uniformly applied across all omics, neglecting the unique high-dimension, small-sample-size challenges faced by each individual omics view. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to the individual views. By leveraging intrinsic dimensionality estimators, we assess the impact of the curse of dimensionality on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial and supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
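A minimal sketch of the per-view decision is given below, assuming a crude PCA-based intrinsic-dimension proxy in place of the dedicated estimators used in the paper, and univariate selection followed by PCA as the two-step reduction.

```python
# Sketch of a per-view reduction choice guided by an intrinsic-dimension
# estimate; thresholds, the ID proxy, and the reduction steps are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

def estimate_id(X, var_threshold=0.9):
    """Crude intrinsic-dimension proxy: components needed for 90% variance."""
    pca = PCA().fit(X)
    return int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_threshold) + 1)

def reduce_view(X, y, intrinsic_dim):
    if X.shape[1] > 10 * intrinsic_dim:
        # Strongly affected view: two-step reduction (selection, then extraction).
        X = SelectKBest(f_classif, k=5 * intrinsic_dim).fit_transform(X, y)
    return PCA(n_components=intrinsic_dim).fit_transform(X)

rng = np.random.default_rng(0)
X_view = rng.normal(size=(80, 2000))              # one high-dimensional omics view
y = rng.integers(0, 2, size=80)
d = estimate_id(X_view)
X_reduced = reduce_view(X_view, y, d)
print(d, X_reduced.shape)
```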
Disentangled global and local features of multi-source data variational autoencoder: An interpretable model for diagnosing IgAN via multi-source Raman spectral fusion techniques
Shuai W, Tian X, Zuo E, Zhang X, Lu C, Gu J, Chen C, Lv X and Chen C
A single Raman spectrum reflects limited molecular information, and effective fusion of serum and urine Raman spectra helps to obtain richer feature information. However, most current Raman-spectroscopy studies of immunoglobulin A nephropathy (IgAN) rely on small samples with low signal-to-noise ratios; directly adopting a multi-source data fusion strategy may therefore even reduce the accuracy of disease diagnosis. To this end, this paper proposes a data augmentation and spectral optimization method based on variational autoencoders, yielding reconstructed Raman spectra with doubled sample size and improved signal-to-noise ratio. For the diagnosis of IgAN from multi-source Raman spectra, this paper builds a multi-source global and local feature decoupled variational autoencoder (DMSGL-VAE). First, statistical features are extracted after spectral segmentation, and the latent variables obtained by the variational encoder are decoupled through a decoupling module: the resulting global representation captures information shared between the serum and urine source domains, while the local representations capture information unique to each domain. Then, a cross-source reconstruction loss and a decoupling loss are used to constrain the decoupling, and the effectiveness of the decoupling is demonstrated quantitatively and qualitatively. Finally, the features of the different source domains were integrated to diagnose IgAN, and the results were analyzed for important features using the SHapley Additive exPlanations (SHAP) algorithm. The experimental results showed that the AUC of the DMSGL-VAE model for diagnosing IgAN on the test set was as high as 0.9958. The SHAP analysis further suggests that proteins, hydroxybutyrate, and guanine are likely to be common biological fingerprint substances for the diagnosis of IgAN by serum and urine Raman spectroscopy. In summary, the Raman-spectroscopy-based DMSGL-VAE model achieves rapid, non-invasive, and accurate screening of IgAN, and its interpretable analysis may help doctors further understand IgAN and devise more efficient diagnostic measures in the future.
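The decoupling idea can be sketched with a toy VAE whose latent code is split into a shared global part and a source-specific local part, with cross-source reconstruction providing the decoupling pressure; this is an assumption-laden simplification, not the published DMSGL-VAE objective.

```python
# Toy sketch: a VAE latent code split into a shared "global" part and a
# source-specific "local" part, trained with within-source and cross-source
# reconstruction losses. Dimensions and losses are illustrative assumptions.
import torch
import torch.nn as nn

class SplitVAE(nn.Module):
    def __init__(self, dim=900, z_global=8, z_local=8):
        super().__init__()
        z = z_global + z_local
        self.enc = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 2 * z))
        self.dec = nn.Sequential(nn.Linear(z, 128), nn.ReLU(), nn.Linear(128, dim))
        self.z_global = z_global

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return z[:, :self.z_global], z[:, self.z_global:], mu, logvar

    def decode(self, g, l):
        return self.dec(torch.cat([g, l], dim=1))

model = SplitVAE()
serum, urine = torch.randn(16, 900), torch.randn(16, 900)      # stand-in spectra
g_s, l_s, mu_s, lv_s = model.encode(serum)
g_u, l_u, mu_u, lv_u = model.encode(urine)
recon = (nn.functional.mse_loss(model.decode(g_s, l_s), serum)
         + nn.functional.mse_loss(model.decode(g_u, l_u), urine))
# Cross-source reconstruction: swap the shared global codes between sources.
cross = (nn.functional.mse_loss(model.decode(g_u, l_s), serum)
         + nn.functional.mse_loss(model.decode(g_s, l_u), urine))
kl = (-0.5 * torch.mean(1 + lv_s - mu_s.pow(2) - lv_s.exp())
      - 0.5 * torch.mean(1 + lv_u - mu_u.pow(2) - lv_u.exp()))
loss = recon + cross + kl
```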
ItpCtrl-AI: End-to-end interpretable and controllable artificial intelligence by modeling radiologists' intentions
Pham TT, Brecheisen J, Wu CC, Nguyen H, Deng Z, Adjeroh D, Doretto G, Choudhary A and Le N
Using Deep Learning in computer-aided diagnosis systems has attracted great interest due to its impressive performance in both general and medical domains. However, a notable challenge is the lack of explainability of many advanced models, which poses risks in critical applications such as diagnosing findings on chest X-rays (CXR). To address this problem, we propose ItpCtrl-AI, a novel end-to-end interpretable and controllable framework that mirrors the decision-making process of the radiologist. By emulating the eye gaze patterns of radiologists, our framework first determines the focal areas and assesses the significance of each pixel within those regions. The model then generates an attention heatmap representing the radiologist's attention, which is used to extract the attended visual information for diagnosing the findings. Because it accepts directional input, the framework is controllable by the user. Furthermore, by displaying the eye gaze heatmap that guides the diagnostic conclusion, the underlying rationale behind the model's decision is revealed, making it interpretable. In addition to developing an interpretable and controllable framework, our work includes the creation of a dataset, named Diagnosed-Gaze++, which aligns medical findings with eye gaze data. Our extensive experimentation validates the effectiveness of our approach in generating accurate attention heatmaps and diagnoses. The experimental results show that our model not only accurately identifies medical findings but also precisely reproduces the eye gaze attention of radiologists. The dataset, models, and source code will be made publicly available upon acceptance.
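The gaze-as-attention mechanism can be sketched as a predicted heatmap re-weighting CNN feature maps before the diagnostic head, with an optional user-supplied heatmap providing controllability; the layers below are illustrative, not the published ItpCtrl-AI architecture.

```python
# Sketch of the gaze-as-attention idea: a predicted gaze heatmap re-weights
# CNN feature maps before the diagnostic head. Illustrative layers only.
import torch
import torch.nn as nn

class GazeAttendedClassifier(nn.Module):
    def __init__(self, n_findings=14):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.gaze_head = nn.Sequential(nn.Conv2d(32, 1, 1), nn.Sigmoid())   # predicted gaze heatmap
        self.classifier = nn.Linear(32, n_findings)

    def forward(self, cxr, user_heatmap=None):
        feats = self.backbone(cxr)                       # (B, 32, H, W)
        heatmap = self.gaze_head(feats)                  # (B, 1, H, W)
        if user_heatmap is not None:                     # controllability: user overrides attention
            heatmap = user_heatmap
        attended = (feats * heatmap).mean(dim=(2, 3))    # gaze-weighted pooled features
        return self.classifier(attended), heatmap        # finding logits + explanatory heatmap

model = GazeAttendedClassifier()
logits, heatmap = model(torch.randn(2, 1, 224, 224))
```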
Glaucoma detection: Binocular approach and clinical data in machine learning
Kovalyk-Borodyak O, Morales-Sánchez J, Verdú-Monedero R and Sancho-Gómez JL
In this work, we present a multi-modal machine learning method to automate early glaucoma diagnosis. The proposed methodology introduces two novel aspects for automated diagnosis not previously explored in the literature: the simultaneous use of ocular fundus images from both eyes and the integration of the patient's additional clinical data. We begin by establishing a baseline, termed monocular mode, which adheres to the traditional approach of treating the data from each eye as a separate instance. We then explore the binocular mode, investigating how combining information from both eyes of the same patient can enhance glaucoma diagnosis accuracy. This exploration employs the PAPILA dataset, comprising information from both eyes, clinical data, ocular fundus images, and expert segmentations of these images. Additionally, we compare two image-derived data modalities: the ocular fundus images themselves and morphological data from manual expert segmentation. Our method integrates Gradient-Boosted Decision Trees (GBDT) and Convolutional Neural Networks (CNN), specifically the MobileNet, VGG16, ResNet-50, and Inception models. SHAP values are used to interpret the GBDT models, while the Deep Explainer method is applied in conjunction with SHAP to analyze the outputs of the convolutional models. Our findings show the viability of considering both eyes, which improves model performance. The binocular approach incorporating morphological and clinical data yielded an AUC of 0.796 (±0.003 at a 95% confidence interval), while the CNN, using the same binocular approach, achieved an AUC of 0.764 (±0.005 at a 95% confidence interval).
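The binocular tabular setup can be sketched as concatenating per-eye morphological features with clinical data into one instance per patient; sklearn's GradientBoostingClassifier stands in for the GBDT used in the study, and the data below are synthetic.

```python
# Sketch of the binocular setup: features from both eyes plus clinical data
# form one instance per patient for a gradient-boosted classifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients = 200
right_eye = rng.normal(size=(n_patients, 10))   # e.g. cup/disc morphology, right eye
left_eye = rng.normal(size=(n_patients, 10))    # same morphological features, left eye
clinical = rng.normal(size=(n_patients, 5))     # e.g. age, IOP, encoded history
y = rng.integers(0, 2, size=n_patients)         # glaucoma label per patient

X_binocular = np.hstack([right_eye, left_eye, clinical])
auc = cross_val_score(GradientBoostingClassifier(random_state=0),
                      X_binocular, y, cv=5, scoring="roc_auc").mean()
print(round(auc, 3))
```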
Advances in diagnosis and prognosis of bacteraemia, bloodstream infection, and sepsis using machine learning: A comprehensive living literature review
B H, D K M, T M R, W B, R W, V V, J D, J RM, F J D, P G and A H H
Blood-related infections are a significant concern in healthcare. They can lead to serious medical complications and even death if not promptly diagnosed and treated. Over time, medical research has sought to identify clinical factors and strategies to improve the management of these conditions. The increasing adoption of electronic health records has led to a wealth of electronically available medical information, and predictive models have emerged as invaluable tools. This manuscript offers a detailed survey of machine-learning techniques used for the diagnosis and prognosis of bacteraemia, bloodstream infections, and sepsis, shedding light on their efficacy, potential limitations, and the intricacies of their integration into clinical practice.
TransformerLSR: Attentive joint model of longitudinal data, survival, and recurrent events with concurrent latent structure
Zhang Z, Zhao Y and Xu Y
In applications such as biomedical studies, epidemiology, and social sciences, recurrent events often co-occur with longitudinal measurements and a terminal event, such as death. Therefore, jointly modeling longitudinal measurements, recurrent events, and survival data while accounting for their dependencies is critical. While joint models for the three components exist in statistical literature, many of these approaches are limited by heavy parametric assumptions and scalability issues. Recently, incorporating deep learning techniques into joint modeling has shown promising results. However, current methods only address joint modeling of longitudinal measurements at regularly-spaced observation times and survival events, neglecting recurrent events. In this paper, we develop TransformerLSR, a flexible transformer-based deep modeling and inference framework to jointly model all three components simultaneously. TransformerLSR integrates deep temporal point processes into the joint modeling framework, treating recurrent and terminal events as two competing processes dependent on past longitudinal measurements and recurrent event times. Additionally, TransformerLSR introduces a novel trajectory representation and model architecture to potentially incorporate a priori knowledge of known latent structures among concurrent longitudinal variables. We demonstrate the effectiveness and necessity of TransformerLSR through simulation studies and analyzing a real-world medical dataset on patients after kidney transplantation.
Prediction of radiological decision errors from longitudinal analysis of gaze and image features
Anikina A, Ibragimova D, Mustafaev T, Mello-Thoms C and Ibragimov B
Medical imaging, particularly radiography, is an indispensable part of diagnosing many chest diseases. Final diagnoses are made by radiologists based on the images, but the decision-making process always carries a risk of incorrect interpretation. Incorrectly interpreted data can lead to delays in treatment, prescription of inappropriate therapy, or even a completely missed diagnosis. In this context, our study aims to determine whether it is possible to predict diagnostic errors made by radiologists using eye-tracking technology. For this purpose, we asked 4 radiologists with different levels of experience to analyze 1,000 images covering a wide range of chest diseases. Using the eye-tracking data, we calculated the radiologists' gaze fixation points and generated feature vectors describing their gaze behavior during image analysis. Additionally, we emulated the process of progressively revealing the images according to the radiologists' gaze data to create a more comprehensive picture of their analysis. We then applied a recurrent neural network to predict diagnostic errors. Our results showed an ROC AUC score of 0.7755, demonstrating the significant potential of this approach for enhancing the accuracy of diagnostic error recognition.
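The error-prediction step can be sketched as a recurrent classifier over per-fixation feature vectors; the feature set and dimensions below are hypothetical.

```python
# Sketch of a recurrent classifier over per-fixation gaze feature vectors
# (e.g. fixation duration, saccade length, local image statistics) predicting
# whether a reading ends in a diagnostic error. Feature set is hypothetical.
import torch
import torch.nn as nn

class GazeErrorPredictor(nn.Module):
    def __init__(self, feat_dim=12, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, fixations):                 # (batch, n_fixations, feat_dim)
        _, h = self.rnn(fixations)                # final hidden state summarizes the reading
        return self.head(h.squeeze(0))            # logit: probability of a diagnostic error

model = GazeErrorPredictor()
batch = torch.randn(8, 120, 12)                   # 8 readings, 120 fixations each
error_logits = model(batch)
loss = nn.BCEWithLogitsLoss()(error_logits, torch.randint(0, 2, (8, 1)).float())
```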
Implementation of artificial intelligence approaches in oncology clinical trials: A systematic review
Saady M, Eissa M, Yacoub AS, Hamed AB and Azzazy HME
There is a growing interest in leveraging artificial intelligence (AI) technologies to enhance various aspects of clinical trials. The goal of this systematic review is to assess the impact of implementing AI approaches on different aspects of oncology clinical trials.
A generalizable normative deep autoencoder for brain morphological anomaly detection: application to the multi-site StratiBip dataset on bipolar disorder in an external validation framework
Sampaio IW, Tassi E, Bellani M, Benedetti F, Nenadić I, Phillips ML, Piras F, Yatham L, Bianchi AM, Brambilla P and Maggioni E
The heterogeneity of psychiatric disorders makes researching disorder-specific neurobiological markers an ill-posed problem. Here, we address the need for disease stratification models by presenting a generalizable multivariate normative modelling framework for characterizing brain morphology, applied to bipolar disorder (BD). We used deep autoencoders in an anomaly detection framework, combined for the first time with a confounder removal step that integrates training and external validation. The model was trained on healthy control (HC) data from the Human Connectome Project and applied to multi-site external data from HC and BD individuals. We found that brain deviating scores were greater, more heterogeneous, and had more extreme values in the BD group, with volumes prominently from the basal ganglia, hippocampus, and adjacent regions emerging as significantly deviating. Similarly, individual brain deviating maps based on modified z-scores showed a higher occurrence of abnormalities, but their overall spatial overlap was lower compared to HCs. Our generalizable framework enabled the identification of brain deviating patterns that differ between the subject and group levels, a step forward towards the development of more effective and personalized clinical decision support systems and patient stratification in psychiatry.
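The normative-modelling logic can be sketched as follows: fit an autoencoder on healthy-control data only, then express a test subject's region-wise reconstruction errors as z-scores relative to the control distribution (the paper's confounder-removal step is omitted here).

```python
# Sketch of normative anomaly detection with an autoencoder: train on healthy
# controls only, then score each region of a test subject as a z-score of its
# reconstruction error relative to the control distribution. Illustrative only.
import torch
import torch.nn as nn

n_regions = 100
ae = nn.Sequential(nn.Linear(n_regions, 16), nn.ReLU(), nn.Linear(16, n_regions))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

hc = torch.randn(500, n_regions)                 # healthy-control regional volumes (stand-in)
for _ in range(200):                             # fit the normative model on HC data only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(hc), hc)
    loss.backward()
    opt.step()

with torch.no_grad():
    hc_err = ae(hc) - hc                         # per-region reconstruction errors in controls
    mu, sd = hc_err.mean(0), hc_err.std(0)
    patient = torch.randn(1, n_regions)          # e.g. a BD individual from an external site
    deviation_z = ((ae(patient) - patient) - mu) / sd   # region-wise brain deviating map
    extreme = (deviation_z.abs() > 2).sum().item()      # count of strongly deviating regions
```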