MEDICAL IMAGE ANALYSIS

Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy
Chavarrias Solano PE, Bulpitt A, Subramanian V and Ali S
Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and its 3D reconstruction can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions, such as lighting, large homogeneous textures, and image modality, estimating distance from the camera (i.e., depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While methods for depth estimation have been proposed and advanced in computer vision on natural scene datasets, their efficacy has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing estimation of accurate camera depths. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate an improvement of 15.75% in relative error and 10.7% in δ accuracy over the most accurate baseline state-of-the-art Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, thus providing a first benchmark of state-of-the-art methods on this dataset.
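For readers wanting a concrete picture of the cross-task consistency idea, the sketch below (plain PyTorch, not the authors' code) derives normals from the predicted depth via finite differences and penalizes disagreement with the normal decoder's output; the function names, the gradient-based normal approximation, and the loss weighting are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def depth_to_normals(depth):
        """Approximate per-pixel surface normals from a depth map (B, 1, H, W).

        Uses finite differences of depth; a full formulation would also use
        camera intrinsics.
        """
        dz_dx = depth[:, :, :, 1:] - depth[:, :, :, :-1]   # (B, 1, H, W-1)
        dz_dy = depth[:, :, 1:, :] - depth[:, :, :-1, :]   # (B, 1, H-1, W)
        dz_dx = F.pad(dz_dx, (0, 1, 0, 0))                 # pad back to (H, W)
        dz_dy = F.pad(dz_dy, (0, 0, 0, 1))
        ones = torch.ones_like(depth)
        normals = torch.cat([-dz_dx, -dz_dy, ones], dim=1) # (B, 3, H, W)
        return F.normalize(normals, dim=1)

    def cross_task_consistency_loss(pred_depth, pred_normals):
        """Penalize disagreement between depth-derived and predicted normals."""
        normals_from_depth = depth_to_normals(pred_depth)
        cos = F.cosine_similarity(
            normals_from_depth, F.normalize(pred_normals, dim=1), dim=1)
        return (1.0 - cos).mean()

    # Illustrative multi-task objective (weights are assumptions):
    # loss = l_depth + l_normal + lambda_c * cross_task_consistency_loss(d_hat, n_hat)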
Semantics and instance interactive learning for labeling and segmentation of vertebrae in CT images
Mao Y, Feng Q, Zhang Y and Ning Z
Automatically labeling and segmenting vertebrae in 3D CT images constitutes a complex multi-task problem. Current methods conduct vertebra labeling and semantic segmentation progressively, which typically involves two separate models and may ignore feature interaction among different tasks. Although instance segmentation approaches with multi-channel prediction have been proposed to alleviate such issues, their utilization of semantic information remains insufficient. Additionally, another challenge for an accurate model is how to effectively distinguish similar adjacent vertebrae and model their sequential attributes. In this paper, we propose a Semantics and Instance Interactive Learning (SIIL) paradigm for synchronous labeling and segmentation of vertebrae in CT images. SIIL models semantic feature learning and instance feature learning, in which the former extracts spinal semantics and the latter distinguishes vertebral instances. Interactive learning uses semantic features to improve the separability of vertebral instances and instance features to help learn position and contour information, during which a Morphological Instance Localization Learning (MILL) module is introduced to align semantic and instance features and facilitate their interaction. Furthermore, an Ordinal Contrastive Prototype Learning (OCPL) module is devised to differentiate adjacent vertebrae with high similarity (via cross-image contrastive learning) and simultaneously model their sequential attributes (via a temporal unit). Extensive experiments on several datasets demonstrate that our method significantly outperforms other approaches in labeling and segmenting vertebrae. Our code is available at https://github.com/YuZhang-SMU/Vertebrae-Labeling-Segmentation.
Large-scale multi-center CT and MRI segmentation of pancreas with deep learning
Zhang Z, Keles E, Durak G, Taktak Y, Susladkar O, Gorade V, Jha D, Ormeci AC, Medetalibeyoglu A, Yao L, Wang B, Isler IS, Peng L, Pan H, Vendrami CL, Bourhani A, Velichko Y, Gong B, Spampinato C, Pyrros A, Tiwari P, Klatte DCF, Engels M, Hoogenboom S, Bolan CW, Agarunov E, Harfouch N, Huang C, Bruno MJ, Schoots I, Keswani RN, Miller FH, Gonda T, Yazici C, Tirkes T, Turkbey B, Wallace MB and Bagci U
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We introduced a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and 95th-percentile Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons. For segmentation accuracy, we achieved Dice coefficients of 88.3% (±7.2%, at case level) with CT, 85.0% (±7.9%) with T1W MRI, and 86.3% (±6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction, with R of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data are made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
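For reference, the reported evaluation metrics can be computed roughly as follows; this is a generic NumPy/SciPy sketch of the Dice coefficient and the 95th-percentile Hausdorff distance on 3D binary masks, not the authors' evaluation code.

    import numpy as np
    from scipy.ndimage import binary_erosion
    from scipy.spatial import cKDTree

    def dice_coefficient(pred, gt):
        """Dice overlap between two binary masks of the same shape."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        denom = pred.sum() + gt.sum()
        return 2.0 * np.logical_and(pred, gt).sum() / denom if denom > 0 else 1.0

    def surface_points(mask, spacing):
        """Boundary voxel coordinates (in mm) of a 3D binary mask."""
        boundary = mask & ~binary_erosion(mask)
        return np.argwhere(boundary) * np.asarray(spacing)

    def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
        """95th-percentile symmetric surface distance between two binary masks."""
        p = surface_points(pred.astype(bool), spacing)
        g = surface_points(gt.astype(bool), spacing)
        d_pg, _ = cKDTree(g).query(p)   # pred-surface -> gt-surface distances
        d_gp, _ = cKDTree(p).query(g)   # gt-surface -> pred-surface distances
        return np.percentile(np.concatenate([d_pg, d_gp]), 95)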
IGUANe: A 3D generalizable CycleGAN for multicenter harmonization of brain MR images
Roca V, Kuchcinski G, Pruvo JP, Manouvriez D and Lopes R
In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and the straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN by integrating an arbitrary number of domains for training through a many-to-one architecture. The framework, based on domain pairs, enables the implementation of sampling strategies that prevent confusion between site-related and biological variabilities. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer's disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for application to different image modalities. Code and the trained IGUANe model are available at https://github.com/RocaVincent/iguane_harmonization.git.
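The losses involved in such a many-to-one harmonization setup can be sketched as below in PyTorch; this is a standard CycleGAN-style composition (adversarial, cycle, and identity terms) with a universal site-to-reference generator and a site-specific back-generator, given only as an illustration and not as IGUANe's exact objective or sampling strategy.

    import torch
    import torch.nn.functional as F

    def harmonization_step_losses(G_univ, G_back_s, D_ref, x_site, x_ref, lambda_cyc=10.0):
        """Generator losses for one source site s in a many-to-one translation setup.

        G_univ:   universal generator, any site -> reference domain
        G_back_s: site-specific generator, reference domain -> site s
        D_ref:    discriminator on the reference domain
        These are generic CycleGAN-style terms, not IGUANe's exact objective.
        """
        fake_ref = G_univ(x_site)                 # site -> reference
        rec_site = G_back_s(fake_ref)             # reference -> site (cycle)

        d_out = D_ref(fake_ref)
        adv = F.mse_loss(d_out, torch.ones_like(d_out))   # LSGAN adversarial term
        cyc = F.l1_loss(rec_site, x_site)                 # cycle consistency
        idt = F.l1_loss(G_univ(x_ref), x_ref)             # identity on reference images

        return adv + lambda_cyc * cyc + 0.5 * lambda_cyc * idt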
Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: A data-driven approach for improved classification
Bigolin Lanfredi R, Mukherjee P and Summers RM
In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports for dataset releases. However, there is still room for improvement in label quality. These labelers typically output only presence labels, sometimes with binary uncertainty indicators, which limits their usefulness. Supervised deep learning models have also been developed for report labeling but, like rule-based systems, lack adaptability. In this work, we present MAPLEZ (Medical report Annotations with Privacy-preserving Large language model using Expeditious Zero shot answers), a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels on CXR reports. MAPLEZ extracts not only binary labels indicating the presence or absence of a finding but also its location, severity, and the radiologists' uncertainty about the finding. Over eight abnormalities from five test sets, we show that our method can extract these annotations with an increase of 3.6 percentage points (pp) in macro F1 score for categorical presence annotations and an increase of more than 20 pp in F1 score for location annotations over competing labelers. Additionally, using the combination of improved and multi-type annotations as classification supervision on a dataset of limited-resolution CXRs, we demonstrate substantial advancements in proof-of-concept classification quality, with an increase of 1.1 pp in AUROC over models trained with annotations from the best alternative approach. We share our code and annotations.
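The abstract does not give the prompts used by MAPLEZ; the snippet below only illustrates how a locally run LLM could be queried per finding for presence, location, and severity, with a permissive parser for the structured answer. The prompt wording, the answer format, and the `generate` callable are assumptions, not the paper's implementation.

    import re

    PROMPT_TEMPLATE = (
        "You are reading a chest X-ray report.\n"
        "Report:\n{report}\n\n"
        "For the finding '{finding}', answer on three lines exactly as:\n"
        "presence: yes/no/uncertain\n"
        "location: <free text or none>\n"
        "severity: mild/moderate/severe/none\n"
    )

    def parse_answer(text):
        """Extract the three fields from the LLM's answer (very permissive)."""
        fields = {}
        for key in ("presence", "location", "severity"):
            m = re.search(rf"{key}\s*:\s*(.+)", text, flags=re.IGNORECASE)
            fields[key] = m.group(1).strip().lower() if m else None
        return fields

    def annotate_report(report, findings, generate):
        """`generate` is any callable wrapping a locally executable LLM
        (e.g. a text-generation pipeline); it is not part of MAPLEZ."""
        return {f: parse_answer(generate(PROMPT_TEMPLATE.format(report=report, finding=f)))
                for f in findings}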
LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation
Wang Q, Zhao S, Xu Z and Zhou SK
Surgical instrument segmentation is instrumental to minimally invasive surgeries and related applications. Most previous methods formulate this task as single-frame-based instance segmentation, ignoring the natural temporal and stereo attributes of a surgical video. As a result, these methods are less robust to appearance variations arising from temporal motion and view changes. In this work, we propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation. Using a query-based segmentation model as the core, we design three performance-enhancing modules. First, we design a disparity-guided feature propagation module to explicitly enhance depth-aware features. To generalize well even to monocular videos, we apply a pseudo-stereo scheme to generate complementary right images. Second, we propose a stereo-temporal set classifier, which aggregates stereo-temporal contexts in a universal way to make a consolidated prediction and mitigate transient failures. Finally, we propose a location-agnostic classifier to decouple location bias from mask prediction and enhance the feature semantics. We extensively validate our approach on three public surgical video datasets, including two benchmarks from the EndoVis Challenges and the real radical prostatectomy surgery dataset GraSP. Experimental results demonstrate the promising performance of our method, which consistently achieves comparable or favorable results relative to previous state-of-the-art approaches.
A cross-attention-based deep learning approach for predicting functional stroke outcomes using 4D CTP imaging and clinical metadata
Amador K, Pinel N, Winder AJ, Fiehler J, Wilms M and Forkert ND
Acute ischemic stroke (AIS) remains a global health challenge, leading to long-term functional disabilities without timely intervention. Spatio-temporal (4D) Computed Tomography Perfusion (CTP) imaging is crucial for diagnosing and treating AIS due to its ability to rapidly assess the ischemic core and penumbra. Although traditionally used to assess acute tissue status in clinical settings, 4D CTP has also been explored in research for predicting stroke tissue outcomes. However, its potential for predicting functional outcomes, especially in combination with clinical metadata, remains unexplored. Thus, this work aims to develop and evaluate a novel multimodal deep learning model for predicting functional outcomes (specifically, 90-day modified Rankin Scale) in AIS patients by combining 4D CTP and clinical metadata. To achieve this, an intermediate fusion strategy with a cross-attention mechanism is introduced to enable a selective focus on the most relevant features and patterns from both modalities. Evaluated on a dataset comprising 70 AIS patients who underwent endovascular mechanical thrombectomy, the proposed model achieves an accuracy (ACC) of 0.77, outperforming conventional late fusion strategies (ACC = 0.73) and unimodal models based on either 4D CTP (ACC = 0.61) or clinical metadata (ACC = 0.71). The results demonstrate the superior capability of the proposed model to leverage complex inter-modal relationships, emphasizing the value of advanced multimodal fusion techniques for predicting functional stroke outcomes.
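A minimal PyTorch sketch of intermediate fusion via cross-attention, in which clinical-metadata tokens query the 4D CTP feature tokens, is given below; the dimensions, pooling, and module names are illustrative assumptions rather than the authors' architecture.

    import torch
    import torch.nn as nn

    class CrossAttentionFusion(nn.Module):
        """Clinical tokens attend over imaging tokens (queries = metadata)."""
        def __init__(self, dim=128, num_heads=4, num_classes=2):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, img_tokens, clin_tokens):
            # img_tokens:  (B, N_img, dim)  e.g. flattened 4D CTP features
            # clin_tokens: (B, N_clin, dim) e.g. embedded clinical variables
            fused, _ = self.attn(query=clin_tokens, key=img_tokens, value=img_tokens)
            fused = self.norm(fused + clin_tokens)      # residual connection
            return self.head(fused.mean(dim=1))         # pooled logits

    # Usage (shapes only):
    # logits = CrossAttentionFusion()(torch.randn(2, 196, 128), torch.randn(2, 8, 128))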
Beyond strong labels: Weakly-supervised learning based on Gaussian pseudo labels for the segmentation of ellipse-like vascular structures in non-contrast CTs
Ma Q, Kaladji A, Shu H, Yang G, Lucas A and Haigron P
Deep learning-based automated segmentation of vascular structures in preoperative CT angiography (CTA) images contributes to computer-assisted diagnosis and interventions. While CTA is the common standard, non-contrast CT imaging has the advantage of avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. This paper introduces a novel weakly-supervised framework that exploits the elliptical topology of vascular structures in CT slices. It includes an efficient annotation process based on our proposed standards, an approach for generating 2D Gaussian heatmaps that serve as pseudo labels, and a training process combining a voxel reconstruction loss and a distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, focusing in particular on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (by 1.54% Dice score on average) while reducing labeling time by around 82.0%. The efficiency of generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional performance improvement (2.74% Dice score on average) with a 66.3% reduction in labeling time, which remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95% in Dice score for the 2D models and a 68% reduction in Hausdorff distance for the 3D model.
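The pseudo-label idea can be illustrated as follows: given an annotated vessel centre and approximate radius in a CT slice, an isotropic 2D Gaussian heatmap is generated as the training target. This NumPy sketch uses a heuristic sigma and omits the paper's annotation standard and loss terms.

    import numpy as np

    def gaussian_pseudo_label(shape, center, radius):
        """2D Gaussian heatmap centred on an annotated vessel centre.

        shape:  (H, W) of the CT slice
        center: (row, col) of the annotated vessel centre
        radius: approximate vessel radius in pixels, used to set sigma
        """
        h, w = shape
        rows, cols = np.mgrid[0:h, 0:w]
        sigma = radius / 2.0                    # heuristic spread
        d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
        return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)

    # heatmap = gaussian_pseudo_label((512, 512), center=(260, 245), radius=14)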
Clinical knowledge-guided hybrid classification network for automatic periodontal disease diagnosis in X-ray image
Mei L, Deng K, Cui Z, Fang Y, Li Y, Lai H, Tonetti MS and Shen D
Accurate classification of periodontal disease from panoramic X-ray images carries immense clinical importance for effective diagnosis and treatment. Recent methodologies attempt to classify periodontal diseases from X-ray images by estimating bone loss within these images, supervised by manual radiographic annotations for segmentation or keypoint detection. However, these annotations often lack consistency with the clinical gold standard of probing measurements, potentially causing measurement inaccuracy and unstable classifications. Additionally, the diagnosis of periodontal disease necessitates exceptional sensitivity. To address these challenges, we introduce HC-Net, an innovative hybrid classification framework devised for accurately classifying periodontal disease from X-ray images. The framework comprises three main components: tooth-level classification, patient-level classification, and a learnable adaptive noisy-OR gate. For tooth-level classification, we first employ instance segmentation to identify each tooth individually, followed by tooth-level periodontal disease classification. For patient-level classification, we utilize a multi-task strategy to concurrently learn patient-level classification and a Classification Activation Map (CAM) that signifies the confidence of local lesion areas within the panoramic X-ray image. Finally, our adaptive noisy-OR gate produces a hybrid classification by combining predictions from both levels. In particular, we incorporate clinical knowledge from the workflows of professional dentists to better address the sensitivity requirements of periodontal disease diagnosis. Extensive empirical testing on a dataset amassed from real-world clinics demonstrates that our proposed HC-Net achieves unparalleled performance in periodontal disease classification, exhibiting substantial potential for practical application.
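A learnable noisy-OR aggregation of per-tooth probabilities into a patient-level probability can be sketched as below in PyTorch; the per-tooth reliability weights, the masking of missing teeth, and the interplay with the CAM branch are simplified assumptions, not HC-Net's exact gate.

    import torch
    import torch.nn as nn

    class LearnableNoisyOR(nn.Module):
        """Patient-level probability = 1 - prod_i (1 - w_i * p_i)."""
        def __init__(self, max_teeth=32):
            super().__init__()
            self.logit_w = nn.Parameter(torch.zeros(max_teeth))  # per-tooth reliabilities

        def forward(self, tooth_probs, mask):
            # tooth_probs: (B, T) per-tooth disease probabilities in [0, 1]
            # mask:        (B, T) 1 for present teeth, 0 for missing ones
            w = torch.sigmoid(self.logit_w)                  # (T,)
            keep = 1.0 - w * tooth_probs * mask              # missing teeth contribute 1
            return 1.0 - keep.prod(dim=1)                    # (B,) patient-level probability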
Dual structure-aware image filterings for semi-supervised medical image segmentation
Gu Y, Sun Z, Chen T, Xiao X, Liu Y, Xu Y and Najman L
Semi-supervised image segmentation has attracted great attention recently. The key is how to leverage unlabeled images in the training process. Most methods maintain consistent predictions for unlabeled images under variations (e.g., adding noise/perturbations, or creating alternative versions) at the image and/or model level. At the image level, medical images often carry prior structural information that has not been well explored by such variations. In this paper, we propose novel dual structure-aware image filterings (DSAIF) as the image-level variations for semi-supervised medical image segmentation. Motivated by connected filtering, which simplifies an image via filtering in a structure-aware, tree-based image representation, we resort to the dual contrast-invariant Max-tree and Min-tree representations. Specifically, we propose a novel connected filtering that removes topologically equivalent nodes (i.e., connected components) having no siblings in the Max/Min-tree. This results in two filtered images that preserve topologically critical structures. Applying the proposed DSAIF to mutually supervised networks decreases the consensus of their erroneous predictions on unlabeled images. This helps alleviate the confirmation bias issue of overfitting to noisy pseudo labels of unlabeled images, and thus effectively improves segmentation performance. Extensive experimental results on three benchmark datasets demonstrate that the proposed method significantly and consistently outperforms state-of-the-art methods. The source code will be publicly available.
LoViT: Long Video Transformer for surgical phase recognition
Liu Y, Boels M, Garcia-Peraza-Herrera LC, Vercauteren T, Dasgupta P, Granados A and Ourselin S
Online surgical phase recognition plays a significant role in building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited in two ways: they train spatial feature extractors with frame-level supervision, which can lead to incorrect predictions when similar frames appear in different phases, and they fuse local and global features poorly due to computational constraints, which can affect the analysis of the long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT), emphasizing the development of a temporally-rich spatial feature extractor and a phase transition map. The temporally-rich spatial feature extractor is designed to capture critical temporal information within the surgical video frames. The phase transition map provides essential insights into the dynamic transitions between different surgical phases. LoViT combines these innovations with a multiscale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multiscale temporal head then leverages the temporally-rich spatial features and phase transition map to classify surgical phases using phase transition-aware supervision. Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art surgical phase recognition performance on two datasets with different surgical procedures and temporal sequencing characteristics. The project page is available at https://github.com/MRUIL/LoViT.
An objective comparison of methods for augmented reality in laparoscopic liver resection by preoperative-to-intraoperative image fusion from the MICCAI2022 challenge
Ali S, Espinel Y, Jin Y, Liu P, Güttner B, Zhang X, Zhang L, Dowrick T, Clarkson MJ, Xiao S, Wu Y, Yang Y, Zhu L, Sun D, Li L, Pfeiffer M, Farid S, Maier-Hein L, Buc E and Bartoli A
Augmented reality for laparoscopic liver resection is a visualisation mode that allows a surgeon to localise tumours and vessels embedded within the liver by projecting them on top of a laparoscopic image. Preoperative 3D models extracted from Computed Tomography (CT) or Magnetic Resonance (MR) imaging data are registered to the intraoperative laparoscopic images during this process. Regarding 3D-2D fusion, most algorithms use anatomical landmarks to guide registration, such as the liver's inferior ridge, the falciform ligament, and the occluding contours. These are usually marked by hand in both the laparoscopic image and the 3D model, which is time-consuming and prone to error. There is therefore a need to automate this process so that augmented reality can be used effectively in the operating room. We present the Preoperative-to-Intraoperative Laparoscopic Fusion challenge (P2ILF), held during the Medical Image Computing and Computer Assisted Intervention (MICCAI 2022) conference, which investigates the possibilities of detecting these landmarks automatically and using them in registration. The challenge was divided into two tasks: (1) a 2D and 3D landmark segmentation task and (2) a 3D-2D registration task. The teams were provided with training data consisting of 167 laparoscopic images and 9 preoperative 3D models from 9 patients, with the corresponding 2D and 3D landmark annotations. A total of 6 teams from 4 countries participated in the challenge, and their results were assessed for each task independently. All the teams proposed deep learning-based methods for the 2D and 3D landmark segmentation tasks and differentiable rendering-based methods for the registration task. The proposed methods were evaluated on 16 test images and 2 preoperative 3D models from 2 patients. In Task 1, the teams were able to segment most of the 2D landmarks, while the 3D landmarks proved more challenging to segment. In Task 2, only one team obtained acceptable qualitative and quantitative registration results. Based on the experimental outcomes, we propose three key hypotheses that determine current limitations and future directions for research in this domain.
Efficient anatomical labeling of pulmonary tree structures via deep point-graph representation-based implicit fields
Xie K, Yang J, Wei D, Weng Z and Fua P
Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. Traditional approaches using high-resolution image stacks and standard CNNs on dense voxel grids face challenges in computational efficiency, limited resolution, local context, and inadequate preservation of shape topology. Our method addresses these issues by shifting from dense voxel to sparse point representation, offering better memory efficiency and global context utilization. However, the inherent sparsity in point representation can lead to a loss of crucial connectivity in tree-shaped structures. To mitigate this, we introduce graph learning on skeletonized structures, incorporating differentiable feature fusion for improved topology and long-distance context capture. Furthermore, we employ an implicit function for efficient conversion of sparse representations into dense reconstructions end-to-end. The proposed method not only delivers state-of-the-art performance in labeling accuracy, both overall and at key locations, but also enables efficient inference and the generation of closed surface shapes. Addressing data scarcity in this field, we have also curated a comprehensive dataset to validate our approach. Data and code are available at https://github.com/M3DV/pulmonary-tree-labeling.
SpinDoctor-IVIM: A virtual imaging framework for intravoxel incoherent motion MRI
Lashgari M, Yang Z, Bernabeu MO, Li JR and Frangi AF
Intravoxel incoherent motion (IVIM) imaging is increasingly recognised as an important tool in clinical MRI, where tissue perfusion and diffusion information can aid disease diagnosis, monitoring of patient recovery, and treatment outcome assessment. Currently, the discovery of biomarkers based on IVIM imaging, similar to other medical imaging modalities, is dependent on long preclinical and clinical validation pathways to link observable markers derived from images with the underlying pathophysiological mechanisms. To speed up this process, virtual IVIM imaging is proposed. This approach provides an efficient virtual imaging tool to design, evaluate, and optimise novel approaches for IVIM imaging. In this work, virtual IVIM imaging is developed through a new finite element solver, SpinDoctor-IVIM, which extends SpinDoctor, a diffusion MRI simulation toolbox. SpinDoctor-IVIM simulates IVIM imaging signals by solving the generalised Bloch-Torrey partial differential equation. The input velocity to SpinDoctor-IVIM is computed using HemeLB, an established Lattice Boltzmann blood flow simulator. Contrary to previous approaches, SpinDoctor-IVIM accounts for volumetric microvasculature during blood flow simulations, incorporates diffusion phenomena in the intravascular space, and accounts for the permeability between the intravascular and extravascular spaces. The above-mentioned features of the proposed framework are illustrated with simulations on a realistic microvasculature model.
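As background, the conventional two-compartment IVIM signal model that such simulators are typically compared against is S(b)/S(0) = f·exp(-b·D*) + (1-f)·exp(-b·D); the SciPy sketch below fits it to a normalized signal. This is the standard model, not the SpinDoctor-IVIM solver, and the starting values and bounds are typical assumptions.

    import numpy as np
    from scipy.optimize import curve_fit

    def ivim_signal(b, f, d_star, d):
        """Biexponential IVIM model: perfusion fraction f, pseudo-diffusion D*, diffusion D."""
        return f * np.exp(-b * d_star) + (1.0 - f) * np.exp(-b * d)

    def fit_ivim(bvals, signal):
        """Fit the normalized signal S(b)/S(0) at the given b-values (s/mm^2)."""
        p0 = (0.1, 0.01, 0.001)                            # typical starting values
        bounds = ([0.0, 0.003, 1e-4], [0.5, 0.1, 0.003])   # f, D*, D ranges (assumed)
        popt, _ = curve_fit(ivim_signal, bvals, signal, p0=p0, bounds=bounds)
        return dict(zip(("f", "D_star", "D"), popt))

    # bvals = np.array([0, 10, 20, 50, 100, 200, 400, 800])
    # params = fit_ivim(bvals, signal / signal[0])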
MedLSAM: Localize and segment anything model for 3D CT images
Lei W, Xu W, Li K, Zhang X and Zhang S
Recent advancements in foundation models have shown significant potential in medical image analysis. However, there is still a gap in models specifically designed for medical image localization. To address this, we introduce MedLAM, a 3D medical foundation localization model that accurately identifies any anatomical part within the body using only a few template scans. MedLAM employs two self-supervision tasks, unified anatomical mapping (UAM) and multi-scale similarity (MSS), trained on a comprehensive dataset of 14,012 CT scans. Furthermore, we developed MedLSAM by integrating MedLAM with the Segment Anything Model (SAM). This framework requires only extreme point annotations across three directions on several templates to enable MedLAM to locate the target anatomical structure in the image, with SAM performing the segmentation. It significantly reduces the amount of manual annotation required by SAM in 3D medical imaging scenarios. We conducted extensive experiments on two 3D datasets covering 38 distinct organs. Our findings are twofold: (1) MedLAM can directly localize anatomical structures using just a few template scans, achieving performance comparable to fully supervised models; (2) MedLSAM closely matches the performance of SAM and its specialized medical adaptations with manual prompts, while minimizing the need for extensive point annotations across the entire dataset. Moreover, MedLAM has the potential to be seamlessly integrated with future 3D SAM models, paving the way for enhanced segmentation performance. Our code is publicly available at https://github.com/openmedlab/MedLSAM.
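Schematically, extreme-point annotations can be turned into a per-slice box prompt once MedLAM has localized a structure; the NumPy sketch below shows only the conversion, with the subsequent SamPredictor call indicated in a comment, since MedLSAM's actual prompting strategy is not detailed in the abstract.

    import numpy as np

    def extreme_points_to_box(points_zyx, margin=3):
        """Convert 3D extreme points (z, y, x) into a padded 3D bounding box."""
        pts = np.asarray(points_zyx, dtype=np.float32)
        lo = pts.min(axis=0) - margin
        hi = pts.max(axis=0) + margin
        return lo, hi

    def axial_box_prompt(lo, hi):
        """In-plane (x0, y0, x1, y1) box shared by axial slices in [lo[0], hi[0]]."""
        return np.array([lo[2], lo[1], hi[2], hi[1]], dtype=np.float32)

    # lo, hi = extreme_points_to_box(located_extreme_points)
    # for z in range(int(lo[0]), int(hi[0]) + 1):
    #     masks, _, _ = sam_predictor.predict(box=axial_box_prompt(lo, hi))  # segment_anything's SamPredictor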
TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs
Wang F, Zou Z, Sakla N, Partyka L, Rawal N, Singh G, Zhao W, Ling H, Huang C, Prasanna P and Chen C
Characterization of breast parenchyma in dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a challenging task owing to the complexity of the underlying tissue structures. Existing quantitative approaches, like radiomics and deep learning models, lack explicit quantification of intricate and subtle parenchymal structures, including fibroglandular tissue. To address this, we propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures and then incorporates these structures into a deep-learning-based prediction model via an attention mechanism. Our topology-informed deep learning model, TopoTxR, leverages topology to provide enhanced insights into tissues critical for disease pathophysiology and treatment response. We empirically validate TopoTxR using the VICTRE phantom breast dataset, showing that the topological structures extracted by our model effectively approximate the breast parenchymal structures. We further demonstrate TopoTxR's efficacy in predicting response to neoadjuvant chemotherapy. Our qualitative and quantitative analyses suggest differential topological behavior of breast tissue in treatment-naïve imaging between patients who respond favorably to therapy, achieving pathological complete response (pCR), and those who do not. In a comparative analysis with several baselines on the publicly available I-SPY 1 dataset (N = 161, including 47 patients with pCR and 114 without) and the Rutgers proprietary dataset (N = 120, with 69 patients achieving pCR and 51 not), TopoTxR demonstrates a notable improvement, achieving a 2.6% increase in accuracy and a 4.6% enhancement in AUC compared to the state-of-the-art method.
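The multi-scale topological structures referred to above are typically derived from persistent homology; the GUDHI sketch below computes the persistence of a 3D volume via a cubical complex, one standard way to extract loop-like (1-dimensional) and void-like (2-dimensional) structures. It is background illustration, not the TopoTxR pipeline.

    import numpy as np
    import gudhi

    def image_persistence(volume):
        """Persistence diagrams of a 3D intensity volume via a cubical complex."""
        cc = gudhi.CubicalComplex(top_dimensional_cells=volume.astype(np.float64))
        cc.persistence()                      # computes all birth-death pairs
        return {dim: cc.persistence_intervals_in_dimension(dim) for dim in (0, 1, 2)}

    # diagrams = image_persistence(dce_mri_volume)
    # diagrams[1] holds (birth, death) pairs of loop-like tissue structures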
Ensemble transformer-based multiple instance learning to predict pathological subtypes and tumor mutational burden from histopathological whole slide images of endometrial and colorectal cancer
Wang CW, Liu TC, Lai PJ, Muzakky H, Wang YC, Yu MH, Wu CH and Chao TK
In endometrial cancer (EC) and colorectal cancer (CRC), in addition to microsatellite instability, tumor mutational burden (TMB) has gradually gained attention as a genomic biomarker that can be used clinically to determine which patients may benefit from immune checkpoint inhibitors. High TMB is characterized by a large number of mutated genes, which encode aberrant tumor neoantigens, and implies a better response to immunotherapy. Hence, the subset of EC and CRC patients with high TMB may be more likely to benefit from immunotherapy. TMB is mainly measured by whole-exome sequencing or next-generation sequencing, which is costly and difficult to apply widely in clinical practice. Therefore, an effective, efficient, low-cost and easily accessible tool is urgently needed to distinguish the TMB status of EC and CRC patients. In this study, we present a deep learning framework, namely Ensemble Transformer-based Multiple Instance Learning with a Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT), to predict pathological subtype and TMB status directly from H&E-stained whole slide images (WSIs) of EC and CRC patients, which is helpful for both pathological classification and cancer treatment planning. Our framework was evaluated on two different cancer cohorts, including an EC cohort with 918 histopathology WSIs from 529 patients and a CRC cohort with 1495 WSIs from 594 patients from The Cancer Genome Atlas. The experimental results show that the proposed methods achieve excellent performance, outperforming seven state-of-the-art (SOTA) methods in cancer subtype classification and TMB prediction on both cancer datasets. Fisher's exact test further validated that the associations between the predictions of the proposed models and the actual cancer subtype or TMB status are both extremely strong (p < 0.001). These promising findings show the potential of our proposed methods to guide personalized treatment decisions by accurately predicting EC and CRC subtype and TMB status for effective immunotherapy planning.
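A minimal attention-based multiple-instance pooling head over patch embeddings of a whole slide image, in the spirit of transformer MIL, is sketched below in PyTorch; the patch encoder (e.g., a self-supervised ViT) is assumed to be given, and the dimensions and names are illustrative rather than ETMIL-SSLViT's.

    import torch
    import torch.nn as nn

    class AttentionMILHead(nn.Module):
        """Weights each patch embedding, pools the bag, and predicts a slide label."""
        def __init__(self, dim=384, hidden=128, num_classes=2):
            super().__init__()
            self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
            self.classifier = nn.Linear(dim, num_classes)

        def forward(self, patch_feats):
            # patch_feats: (N_patches, dim) embeddings of one whole slide image
            a = torch.softmax(self.attn(patch_feats), dim=0)   # (N, 1) attention weights
            slide_feat = (a * patch_feats).sum(dim=0)          # weighted bag representation
            return self.classifier(slide_feat), a.squeeze(-1)

    # logits, weights = AttentionMILHead()(torch.randn(1200, 384))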
Simulation-free prediction of atrial fibrillation inducibility with the fibrotic kernel signature
Banduc T, Azzolin L, Manninger M, Scherr D, Plank G, Pezzuto S and Sahli Costabal F
Computational models of atrial fibrillation (AF) can help improve the success rates of interventions such as ablation. However, evaluating the efficacy of different treatments requires performing multiple costly simulations, pacing at different points and checking whether AF has been induced or not, which hinders the clinical application of these models. In this work, we propose a classification method that can predict AF inducibility in patient-specific cardiac models without running additional simulations. Our methodology does not require re-training when changing atrial anatomy or fibrotic patterns. To achieve this, we develop a set of features given by a variant of the heat kernel signature that incorporates fibrotic pattern information and fiber orientations: the fibrotic kernel signature (FKS). The FKS is faster to compute than a single AF simulation, and when paired with machine learning classifiers, it can predict AF inducibility over the entire domain. To learn the relationship between the FKS and AF inducibility, we performed 2,371 AF simulations comprising 6 different anatomies and various fibrotic patterns, which we split into training and testing sets. We obtain a median F1 score of 85.2% on the test set and can predict the overall inducibility with a mean absolute error of 2.76 percentage points, which is lower than alternative methods. We think our method can significantly speed up the calculation of AF inducibility, which is crucial for optimizing therapies for AF within clinical timelines. An example of the FKS for an open-source model is provided at https://github.com/tbanduc/FKS_AtrialModel_Ferrer.git.
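The fibrotic kernel signature builds on the heat kernel signature; as background, a plain HKS on a mesh or graph can be computed from Laplacian eigenpairs as HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, as in the NumPy sketch below. The fibrosis- and fiber-orientation-aware modifications of the FKS are not reproduced here.

    import numpy as np

    def heat_kernel_signature(laplacian, times, k=100):
        """Plain heat kernel signature for each node of a mesh/graph Laplacian.

        laplacian: (N, N) symmetric (e.g. cotangent or graph) Laplacian
        times:     iterable of diffusion time scales
        k:         number of smallest eigenpairs to use
        """
        eigvals, eigvecs = np.linalg.eigh(laplacian)   # ascending eigenvalues
        eigvals, eigvecs = eigvals[:k], eigvecs[:, :k]
        # HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2
        return np.stack(
            [(np.exp(-eigvals * t) * eigvecs ** 2).sum(axis=1) for t in times], axis=1
        )  # (N, len(times)) feature matrix, one row per node

    # hks = heat_kernel_signature(L, times=np.geomspace(1e-2, 1e2, 16))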
DACG: Dual Attention and Context Guidance model for radiology report generation
Lang W, Liu Z and Zhang Y
Medical images are an essential basis for radiologists to write radiology reports and greatly help subsequent clinical treatment. Automatic radiology report generation aims to alleviate the burden on clinicians of writing reports and has received increasing attention in recent years, becoming an important research hotspot. However, there are severe issues of visual and textual data bias and long-text generation in the medical field. First, abnormal areas in radiological images account for only a small portion, and most radiological reports only involve descriptions of normal findings. Second, there are still significant challenges in generating longer and more accurate descriptive texts for radiology report generation tasks. In this paper, we propose a new Dual Attention and Context Guidance (DACG) model to alleviate visual and textual data bias and promote the generation of long texts. We use a Dual Attention Module, including a Position Attention Block and a Channel Attention Block, to extract finer position and channel features from medical images, enhancing the image feature extraction ability of the encoder. We use the Context Guidance Module to integrate contextual information into the decoder and supervise the generation of long texts. The experimental results show that our proposed model achieves state-of-the-art performance on the most commonly used IU X-ray and MIMIC-CXR datasets. Further analysis also shows that our model can improve reporting through more accurate anomaly detection and more detailed descriptions. The source code is available at https://github.com/LangWY/DACG.
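A channel attention block of the kind used in dual-attention designs (channel-wise self-attention over feature maps) can be sketched as below in PyTorch; DACG's exact position and channel blocks are not reproduced, and the residual weighting is an assumption.

    import torch
    import torch.nn as nn

    class ChannelAttentionBlock(nn.Module):
        """Channel-wise self-attention over a feature map."""
        def __init__(self):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

        def forward(self, x):
            b, c, h, w = x.shape
            flat = x.view(b, c, -1)                              # (B, C, HW)
            energy = torch.bmm(flat, flat.transpose(1, 2))       # (B, C, C) channel affinities
            attn = torch.softmax(energy, dim=-1)
            out = torch.bmm(attn, flat).view(b, c, h, w)         # reweighted channels
            return self.gamma * out + x                          # residual connection

    # y = ChannelAttentionBlock()(torch.randn(2, 256, 14, 14))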
Semi-supervised ViT knowledge distillation network with style transfer normalization for colorectal liver metastases survival prediction
Elforaici MEA, Montagnon E, Romero FP, Le WT, Azzi F, Trudel D, Nguyen B, Turcotte S, Tang A and Kadoury S
Colorectal liver metastases (CLM) affect almost half of all colon cancer patients, and the response to systemic chemotherapy plays a crucial role in patient survival. While oncologists typically use tumor grading scores, such as tumor regression grade (TRG), to establish an accurate prognosis of patient outcomes, including overall survival (OS) and time-to-recurrence (TTR), these traditional methods have several limitations. They are subjective, time-consuming, and require extensive expertise, which limits their scalability and reliability. Additionally, existing approaches for prognosis prediction using machine learning mostly rely on radiological imaging data, but histological images have recently been shown to be relevant for survival prediction, as they fully capture the complex microenvironmental and cellular characteristics of the tumor. To address these limitations, we propose an end-to-end approach for automated prognosis prediction using histology slides stained with Hematoxylin and Eosin (H&E) and Hematoxylin Phloxine Saffron (HPS). We first employ a Generative Adversarial Network (GAN) for slide normalization to reduce staining variations and improve the overall quality of the images that are used as input to our prediction pipeline. We then propose a semi-supervised model to perform tissue classification from sparse annotations, producing segmentation and feature maps. Specifically, we use an attention-based approach that weighs the importance of different slide regions in producing the final classification results. Finally, we exploit the features extracted from the metastatic nodules and surrounding tissue to train a prognosis model. In parallel, we train a vision Transformer model in a knowledge distillation framework to replicate and enhance the performance of the prognosis prediction. We evaluate our approach on an in-house clinical dataset of 258 CLM patients, achieving superior performance compared to competing models with a c-index of 0.804 (0.014) for OS and 0.735 (0.016) for TTR, as well as on two public datasets. The proposed approach achieves an accuracy of 86.9% to 90.3% in predicting TRG dichotomization. For the 3-class TRG classification task, it yields an accuracy of 78.5% to 82.1%, outperforming the comparative methods. Our proposed pipeline can provide automated prognosis for pathologists and oncologists and can greatly promote precision medicine in the management of CLM patients.
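For reference, the reported c-index can be computed as below; this is a plain NumPy implementation of Harrell's concordance index over (time, event, risk score) triplets, not the authors' evaluation code, and it handles ties only in the simplest way.

    import numpy as np

    def concordance_index(times, events, risk_scores):
        """Harrell's c-index: fraction of comparable pairs ordered correctly.

        times:       observed survival / follow-up times
        events:      1 if the event (e.g. death, recurrence) occurred, 0 if censored
        risk_scores: higher score should mean earlier event
        """
        times, events, risk = map(np.asarray, (times, events, risk_scores))
        concordant, comparable = 0.0, 0
        for i in range(len(times)):
            if not events[i]:
                continue                              # anchor pairs on an observed event
            mask = times > times[i]                   # subjects j who survived longer than i
            comparable += mask.sum()
            concordant += (risk[i] > risk[mask]).sum() + 0.5 * (risk[i] == risk[mask]).sum()
        return concordant / comparable if comparable else np.nan

    # ci = concordance_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.7, 0.2, 0.4])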
FetusMapV2: Enhanced fetal pose estimation in 3D ultrasound
Chen C, Yang X, Huang Y, Shi W, Cao Y, Luo M, Hu X, Zhu L, Yu L, Yue K, Zhang Y, Xiong Y, Ni D and Huang W
Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose in US volume has several challenges, including poor image quality, limited GPU memory for tackling high dimensional data, symmetrical or ambiguous anatomical structures, and considerable variations in fetal poses. In this study, we propose a novel 3D fetal pose estimation framework (called FetusMapV2) to overcome the above challenges. Our contribution is three-fold. First, we propose a heuristic scheme that explores the complementary network structure-unconstrained and activation-unreserved GPU memory management approaches, which can enlarge the input image resolution for better results under limited GPU memory. Second, we design a novel Pair Loss to mitigate confusion caused by symmetrical and similar anatomical structures. It separates the hidden classification task from the landmark localization task and thus progressively eases model learning. Last, we propose a shape priors-based self-supervised learning by selecting the relatively stable landmarks to refine the pose online. Extensive experiments and diverse applications on a large-scale fetal US dataset including 1000 volumes with 22 landmarks per volume demonstrate that our method outperforms other strong competitors.