SIGNAL PROCESSING-IMAGE COMMUNICATION

A novel attention-based enhancement framework for face mask detection in complicated scenarios
Zhang H, Tang J, Wu P, Li H and Zeng N
In the context of COVID-19 pandemic prevention and control, accurate face mask detection via computer vision techniques is of vital significance. In this paper, a novel attention-improved Yolo (AI-Yolo) model is proposed to handle the challenges of complicated real-world scenarios, namely dense distributions, small-size objects, and interference from similar occlusions. In particular, a selective kernel (SK) module realizes a convolution-domain soft attention mechanism through split, fusion, and selection operations; a spatial pyramid pooling (SPP) module enhances the expression of local and global features, enriching the receptive field information; and a feature fusion (FF) module promotes sufficient fusion of the multi-scale features from each resolution branch while using only basic convolution operators, avoiding excessive computational complexity. In addition, the complete intersection over union (CIoU) loss function is adopted in the training stage for accurate localization. Experiments on two challenging public face mask detection datasets demonstrate the superiority of the proposed AI-Yolo over seven other state-of-the-art object detection algorithms: it achieves the best mean average precision and F1 score on both datasets. Furthermore, the effectiveness of each module designed in AI-Yolo is validated through extensive ablation studies. In summary, the proposed AI-Yolo can accomplish face mask detection under extremely complex conditions with precise localization and accurate classification.
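The CIoU loss mentioned above extends the IoU loss with two penalty terms. The first is the squared distance between box centres, normalised by the squared diagonal of the smallest enclosing box. The second is an aspect-ratio consistency term. Written out, L = 1 - IoU + rho^2 / c^2 + alpha * v, where v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w / h))^2 and alpha = v / ((1 - IoU) + v). The following minimal Python sketch follows that standard CIoU definition and is not the AI-Yolo training code. It assumes each box is given as corner coordinates `(x1, y1, x2, y2)` with positive width and height. The function name `ciou_loss` is hypothetical.

```python
import math


def ciou_loss(pred, target):
    """Complete-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter
    iou = inter / union

    # rho^2: squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # v: aspect-ratio consistency; alpha: its trade-off weight (0 when v == 0).
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v
```

For two disjoint boxes the IoU term alone saturates at 1. The centre-distance term still changes with the boxes' positions, which gives non-overlapping predictions a useful training signal.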
MID-UNet: Multi-input directional UNet for COVID-19 lung infection segmentation from CT images
Chi J, Zhang S, Han X, Wang H, Wu C and Yu X
Coronavirus Disease 2019 (COVID-19) has spread globally since the first case was reported in December 2019, becoming a worldwide existential health crisis with over 90 million confirmed cases. Segmentation of lung infection from computed tomography (CT) scans via deep learning methods has great potential to assist the diagnosis and healthcare of COVID-19. However, current deep learning methods for segmenting infection regions from lung CT images suffer from three problems: (1) low differentiation of semantic features among COVID-19 infection regions, other pneumonia regions, and normal lung tissue; (2) high variation in visual characteristics between different COVID-19 cases or stages; and (3) high difficulty in constraining the irregular boundaries of COVID-19 infection regions. To solve these problems, a multi-input directional UNet (MID-UNet) is proposed to segment COVID-19 infections in lung CT images. For the input part of the network, we first propose an image blurry descriptor to reflect the texture characteristics of the infections. The original CT image, the image enhanced by adaptive histogram equalization, the image filtered by a non-local means filter, and the blurry feature map are then used together as the input of the proposed network. For the network structure, we propose the directional convolution block (DCB), which consists of four directional convolution kernels. DCBs are applied on the shortcut connections to refine the extracted features before they are transferred to the deconvolution parts. Furthermore, we propose a contour loss based on local curvature histograms and combine it with the binary cross-entropy (BCE) loss and the intersection over union (IoU) loss to better constrain the segmentation boundary. Experimental results on the COVID-19-CT-Seg dataset demonstrate that the proposed MID-UNet outperforms state-of-the-art methods in segmenting COVID-19 infections from CT images.
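The abstract combines BCE, IoU, and a curvature-based contour loss, but it specifies neither the contour term nor the weighting. The sketch below is a hedged illustration that implements only the BCE plus soft IoU part, applied to a flattened probability map in plain Python. The equal weights `w_bce` and `w_iou` and the name `bce_iou_loss` are assumptions. The contour loss is deliberately omitted.

```python
import math


def bce_iou_loss(probs, labels, w_bce=1.0, w_iou=1.0, eps=1e-7):
    """Weighted sum of pixel-wise binary cross-entropy and soft IoU loss.

    probs:  flat list of predicted foreground probabilities in [0, 1]
    labels: flat list of ground-truth labels (0 or 1)
    """
    n = len(probs)
    bce = 0.0
    inter = 0.0   # soft intersection: sum of p * y
    total = 0.0   # sum of p + y, used to derive the soft union
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp so log() never sees 0
        bce -= y * math.log(p) + (1 - y) * math.log(1 - p)
        inter += p * y
        total += p + y
    bce /= n
    union = total - inter
    iou_loss = 1 - (inter + eps) / (union + eps)
    return w_bce * bce + w_iou * iou_loss
```

BCE scores each pixel independently. The soft IoU term is computed over the whole region, so it is less dominated by the large background area, which is the usual reason the two terms are paired.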
COVID-19 discrimination framework for X-ray images by considering radiomics, selective information, feature ranking, and a novel hybrid classifier
Koyuncu H and Barstuğan M
For the detection of coronavirus, confirming the diagnosis through medical imaging, in addition to medical tests, is of special significance. Imaging procedures are also useful for assessing the damage caused by COVID-19. Chest X-ray imaging is frequently used to diagnose COVID-19 and other types of pneumonia. This paper presents a task-specific framework for detecting coronavirus in X-ray images. Binary classification over three labels (healthy, bacterial pneumonia, and COVID-19) was performed on two different data sets, with COVID-19 as the positive class. First-order statistics, the gray level co-occurrence matrix, the gray level run length matrix, and the gray level size zone matrix were analyzed to form fifteen sub-data sets and to determine the necessary radiomic features. Two normalization methods are compared to make the data meaningful. Furthermore, five filter-based feature ranking approaches are used to provide the necessary information to a state-of-the-art classifier based on Gauss-map-based chaotic particle swarm optimization and neural networks. The proposed framework was designed according to the analyses of radiomics, normalization approaches, and filter-based feature ranking methods. In the experiments, seven metrics were evaluated to objectively assess the results: accuracy, area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, g-mean, precision, and f-measure. The proposed framework achieved promising scores on two X-ray data sets; in particular, both the accuracy and the area under the ROC curve exceeded 99% for the classification of coronavirus vs. others.
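Six of the seven reported metrics follow directly from the binary confusion matrix, with COVID-19 as the positive class. The g-mean is the geometric mean of sensitivity and specificity. Below is a minimal sketch. The function name and argument order are illustrative. ROC AUC is omitted because it needs the classifier's scores, not just the counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics for a binary classifier (positive = COVID-19)."""
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "precision": precision,
        "f_measure": f_measure,
    }
```

The g-mean drops sharply if either class is poorly recognised. That makes it a useful complement to accuracy when the positive and negative sets differ in size.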
Predicting ASD Diagnosis in Children with Synthetic and Image-based Eye Gaze Data
Liaqat S, Wu C, Duggirala PR, Cheung SS, Chuah CN, Ozonoff S and Young G
As early intervention is highly effective for young children with autism spectrum disorder (ASD), it is imperative to make an accurate diagnosis as early as possible. ASD has often been associated with atypical visual attention, and eye gaze data can be collected at a very early age. An automatic screening tool based on eye gaze data that could identify ASD risk offers the opportunity for intervention before the full set of symptoms is present. In this paper, we propose two machine learning methods, a synthetic saccade approach and an image-based approach, to automatically classify ASD from children's eye gaze data collected during free-viewing tasks on natural images. The first approach uses a generative model of synthetic saccade patterns to represent the baseline scan-path of a typical non-ASD individual, and combines it with the real scan-path and other auxiliary data as inputs to a deep learning classifier. The second approach is more holistic: it feeds the input image and a sequence of fixation maps into a convolutional or recurrent neural network. On a publicly accessible collection of children's gaze data, our experiments show an ASD prediction accuracy of 67.23% on the validation dataset and 62.13% on the test dataset.
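The abstract does not say how the fixation maps are built. One common construction, assumed here for illustration, renders each fixation as a Gaussian blob. The sketch below produces a cumulative sequence in which the k-th map contains the first k fixations of the scan-path. The function names, the `sigma` value, and the cumulative ordering are all hypothetical.

```python
import math


def fixation_map(fixations, height, width, sigma=2.0):
    """Render (row, col) fixations as a sum of Gaussian blobs, normalised to a peak of 1."""
    grid = [[0.0] * width for _ in range(height)]
    for fr, fc in fixations:
        for r in range(height):
            for c in range(width):
                d2 = (r - fr) ** 2 + (c - fc) ** 2
                grid[r][c] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid


def fixation_sequence(fixations, height, width, sigma=2.0):
    """Hypothetical cumulative sequence: map k shows the first k fixations."""
    return [fixation_map(fixations[:k], height, width, sigma)
            for k in range(1, len(fixations) + 1)]
```

Keeping the maps as an ordered sequence preserves the temporal order of the scan-path. A recurrent network can consume that order directly, while a convolutional network could instead take the maps stacked as channels.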
Visibility of Quantization Errors in Reversible JPEG2000
Liu F, Ahanonu EL, Marcellin MW, Lin Y, Ashok A and Bilgin A
Image compression systems that exploit the properties of the human visual system have been studied extensively over the past few decades. For the JPEG2000 image compression standard, all previous methods that aim to optimize perceptual quality have considered the irreversible pipeline of the standard. In this work, we propose an approach for the reversible pipeline of the JPEG2000 standard. We introduce a new methodology to measure the visibility of quantization errors when the reversible color and wavelet transforms are employed. Incorporating the visibility thresholds obtained with this methodology into a JPEG2000 encoder enables the creation of scalable codestreams that can provide both near-threshold and numerically lossless representations, which is desirable in applications where the original image samples must be restored. Most importantly, this is the first work to quantify the bitrate penalty incurred by the reversible transforms in near-threshold image compression compared to the irreversible transforms.
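For context, the reversible color transform (RCT) defined in JPEG2000 Part 1 is an integer-to-integer mapping, which is why the reversible pipeline can reproduce the original samples exactly. The forward transform is Y = floor((R + 2G + B) / 4), Cb = B - G, Cr = R - G. The inverse is G = Y - floor((Cb + Cr) / 4), R = Cr + G, B = Cb + G. The minimal Python sketch below illustrates the transform only, not the paper's visibility-threshold methodology. The function names are illustrative.

```python
def rct_forward(r, g, b):
    """JPEG2000 reversible color transform (integer in, integer out)."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)   # floor((Cb + Cr) / 4); >> floors negatives too
    r = cr + g
    b = cb + g
    return r, g, b
```

Python's `>>` on negative integers rounds toward negative infinity, which matches the floor in the definition. The round trip is exact because floor(x + G) = floor(x) + G for any integer G.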
Secure transport and adaptation of MC-EZBC video utilizing H.264-based transport protocols
Hellwagner H, Hofbauer H, Kuschnig R, Stütz T and Uhl A
Universal Multimedia Access (UMA) calls for solutions where content is created once and subsequently adapted to given requirements. With regard to UMA and scalability, which is often required because of the wide variety of end clients, the best-suited codecs are wavelet-based ones (such as MC-EZBC) owing to their inherently high number of scaling options. However, most transport technologies for delivering video to end clients target the H.264/AVC standard or, if scalability is required, H.264/SVC. In this paper, we introduce a mapping of the MC-EZBC bitstream to existing H.264/SVC-based streaming and scaling protocols. This enables, on the one hand, the use of highly scalable wavelet-based codecs and, on the other hand, the use of existing network technologies without incurring high implementation costs. Furthermore, we evaluate different scaling options in order to choose the best option for given requirements. Additionally, we evaluate different encryption options, based on transport and bitstream encryption, for use cases where digital rights management is required.
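In H.264/SVC, each NAL unit's header extension carries `dependency_id`, `temporal_id`, and `quality_id`. A network adapter scales the stream by dropping units that lie above a target operation point. The sketch below shows a simplified form of that selection rule on a hypothetical `Unit` record. It assumes the MC-EZBC spatial, temporal, and quality layers have already been mapped onto these three identifiers. It does not parse real NAL units and is not the paper's mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical packet carrying one MC-EZBC sub-bitstream piece,
    tagged with H.264/SVC-style scalability identifiers."""
    dependency_id: int  # spatial layer (D)
    temporal_id: int    # temporal layer (T)
    quality_id: int     # quality / SNR layer (Q)
    payload: bytes


def extract(units, max_d, max_t, max_q):
    """Keep only units at or below the requested operation point (D, T, Q)."""
    return [u for u in units
            if u.dependency_id <= max_d
            and u.temporal_id <= max_t
            and u.quality_id <= max_q]
```

Because the rule only reads header fields, an adapter can apply it without decoding the payload. This is what lets existing SVC-aware network elements scale a wavelet bitstream once the layers are tagged.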
A probabilistic approach to incorporating domain knowledge for closed-room people monitoring
Tao J and Tan YP
We propose a novel probabilistic approach to recognizing people entering and leaving a closed room in a workplace or living environment. Specifically, people in the view of a monitoring camera are first tracked and represented using low-level color features. Based on a new color similarity measure, optimal recognition of people leaving and entering the room is carried out by probabilistic reasoning under the constraints imposed by domain knowledge, e.g., a person currently inside the room cannot enter again without first leaving it, and vice versa. The novelty of our work lies mainly in a systematic way of incorporating the correlations and constraints among a sequence of people observations, with optimal recognition achieved by maximizing the joint posterior probability of the observations. Experimental results on real and synthetic data demonstrate the efficacy of the proposed approach.
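The constrained joint labelling described above can be illustrated with a Viterbi-style search over "who is currently inside" states. Several parts of this sketch are assumptions rather than details from the abstract. The paper's own color similarity measure is not given, so histogram intersection is swapped in. The room is assumed to start empty. The set of people is assumed to be known in advance as a gallery of reference histograms. The log-similarity is treated as the log-likelihood of each observation. All function names are hypothetical.

```python
import math


def hist_intersection(h1, h2):
    """Similarity of two normalised colour histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def recognise(observations, gallery, eps=1e-9):
    """observations: list of (direction, histogram), direction in {"enter", "leave"}.
    gallery: dict mapping person_id -> reference histogram.
    Returns the label sequence maximising the summed log-similarity, subject to:
    a person inside cannot enter again and a person outside cannot leave.
    Assumes the room starts empty."""
    # best[state] = (score, labels); state = frozenset of people currently inside.
    best = {frozenset(): (0.0, [])}
    for direction, hist in observations:
        nxt = {}
        for inside, (score, labels) in best.items():
            # Entering requires being outside; leaving requires being inside.
            candidates = [p for p in gallery
                          if (p not in inside) == (direction == "enter")]
            for p in candidates:
                s = score + math.log(hist_intersection(hist, gallery[p]) + eps)
                new_inside = inside | {p} if direction == "enter" else inside - {p}
                # Keep only the best-scoring path into each state (Viterbi step).
                if new_inside not in nxt or s > nxt[new_inside][0]:
                    nxt[new_inside] = (s, labels + [p])
        best = nxt
    if not best:
        return None  # no labelling satisfies the constraints
    return max(best.values(), key=lambda t: t[0])[1]
```

Keeping only the best path per state is safe here. The future constraints and scores depend only on who is inside, not on how that state was reached, so the search finds the joint optimum rather than making greedy per-observation choices.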