Local attention and long-distance interaction of rPPG for deepfake detection
With the development of generative models, the abuse of Deepfakes has aroused public concern. As a defense mechanism, face forgery detection methods have been intensively studied. Remote photoplethysmography (rPPG) extracts heartbeat signals from recorded videos by examining the subtle changes in skin color caused by cardiac activity. Since the face forgery process inevitably disrupts the periodic changes in facial color, the rPPG signal proves to be a powerful biological indicator for Deepfake detection. Motivated by the key observation that rPPG signals produce unique rhythmic patterns for different manipulation methods, we also treat Deepfake detection as a source detection task. The Multi-scale Spatial-Temporal PPG map is adopted to further exploit heartbeat signals from multiple facial regions. Moreover, to capture both spatial and temporal inconsistencies, we propose a two-stage network consisting of a Mask-Guided Local Attention module (MLA) that captures unique local patterns of PPG maps and a Temporal Transformer that models long-distance interactions between features of adjacent PPG maps. Extensive experiments on the FaceForensics++ and Celeb-DF datasets demonstrate the superiority of our method over all other rPPG-based approaches. Visualizations also demonstrate the effectiveness of the proposed method.
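To make the two-stage design concrete, below is a minimal, hypothetical PyTorch sketch of a pipeline in the spirit described above; the channel counts, the sigmoid spatial mask, the number of source classes, and the pooling strategy are illustrative assumptions, not the paper's actual MLA or Temporal Transformer architecture.

```python
import torch
import torch.nn as nn

class LocalAttention(nn.Module):
    """Toy stand-in for a mask-guided local attention module: a learned
    spatial mask re-weights CNN features of a single PPG map."""
    def __init__(self, channels=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.mask = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, x):                      # x: (B, 1, regions, time)
        f = self.features(x)
        return f * self.mask(f)                # attention-weighted features

class TwoStageDetector(nn.Module):
    """Stage 1: per-map local attention; stage 2: a transformer that lets
    adjacent PPG-map features interact over long temporal distances."""
    def __init__(self, num_classes=5, channels=32, dim=128):
        super().__init__()
        self.local = LocalAttention(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(channels, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)  # e.g. real + manipulation sources

    def forward(self, maps):                   # maps: (B, segments, 1, regions, time)
        B, S = maps.shape[:2]
        f = self.local(maps.flatten(0, 1))     # (B*S, C, regions, time)
        tokens = self.proj(self.pool(f).flatten(1)).view(B, S, -1)
        out = self.temporal(tokens).mean(dim=1)
        return self.head(out)                  # per-video source prediction

logits = TwoStageDetector()(torch.randn(2, 6, 1, 63, 300))
```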
A bi-directional deep learning architecture for lung nodule semantic segmentation
Lung nodules are abnormal growths in the lung, and lesions may exist in both lungs. Most lung nodules are harmless (not cancerous/malignant), and pulmonary nodules are only rarely a sign of lung cancer. X-rays and CT scans are used to identify lung nodules, and doctors may refer to such a growth as a lung spot, coin lesion, or shadow. Properly acquired computed tomography (CT) scans of the lungs are necessary to obtain an accurate diagnosis and a good estimate of the severity of lung cancer. This study aims to design and evaluate a deep learning (DL) algorithm for identifying pulmonary nodules (PNs) using the LUNA-16 dataset and to examine the prevalence of PNs using DB-NET, a new resource-efficient deep learning architecture proposed in this paper. Accurate and efficient lung nodule segmentation is needed whenever a physician orders a CT scan, because lung cancer must be detected at an early stage. However, segmentation of lung nodules is difficult because of the nodules' appearance on the CT image as well as their concealed shape, visual quality, and context. The DB-NET architecture is presented as a resource-efficient deep learning solution to this challenge. Furthermore, it incorporates the Mish nonlinearity and mask class weights to improve segmentation effectiveness. The LUNA-16 dataset, which contains 1200 lung nodules, was used extensively to train and assess the proposed model. DB-NET surpasses the existing U-NET model, reaching a Dice coefficient of 88.89%, and it achieves a level of accuracy similar to that of human experts.
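For reference, the sketch below shows the standard Dice coefficient, a hypothetical class-weighted Dice loss, and a convolutional unit using the Mish nonlinearity in PyTorch; the exact weighting scheme and block layout of DB-NET are not specified in the abstract and are assumed here.

```python
import torch
import torch.nn as nn

def dice_coefficient(pred, target, eps=1e-6):
    """Dice = 2|X ∩ Y| / (|X| + |Y|), computed on soft probabilities."""
    pred, target = pred.flatten(1), target.flatten(1)
    inter = (pred * target).sum(dim=1)
    return (2 * inter + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)

def weighted_dice_loss(logits, target, class_weight=10.0):
    """Hypothetical class-weighted Dice loss: up-weights the sparse
    nodule (foreground) samples relative to background-only samples."""
    prob = torch.sigmoid(logits)
    loss = 1.0 - dice_coefficient(prob, target)
    weight = 1.0 + (class_weight - 1.0) * (target.flatten(1).sum(dim=1) > 0).float()
    return (weight * loss).mean()

conv_unit = nn.Sequential(            # a DB-NET-style block would stack such units
    nn.Conv3d(1, 16, 3, padding=1),
    nn.BatchNorm3d(16),
    nn.Mish())                        # Mish nonlinearity: x * tanh(softplus(x))
```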
A robust defect detection method for syringe scale without positive samples
With the worldwide spread of the COVID-19 pandemic, the demand for medical syringes has increased dramatically. Scale defects, among the most common defects on syringes, have become a major barrier to boosting syringe production. Existing methods for scale defect detection suffer from large data requirements and an inability to handle diverse and uncertain defects. In this paper, we propose a robust scale defect detection method that requires only negative samples yet achieves favorable detection performance. Different from conventional methods that detect defects in a batch mode, we locate the defects on syringes with a two-stage framework consisting of two components: a scale extraction network and a scale defect discriminator. Concretely, the scale extraction network (SeNet) is first built, using a convolutional neural network to extract the main structure of the scale. After that, the scale defect discriminator is designed to detect and label the scale defects. To evaluate the performance of our method, we conduct experiments on a real-world syringe dataset. The competitive results, namely an F1 score of 99.7%, demonstrate the effectiveness of our method.
A sophisticated and provably grayscale image watermarking system using DWT-SVD domain
Digital watermarking has attracted increasing attention as a solution to copyright protection and content authentication, which have become pressing issues in multimedia technology amid today's digital transformation. In this paper, we propose an advanced image watermarking system based on the discrete wavelet transform (DWT) in combination with the singular value decomposition (SVD). First, at the sender side, the DWT is applied to a grayscale cover image, and an eigendecomposition is then performed on the original HH (high-high) components. A similar operation is applied to a grayscale watermark image. Then, two unitary matrices and one diagonal matrix are combined to form the watermarked image by applying the inverse discrete wavelet transform (iDWT). The diagonal component of the original image is transmitted through a secure channel. At the receiver end, the watermark image is recovered using the watermarked image and the diagonal component of the original image. Finally, we compare the original and recovered watermark images and obtain a perfect normalized correlation. Simulation results indicate that the presented scheme satisfies the requirement of visual imperceptibility and also offers high security and strong robustness against many common attacks and signal processing operations. The proposed digital image watermarking system is also compared with state-of-the-art methods to confirm its reliability and superiority.
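The following NumPy/PyWavelets sketch illustrates the general DWT-SVD embedding idea, assuming a classic additive mixing of singular values in the HH sub-band; the paper's precise combination of unitary and diagonal factors and its recovery procedure may differ from this simplified scheme.

```python
import numpy as np
import pywt

def embed_watermark(cover, watermark, alpha=0.1):
    """Hypothetical DWT-SVD embedding: decompose both images, mix the singular
    values of their HH sub-bands, and reconstruct with the inverse DWT.
    `cover` and `watermark` are float arrays of the same (even) size."""
    LL, (LH, HL, HH) = pywt.dwt2(cover, 'haar')
    _, (_, _, HHw) = pywt.dwt2(watermark, 'haar')

    U, S, Vt = np.linalg.svd(HH, full_matrices=False)
    _, Sw, _ = np.linalg.svd(HHw, full_matrices=False)

    HH_marked = U @ np.diag(S + alpha * Sw) @ Vt        # unitary + diagonal factors recombined
    watermarked = pywt.idwt2((LL, (LH, HL, HH_marked)), 'haar')
    return watermarked, S                               # S acts as the key sent over a secure channel

def extract_singular_values(watermarked, S_original, alpha=0.1):
    """Recover the watermark's singular values from the marked image and the key."""
    _, (_, _, HH) = pywt.dwt2(watermarked, 'haar')
    _, S_marked, _ = np.linalg.svd(HH, full_matrices=False)
    return (S_marked - S_original) / alpha

cover = np.random.rand(256, 256)
mark = np.random.rand(256, 256)
wm_img, key = embed_watermark(cover, mark)
```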
A multimodal transformer to fuse images and metadata for skin disease classification
Skin disease cases are rising in prevalence, and the diagnosis of skin diseases is always a challenging task in the clinic. Utilizing deep learning to diagnose skin diseases could help to meet these challenges. In this study, a novel neural network is proposed for the classification of skin diseases. Since the datasets for this research consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer, which consists of two encoders for images and metadata and one decoder to fuse the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model is utilized as the backbone to extract deep image features. The metadata are regarded as labels, and a new Soft Label Encoder (SLE) is designed to embed them. Furthermore, in the decoder part, a novel Mutual Attention (MA) block is proposed to better fuse the image features and metadata features. To evaluate the model's effectiveness, extensive experiments have been conducted on a private skin disease dataset and the benchmark ISIC 2018 dataset. Compared with state-of-the-art methods, the proposed model shows better performance and represents an advancement in skin disease diagnosis.
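A minimal PyTorch sketch of how image and metadata tokens could be fused with mutual (cross) attention is given below; the token dimensions, number of classes, and pooling strategy are illustrative assumptions rather than the paper's exact MA block.

```python
import torch
import torch.nn as nn

class MutualAttentionFusion(nn.Module):
    """Toy mutual-attention block: image tokens attend to metadata tokens and
    metadata tokens attend to image tokens; both views are pooled and fused."""
    def __init__(self, dim=256, heads=4, num_classes=7):
        super().__init__()
        self.img_to_meta = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.meta_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)  # class count is illustrative

    def forward(self, img_tokens, meta_tokens):
        # img_tokens: (B, N, dim) from a ViT backbone; meta_tokens: (B, M, dim)
        # from a soft-label style embedding of the clinical metadata.
        img_attn, _ = self.img_to_meta(img_tokens, meta_tokens, meta_tokens)
        meta_attn, _ = self.meta_to_img(meta_tokens, img_tokens, img_tokens)
        fused = torch.cat([img_attn.mean(1), meta_attn.mean(1)], dim=-1)
        return self.classifier(fused)

model = MutualAttentionFusion()
logits = model(torch.randn(2, 197, 256), torch.randn(2, 10, 256))
```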
Retinopathy grading with deep learning and wavelet hyper-analytic activations
Recent developments reveal the prominence of Diabetic Retinopathy (DR) grading. Over the past few decades, wavelet-based DR classification has shown successful impact, and deep learning models such as Convolutional Neural Networks (CNNs) have evolved to offer the highest prediction accuracy. In this work, the features of the input image are enhanced by integrating Multi-Resolution Analysis (MRA) into a CNN framework without requiring additional convolution filters. The bottleneck of conventional activation functions used in CNNs is the nullification of feature maps that are negative in value. In this work, a novel hyper-analytic wavelet activation is formulated with unique characteristics for the wavelet sub-bands. Instead of dismissing these negative coefficients, which correspond to significant edge feature maps, the function transforms them. The hyper-analytic wavelet phase forms the imaginary part of the complex activation, and the hyper-parameter of the activation function is selected so that the corresponding magnitude spectrum produces monotonic and effective activations. The performance of three CNN models (a custom shallow CNN, a ResNet with soft attention, and an AlexNet for DR) improves with spatial-wavelet quilts. With spatial-wavelet quilts, the AlexNet for DR gains about 11 percentage points of accuracy (from 87% to 98%). The highest accuracy of 98% and the highest sensitivity of 99% are attained with the modified AlexNet for DR. The proposal also illustrates the visualization of negative edge preservation on assumed image patches. From this study, we infer that models with spatial-wavelet quilts and hyper-analytic activations have better generalization ability, and the visualization of heat maps provides evidence of better learning of the feature maps from the wavelet sub-bands.
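To illustrate the core idea of preserving negative coefficients instead of nullifying them, here is a toy magnitude-based activation built from the Hilbert (analytic) transform with SciPy; the paper's hyper-analytic wavelet activation, its phase handling, and its hyper-parameter selection are more elaborate than this sketch.

```python
import numpy as np
from scipy.signal import hilbert

def analytic_magnitude_activation(feature_map, axis=-1):
    """Toy 'analytic' activation: extend each row of the feature map to an
    analytic signal via the Hilbert transform and return its magnitude, so
    strongly negative responses (often edges in wavelet sub-bands) are
    preserved rather than nullified as ReLU would do."""
    analytic = hilbert(feature_map, axis=axis)   # real part + i * Hilbert transform
    return np.abs(analytic)                      # non-negative, edge-preserving activation

relu = lambda x: np.maximum(x, 0)
x = np.array([[-3.0, -1.0, 0.0, 1.0, 3.0]])
print(relu(x))                           # negatives are nullified
print(analytic_magnitude_activation(x)) # negative edge responses survive
```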
Motion-compensated online object tracking for activity detection and crowd behavior analysis
It is a nontrivial task to manage crowds in public places and recognize unacceptable behavior, such as violating social distancing norms during the COVID-19 pandemic. In such situations, people should avoid loitering (moving about in public places without apparent purpose) and maintain a sufficient physical distance. In this study, a multi-object tracking algorithm is introduced to better handle short-term object occlusion, detection errors, and identity switches. The objects are tracked frame by frame through bounding box detection and linear velocity estimation using a Kalman filter. The predicted tracks are kept alive for some time, which handles missing detections and short-term object occlusion. ID switches (mainly due to crossing trajectories) are managed by explicitly considering the motion direction of the objects in real time. Furthermore, a novel approach for detecting the unusual behavior of loitering, with a severity level, is proposed based on the tracking information. An adaptive algorithm is also proposed to detect physical distance violations based on the object dimensions over the entire length of the track. Finally, a mathematical approach to calculating the actual physical distance is proposed, using the height of a human as a reference object, which allows more specific distancing norms to be enforced. The proposed approach is evaluated in traffic and pedestrian movement scenarios, and the experimental results demonstrate significant improvements.
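A minimal constant-velocity Kalman filter for bounding-box tracking, in the spirit of the tracker described above, is sketched below; the state layout, noise covariances, and time step are illustrative assumptions.

```python
import numpy as np

class BoxKalmanTracker:
    """Constant-velocity Kalman filter over a bounding-box state
    [cx, cy, w, h, vx, vy]: the sketch assumes box size changes slowly and
    only the centre moves with a linear velocity between frames."""
    def __init__(self, box, dt=1.0):
        cx, cy, w, h = box
        self.x = np.array([cx, cy, w, h, 0.0, 0.0], dtype=float)   # state
        self.P = np.eye(6) * 10.0                                   # state covariance
        self.F = np.eye(6); self.F[0, 4] = self.F[1, 5] = dt        # transition
        self.H = np.eye(4, 6)                                       # we observe [cx, cy, w, h]
        self.Q = np.eye(6) * 0.01                                   # process noise
        self.R = np.eye(4) * 1.0                                    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                    # predicted box keeps a lost track alive

    def update(self, box):
        z = np.asarray(box, dtype=float)
        y = z - self.H @ self.x              # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

tracker = BoxKalmanTracker([100, 50, 40, 80])
tracker.predict()
tracker.update([104, 52, 40, 80])   # detection in the next frame
print(tracker.x[4:])                # estimated velocity, usable for motion direction
```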
Illumination-aware group portrait compositor
We present a novel compositing framework for full-length human figures that maintains their surface details and reproduces the localized nature of light and shadow, thereby synthesizing composite results with high visual coherence. The framework extends the compositing pipeline proposed in our previous study and deploys five stages: photometric information estimation, 3D reconstruction, global illumination simulation, lighting transfer, and compositing. Based on the interpretation that a sense of coexistence can be achieved through visual coherence, we demonstrate that the proposed framework functions properly as a group portrait compositor. Composites in which the proposed framework combined separately rendered images of 3D human models compared favorably with results in which multiple avatars were rendered together. Based on this empirical evaluation, the proposed framework is expected to serve as a new means of fostering a sense of coexistence in remote societies and of efficiently generating highly photorealistic cyberworlds.
Histogram equalization using a selective filter
Many popular modern image processing software packages implement a naïve form of histogram equalization. This implementation is known to produce histograms that are not truly uniform. While exact histogram equalization techniques exist, they may produce undesirable artifacts in some scenarios. In this paper, we consider the link between the established continuous theory of global histogram equalization and its discrete implementation, and we formulate a novel histogram equalization technique that builds upon and considerably improves the naïve approach. We show that we can linearly interpolate the cumulative distribution of a low-bit image by approximately dequantizing its intensities using a selective box filter, which helps to distribute the intensities more evenly. The proposed algorithm is evaluated and compared with existing works in the literature. We find that the method is capable of producing an equalized histogram with high entropy while preserving the distances between similar intensities. The described approach has implications for several related image processing problems, e.g., edge detection.
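For context, the sketch below implements the naïve global histogram equalization that the paper improves upon: each intensity is mapped through the discrete cumulative distribution function. The paper's refinement, interpolating the CDF after approximately dequantizing intensities with a selective box filter, is not reproduced here.

```python
import numpy as np

def naive_equalize(img, levels=256):
    """Naïve global histogram equalization as commonly implemented:
    map each intensity through the (discrete) cumulative distribution function.
    Because many pixels share one intensity, the output histogram is not truly uniform."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[hist > 0][0]                       # count of the first occupied bin
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[img]

img = np.random.randint(0, 64, size=(128, 128), dtype=np.uint8)  # low-bit, low-contrast image
eq = naive_equalize(img)
```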
Fingerprint-based robust medical image watermarking in hybrid transform
To protect the integrity of medical images, a digital watermark is embedded into them. A non-blind medical image watermarking scheme based on a hybrid transform is propounded. In this paper, the patient's fingerprint is used as the watermark for better authentication, identification of the original medical image, and patient privacy. In this scheme, the lifting wavelet transform (LWT) and the discrete wavelet transform (DWT) are utilized to strengthen the watermarking algorithm. The scaling and embedding factors are calculated adaptively from the Local Binary Pattern values of the host medical image to achieve better imperceptibility and robustness for the medical images and the fingerprint watermark, respectively. A two-level decomposition is performed, with LWT used for the first level and DWT for the second. At the extraction side, non-blind recovery of the fingerprint watermark is performed, mirroring the embedding process. The propounded design is implemented on various medical images such as chest X-rays and CT scans, and it provides better imperceptibility and robustness with the combination of LWT-DWT. The result analysis shows that the proposed fingerprint watermarking scheme attains the best results in terms of robustness and authentication under different medical image attacks. The Peak Signal-to-Noise Ratio and Normalized Correlation Coefficient metrics are used to evaluate the proposed scheme. Furthermore, superior results are obtained when compared to related medical image watermarking schemes.
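The two evaluation metrics named above have standard definitions; a small NumPy sketch is given below (NCC conventions vary, and the mean-subtracted form is assumed here).

```python
import numpy as np

def psnr(original, processed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB, used to quantify imperceptibility."""
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ncc(original, extracted):
    """Normalized Correlation Coefficient between the embedded and extracted
    watermarks, used to quantify robustness (1.0 means perfect recovery)."""
    a = original.astype(float) - original.mean()
    b = extracted.astype(float) - extracted.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))
```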
Observation-driven generation of texture maps depicting dust accumulation over time
The perception of realism in computer-generated images can be significantly enhanced by subtle visual cues. Among those, one can highlight the presence of dust on synthetic objects, which is often subject to temporal variations in real settings. In this paper, we present a framework for the generation of textures representing the accumulation of this ubiquitous material over time in indoor settings. It employs a physically inspired approach to portray the effects of different levels of accumulated dust roughness on the appearance of substrate surfaces and to modulate these effects according to different illumination and viewing geometries. The development of its core algorithms was guided by empirical insights and data obtained from observational experiments, which are also described. To illustrate its applicability to the rendering of visually plausible depictions of time-dependent changes in dusty scenes, we provide sequences of images obtained under distinct dust accumulation scenarios.
3D objects reconstruction from frontal images: an example with guitars
This work deals with the automatic 3D reconstruction of objects from frontal RGB images, aiming at a better understanding of how 3D objects can be reconstructed from RGB images and used in immersive virtual environments. We propose a complete workflow that can be easily adapted to almost any other family of rigid objects. To explain and validate our method, we focus on guitars. First, we detect and segment the guitars present in the image using semantic segmentation methods based on convolutional neural networks. In the second step, we perform the final 3D reconstruction of the guitar by warping the rendered depth maps of a fitted 3D template in 2D image space to match the input silhouette. We validated our method by obtaining guitar reconstructions from real input images and from renders of all guitar models available in the ShapeNet database. Numerical results for different object families were obtained by computing standard mesh evaluation metrics such as Intersection over Union, Chamfer Distance, and the F-score. The results of this study show that our method can automatically generate high-quality 3D object reconstructions from frontal images using various segmentation and 3D reconstruction techniques.
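The mesh evaluation metrics mentioned above are commonly computed on point clouds sampled from the meshes; the sketch below shows one standard formulation of the symmetric Chamfer Distance and the F-score using SciPy KD-trees, with an illustrative distance threshold (conventions on squaring and normalization vary).

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_fscore(pred_pts, gt_pts, threshold=0.01):
    """Point-cloud metrics: symmetric Chamfer Distance and F-score at a
    distance threshold, computed with nearest-neighbour queries on KD-trees."""
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # for each predicted point
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # for each ground-truth point

    chamfer = d_pred_to_gt.mean() + d_gt_to_pred.mean()
    precision = (d_pred_to_gt < threshold).mean()
    recall = (d_gt_to_pred < threshold).mean()
    fscore = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return chamfer, fscore

pred = np.random.rand(2048, 3)   # e.g. points sampled from the reconstructed mesh
gt = np.random.rand(2048, 3)     # points sampled from the reference mesh
print(chamfer_and_fscore(pred, gt, threshold=0.05))
```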
A paced multi-stage block-wise approach for object detection in thermal images
The growing adoption of thermal imagery in applications such as autonomous vehicles, surveillance, and COVID-19 detection necessitates accurate object detection frameworks for the thermal domain. Conventional methods can fall short, especially in situations with poor lighting, for instance during night-time detection. In this paper, we propose a paced multi-stage block-wise framework for effectively detecting objects in thermal images. Our approach utilizes the pre-existing knowledge of deep neural network-based object detectors trained on large-scale natural image data to enhance performance in the thermal domain. The multi-stage approach drives our model to achieve higher accuracies, and the introduction of a pace parameter during domain adaptation enables efficient training. Our experimental results demonstrate that the framework outperforms previous benchmarks on the FLIR ADAS dataset for the person, bicycle, and car categories. We also provide further analysis of the framework, such as the effect of its components on accuracy and training efficiency, its generalizability to other thermal datasets, and its superior performance on night-time images in contrast to state-of-the-art RGB object detectors.
Edge-enhanced instance segmentation by grid regions of interest
This paper focuses on the instance segmentation task. The purpose of instance segmentation is to jointly detect, classify, and segment individual instances in images, so it is used to solve a large number of industrial tasks such as novel coronavirus diagnosis and autonomous driving. However, it is not easy for instance segmentation models to achieve good results in terms of both the efficiency of class prediction and the quality of segmentation at instance edges. We propose a single-stage instance segmentation model, EEMask (edge-enhanced mask), which generates grid ROIs (regions of interest) instead of proposal boxes. EEMask divides the image uniformly according to a grid and then calculates the relevance between grid cells based on their distance and grayscale values. Finally, EEMask uses the grid relevance to generate grid ROIs and grid classes. In addition, we design an edge-enhanced layer, which enhances the model's ability to perceive instance edges by increasing the number of channels with higher contrast at the instance edges. There is no additional convolutional layer overhead, so the whole process is efficient. We evaluate EEMask on a public benchmark. On average, EEMask is 17.8% faster than BlendMask with the same training schedule. EEMask achieves a mask AP of 39.9 on the MS COCO dataset, outperforming Mask R-CNN by 7.5% and BlendMask by 3.9%.
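To make the grid-relevance idea concrete, here is a hypothetical NumPy sketch that splits a grayscale image into grid cells and scores every pair of cells by spatial proximity and grayscale similarity; the Gaussian weighting and its parameters are assumptions, as the abstract does not specify EEMask's exact formula.

```python
import numpy as np

def grid_relevance(gray_img, grid=16, sigma_d=4.0, sigma_g=32.0):
    """Hypothetical grid-relevance computation: split the image into grid cells
    and score each pair of cells by spatial proximity and grayscale similarity."""
    h, w = gray_img.shape
    ch, cw = h // grid, w // grid
    cells = gray_img[:ch * grid, :cw * grid].reshape(grid, ch, grid, cw)
    means = cells.mean(axis=(1, 3)).ravel()                 # mean intensity per cell
    ys, xs = np.divmod(np.arange(grid * grid), grid)        # cell coordinates

    dist = np.hypot(ys[:, None] - ys[None, :], xs[:, None] - xs[None, :])
    dgray = np.abs(means[:, None] - means[None, :])
    return np.exp(-(dist / sigma_d) ** 2) * np.exp(-(dgray / sigma_g) ** 2)

relevance = grid_relevance(np.random.randint(0, 256, (256, 256)).astype(float))
print(relevance.shape)   # (256, 256): pairwise relevance between 16x16 grid cells
```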
Virtual object sizes for efficient and convenient mid-air manipulation
It has been taken for granted that the sizes of virtual objects affect the efficiency and convenience of mid-air manipulation in immersive virtual environments. If a virtual object is too small or too large, for example, manipulating it becomes difficult. Nevertheless, the virtual object sizes that are optimal and convenient have rarely been studied. In this paper, we select a virtual object with many distinct geometric features and conduct user studies via docking tasks. Through the user studies, the optimal and convenient sizes for mid-air manipulation are estimated. To verify the results, a proxy-based manipulation method is designed and implemented, where the proxy is created with the estimated optimal size. A test based on this method shows that the optimal-size proxy enables users to manipulate virtual objects efficiently and that the estimated range of convenient sizes is also preferred by users.
LungSeek: 3D Selective Kernel residual network for pulmonary nodule diagnosis
Early detection and diagnosis of pulmonary nodules is the most promising way to improve the survival chances of lung cancer patients. This paper proposes an automatic pulmonary cancer diagnosis system, LungSeek. LungSeek is divided into two modules: (1) nodule detection, which detects all suspicious nodules from a computed tomography (CT) scan, and (2) nodule classification, which classifies nodules as benign or malignant. Specifically, a 3D Selective Kernel residual network (SK-ResNet) is constructed based on the Selective Kernel Network and the 3D residual network. A deep 3D region proposal network with SK-ResNet is designed for the detection of pulmonary nodules, while a multi-scale feature fusion network is designed for nodule classification. Both networks use the SK-Net module to obtain information from different receptive fields, thereby effectively learning nodule features and improving diagnostic performance. Our method has been verified on the LUNA16 dataset, reaching sensitivities of 89.06%, 94.53%, and 97.72% when the average number of false positives is 1, 2, and 4, respectively. Meanwhile, its performance is better than state-of-the-art methods, other similar networks, and experienced doctors. The method adaptively adjusts the receptive field according to the multiple scales of the input information, so as to better detect nodules of various sizes. The LungSeek framework based on 3D SK-ResNet is proposed for nodule detection and classification from chest CT, and our experimental results demonstrate its effectiveness in the diagnosis of pulmonary nodules.
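A minimal 3D selective-kernel style block in PyTorch is sketched below, illustrating how two receptive fields can be fused with a softly learned selection in the spirit of SK-ResNet; the channel counts, branch design (dilation instead of a true 5×5×5 kernel), and residual placement are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelectiveKernel3D(nn.Module):
    """Minimal 3D selective-kernel style block: two branches with different
    receptive fields are fused by channel attention that softly selects between them."""
    def __init__(self, channels=32, reduction=4):
        super().__init__()
        self.branch3 = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.BatchNorm3d(channels), nn.ReLU())
        self.branch5 = nn.Sequential(  # 3x3x3 with dilation 2 approximates a 5x5x5 field
            nn.Conv3d(channels, channels, 3, padding=2, dilation=2),
            nn.BatchNorm3d(channels), nn.ReLU())
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU())
        self.select = nn.Linear(channels // reduction, 2 * channels)

    def forward(self, x):                     # x: (B, C, D, H, W)
        u3, u5 = self.branch3(x), self.branch5(x)
        s = self.squeeze(u3 + u5)             # global context of the fused branches
        attn = self.select(s).view(-1, 2, u3.shape[1]).softmax(dim=1)
        a3, a5 = attn[:, 0], attn[:, 1]       # per-channel branch weights
        out = u3 * a3[..., None, None, None] + u5 * a5[..., None, None, None]
        return out + x                        # residual connection, as in SK-ResNet

block = SelectiveKernel3D()
y = block(torch.randn(1, 32, 16, 48, 48))
```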
PotteryVR: virtual reality pottery
Handcrafting ceramic pottery with intricate surface details, whether by traditional methods or in virtual reality (VR), is still challenging for ceramic and graphic artists. Free-form pottery can be modeled geometrically and efficiently with the right tools, yielding detailed 3D-printed outputs, yet it remains challenging to manufacture using traditional craft. Advanced VR pottery simulation is a promising way to recreate the traditional pottery experience, although some barriers remain. Producing surface detail in pottery is a tedious task, which we accomplish through mesh blending and retopology. This paper focuses on refining the virtual pottery (VP) application's performance by adding unique sound-resonance textures, which behave like quasi-infinite geometric phenomena, and blending them into the basic shapes. The paper combines creativity with visual computing technologies such as VR, mesh blending, error fixing, and 3D printing to bring the ceramic artist's imagination to life. We have used sound resonance together with VP system refinements to demonstrate several standard pottery methods, from free-form deformed pottery, retopology, and mesh blending for surface details to 3D-printed pottery with materials including polymer and ceramic resins.
Segmentation and classification on chest radiography: a systematic survey
Chest radiography (X-ray) is the most common diagnostic method for pulmonary disorders. A trained radiologist is required to interpret the radiographs, but sometimes even experienced radiologists misinterpret the findings. This leads to the need for computer-aided detection and diagnosis. For decades, researchers have detected pulmonary disorders automatically using traditional computer vision (CV) methods. Now, the availability of large annotated datasets and computing hardware has made it possible for deep learning to dominate the area; it is now the modus operandi for feature extraction, segmentation, detection, and classification tasks in medical imaging analysis. This paper focuses on research that uses chest X-rays for lung segmentation and the detection/classification of pulmonary disorders on publicly available datasets. Studies that apply Generative Adversarial Network (GAN) models to segmentation and classification on chest X-rays are also included, as GAN has gained the interest of the CV community for its ability to mitigate medical data scarcity. We also include research conducted before the rise of deep learning models to provide a clear picture of the field. Many surveys have been published, but none of them is dedicated to chest X-rays. This study will help readers learn about the existing techniques and approaches and their significance.
SAL3D: a model for saliency prediction in 3D meshes
Advances in virtual and augmented reality have increased the demand for immersive and engaging 3D experiences. To create such experiences, it is crucial to understand visual attention in 3D environments, which is typically modeled by means of saliency maps. While attention in 2D images and traditional media has been widely studied, there is still much to explore in 3D settings. In this work, we propose a deep learning-based model for predicting saliency when viewing 3D objects, a first step toward understanding and predicting attention in 3D environments. Whereas previous approaches rely solely on low-level geometric cues or unnatural viewing conditions, our model is trained on a dataset of real viewing data that we manually captured, which reflects actual human viewing behavior. Our approach outperforms existing state-of-the-art methods and closely approximates the ground-truth data. Our results demonstrate the effectiveness of our approach in predicting attention on 3D objects, which can pave the way for more immersive and engaging 3D experiences.