Automated detection of elephants in wildlife video
Biologists often have to investigate large amounts of video in behavioral studies of animals. These videos are usually not sufficiently indexed, which makes finding objects of interest a time-consuming task. We propose a fully automated method for the detection and tracking of elephants in wildlife video collected by biologists in the field. The method dynamically learns a color model of elephants from a few training images. Based on this color model, we localize elephants in video sequences with different backgrounds and lighting conditions. We exploit temporal cues from the video to improve the robustness of the approach and to obtain spatially and temporally consistent detections. The proposed method detects elephants (and groups of elephants) of different sizes and poses performing different activities. It is robust to occlusions (e.g., by vegetation) and correctly handles camera motion and different lighting conditions. Experiments show that both near and far-distant elephants can be detected and tracked reliably. The proposed method gives biologists efficient and direct access to their video collections, which facilitates further behavioral and ecological studies. It imposes no hard constraints specific to elephants and is thus easily adaptable to other animal species.
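The color-model step described above can be sketched as follows. This is a minimal illustration, assuming a simple Gaussian color model with diagonal covariance and a Mahalanobis-distance test; the toy pixel values and the function names are mine, not the paper's.

```python
import numpy as np

def learn_color_model(training_pixels):
    """Fit a simple Gaussian color model (mean + diagonal variance)
    to RGB pixels sampled from annotated elephant regions."""
    mean = training_pixels.mean(axis=0)
    var = training_pixels.var(axis=0) + 1e-6  # avoid division by zero
    return mean, var

def detect(frame, mean, var, thresh=9.0):
    """Mark pixels whose squared Mahalanobis distance (diagonal
    covariance) to the model mean is below `thresh`."""
    d2 = ((frame - mean) ** 2 / var).sum(axis=-1)
    return d2 < thresh

# Toy data: grey-ish "elephant" pixels vs. a green "vegetation" pixel.
train = np.array([[120.0, 115.0, 110.0],
                  [130.0, 125.0, 118.0],
                  [125.0, 120.0, 115.0]])
mean, var = learn_color_model(train)
frame = np.array([[[124.0, 119.0, 113.0], [40.0, 160.0, 40.0]]])
mask = detect(frame, mean, var)
```

In the actual method, such a per-pixel decision would be refined by the temporal-consistency step to suppress spurious detections.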
Fast Graph Partitioning Active Contours for Image Segmentation Using Histograms
We present a method to improve the accuracy and speed, as well as significantly reduce the memory requirements, of the recently proposed Graph Partitioning Active Contours (GPAC) algorithm for image segmentation in the work of Sumengen and Manjunath (2006). Instead of computing an approximate but still expensive dissimilarity matrix of quadratic size for a 2D image of size M × N and regular image tiles of size n × n, we use a fixed-size intensity histogram and an intensity-based matrix to compute the terms associated with the complete MN × MN dissimilarity matrix. This computationally efficient reformulation of GPAC, using a very small memory footprint, offers two distinct advantages over the original implementation: it speeds up convergence of the evolving active contour and seamlessly extends GPAC to multidimensional images.
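The core of the histogram trick can be illustrated for an intensity dissimilarity w(a, b) = |a − b|: the sum of dissimilarities over all cross-region pixel pairs, naively O(|A|·|B|), can be computed from the two regions' intensity histograms in time independent of region size. A minimal sketch, assuming integer intensities in [0, 256); the function names are mine, not the paper's:

```python
import numpy as np

def pairwise_sum_naive(region_a, region_b):
    # O(|A| * |B|): sum of |a - b| over all cross-region pixel pairs.
    return sum(abs(float(a) - float(b)) for a in region_a for b in region_b)

def pairwise_sum_hist(region_a, region_b, bins=256):
    # O(bins^2), independent of region size: the same sum computed
    # from the two regions' intensity histograms.
    ha, _ = np.histogram(region_a, bins=bins, range=(0, bins))
    hb, _ = np.histogram(region_b, bins=bins, range=(0, bins))
    levels = np.arange(bins, dtype=float)
    diff = np.abs(levels[:, None] - levels[None, :])  # |i - j| lookup table
    return float(ha @ diff @ hb)
```

Because the histogram size is fixed, the cost of the region terms no longer grows with the image, which is what makes the small memory footprint and the speed-up possible.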
Making texture descriptors invariant to blur
Besides high distinctiveness, robustness (or invariance) to image degradations is very desirable for texture feature extraction methods in real-world applications. In this paper, the focus is on making arbitrary texture descriptors invariant to blur, which is often prevalent in real image data. From previous work, we know that most state-of-the-art texture feature extraction methods are unable to cope even with minor blur degradations if the classifier's training stage is based on idealistic data. However, if the training set suffers similarly from the degradations, the obtained accuracies are significantly higher. Exploiting that knowledge, in this approach the level of blur of each image is raised to a fixed threshold, based on the estimate of a blur measure. Experiments with synthetically degraded data show that the method is able to achieve a high degree of blur invariance without losing too much distinctiveness. Finally, we show that our method is not limited to ideal Gaussian blur.
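The blur-normalization idea can be sketched as follows, assuming a variance-of-Laplacian sharpness measure and a 3x3 box filter as the blur operator. Both are my illustrative stand-ins: the paper estimates an actual blur measure and is not tied to a particular filter.

```python
import numpy as np

def laplacian_variance(img):
    """Sharpness measure: variance of the discrete Laplacian
    (lower values indicate stronger blur)."""
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def box_blur(img):
    """One 3x3 box-filter pass (borders left unchanged for brevity)."""
    out = img.copy()
    out[1:-1, 1:-1] = (img[:-2, :-2] + img[:-2, 1:-1] + img[:-2, 2:]
                       + img[1:-1, :-2] + img[1:-1, 1:-1] + img[1:-1, 2:]
                       + img[2:, :-2] + img[2:, 1:-1] + img[2:, 2:]) / 9.0
    return out

def normalize_blur(img, target, max_iters=20):
    """Blur the image until its sharpness measure falls to the target,
    so every image enters the classifier equally degraded."""
    for _ in range(max_iters):
        if laplacian_variance(img) <= target:
            break
        img = box_blur(img)
    return img
```

Training and test images pushed to the same blur level in this way satisfy the "similarly degraded training set" condition that the paper identifies as the key to higher accuracies.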
Patch-based models and algorithms for image denoising: a comparative review of patch-based image denoising methods for additive noise reduction
Digital images are captured using sensors during the data acquisition phase, where they are often contaminated by noise (an undesired random signal). Such noise can also be introduced during transmission or by poor-quality lossy image compression. Noise reduction and image enhancement are central to all other digital image processing tasks, and improving the performance of image denoising methods would contribute greatly to the results of other image processing techniques. Patch-based denoising methods have recently emerged as the state-of-the-art approach for various additive noise levels. In this work, the use of state-of-the-art patch-based denoising methods for additive noise reduction is investigated. Various types of image datasets are used to conduct this study.
Multiview video plus depth transmission via virtual-view-assisted complementary down/upsampling
Multiview video plus depth is a popular 3D video format which can provide viewers with a vivid 3D experience. However, its requirements in terms of computational complexity and transmission bandwidth exceed those of conventional 2D video. To mitigate these limitations, some works have proposed reducing the amount of transmitted data by adopting different resolutions for different views; the transmitted video is consequently called mixed-resolution video. To further reduce the transmitted data while maintaining good quality at the decoder side, in this paper we propose a down/upsampling algorithm for 3D multiview video which systematically takes the video encoder and decoder into account. At the encoder side, the rows of the two adjacent views are downsampled in an interlacing and complementary fashion, whereas at the decoder side, the discarded pixels are recovered by fusing the virtual-view pixels with directionally interpolated pixels from the complementary downsampled views. Moreover, the texture patterns surrounding the discarded pixels are used to aid the data fusion, so as to enhance edge recovery. With the assistance of virtual views, the proposed approach can effectively recover the discarded high-frequency details at the decoder side. The experimental results demonstrate the superior performance of the proposed framework.
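A toy illustration of the complementary row decimation is given below. The decoder side here simply borrows the missing rows from the complementary view to show the data layout; the actual method fuses virtual-view pixels with directional interpolation, which is not reproduced in this sketch.

```python
import numpy as np

def downsample_complementary(left, right):
    """Encoder side: keep the even rows of the left view and the odd
    rows of the right view, so the two transmitted views carry
    complementary row sets."""
    return left[0::2], right[1::2]

def merge_rows(kept_rows, other_view_rows, keep_even, height):
    """Decoder-side stand-in: rebuild a full-height view by filling the
    discarded rows from the complementary view (a placeholder for the
    paper's virtual-view / directional-interpolation fusion)."""
    out = np.empty((height, kept_rows.shape[1]), dtype=kept_rows.dtype)
    if keep_even:
        out[0::2] = kept_rows
        out[1::2] = other_view_rows
    else:
        out[1::2] = kept_rows
        out[0::2] = other_view_rows
    return out
```

The point of the complementary pattern is visible even in this toy version: between the two views, every row index is transmitted in at least one view, so the high-frequency vertical detail is never discarded from both.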
Image denoising with morphology- and size-adaptive block-matching transform domain filtering
BM3D is a state-of-the-art image denoising method. Its denoised results in regions with strong edges are often better than in regions with smooth or weak edges, because block matching is more accurate in strong-edge regions. Using block sizes adapted to different image regions may therefore yield better denoising. Based on these observations, in this paper we first partition each image into regions belonging to one of three morphological components, i.e., contour, texture, and smooth components, according to the regional energy of the alternating current (AC) coefficients of the discrete cosine transform (DCT). We then adaptively determine the block size for each morphological component: the smallest block size for contour components, a medium block size for texture components, and the largest block size for smooth components. To better preserve image details, we also use a multi-stage strategy to implement image denoising, where every stage is similar to the BM3D method except for the adaptive block sizes and different transform dimensions. Experimental results show that our proposed algorithm achieves higher PSNR and MSSIM values than the BM3D method, as well as better visual quality of the denoised images than BM3D and some other existing state-of-the-art methods.
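The region classification step can be sketched as follows: compute the 2D DCT of a block, measure the AC energy, and map it to a block size. The thresholds and block sizes below are illustrative values of mine, not the paper's.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    mat = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    mat *= np.sqrt(2.0 / n)
    mat[0] /= np.sqrt(2.0)
    return mat

def ac_energy(block):
    """Energy of the AC coefficients of the 2D DCT of a square block."""
    d = dct_matrix(block.shape[0])
    coef = d @ block @ d.T
    return float((coef ** 2).sum() - coef[0, 0] ** 2)

def choose_block_size(block, t_texture=10.0, t_contour=1000.0):
    """Map AC energy to a matching-block size: smooth regions get the
    largest blocks, contours the smallest (thresholds illustrative)."""
    e = ac_energy(block)
    if e < t_texture:
        return 16   # smooth component
    if e < t_contour:
        return 8    # texture component
    return 4        # contour component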
A robust iterative algorithm for image restoration
We present a new image restoration method that combines the iterative Van Cittert algorithm with noise reduction modeling. Our approach decouples deblurring from denoising during the restoration process, allowing any well-established noise reduction operator to be plugged into our model independently of the Van Cittert deblurring operation. This approach leads to an analytic expression for the error estimation of the restored images as well as simple parameter setting for real applications, both of which are hard to attain in many regularization-based methods. Numerical experiments show that our method achieves a good balance between structure recovery and noise reduction, performing close to the state-of-the-art method and comparing favorably with many other methods.
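The decoupling idea can be sketched in 1D: the classical Van Cittert update f_{k+1} = f_k + β(g − h∗f_k) is followed by an arbitrary denoising operator. The 3-point median below is my illustrative choice of denoiser; the point of the decoupled design is precisely that any operator can be substituted.

```python
import numpy as np

def blur(x, kernel):
    """Linear blur: convolution with a known kernel."""
    return np.convolve(x, kernel, mode="same")

def median3(x):
    """A plug-in noise reduction operator (here: 3-point median);
    any well-established denoiser could be substituted."""
    pad = np.pad(x, 1, mode="edge")
    return np.median(np.stack([pad[:-2], pad[1:-1], pad[2:]]), axis=0)

def van_cittert(observed, kernel, beta=1.0, iters=25, denoise=median3):
    """Iterative restoration f_{k+1} = D(f_k + beta * (g - h*f_k)),
    with the denoising step D decoupled from the deblurring update."""
    f = observed.copy()
    for _ in range(iters):
        f = f + beta * (observed - blur(f, kernel))  # Van Cittert step
        f = denoise(f)                               # plug-in denoiser
    return f
```

On a blurred step signal, the iteration progressively sharpens the edge while the denoising step suppresses the overshoot that pure Van Cittert iteration can introduce.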
Improved BM3D image denoising using SSIM-optimized Wiener filter
Image denoising is considered a salient pre-processing step in sophisticated imaging applications. Over the decades, numerous studies have been conducted on denoising, and recently proposed patch-based methods have added a new dimension to the field. BM3D is the current state of the art in denoising and achieves better results than any other existing method. However, there is still room to improve BM3D toward high-quality denoising. In this study, we first attempted to improve the Wiener filter (the core of BM3D) by maximizing the structural similarity (SSIM) between the true and the estimated image, instead of minimizing the mean square error (MSE) between them. Moreover, for the filtering profile, we introduced 3D zigzag thresholding. Experimental results demonstrate that, regardless of the type of image, our proposed method achieves better denoising performance than BM3D.
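For context, the classical MSE-optimal Wiener shrinkage used in BM3D's second stage is sketched below; the study's contribution is to re-derive this gain so that it maximizes SSIM rather than minimizes MSE, which is not reproduced here.

```python
import numpy as np

def wiener_gain(signal_power, noise_var):
    """Classical MSE-optimal Wiener shrinkage gain per coefficient:
    g = S / (S + sigma^2). The paper's variant replaces this MSE
    criterion with an SSIM-based one."""
    return signal_power / (signal_power + noise_var)

def wiener_filter_coeffs(noisy_coeffs, pilot_coeffs, noise_var):
    """Empirical Wiener filtering as in BM3D's second stage: the
    signal power is estimated from a pilot (first-stage) estimate."""
    g = wiener_gain(pilot_coeffs ** 2, noise_var)
    return g * noisy_coeffs
```

Coefficients with large pilot energy pass almost unchanged, while coefficients the pilot estimate deems pure noise are shrunk to zero.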
Reliable 3D video streaming considering region of interest
3D video applications are becoming more common as communication technology grows more pervasive. With the increasing demand for 3D multimedia services in both wired and wireless networks, robust video streaming methods are needed, since packet loss is an inherent characteristic of communication networks. This paper introduces a new reliable method of stereoscopic video streaming based on a multiple description coding (MDC) strategy. The proposed multiple description coding generates four 3D video descriptions according to the objects of interest contained in the scene. To find these objects, we use two metrics derived from the second-order statistics of the depth map in a block-wise manner. Having detected the objects, the proposed MDC algorithm generates the descriptions for the color video using a nonidentical decimation method with respect to the identified objects. To assess the reliability of the proposed MDC method, this article assumes that, due to the unreliable communication channel, only one of the four encoded descriptions is delivered to the receiver successfully. The receiver therefore needs to estimate the missing descriptions' data from the available description. Since the human eye is more sensitive to objects than to individual pixels, the proposed method provides better visual performance in subjective assessment. The objective test results also verify that the proposed method outperforms Polyphase SubSampling (PSS) multiple description coding and our previous work using pixel variation. Regarding the depth map, the proposed method generates the multiple descriptions according to the pixel prediction difficulty level. The considerable improvement achieved by the proposed method is shown with peak signal-to-noise ratio (PSNR) and Structural SIMilarity (SSIM) simulation results.
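The plain polyphase split into four descriptions (the PSS baseline the abstract compares against) and a one-description reconstruction can be sketched as follows; the paper's object-aware, nonidentical decimation is a refinement of this scheme and is not reproduced here.

```python
import numpy as np

def four_descriptions(frame):
    """Split a frame into four polyphase descriptions by 2x2 spatial
    subsampling; each description alone covers the whole scene at
    quarter resolution."""
    return [frame[i::2, j::2] for i in (0, 1) for j in (0, 1)]

def estimate_from_one(desc, height, width):
    """Receiver-side sketch: if only one description arrives, estimate
    the full frame by nearest-neighbor replication of its pixels."""
    return np.repeat(np.repeat(desc, 2, axis=0), 2, axis=1)[:height, :width]
```

Because every description samples the entire frame, losing three of the four descriptions degrades resolution rather than dropping whole regions, which is the basic robustness argument behind MDC streaming.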
Adaptive vehicle extraction in real-time traffic video monitoring based on the fusion of a multi-objective particle swarm optimization algorithm
Adaptive vehicle extraction in real-time traffic video monitoring is strongly affected by environmental factors such as illumination and noise, suffers from high missed-detection and false-detection rates, and struggles to satisfy robustness and real-time requirements simultaneously. To address these problems, a method for adaptive vehicle extraction in real-time traffic video monitoring based on the fusion of a multi-objective particle swarm optimization algorithm is put forward. In this method, adaptive binarization is first carried out on the image based on the multi-objective particle swarm optimization algorithm, and interference points are filtered out by erosion and dilation. A simple and effective method is used to merge shadow lines and extract vehicles from the real-time traffic video. The information entropy of the target area and the symmetry characteristics of the vehicle tail are used to screen and identify the region of interest, which reduces the missed-detection and false-detection rates of the algorithm. The multi-objective particle swarm optimization algorithm is used to extract vehicle boundaries and achieves relatively good results. The results show a detection accuracy of 89% and an average processing speed of 17.6 frames/s on real-time traffic video with a resolution of 640 × 480.
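The interference-point removal step, erosion followed by dilation (a morphological opening), can be sketched on a binary mask; the 3x3 structuring element is my illustrative choice.

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 structuring element."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out &= p[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

def dilate(mask):
    """Binary dilation with a 3x3 structuring element."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out |= p[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    return out

def remove_interference(mask):
    """Opening (erosion then dilation) removes isolated interference
    points while roughly preserving larger vehicle blobs."""
    return dilate(erode(mask))
```

Isolated foreground pixels vanish under erosion and are never restored, while blobs at least as large as the structuring element survive the round trip.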
Dynamic turbulence mitigation for long-range imaging in the presence of large moving objects
Long-range imaging with visible or infrared observation systems is typically hampered by atmospheric turbulence. Software-based turbulence mitigation methods aim to stabilize and sharpen such recorded image sequences based on the image data only. Although successful restoration has been achieved on static scenes in the past, a significant challenge remains in accounting for moving objects such that they remain visible as moving objects in the output. Here, we investigate a new approach for turbulence mitigation on background as well as large moving objects under moderate turbulence conditions. In our method, we apply and compare different optical flow algorithms to locally estimate both the apparent and true object motion in image sequences and subsequently apply dynamic super-resolution, image sharpening, and newly developed local stabilization methods to the aligned images. We assess the use of these stabilization methods as well as a new method for occlusion compensation for these conditions. The proposed methods are qualitatively evaluated on several visible light recordings of real-world scenes. We demonstrate that our methods achieve a similar image quality on background elements as our prior methods for static scenes, but at the same time obtain a substantial improvement in image quality and reduction in image artifacts on moving objects. In addition, we show that our stabilization and occlusion compensation methods can be robustly used for turbulence mitigation in imagery featuring complex backgrounds and occlusion effects, without compromising the performance in less challenging conditions.
Automatic kidney segmentation using 2.5D ResUNet and 2.5D DenseUNet for malignant potential analysis in complex renal cyst based on CT images
Bosniak renal cyst classification has been widely used to determine the complexity of a renal cyst. However, it turns out that about half of the patients undergoing surgery for Bosniak category III take surgical risks that reward them with no clinical benefit at all, because their pathological results reveal that the cysts are actually benign, not malignant. This problem inspires us to use recently popular deep learning techniques and to study alternative analytics methods for precise binary classification (benign or malignant tumor) on Computerized Tomography (CT) images. To achieve our goal, two consecutive steps are required: segmenting kidney organs or lesions from the CT images, then classifying the segmented kidneys. In this paper, we propose a study of kidney segmentation using 2.5D ResUNet and 2.5D DenseUNet for efficiently extracting intra-slice and inter-slice features. Our models are trained and validated on the public data set from the Kidney Tumor Segmentation (KiTS19) challenge in two different training environments. As a result, all experimental models achieve high mean kidney Dice scores of at least 95% on the KiTS19 validation set consisting of 60 patients. Apart from the KiTS19 data set, we also conduct separate experiments on abdomen CT images of four Thai patients. On these four patients, our experimental models show a drop in performance, with the best mean kidney Dice score being 87.60%.
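The evaluation metric used above, the mean kidney Dice score, is defined as 2|A∩B| / (|A|+|B|) averaged over patients; a minimal sketch (the small epsilon guarding against empty masks is my own convention):

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def mean_kidney_dice(preds, gts):
    """Mean Dice over a list of per-patient volumes, as reported for
    the KiTS19 validation set."""
    return float(np.mean([dice(p, g) for p, g in zip(preds, gts)]))
```

A score of 1.0 means perfect overlap; the reported drop from ≥95% to 87.60% on the Thai patients corresponds directly to reduced overlap under this metric.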
Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression
The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.
Learned scalable video coding for humans and machines
Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking, and counting would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that facilitates efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is built on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human viewing task in its enhancement layer.