Journal of Signal Processing Systems for Signal Image and Video Technology

Fine-tuning-based Transfer Learning for Characterization of Adeno-Associated Virus
Khan AI, Kim MJ and Dutta P
Accurate and precise identification of adeno-associated virus (AAV) vectors plays an important role in dose-dependent gene therapy. Although solid-state nanopore techniques can potentially be used to characterize AAV vectors by capturing ionic current, existing data analysis techniques fall short of identifying them from their ionic current profiles. Recently introduced machine learning methods such as deep convolutional neural networks (CNNs), developed for image identification tasks, can be applied to such classification. However, with the small dataset available for the problem at hand, it is not possible to train a deep neural network from scratch for accurate classification of AAV vectors. To circumvent this, we applied a pre-trained deep CNN (GoogleNet) model to capture the basic features from ionic current signals and subsequently used fine-tuning-based transfer learning to classify AAV vectors. The proposed method is very generic, as it requires minimal preprocessing and does not require any handcrafted features. Our results indicate that fine-tuning-based transfer learning can achieve an average classification accuracy between 90 and 99% in three realizations with a very small standard deviation. Results also indicate that the classification accuracy depends on the applied electric field (across the nanopore) and the time frame used for data segmentation. We also found that fine-tuning of the deep network outperforms feature-extraction-based classification for the resistive pulse dataset. To expand the usefulness of fine-tuning-based transfer learning, we tested two other pre-trained deep networks (ResNet50 and InceptionV3) for the classification of AAVs. Overall, fine-tuning-based transfer learning from pre-trained deep networks is very effective for classification, though deep networks such as ResNet50 and InceptionV3 take significantly longer training time than GoogleNet.
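As a rough illustration of this kind of fine-tuning (not the authors' implementation), the following PyTorch sketch loads an ImageNet-pretrained GoogLeNet, replaces its classifier head, and fine-tunes all layers. The dataset layout (ionic-current segments already rendered as images, one folder per AAV class) and the hyperparameters are assumptions.

```python
# Hedged sketch: fine-tuning a pre-trained GoogLeNet with torchvision.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed layout: ionic-current segments rendered as RGB images,
# one subfolder per AAV class.
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("aav_segments/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Load GoogLeNet pre-trained on ImageNet and replace the classifier head
# so it matches the number of AAV classes.
model = models.googlenet(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
model = model.to(device)

optim = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optim.zero_grad()
        out = model(x)
        # GoogLeNet may return auxiliary outputs in training mode;
        # use only the main logits here.
        logits = out[0] if isinstance(out, tuple) else out
        loss = loss_fn(logits, y)
        loss.backward()
        optim.step()
```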
OpenVVC Decoder Parameterized and Interfaced Synchronous Dataflow (PiSDF) Model: Tile Based Parallelism
Haggui N, Hamidouche W, Belghith F, Masmoudi N and Nezan JF
The emergence of the new video coding standard, Versatile Video Coding (VVC), has resulted in a 40-50% coding gain over its predecessor HEVC for the same visual quality. However, this is accompanied by a sharp increase in computational complexity. The emergence of the VVC standard and the increase in video resolution have exceeded the capacity of single-core architectures. This fact has led researchers to use multicore architectures for the implementation of video standards and to exploit the parallelism of these architectures for real-time applications. With the strong growth in both areas, video coding and multicore architectures, there is a great need for a design methodology that facilitates the exploration of heterogeneous multicore architectures and automatically generates optimized code for these architectures in order to reduce time to market. In this context, this paper uses the methodology based on dataflow modeling associated with the PREESM software. The paper shows how the software has been used to model a complete standard VVC video decoder using the Parameterized and Interfaced Synchronous Dataflow (PiSDF) model. The proposed model takes advantage of the parallelism strategies of the OpenVVC decoder, in particular tile-based parallelism. Experimental results show that the speed of the VVC decoder in PiSDF is slightly higher than that of the OpenVVC decoder handwritten in C/C++, with up to 11% speedup on a 24-core processor. Thus, the proposed decoder outperforms the state-of-the-art dataflow decoders based on the RVC-CAL model.
Learning Compact DNN Models for Behavior Prediction from Neural Activity of Calcium Imaging
Wu X, Lin DT, Chen R and Bhattacharyya SS
In this paper, we develop methods for efficient and accurate information extraction from calcium-imaging-based neural signals. The particular form of information extraction we investigate involves predicting behavior variables linked to animals from which the calcium imaging signals are acquired. More specifically, we develop algorithms to systematically generate compact deep neural network (DNN) models for accurate and efficient calcium-imaging-based predictive modeling. We also develop a software tool, called NeuroGRS, to apply the proposed methods for compact DNN derivation with a high degree of automation. GRS stands for Greedy inter-layer order with Random Selection of intra-layer units, which describes the central algorithm developed in this work for deriving compact DNN structures. Through extensive experiments using NeuroGRS and calcium imaging data, we demonstrate that our methods enable highly streamlined information extraction from calcium images of the brain with minimal loss in accuracy compared to much more computationally expensive approaches.
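To convey the flavor of greedy structure reduction with random intra-layer selection, the following is a highly simplified, generic sketch; it is not the NeuroGRS implementation, and the structure representation, the evaluator, and the tolerance are all assumptions.

```python
# Hedged sketch: greedily shrink layer widths while a validation metric
# stays within a tolerance of the original model's accuracy.
def grs_prune(structure, evaluate, tol=0.01):
    """structure: {layer_name: n_units}; evaluate: structure -> accuracy."""
    baseline = evaluate(structure)
    pruned = dict(structure)
    # Greedy inter-layer order: visit the widest layers first.
    for layer in sorted(pruned, key=pruned.get, reverse=True):
        while pruned[layer] > 1:
            trial = dict(pruned)
            trial[layer] -= 1          # one (randomly chosen) unit removed in a real model
            if baseline - evaluate(trial) <= tol:
                pruned = trial         # keep the smaller structure
            else:
                break
    return pruned

# Toy usage with a fake evaluator whose accuracy degrades once the network gets too small.
acc = lambda s: 0.95 - 0.002 * max(0, 40 - sum(s.values()))
print(grs_prune({"dense1": 64, "dense2": 32}, acc))
```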
Towards real-time 3D visualization with multiview RGB camera array
Ke J, Watras AJ, Kim JJ, Liu H, Jiang H and Hu YH
A real-time 3D visualization (RT3DV) system using a multiview RGB camera array is presented. RT3DV can process multiple synchronized video streams to produce a stereo video of a dynamic scene from a chosen view angle. Its design objective is to facilitate 3D visualization at the video frame rate with good viewing quality. To facilitate 3D vision, RT3DV estimates and updates a surface mesh model formed directly from a set of sparse key points. The 3D coordinates of these key points are estimated by matching 2D key points across multiview video streams with the aid of epipolar geometry and the trifocal tensor. To capture the scene dynamics, 2D key points in individual video streams are tracked between successive frames. We implemented a proof-of-concept RT3DV system tasked to process five synchronous video streams acquired by an RGB camera array. It processes a frame in 44 milliseconds and achieves a peak signal-to-noise ratio (PSNR) of 15.9 dB from a viewpoint coinciding with a reference view. As a comparison, an image-based multiview stereo (MVS) algorithm utilizing a dense point cloud model and frame-by-frame feature detection and matching requires 7 seconds to render a frame and yields a reference-view PSNR of 16.3 dB.
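For readers unfamiliar with the sparse key-point approach, the sketch below (not the RT3DV implementation) shows the basic step of matching 2D key points between two calibrated views and triangulating them with OpenCV; P1 and P2 are assumed 3x4 projection matrices obtained from calibration.

```python
# Hedged sketch: sparse 3D points from two calibrated views with OpenCV.
import cv2
import numpy as np

def sparse_3d_points(img1, img2, P1, P2):
    # Detect and describe key points in each view.
    orb = cv2.ORB_create(nfeatures=1000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]

    pts1 = np.float32([k1[m.queryIdx].pt for m in matches]).T  # 2xN
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches]).T

    # Linear triangulation; convert homogeneous output to 3D coordinates.
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
    return (X_h[:3] / X_h[3]).T                                 # Nx3
```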
Non-Uniform Microphone Arrays for Robust Speech Source Localization for Smartphone-Assisted Hearing Aid Devices
Ganguly A and Panahi I
Robust speech source localization (SSL) is an important component of the speech processing pipeline for hearing aid devices (HADs). SSL via time difference of arrival (TDOA) estimation has been known to improve the performance of HADs in noisy environments, thereby providing a better listening experience for hearing aid users. Smartphones now possess the capability to connect to HADs through wired or wireless channels. In this paper, we present our findings on a non-uniform non-linear microphone array (NUNLA) geometry for improving SSL for HADs using an L-shaped three-element microphone array available on modern smartphones. The proposed method is implemented on a frame-based TDOA estimation algorithm using a modified dictionary-based singular value decomposition (SVD) method for localizing single speech sources under very low signal-to-noise ratios (SNRs). Unlike most methods developed for uniform microphone arrays, the proposed method has low spatial aliasing as well as low spatial ambiguity while providing robust, low-error 360° DOA scanning capability. We present a comparison among different types of microphone arrays and compare their performance using the proposed method.
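As background for TDOA-based localization (and not the paper's modified dictionary-based SVD method), the snippet below shows the standard far-field relationship between a TDOA estimate and the direction of arrival for a single microphone pair; the example delay and spacing are arbitrary.

```python
# Hedged sketch: far-field DOA from a TDOA estimate for one microphone pair.
import numpy as np

def doa_from_tdoa(tau, mic_spacing, c=343.0):
    """tau: time difference of arrival (s); mic_spacing: distance between mics (m)."""
    cos_theta = np.clip(c * tau / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))  # angle relative to the microphone axis

print(doa_from_tdoa(1.2e-4, 0.08))  # roughly 59 degrees for an 8 cm pair
```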
A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows
Blattner T, Keyrouz W, Bhattacharyya SS, Halem M and Brady M
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both HTGS-based implementations show good performance. In image stitching, the HTGS implementation achieves performance similar to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k matrices, respectively.
ALMARVI Execution Platform: Heterogeneous Video Processing SoC Platform on FPGA
Hoozemans J, van Straten J, Viitanen T, Tervo A, Kadlec J and Al-Ars Z
The proliferation of processing hardware alternatives allows developers to use various customized computing platforms to run their applications in an optimal way. However, porting application code to custom hardware requires a lot of development and porting effort. This paper describes a heterogeneous computational platform (the ALMARVI execution platform) comprising multiple communicating processors that allow easy programmability through an interface to OpenCL. The ALMARVI platform uses processing elements based on both VLIW and Transport Triggered Architectures (ρ-VEX and TCE cores, respectively). It can be implemented on Zynq devices such as the ZedBoard, and supports OpenCL by means of the pocl (Portable OpenCL) project and our ALMAIF interface specification. This allows developers to execute kernels transparently on either type of processing element, thereby allowing them to optimize execution time with minimal design and development effort.
An E-Textile Respiration Sensing System for NICU Monitoring: Design and Validation
Cay G, Ravichandran V, Saikia MJ, Hoffman L, Laptook A, Padbury J, Salisbury AL, Gitelson-Kahn A, Venkatasubramanian K, Shahriari Y and Mankodiya K
The world is witnessing a rising number of preterm infants, who are at significant risk of medical conditions. These infants require continuous care in Neonatal Intensive Care Units (NICUs). Medical parameters are continuously monitored in premature infants in the NICU using a set of wired, sticky electrodes attached to the body. Medical adhesives used on the electrodes can be harmful to the baby, causing skin injuries, discomfort, and irritation. In addition, respiration rate (RR) monitoring in the NICU faces challenges of accuracy and clinical quality because RR is extracted from the electrocardiogram (ECG). This paper presents the design and validation of a smart textile pressure sensor system that addresses the existing challenges of medical monitoring in the NICU. We designed two e-textile, piezoresistive pressure sensors made of Velostat for noninvasive RR monitoring; one was hand-stitched on a mattress topper material, and the other was embroidered on a denim fabric using an industrial embroidery machine. We developed a data acquisition system for validation experiments conducted on a high-fidelity, programmable NICU baby mannequin. We designed a signal processing pipeline to convert raw time-series signals into parameters including RR, rise and fall time, and comparison metrics. The results of the experiments showed that the relative accuracies of the hand-stitched sensors were 98.68% (top sensor) and 98.07% (bottom sensor), while the accuracies of the embroidered sensors were 99.37% (left sensor) and 99.39% (right sensor) for the 60 BrPM test case. The presented prototype system shows promising results and calls for further research on textile design, human factors, and human experimentation.
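To illustrate one step of such a signal processing pipeline (not the authors' code), the sketch below estimates respiration rate in breaths per minute from a pressure time series by band-pass filtering and peak detection; the sampling rate, filter band, and peak spacing are assumptions.

```python
# Hedged sketch: respiration rate (BrPM) from a pressure signal via peak detection.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def respiration_rate(signal, fs):
    """signal: raw piezoresistive pressure samples; fs: sampling rate in Hz."""
    # Band-pass around plausible breathing frequencies (~0.3-2.5 Hz).
    b, a = butter(2, [0.3, 2.5], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, signal)
    # One detected peak per breath; enforce a minimum spacing between breaths.
    peaks, _ = find_peaks(filtered, distance=int(0.4 * fs))
    duration_min = len(signal) / fs / 60.0
    return len(peaks) / duration_min

# Toy usage: a synthetic 60 BrPM (1 Hz) breathing waveform sampled at 100 Hz.
fs = 100
t = np.arange(0, 30, 1 / fs)
print(respiration_rate(np.sin(2 * np.pi * 1.0 * t), fs))  # ≈ 60 BrPM
```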
iBlock: An Intelligent Decentralised Blockchain-based Pandemic Detection and Assisting System
Egala BS, Pradhan AK, Badarla V and Mohanty SP
The recent COVID-19 outbreak highlighted the requirement for a more sophisticated healthcare system and real-time data analytics in the pandemic mitigation process. Moreover, real-time data plays a crucial role in the detection and alerting process. Combining smart healthcare systems with accurate real-time information about medical service availability, vaccination, and how the pandemic is spreading can directly affect quality of life and the economy. The existing architecture models have become inadequate for handling the pandemic mitigation process using real-time data. The present models are server-centric and controlled by a single party, where the management of confidentiality, integrity, and availability (CIA) of data is doubtful. Therefore, a decentralised user-centric model is necessary, where the CIA of user data is assured. In this paper, we propose a decentralised blockchain-based pandemic detection and assistance system (iBlock). iBlock uses robust technologies like hybrid computing and IPFS to support system functionality. A pseudo-anonymous personal identity is introduced using H-PCS and cryptography for anonymous data sharing. The distributed data management module guarantees data CIA, security, and privacy using cryptographic mechanisms. Furthermore, it delivers useful intelligent information in the form of suggestions and alerts to assist users. Finally, iBlock reduces stress on healthcare infrastructure and workers by providing accurate predictions and early warnings using AI/ML.
An Analysis of Image Features Extracted by CNNs to Design Classification Models for COVID-19 and Non-COVID-19
Teodoro AAM, Silva DH, Saadi M, Okey OD, Rosa RL, Otaibi SA and Rodríguez DZ
The SARS-CoV-2 virus causes a respiratory disease in humans, known as COVID-19. The confirmatory diagnosis of this disease is made through the real-time reverse transcription polymerase chain reaction (RT-qPCR) test. However, the time required to obtain the results limits the application of mass testing. Thus, chest X-ray computed tomography (CT) images are analyzed to help diagnose the disease. However, during an outbreak of a disease that causes respiratory problems, radiologists may be overwhelmed with analyzing medical images. In the literature, some studies have used CNN-based feature extraction techniques together with classification models to identify COVID-19 and non-COVID-19 cases. This work compares the performance of pre-trained CNNs applied in conjunction with classification methods based on machine learning algorithms. The main objective is to analyze the impact of the features extracted by CNNs on the construction of models to classify COVID-19 and non-COVID-19. A SARS-CoV-2 CT dataset is used in the experimental tests. The CNNs implemented are the visual geometry group networks (VGG-16 and VGG-19), Inception V3 (IV3), and EfficientNet-B0 (EB0). The classification methods are k-nearest neighbors (KNN), support vector machine (SVM), and explainable deep neural networks (xDNN). In the experiments, the best results were obtained with the EfficientNet model used to extract features and the SVM with an RBF kernel. This approach achieved a macro-averaged precision of 0.9856, sensitivity of 0.9853, specificity of 0.9853, and F1 score of 0.9853.
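As a generic illustration of this pipeline (pre-trained CNN as feature extractor plus an RBF-kernel SVM), the sketch below uses Keras and scikit-learn; it is not the authors' exact configuration, and X_train, X_test, y_train, and y_test are placeholders for preprocessed CT images of shape (N, 224, 224, 3) and their labels.

```python
# Hedged sketch: CNN feature extraction followed by SVM classification.
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# EfficientNet-B0 pre-trained on ImageNet, with global average pooling so
# each image is mapped to a fixed-length feature vector.
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", pooling="avg")

def extract_features(images):
    images = tf.keras.applications.efficientnet.preprocess_input(images)
    return backbone.predict(images, batch_size=32)

f_train = extract_features(X_train)   # X_train/X_test: placeholder image arrays
f_test = extract_features(X_test)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(f_train, y_train)
print(classification_report(y_test, clf.predict(f_test)))
```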
Automatic Non-Invasive Cough Detection based on Accelerometer and Audio Signals
Pahar M, Miranda I, Diacon A and Niesler T
We present an automatic non-invasive way of detecting cough events based on both accelerometer and audio signals. The acceleration signals are captured by a smartphone firmly attached to the patient's bed, using its integrated accelerometer. The audio signals are captured simultaneously by the same smartphone using an external microphone. We have compiled a manually annotated dataset containing such simultaneously captured acceleration and audio signals for approximately 6000 cough and 68000 non-cough events from 14 adult male patients. Logistic regression (LR), support vector machine (SVM) and multilayer perceptron (MLP) classifiers provide a baseline and are compared with three deep architectures, a convolutional neural network (CNN), a long short-term memory (LSTM) network, and a residual-based architecture (Resnet50), using a leave-one-out cross-validation scheme. We find that it is possible to use either acceleration or audio signals to distinguish between coughing and other activities, including sneezing, throat-clearing, and movement on the bed, with high accuracy. However, in all cases the deep neural networks outperform the shallow classifiers by a clear margin, and the Resnet50 offers the best performance, achieving an area under the ROC curve (AUC) exceeding 0.98 and 0.99 for acceleration and audio signals, respectively. While audio-based classification consistently offers better performance than acceleration-based classification, we observe that the difference is very small for the best systems. Because the acceleration signal requires less processing power, sidesteps the need to record audio and thus inherently preserves privacy, and is captured by a device attached to the bed rather than worn, an accelerometer-based, highly accurate, non-invasive cough detector may represent a more convenient and readily accepted method for long-term cough monitoring.
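A minimal sketch of the kind of leave-one-patient-out evaluation described above is shown below with a logistic regression baseline; the feature matrix, labels, and patient grouping are placeholders, and this is not the authors' pipeline.

```python
# Hedged sketch: leave-one-patient-out AUC for a baseline classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import roc_auc_score

def leave_one_patient_out_auc(X, y, patient_ids):
    """X: features, y: cough/non-cough labels, patient_ids: one id per event.
    Assumes each held-out patient has both cough and non-cough events."""
    aucs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=patient_ids):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return np.mean(aucs), np.std(aucs)
```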
Prediction of Bus Passenger Traffic using Gaussian Process Regression
Vidya GS and Hari VS
This paper summarizes the design and implementation of a passenger traffic prediction model based on Gaussian Process Regression (GPR). Passenger traffic analysis is a present-day requirement for proper bus scheduling and traffic management to improve efficiency and passenger comfort. Bayesian analysis uses statistical modelling to recursively estimate new data from existing data. GPR is a fully Bayesian process model, which is developed here using PyMC3 with Theano as the backend. The passenger data is modelled as a Poisson process, so the prior for designing the GP regression model is a Gamma-distributed function. It is observed that the proposed GP-based regression method outperforms existing methods such as the Student-t process model and kernel ridge regression (KRR).
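One way to combine a latent GP prior with Poisson-distributed passenger counts in PyMC3 is sketched below on toy hourly data; the kernel choice, the Gamma prior on the length-scale, and all hyperparameters are assumptions rather than the authors' exact formulation.

```python
# Hedged sketch: latent GP with a Poisson likelihood in PyMC3.
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Toy data: hourly passenger counts over one day.
X = np.linspace(0, 23, 24)[:, None]
y = np.random.poisson(20 + 10 * np.sin(X[:, 0] / 24 * 2 * np.pi))

with pm.Model() as model:
    ls = pm.Gamma("ls", alpha=2.0, beta=1.0)       # Gamma prior on the length-scale
    eta = pm.HalfNormal("eta", sigma=2.0)
    cov = eta ** 2 * pm.gp.cov.ExpQuad(1, ls)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=X)                         # latent GP over the day
    pm.Poisson("counts", mu=tt.exp(f), observed=y) # counts modelled as Poisson
    trace = pm.sample(1000, tune=1000, target_accept=0.9)
```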
LSTM Network Integrated with Particle Filter for Predicting the Bus Passenger Traffic
Vidya GS and Hari VS
The paper reports a combination of deep learning and Bayesian filtering to effectively predict passenger traffic. The architecture of the model integrates a particle filter with an LSTM network. Sequential time-series prediction is best achieved using an LSTM network, while Markovian behaviour is well captured using Bayesian (particle) filters. The temporal and spatial features of the traffic data are analyzed. Three relevant temporal variations, namely morning, noon, and post-noon patterns, are identified after the histogram analysis. These patterns are statistically modelled, and the integrated model is used to accurately predict the passenger flow for the next thirty days, facilitating bus scheduling for that period. The experimental results show that the proposed integrated model, with a coefficient of determination (R²) value of 0.88, is effective in predicting passenger traffic even when the training dataset is small.
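The Keras sketch below illustrates only the LSTM half of such a model, predicting the next day's count from a sliding window of previous counts on synthetic data; the particle filter integration is not shown, and the window length and network size are assumptions.

```python
# Hedged sketch: LSTM-based next-step prediction of daily passenger counts.
import numpy as np
import tensorflow as tf

def make_windows(series, window=7):
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y            # shapes: (samples, window, 1), (samples,)

series = np.random.poisson(30, size=365).astype("float32")   # toy daily counts
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(X.shape[1], 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=16, verbose=0)

next_day = model.predict(series[-7:].reshape(1, 7, 1))  # forecast for the next day
```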
Frame-based Programming, Stream-Based Processing for Medical Image Processing Applications
Hoozemans J, de Jong R, van der Vlugt S, Van Straten J, Elango UK and Al-Ars Z
This paper presents and evaluates an approach to deploy image and video processing pipelines that are developed in a frame-oriented manner onto a hardware platform that is stream-oriented, such as an FPGA. First, this calls for a specialized streaming memory hierarchy and an accompanying software framework that transparently moves image segments between stages of the image processing pipeline. Second, we use softcore VLIW processors, which are targetable by a C compiler and have hardware debugging capabilities, to evaluate and debug the software before moving to a High-Level Synthesis flow. The algorithm development phase, including debugging and optimizing on the target platform, is often a very time-consuming step in the development of a new product. Our proposed platform allows both software developers and hardware designers to test iterations in a matter of seconds (compilation time) instead of hours (synthesis or circuit simulation time).
Hardware Implementation of a Fixed-Point Decoder for Low-Density Lattice Codes
Srivastava R, Gaudet VC and Mitran P
This paper describes a field-programmable gate array (FPGA) implementation of a fixed-point low-density lattice code (LDLC) decoder where the Gaussian mixture messages that are exchanged during the iterative decoding process are approximated by a single Gaussian. A detailed quantization study is first performed to find the minimum number of bits required for the fixed-point decoder to attain a frame error rate (FER) performance similar to floating-point. Then efficient numerical methods are devised to approximate the required non-linear functions. Finally, the paper presents a comparison of the performance of the different decoder architectures as well as a detailed analysis of the resource requirements and throughput trade-offs of the primary design blocks for the different architectures. A novel pipelined LDLC decoder architecture is proposed where resource re-utilization along with pipelining allows for a parallelism equivalent to 50 variable nodes on the target FPGA device. The pipelined architecture attains a throughput of 10.5 Msymbols/sec at a distance of 5 dB from capacity, which is a 1.8× improvement in throughput compared to an implementation with 20 parallel variable nodes without pipelining. This implementation also achieves a 24× improvement in throughput over a baseline serial decoder.
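For context, one common way to collapse a Gaussian mixture into a single Gaussian is moment matching, sketched below in floating point; the paper's contribution lies in the fixed-point FPGA realization of this kind of step, which is not shown here.

```python
# Hedged sketch: single-Gaussian approximation of a mixture by moment matching.
import numpy as np

def collapse_mixture(weights, means, variances):
    """Return the mean and variance of the Gaussian with the same first two moments."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = np.sum(w * np.asarray(means))
    var = np.sum(w * (np.asarray(variances) + np.asarray(means) ** 2)) - mu ** 2
    return mu, var

print(collapse_mixture([0.7, 0.3], [0.0, 4.0], [1.0, 0.5]))  # (1.2, 4.21)
```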
Monotonic Optimization of Dataflow Buffer Sizes
Hendriks M, Ara HA, Geilen M, Basten T, Marin RG, de Jong R and van der Vlugt S
Many high data-rate video-processing applications are subject to a trade-off between throughput and the sizes of buffers in the system (the storage distribution). These applications have strict requirements with respect to throughput as this directly relates to functional correctness. Furthermore, the size of the storage distribution relates to resource usage, which should be minimized in many practical cases. The computation kernels of high data-rate video-processing applications can often be specified by cyclo-static dataflow graphs. We therefore study the problem of minimizing the total (weighted) size of the storage distribution under a throughput constraint for cyclo-static dataflow graphs. By combining ideas from the area of monotonic optimization with the causal dependency analysis from a state-of-the-art storage optimization approach, we create an algorithm that scales better than the state-of-the-art approach. Our algorithm can provide a solution and a bound on the suboptimality of this solution at any time, and it iteratively improves this until the optimal solution is found. We evaluate our algorithm using several models from the literature, and on models of a high data-rate video-processing application from the healthcare domain. Our experiments show performance increases of up to several orders of magnitude.
Guest Editorial: MLSP 2020 Special Issue
Särkkä S, Roininen L, Kok M, Hostettler R and Hauptmann A
Run-time Reconfigurable Acceleration for Genetic Programming Fitness Evaluation in Trading Strategies
Funie AI, Grigoras P, Burovskiy P, Luk W and Salmon M
Genetic programming can be used to identify complex patterns in financial markets which may lead to more advanced trading strategies. However, the computationally intensive nature of genetic programming makes it difficult to apply to real-world problems, particularly in real-time constrained scenarios. In this work we propose the use of Field Programmable Gate Array technology to accelerate the fitness evaluation step, one of the most computationally demanding operations in genetic programming. We develop a fully-pipelined, mixed-precision design using run-time reconfiguration to accelerate fitness evaluation. We show that run-time reconfiguration can reduce resource consumption by a factor of 2 compared to previous solutions on certain configurations. The proposed design is up to 22 times faster than an optimised, multithreaded software implementation while achieving comparable financial returns.
Video Compression for Screen Recorded Sequences Following Eye Movements
Serrano-Carrasco DJ, Diaz-Honrubia AJ and Cuenca P
With the advent of smartphones and tablets, video traffic on the Internet has increased enormously. With this in mind, in 2013 the High Efficiency Video Coding (HEVC) standard was released with the aim of reducing the bit rate (at the same quality) by 50% with respect to its predecessor. However, new content with greater resolutions and requirements appears every day, making it necessary to further reduce the bit rate. Perceptual video coding has recently been recognized as a promising approach to achieving high-performance video compression, and eye tracking data can be used to create and verify these models. In this paper, we present a new algorithm for bit rate reduction of screen-recorded sequences based on the visual perception of videos. An eye tracking system is used during the recording to locate the fixation point of the viewer. Then, the area around that point is encoded with the base quantization parameter (QP) value, which increases when moving away from it. The results show that up to 31.3% of the bit rate may be saved when compared with the original HEVC-encoded sequence, without a significant impact on the perceived quality.
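A simple sketch of the general idea of increasing QP with distance from the viewer's fixation point is shown below; the base QP, maximum offset, and radius are illustrative assumptions, not the values used in the paper.

```python
# Hedged sketch: per-pixel QP map that grows with distance from the fixation point.
import numpy as np

def qp_map(width, height, fix_x, fix_y, base_qp=32, max_offset=10, radius=200):
    """Return a per-pixel QP map: base QP near the fixation point, larger further away."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    dist = np.hypot(xs - fix_x, ys - fix_y)
    offset = np.clip(dist / radius, 0, 1) * max_offset
    return (base_qp + offset).astype(int)

qp = qp_map(1920, 1080, fix_x=960, fix_y=540)
print(qp[540, 960], qp[0, 0])   # base QP at the fixation point, higher QP at the corner
```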
FPGA-Based Soft-Core Processors for Image Processing Applications
Amiri M, Siddiqui FM, Kelly C, Woods R, Rafferty K and Bardak B
In security and surveillance, there is an increasing need to process image data efficiently and effectively, either at source or in a large data network. Whilst the Field-Programmable Gate Array (FPGA) has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach of using optimized FPGA-based soft-core processors, which allows the user to exploit task- and data-level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary progress on the design flow to program the structure. An implementation of a Histogram of Gradients algorithm is also reported, which shows that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design time, verification and debugging steps associated with conventional FPGA implementations.
Signal Processing Techniques for 6G
Mucchi L, Shahabuddin S, Albreem MAM, Abdallah S, Caputo S, Panayirci E and Juntti M
6G networks bear the burden of providing not only higher performance compared to 5G, but also of enabling new service domains and opening the door to a new paradigm of mobile communication. This paper presents an overview of the role and key challenges of signal processing (SP) in future 6G systems and networks, from the conditioning of the signal at transmission to MIMO precoding and detection, from channel coding to channel estimation, and from multicarrier and non-orthogonal multiple access (NOMA) to optical wireless communications and physical layer security (PLS). We also describe the core future research challenges in technologies including machine-learning-based 6G design, integrated communications and sensing (ISAC), and the internet of bio-nano-things.