Clustering of Data with Missing Entries using Non-convex Fusion Penalties
The presence of missing entries in data often creates challenges for pattern recognition algorithms. Traditional clustering algorithms assume that all feature values are known for every data point. We propose a method to cluster data in the presence of missing information; unlike conventional techniques, our algorithm can handle datasets in which some feature values are unknown for every point. For this more challenging problem, we provide theoretical guarantees for clustering using a fusion penalty based optimization problem. Furthermore, we propose an algorithm to solve a relaxation of this problem using saturating non-convex fusion penalties. This algorithm produces solutions that degrade gradually as the fraction of missing feature values increases. We demonstrate the utility of the proposed method on a simulated dataset, the Wine dataset, and an under-sampled cardiac MRI dataset, and show that it is a promising clustering technique for datasets with large fractions of missing entries.
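To make the fusion-penalty formulation concrete, the following is a minimal sketch of a convex-clustering-style objective restricted to observed entries, minimized by plain subgradient descent. The function name and step sizes are illustrative, and for simplicity it uses the convex l2 fusion penalty rather than the saturating non-convex penalties the paper advocates.

```python
# Sketch only: fusion-penalty clustering with missing entries via
# subgradient descent on a convex surrogate of the paper's objective.
import numpy as np

def fusion_cluster(X, mask, lam=1.0, n_iter=500, lr=1e-2):
    """X: (n, p) data (arbitrary values at missing entries).
    mask: (n, p) boolean, True where an entry is observed."""
    Xo = np.where(mask, X, 0.0)
    U = Xo.copy()                        # one cluster-center estimate per point
    for _ in range(n_iter):
        grad = 2.0 * mask * (U - Xo)     # fidelity only on observed entries
        diffs = U[:, None, :] - U[None, :, :]
        norms = np.linalg.norm(diffs, axis=2) + 1e-12
        grad += lam * (diffs / norms[:, :, None]).sum(axis=1)  # fusion subgradient
        U -= lr * grad
    return U  # points whose rows (nearly) coincide belong to one cluster
```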
Nonlinear Structural Vector Autoregressive Models with Application to Directed Brain Networks
Structural equation models (SEMs) and vector autoregressive models (VARMs) are two broad families of approaches that have been shown useful in effective brain connectivity studies. While VARMs postulate that a given region of interest in the brain is directionally connected to another one by virtue of time-lagged influences, SEMs assert that directed dependencies arise due to instantaneous effects, and may even be adopted when nodal measurements are not necessarily multivariate time series. To unify these complementary perspectives, linear structural vector autoregressive models (SVARMs) that leverage both instantaneous and time-lagged nodal data have recently been put forth. Albeit simple and tractable, linear SVARMs are quite limited since they are incapable of modeling nonlinear dependencies between neuronal time series. To this end, the overarching goal of the present paper is to considerably broaden the span of linear SVARMs by capturing nonlinearities through kernels, which have recently emerged as a powerful nonlinear modeling framework in canonical machine learning tasks, e.g., regression, classification, and dimensionality reduction. The merits of kernel-based methods are extended here to the task of learning the effective brain connectivity, and an efficient regularized estimator is put forth to leverage the edge sparsity inherent to real-world complex networks. Judicious kernel choice from a preselected dictionary of kernels is also addressed using a data-driven approach. Numerical tests on ECoG data captured through a study on epileptic seizures demonstrate that it is possible to unveil previously unknown directed links between brain regions of interest.
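As a flavor of the kernel machinery involved, here is a small sketch of kernel ridge regression of one node's time series on (lagged) series from other nodes. It is only the nonlinear backbone, with the paper's sparsity regularization and data-driven multi-kernel selection omitted, and all names are ours.

```python
# Sketch: Gaussian-kernel ridge regression, the nonlinear building block
# behind kernel-based SVARM estimation (sparsity and kernel selection omitted).
import numpy as np

def kernel_ridge(X, y, gamma=1.0, lam=1e-2):
    # X: (T, d) regressors (e.g., lagged series of other nodes); y: (T,) target node
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha, K  # in-sample prediction: K @ alpha
```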
Spectral Estimation Using Multitaper Whittle Methods with a Lasso Penalty
Spectral estimation provides key insights into the frequency-domain characteristics of a time series. Naive non-parametric estimates of the spectral density, such as the periodogram, are inconsistent, and the more advanced lag-window or multitaper estimators are often still too noisy. We propose a lasso-penalized quasi-likelihood Whittle framework based on multitaper spectral estimates, which performs semiparametric spectral estimation for regularly sampled univariate stationary time series. Our new approach circumvents the problematic Gaussianity assumption required by least-squares approaches and achieves sparsity for a wide variety of basis functions. We present an alternating direction method of multipliers (ADMM) algorithm to efficiently solve the optimization problem, and develop universal-threshold and generalized information criterion (GIC) strategies for efficient tuning-parameter selection that outperform cross-validation methods. Theoretically, a fast convergence rate for the proposed spectral estimator is established. We demonstrate the utility of our methodology on simulated series and on the spectral analysis of electroencephalogram (EEG) data.
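For orientation, this sketch shows the two building blocks in simplified form: a multitaper spectral estimate and an ISTA-style lasso solver. It is not the paper's estimator; the paper minimizes a penalized Whittle quasi-likelihood with ADMM, whereas here, purely for illustration, a plain lasso could be fit to the log multitaper estimate in some basis B.

```python
# Simplified building blocks, not the paper's ADMM/Whittle estimator.
import numpy as np
from scipy.signal.windows import dpss
from scipy.fft import rfft

def multitaper(x, NW=4.0, K=7):
    tapers = dpss(len(x), NW, K)                 # (K, N) Slepian tapers
    eig = np.abs(rfft(tapers * x, axis=1)) ** 2  # per-taper periodograms
    return eig.mean(axis=0)

def lasso_ista(B, y, lam, n_iter=200):
    # minimize 0.5 * ||y - B c||^2 + lam * ||c||_1 via soft-thresholded gradient steps
    L = np.linalg.norm(B, 2) ** 2                # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        c -= B.T @ (B @ c - y) / L
        c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)
    return c
```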
Spike Estimation from Fluorescence Signals Using High-Resolution Property of Group Delay
Spike estimation from calcium (Ca) fluorescence signals is a fundamental and challenging problem in neuroscience. Several models and algorithms have been proposed for this task over the past decade. Nevertheless, accurately recovering spike positions from Ca fluorescence signals remains hard. While existing methods rely on data-driven approaches and the physiology of neurons to model the spiking process, this work exploits the nature of the fluorescence responses to spikes using signal processing. We first motivate the problem by a novel analysis of the high-resolution property of minimum-phase group delay (GD) functions for multi-pole resonators, where the resonators may be connected either in series or in parallel. The Ca indicator responds to a spike with a sudden rise followed by an exponential decay. We interpret the Ca signal as the response of a resonator to an impulse train of spikes, where the Ca response to a single spike corresponds to the resonator's impulse response. We perform minimum-phase group delay-based filtering of the Ca signal to resolve spike locations. The performance of the proposed algorithm is evaluated on nine datasets spanning various indicators, sampling rates, and mouse brain regions. The proposed approach, GDspike, is compared with other spike estimation methods including MLspike, the Vogelstein deconvolution algorithm, and the data-driven Spike Triggered Mixture (STM) model. The performance of GDspike is superior to that of the Vogelstein algorithm and comparable to that of MLspike. GDspike can also be used to post-process the output of MLspike, which further enhances the performance.
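The high-resolution property the method rests on can be visualized directly: the group delay of a two-pole resonator peaks sharply at its resonant frequency as the poles approach the unit circle. A minimal sketch (parameter names ours):

```python
# Sketch: group delay of a two-pole resonator peaks sharply at f0,
# which is what lets GD-based filtering resolve closely spaced events.
import numpy as np
from scipy.signal import group_delay

def resonator_group_delay(f0, r=0.98, fs=100.0, n_freqs=512):
    # H(z) = 1 / (1 - 2 r cos(w0) z^-1 + r^2 z^-2)
    w0 = 2.0 * np.pi * f0 / fs
    a = [1.0, -2.0 * r * np.cos(w0), r * r]
    freqs, gd = group_delay(([1.0], a), w=n_freqs, fs=fs)
    return freqs, gd  # gd sharpens around f0 as r -> 1
```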
Symmetric Bilinear Regression for Signal Subgraph Estimation
There is increasing interest in learning a set of small outcome-relevant subgraphs in network-predictor regression. The extracted signal subgraphs can greatly improve the interpretation of the association between the network predictor and the response. In brain connectomics, the brain network for an individual corresponds to a set of interconnections among brain regions, and there is a strong interest in linking the brain connectome to human cognitive traits. Modern neuroimaging technology allows a very fine segmentation of the brain, producing very large structural brain networks. Therefore, accurate and efficient methods for identifying a set of small predictive subgraphs become crucial, leading to the discovery of key interconnected brain regions related to the trait and important insights into the mechanism of variation in human cognitive traits. We propose a symmetric bilinear model with a sparsity-inducing penalty to search for small clique subgraphs that contain useful information about the response. A coordinate descent algorithm is developed to estimate the model, in which we derive analytical solutions for a sequence of conditional convex optimizations. Application of this method to human connectome and language comprehension data reveals relevant interconnections among several small sets of brain regions and yields better predictive performance than competing methods.
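A stripped-down sketch of the model class (our notation, no penalty, rank one): fit y_i ≈ b' A_i b by gradient descent. The paper's estimator instead uses coordinate descent with a sparsity penalty so that the few nonzero entries of b delineate a clique subgraph.

```python
# Sketch: rank-1 symmetric bilinear regression fit by gradient descent.
import numpy as np

def rank1_bilinear_fit(A, y, n_iter=2000, lr=1e-3, seed=0):
    # A: (m, V, V) symmetric network predictors; y: (m,) responses
    m, V, _ = A.shape
    b = 0.01 * np.random.default_rng(seed).standard_normal(V)
    for _ in range(n_iter):
        resid = np.einsum('i,mij,j->m', b, A, b) - y
        grad = 4.0 * np.einsum('m,mij,j->i', resid, A, b) / m  # d/db of mean residual^2
        b -= lr * grad
    return b  # large |b_v| flags regions in the putative signal subgraph
```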
Rectified Gaussian Scale Mixtures and the Sparse Non-Negative Least Squares Problem
In this paper, we develop a Bayesian evidence maximization framework to solve the sparse non-negative least squares problem (S-NNLS). We introduce a family of probability densities referred to as the Rectified Gaussian Scale Mixture (R-GSM) to model the sparsity-enforcing prior distribution for the signal of interest. With a proper choice of the mixing density, the R-GSM prior encompasses a variety of heavy-tailed distributions, such as the rectified Laplacian and rectified Student-t distributions. We utilize the hierarchical representation induced by the R-GSM prior and develop an evidence maximization framework based on the Expectation-Maximization (EM) algorithm. Using the EM-based method, we estimate the hyper-parameters and obtain a point estimate for the solution of interest. We refer to this proposed method as rectified Sparse Bayesian Learning (R-SBL). We provide four EM-based R-SBL variants that offer a range of options to trade off computational complexity against the quality of the E-step computation. These methods include Markov chain Monte Carlo EM, linear minimum mean-square-error estimation, approximate message passing, and a diagonal approximation. Using numerical experiments, we show that the proposed R-SBL method outperforms existing S-NNLS solvers in terms of both signal and support recovery, and is very robust against the structure of the design matrix.
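For context, here is the unrectified Sparse Bayesian Learning backbone (EM updates of the hyper-parameters gamma); R-SBL replaces the Gaussian posterior moments in the E-step with rectified-Gaussian moments, which this sketch does not attempt.

```python
# Sketch: classic SBL evidence maximization (no rectification).
import numpy as np

def sbl(Phi, y, sigma2=1e-3, n_iter=100):
    m, n = Phi.shape
    gamma = np.ones(n)
    for _ in range(n_iter):
        G = np.diag(gamma)
        Sigma_y = sigma2 * np.eye(m) + Phi @ G @ Phi.T
        K = G @ Phi.T @ np.linalg.inv(Sigma_y)
        mu = K @ y                       # E-step: posterior mean
        Sigma = G - K @ Phi @ G          # E-step: posterior covariance
        gamma = mu**2 + np.diag(Sigma)   # M-step: hyper-parameter update
    return np.maximum(mu, 0.0)           # crude nonnegative point estimate
```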
Convex recovery of continuous domain piecewise constant images from nonuniform Fourier samples
We consider the recovery of a continuous domain piecewise constant image from its non-uniform Fourier samples using a convex matrix completion algorithm. We assume the discontinuities/edges of the image are localized to the zero level-set of a bandlimited function. This assumption induces linear dependencies between the Fourier coefficients of the image, which results in a two-fold block Toeplitz matrix constructed from the Fourier coefficients being low-rank. The proposed algorithm reformulates the recovery of the unknown Fourier coefficients as a structured low-rank matrix completion problem, where the nuclear norm of the matrix is minimized subject to structure and data constraints. We show that exact recovery is possible with high probability when the edge set of the image satisfies an incoherency property. We also show that the incoherency property is dependent on the geometry of the edge set curve, implying higher sampling burden for smaller curves. This paper generalizes recent work on the super-resolution recovery of isolated Diracs or signals with finite rate of innovation to the recovery of piecewise constant images.
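The computational core is generic nuclear-norm minimization; the following sketch shows singular value thresholding (SVT) for a plain matrix completion problem, without the two-fold block Toeplitz lifting and Fourier-domain constraints that the paper builds on top of it.

```python
# Sketch: singular value thresholding for nuclear-norm matrix completion.
import numpy as np

def svt_complete(M, mask, tau=5.0, n_iter=200, step=1.0):
    # M: observed matrix (zeros where unobserved); mask: boolean observed set
    Y = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        Y += step * mask * (M - X)                # push toward observed data
    return X
```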
Parametric Signal Estimation Using the Cumulative Distribution Transform
We present a new method for estimating signal model parameters using the Cumulative Distribution Transform (CDT). Our approach minimizes the Wasserstein distance between measured and model signals. We derive some useful properties of the CDT and show that the resulting estimation problem, while nonlinear in the original signal domain, becomes a linear least squares problem in the transform domain. Furthermore, we discuss the properties of the estimator in the presence of noise and present a novel approach for mitigating the impact of the noise on the estimates. The proposed estimation approach is evaluated by applying it to a source localization problem and comparing its performance against traditional approaches.
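A minimal discretization of the transform itself (our implementation choices): normalize the signal to a density, integrate to a CDF with the trapezoid rule, and evaluate the inverse CDF on a uniform grid.

```python
# Sketch: Cumulative Distribution Transform of a positive signal s on grid x.
import numpy as np

def cdt(s, x):
    inc = 0.5 * (s[1:] + s[:-1]) * np.diff(x)   # trapezoid-rule increments
    F = np.concatenate(([0.0], np.cumsum(inc)))
    F /= F[-1]                                   # normalize to a CDF
    u = np.linspace(0.0, 1.0, len(x))
    return np.interp(u, F, x)                    # F^{-1}(u)
```

In this transform domain, Wasserstein-type fitting of certain parametric families reduces to least squares, which is the property the estimator exploits.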
On Ambiguity in Linear Inverse Problems: Entrywise Bounds on Nearly Data-Consistent Solutions and Entrywise Condition Numbers
Ill-posed linear inverse problems appear frequently in various signal processing applications. It can be very useful to have theoretical characterizations that quantify the level of ill-posedness for a given inverse problem and the degree of ambiguity that may exist about its solution. Traditional measures of ill-posedness, such as the condition number of a matrix, provide characterizations that are global in nature. While such characterizations can be powerful, they can also fail to provide full insight into situations where certain entries of the solution vector are more or less ambiguous than others. In this work, we derive novel theoretical lower- and upper-bounds that apply to individual entries of the solution vector, and are valid for all potential solution vectors that are nearly data-consistent. These bounds are agnostic to the noise statistics and the specific method used to solve the inverse problem, and are also shown to be tight. Our results also lead us to introduce an entrywise version of the traditional condition number, which provides a substantially more nuanced characterization of scenarios where certain elements of the solution vector are less sensitive to perturbations than others. Our results are illustrated in an application to magnetic resonance imaging reconstruction, and we include discussions of practical computation methods for large-scale inverse problems, connections between our new theory and the traditional Cramér-Rao bound under statistical modeling assumptions, and potential extensions to cases involving constraints beyond just data-consistency.
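To illustrate the flavor of such entrywise bounds, here is a sketch for the special case where the system matrix has full column rank, so the nearly data-consistent set {x : ||Ax - b|| <= eps} projects onto each coordinate as a finite interval. The formula follows from standard pseudoinverse algebra and is our simplification, not the paper's general result.

```python
# Sketch: entrywise bounds over {x : ||Ax - b||_2 <= eps}, assuming
# A has full column rank (no null space, hence finite intervals).
import numpy as np

def entrywise_bounds(A, b, eps):
    Ap = np.linalg.pinv(A)
    x0 = Ap @ b                            # least-squares solution (interval centers)
    r = np.linalg.norm(A @ x0 - b)         # unavoidable residual
    t = np.sqrt(max(eps**2 - r**2, 0.0))   # remaining data-consistency budget
    half = np.linalg.norm(Ap, axis=1) * t  # per-entry sensitivity * budget
    return x0 - half, x0 + half
```

The per-entry factor, the norm of the corresponding row of pinv(A), acts as an entrywise sensitivity in the spirit of the paper's entrywise condition number: entries with small pseudoinverse rows are well determined even when the global condition number is poor.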
Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility
Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provides an accurate representation of the input data and yields a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, as a carefully designed decoder can be used as an interpretable generative model of the data, while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously unreported issue by developing a second-order supervision framework (SOS-VAE) that updates the decoder parameters, rather than the encoder, to induce a predictive latent representation. This ensures that the encoder maintains a reliable posterior approximation and the decoder parameters can be effectively interpreted. We extend this technique to allow the user to trade off the bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.
Super-resolution with Binary Priors: Theory and Algorithms
The problem of super-resolution is concerned with the reconstruction of temporally/spatially localized events (or spikes) from samples of their convolution with a low-pass filter. Distinct from prior works, which exploit sparsity in appropriate domains to solve the resulting ill-posed problem, this paper explores the role of binary priors in super-resolution, where the spike (or source) amplitudes are assumed to be binary-valued. Our study is inspired by the problem of neural spike deconvolution, but also applies to other applications such as symbol detection in hybrid millimeter wave communication systems. This paper makes several theoretical and algorithmic contributions to enable binary super-resolution with very few measurements. Our results show that binary constraints offer much stronger identifiability guarantees than sparsity, allowing us to operate in "extreme compression" regimes, where the number of measurements can be significantly smaller than the sparsity level of the spikes. To ensure exact recovery in this "extreme compression" regime, it becomes necessary to design algorithms that exactly enforce binary constraints without relaxation. In order to overcome the ensuing computational challenges, we consider a first-order autoregressive filter (which appears in neural spike deconvolution) and exploit its special structure. This results in a novel formulation of binary super-resolution spike recovery as a one-dimensional binary search. We perform numerical experiments that validate our theory and also show the benefits of binary constraints in neural spike deconvolution from real calcium imaging datasets.
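A toy illustration of why the AR(1) structure helps: with all noiseless samples available, the inverse filter (1, -a) undoes the first-order autoregressive response exactly, and rounding enforces the binary constraint. The paper's contribution is the far harder regime with many fewer measurements than spikes; this sketch only shows the structural starting point.

```python
# Toy sketch: exact binary spike recovery from *fully sampled* noiseless
# AR(1)-filtered data; not the paper's extreme-compression algorithm.
import numpy as np

def binary_ar1_deconv(y, a):
    s = y.astype(float).copy()
    s[1:] -= a * y[:-1]             # inverse filter (1, -a) undoes h[n] = a**n
    return (s > 0.5).astype(int)    # enforce binary amplitudes
```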
Projection Filtering with Observed State Increments with Applications in Continuous-Time Circular Filtering
Angular path integration is the ability of a system to estimate its own heading direction from potentially noisy angular velocity (or increment) observations. Non-probabilistic algorithms for angular path integration, which rely on a summation of these noisy increments, do not appropriately take into account the reliability of such observations, which is essential for appropriately weighing one's current heading direction estimate against incoming information. In a probabilistic setting, angular path integration can be formulated as a continuous-time nonlinear filtering problem (circular filtering) with observed state increments. The circular symmetry of heading direction makes this inference task inherently nonlinear, thereby precluding the use of popular inference algorithms such as Kalman filters, rendering the problem analytically inaccessible. Here, we derive an approximate solution to circular continuous-time filtering, which integrates state increment observations while maintaining a fixed representation through both state propagation and observational updates. Specifically, we extend the established projection-filtering method to account for observed state increments and apply this framework to the circular filtering problem. We further propose a generative model for continuous-time angular-valued direct observations of the hidden state, which we integrate seamlessly into the projection filter. Applying the resulting scheme to a model of probabilistic angular path integration, we derive an algorithm for circular filtering, which we term the circular Kalman filter. Importantly, this algorithm is analytically accessible, interpretable, and outperforms an alternative filter based on a Gaussian approximation.
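A compact sketch of the resulting filter's shape (our simplification, not the paper's exact update equations): the heading estimate is a mean direction phi with precision kappa; increment observations rotate the mean while diffusion deflates the precision, and direct angular observations are fused by adding precision-weighted unit vectors, the conjugate update rule for von Mises densities.

```python
# Sketch of a circular filter with (phi, kappa) state; simplified updates.
import numpy as np

def predict(phi, kappa, dphi_obs, dt, sigma2):
    phi = np.angle(np.exp(1j * (phi + dphi_obs)))  # apply observed increment, wrap
    kappa = 1.0 / (1.0 / kappa + sigma2 * dt)      # diffusion inflates uncertainty
    return phi, kappa

def update(phi, kappa, theta_obs, kappa_obs):
    z = kappa * np.exp(1j * phi) + kappa_obs * np.exp(1j * theta_obs)
    return np.angle(z), np.abs(z)                  # von Mises product rule
```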
Multi-target Detection with an Arbitrary Spacing Distribution
Motivated by the structure reconstruction problem in single-particle cryo-electron microscopy, we consider the multi-target detection model, in which multiple copies of a target signal occur at unknown locations in a long measurement, further corrupted by additive Gaussian noise. At low noise levels, one can easily detect the signal occurrences and estimate the signal by averaging. However, in the presence of high noise, which is the focus of this paper, detection is impossible. Here, we propose two approaches, autocorrelation analysis and an approximate expectation-maximization algorithm, to reconstruct the signal without the need to detect signal occurrences in the measurement. In particular, our methods apply to an arbitrary spacing distribution of signal occurrences. We demonstrate reconstructions with synthetic data and empirically show that the sample complexity of both methods scales as 1/SNR^3 in the low SNR regime.
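The statistics driving autocorrelation analysis are simple to compute; a sketch of the first two (names ours):

```python
# Sketch: mean and autocorrelations of the long measurement y; these
# aggregate statistics are matched to the signal model without ever
# locating individual occurrences.
import numpy as np

def measurement_statistics(y, max_lag):
    a1 = y.mean()
    a2 = np.array([np.mean(y[: len(y) - l] * y[l:]) for l in range(max_lag + 1)])
    return a1, a2  # a2[l] is the lag-l autocorrelation
```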
Graph-based Learning under Perturbations via Total Least-Squares
Graphs are pervasive in different fields, unveiling complex relationships between data. Two major graph-based learning tasks are topology identification and inference of signals over graphs. Among the possible models to explain data interdependencies, structural equation models (SEMs) accommodate a gamut of applications involving topology identification. Obtaining conventional SEMs, though, requires measurements across nodes. On the other hand, typical signal inference approaches 'blindly trust' a given nominal topology. In practice, however, signal or topology perturbations may be present in both tasks, due to model mismatch, outliers, outages, or adversarial behavior. To cope with such perturbations, this work introduces a regularized total least-squares (TLS) approach and iterative algorithms with convergence guarantees to solve both tasks. Further generalizations are also considered, relying on structured and/or weighted TLS when extra prior information on the perturbation is available. Analyses with simulated and real data corroborate the effectiveness of the novel TLS-based approaches.
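For reference, the unstructured TLS building block has a closed form via the SVD; the paper's contribution is regularizing and structuring this estimator for the two graph tasks, which the sketch below does not include.

```python
# Sketch: classic total least-squares solution of A x ~ b via the SVD.
import numpy as np

def tls(A, b):
    Z = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(Z)
    v = Vt[-1]                 # right singular vector of the smallest singular value
    return -v[:-1] / v[-1]     # x such that [x; -1] spans the null direction
```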
Exact and Robust Reconstructions of Integer Vectors Based on Multidimensional Chinese Remainder Theorem (MD-CRT)
The robust Chinese remainder theorem (CRT) has recently been proposed for robustly reconstructing a large nonnegative integer from erroneous remainders. It has found many applications in signal processing, including phase unwrapping and frequency estimation under sub-Nyquist sampling. Motivated by applications in multidimensional (MD) signal processing, in this paper we propose the MD-CRT and robust MD-CRT for integer vectors with respect to a general set of integer matrix moduli. The MD-CRT provides an algorithm to uniquely reconstruct an integer vector from its remainders, provided that the vector lies in the fundamental parallelepiped of the lattice generated by a least common right multiple of all the moduli. For some special forms of moduli, we present explicit reconstruction formulae. Moreover, we derive the robust MD-CRT for integer vectors when the remaining integer matrices of all the moduli, left divided by their greatest common left divisor (gcld), are pairwise commutative and coprime. Two different reconstruction algorithms are proposed, and accordingly, two different conditions on the remainder error bound for reconstruction robustness are obtained; these are related to a quarter of the minimum distance of the lattice generated by the gcld of all the moduli, or to the Smith normal form of the gcld.
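The one-dimensional special case conveys the reconstruction principle; below is the textbook CRT for pairwise-coprime scalar moduli, which the paper generalizes to integer matrix moduli with robustness to remainder errors.

```python
# Sketch: classic 1-D CRT reconstruction (pairwise-coprime moduli).
from math import prod

def crt(remainders, moduli):
    M = prod(moduli)
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is Mi^{-1} mod m
    return x % M

# e.g., crt([2, 3, 2], [3, 5, 7]) == 23
```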
Ellipsoid fitting with the Cayley transform
We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always returns elliptic solutions, and it can fit arbitrary ellipsoids. It also significantly outperforms other fitting methods when data are not uniformly distributed over the surface of an ellipsoid. Inspired by growing calls for interpretable and reproducible methods in machine learning, we apply CTEF to dimension reduction, data visualization, and clustering in the context of cell cycle and circadian rhythm data, as well as several classical toy examples. Since CTEF captures global curvature, it extracts nonlinear features that other machine learning methods fail to identify. In the clustering examples, for instance, CTEF outperforms 10 popular algorithms.
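The role of the Cayley transform is to give an unconstrained parametrization of the ellipsoid's orientation: any skew-symmetric matrix S maps to a rotation. A small sketch of that ingredient, with illustrative names:

```python
# Sketch: Cayley transform of a skew-symmetric matrix and an ellipsoid
# parametrized by rotation R, axis lengths, and center.
import numpy as np

def cayley(S):
    I = np.eye(S.shape[0])
    return (I - S) @ np.linalg.inv(I + S)       # orthogonal whenever S = -S.T

def ellipsoid_points(S, radii, center, sphere_pts):
    R = cayley(S)
    return (sphere_pts * radii) @ R.T + center  # map unit-sphere points outward
```

Fitting then amounts to optimizing over the free entries of S, the radii, and the center, which is what makes the method ellipsoid specific by construction.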
A Correlated Noise-Assisted Decentralized Differentially Private Estimation Protocol and Its Application to fMRI Source Separation
Blind source separation algorithms such as independent component analysis (ICA) are widely used in the analysis of neuroimaging data. To leverage larger sample sizes, different data holders/sites may wish to collaboratively learn feature representations. However, such datasets are often privacy-sensitive, precluding centralized analyses that pool the data at one site. In this work, we propose a differentially private algorithm for performing ICA in a decentralized data setting. Due to the high dimension and small sample size, conventional approaches to decentralized differentially private algorithms suffer in terms of utility. When centralizing the data is not possible, we investigate the benefit of enabling limited collaboration in the form of generating jointly distributed random noise. We show that such (anti) correlated noise improves the privacy-utility trade-off, and can reach the same level of utility as the corresponding non-private algorithm for certain parameter choices. We validate this benefit using synthetic and real neuroimaging datasets. We conclude that it is possible to achieve meaningful utility while preserving privacy, even in complex signal processing systems.
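The cancellation idea behind the correlated noise can be sketched in a few lines (a toy construction, not the paper's full protocol, which also calibrates independent local noise for formal differential privacy guarantees):

```python
# Toy sketch: jointly generated zero-sum noise. Each site's share looks
# random locally, but the shares cancel exactly when outputs are aggregated.
import numpy as np

def zero_sum_noise(n_sites, d, sigma, seed=0):
    e = np.random.default_rng(seed).normal(0.0, sigma, size=(n_sites, d))
    return e - e.mean(axis=0)   # rows are anti-correlated and sum to zero
```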
Heterogeneity Aware Two-Stage Group Testing
Group testing refers to the process of testing pooled samples to reduce the total number of tests. Given the current pandemic and the shortage of test supplies for COVID-19, group testing can play a critical role in time- and cost-efficient diagnostics. In many scenarios, samples collected from users are accompanied by auxiliary information (such as demographics, history of exposure, and onset of symptoms). Such auxiliary information may differ across patients and is typically not considered when designing group testing algorithms. In this paper, we abstract such heterogeneity using a model where the population can be categorized into clusters with different prevalence rates. The main result of this work is to show that exploiting knowledge of heterogeneity can further improve the efficiency of group testing. Motivated by practical constraints and diagnostic considerations, we focus on two-stage group testing algorithms, where the goal of the first stage is to rule out as many negative samples as possible by pooling, and the second stage individually tests the remaining samples. For this class of algorithms, we prove that the gain in efficiency is related to the concavity of the number of tests as a function of the prevalence. We also show how one can choose the optimal pooling parameters for one of the algorithms in this class, namely doubly constant pooling. We present lower bounds on the average number of tests as a function of the population heterogeneity profile, and also provide numerical results and comparisons.
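The concavity phenomenon is easy to see in the simplest member of this class, two-stage Dorfman testing; a sketch of the expected tests per individual (standard formula, our function names):

```python
# Sketch: expected tests per person for two-stage (Dorfman) pooling with
# pool size n and prevalence p; concavity in p drives the gains from
# exploiting heterogeneity across clusters.
def dorfman_tests_per_person(p, n):
    return 1.0 / n + (1.0 - (1.0 - p) ** n)  # pooled test + retests if positive

def best_pool_size(p, n_max=64):
    return min(range(2, n_max + 1), key=lambda n: dorfman_tests_per_person(p, n))
```

For instance, a cluster with prevalence 0.01 admits much larger pools, and far fewer tests per person, than a cluster with prevalence 0.2, which is exactly the heterogeneity the paper exploits.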
Extreme Compressed Sensing of Poisson Rates from Multiple Measurements
Compressed sensing (CS) is a signal processing technique that enables the efficient recovery of a sparse high-dimensional signal from low-dimensional measurements. In the multiple measurement vector (MMV) framework, a set of signals with the same support must be recovered from their corresponding measurements. Here, we present the first exploration of the MMV problem where signals are independently drawn from a sparse, multivariate Poisson distribution. We are primarily motivated by a suite of biosensing applications of microfluidics where analytes (such as whole cells or biomarkers) are captured in small volume partitions according to a Poisson distribution. We recover the sparse parameter vector of Poisson rates through maximum likelihood estimation with our novel Sparse Poisson Recovery (SPoRe) algorithm. SPoRe uses batch stochastic gradient ascent enabled by Monte Carlo approximations of otherwise intractable gradients. By uniquely leveraging the Poisson structure, SPoRe substantially outperforms a comprehensive set of existing and custom baseline CS algorithms. Notably, SPoRe can exhibit high performance even with one-dimensional measurements and high noise levels. This resource efficiency is not only unprecedented in the field of CS but is also particularly potent for applications in microfluidics in which the number of resolvable measurements per partition is often severely limited. We prove the identifiability property of the Poisson model under such lax conditions, analytically develop insights into system performance, and confirm these insights in simulated experiments. Our findings encourage a new approach to biosensing and are generalizable to other applications featuring spatial and temporal Poisson signals.
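A sketch of the generative model under study (names ours; SPoRe's Monte Carlo gradient ascent itself is more involved):

```python
# Sketch: the Poisson MMV model. Each signal x_j ~ Poisson(lam) with a
# sparse rate vector lam; measurements are y_j = Phi x_j + Gaussian noise.
import numpy as np

def simulate_poisson_mmv(Phi, lam, n_signals, noise_std, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.poisson(lam, size=(n_signals, len(lam)))
    Y = X @ Phi.T + rng.normal(0.0, noise_std, size=(n_signals, Phi.shape[0]))
    return X, Y  # SPoRe estimates lam from Y alone via MC maximum likelihood
```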
Multitaper Analysis of Semi-Stationary Spectra from Multivariate Neuronal Spiking Observations
Extracting the spectral representations of neural processes that underlie spiking activity is key to understanding how brain rhythms mediate cognitive functions. While spectral estimation of continuous time-series is well studied, inferring the spectral representation of latent non-stationary processes based on spiking observations is challenging due to the underlying nonlinearities that limit the spectrotemporal resolution of existing methods. In this paper, we address this issue by developing a multitaper spectral estimation methodology that can be directly applied to multivariate spiking observations in order to extract the semi-stationary spectral density of the latent non-stationary processes that govern spiking activity. We establish theoretical bounds on the bias-variance trade-off of our proposed estimator. Finally, application of our proposed technique to simulated and real data reveals significant performance gains over existing methods.
A tensor-based varying-coefficient model for multi-modal neuroimaging data analysis
All neuroimaging modalities have their own strengths and limitations, and a current trend is toward interdisciplinary approaches that use multiple imaging methods to overcome the limitations of each method in isolation. At the same time, neuroimaging data are increasingly being combined with other non-imaging modalities, such as behavioral and genetic data. The data structure of many of these modalities can be expressed as time-varying multidimensional arrays (tensors), collected at different time points on multiple subjects. Here, we consider a new approach for the study of neural correlates in the presence of tensor-valued brain images and tensor-valued covariates, where both data types are collected over the same set of time points. We propose a time-varying tensor regression model with an inherent structural composition of responses and covariates: regression coefficients are expressed using the B-spline technique, and the basis function coefficients are estimated via CP decomposition by minimizing a penalized loss function. The result is a varying-coefficient model for tensor-valued regression in which both covariates and responses are modeled as tensors. This development is a non-trivial extension of function-on-function concurrent linear models to complex and large structured data, where the inherent structures are preserved. In addition to the methodological and theoretical development, we demonstrate the efficacy of the proposed method on both simulated and real data (e.g., the combination of eye-tracking data and functional magnetic resonance imaging (fMRI) data).
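A sketch of the two main ingredients in simplified form (function names and discretization ours): a B-spline basis matrix over time, and reconstruction of a coefficient tensor from CP factors.

```python
# Sketch: B-spline basis evaluation and rank-R CP tensor reconstruction.
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, knots, degree):
    n_basis = len(knots) - degree - 1
    cols = [BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)(t)
            for i in range(n_basis)]
    return np.nan_to_num(np.column_stack(cols))  # zero outside each local support

def cp_reconstruct(factors):
    # factors: list of (dim_k, R) matrices; returns the sum of R outer products
    out = factors[0]
    for F in factors[1:]:
        out = np.einsum('...r,jr->...jr', out, F)
    return out.sum(axis=-1)
```

In the model, the time-varying coefficient tensor is expanded in the B-spline basis, and the basis coefficients are constrained to low CP rank, which keeps the number of parameters manageable for large tensors.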