Efficient prediction designs for random fields
For the estimation and prediction of random fields, it is increasingly acknowledged that the kriging variance may be a poor representation of the true uncertainty. Experimental designs based on more elaborate criteria that are appropriate for empirical kriging (EK) are then often non-space-filling and very costly to determine. In this paper, we investigate the possibility of using a compound criterion, inspired by an equivalence-theorem-type relation, to build quasi-optimal designs for the EK variance when space-filling designs become unsuitable. Two algorithms are proposed: the first relies on stochastic optimization to explicitly identify the Pareto front, whereas the second uses the surrogate criterion as a local heuristic to choose the points at which the (costly) true EK variance is effectively computed. We illustrate the performance of the proposed algorithms on both a simple simulated example and a real oceanographic dataset.
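To fix ideas, the sketch below computes the plug-in (simple-kriging) variance that EK-based criteria refine, and uses it in a greedy point-selection heuristic; the Gaussian covariance, parameter values, and greedy rule are illustrative assumptions, not the compound criterion or either of the two algorithms of the paper.

```python
import numpy as np

def gauss_cov(X1, X2, sigma2=1.0, theta=0.3):
    """Gaussian (squared-exponential) covariance between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return sigma2 * np.exp(-d2 / theta ** 2)

def kriging_variance(design, candidates, sigma2=1.0, theta=0.3, nugget=1e-8):
    """Simple-kriging (plug-in) variance at candidate sites for a given design."""
    K = gauss_cov(design, design, sigma2, theta) + nugget * np.eye(len(design))
    k = gauss_cov(design, candidates, sigma2, theta)              # n x m
    return sigma2 - np.einsum('ij,ij->j', k, np.linalg.solve(K, k))

# Greedy construction: repeatedly add the candidate with the largest current
# plug-in variance (a space-filling-like heuristic, shown only to illustrate
# how a design criterion drives point selection).
rng = np.random.default_rng(0)
cand = rng.uniform(size=(200, 2))
design = cand[:1].copy()
for _ in range(9):
    v = kriging_variance(design, cand)
    design = np.vstack([design, cand[np.argmax(v)]])
print(design.shape, kriging_variance(design, cand).max())
```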
Clinical Trial Design as a Decision Problem
The intent of this discussion is to highlight opportunities and limitations of utility-based and decision theoretic arguments in clinical trial design. The discussion is based on a specific case study, but the arguments and principles remain valid in general. The example concerns the design of a randomized clinical trial to compare a gel sealant versus standard care for resolving air leaks after pulmonary resection. The design follows a principled approach to optimal decision making, including a probability model for the unknown distributions of time to resolution of air leaks under the two treatment arms, and an explicit utility function that quantifies clinical preferences for alternative outcomes. As is typical for any real application, the final implementation includes some compromises from the initial principled setup. In particular, we use the formal decision problem only for the final decision, but use reasonable decision boundaries for making interim group sequential decisions that stop the trial early. Beyond the discussion of the particular study, we review more general considerations of using a decision theoretic approach for clinical trial design and summarize some of the reasons why such approaches are not commonly used.
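As a hedged illustration of the final decision step described above, the following sketch picks the arm maximizing posterior expected utility; the posterior draws, the utility function, and the cost term are hypothetical placeholders, not the study's actual probability model or clinical utility.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws of the mean time (days) to air-leak resolution
# under each arm; in the real study these would come from the fitted model.
post_mean_time = {"sealant": rng.gamma(9.0, 0.5, 5000),
                  "standard": rng.gamma(12.0, 0.5, 5000)}

def utility(mean_time, cost=0.0):
    """Illustrative utility: shorter resolution time is better, minus a cost term."""
    return -mean_time - cost

# Final decision: choose the arm with the largest posterior expected utility.
expected_u = {arm: utility(draws, cost=0.2 if arm == "sealant" else 0.0).mean()
              for arm, draws in post_mean_time.items()}
print(max(expected_u, key=expected_u.get), expected_u)
```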
Maximum likelihood estimation for stochastic volatility in mean models with heavy-tailed distributions
In this article, we introduce a likelihood-based estimation method for the stochastic volatility in mean (SVM) model with scale mixtures of normal (SMN) distributions (Abanto-Valle et al., 2012). Our estimation method is based on the fact that the powerful hidden Markov model (HMM) machinery can be applied in order to evaluate an arbitrarily accurate approximation of the likelihood of an SVM model with SMN distributions. The method is based on the proposal of Langrock et al. (2012) and makes explicit the useful link between HMMs and SVM models with SMN distributions. Likelihood-based estimation of the parameters of stochastic volatility models in general, and SVM models with SMN distributions in particular, is usually regarded as challenging as the likelihood is a high-dimensional multiple integral. However, the HMM approximation, which is very easy to implement, makes numerical maximization of the likelihood feasible and leads to simple formulae for forecast distributions, for computing appropriately defined residuals, and for decoding, i.e., estimating the volatility of the process.
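The sketch below illustrates the HMM-type likelihood approximation for a basic stochastic volatility model with Gaussian errors, omitting the in-mean term and the scale-mixture extension for brevity; the grid bounds, number of states, and parameter transformation are illustrative choices rather than those of the paper.

```python
import numpy as np
from scipy.stats import norm

def hmm_sv_loglik(y, mu, phi, sigma, m=100, k=4.0):
    """Approximate log-likelihood of a basic SV model via HMM discretization.

    Latent log-volatility: h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t
    Observation:           y_t | h_t ~ N(0, exp(h_t))
    The essential range of h is cut into m intervals ("HMM states").
    """
    sd_h = sigma / np.sqrt(1.0 - phi ** 2)                 # stationary sd of h
    edges = np.linspace(mu - k * sd_h, mu + k * sd_h, m + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])                  # state midpoints

    # Transition probabilities from midpoint i into interval j.
    cond_mean = mu + phi * (mids - mu)
    Gamma = (norm.cdf((edges[None, 1:] - cond_mean[:, None]) / sigma)
             - norm.cdf((edges[None, :-1] - cond_mean[:, None]) / sigma))
    Gamma /= Gamma.sum(axis=1, keepdims=True)

    delta = norm.pdf(mids, mu, sd_h)                       # approx. stationary dist.
    delta /= delta.sum()

    # Scaled forward algorithm for the log-likelihood.
    alpha = delta * norm.pdf(y[0], 0.0, np.exp(mids / 2.0))
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c
    for yt in y[1:]:
        alpha = (alpha @ Gamma) * norm.pdf(yt, 0.0, np.exp(mids / 2.0))
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik

# Maximum likelihood estimation would feed -hmm_sv_loglik(y, ...) into a
# numerical optimizer (e.g. scipy.optimize.minimize) over transformed parameters.
```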
Inferring social structure from continuous-time interaction data
Relational event data, which consist of events involving pairs of actors over time, are now commonly available at the finest of temporal resolutions. Existing continuous-time methods for modeling such data are based on point processes and directly model interaction "contagion," whereby one interaction increases the propensity of future interactions among actors, often as dictated by some latent variable structure. In this article, we present an alternative approach to using temporal-relational point process models for continuous-time event data. We characterize interactions between a pair of actors as either spurious or as resulting from an underlying, persistent connection in a latent social network. We argue that consistent deviations from expected behavior, rather than solely high frequency counts, are crucial for identifying well-established underlying social relationships. This study aims to explore these latent network structures in two contexts: one comprising college students and another involving barn swallows.
Integrative Interaction Analysis using Threshold Gradient Directed Regularization
For many complex business and industry problems, high-dimensional data collection and modeling have been conducted. It has been shown that interactions may have important implications beyond main effects. The number of unknown parameters in an interaction analysis can be larger or much larger than the sample size. As such, results generated from analyzing a single dataset are often unsatisfactory. Integrative analysis, which jointly analyzes the raw data from multiple independent studies, has been conducted in a series of recent studies and shown to outperform single-dataset analysis, meta-analysis, and other multi-dataset analyses. In this study, our goal is to conduct integrative interaction analysis. For regularized estimation and selection of important interactions (and main effects), we apply a Threshold Gradient Directed Regularization (TGDR) approach. Advancing from the existing studies, the TGDR approach is modified to respect the "main effects, interactions" hierarchy. The proposed approach has an intuitive formulation and is computationally simple and broadly applicable. Simulations and the analyses of financial early warning system data and news-APP recommendation behavior data demonstrate its satisfactory practical performance.
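For orientation, here is a minimal sketch of the basic TGDR update for linear regression (gradient thresholding followed by a small step on the selected coordinates); it does not include the hierarchy-respecting modification or the multi-dataset integration developed in the paper.

```python
import numpy as np

def tgdr(X, y, tau=0.9, nu=0.01, n_steps=500):
    """Basic threshold gradient directed regularization for linear regression.

    At each step only coordinates whose gradient magnitude is within a factor
    tau of the largest one are updated (tau near 0 gives a ridge-like path,
    tau near 1 a lasso-like, nearly coordinate-wise path).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        g = X.T @ (y - X @ beta) / n          # negative gradient of squared loss
        active = np.abs(g) >= tau * np.abs(g).max()   # thresholding step
        beta[active] += nu * g[active]        # small move along selected coords
    return beta
```

To respect the "main effects, interactions" hierarchy, as the paper's modification does, one would additionally mask interaction coordinates whose parent main effects are still zero before each update; that refinement is not shown in this sketch.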
Mode hunting through active information
We propose a new method to find modes based on active information. We develop an algorithm called active information mode hunting (AIMH) that, when applied to the whole space, will say whether there are any modes present and where they are. We show AIMH is consistent and, given that information increases where probability decreases, it helps to overcome issues with the curse of dimensionality. The AIMH also reduces the dimensionality without recourse to principal components. We illustrate the method in three ways: with a theoretical example (showing how it performs better than other mode hunting strategies), an application to a real business dataset, and a simulation.
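As a rough illustration of the underlying quantity, the sketch below computes active information as the log ratio of observed to baseline cell probabilities and flags cells with large positive values as candidate modes; the binning, uniform baseline, and threshold are assumptions for illustration and do not reproduce the AIMH algorithm itself.

```python
import numpy as np

def active_information(counts, baseline_probs):
    """Active information per cell: log of observed vs. baseline probability."""
    q = counts / counts.sum()
    with np.errstate(divide='ignore'):
        return np.where(q > 0, np.log(q / baseline_probs), -np.inf)

# Illustrative 1-D use: bin a sample, compare to a uniform baseline, and flag
# bins whose active information exceeds a threshold as candidate modes.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 0.3, 400), rng.normal(3, 0.5, 600)])
counts, _ = np.histogram(x, bins=np.linspace(x.min(), x.max(), 21))
p0 = np.full(len(counts), 1.0 / len(counts))      # uniform baseline
ai = active_information(counts, p0)
print(np.flatnonzero(ai > np.log(2)))             # bins with >2x the baseline mass
```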
Weak signals in high-dimensional regression: detection, estimation and prediction
Regularization methods, including Lasso, group Lasso and SCAD, typically focus on selecting variables with strong effects while ignoring weak signals. This may result in biased prediction, especially when weak signals outnumber strong signals. This paper aims to incorporate weak signals in variable selection, estimation and prediction. We propose a two-stage procedure, consisting of variable selection and post-selection estimation. The variable selection stage involves a covariance-insured screening for detecting weak signals, while the post-selection estimation stage involves a shrinkage estimator for jointly estimating strong and weak signals selected from the first stage. We term the proposed method the covariance-insured screening-based post-selection shrinkage estimator. We establish asymptotic properties for the proposed method and show, via simulations, that incorporating weak signals can improve estimation and prediction performance. We apply the proposed method to predict the annual gross domestic product (GDP) rates based on various socioeconomic indicators for 82 countries.
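A schematic of the two-stage idea, under assumed stand-ins: a Lasso step for strong signals, a simple correlation screen to recover weak signals linked to them, and a ridge refit as the shrinkage step. The actual covariance-insured screening and post-selection shrinkage estimator differ in detail; all tuning values below are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def two_stage_fit(X, y, lasso_alpha=0.1, corr_cut=0.3, ridge_alpha=1.0):
    """Schematic two-stage fit: (1) Lasso keeps strong signals and a correlation
    screen adds weak signals linked to them; (2) a shrinkage (ridge) refit
    jointly re-estimates the union of selected coefficients."""
    strong = np.flatnonzero(Lasso(alpha=lasso_alpha).fit(X, y).coef_)
    if strong.size == 0:                      # nothing selected: return null fit
        return np.zeros(X.shape[1]), strong, strong
    corr = np.corrcoef(X, rowvar=False)
    weak = np.flatnonzero((np.abs(corr[:, strong]) > corr_cut).any(axis=1))
    keep = np.union1d(strong, weak)
    refit = Ridge(alpha=ridge_alpha).fit(X[:, keep], y)
    beta = np.zeros(X.shape[1])
    beta[keep] = refit.coef_
    return beta, strong, keep
```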
Copula-based robust optimal block designs
Blocking is often used to reduce known variability in designed experiments by collecting together homogeneous experimental units. A common modeling assumption for such experiments is that responses from units within a block are dependent. Accounting for such dependencies in both the design of the experiment and the modeling of the resulting data when the response is not normally distributed can be challenging, particularly in terms of the computation required to find an optimal design. The application of copulas and marginal modeling provides a computationally efficient approach for estimating population-average treatment effects. Motivated by an experiment from materials testing, we develop and demonstrate designs with blocks of size two using copula models. Such designs are also important in applications ranging from microarray experiments to experiments on human eyes or limbs with naturally occurring blocks of size two. We present a methodology for design selection, make comparisons to existing approaches in the literature, and assess the robustness of the designs to modeling assumptions.
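To show the copula/marginal building block on which such design criteria rest, the sketch below evaluates the joint density of the two responses in a block of size two under a Gaussian copula with exponential margins; the margins, copula parameter, and example values are assumptions for illustration, and the design-selection step itself (optimizing an information-based criterion over treatment allocations) is not shown.

```python
import numpy as np
from scipy.stats import norm, expon, multivariate_normal

def block_joint_density(y1, y2, rho,
                        margin1=expon(scale=1.0), margin2=expon(scale=1.0)):
    """Joint density of two within-block responses under a Gaussian copula.

    f(y1, y2) = c(F1(y1), F2(y2); rho) * f1(y1) * f2(y2), where c is the
    Gaussian copula density with correlation rho.
    """
    z = norm.ppf([margin1.cdf(y1), margin2.cdf(y2)])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    copula_dens = multivariate_normal(mean=[0, 0], cov=cov).pdf(z) / np.prod(norm.pdf(z))
    return copula_dens * margin1.pdf(y1) * margin2.pdf(y2)

# Example: within-block dependence (rho = 0.5) versus independence (rho = 0).
print(block_joint_density(0.8, 1.2, rho=0.5), block_joint_density(0.8, 1.2, rho=0.0))
```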