A semiparametric multiply robust multiple imputation method for causal inference
Evaluating the impact of a non-randomized treatment on health outcomes is difficult in observational studies because covariates may affect both the treatment or exposure received and the outcome of interest. In the present study, we develop a semiparametric multiply robust multiple imputation method for estimating average treatment effects in such studies. Our method combines information from multiple propensity score models and outcome regression models, and is multiply robust in that it produces consistent estimators of the average causal effects if at least one of the models is correctly specified. Our proposed estimators show promising performance even with incorrect models. Compared with existing fully parametric approaches, our proposed method is more robust against model misspecification. Compared with fully non-parametric approaches, our proposed method avoids the curse of dimensionality and achieves dimension reduction by combining information from multiple models. In addition, it is less sensitive to extreme propensity score estimates than inverse propensity score weighted estimators and augmented estimators. The asymptotic properties of our method are developed, and the simulation study shows the advantages of our proposed method over some existing methods in terms of balancing efficiency, bias, and coverage probability. Rubin's variance estimation formula can be used for estimating the variance of our proposed estimators. Finally, we apply our method to the 2009-2010 National Health and Nutrition Examination Survey (NHANES) to examine the effect of exposure to perfluoroalkyl acids (PFAs) on kidney function.
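The abstract notes that Rubin's variance estimation formula applies to the proposed estimators. As a minimal sketch of that standard pooling rule only (not the authors' imputation procedure), the following assumes M completed-data estimates and their within-imputation variances are already available; the function name `rubin_pool` and the input values are hypothetical.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool M multiply imputed estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set.
    within_variances: the variance estimate attached to each point estimate.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (M - 1 denominator)
    total = w_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total


# Hypothetical inputs, for illustration only.
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The factor (1 + 1/M) inflates the between-imputation variance to account for using a finite number of imputations.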
Real-time changepoint detection in a nonlinear expectile model
An online changepoint detection procedure based on conditional expectiles is introduced. The key contribution is threefold: nonlinearity of the underlying model improves the overall flexibility, while a parametric form of the unknown regression function preserves a simple and straightforward interpretation; the conditional expectiles, well known in econometrics for being the only coherent and elicitable risk measure, introduce additional robustness, especially with respect to asymmetric error distributions common in various types of data; and the proposed statistical test is proved to be consistent, with a distribution under the null hypothesis that depends on neither the functional form of the underlying model nor the unknown parameters. Empirical properties of the proposed real-time changepoint detection test are investigated in a simulation study, and its practical applicability is illustrated using Covid-19 prevalence data from Prague.
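To make the notion of an expectile concrete, here is a minimal sketch of an unconditional sample expectile, computed as the minimizer of an asymmetrically weighted squared loss. It illustrates only the building block, not the nonlinear regression model or the detection test of the paper; the function name `sample_expectile` and the data are hypothetical.

```python
def sample_expectile(y, tau, tol=1e-10, max_iter=1000):
    """Sample tau-expectile via iteratively reweighted means.

    Observations above the current value get weight tau and the others
    get weight 1 - tau; the fixed point of the weighted mean minimizes
    the asymmetric squared loss.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(y) / len(y)  # start from the mean (the 0.5-expectile)
    for _ in range(max_iter):
        w = [tau if yi > e else 1 - tau for yi in y]
        new_e = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


data = [1.0, 2.0, 3.0, 10.0]            # hypothetical, right-skewed sample
centre = sample_expectile(data, 0.5)    # coincides with the sample mean
upper = sample_expectile(data, 0.9)     # pulled toward the large observation
```

For tau = 0.5 the weights are equal and the expectile reduces to the mean, which is why expectiles are often described as an asymmetric generalization of it.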
Most Powerful Test Sequences with Early Stopping Options
We extend the application of uniformly most powerful tests to sequential tests with different stage-specific sample sizes and critical regions. In the one-parameter exponential family, likelihood ratio sequential tests are shown to be uniformly most powerful for any predetermined α-spending function and stage-specific sample sizes. To obtain this result, the probability measure of a group sequential design is constructed with support for all possible outcome events, as is useful for designing an experiment prior to having data. This construction identifies impossible events that are not part of the support. The overall probability distribution is dissected into components determined by the stopping stage. These components are the sub-densities of interim test statistics first described by Armitage, McPherson and Rowe (1969) that are commonly used to create stopping boundaries given an α-spending function and a set of interim analysis times. Likelihood expressions conditional on reaching a stage are given to connect pieces of the probability anatomy together. The reduction of support caused by the adoption of an early stopping rule induces sequential truncation (not nesting) in the probability distributions of possible events. Multiple testing induces mixtures on the adapted support. Even the asymptotic distributions of inferential statistics that are typically normal are not normal here; rather, they are derived from mixtures of truncated multivariate normal distributions.
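For readers unfamiliar with spending functions, the sketch below shows two widely used Lan-DeMets forms and the amount of Type I error they allot to each stage. These particular functions are an assumption chosen for illustration; the abstract's result is stated for any predetermined spending function, and computing the stopping boundaries themselves is not attempted here. All function names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def obf_type_spending(t, alpha=0.05):
    """O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t))."""
    if not 0 < t <= 1:
        raise ValueError("information fraction t must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / sqrt(t)))


def pocock_type_spending(t, alpha=0.05):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction t must be in (0, 1]")
    return alpha * log(1 + (exp(1) - 1) * t)


def stagewise_alpha(info_fractions, spend, alpha=0.05):
    """Type I error newly spent at each interim analysis."""
    cumulative = [spend(t, alpha) for t in info_fractions]
    return [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]


fractions = [0.25, 0.5, 0.75, 1.0]  # hypothetical interim analysis times
obf_steps = stagewise_alpha(fractions, obf_type_spending)
pocock_steps = stagewise_alpha(fractions, pocock_type_spending)
```

Both functions spend the full alpha at information fraction 1; the O'Brien-Fleming-type form spends very little early, while the Pocock-type form spends more evenly across stages.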
Classes of Multiple Decision Functions Strongly Controlling FWER and FDR
Two general classes of multiple decision functions, where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR), are described. These classes offer the possibility that optimal multiple decision functions with respect to a pre-specified Type II error criterion, such as the missed discovery rate (MDR), could be found which control the FWER or FDR Type I error rates. The gain in MDR of the associated FDR-controlling procedure relative to the well-known Benjamini-Hochberg (BH) procedure is demonstrated via a modest simulation study with gamma-distributed component data. Such multiple decision functions may have the potential of being utilized in multiple testing, specifically in the analysis of high-dimensional data sets.
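The Benjamini-Hochberg (BH) procedure serves as the comparison baseline above. A minimal sketch of the standard BH step-up rule follows; it is not the class of decision functions proposed in the paper, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where BH rejects at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose ordered p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:  # step-up: reject every hypothesis ranked at or below k
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
decisions = benjamini_hochberg([0.01, 0.03, 0.035, 0.5], q=0.05)
```

In this example the second p-value exceeds its own threshold but is still rejected, because a larger ordered p-value meets its threshold; that is the step-up behaviour.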
Optimal Hypothesis Testing: From Semi to Fully Bayes Factors
We propose and examine statistical test strategies that lie between the maximum likelihood ratio and Bayes factor methods, both of which are well addressed in the literature. The paper establishes an optimality property of the proposed hypothesis tests. We demonstrate that our approach can be easily applied to practical studies, because executing the tests does not require deriving asymptotic analytical solutions for the Type I error. Nevertheless, when the proposed method is used, the classical significance level of the tests can be controlled.
Quadratic semiparametric Von Mises calculus
We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on U-statistics constructed from quadratic influence functions. The latter extend ordinary linear influence functions of the parameter of interest as defined in semiparametric theory, and represent second order derivatives of this parameter. For parameters for which the matching cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than n^{-1/2}-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at n^{-1/2}-rate.
Detecting multiple generalized change-points by isolating single ones
We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID's accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods; most of the time it outperforms them. ID is implemented in the R packages and , available from CRAN.
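For the piecewise-constant mean case, detection within a single interval is commonly based on a CUSUM contrast. The sketch below shows that contrast and the best split inside one interval; it is an assumption-laden illustration, not the ID algorithm itself, since the isolation step (how intervals are grown so that each contains at most one change-point) and the threshold choice are omitted. The function names and data are hypothetical.

```python
from math import sqrt


def cusum(x, s, e, b):
    """CUSUM contrast on x[s..e] (inclusive, 0-based) for a split after index b.

    Requires s <= b < e. Large absolute values suggest a mean change
    between positions b and b + 1.
    """
    n = e - s + 1
    n_left, n_right = b - s + 1, e - b
    left, right = sum(x[s:b + 1]), sum(x[b + 1:e + 1])
    return (sqrt(n_right / (n * n_left)) * left
            - sqrt(n_left / (n * n_right)) * right)


def best_split(x, s, e):
    """Split point maximizing |CUSUM| on x[s..e], with the attained value."""
    b = max(range(s, e), key=lambda i: abs(cusum(x, s, e, i)))
    return b, abs(cusum(x, s, e, b))


signal = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]  # hypothetical noiseless data
split, stat = best_split(signal, 0, len(signal) - 1)
```

In a thresholding scheme, a change-point would be declared at `split` only if `stat` exceeds a threshold calibrated to the noise level.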
Checking for model failure and for prior-data conflict with the constrained multinomial model
Multinomial models can be difficult to use when constraints are placed on the probabilities. An exact model checking procedure for such models is developed based on a uniform prior on the full multinomial model. For inference, a nonuniform prior can be used and a consistency theorem is proved concerning a check for prior-data conflict with the chosen prior. Applications are presented and a new elicitation methodology is developed for multinomial models with ordered probabilities.
On a stochastic order induced by an extension of Panjer's family of discrete distributions
We factorize probability mass functions of discrete distributions belonging to Panjer's family and to certain of its extensions to define a stochastic order on the space of distributions supported on . The main properties of this order are presented. Comparison of some well-known distributions with respect to this order allows us to generate new families of distributions that satisfy various recurrence relations. The recursion formula for the probabilities of the corresponding compound distributions for one such family is derived. Applications to various domains of reliability theory are provided.
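As background for the compound-distribution recursion mentioned above, here is a minimal sketch of the classical Panjer recursion in the Poisson case, where counts satisfy p_k = (a + b/k) p_{k-1} with a = 0 and b = λ. It illustrates the original Panjer family only, not the extended family or the new recursion derived in the paper; the function name and inputs are hypothetical.

```python
from math import exp, factorial


def compound_poisson_pmf(lam, severity, s_max):
    """P(S = s) for s = 0..s_max, where S is a sum of N i.i.d. terms.

    N is Poisson(lam); severity[j] = P(X = j) for a nonnegative integer X.
    """
    a, b = 0.0, lam  # Poisson member of Panjer's (a, b, 0) class
    f = list(severity)
    g = [exp(-lam * (1 - f[0]))]  # P(S = 0): Poisson pgf evaluated at f[0]
    for s in range(1, s_max + 1):
        upper = min(s, len(f) - 1)
        total = sum((a + b * j / s) * f[j] * g[s - j] for j in range(1, upper + 1))
        g.append(total / (1 - a * f[0]))
    return g


# Sanity check: if every term equals 1, S is Poisson(lam) itself.
pmf = compound_poisson_pmf(2.0, [0.0, 1.0], 5)
```

The recursion needs on the order of s_max times the severity support size operations, avoiding explicit convolution powers of the severity distribution.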