A novel high-order multivariate Markov model for spatiotemporal analysis with application to COVID-19 outbreak
We propose a new strategy for analyzing the evolution of random phenomena over time and space simultaneously, based on high-order multivariate Markov chains. We develop a novel high-order Markov model for multiple chains, each taking values in a finite set of possible states, that combines parsimony with realism. It can capture negative and positive associations among the chains with only a reduced number of parameters, remarkably lower than required for the fully parameterized model. The advantages of our model are supported by a Monte Carlo simulation experiment and by an application to the spatiotemporal dynamics of the risk level of the recent global COVID-19 pandemic in World Health Organization (WHO) regions, with the aim of predicting the risk state of epidemiological prevalence and monitoring infection control.
Monitoring parameter change for bivariate time series models of counts
In this study, we consider an online monitoring procedure to detect a parameter change in bivariate time series of counts that follow bivariate integer-valued generalized autoregressive heteroscedastic (BIGARCH) and bivariate integer-valued autoregressive (BINAR) models. To handle this problem, we employ the cumulative sum (CUSUM) process constructed from the (standardized) residuals obtained from those models. To obtain control limits, we develop limit theorems for the proposed monitoring process. A simulation study and a real data analysis are conducted to affirm the validity of the proposed method.
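To make the idea of residual-based CUSUM monitoring concrete, the following is a minimal sketch, not the procedure of the paper. It assumes the standardized residuals are already available as a plain array, and the function name `cusum_monitor`, the `scale` argument, and the fixed `threshold` are hypothetical; in the paper the control limit comes from the limit theorems rather than from a hand-picked constant.

```python
import numpy as np

def cusum_monitor(residuals, threshold, scale):
    """Raise an alarm the first time |partial sum| / sqrt(scale) exceeds threshold.

    `scale` is a hypothetical normalising constant (for example the size of a
    historical sample); `threshold` is a hand-picked control limit.
    Returns (alarm_index or None, array of monitoring statistics).
    """
    residuals = np.asarray(residuals, dtype=float)
    # Running sum of residuals: drifts away from zero after a parameter change.
    stats = np.abs(np.cumsum(residuals)) / np.sqrt(scale)
    hits = np.flatnonzero(stats > threshold)
    alarm = int(hits[0]) if hits.size else None
    return alarm, stats

# Illustrative input only: residuals centred at 0, then shifted to 1.
demo_residuals = np.concatenate([np.zeros(50), np.ones(50)])
demo_alarm, demo_stats = cusum_monitor(demo_residuals, threshold=2.0, scale=25)
```

The alarm index is the first monitoring time at which the statistic crosses the limit, so the delay between the true change and the alarm depends directly on how the limit is chosen.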
Inflated Density Ratio and Its Variation and Generalization for Computing Marginal Likelihoods
In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data after integrating out the parameters over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper, we first examine the properties of the inflated density ratio (IDR) method, which is a Monte Carlo method for computing the marginal likelihood using a single Monte Carlo (MC) or Markov chain Monte Carlo (MCMC) sample. We then develop a variation of the IDR estimator, called the dimension reduced inflated density ratio (Dr.IDR) estimator. We further propose a more general identity and then obtain a general dimension reduced (GDr) estimator. Simulation studies are conducted to examine the empirical performance of the IDR estimator as well as the Dr.IDR and GDr estimators. We further demonstrate the usefulness of the GDr estimator for computing the normalizing constants in a case study on the inequality-constrained analysis of variance.
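As an illustration of what "integrating out the parameters" means, the sketch below compares a naive Monte Carlo estimate of a marginal likelihood with its closed form in a toy conjugate model. This is not the IDR, Dr.IDR, or GDr estimator: it simply averages the likelihood over prior draws. The Beta-binomial model and the function names `exact_marginal` and `mc_marginal` are assumptions chosen because the exact answer is available for checking.

```python
import math
import numpy as np

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def exact_marginal(y, n, a, b):
    """Exact marginal likelihood of y successes in n trials under a Beta(a, b) prior."""
    return math.comb(n, y) * math.exp(log_beta(a + y, b + n - y) - log_beta(a, b))

def mc_marginal(y, n, a, b, draws, seed=0):
    """Naive estimator: average the binomial likelihood over draws from the prior."""
    rng = np.random.default_rng(seed)
    theta = rng.beta(a, b, size=draws)
    likelihood = math.comb(n, y) * theta**y * (1.0 - theta) ** (n - y)
    return float(likelihood.mean())

# Toy numbers for illustration only: 3 successes in 10 trials, uniform prior.
exact_value = exact_marginal(3, 10, 1.0, 1.0)
mc_value = mc_marginal(3, 10, 1.0, 1.0, draws=200_000)
```

Prior sampling works here because the parameter is one-dimensional; in higher dimensions most prior draws land where the likelihood is negligible, which is the kind of difficulty that motivates estimators built on a posterior sample.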
Rejoinder: A comparison of Monte Carlo methods for computing marginal likelihoods of item response theory models
A Comparison of Monte Carlo Methods for Computing Marginal Likelihoods of Item Response Theory Models
Nowadays, Bayesian methods are routinely used for estimating parameters of item response theory (IRT) models. However, the marginal likelihoods are still rarely used for comparing IRT models due to their complexity and a relatively high dimension of the model parameters. In this paper, we review Monte Carlo (MC) methods developed in the literature in recent years and provide a detailed development of how these methods are applied to the IRT models. In particular, we focus on the "best possible" implementation of these MC methods for the IRT models. These MC methods are used to compute the marginal likelihoods under the one-parameter IRT model with the logistic link (1PL model) and the two-parameter logistic IRT model (2PL model) for a real English Examination dataset. We further use the widely applicable information criterion (WAIC) and deviance information criterion (DIC) to compare the 1PL model and the 2PL model. The 2PL model is favored by all of these three Bayesian model comparison criteria for the English Examination data.
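For readers unfamiliar with the two models being compared, here is a minimal sketch of their item response functions under one common parameterization, which is an assumption: the abstract does not state which form the paper uses. The function names are hypothetical. The 1PL model is the special case of the 2PL model in which every item has discrimination 1.

```python
import math

def irf_2pl(theta, discrimination, difficulty):
    """P(correct) under the 2PL model: logistic(a * (theta - b))."""
    z = discrimination * (theta - difficulty)
    return 1.0 / (1.0 + math.exp(-z))

def irf_1pl(theta, difficulty):
    """P(correct) under the 1PL model: the 2PL form with discrimination fixed at 1."""
    return irf_2pl(theta, 1.0, difficulty)

def log_likelihood(responses, theta, discriminations, difficulties):
    """Log-likelihood of one examinee's 0/1 responses given ability theta."""
    total = 0.0
    for y, a, b in zip(responses, discriminations, difficulties):
        p = irf_2pl(theta, a, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

The marginal likelihood of either model is this likelihood, multiplied over examinees, integrated against the priors on abilities and item parameters, which is why its dimension grows with both the number of examinees and the number of items.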
Bayesian Empirical Likelihood Methods for Quantile Comparisons
Bayes factors, practical tools of applied statistics, have been dealt with extensively in the literature in the context of hypothesis testing. The Bayes factor based on parametric likelihoods can be considered both as a pure Bayesian approach and as a standard technique to compute p-values for hypothesis testing. We employ empirical likelihood methodology to modify Bayes factor type procedures for the nonparametric setting. The paper establishes asymptotic approximations to the proposed procedures. These approximations are shown to be similar to those of the classical parametric Bayes factor approach. The proposed approach is applied to develop testing methods involving quantiles, which are commonly used to characterize distributions. We present and evaluate one- and two-sample distribution-free Bayes factor type methods for testing quantiles based on indicators and smooth kernel functions. An extensive Monte Carlo study and real data examples show that the developed procedures have excellent operating characteristics for one-sample and two-sample data analysis.
Semiparametric Estimation Methods for the Accelerated Failure Time Mixture Cure Model
This paper provides an overview of two semiparametric estimation methods recently proposed in the literature for the accelerated failure time mixture cure model. We prove that the two estimation methods are asymptotically equivalent. A simulation is conducted to investigate the rate of convergence of the two methods. We apply these methods to fit the accelerated failure time mixture cure model to the survival times of leukemia patients receiving bone marrow transplantation.
Asymptotic results for fitting marginal hazards models from stratified case-cohort studies with multiple disease outcomes
In stratified case-cohort designs, the case-cohort samples are drawn by stratified random sampling based on covariate information available for all cohort members. In this paper, we extend the work of Kang & Cai (2009) to a generalized stratified case-cohort study design for failure time data with multiple disease outcomes. Under this study design, we develop weighted estimating procedures for model parameters in marginal multiplicative intensity models and for the cumulative baseline hazard function. The asymptotic properties of the estimators are studied using martingales, modern empirical process theory, and results for finite population sampling.
Discussion of Dette and Trampisch's paper "A general approach to D-optimal designs for weighted univariate polynomial regression models"
This is a discussion of "A general approach to D-optimal designs for weighted univariate polynomial regression models" by Holger Dette and Matthias Trampisch.
Rejoinder: Why Do We Test Multiple Traits in Genetic Association Studies?
Why Do We Test Multiple Traits in Genetic Association Studies?
In studies of complex disorders such as nicotine dependence, researchers commonly assess multiple variables related to a disorder as well as other disorders that are potentially correlated with the primary disorder of interest. In this work, we refer to those variables and disorders broadly as multiple traits. The multiple traits may or may not have a common causal genetic variant. Intuitively, it may be more powerful to accommodate multiple traits in genetic studies, but the analysis of multiple traits is generally more complicated than the analysis of a single trait. Furthermore, it is not well documented how much power we may potentially gain by considering multiple traits. Our aim is to enhance our understanding of this important and practical issue. We considered a variety of correlation structures between traits and the disease locus. To focus on the effect of accommodating multiple traits, we examined genetic models that are relatively simple so that we can pinpoint the factors affecting the power. We conducted simulation studies to explore the performance of testing multiple traits simultaneously and the performance of testing a single trait at a time in family-based association studies. Our simulation results demonstrated that testing multiple traits simultaneously performs better than testing each trait individually for almost all models considered. We also found that the power of association tests varies among the underlying models. The advantage of conducting a multiple-traits test is minimized when some traits are influenced by the gene only through other traits; it is maximized when there are causal relations between the traits and the gene and among the traits themselves, or when there are extraneous traits.
Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection-rejoinder
Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients. The first is "Wild Binary Segmentation 2" (WBS2), a recursive algorithm for producing what we call a "complete" solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing increasing numbers of change-points, with the length of the sequence determined by the data length. The other ingredient is a new model selection procedure, referred to as "Steepest Drop to Low Levels" (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter.
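For background, the sketch below implements classical binary segmentation for changes in the mean, driven by a CUSUM statistic and a fixed threshold. It is deliberately not WBS2 or SDLL: it searches only the full current segment rather than many sub-intervals, produces no solution path, and relies on a hand-picked `threshold`, which is exactly the kind of tuning the abstract says SDLL avoids. The function names are hypothetical.

```python
import numpy as np

def best_split(x, s, e):
    """Return (b, stat): the split of x[s:e] into x[s:b], x[b:e] with the largest CUSUM."""
    n = e - s
    csum = np.cumsum(x[s:e])
    n_left = np.arange(1, n)              # candidate sizes of the left part
    n_right = n - n_left
    mean_left = csum[:-1] / n_left
    mean_right = (csum[-1] - csum[:-1]) / n_right
    # CUSUM statistic: weighted absolute difference of the two segment means.
    stats = np.sqrt(n_left * n_right / n) * np.abs(mean_left - mean_right)
    k = int(np.argmax(stats))
    return s + k + 1, float(stats[k])

def binary_segmentation(x, threshold, s=0, e=None):
    """Recursively split while the maximal CUSUM exceeds threshold; returns sorted change-points."""
    x = np.asarray(x, dtype=float)
    if e is None:
        e = len(x)
    if e - s < 2:
        return []
    b, stat = best_split(x, s, e)
    if stat <= threshold:
        return []
    left = binary_segmentation(x, threshold, s, b)
    right = binary_segmentation(x, threshold, b, e)
    return left + [b] + right

# Noise-free illustrative signal with level shifts at indices 20 and 40.
demo_signal = np.concatenate([np.zeros(20), np.full(20, 5.0), np.zeros(20)])
demo_changes = binary_segmentation(demo_signal, threshold=1.0)
```

Each returned index `b` marks the first observation of a new segment. Because the statistic is computed over the whole current segment, short segments between frequent changes can be averaged away, which is the failure mode in frequent-change-point settings that the article targets.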
Seroprevalence of SARS-CoV-2 antibodies in South Korea
In 2020, the Korea Disease Control and Prevention Agency reported three rounds of surveys on the seroprevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibodies in South Korea. SARS-CoV-2 is the virus that causes coronavirus disease 2019 (COVID-19). We analyze the seroprevalence surveys using a Bayesian method with informative prior distributions on the seroprevalence parameter and on the sensitivity and specificity of the diagnostic test. We construct the informative prior for the sensitivity and specificity of the diagnostic test using the posterior distribution obtained from the clinical evaluation data. The constraint on the seroprevalence parameter induced by the known confirmed COVID-19 cases can be imposed naturally in the proposed Bayesian model. We also prove that the confidence interval of the seroprevalence parameter based on Rao's test can be the empty set, while the Bayesian method yields interval estimators with coverage probability close to the nominal level. As of 30 October 2020, the credible interval of the estimated SARS-CoV-2 positive population does not exceed 318,685, a small fraction of the Korean population.
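The role of test sensitivity, specificity, and the lower-bound constraint can be sketched with a simple grid approximation. This is a simplified illustration, not the model of the paper: sensitivity and specificity are held fixed here instead of receiving informative priors, the prior on the seroprevalence is uniform on the interval from `lower` to 1, and all counts and test characteristics in the usage lines are hypothetical numbers, not the survey data. The function names are also hypothetical.

```python
import numpy as np

def seroprevalence_posterior(positives, tested, sensitivity, specificity,
                             lower=0.0, grid_size=2001):
    """Grid posterior for seroprevalence theta with a uniform prior on [lower, 1].

    A test is positive with probability
        theta * sensitivity + (1 - theta) * (1 - specificity),
    which is assumed to lie strictly between 0 and 1 on the whole grid.
    """
    theta = np.linspace(lower, 1.0, grid_size)
    p_pos = theta * sensitivity + (1.0 - theta) * (1.0 - specificity)
    log_post = positives * np.log(p_pos) + (tested - positives) * np.log1p(-p_pos)
    weights = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    weights /= weights.sum()
    return theta, weights

def credible_interval(theta, weights, level=0.95):
    """Equal-tailed credible interval read off the grid's cumulative weights."""
    cdf = np.cumsum(weights)
    tail = (1.0 - level) / 2.0
    low = theta[np.searchsorted(cdf, tail)]
    high = theta[np.searchsorted(cdf, 1.0 - tail)]
    return float(low), float(high)

# Hypothetical inputs for illustration: 50 positives among 1000 tested.
grid, post = seroprevalence_posterior(50, 1000, sensitivity=0.95, specificity=0.99)
interval = credible_interval(grid, post)
```

Setting `lower` to the proportion of the population with confirmed infections restricts the posterior to values at or above that proportion, which is one simple way to impose the constraint mentioned in the abstract; the interval is always non-empty because it is read from a proper posterior distribution.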
Empirical likelihood for spatial dynamic panel data models
Spatial dynamic panel data (SDPD) models have received great attention in economics over the past 10 years. Existing approaches for estimating and testing SDPD models are the quasi-maximum likelihood (QML) approach and the generalized method of moments (GMM). In this article, we introduce the empirical likelihood (EL) method for statistical inference in SDPD models. EL ratio statistics are constructed for the parameters of SDPD models. It is shown that the limiting distributions of the EL ratio statistics are chi-squared distributions, which are used to construct confidence regions for the parameters of the models. Simulation results show that the EL-based confidence regions outperform the confidence regions based on the normal approximation.
Goodness of fit test for uniform distribution with censored observation
We develop a new goodness-of-fit test for the uniform distribution based on a conditional moment characterization. We study the asymptotic properties of the proposed test statistic. We also present a goodness-of-fit test for the uniform distribution that incorporates right-censored observations and study its properties. A Monte Carlo simulation study is carried out to evaluate the finite-sample performance of the proposed tests. We illustrate the test procedures using real data sets.