Bayesian analysis of multivariate mixed longitudinal ordinal and continuous data
Multivariate longitudinal ordinal and continuous data arise in many scientific fields. However, analysing them jointly is challenging because of the complicated correlation structure of such mixed data and the lack of a natural multivariate distribution for them. The multivariate probit model assumes that a multivariate normal latent variable underlies each multivariate ordinal observation. It is therefore a natural modeling choice for longitudinal ordinal data, especially when they are analysed jointly with longitudinal continuous data. However, identifiability of the multivariate probit model requires the variances of the latent normal variables to be fixed at 1. As a result, some diagonal elements of the joint covariance matrix of the latent variables and the continuous multivariate normal variables are constrained. This constraint complicates the development of both classical and Bayesian methods for analysing mixed ordinal and continuous data. In this investigation, we propose three Markov chain Monte Carlo (MCMC) methods. The first is a Metropolis-Hastings within Gibbs algorithm based on the identifiable model. The other two are a Gibbs sampling algorithm and a parameter-expanded data augmentation algorithm, both based on a constructed non-identifiable model. Through simulation studies and a real data application, we illustrate the performance of these three methods and offer insight into using a non-identifiable model to develop MCMC sampling methods.
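In Gibbs-type samplers for probit models, a core step is to draw each latent normal variable from its conditional distribution, truncated to the interval defined by the observed ordinal category. The sketch below shows only that single step. It assumes the conditional mean and standard deviation have already been computed from the joint latent/continuous covariance; that computation and the three samplers proposed here are not reproduced. Function and argument names are hypothetical.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def draw_latent(mean, sd, cutpoints, category, rng=random):
    """Draw z ~ N(mean, sd^2) truncated to (cutpoints[category], cutpoints[category + 1]].

    `cutpoints` is an increasing list that starts at -inf and ends at +inf, so
    ordinal category c corresponds to the interval between cutpoints[c] and
    cutpoints[c + 1]. Sampling is by the inverse-CDF method.
    """
    lo = (cutpoints[category] - mean) / sd
    hi = (cutpoints[category + 1] - mean) / sd
    p_lo, p_hi = _STD.cdf(lo), _STD.cdf(hi)  # cdf(-inf) = 0, cdf(inf) = 1
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf requires 0 < u < 1, so keep u strictly inside the unit interval.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * _STD.inv_cdf(u)


# Usage: one latent draw for an observation in the middle category (0, 1.2].
rng = random.Random(1)
cuts = [float("-inf"), 0.0, 1.2, float("inf")]
z = draw_latent(0.3, 1.0, cuts, 1, rng)
```

In a full sampler this step would be repeated for each latent component, using its conditional mean and standard deviation given the other components.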
Tail sums of Wishart and Gaussian eigenvalues beyond the bulk edge
Consider the classical Gaussian unitary ensemble (GUE) of a given size and the real white Wishart ensemble with a given number of variables and degrees of freedom. Take the limit in which these dimensions grow large, with the ratio of variables to degrees of freedom tending to a positive constant in the Wishart case. In this limit, the expected number of eigenvalues that exit the upper bulk edge is less than one, approaching 0.031 for the GUE and 0.170 for the Wishart ensemble; the latter number does not depend on the limiting ratio. These statements are consequences of quantitative bounds on tail sums of eigenvalues outside the bulk. The bounds are established here for applications in high-dimensional covariance matrix estimation.
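A quick Monte Carlo check of the quantity studied here is the average number of eigenvalues beyond the upper bulk edge. The sketch assumes the usual scalings. For the GUE, off-diagonal entries have E|H_ij|^2 = 1, so the bulk of H/sqrt(N) ends at 2. For the Wishart case, the matrix is X X^T / n with Marchenko-Pastur upper edge (1 + sqrt(N/n))^2. The symbols N and n and the function names are our own. At finite sizes the simulated means only roughly approach the limiting values 0.031 and 0.170 quoted above, and this simulation is not the paper's bounding argument.

```python
import numpy as np


def gue_exceedances(N, reps, rng):
    """Per replicate, count eigenvalues of a scaled N x N GUE matrix above the edge 2."""
    counts = np.empty(reps, dtype=int)
    for r in range(reps):
        # Complex Gaussian entries with E|A_ij|^2 = 1.
        A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
        # Hermitian matrix: unit-variance diagonal, E|H_ij|^2 = 1 off the diagonal.
        H = (A + A.conj().T) / np.sqrt(2)
        eig = np.linalg.eigvalsh(H) / np.sqrt(N)  # semicircle bulk on [-2, 2]
        counts[r] = np.sum(eig > 2.0)
    return counts


def wishart_exceedances(N, n, reps, rng):
    """Per replicate, count eigenvalues of X X^T / n (X real N x n) above (1 + sqrt(N/n))^2."""
    edge = (1.0 + np.sqrt(N / n)) ** 2  # Marchenko-Pastur upper edge
    counts = np.empty(reps, dtype=int)
    for r in range(reps):
        X = rng.standard_normal((N, n))
        eig = np.linalg.eigvalsh(X @ X.T / n)
        counts[r] = np.sum(eig > edge)
    return counts


rng = np.random.default_rng(0)
print(gue_exceedances(100, 200, rng).mean())
print(wishart_exceedances(100, 200, 200, rng).mean())
```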
A survey of high dimension low sample size asymptotics
Peter Hall's work illuminated many aspects of statistical thought, some of which are very well known, including the bootstrap and smoothing. However, he also explored many lesser-known aspects of mathematical statistics. This is a survey of one of those areas, high dimension low sample size asymptotics, which was initiated by a seminal paper in 2005. A notable characteristic of that first paper, and of many subsequent papers, is that they contain deep and insightful concepts that are often surprising and counter-intuitive. Yet their mathematical underpinnings tend to be direct and not difficult to prove.
An Improved Test of Equality of Mean Directions for the Langevin-von Mises-Fisher Distribution
A multi-sample test for equality of mean directions is developed for populations having Langevin-von Mises-Fisher distributions with a common unknown concentration. The proposed test statistic is a monotone transformation of the likelihood ratio. The high-concentration asymptotic null distribution of the test statistic is derived. In contrast to previously suggested high-concentration tests, the high-concentration asymptotic approximation to the null distribution of the proposed test statistic is also valid for large sample sizes with any fixed nonzero concentration parameter. Simulations of size and power show that the proposed test outperforms competing tests. An example with three-dimensional data from an anthropological study illustrates the practical application of the testing procedure.
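Likelihood ratio tests for this problem are built from the resultant lengths of the samples. The code below is a sketch of that building block only. It computes the classical likelihood ratio statistic when the common concentration kappa is known; at high concentration, its approximate chi-square reference has (k - 1)(p - 1) degrees of freedom. The test proposed here handles an unknown kappa through a monotone transformation of the likelihood ratio that is not reproduced. All names are hypothetical.

```python
import math


def resultant_length(unit_vectors):
    """Length of the vector sum of a list of unit vectors."""
    total = [sum(coords) for coords in zip(*unit_vectors)]
    return math.sqrt(sum(c * c for c in total))


def lr_stat_known_kappa(samples, kappa):
    """-2 log likelihood ratio for H0: equal mean directions, common KNOWN kappa.

    samples: list of k groups, each a list of unit vectors in R^p.
    Maximising kappa * mu' (sum of x) over unit vectors mu gives kappa * R,
    so the statistic is 2 * kappa * (sum_i R_i - R), with R the pooled resultant.
    """
    within = sum(resultant_length(group) for group in samples)
    pooled = resultant_length([x for group in samples for x in group])
    return 2.0 * kappa * (within - pooled)


def high_concentration_df(k, p):
    """Degrees of freedom of the approximate chi-square reference: (k - 1)(p - 1)."""
    return (k - 1) * (p - 1)


# Usage: two small samples on the unit sphere in R^3 (p = 3, k = 2).
groups = [[(0.0, 0.0, 1.0), (0.0, 0.6, 0.8)], [(0.6, 0.0, 0.8), (0.0, 0.0, 1.0)]]
print(lr_stat_known_kappa(groups, 20.0), high_concentration_df(2, 3))
```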
Evaluating the Contributions of Individual Variables to a Quadratic Form
Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large, and hence interesting, it might be informative to partition it into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition obtained will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that, subject to a constraint, maximises the sum of the correlations between each individual variable and the variable into which it is transformed. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is therefore adapted to have an invariance property under such rotation: the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a quadratic form are readily obtained.
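The partition idea can be sketched with a whitening transformation w = W x satisfying W'W = Sigma^{-1}. Then Q = x' Sigma^{-1} x equals the sum of w_i^2, and each w_i^2 is read as the contribution of variable i. The choice of W below is our assumption, not necessarily the paper's exact construction. It uses the correlation-based symmetric inverse square root, W = P^{-1/2} V^{-1/2}, where V = diag(Sigma) and P is the correlation matrix. This is the usual candidate for maximising the summed correlations between x_i and w_i. Names are hypothetical.

```python
import numpy as np


def quadratic_form_contributions(x, sigma):
    """Split Q = x' sigma^{-1} x into per-variable contributions w_i^2 that sum to Q.

    Assumption: w = P^{-1/2} V^{-1/2} x with V = diag(sigma), P the correlation
    matrix and P^{-1/2} its symmetric inverse square root. Since
    W'W = V^{-1/2} P^{-1} V^{-1/2} = sigma^{-1}, the squares of w add up to Q.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    v_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma)))
    corr = v_inv_sqrt @ sigma @ v_inv_sqrt
    evals, evecs = np.linalg.eigh(corr)  # corr is symmetric positive definite
    corr_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    w = corr_inv_sqrt @ v_inv_sqrt @ x
    return w ** 2


sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([1.0, -1.0])
contrib = quadratic_form_contributions(x, sigma)
print(contrib, contrib.sum(), x @ np.linalg.solve(sigma, x))
```

For Hotelling's one-sample statistic, one would pass x = sqrt(n) (xbar - mu0) and sigma = S, the sample covariance matrix.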
Covariate Decomposition Methods for Longitudinal Missing-at-Random Data and Predictors Associated with Subject-Specific Effects
Investigators often gather longitudinal data to assess changes in responses over time within subjects and to relate these changes to within-subject changes in predictors. Missing data are common in such studies, and predictors can be correlated with subject-specific effects. Maximum likelihood methods for generalized linear mixed models provide consistent estimates when the data are 'missing at random' (MAR). However, they can produce inconsistent estimates when the random effects are correlated with one of the predictors. Conversely, conditional maximum likelihood methods provide consistent estimation when random effects are correlated with predictors, but they can produce inconsistent covariate effect estimates when data are MAR. The same holds for the closely related maximum likelihood methods that partition covariates into between- and within-cluster components. Using theory, simulation studies, and fits to example data, this paper shows that decomposition methods using complete covariate information produce consistent estimates. In some practical cases, these methods, which ostensibly require complete covariate information, actually involve only the observed covariates. These results offer an easy-to-use approach that simultaneously protects against bias from both cluster-level confounding and MAR missingness in assessments of change.
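The decomposition can be sketched as follows. For each subject, compute the mean of the covariate (the between-cluster component) and each occasion's deviation from that mean (the within-cluster component); both would then enter the mixed model as separate predictors. The sketch assumes complete covariate values are available at every scheduled occasion, even when the response is missing, so the subject mean uses all occasions rather than only those with an observed response. The data layout and names are hypothetical, and the model fit itself is not shown.

```python
from statistics import fmean


def decompose_covariate(x_by_subject, observed_by_subject):
    """Split a time-varying covariate into between- and within-subject parts.

    x_by_subject: subject id -> covariate values at every scheduled occasion
        (complete covariate information, including occasions with a missing response).
    observed_by_subject: subject id -> booleans, True where the response was observed.
    Returns rows (subject, occasion, between, within) for observed responses only.
    """
    rows = []
    for subject, xs in x_by_subject.items():
        between = fmean(xs)  # mean over ALL occasions, not only the observed ones
        for j, (x, seen) in enumerate(zip(xs, observed_by_subject[subject])):
            if seen:
                rows.append((subject, j, between, x - between))
    return rows


# Subject "a" has responses only at the first two occasions; its between-subject
# mean still uses all four covariate values (3.0 rather than 1.5).
print(decompose_covariate({"a": [1, 2, 3, 6]}, {"a": [True, True, False, False]}))
```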
A Semiparametric Bayesian Approach to Multivariate Longitudinal Data
We extend the standard multivariate mixed model by incorporating a smooth time effect and relaxing distributional assumptions. We propose a semiparametric Bayesian approach to multivariate longitudinal data using a mixture of Polya trees prior distribution. The distribution of random effects in a longitudinal data model is usually assumed to be Gaussian. However, the normality assumption may be suspect, particularly if the estimated longitudinal trajectory parameters exhibit multimodality and skewness. The mixture of Polya trees prior is used to address these limitations of a parametric random effects distribution. We illustrate the methodology by analyzing data from a recent HIV-AIDS study.
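A draw from a finite-depth Polya tree prior can be sketched as recursive random binary splitting of probability mass. Mixing over the parameters of the centring distribution gives a mixture of Polya trees. This is a generic illustration under stated assumptions: a normal centring distribution, the common choice alpha_m = c m^2, truncation at a fixed depth, and a hypothetical hyperprior. It is not the paper's prior specification or its posterior sampler.

```python
import random
from statistics import NormalDist


def draw_polya_tree(levels, c, base, rng):
    """Draw leaf probabilities of a finite Polya tree centred on `base`.

    At level m every set splits into two halves of equal base probability,
    with branch probability Y ~ Beta(c * m^2, c * m^2); leaf probabilities are
    products of the branch probabilities along each path. The alpha_m = c * m^2
    choice is an assumption, not taken from the paper.
    Returns (edges, probs): edges are the base quantiles at j / 2**levels.
    """
    probs = [1.0]
    for m in range(1, levels + 1):
        alpha = c * m * m
        next_probs = []
        for p in probs:
            y = rng.betavariate(alpha, alpha)
            next_probs.extend([p * y, p * (1.0 - y)])
        probs = next_probs
    k = 2 ** levels
    inner = [base.inv_cdf(j / k) for j in range(1, k)]
    edges = [float("-inf")] + inner + [float("inf")]
    return edges, probs


rng = random.Random(42)
# Mixture of Polya trees: draw the centring distribution's parameters first
# (this hyperprior is hypothetical), then the tree given that centring.
mu, sd = rng.gauss(0.0, 1.0), rng.uniform(0.5, 2.0)
edges, probs = draw_polya_tree(4, 1.0, NormalDist(mu, sd), rng)
```

Larger c pulls the random distribution closer to the centring distribution, while small c allows the multimodality and skewness that motivate the approach.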
Empirical Likelihood Based Inferences for Partially Linear Models with Missing Covariates
This paper considers statistical inference for partially linear models $Y = X^{\top}\beta + \nu(Z) + \varepsilon$ when the linear covariate $X$ is missing with probability $\pi$ depending on $(Y, Z)$. We propose empirical likelihood based statistics to construct confidence regions for $\beta$ and $\nu(z)$. The resulting statistics are shown to be asymptotically chi-squared distributed. Finite sample performance of the proposed statistics is assessed by simulation experiments. The proposed methods are applied to a data set from an AIDS clinical trial.
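Empirical likelihood statistics of this kind are built from estimating functions. The sketch below is the generic scalar construction, Owen's empirical likelihood for a mean with a chi-square(1) calibration, included only to show the mechanics. The paper's statistics for beta and nu(z) use estimating equations adjusted for the missing covariate, and those are not reproduced here. Names and the bisection solver are our own choices.

```python
import math


def el_statistic(g, iters=200):
    """Empirical likelihood ratio statistic -2 log R for scalar estimating values g_i.

    Solves sum g_i / (1 + lam * g_i) = 0 by bisection, keeping every
    1 + lam * g_i > 0, then returns 2 * sum log(1 + lam * g_i).
    Returns inf when 0 lies outside the range of the g_i.
    """
    g_min, g_max = min(g), max(g)
    if not g_min < 0.0 < g_max:
        return math.inf
    lo, hi = -1.0 / g_max, -1.0 / g_min  # open interval of admissible lam
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + lam * gi) for gi in g)  # decreasing in lam
        if score > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * gi) for gi in g)


def in_el_interval(data, theta, crit=3.841):
    """Is theta inside the approximate 95% EL interval for the mean (chi-square(1) cutoff)?"""
    return el_statistic([x - theta for x in data]) <= crit


data = [1.2, 0.4, 2.5, 1.9, 0.7, 1.1]
print(el_statistic([x - 1.0 for x in data]), in_el_interval(data, 1.0))
```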
Nonparametric Estimation of Conditional Cumulative Hazards for Missing Population Marks
A new function for the competing risks model, the conditional cumulative hazard function, is introduced. From it, the conditional distribution of failure times of individuals failing due to cause j can be studied. The standard Nelson-Aalen estimator is not appropriate in this setting, because population membership (mark) information may be missing for some individuals owing to random right-censoring. We propose imputing population marks for the censored individuals through fractional risk sets. Some asymptotic properties, including uniform strong consistency, are established. We study the practical performance of this estimator through simulation studies and apply it to a real data set for illustration.
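One plausible reading of a fractional risk set is a Nelson-Aalen estimator in which each individual contributes a weight for population j. That weight is 1 or 0 when the failure cause (mark) is observed, and an estimated membership probability when the individual is censored. The sketch below assumes those probabilities are supplied. How they are estimated, and whether this matches the paper's estimator exactly, are assumptions, and all names are hypothetical.

```python
def fractional_nelson_aalen(times, failed_j, weights):
    """Cumulative hazard for population j with fractional risk-set weights.

    times: follow-up times.
    failed_j: True if the individual failed from cause j (failures from other
        causes have failed_j False and weight 0, so they never enter here).
    weights: membership weight for population j (1 or 0 for observed marks,
        an imputed probability in [0, 1] for censored individuals).
    Returns [(t, Lambda_j(t))] at each distinct cause-j failure time.
    """
    event_times = sorted({t for t, f in zip(times, failed_j) if f})
    cumulative, path = 0.0, []
    for t in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        deaths = sum(w for ti, f, w in zip(times, failed_j, weights) if f and ti == t)
        cumulative += deaths / at_risk  # at_risk >= deaths > 0 at a cause-j event time
        path.append((t, cumulative))
    return path


# Individual 2 is censored with an (assumed) 0.5 chance of belonging to population j;
# individual 4 failed from another cause, so its weight for population j is 0.
print(fractional_nelson_aalen([1, 2, 3, 4], [True, False, True, False], [1.0, 0.5, 1.0, 0.0]))
```

With all weights equal to 1 and every failure from cause j, the function reduces to the ordinary Nelson-Aalen estimator.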