Double/debiased machine learning for logistic partially linear model
We propose double/debiased machine learning approaches to infer a parametric component of a logistic partially linear model. Our framework is based on a Neyman orthogonal score equation consisting of two nuisance models for the nonparametric component of the logistic model and conditional mean of the exposure with the control group. To estimate the nuisance models, we separately consider the use of high dimensional (HD) sparse regression and (nonparametric) machine learning (ML) methods. In the HD case, we derive certain moment equations to calibrate the first order bias of the nuisance models, which preserves the model double robustness property. In the ML case, we handle the nonlinearity of the logit link through a novel and easy-to-implement 'full model refitting' procedure. We evaluate our methods through simulation and apply them in assessing the effect of the emergency contraceptive pill on early gestation and new births based on a 2008 policy reform in Chile.
Partial effects in non-linear panel data models with correlated random effects
Nonlinearity and heterogeneity are known to cause difficulties in estimating and interpreting partial effects. This paper provides a systematic characterization of the various partial effects in nonlinear panel data models that might be of interest to empirical researchers. The interpretation of the partial effects depends upon (i) whether the distribution of unobserved heterogeneity is treated as fixed or allowed to vary with covariates, and (ii) whether one is interested in particular covariate values or an average over such values. The characterization covers partial-effects concepts already in the literature but also includes new concepts for partial effects. A simple panel probit design highlights that the different partial effects can be quantitatively very different.
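As a rough illustration of how far apart such partial effects can be, the sketch below uses a probit with one covariate and normally distributed unobserved heterogeneity that is independent of the covariate. This is not the paper's design: the model P(y=1 | x, c) = Phi(beta*x + c), the parameter values, and all function names are assumptions chosen for illustration. It compares the partial effect with heterogeneity fixed at its mean against the partial effect averaged over the heterogeneity distribution.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit pdf


def partial_effect(x, c, beta):
    """d/dx of Phi(beta*x + c): partial effect at covariate x and heterogeneity c."""
    return beta * STD_NORMAL.pdf(beta * x + c)


def average_partial_effect(x, beta, sigma):
    """Partial effect at x averaged over c ~ N(0, sigma^2), in closed form.

    Uses E_c[Phi(beta*x + c)] = Phi(beta*x / sqrt(1 + sigma^2)).
    """
    scale = math.sqrt(1.0 + sigma ** 2)
    return (beta / scale) * STD_NORMAL.pdf(beta * x / scale)


def simulated_average_partial_effect(x, beta, sigma, n_draws=100_000, seed=0):
    """Monte Carlo check of the closed form: average partial_effect over draws of c."""
    rng = random.Random(seed)
    total = sum(partial_effect(x, rng.gauss(0.0, sigma), beta) for _ in range(n_draws))
    return total / n_draws


# Hypothetical parameter values, chosen only for illustration.
beta, sigma, x = 1.0, 2.0, 0.0
pe_at_mean_heterogeneity = partial_effect(x, 0.0, beta)
ape_closed_form = average_partial_effect(x, beta, sigma)
ape_simulated = simulated_average_partial_effect(x, beta, sigma)

print(f"partial effect at c = 0:         {pe_at_mean_heterogeneity:.4f}")
print(f"average partial effect (closed): {ape_closed_form:.4f}")
print(f"average partial effect (sim.):   {ape_simulated:.4f}")
```

With these assumed values the effect at mean heterogeneity is more than twice the averaged effect, so which concept is reported matters for interpretation.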
Using a Satisficing Model of Experimenter Decision-Making to Guide Finite-Sample Inference for Compromised Experiments
This paper presents a simple decision-theoretic economic approach for analyzing social experiments with compromised random assignment protocols that are only partially documented. We model administratively constrained experimenters who satisfice in seeking covariate balance. We develop design-based small-sample hypothesis tests that use worst-case (least favorable) randomization null distributions. Our approach accommodates a variety of compromised experiments, including imperfectly documented re-randomization designs. To make our analysis concrete, we focus much of our discussion on the influential Perry Preschool Project. We reexamine previous estimates of program effectiveness using our methods. The choice of how to model reassignment vitally affects inference.
Model averaging estimation for high-dimensional covariance matrices with a network structure
In this paper, we develop a model averaging method to estimate a high-dimensional covariance matrix, where the candidate models are constructed by different orders of polynomial functions. We propose a Mallows-type model averaging criterion and select the weights by minimizing this criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. Then, we prove the asymptotic optimality of the resulting model average covariance estimators. Finally, we conduct numerical simulations and a case study on Chinese airport network structure data to demonstrate the usefulness of the proposed approaches.
My friend far, far away: a random field approach to exponential random graph models
We explore the asymptotic properties of strategic models of network formation in very large populations. Specifically, we focus on (undirected) exponential random graph models. We want to recover a set of parameters from the individuals' utility functions using the observation of a single, but large, social network. We show that, under some conditions, a simple logit-based estimator is coherent, consistent and asymptotically normally distributed under a weak version of homophily. The approach is compelling as the computing time is minimal and the estimator can be easily implemented using pre-programmed estimators available in most statistical packages. We provide an application of our method using the Add Health database.
Peer effects in bedtime decisions among adolescents: a social network model with sampled data
Using unique information on a representative sample of US teenagers, we investigate peer effects in adolescent bedtime decisions. We extend the nonlinear least-squares estimator for spatial autoregressive models to estimate network models with network fixed effects and sampled observations on the dependent variable. We show the extent to which neglecting the sampling issue yields misleading inferential results. When accounting for sampling, we find that, besides the individual, family and peer characteristics, the bedtime decisions of peers help to shape one's own bedtime decision.
More reliable inference for the dissimilarity index of segregation
The most widely used measure of segregation is the so-called dissimilarity index. It is now well understood that this measure also reflects randomness in the allocation of individuals to units (i.e. it measures deviations from evenness, not deviations from randomness). This leads to potentially large values of the segregation index when unit sizes and/or minority proportions are small, even if there is no underlying systematic segregation. Our response to this is to produce adjustments to the index, based on an underlying statistical model. We specify the assignment problem in a very general way, with differences in conditional assignment probabilities underlying the resulting segregation. From this, we derive a likelihood ratio test for the presence of any systematic segregation, and bias adjustments to the dissimilarity index. We further develop the asymptotic distribution theory for testing hypotheses concerning the magnitude of the segregation index and show that the use of bootstrap methods can improve the size and power properties of test procedures considerably. We illustrate these methods by comparing dissimilarity indices across school districts in England to measure social segregation.
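The small-unit bias described above can be seen in a short simulation. The sketch below computes the standard two-group dissimilarity index, D = 0.5 * sum_j |m_j/M - n_j/N|, and then allocates individuals to units purely at random, so there is no systematic segregation by construction. It does not implement the paper's likelihood ratio test or bias adjustments; the unit counts, unit sizes, minority share, and function names are assumptions for illustration.

```python
import random


def dissimilarity_index(minority, majority):
    """D = 0.5 * sum_j |m_j / M - n_j / N| over units j."""
    total_min = sum(minority)
    total_maj = sum(majority)
    if total_min == 0 or total_maj == 0:
        raise ValueError("both groups must be non-empty")
    return 0.5 * sum(
        abs(m / total_min - n / total_maj) for m, n in zip(minority, majority)
    )


def simulate_random_allocation(n_units, unit_size, p_minority, rng):
    """Index for one draw in which every individual is minority with equal probability."""
    minority, majority = [], []
    for _ in range(n_units):
        m = sum(1 for _ in range(unit_size) if rng.random() < p_minority)
        minority.append(m)
        majority.append(unit_size - m)
    return dissimilarity_index(minority, majority)


def mean_index(n_units, unit_size, p_minority, n_reps=100, seed=0):
    """Average index over repeated random allocations."""
    rng = random.Random(seed)
    draws = [
        simulate_random_allocation(n_units, unit_size, p_minority, rng)
        for _ in range(n_reps)
    ]
    return sum(draws) / len(draws)


# Hypothetical settings: same minority share, very different unit sizes.
small_units = mean_index(n_units=50, unit_size=10, p_minority=0.1)
large_units = mean_index(n_units=50, unit_size=500, p_minority=0.1)

print(f"mean D under random allocation, units of 10:  {small_units:.3f}")
print(f"mean D under random allocation, units of 500: {large_units:.3f}")
```

Even with no systematic segregation, the average index is well above zero when units are small and shrinks as unit size grows, which is the motivation for a model-based adjustment.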
An instrumental variable random-coefficients model for binary outcomes
In this paper, we study a random-coefficients model for a binary outcome. We allow for the possibility that some or even all of the explanatory variables are arbitrarily correlated with the random coefficients, thus permitting endogeneity. We assume the existence of observed instrumental variables that are jointly independent of the random coefficients, although we place no structure on the joint determination of the endogenous variable and instruments, as would be required for a control function approach. The model fits within the spectrum of generalized instrumental variable models, and we thus apply identification results from our previous studies of such models to the present context, demonstrating their use. Specifically, we characterize the identified set for the distribution of random coefficients in the binary response model with endogeneity via a collection of conditional moment inequalities, and we investigate the structure of these sets by way of numerical illustration.
A Note on Adapting Propensity Score Matching and Selection Models to Choice Based Samples