Journal of Official Statistics

Reliable event rates for disease mapping
Quick H and Song G
When analyzing spatially referenced event data, the criteria for declaring rates "reliable" are still a matter of dispute. What these varying criteria have in common, however, is that they are rarely satisfied for crude estimates in small area analysis settings, prompting the use of spatial models to improve reliability. While reasonable, recent work has quantified the extent to which popular models from the spatial statistics literature can overwhelm the information contained in the data, leading to oversmoothing. Here, we begin by providing a definition of a "reliable" estimate for event rates that can be used for crude and model-based estimates and allows for discrete and continuous statements of reliability. We then construct a spatial Bayesian framework that allows users to infuse prior information into their models to improve reliability while also guarding against oversmoothing. We apply our approach to county-level birth data from Pennsylvania, highlighting the effect of oversmoothing in spatial models and how our approach can allow users to better focus their attention on areas where sufficient data exist to drive inferential decisions. We conclude with a brief discussion of how this definition of reliability can be used in the design of small area studies.
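The abstract leaves the authors' own reliability definition unstated. As a point of reference, one widely used crude-rate criterion flags a rate as unreliable when its relative standard error (RSE) exceeds 30%. Under a Poisson model for the event count, the RSE of a crude rate is approximately 1/sqrt(events), so the check reduces to a few lines. This is an illustrative sketch, not the paper's definition; the 30% threshold and the function name are our own choices:

```python
import math

def crude_rate_reliability(events: int, population: int, rse_threshold: float = 0.30):
    """Crude event rate with a simple Poisson-based reliability check.

    Under a Poisson model for the event count, the relative standard
    error (RSE) of the crude rate is approximately 1 / sqrt(events).
    """
    if events <= 0 or population <= 0:
        return {"rate": 0.0, "rse": float("inf"), "reliable": False}
    rate = events / population
    rse = 1.0 / math.sqrt(events)
    return {"rate": rate, "rse": rse, "reliable": rse < rse_threshold}

# A county with 9 events has RSE ~ 0.33 and is flagged unreliable;
# one with 100 events has RSE 0.10 and passes the threshold.
print(crude_rate_reliability(9, 50_000))
print(crude_rate_reliability(100, 50_000))
```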
Modeling the Relationship between Proxy Measures of Respondent Burden and Survey Response Rates in a Household Panel Survey
Earp M, Kaplan R and Toth D
Respondent burden has important implications for survey outcomes, including response rates and attrition in panel surveys. Despite this, respondent burden remains an understudied topic in the field of survey methodology, with few researchers systematically measuring objective and subjective burden factors in surveys used to produce official statistics. This research was designed to assess the impact of proxy measures of respondent burden, both objective (survey length and frequency) and subjective (effort, saliency, and sensitivity), on response rates over time in the Current Population Survey (CPS). Exploratory Factor Analysis confirmed the burden proxy measures were interrelated and formed five distinct factors. Regression tree models further indicated that both objective and subjective proxy burden factors were predictive of future CPS response rates. Additionally, respondent characteristics, including employment and marital status, interacted with these burden factors to further help predict response rates over time. We discuss the implications of these findings, including the importance of measuring both objective and subjective burden factors in production surveys. Our findings support a growing body of research suggesting that subjective burden and individual respondent characteristics should be incorporated into conceptual definitions of respondent burden, and they have implications for adaptive design.
A User-Driven Method for Using Research Products to Empirically Assess Item Importance in National Surveys
Ong AR, Schultz R, Sinozich S, West BT, Wagner J, Sinibaldi J and Finamore J
Large-scale, nationally representative surveys serve many vital functions, but these surveys are often long and burdensome for respondents. Cutting survey length can help to reduce respondent burden and may improve data quality, but removing items from these surveys is not a trivial matter. We propose a method to empirically assess item importance and associated burden in national surveys and guide this decision-making process using different research products produced from such surveys. This method is demonstrated using the Survey of Doctorate Recipients (SDR), a biennial survey administered to individuals with a Science, Engineering, and Health doctorate. We used three main sources of information on the SDR variables: 1) a bibliography of documents using the SDR data, 2) the SDR website that allows users to download summary data, and 3) web timing paradata and break-off rates. The bibliography was coded for SDR variable usage and citation counts. Putting this information together, we identified 35 items (17% of the survey) that were unused by any of these sources, and found that the most burdensome items are highly important. We conclude with general recommendations for those hoping to employ similar methodologies in the future.
Variable inclusion strategies through directed acyclic graphs to adjust health surveys subject to selection bias for producing national estimates
Li Y, Irimata KE, He Y and Parker J
Along with the rapid emergence of web surveys to address time-sensitive priority topics, various propensity score (PS)-based adjustment methods have been developed to improve population representativeness for nonprobability- or probability-sampled web surveys subject to selection bias. Conventional PS-based methods construct pseudo-weights for web samples using a higher-quality reference probability sample. The bias reduction, however, depends on the outcome and on the variables collected in both the web and reference samples. A central issue is identifying which variables to include in the PS-adjustment. In this paper, the directed acyclic graph (DAG), a common graphical tool in causal studies that remains largely under-utilized in survey research, is used to examine and elucidate how different types of variables in the causal pathways affect the performance of PS-adjustment. While past literature generally recommends including all available variables, our research demonstrates that only certain types of variables are needed in the PS-adjustment. Our research is illustrated by NCHS' Research and Development Survey, a probability-sampled web survey with potential selection bias, PS-adjusted to the National Health Interview Survey to estimate U.S. asthma prevalence. Findings in this paper can be used by National Statistics Offices to design questionnaires with variables that improve web samples' population representativeness and to release more timely and accurate estimates for priority topics.
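The conventional pseudo-weighting step described above can be sketched as follows: fit a propensity model for membership in the web sample within the combined web-plus-reference sample, then weight each web case by (1 - p) / p. This is a minimal illustration on simulated data with a hand-rolled logistic fit; the single covariate, sample sizes, and this particular pseudo-weight construction are simplifying assumptions, not the paper's DAG-guided procedure (which concerns which variables to include):

```python
import numpy as np

def fit_logistic(X, y, iters=50):
    """Logistic regression by Newton-Raphson; an intercept is added."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        H = X1.T @ (X1 * W[:, None]) + 1e-8 * np.eye(X1.shape[1])
        beta += np.linalg.solve(H, X1.T @ (y - p))
    return beta

def pseudo_weights(x_web, x_ref):
    """Propensity of membership in the web sample within the combined
    sample; each web case gets pseudo-weight (1 - p) / p."""
    X = np.concatenate([x_web, x_ref])
    y = np.concatenate([np.ones(len(x_web)), np.zeros(len(x_ref))])
    beta = fit_logistic(X, y)
    X1w = np.column_stack([np.ones(len(x_web)), x_web])
    p = 1.0 / (1.0 + np.exp(-X1w @ beta))
    return (1.0 - p) / p

rng = np.random.default_rng(1)
x_ref = rng.normal(0.0, 1.0, size=(2000, 1))   # reference probability sample
x_web = rng.normal(0.8, 1.0, size=(500, 1))    # web sample over-represents high x
w = pseudo_weights(x_web, x_ref)
# The pseudo-weighted web-sample mean of x moves toward the reference mean (0).
print(x_web.mean(), (w * x_web[:, 0]).sum() / w.sum())
```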
Probabilistic Projection of Subnational Life Expectancy
Sevcikova H and Raftery AE
Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and at the same time is easy to use. We propose three methods of this kind. Two of them are variants of simple scaling methods. The third method models life expectancy for a region as equal to national life expectancy plus a region-specific stochastic process which is a heteroskedastic first-order autoregressive process (AR(1)), with a variance that declines to a constant as life expectancy increases. We apply our models to data from 29 countries. In an out-of-sample comparison, the proposed methods outperformed other comparative methods and were well calibrated for individual regions. The AR(1) method performed best in terms of crossover patterns between regions. Although the methods work well for individual regions, there are some limitations when evaluating within-country variation. We identified four countries for which the AR(1) method either underestimated or overestimated the predictive between-region within-country standard deviation. However, none of the competing methods works better in this regard than the AR(1) method. In addition to providing the full distribution of subnational life expectancy, the methods can be used to obtain probabilistic forecasts of age-specific mortality rates.
A simulation study of diagnostics for selection bias
Boonstra PS, Little RJA, West BT, Andridge RR and Alvarado-Leiton F
A non-probability sampling mechanism arising from non-response or non-selection is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is 'non-ignorable', i.e., dependent upon the unobserved outcome of interest, since it is then undetectable and thus cannot be ameliorated. We extend a simulation study by Nishimura et al. (2016), adding two recently published statistics: the 'standardized measure of unadjusted bias' (SMUB) and the 'standardized measure of adjusted bias' (SMAB), which explicitly quantify the extent of bias (in the case of SMUB) or of non-ignorable bias (in the case of SMAB) under the assumption that a specified amount of non-ignorable selection exists. Our findings suggest that these new sensitivity diagnostics are more correlated with, and more predictive of, the true, unknown extent of selection bias than other diagnostics, even when the underlying assumed level of non-ignorability is incorrect.
Weighted Dirichlet Process Mixture Models to Accommodate Complex Sample Designs for Linear and Quantile Regression
Elliott MR and Xia X
Standard randomization-based inference conditions on the data in the population and makes inference with respect to the repeated sampling properties of the sampling indicators. In some settings these estimators can be quite unstable; Bayesian model-based approaches focus on the posterior predictive distribution of population quantities, potentially providing a better balance between bias correction and efficiency. Previous work in this area has focused on estimation of means and linear and generalized linear regression parameters; these methods do not allow for general estimation of distributional functions such as quantiles or quantile regression parameters. Here we adapt an extended Dirichlet Process Mixture model that allows the DP prior to be a mixture of DP random basis measures that are a function of covariates. These models allow many mixture components when necessary to accommodate the sample design, but can shrink to few components for more efficient estimation when the data allow. We provide an application to the estimation of relationships between serum dioxin levels and age in the US population, either at the mean level (via linear regression) or across the dioxin distribution (via quantile regression), using the National Health and Nutrition Examination Survey.
Comparing the Ability of Regression Modeling and Bayesian Additive Regression Trees to Predict Costs in a Responsive Survey Design Context
Wagner J, West BT, Elliott MR and Coffey S
Responsive survey designs rely upon incoming data from the field data collection to optimize cost and quality tradeoffs. In order to make these decisions in real-time, survey managers rely upon monitoring tools that generate proxy indicators for cost and quality. There is a developing literature on proxy indicators for the risk of nonresponse bias. However, there is very little research on proxy indicators for costs and almost none aimed at predicting costs under alternative design strategies. Predictions of survey costs and proxy error indicators can be used to optimize survey designs in real time. Using data from the National Survey of Family Growth, we evaluate alternative modeling strategies aimed at predicting survey costs (specifically, interviewer hours). The models include multilevel regression (with random interviewer effects) and Bayesian Additive Regression Trees (BART).
Exploring Mechanisms of Recruitment and Recruitment Cooperation in Respondent Driven Sampling
Lee S, Ong AR and Elliott M
Respondent driven sampling (RDS) is a sampling method designed for hard-to-sample groups with strong social ties. RDS starts with a small number of arbitrarily selected participants ("seeds"). Seeds are issued recruitment coupons, which are used to recruit from their social networks. Waves of recruitment and data collection continue until a sufficient sample size is reached. Under the assumptions of random recruitment, with-replacement sampling, and a sufficient number of waves, the probability of selection for each participant converges to being proportional to their network size. With recruitment noncooperation, however, recruitment can end abruptly, causing operational difficulties with unstable sample sizes. Noncooperation may also void the Markovian assumptions underlying recruitment, leading to selection bias. Here, we consider two RDS studies: one targeting Korean immigrants in Los Angeles and in Michigan, and another targeting persons who inject drugs in Southeast Michigan. We explore predictors of coupon redemption, associations between recruiters and recruits, and details of recruitment dynamics. While no consistent predictors of noncooperation were found, there was evidence that coupon redemption among targeted recruits was more common among those who shared social bonds with their recruiters, suggesting that noncooperation stems more from recruits declining to participate than from recruiters failing to distribute coupons.
The Joinpoint-Jump and Joinpoint-Comparability Ratio Model for Trend Analysis with Applications to Coding Changes in Health Statistics
Chen HS, Zeichner S, Anderson RN, Espey DK, Kim HJ and Feuer EJ
Analysis of trends in health data collected over time can be affected by instantaneous changes in coding that cause sudden increases/decreases, or "jumps," in data. Despite these sudden changes, the underlying continuous trends can present valuable information related to the changing risk profile of the population, the introduction of screening, new diagnostic technologies, or other causes. The joinpoint model is a well-established methodology for modeling trends over time using connected linear segments, usually on a logarithmic scale. Joinpoint models that ignore data jumps due to coding changes may produce biased estimates of trends. In this article, we introduce methods to incorporate a sudden discontinuous jump in an otherwise continuous joinpoint model. The size of the jump is either estimated directly (the Joinpoint-Jump model) or estimated using supplementary data (the Joinpoint-Comparability Ratio model). Examples using ICD-9/ICD-10 cause of death coding changes, and coding changes in the staging of cancer illustrate the use of these models.
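When the location of the coding change is known, the jump idea reduces to adding a level-shift indicator to a segmented linear model on the log scale. The sketch below fits a single known joinpoint and a known jump year by ordinary least squares on simulated data; the actual Joinpoint-Jump methodology also searches over joinpoint locations and performs model selection, which is omitted here:

```python
import numpy as np

def fit_jump_trend(years, log_rates, joinpoint, jump_year):
    """OLS fit of a two-segment linear trend (slope change at `joinpoint`)
    plus a level shift ("jump") at `jump_year`, on the log-rate scale."""
    t = (years - years[0]).astype(float)
    X = np.column_stack([
        np.ones_like(t),
        t,                                           # baseline slope
        np.maximum(t - (joinpoint - years[0]), 0.0), # slope change after joinpoint
        (years >= jump_year).astype(float),          # discontinuous jump
    ])
    beta, *_ = np.linalg.lstsq(X, log_rates, rcond=None)
    return beta  # [intercept, slope 1, slope change, jump size]

# Simulated mortality rates declining ~2%/yr, steeper after 1993, with an
# artificial 10% drop in 1999 (e.g., an ICD-9 to ICD-10 coding change).
years = np.arange(1985, 2011)
true = (-0.02 * (years - 1985)
        - 0.01 * np.maximum(years - 1993, 0)
        + np.log(0.9) * (years >= 1999))
rng = np.random.default_rng(0)
log_rates = np.log(50.0) + true + rng.normal(0.0, 0.005, len(years))
beta = fit_jump_trend(years, log_rates, joinpoint=1993, jump_year=1999)
print(f"estimated jump on log scale: {beta[3]:.3f} (true {np.log(0.9):.3f})")
```

The Joinpoint-Comparability Ratio variant would instead fix the jump size using an externally estimated comparability ratio rather than estimating it from the series itself.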
Measuring Trust in Medical Researchers: Adding Insights from Cognitive Interviews to Examine Agree-Disagree and Construct-Specific Survey Questions
Dykema J, Garbarski D, Wall IF and Edwards DF
While scales measuring subjective constructs historically rely on agree-disagree (AD) questions, recent research demonstrates that construct-specific (CS) questions clarify underlying response dimensions that AD questions leave implicit, and CS questions often yield higher measures of data quality. Given acknowledged issues with AD questions and certain established advantages of CS items, the evidence for the superiority of CS questions is more mixed than one might expect. We build on previous investigations by using cognitive interviewing to deepen understanding of AD and CS response processing and potential sources of measurement error. We randomized 64 participants to receive an AD or CS version of a scale measuring trust in medical researchers. We examine several indicators of data quality and cognitive response processing, including reliability, concurrent validity, recency, response latencies, and indicators of response processing difficulties (e.g., uncodable answers). Overall, results indicate reliability is higher for the AD scale, neither scale is more valid, and the CS scale is more susceptible to recency effects for certain questions. Results for response latencies and behavioral indicators provide evidence that the CS questions promote deeper processing. Qualitative analysis reveals five sources of difficulties with response processing that shed light on under-examined reasons why AD and CS questions can produce different results, with CS not always yielding higher measures of data quality than AD.
Weight Smoothing for Generalized Linear Models Using a Laplace Prior
Xia X and Elliott MR
When analyzing data sampled with unequal inclusion probabilities, correlations between the probability of selection and the sampled data can induce bias if the inclusion probabilities are ignored in the analysis. Weights equal to the inverse of the probability of inclusion are commonly used to correct possible bias. When weights are uncorrelated with the descriptive or model estimators of interest, highly disproportional sample designs resulting in large weights can introduce unnecessary variability, leading to an overall larger mean square error compared to unweighted methods. We describe an approach we term 'weight smoothing' that models the interactions between the weights and the estimators as random effects, reducing the root mean square error (RMSE) by shrinking interactions toward zero when such shrinkage is allowed by the data. This article adapts a flexible Laplace prior distribution for the hierarchical Bayesian model to gain a more robust bias-variance tradeoff than previous approaches using normal priors. Simulation and application suggest that under a linear model setting, weight-smoothing models with Laplace priors yield robust results when weighting is necessary, and provide considerable reduction in RMSE otherwise. In logistic regression models, estimates using weight-smoothing models with Laplace priors are robust, but with less gain in efficiency than in linear regression settings.
Synthetic Multiple-Imputation Procedure for Multistage Complex Samples
Zhou H, Elliott MR and Raghunathan TE
Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this requires dummy variables to represent strata as well as primary sampling units (PSUs) nested within each stratum in the imputation model. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when there are many strata in the sample design. Complexity only increases when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also consider an application to accommodate missing body mass index (BMI) data in the analysis of BMI percentiles using National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that are not well-handled by current MI techniques. Note that, while the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy for releasing multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.
Letter to the Editor: Probabilistic population forecasts for informed decision making
Bijak J, Alberts I, Alho J, Bryant J, Buettner T, Falkingham J, Forster JJ, Gerland P, King T, Onorante L, Keilman N, O'Hagan A, Owens D, Raftery A, Ševčíková H and Smith PW
Demographic forecasts are inherently uncertain. Nevertheless, an appropriate description of this uncertainty is a key underpinning of informed decision making. In recent decades various methods have been developed to describe the uncertainty of future populations and their structures, but the uptake of such tools amongst the practitioners of official population statistics has been lagging behind. In this letter we revisit the arguments for the practical uses of uncertainty assessments in official population forecasts, and address their implications for decision making. We discuss essential challenges, both for the forecasters and forecast users, and make recommendations for the official statistics community.
Collecting Survey Data during Armed Conflict
Axinn WG, Ghimire D and Williams NE
Surveys provide crucial information about the social consequences of armed conflict, but armed conflict can shape surveys in ways that limit their value. We use longitudinal survey data from throughout the recent armed conflict in Nepal to investigate the relationship between armed conflict events and survey response. The Chitwan Valley Family Study (CVFS) provides a rare window into survey data collection through intense armed conflict. The CVFS data reveal that with operational strategies tailored to the specific conflict, duration of the panel study is the main determinant of attrition from the study, just as in most longitudinal studies outside of conflict settings. Though minor relative to duration, different dimensions of armed conflict can affect survey response in opposing directions, with bombings in the local area reducing response rates but nationwide political events increasing response rates. This important finding demonstrates that survey data quality may be affected differently by various dimensions of armed conflict. Overall, CVFS response rates remained exceptionally high throughout the conflict. We use the CVFS experience to identify principles likely to produce higher quality surveys during periods of generalized violence and instability.
Experimental Studies of Disclosure Risk, Disclosure Harm, Topic Sensitivity, and Survey Participation
Couper MP, Singer E, Conrad FG and Groves RM
This article extends earlier work (Couper et al. 2008) that explores how survey topic and risk of identity and attribute disclosure, along with mention of possible harms resulting from such disclosure, affect survey participation. The first study uses web-based vignettes to examine respondents' expressed willingness to participate in the hypothetical surveys described, whereas the second study uses a mail survey to examine actual participation. Results are consistent with the earlier experiments. In general, we find that under normal survey conditions, specific information about the risk of identity or attribute disclosure influences neither respondents' expressed willingness to participate in a hypothetical survey nor their actual participation in a real survey. However, when the possible harm resulting from disclosure is made explicit, the effect on response becomes significant. In addition, sensitivity of the survey topic is a consistent and strong predictor of both expressed willingness to participate and actual participation.
Designing Input Fields for Non-Narrative Open-Ended Responses in Web Surveys
Couper MP, Kennedy C, Conrad FG and Tourangeau R
Web surveys often collect information such as frequencies, currency amounts, dates, or other items requiring short structured answers in an open-ended format, typically using text boxes for input. We report on several experiments exploring design features of such input fields. We find little effect of the size of the input field on whether frequency or dollar amount answers are well-formed or not. By contrast, the use of templates to guide formatting significantly improves the well-formedness of responses to questions eliciting currency amounts. For date questions (whether month/year or month/day/year), we find that separate input fields improve the quality of responses over single input fields, while drop boxes further reduce the proportion of ill-formed answers. Drop boxes also reduce completion time when the list of responses is short (e.g., months), but marginally increase completion time when the list is long (e.g., birth dates). These results suggest that non-narrative open questions can be designed to help guide respondents to provide answers in the desired format.
Keeping Track of Panel Members: An Experimental Test of a Between-Wave Contact Strategy
McGonagle K, Couper M and Schoeni RF
The Panel Study of Income Dynamics (PSID) is a nationally representative longitudinal survey of approximately 9,000 families and their descendants that has been ongoing since 1968. Since 1969, families have been sent a mailing asking them to update or verify their contact information to keep track of their whereabouts between waves. Having updated contact information prior to data collection is associated with fewer call attempts and refusal conversion efforts, less tracking, and lower attrition. Given these apparent advantages, a study was designed in advance of the 2009 PSID field effort to improve the response rate of the contact update mailing. Families were randomly assigned to the following conditions: mailing design (traditional versus new), $10 as a prepaid versus postpaid incentive, timing and frequency of the mailing (July 2008 versus October 2008 versus both times), and whether or not they were sent a study newsletter. This paper reports findings on response rates to the mailing and on production outcomes, including tracking rates and number of calls during 2009, across these conditions; examines whether the treatment effects differ by key characteristics of panel members, including likelihood of moving and anticipated difficulty of completing an interview; and provides recommendations for the use of contact update strategies in panel studies.
The Effects of a Between-Wave Incentive Experiment on Contact Update and Production Outcomes in a Panel Study
McGonagle KA, Schoeni RF and Couper MP
Since 1969, families participating in the U.S. Panel Study of Income Dynamics (PSID) have been sent a mailing asking them to update or verify their contact information in order to keep track of their whereabouts between waves. Having updated contact information prior to data collection is associated with fewer call attempts, less tracking, and lower attrition. Based on these advantages, two experiments were designed to increase response rates to the between-wave contact mailing. The first experiment implemented a new protocol that increased the overall response rate by 7-10 percentage points compared to the protocol in place for decades on the PSID. This article provides results from the second experiment, which examines the basic utility of the between-wave mailing, investigates how incentives affect cooperation with the update request and field effort, and attempts to identify an optimal incentive amount. Recommendations for the use of contact update strategies in panel studies are made.
A Note on the Effect of Data Clustering on the Multiple-Imputation Variance Estimator: A Theoretical Addendum to Lewis et al. (2014)
He Y, Shimizu I, Schappert S, Xu J, Beresovsky V, Khan D, Valverde R and Schenker N
Multiple imputation is a popular approach to handling missing data. Although it was originally motivated by survey nonresponse problems, it has been readily applied to other data settings. However, its general behavior still remains unclear when applied to survey data with complex sample designs, including clustering. Recently, Lewis et al. (2014) compared single- and multiple-imputation analyses for certain incomplete variables in the 2008 National Ambulatory Medical Care Survey, which has a nationally representative, multistage, and clustered sampling design. Their study results suggested that the increase in the variance estimate due to multiple imputation compared with single imputation largely disappears for estimates with large design effects. We complement their empirical research by providing some theoretical reasoning. We consider data sampled from an equally weighted, single-stage cluster design and characterize the process using a balanced, one-way normal random-effects model. Assuming that the missingness is completely at random, we derive analytic expressions for the within- and between-multiple-imputation variance estimators for the mean estimator, and thus conveniently reveal the impact of design effects on these variance estimators. We propose approximations for the fraction of missing information in clustered samples, extending previous results for simple random samples. We discuss some generalizations of this research and its practical implications for data release by statistical agencies.
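The within- and between-imputation variance components discussed above enter Rubin's standard combining rules, which the sketch below implements for a scalar estimand. The fraction-of-missing-information formula shown is the common large-M approximation (it omits the degrees-of-freedom adjustment):

```python
import statistics

def combine_mi(estimates, variances):
    """Rubin's rules for combining M multiply imputed analyses.

    Total variance T = W_bar + (1 + 1/M) * B, where W_bar is the mean
    within-imputation variance and B the between-imputation variance
    of the point estimates.
    """
    M = len(estimates)
    q_bar = statistics.mean(estimates)           # combined point estimate
    w_bar = statistics.mean(variances)           # within-imputation variance
    b = statistics.variance(estimates)           # between-imputation variance
    t = w_bar + (1 + 1 / M) * b
    fmi = (1 + 1 / M) * b / t                    # approximate fraction of missing info
    return q_bar, t, fmi

# Five imputed-data analyses of a mean, as (estimate, variance) pairs:
ests = [10.1, 10.4, 9.9, 10.2, 10.3]
vars_ = [0.25, 0.27, 0.24, 0.26, 0.25]
q, t, fmi = combine_mi(ests, vars_)
print(q, t, fmi)
```

The note's point can be seen directly here: when clustering inflates the within-imputation variance W_bar (a large design effect), B contributes relatively less to T, so the multiple-imputation variance penalty shrinks.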
Asking about Sexual Identity on the National Health Interview Survey: Does Mode Matter?
Dahlhamer JM, Galinsky AM and Joestl SS
Privacy, achieved through self-administered modes of interviewing, has long been assumed to be a necessary prerequisite for obtaining unbiased responses to sexual identity questions due to their potentially sensitive nature. This study uses data collected as part of a split-ballot field test embedded in the National Health Interview Survey (NHIS) to examine the association between survey mode (computer-assisted personal interviewing (CAPI) versus audio computer-assisted self-interviewing (ACASI)) and sexual minority identity reporting. Bivariate and multivariate quantitative analyses tested for differences in sexual minority identity reporting and nonresponse by survey mode, as well as for moderation of such differences by sociodemographic characteristics and interviewing environment. No significant main effects of interview mode on sexual minority identity reporting or nonresponse were found. Out of 35 comparisons, two significant mode effects emerged in subgroup analyses of sexual minority status, and one significant mode effect emerged in subgroup analyses of item nonresponse. We conclude that asking the NHIS sexual identity question using CAPI does not result in estimates that differ systematically and meaningfully from those produced using ACASI.