Using Auxiliary Information in Probability Survey Data to Improve Pseudo-Weighting in Nonprobability Samples: A Copula Model Approach
While probability sampling has been considered the gold standard of survey methods, nonprobability sampling is increasingly popular because of its convenience and low cost. However, nonprobability samples can lead to biased estimates because the underlying selection mechanism is unknown. In this article, we propose parametric and semiparametric approaches to integrate probability and nonprobability samples using common ancillary variables observed in both samples. In the parametric approach, the joint distribution of ancillary variables is assumed to follow a latent Gaussian copula model, which is flexible enough to accommodate both categorical and continuous variables. In contrast, the semiparametric approach requires no assumptions about the distribution of the ancillary variables. In addition, logistic regression is used to model the mechanism by which population units enter the nonprobability sample. The unknown parameters in the copula model are estimated through the pseudo maximum likelihood approach. The logistic regression model is estimated by maximizing the sample likelihood constructed from the nonprobability sample. The proposed method is evaluated in the context of estimating the population mean. Our simulation results show that the proposed method corrects the selection bias in the nonprobability sample by consistently estimating the underlying inclusion mechanism. By incorporating the additional information in the nonprobability sample, the combined method estimates the population mean more efficiently than using the probability sample alone. A real-data application is provided to illustrate the practical use of the proposed method.
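To make the pseudo-weighting idea concrete, the following minimal Python sketch simulates a selection-biased nonprobability sample and a small probability sample, fits a logistic selection model on the stacked samples, and inverts the estimated propensities into pseudo-weights. This is an illustrative stand-in for the general approach, not the article's copula/sample-likelihood estimator, and all variable names and parameter values are hypothetical.

```python
# Minimal pseudo-weighting sketch (illustrative only; the article's
# copula/sample-likelihood estimator is more involved).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
N = 100_000                                # population size (hypothetical)
x = rng.normal(size=N)                     # ancillary variable
y = 2.0 + 1.5 * x + rng.normal(size=N)     # outcome of interest

# Nonprobability sample: inclusion depends on x, so its mean of y is biased.
p_np = 1 / (1 + np.exp(-(-3.0 + 1.2 * x)))
in_np = rng.random(N) < p_np

# Probability sample: simple random sample with equal design weights.
idx_pr = rng.choice(N, size=2_000, replace=False)

# Stack the two samples and fit a logistic model for "came from the
# nonprobability sample" -- a common stand-in for the unknown selection model.
X = np.r_[x[in_np], x[idx_pr]].reshape(-1, 1)
z = np.r_[np.ones(in_np.sum()), np.zeros(len(idx_pr))]
prop = LogisticRegression().fit(X, z).predict_proba(x[in_np].reshape(-1, 1))[:, 1]

w = (1 - prop) / prop                      # pseudo-weights via odds inversion
print("naive nonprob mean:  ", y[in_np].mean())
print("pseudo-weighted mean:", np.average(y[in_np], weights=w))
print("true population mean:", y.mean())
```

The odds inversion (1 - p)/p makes the weights proportional to the reciprocal of the (unknown) nonprobability inclusion probability, which is why the weighted mean moves toward the population mean.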
Proxy Survey Cost Indicators in Interviewer-Administered Surveys: Are they Actually Correlated with Costs?
Survey design decisions are, by their very nature, tradeoffs between costs and errors. However, measuring costs is often difficult, and surveys are growing more complex. Many surveys require that cost information be available to make decisions during data collection. These complexities create new challenges for monitoring and understanding survey costs. Often, survey cost information lags behind the reporting of paradata. Furthermore, in some situations, measuring costs at the case level is difficult. Given the time lag in reporting cost information and the difficulty of assigning costs directly to cases, survey designers and managers have frequently turned to proxy indicators of cost. These proxy measures are often based on level-of-effort paradata; an example of such a proxy cost indicator is the number of attempts per interview. Unfortunately, little is known about how accurately these proxy indicators actually mirror the true costs of a survey. In this article, we examine a set of these proxy indicators across several surveys with different designs, including different modes of interview. We examine the strength of correlation between these indicators and two different measures of costs: the total project cost and total interviewer hours. This article provides some initial evidence about the quality of these proxies as surrogates for true costs using data from several different surveys with interviewer-administered modes (telephone, face to face) across three organizations (University of Michigan's Survey Research Center, Westat, US Census Bureau). We find that some indicators (total attempts, total contacts, total completes, sample size) are correlated (average correlation ∼0.60) with total costs across several surveys. These same indicators are strongly correlated (average correlation ∼0.82) with total interviewer hours. For survey components, three indicators (total attempts, sample size, and total miles) are strongly correlated with both total costs (average correlation ∼0.77) and total interviewer hours (average correlation ∼0.86).
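As a toy illustration of the correlation analysis described above, the snippet below computes Pearson correlations between survey-level effort indicators and two cost measures. All figures are fabricated for illustration and are not the article's data.

```python
# Toy illustration: correlating proxy effort indicators with costs across
# surveys. All numbers below are invented placeholders.
import pandas as pd

surveys = pd.DataFrame({
    "total_attempts":    [12000, 34000, 8000, 52000, 21000],
    "total_contacts":    [5000, 15000, 3500, 23000, 9000],
    "total_completes":   [900, 2600, 700, 4100, 1500],
    "sample_size":       [3000, 9000, 2400, 14000, 5200],
    "interviewer_hours": [4200, 13000, 2900, 20000, 7800],
    "total_cost_usd":    [510_000, 1_600_000, 350_000, 2_400_000, 950_000],
})

# Pearson correlation of each level-of-effort indicator with each cost measure.
for target in ["total_cost_usd", "interviewer_hours"]:
    print(f"correlations with {target}:")
    print(surveys.corr(numeric_only=True)[target].drop(target).round(2), "\n")
```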
Interviewer Effects on the Measurement of Physical Performance in a Cross-National Biosocial Survey
Biosocial surveys increasingly use interviewers to collect objective physical health measures (or "biomeasures") in respondents' homes. While interviewers play an important role, their heavy involvement can lead to unintended interviewer effects on the collected measurements. Such interviewer effects add uncertainty to population estimates and can lead to erroneous inferences. This study examines interviewer effects on the measurement of physical performance in a cross-national and longitudinal setting, using data from the Survey of Health, Ageing and Retirement in Europe (SHARE). The analyzed biomeasures exhibited moderate-to-large interviewer effects on the measurements, which varied across biomeasure types and across countries. Our findings demonstrate the need to better understand the origins of interviewer-related measurement error in biomeasure collection and to account for these errors in statistical analyses of biomeasure data.
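Interviewer effects of this kind are typically summarized as an intraclass correlation (ICC) from a random-intercept multilevel model. The sketch below shows one way to compute it with statsmodels on simulated data (SHARE microdata are not reproduced here, and the variable names are invented).

```python
# Sketch: quantifying interviewer effects as an intraclass correlation (ICC)
# from a random-intercept model fit to simulated biomeasure data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_int, n_resp = 50, 20                       # interviewers, respondents per interviewer
interviewer = np.repeat(np.arange(n_int), n_resp)
u = rng.normal(0, 1.0, n_int)                # interviewer-level random effects
grip = 35 + u[interviewer] + rng.normal(0, 3.0, len(interviewer))  # e.g., grip strength

df = pd.DataFrame({"grip": grip, "interviewer": interviewer})
m = smf.mixedlm("grip ~ 1", df, groups=df["interviewer"]).fit()
var_int = m.cov_re.iloc[0, 0]                # between-interviewer variance
var_res = m.scale                            # residual (within-interviewer) variance
print("ICC (share of variance due to interviewers):", var_int / (var_int + var_res))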
Incorporating Adaptive Survey Design in a Two-Stage National Web or Mail Mixed-Mode Survey: An Experiment in the American Family Health Study
This article presents the results of an adaptive design experiment in the recruitment of households and individuals for a two-stage national probability web or mail mixed-mode survey, the American Family Health Study (AFHS). In the screening stage, we based the adaptive design's subgroup differentiation on segmentation. We used tailored invitation materials for a subsample where a high proportion of the population was Hispanic and added a paper questionnaire to the initial mailing for a subsample with rural and older families. In the main-survey stage, the adaptive design targeted households where a member other than the screening respondent was selected for the survey. The adaptations included emailing and/or texting, an additional prepaid incentive, and seeking screening respondents' help in reminding the selected individuals. The main research questions are (i) whether the adaptive design improved survey production outcomes and (ii) whether combining adaptive design and postsurvey weighting adjustments improved survey estimates compared to performing postsurvey adjustments alone. Unfortunately, the adaptive designs did not improve the survey production outcomes. We found that the weighted AFHS estimates closely resemble those of a benchmark national face-to-face survey, the National Survey of Family Growth, although the adaptive design did not change survey estimates beyond the weighting adjustments. Nonetheless, our experiment yields useful insights about the implementation of adaptive design in a self-administered mail-recruit web or mail survey. We were able to identify subgroups with potentially lower response rates and distinctive characteristics, but it was challenging to develop effective protocol adaptations for these subgroups under the constraints of the two primary survey modes and the operational budget of the AFHS. In addition, for self-administered within-household selection, it was difficult to obtain contact information from, reach, and recruit selected household members who did not respond to the screening interview.
Joint Imputation of General Data
High-dimensional complex survey data of general structure (e.g., containing continuous, binary, categorical, and ordinal variables), such as the US Department of Defense's Health-Related Behaviors Survey (HRBS), often confound procedures designed to impute any missing survey data. Imputation by fully conditional specification (FCS) is often considered the state of the art for such datasets due to its generality and flexibility. However, FCS procedures contain a theoretical flaw that is exposed by the HRBS data: HRBS imputations created with FCS are shown to diverge across iterations of Markov chain Monte Carlo. Imputation by joint modeling lacks this flaw; however, current joint modeling procedures are neither general nor flexible enough to handle HRBS data. As such, we introduce an algorithm that efficiently and flexibly applies multiple imputation by joint modeling to data of general structure. This procedure draws imputations from a latent joint multivariate normal model that underpins the generally structured data and models the latent data via a sequence of conditional linear models, the predictors of which can be specified by the user. We perform rigorous evaluations of HRBS imputations created with the new algorithm and show that they are convergent and of high quality. Lastly, simulations verify that the proposed method performs well compared to existing algorithms, including FCS.
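A stripped-down version of the joint-modeling idea: treat each variable as a (possibly thresholded) view of a latent multivariate normal vector, and impute a missing coordinate by drawing from its conditional normal given the observed coordinates. The sketch below simplifies the article's algorithm considerably: it assumes one known covariance matrix, a single missing column, and, for brevity, uses the true latent values behind the binary variable (a full algorithm would sample those from a truncated normal).

```python
# Minimal sketch of joint-modeling imputation under a latent multivariate
# normal model (illustrative simplification, not the article's algorithm).
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0, 0.0])
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

z = rng.multivariate_normal(mu, Sigma, size=1000)   # latent data
x = z.copy()
x[:, 2] = (z[:, 2] > 0).astype(float)               # third variable observed as binary

miss = rng.random(1000) < 0.2
x[miss, 0] = np.nan                                  # column 0 missing at random
obs_idx, mis_idx = [1, 2], [0]

# Conditional normal: z_mis | z_obs ~ N(mu_c, Sigma_c)
S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
beta = S_mo @ np.linalg.inv(S_oo)
Sigma_c = Sigma[np.ix_(mis_idx, mis_idx)] - beta @ S_mo.T

for i in np.where(miss)[0]:
    # NOTE: z[i, obs_idx] uses the latent values directly; a real algorithm
    # would draw the latent score behind the observed binary value.
    mu_c = mu[mis_idx] + beta @ (z[i, obs_idx] - mu[obs_idx])
    x[i, 0] = rng.normal(mu_c[0], np.sqrt(Sigma_c[0, 0]))  # imputed draw

print("true mean of column 0:  ", z[:, 0].mean().round(3))
print("mean after imputation:  ", x[:, 0].mean().round(3))
```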
Visible Cash, a Second Incentive, and Priority Mail? An Experimental Evaluation of Mailing Strategies for a Screening Questionnaire in a National Push-to-Web/Mail Survey
In push-to-web surveys that use postal mail to contact sampled cases, participation is contingent on the mail being opened and the survey invitations being delivered. The design of the mailings is crucial to the success of the survey. We address the question of how to design invitation mailings that can grab potential respondents' attention and sway them to be interested in the survey in a short window of time. In the household screening stage of a national survey, the American Family Health Study, we experimentally tested three mailing design techniques for recruiting respondents: (1) a visible cash incentive in the initial mailing, (2) a second incentive for initial nonrespondents, and (3) use of Priority Mail in the nonresponse follow-up mailing. We evaluated the three techniques' overall effects on response rates as well as how they differentially attracted respondents with different characteristics. We found that all three techniques were useful in increasing the screening response rates, but there was little evidence that they had differential effects on sample subgroups that could help to reduce nonresponse biases.
Dependence-Robust Confidence Intervals for Capture-Recapture Surveys
Capture-recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Coronavirus Disease 2019 (COVID-19) infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When k capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a 2^k contingency table in which one element, the number of individuals appearing in none of the samples, remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., not point identified). Stringent assumptions about the dependence between samples are often used to achieve point identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a nontrivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
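For intuition on the profile-likelihood construction, the sketch below profiles the multinomial likelihood of a two-sample CRC study over the unknown population size N, assuming independent samples; the article's method instead replaces that independence assumption with weaker bounds on pairwise capture probabilities. The capture counts are hypothetical.

```python
# Sketch: profile-likelihood confidence set for population size N in a
# two-sample capture-recapture study under independence (a simplification
# of the article's partial-identification approach).
import numpy as np
from scipy.special import gammaln
from scipy.stats import chi2

n_only1, n_only2, n_both = 400, 300, 60          # hypothetical capture counts
n1, n2 = n_only1 + n_both, n_only2 + n_both

def profile_loglik(N):
    p1, p2 = n1 / N, n2 / N                      # MLEs of capture probs for fixed N
    counts = np.array([n_both, n_only1, n_only2, N - n1 - n2 + n_both])
    probs = np.array([p1*p2, p1*(1-p2), (1-p1)*p2, (1-p1)*(1-p2)])
    return gammaln(N + 1) - gammaln(counts + 1).sum() + (counts * np.log(probs)).sum()

Ns = np.arange(n1 + n2 - n_both, 20_000)         # candidate population sizes
ll = np.array([profile_loglik(N) for N in Ns])
inside = 2 * (ll.max() - ll) <= chi2.ppf(0.95, df=1)   # likelihood-ratio inversion
print("Lincoln-Petersen point estimate:", round(n1 * n2 / n_both))
print("95% profile-likelihood interval:", Ns[inside].min(), "-", Ns[inside].max())
```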
Estimating Web Survey Mode and Panel Effects in a Nationwide Survey of Alcohol Use
Random-digit dialing (RDD) telephone surveys are challenged by declining response rates and increasing costs. Many surveys that were traditionally conducted via telephone are seeking cost-effective alternatives, such as address-based sampling (ABS) with self-administered web or mail questionnaires. At a fraction of the cost of both telephone and ABS surveys, opt-in web panels are an attractive alternative. The 2019-2020 National Alcohol Survey (NAS) employed three methods: (1) an RDD telephone survey (the traditional NAS method); (2) an ABS push-to-web survey; and (3) an opt-in web panel. The study reported here evaluated differences among the three data-collection methods, which we refer to as "mode effects," on alcohol consumption and health topics. To evaluate mode effects, multivariate regression models were developed to predict these characteristics, and the presence of a mode effect on each outcome was determined by the significance of the three-level effect (RDD-telephone, ABS-web, opt-in web panel) in each model. Those results were then used to adjust for mode effects and produce a "telephone-equivalent" estimate for the ABS and panel data sources. The study found that ABS-web and RDD were similar for most estimates but differed for sensitive questions, including getting drunk and experiencing depression. The opt-in web panel exhibited more differences from the other two survey modes. One notable example is the reporting of drinking alcohol at least 3-4 times per week, which was 21 percent for RDD-phone, 24 percent for ABS-web, and 34 percent for the opt-in web panel. The regression models adjust for mode effects, improving comparability with past surveys conducted by telephone; however, they also increase the variance of the estimates. This method of adjusting for mode effects has broad applications to mode and sample transitions throughout the survey research industry.
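The mode adjustment can be sketched as a regression with a three-level mode factor followed by counterfactual prediction: fit the model, set every respondent's mode to RDD telephone, and average the predictions. The code below illustrates this on simulated data; the variable names and effect sizes are invented, not the NAS codebook.

```python
# Sketch of a regression-based "telephone-equivalent" mode adjustment on
# simulated data (all names and magnitudes are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 3000
df = pd.DataFrame({
    "mode": rng.choice(["rdd_phone", "abs_web", "optin_panel"], size=n),
    "age": rng.integers(18, 80, size=n),
})
mode_shift = {"rdd_phone": 0.0, "abs_web": 0.05, "optin_panel": 0.20}  # built-in mode effects
df["drinks_weekly"] = (0.2 + df["mode"].map(mode_shift)
                       - 0.001 * df["age"] + rng.normal(0, 0.3, n))

# Three-level mode factor with RDD telephone as the reference category.
m = smf.ols("drinks_weekly ~ C(mode, Treatment('rdd_phone')) + age", df).fit()
adjusted = df.assign(mode="rdd_phone")           # counterfactual: everyone by phone
print("unadjusted mean:          ", df["drinks_weekly"].mean().round(3))
print("telephone-equivalent mean:", m.predict(adjusted).mean().round(3))
```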
The Effects of a Targeted "Early Bird" Incentive Strategy on Response Rates, Fieldwork Effort, and Costs in a National Panel Study
Adaptive survey designs are increasingly used by survey practitioners to counteract ongoing declines in household survey response rates and manage rising fieldwork costs. This paper reports findings from an evaluation of an early-bird incentive (EBI) experiment targeting high-effort respondents who participated in the 2019 wave of the US Panel Study of Income Dynamics. We identified a subgroup of high-effort respondents at risk of nonresponse based on their prior-wave fieldwork effort and randomized them to a treatment offering an extra time-delimited monetary incentive for completing their interview within the first month of data collection (treatment group; n = 800) or the standard study incentive (control group; n = 400). In recent waves, the costs of the protracted fieldwork needed to complete interviews with high-effort cases, in the form of repeated interviewer contact attempts plus an increased incentive near the close of data collection, have been extremely high. By incentivizing early participation and reducing the number of interviewer contact attempts and fieldwork days needed to complete the interview, our goal was to manage both nonresponse and survey costs. We found that the EBI treatment increased response rates and reduced fieldwork effort and costs compared to the control group. We review several key findings and limitations, discuss their implications, and identify next steps for future research.
Deriving Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis Versus Literature Review
Responsive survey design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily predictions of response propensity for all active sampled cases are among the most important quantities for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design quantities, such as predicted response propensities, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and the derivation of informative prior distributions is required for these approaches to be effective. In this paper, we evaluate the ability of two approaches to deriving prior distributions for the coefficients defining daily response propensity models to improve predictions of daily response propensity in a real data collection employing RSD. The first approach involves analyses of historical data from the same survey, and the second approach involves literature review. We find that Bayesian methods based on these two approaches result in higher-quality predictions of response propensity than more standard approaches ignoring prior information. This is especially true during the early-to-middle periods of data collection, when survey managers using RSD often consider interventions.
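The role of an informative prior can be illustrated with a logistic response-propensity model whose coefficients are given Gaussian priors centered on historical estimates. The sketch below uses a simple MAP (penalized-likelihood) fit as a stand-in for full posterior computation; all quantities, including the prior means, are simulated rather than taken from the paper.

```python
# Sketch: daily response-propensity prediction with informative Gaussian
# priors on logistic coefficients (MAP fit as a stand-in for full Bayes).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
X = np.c_[np.ones(n), rng.normal(size=(n, 2))]   # intercept, prior effort, day in field
beta_true = np.array([-1.0, 0.8, -0.4])
y = rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))

prior_mean = np.array([-1.2, 0.6, -0.3])         # e.g., from a prior wave's model
prior_sd = np.array([0.5, 0.5, 0.5])

def neg_log_posterior(b):
    eta = X @ b
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))        # Bernoulli log-likelihood
    logprior = -0.5 * np.sum(((b - prior_mean) / prior_sd) ** 2)  # Gaussian prior
    return -(loglik + logprior)

beta_map = minimize(neg_log_posterior, prior_mean).x
new_case = np.array([1.0, 0.5, 2.0])             # hypothetical active case
print("MAP coefficients:", beta_map.round(2))
print("predicted daily propensity:", 1 / (1 + np.exp(-new_case @ beta_map)))
```

With little data early in the field period, the prior dominates and stabilizes the predictions, which matches the paper's finding that the gains are largest in the early-to-middle periods of data collection.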
Inference from Nonrandom Samples Using Bayesian Machine Learning
We consider inference from nonrandom samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and in the target population, with survey inference being a special case. We propose a regularized prediction approach that predicts the outcomes in the population using a large number of auxiliary variables, such that the ignorability assumption is reasonable and the Bayesian framework is straightforward for quantifying uncertainty. We further extend the approach by estimating the propensity score for a unit to be included in the sample and including it as an additional predictor in the machine learning models. We find in simulation studies that the regularized predictions using soft Bayesian additive regression trees yield valid inference for the population means, with coverage rates close to the nominal levels. We demonstrate the application of the proposed methods using two real-data applications, one in a survey and one in an epidemiologic study.
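A minimal sketch of the regularized-prediction strategy follows, with a gradient-boosting regressor standing in for soft Bayesian additive regression trees (so it yields point predictions only, without the Bayesian uncertainty quantification the paper emphasizes). The population, selection mechanism, and outcome model are all simulated.

```python
# Sketch: predict Y for every population unit from a nonrandom sample and
# average the predictions (boosting as a stand-in for soft BART).
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(11)
N = 50_000
X_pop = rng.normal(size=(N, 10))                 # auxiliary variables, whole population
y_pop = X_pop[:, 0] ** 2 + X_pop[:, 1] + rng.normal(size=N)

# Nonrandom sample: selection depends on X[:, 0], so the sample mean is biased.
p_sel = 1 / (1 + np.exp(-(X_pop[:, 0] - 1)))
in_s = rng.random(N) < p_sel

model = HistGradientBoostingRegressor().fit(X_pop[in_s], y_pop[in_s])
print("biased sample mean:  ", y_pop[in_s].mean().round(3))
print("prediction estimate: ", model.predict(X_pop).mean().round(3))
print("true population mean:", y_pop.mean().round(3))
```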
An Experimental Evaluation of Two Approaches for Improving Response to Household Screening Efforts in National Mail/Web Surveys
Survey researchers have carefully modified their data collection operations for various reasons, including the rising costs of data collection and the ongoing Coronavirus disease (COVID-19) pandemic, both of which have made in-person interviewing difficult. For large national surveys that require household (HH) screening to determine survey eligibility, cost-efficient screening methods that do not include in-person visits need additional evaluation and testing. A new study, the American Family Health Study (AFHS), recently initiated data collection with a national probability sample, using a sequential mixed-mode mail/web protocol for push-to-web US HH screening (targeting persons aged 18-49 years). To better understand optimal approaches for this type of national screening effort, we embedded two randomized experiments in the AFHS data collection. The first tested the use of bilingual respondent materials, in which mailed invitations to the screener were sent in both English and Spanish to 50 percent of addresses with a high predicted likelihood of having a Spanish speaker and to 10 percent of all other addresses. We found that the bilingual approach did not increase the response rate of high-likelihood Spanish-speaking addresses but, consistent with prior work, it increased the proportion of eligible Hispanic respondents identified among completed screeners, especially among addresses predicted to have a high likelihood of having Spanish speakers. The second tested a form of nonresponse follow-up in which a subsample of active sampled HHs that had not yet responded to the screening invitations was sent a priority mailing with a $5 incentive, adding to the $2 incentive provided to all sampled HHs in the initial screening invitation. We found this approach to be quite valuable for increasing the screening survey response rate.
Multiple Imputation with Massive Data: An Application to the Panel Study of Income Dynamics
Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data in this survey are currently handled with traditional hot-deck methods because of their simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faces operational challenges. We use a sequential regression/chained-equations approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and we compare analyses of the resulting imputed data with those from the current hot-deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI improves on the existing hot-deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between household net worth and sociodemographic factors, and it facilitates general-purpose completed-data analyses. MI incorporates highly predictive covariates into the imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
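The workflow can be sketched with scikit-learn's IterativeImputer as an accessible stand-in for IVEware's sequential-regression imputation (IVEware itself is standalone/SAS-based software), including Rubin's combining rules and a log transform for the skewed wealth variable. The data below are simulated.

```python
# Sketch: multiple imputation by chained equations plus Rubin's rules,
# with IterativeImputer standing in for a sequential-regression imputer.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 1000
wealth = np.exp(rng.normal(11, 1, n))            # skewed wealth component
income = 0.05 * wealth + rng.normal(0, 2000, n)
X = np.c_[np.log(wealth), income]                # impute on the log scale for skewness
X[rng.random(n) < 0.3, 0] = np.nan               # 30% of wealth values missing

M = 20                                           # number of imputations
means, vars_ = [], []
for m in range(M):
    imp = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
    w_imp = np.exp(imp[:, 0])                    # back-transform to dollars
    means.append(w_imp.mean())                   # completed-data estimate
    vars_.append(w_imp.var(ddof=1) / n)          # its within-imputation variance

qbar = np.mean(means)                            # Rubin's combining rules
T = np.mean(vars_) + (1 + 1 / M) * np.var(means, ddof=1)
print(f"MI estimate of mean wealth: {qbar:,.0f} (SE {np.sqrt(T):,.0f})")
```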
Using Capture-Recapture Methodology to Enhance Precision of Representative Sampling-Based Case Count Estimates
The application of serial principled sampling designs for diagnostic testing is often viewed as an ideal approach to monitoring prevalence and case counts of infectious or chronic diseases. Considering logistics and the need for timeliness and conservation of resources, surveillance efforts can generally benefit from creative designs and accompanying statistical methods to improve the precision of sampling-based estimates and reduce the size of the necessary sample. One option is to augment the analysis with available data from other surveillance streams that identify cases from the population of interest over the same timeframe, but may do so in a highly nonrepresentative manner. We consider monitoring a closed population (e.g., a long-term care facility, patient registry, or community), and encourage the use of capture-recapture methodology to produce an alternative case total estimate to the one obtained by principled sampling. With care in its implementation, even a relatively small simple or stratified random sample not only provides its own valid estimate, but provides the only fully defensible means of justifying a second estimate based on classical capture-recapture methods. We initially propose weighted averaging of the two estimators to achieve greater precision than can be obtained using either alone, and then show how a novel single capture-recapture estimator provides a unified and preferable alternative. We develop a variant on a Dirichlet-multinomial-based credible interval to accompany our hybrid design-based case count estimates, with a view toward improved coverage properties. Finally, we demonstrate the benefits of the approach through simulations designed to mimic an acute infectious disease daily monitoring program or an annual surveillance program to quantify new cases within a fixed patient registry.
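The initial weighted-averaging proposal reduces, in its simplest form, to an inverse-variance-weighted combination of the two case-count estimates. The short sketch below uses hypothetical numbers and assumes the two estimators are independent; the article ultimately prefers a unified single capture-recapture estimator over this two-step average.

```python
# Sketch: inverse-variance weighted average of a design-based case-count
# estimate and a capture-recapture estimate (hypothetical numbers).
import numpy as np

est_survey, se_survey = 412.0, 45.0     # from the random-sample estimator
est_crc, se_crc = 376.0, 30.0           # from the capture-recapture estimator

w1, w2 = 1 / se_survey**2, 1 / se_crc**2
combined = (w1 * est_survey + w2 * est_crc) / (w1 + w2)
se_combined = np.sqrt(1 / (w1 + w2))    # valid only if the estimators are independent

print(f"combined case count: {combined:.0f} (SE {se_combined:.1f})")
```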
A Simple Question Goes a Long Way: A Wording Experiment on Bank Account Ownership
Ownership of a bank account is an objective measure and should be relatively easy to elicit via survey questions. Yet, depending on the interview mode, the wording of the question and its placement within the survey may influence respondents' answers. The Health and Retirement Study (HRS) asset module, as administered online to members of the Understanding America Study (UAS), yielded substantially lower rates of reported bank account ownership than either a single question on ownership in the Current Population Survey (CPS) or the full asset module administered to HRS panelists (both interviewer-administered surveys). We designed and implemented an experiment in the UAS comparing the original HRS question eliciting bank account ownership with two alternative versions that were progressively simplified. We document strong evidence that the original question leads to systematic underestimation of bank account ownership. In contrast, the proportion of bank account owners obtained from the simplest alternative version of the question is very similar to the population benchmark estimate. We investigate treatment-effect heterogeneity by cognitive ability and financial literacy. We find that questionnaire simplification affects the responses of individuals with higher cognitive ability substantially less than those of individuals with lower cognitive ability. Our results suggest that high-quality survey data start with asking the right questions, which should be as simple and precise as possible and carefully adapted to the mode of interview.
A Semiparametric Multiple Imputation Approach to Fully Synthetic Data for Complex Surveys
Data synthesis is an effective statistical approach for reducing data disclosure risk. Generating fully synthetic data might minimize such risk, but its modeling and application can be difficult for data from large, complex surveys. This article extends two-stage imputation to simultaneously impute item missing values and generate fully synthetic data. A new combining rule for making inferences using data generated in this manner is developed. Two semiparametric missing-data imputation models were adapted to generate fully synthetic data for a skewed continuous variable and a sparse binary variable, respectively. The proposed approach was evaluated using simulated data and real longitudinal data from the Health and Retirement Study. It was also compared, using real data, with two existing synthesis approaches: (1) parametric regression models and (2) nonparametric Classification and Regression Trees, each as implemented in existing software for R. The results show that high data utility is maintained for a wide variety of descriptive and model-based statistics under the proposed strategy, which also performs better than the existing methods for sophisticated analyses such as factor analysis.
Interviewer Effects in Live Video and Prerecorded Video Interviewing
Live video (LV) communication tools (e.g., Zoom) have the potential to provide survey researchers with many of the benefits of in-person interviewing while greatly reducing data collection costs, given that interviewers do not need to travel and make in-person visits to sampled households. The COVID-19 pandemic exposed the vulnerability of in-person data collection to public health crises, forcing survey researchers to explore remote data collection modes, such as LV interviewing, that seem likely to yield high-quality data without in-person interaction. Given the potential benefits of these technologies, the operational and methodological aspects of video interviewing have started to receive research attention from survey methodologists. Although it is remote, video interviewing still involves respondent-interviewer interaction, which introduces the possibility of interviewer effects. No research to date has evaluated this potential threat to the quality of the data collected in video interviews. This research note presents an evaluation of interviewer effects in a recent experimental study of alternative approaches to video interviewing, including both LV interviewing and the use of prerecorded videos of the same interviewers asking questions embedded in a web survey ("prerecorded video" interviewing). We find little evidence of significant interviewer effects with either approach, which is a promising result. We also find that when interviewer effects were present, they tended to be slightly larger in the LV approach, as would be expected given its interactive nature. We conclude with a discussion of the implications of these findings for future research using video interviewing.
Challenges of Virtual RDS for Recruitment of Sexual Minority Women for a Behavioral Health Study
Respondent-driven sampling (RDS) is an approach commonly used to recruit nonprobability samples of rare and hard-to-find populations. The purpose of this study was to explore the utility of phone- and web-based RDS methodology to sample sexual minority women (SMW) for participation in a telephone survey. Key features included (1) utilizing a national probability survey sample to select seeds; (2) web-based recruitment with emailed coupons; and (3) virtual processes for orienting, screening, and scheduling potential participants for computer-assisted telephone interviews. Rather than resulting in a large, diverse sample of SMW, only a small group of randomly selected women completed the survey and agreed to recruit their peers, and very few women recruited even one participant. Only seeds from the more recent of two waves of the probability study generated new SMW recruits. Three RDS attempts to recruit SMW over several years, together with findings from brief qualitative interviews, revealed four key challenges to successful phone- and web-based RDS with this population. First, population-based sampling precludes selecting seeds based on the participant characteristics that are often used in RDS. Second, methods that distance prospective participants from the research team may impede development of relationships, investment in the study, and motivation to participate. Third, recruitment for telephone surveys may be impeded by the multiple burdens placed on seeds and recruits (e.g., survey length, understanding the study and the RDS process). Finally, many seeds from a population-based sample may be needed, which is generally not feasible when working with a limited pool of potential seeds. This method may yield short recruitment chains, which would not meet key RDS assumptions for approximating a probability sample. In conclusion, potential challenges to using RDS in studies of SMW, particularly those using virtual approaches, should be considered.
On the Robustness of Respondent-Driven Sampling Estimators to Measurement Error
Respondent-driven sampling (RDS) is a popular method for conducting surveys in hard-to-reach populations, one in which strong assumptions are required to make valid statistical inferences. In this paper, we investigate the assumption that network degrees are measured accurately by the RDS survey and find that significant measurement error is likely present in typical studies. We prove that most RDS estimators remain consistent under an imperfect measurement model, with little to no added bias, though the variance of the estimators does increase.
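The consistency claim is easy to probe by simulation: the RDS-II (Volz-Heckathorn) estimator weights sampled respondents by the reciprocal of their reported degree, and multiplicative reporting noise that is independent of the trait and true degree largely cancels between numerator and denominator. A small sketch under those assumptions (not the paper's formal measurement model):

```python
# Sketch: RDS-II estimator with true vs. noisily reported network degrees.
import numpy as np

rng = np.random.default_rng(13)
n = 2000
degree = rng.integers(1, 50, size=n).astype(float)   # true network degrees
prob = degree / degree.sum()                         # sampling ~ degree (RDS-II assumption)
idx = rng.choice(n, size=500, replace=False, p=prob)
y = (degree < 20).astype(float)                      # trait correlated with degree

def rds2(y_s, d_s):
    # Volz-Heckathorn estimator: inverse-degree weighted mean
    return np.sum(y_s / d_s) / np.sum(1 / d_s)

d_reported = degree[idx] * rng.lognormal(0, 0.5, size=len(idx))  # reporting error
print("true prevalence:      ", y.mean().round(3))
print("RDS-II, true degrees: ", rds2(y[idx], degree[idx]).round(3))
print("RDS-II, noisy degrees:", rds2(y[idx], d_reported).round(3))
```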
Dealing with Inaccurate Measures of Size in Two-Stage Probability Proportional to Size Sample Designs: Applications in African Household Surveys
The units at the early stages of multi-stage area samples are generally sampled with probabilities proportional to their estimated sizes (PPES). With such a design, an overall equal probability (EP) sample design would yield a constant number of final stage units from each final stage cluster if the measures of size used in the PPES selection at each sampling stage were directly proportional to the number of final stage units. However, there are often sizable relative differences between the measures of size used in the PPES selections and the number of final stage units. Two common approaches for dealing with these differences are: (1) to retain a self-weighting sample design, allowing the sample sizes to vary across the sampled primary sampling units (PSUs) and (2) to retain the fixed sample size in each PSU and to compensate for the unequal selection probabilities by weighting adjustments in the analyses. This article examines these alternative designs in the context of two-stage sampling in which PSUs are sampled with PPES at the first stage, and an equal probability sample of final stage units is selected from each sampled PSU at the second stage. Two-stage sample designs of this type are used for household surveys in many countries. The discussion is illustrated with data from the Population-based HIV Impact Assessment surveys that were conducted using this design in several African countries.
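The weight consequences of inaccurate measures of size can be seen in a few lines: with a fixed second-stage take, the overall inclusion probability mixes the erroneous MOS (stage 1) with the true cluster size (stage 2), so the design weights vary across PSUs instead of being constant. The sketch below uses hypothetical numbers.

```python
# Sketch: two-stage PPES inclusion probabilities when the measure of size
# (MOS) is inaccurate, with a fixed take of b households per sampled PSU.
import numpy as np

rng = np.random.default_rng(17)
M, n_psu, b = 200, 20, 25                     # PSUs in frame, PSUs sampled, take per PSU
true_size = rng.integers(100, 1000, size=M)   # actual number of households per PSU
mos = true_size * rng.uniform(0.6, 1.4, M)    # outdated/inaccurate measures of size

pi1 = n_psu * mos / mos.sum()                 # stage 1: PPES inclusion probability
pi2 = b / true_size                           # stage 2: equal-probability take
w = 1 / (pi1 * pi2)                           # overall design weights

# With exact MOS the weights would be constant (a self-weighting design);
# inaccurate MOS spreads them out, inflating variance in analyses.
print("weight coefficient of variation:", round(w.std() / w.mean(), 3))
```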
Comparing Methods for Assessing Reliability
The usual method for assessing the reliability of survey data has been to conduct reinterviews a short interval (such as one to two weeks) after an initial interview and to use these data to estimate relatively simple statistics, such as gross difference rates (GDRs). More sophisticated approaches have also been used to estimate reliability, including estimates from multitrait-multimethod experiments, models applied to longitudinal data, and latent class analyses. To our knowledge, no prior study has systematically compared these different methods for assessing reliability. The Population Assessment of Tobacco and Health Reliability and Validity (PATH-RV) Study, conducted on a national probability sample, assessed the reliability of answers to the Wave 4 questionnaire of the PATH Study. Respondents in the PATH-RV were interviewed twice, about two weeks apart. We examined whether the classic survey approach yielded different conclusions from the more sophisticated methods. We also examined two methods for assessing problems with survey questions, along with item nonresponse rates and response times, to see how strongly these were related to the different reliability estimates. We found that kappa was highly correlated with both GDRs and over-time correlations, but the latter two statistics were less highly correlated with each other, particularly for adult respondents; estimates from longitudinal analyses of the same items in the main PATH Study were also highly correlated with the traditional reliability estimates. The latent class analysis results, based on fewer items, likewise showed a high level of agreement with the traditional measures. The other methods and indicators had at best weak relationships with the reliability estimates derived from the reinterview data. Although the Question Understanding Aid seems to tap a different factor from the other measures, for adult respondents it did predict item nonresponse and response latencies and thus may be a useful adjunct to the traditional measures.
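For reference, the classic reinterview statistics discussed above (the GDR, kappa, and the over-time correlation) can all be computed from a simple interview-reinterview cross-classification, as in this simulated sketch; the error rate and prevalence are invented.

```python
# Sketch: GDR, Cohen's kappa, and over-time correlation from simulated
# interview-reinterview yes/no answers.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(19)
n = 1500
truth = rng.random(n) < 0.3                  # stable true status
flip = lambda t: np.where(rng.random(n) < 0.1, ~t, t)   # 10% response error
t1, t2 = flip(truth), flip(truth)            # interview and reinterview answers

gdr = np.mean(t1 != t2)                      # share giving different answers
kappa = cohen_kappa_score(t1, t2)            # agreement beyond chance
r = np.corrcoef(t1, t2)[0, 1]                # over-time correlation (phi, here)
print(f"GDR = {gdr:.3f}, kappa = {kappa:.3f}, over-time r = {r:.3f}")
```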