Crud (Re)Defined
The idea that in behavioral research everything correlates with everything else remained a niche topic in the scientific literature for more than half a century. With the increasing availability of large data sets in psychology, the "crud" factor has, however, become more relevant than ever before. When referenced in empirical work, it is often used by researchers to discount minute (but statistically significant) effects that are deemed too small to be considered meaningful. This review tracks the history of the crud factor and examines how its use in the psychological- and behavioral-science literature has developed to this day. We highlight a common and deep-seated lack of understanding about what the crud factor is and discuss whether it can be proven to exist or estimated and how it should be interpreted. This lack of understanding makes the crud factor a convenient tool for psychologists to use to disregard unwanted results, even though the presence of a crud factor should be a large inconvenience for the discipline. To inspire a concerted effort to take the crud factor more seriously, we clarify the definitions of important concepts, highlight current pitfalls, and pose questions that need to be addressed to ultimately improve understanding of the crud factor. Such work will be necessary to develop the crud factor into a useful concept encouraging improved psychological research.
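As a minimal illustration of the statistical point at stake (not drawn from the article itself), the following Python sketch shows how, with a sufficiently large sample, even a trivially small population correlation is almost guaranteed to reach statistical significance. The sample size, effect size, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000                      # large sample, as in modern big-data studies
true_r = 0.02                    # a "crud-sized" population correlation

# Simulate two variables that share only a tiny amount of variance.
x = rng.normal(size=n)
y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2g}")   # r is tiny, yet p is typically far below .05
```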
Improving Practices for Selecting a Subset of Important Predictors in Psychology: An Application to Predicting Pain
Frequently, researchers in psychology are faced with the challenge of narrowing down a large set of predictors to a smaller subset. There are a variety of ways to do this, but commonly it is done by choosing predictors with the strongest bivariate correlations with the outcome. However, when predictors are correlated, bivariate relationships may not translate into multivariate relationships. Further, any attempts to control for multiple testing are likely to result in extremely low power. Here we introduce a Bayesian variable-selection procedure frequently used in other disciplines, stochastic search variable selection (SSVS). We apply this technique to choosing the best set of predictors of the perceived unpleasantness of an experimental pain stimulus from among a large group of sociocultural, psychological, and neurobiological (functional MRI) individual-difference measures. Using SSVS provides information about which variables predict the outcome, controlling for uncertainty in the other variables of the model. This approach yields new, useful information to guide the choice of relevant predictors. We have provided Web-based open-source software for performing SSVS and visualizing the results.
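The article points readers to its own web-based open-source software; purely as a rough sketch of the spike-and-slab idea behind SSVS (not the authors' implementation), one could specify a comparable model in PyMC as below. The priors, scales, simulated data, and variable names are all illustrative assumptions.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] + rng.normal(size=n)     # only the first predictor truly matters

with pm.Model() as ssvs:
    # Inclusion indicators: gamma_j = 1 means predictor j is "in" the model.
    gamma = pm.Bernoulli("gamma", p=0.5, shape=p)
    # Spike-and-slab scale: near-zero prior SD when excluded, diffuse when included.
    sd = pm.math.switch(gamma, 1.0, 0.01)
    beta = pm.Normal("beta", mu=0.0, sigma=sd, shape=p)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    pm.Normal("y", mu=pm.math.dot(X, beta), sigma=sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)

# Posterior inclusion probabilities: how often each predictor was selected,
# averaging over uncertainty in the other variables in the model.
print(idata.posterior["gamma"].mean(dim=("chain", "draw")).values)
```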
Recommendations for Increasing the Transparency of Analysis of Preexisting Data Sets
Secondary data analysis, or the analysis of preexisting data, provides a powerful tool for the resourceful psychological scientist. Never has this been more true than now, when technological advances enable both sharing data across labs and continents and mining large sources of preexisting data. However, secondary data analysis is easily overlooked as a key domain for developing new open-science practices or improving analytic methods for robust data analysis. In this article, we provide researchers with the knowledge necessary to incorporate secondary data analysis into their methodological toolbox. We explain that secondary data analysis can be used for either exploratory or confirmatory work, and can be either correlational or experimental, and we highlight the advantages and disadvantages of this type of research. We describe how transparency-enhancing practices can improve and alter interpretations of results from secondary data analysis and discuss approaches that can be used to improve the robustness of reported results. We close by suggesting ways in which scientific subfields and institutions could address and improve the use of secondary data analysis.
On the Interpretation of Parameters in Multivariate Multilevel Models Across Different Combinations of Model Specification and Estimation
The increasing availability of software with which to estimate multivariate multilevel models (also called multilevel structural equation models) makes it easier than ever before to leverage these powerful techniques to answer research questions at multiple levels of analysis simultaneously. However, interpretation can be tricky given that different choices for centering model predictors can lead to different versions of what appear to be the same parameters; this is especially the case when the predictors are latent variables created through model-estimated variance components. A further complication is a recent change to Mplus (Version 8.1), a popular software program for estimating multivariate multilevel models, in which the selection of Bayesian estimation instead of maximum likelihood results in different lower-level predictors when random slopes are requested. This article provides a detailed explication of how the parameters of multilevel models differ as a function of the analyst's decisions regarding centering and the form of lower-level predictors (i.e., observed or latent), the method of estimation, and the variant of program syntax used. After explaining how different methods of centering lower-level observed predictor variables result in different higher-level effects within univariate multilevel models, this article uses simulated data to demonstrate how these same concepts apply in specifying multivariate multilevel models with latent lower-level predictor variables. Complete data, input, and output files for all of the example models have been made available online to further aid readers in accurately translating these central tenets of multivariate multilevel modeling into practice.
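The univariate case described above, in which group-mean centering of an observed lower-level predictor separates within- and between-cluster effects, can be sketched as follows. This is a minimal illustration using statsmodels with hypothetical variable names; it does not reproduce the Mplus-specific latent-centering behavior the article examines.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
clusters = np.repeat(np.arange(50), 20)              # 50 groups, 20 observations each
between = rng.normal(size=50)[clusters]              # between-cluster part of x
x = between + rng.normal(size=clusters.size)         # observed lower-level predictor
y = 0.3 * (x - between) + 0.8 * between + rng.normal(size=clusters.size)

df = pd.DataFrame({"y": y, "x": x, "cluster": clusters})
df["x_between"] = df.groupby("cluster")["x"].transform("mean")   # observed cluster means
df["x_within"] = df["x"] - df["x_between"]                        # group-mean centered

# A random-intercept model with separate within- and between-cluster slopes.
model = smf.mixedlm("y ~ x_within + x_between", df, groups=df["cluster"]).fit()
print(model.summary())
```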
A Practical Guide to Variable Selection in Structural Equation Models with Regularized MIMIC Models
Methodological innovations have allowed researchers to consider increasingly sophisticated statistical models that are better in line with the complexities of real-world behavioral data. However, despite these powerful new analytic approaches, sample sizes may not always be sufficiently large to deal with the increase in model complexity. This poses a difficult modeling scenario that entails large models with a comparably limited number of observations given the number of parameters. We here describe a particular strategy for overcoming this challenge, called regularization. Regularization, a method to penalize model complexity during estimation, has proven a viable option for estimating parameters in this small n, large p setting, but has so far mostly been used in linear regression models. Here we show how to integrate regularization within structural equation models, a popular analytic approach in psychology. We first describe the rationale behind regularization in regression contexts and how it can be extended to regularized structural equation modeling (Jacobucci, Grimm, & McArdle, 2016). Our approach is evaluated through a simulation study, showing that regularized SEM outperforms traditional SEM estimation methods in situations with a large number of predictors and a small sample size. We illustrate the power of this approach in two empirical examples: modeling the neural determinants of visual short-term memory and identifying demographic correlates of stress, anxiety, and depression. We demonstrate the method's performance, discuss practical aspects of modeling empirical data, and provide a step-by-step online tutorial.
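The regularized SEM itself requires specialized software, but the underlying rationale in the regression context, adding a penalty on coefficient size so that most coefficients shrink to exactly zero in the small n, large p setting, can be sketched with a lasso penalty as below. The simulated data and parameter values are assumptions, not the article's examples.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 60, 100                         # fewer observations than predictors
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.8, 0.5]            # only three predictors truly matter
y = X @ beta + rng.normal(size=n)

# The L1 penalty (lambda * sum |beta_j|) shrinks most coefficients to exactly zero;
# regularized SEM adds an analogous penalty term to the SEM fit function.
lasso = LassoCV(cv=5).fit(X, y)
print("selected predictors:", np.flatnonzero(lasso.coef_ != 0))
```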
The Psychological Science Accelerator: Advancing Psychology through a Distributed Collaborative Network
Concerns have been growing about the veracity of psychological research. Many findings in psychological science are based on studies with insufficient statistical power and nonrepresentative samples, or may otherwise be limited to specific, ungeneralizable settings or populations. Crowdsourced research, a type of large-scale collaboration in which one or more research projects are conducted across multiple lab sites, offers a pragmatic solution to these and other current methodological challenges. The Psychological Science Accelerator (PSA) is a distributed network of laboratories designed to enable and support crowdsourced research projects. These projects can focus on novel research questions, or attempt to replicate prior research, in large, diverse samples. The PSA's mission is to accelerate the accumulation of reliable and generalizable evidence in psychological science. Here, we describe the background, structure, principles, procedures, benefits, and challenges of the PSA. In contrast to other crowdsourced research networks, the PSA is ongoing (as opposed to time-limited), efficient (in terms of re-using structures and principles for different projects), decentralized, diverse (in terms of participants and researchers), and inclusive (of proposals, contributions, and other relevant input from anyone inside or outside of the network). The PSA and other approaches to crowdsourced psychological science will advance our understanding of mental processes and behaviors by enabling rigorous research and systematically examining its generalizability.
Improving Present Practices in the Visual Display of Interactions
Interaction plots are used frequently in psychology research to make inferences about moderation hypotheses. A common method of analyzing and displaying interactions is to create simple-slopes or marginal-effects plots using standard software programs. However, these plots omit features that are essential to both graphic integrity and statistical inference. For example, they often do not display all quantities of interest, omit information about uncertainty, or do not show the observed data underlying an interaction, and failure to include these features undermines the strength of the inferences that may be drawn from such displays. Here, we review the strengths and limitations of present practices in analyzing and visualizing interaction effects in psychology. We provide simulated examples of the conditions under which visual displays may lead to inappropriate inferences and introduce open-source software that provides optimized utilities for analyzing and visualizing interactions.
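As a hedged sketch (not the authors' open-source software), the display below keeps the features the article calls for: the model-implied simple slopes, their confidence bands, and the raw observations underlying the interaction. The variable names and the plus/minus one standard deviation convention for the moderator are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"x": rng.normal(size=n), "z": rng.normal(size=n)})
df["y"] = 0.4 * df.x + 0.2 * df.z + 0.3 * df.x * df.z + rng.normal(size=n)

res = smf.ols("y ~ x * z", data=df).fit()

fig, ax = plt.subplots()
ax.scatter(df.x, df.y, s=10, alpha=0.3, color="gray")        # show the observed data
xs = np.linspace(df.x.min(), df.x.max(), 100)
for level in (-df.z.std(), df.z.std()):                      # simple slopes at z = -/+ 1 SD
    pred = res.get_prediction(pd.DataFrame({"x": xs, "z": level})).summary_frame()
    ax.plot(xs, pred["mean"], label=f"z = {level:+.2f}")
    ax.fill_between(xs, pred["mean_ci_lower"], pred["mean_ci_upper"], alpha=0.2)
ax.set(xlabel="x", ylabel="y")
ax.legend()
plt.show()
```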
Practical Solutions for Sharing Data and Materials From Psychological Research
Widespread sharing of data and materials (including displays and text- and video-based descriptions of experimental procedures) will improve the reproducibility of psychological science and accelerate the pace of discovery. In this article, we discuss some of the challenges to open sharing and offer practical solutions for researchers who wish to share more of the products, and the process, of their research. Many of these solutions were devised by the Databrary.org data library for storing and sharing video, audio, and other forms of sensitive or personally identifiable data. We also discuss ways in which researchers can make shared data and materials easier for others to find and reuse. Widely adopted, these solutions and practices will increase transparency and speed progress in psychological science.
Writing Empirical Articles: Transparency, Reproducibility, Clarity, and Memorability
This article provides recommendations for writing empirical journal articles that enable transparency, reproducibility, clarity, and memorability. Recommendations for transparency include preregistering methods, hypotheses, and analyses; submitting registered reports; distinguishing confirmation from exploration; and showing your warts. Recommendations for reproducibility include documenting methods and results fully and cohesively by taking advantage of open-science tools, and citing sources responsibly. Recommendations for clarity include writing short paragraphs, composed of short sentences; writing comprehensive abstracts; and seeking feedback from a naive audience. Recommendations for memorability include writing narratively; embracing the hourglass shape of empirical articles; beginning articles with a hook; and synthesizing, rather than Mad Libbing, previous literature.
A multi-lab study of bilingual infants: Exploring the preference for infant-directed speech
From the earliest months of life, infants prefer listening to infant-directed speech (IDS) over adult-directed speech (ADS) and learn better from it. Yet IDS differs within communities, across languages, and across cultures, both in form and in prevalence. This large-scale, multi-site study used the diversity of bilingual infant experiences to explore the impact of different types of linguistic experience on infants' IDS preference. As part of the multi-lab ManyBabies 1 project, we compared lab-matched samples of 333 bilingual and 385 monolingual infants' preference for North-American English IDS (cf. ManyBabies Consortium, 2020: ManyBabies 1), tested in 17 labs in 7 countries. Infants were tested in two age groups: 6-9 months (the younger sample) and 12-15 months (the older sample). We found that bilingual and monolingual infants both preferred IDS to ADS and did not differ in the overall magnitude of this preference. However, among bilingual infants who were acquiring North-American English (NAE) as a native language, greater exposure to NAE was associated with a stronger IDS preference, extending the previous finding from ManyBabies 1 that monolinguals learning NAE as a native language showed a stronger preference than infants unexposed to NAE. Together, our findings indicate that IDS preference likely makes a similar contribution to monolingual and bilingual development, and that infants are exquisitely sensitive to the nature and frequency of different types of language input in their early environments.
Putting Psychology to the Test: Rethinking Model Evaluation Through Benchmarking and Prediction
Consensus on standards for evaluating models and theories is an integral part of every science. Nonetheless, in psychology, relatively little focus has been placed on defining reliable communal metrics to assess model performance. Evaluation practices are often idiosyncratic and are affected by a number of shortcomings (e.g., failure to assess models' ability to generalize to unseen data) that make it difficult to discriminate between good and bad models. Drawing inspiration from fields such as machine learning and statistical genetics, we argue in favor of introducing common benchmarks as a means of overcoming the lack of reliable model evaluation criteria currently observed in psychology. We discuss a number of principles benchmarks should satisfy to achieve maximal utility, identify concrete steps the community could take to promote the development of such benchmarks, and address a number of potential pitfalls and concerns that may arise in the course of implementation. We argue that reaching consensus on common evaluation benchmarks will foster cumulative progress in psychology and encourage researchers to place heavier emphasis on the practical utility of scientific models.
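One of the evaluation practices the article emphasizes, assessing a model's ability to generalize to unseen data rather than its in-sample fit, can be sketched with a cross-validated out-of-sample metric as below. The model, data, and scoring choice are placeholders, not a benchmark proposed by the authors.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

# In-sample fit rewards complexity; held-out performance is the benchmark-style metric.
in_sample_r2 = Ridge(alpha=1.0).fit(X, y).score(X, y)
out_of_sample_r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()
print(f"in-sample R^2 = {in_sample_r2:.2f}, 5-fold out-of-sample R^2 = {out_of_sample_r2:.2f}")
```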
Evaluating Response Shift in Statistical Mediation Analysis
Researchers and prevention scientists often develop interventions to target intermediate variables (known as mediators) that are thought to be related to an outcome. When researchers target a mediating construct measured by self-report, the meaning of the self-report measure could change from pretest to posttest for the individuals who received the intervention, a phenomenon referred to as response shift. As a result, any observed changes on the mediator measure across groups or across time might reflect a combination of true change on the construct and response shift. Although previous studies have focused on identifying the source and type of response shift in measures after an intervention, there has been limited research on how using sum scores in the presence of response shift affects the estimation of mediated effects via statistical mediation analysis, which is critical for explaining how the intervention worked. In this paper, we focus on recalibration response shift, a change in internal standards of measurement that affects how respondents interpret the response scale. We provide background on the theory of response shift and the methodology used to detect it (i.e., tests of measurement invariance). Additionally, we use simulated datasets to illustrate how recalibration in the mediator can bias estimates of the mediated effect and also affect Type I error rates and power.
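A toy simulation in the same spirit (not the article's own) can show how recalibration in the posttest mediator report biases the product-of-coefficients mediated effect a*b when observed scores are used. All parameter values, including the size and form of the recalibration, are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
treat = rng.integers(0, 2, size=n)                 # randomized intervention indicator
m_true = 0.5 * treat + rng.normal(size=n)          # true mediator construct (a = 0.5)
y = 0.4 * m_true + rng.normal(size=n)              # outcome (b = 0.4, no direct effect)

# Recalibration: treated respondents reinterpret the response scale at posttest,
# so their observed score is shifted and stretched relative to the true construct.
m_observed = np.where(treat == 1, 0.7 * m_true + 0.3, m_true)

a_hat = sm.OLS(m_observed, sm.add_constant(treat)).fit().params[1]
b_hat = sm.OLS(y, sm.add_constant(np.column_stack([treat, m_observed]))).fit().params[2]
print(f"estimated mediated effect a*b = {a_hat * b_hat:.3f} (true value = {0.5 * 0.4:.3f})")
```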
Hybrid Experimental Designs for Intervention Development: What, Why, and How
Advances in mobile and wireless technologies offer tremendous opportunities for extending the reach and impact of psychological interventions and for adapting interventions to the unique and changing needs of individuals. However, insufficient engagement remains a critical barrier to the effectiveness of digital interventions. Human delivery of interventions (e.g., by clinical staff) can be more engaging but potentially more expensive and burdensome. Hence, the integration of digital and human-delivered components is critical to building effective and scalable psychological interventions. Existing experimental designs can be used to answer questions either about human-delivered components that are typically sequenced and adapted at relatively slow timescales (e.g., monthly) or about digital components that are typically sequenced and adapted at much faster timescales (e.g., daily). However, these methodologies do not accommodate sequencing and adaptation of components at multiple timescales and hence cannot be used to empirically inform the joint sequencing and adaptation of human-delivered and digital components. Here, we introduce the hybrid experimental design (HED), a new experimental approach that can be used to answer scientific questions about building psychological interventions in which human-delivered and digital components are integrated and adapted at multiple timescales. We describe the key characteristics of HEDs (i.e., what they are), explain their scientific rationale (i.e., why they are needed), and provide guidelines for their design and corresponding data analysis (i.e., how data arising from HEDs can be used to inform effective and scalable psychological interventions).
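The core idea of randomizing components at two timescales can be illustrated with a toy sketch: a slower human-delivered component re-randomized weekly and a faster digital component re-randomized daily. The component names, timescales, and assignment probabilities are purely illustrative and are not the authors' design.

```python
import numpy as np

rng = np.random.default_rng(7)
n_participants, n_weeks, days_per_week = 40, 8, 7

# Slow timescale: re-randomize the human-delivered component (e.g., coach call) weekly.
coach_call = rng.integers(0, 2, size=(n_participants, n_weeks))

# Fast timescale: re-randomize the digital component (e.g., prompt) daily.
daily_prompt = rng.integers(0, 2, size=(n_participants, n_weeks, days_per_week))

# Each participant-day is crossed over both factors, which is what allows a hybrid
# design to inform the joint sequencing of slow and fast components.
print("share of prompted days within active-coach weeks:",
      daily_prompt[coach_call == 1].mean())
```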
iCatcher+: Robust and Automated Annotation of Infants' and Young Children's Gaze Behavior From Videos Collected in Laboratory, Field, and Online Studies
Technological advances in psychological research have enabled large-scale studies of human behavior and streamlined pipelines for automatic processing of data. However, studies of infants and children have not fully reaped these benefits because the behaviors of interest, such as gaze duration and direction, still have to be extracted from video through a laborious process of manual annotation, even when these data are collected online. Recent advances in computer vision raise the possibility of automated annotation of these video data. In this article, we built on a system for automatic gaze annotation in young children, iCatcher, by engineering improvements and then training and testing the system (referred to hereafter as iCatcher+) on three data sets with substantial video and participant variability (214 videos collected in U.S. lab and field sites, 143 videos collected in Senegal field sites, and 265 videos collected via webcams in homes; participant age range = 4 months-3.5 years). When trained on each of these data sets, iCatcher+ performed with near human-level accuracy on held-out videos in distinguishing "LEFT" versus "RIGHT" and "ON" versus "OFF" looking behavior across all data sets. This high performance was achieved at the level of individual frames, experimental trials, and study videos; held across participant demographics (e.g., age, race/ethnicity), participant behavior (e.g., movement, head position), and video characteristics (e.g., luminance); and generalized to a fourth, entirely held-out online data set. We close by discussing next steps required to fully automate the life cycle of online infant and child behavioral studies, representing a key step toward enabling robust and high-throughput developmental research.
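The kind of frame-level validation described above, comparing automated gaze labels against human annotation on held-out videos, can be sketched with standard agreement metrics. The label arrays below are placeholders, not iCatcher+ output.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Placeholder frame-by-frame gaze labels for one held-out video.
human   = ["LEFT", "LEFT", "RIGHT", "OFF", "RIGHT", "RIGHT", "LEFT", "OFF"]
machine = ["LEFT", "LEFT", "RIGHT", "OFF", "RIGHT", "LEFT",  "LEFT", "OFF"]

print("frame-level accuracy:", accuracy_score(human, machine))
print("Cohen's kappa:", cohen_kappa_score(human, machine))
```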