Simulation studies for methodological research in psychology: A standardized template for planning, preregistration, and reporting
Simulation studies are widely used for evaluating the performance of statistical methods in psychology. However, the quality of simulation studies can vary widely in terms of their design, execution, and reporting. In order to assess the quality of typical simulation studies in psychology, we reviewed 321 articles published in 2021 and 2022, among which 100/321 = 31.2% report a simulation study. We find that many articles do not provide complete and transparent information about key aspects of the study, such as justifications for the number of simulation repetitions, Monte Carlo uncertainty estimates, or code and data to reproduce the simulation studies. To address this problem, we provide a summary of the ADEMP (aims, data-generating mechanism, estimands and other targets, methods, performance measures) design and reporting framework from Morris et al. (2019) adapted to simulation studies in psychology. Based on this framework, we provide ADEMP-PreReg, a step-by-step template for researchers to use when designing, potentially preregistering, and reporting their simulation studies. We give formulae for estimating common performance measures, their Monte Carlo standard errors, and for calculating the number of simulation repetitions to achieve a desired Monte Carlo standard error. Finally, we give a detailed tutorial on how to apply the ADEMP framework in practice using an example simulation study on the evaluation of methods for the analysis of pre-post measurement experiments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
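To make the repetition-number calculation concrete, here is a minimal base R sketch (not the ADEMP-PreReg template itself; the per-repetition estimates are simulated stand-ins): the Monte Carlo standard error (MCSE) of an estimated bias is the standard deviation of the estimates divided by the square root of the number of repetitions, which can be inverted to find the repetitions needed for a target precision.

```r
set.seed(1)
n_sim     <- 1000                      # pilot number of simulation repetitions
theta     <- 0.5                       # true parameter value
estimates <- rnorm(n_sim, mean = theta, sd = 0.3)  # stand-in per-repetition estimates

bias      <- mean(estimates) - theta
mcse_bias <- sd(estimates) / sqrt(n_sim)           # Monte Carlo SE of the bias

target   <- 0.005                      # desired MCSE for the bias
n_needed <- ceiling((sd(estimates) / target)^2)    # repetitions needed to reach it
c(bias = bias, mcse = mcse_bias, n_needed = n_needed)
```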
Why multiple hypothesis test corrections provide poor control of false positives in the real world
Most scientific disciplines use significance testing to draw conclusions about experimental or observational data. This classical approach provides a theoretical guarantee for controlling the number of false positives across a set of hypothesis tests, making it an appealing framework for scientists seeking to limit the number of false effects or associations that they claim to observe. Unfortunately, this theoretical guarantee applies to few experiments, and the true false positive rate (FPR) is much higher. Scientists have plenty of freedom to choose the error rate to control, the tests to include in the adjustment, and the method of correction, making strong error control difficult to attain. In addition, hypotheses are often tested after finding unexpected relationships or patterns, the data are analyzed in several ways, and analyses may be run repeatedly as data accumulate. As a result, adjusted p values are too small, incorrect conclusions are often reached, and results are harder to reproduce. In the following, I argue why the FPR is rarely controlled meaningfully and why shrinking parameter estimates is preferable to p value adjustments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
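As a concrete illustration of the researcher choices described above, the following base R snippet (with hypothetical p values) shows how adjusted p values depend on which tests are included in the correction and which error rate is controlled.

```r
p <- c(0.004, 0.012, 0.030, 0.240, 0.510)   # hypothetical raw p values

p.adjust(p, method = "bonferroni")          # all five tests enter the correction
p.adjust(p[1:2], method = "bonferroni")     # only two "focal" tests enter it
p.adjust(p, method = "BH")                  # a different error rate (FDR) entirely
```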
How to conduct an integrative mixed methods meta-analysis: A tutorial for the systematic review of quantitative and qualitative evidence
This article is a guide on how to conduct mixed methods meta-analyses (sometimes called mixed methods systematic reviews, integrative meta-analyses, or integrative meta-syntheses), using an integrative approach. These aggregative methods allow researchers to synthesize qualitative and quantitative findings from a research literature in order to benefit from the strengths of both forms of analysis. The article articulates distinctions in how qualitative and quantitative methodologies work with variation to develop a coherent theoretical basis for their integration. In advancing this methodological approach to integrative mixed methods meta-analysis (IMMMA), I provide rationales for procedural decisions that support methodological integrity and address prior misconceptions that may explain why these methods have not been as commonly used as might be expected. Features of questions and subject matters that lead them to be amenable to this research approach are considered. The steps to conducting an IMMMA then are described, with illustrative examples, and in a manner open to the use of a range of qualitative and quantitative meta-analytic approaches. These steps include the development of research aims, the selection of primary research articles, the generation of units for analysis, and the development of themes and findings. The tutorial provides guidance on how to develop IMMMA findings that have methodological integrity and are based upon the appreciation of the distinctive approaches to modeling variation in quantitative and qualitative methodologies. The article concludes with guidance for report writing and developing principles for practice. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Harvesting heterogeneity: Selective expertise versus machine learning
The heterogeneity of outcomes in behavioral research has long been perceived as a challenge for the validity of various theoretical models. More recently, however, researchers have started perceiving heterogeneity as something that needs to be not only acknowledged but also actively addressed, particularly in applied research. A serious challenge, however, is that classical psychological methods are not well suited for making practical recommendations when heterogeneous outcomes are expected. In this article, we argue that heterogeneity requires a separation between basic and applied behavioral methods, and between different types of behavioral expertise. We propose a novel framework for evaluating behavioral expertise and suggest that selective expertise can easily be automated via various machine learning methods. We illustrate the value of our framework via an empirical study of the preferences towards battery electric vehicles. Our results suggest that a basic multi-armed bandit algorithm vastly outperforms human expertise in selecting the best interventions. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
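For readers unfamiliar with the selection mechanism the study automates, here is a minimal epsilon-greedy multi-armed bandit sketch in base R; the intervention success rates and tuning constants are illustrative assumptions, not the study's settings.

```r
set.seed(42)
true_rates <- c(0.10, 0.15, 0.30)   # hypothetical success rates of 3 interventions
eps   <- 0.10                       # exploration probability
n     <- 5000
wins  <- pulls <- numeric(length(true_rates))

for (t in seq_len(n)) {
  arm <- if (runif(1) < eps || all(pulls == 0)) {
    sample(length(true_rates), 1)                # explore: random intervention
  } else {
    which.max(wins / pmax(pulls, 1))             # exploit: current best estimate
  }
  reward     <- rbinom(1, 1, true_rates[arm])
  pulls[arm] <- pulls[arm] + 1
  wins[arm]  <- wins[arm] + reward
}
rbind(pulls = pulls, est_rate = round(wins / pulls, 3))  # best arm dominates
```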
Lagged multidimensional recurrence quantification analysis for determining leader-follower relationships within multidimensional time series
The current article introduces lagged multidimensional recurrence quantification analysis. The method is an extension of multidimensional recurrence quantification analysis and allows researchers to quantify the joint dynamics of multivariate time series and to investigate leader-follower relationships in behavioral and physiological data. Moreover, the method enables the quantification of the joint dynamics of a group when such leader-follower relationships are taken into account. We first provide a formal presentation of the method, and then apply it to synthetic data, as well as data sets from joint action research, investigating the shared dynamics of facial expression and beats-per-minute recordings within different groups. A wrapper function is included for applying the method together with the "crqa" package in R. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
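The core lagging idea can be sketched in a few lines of base R (a conceptual illustration, not the authors' wrapper around "crqa"): shift one multivariate series against the other, compute a recurrence rate at each lag, and read the leader-follower relationship off the lag that maximizes recurrence. The data and radius below are illustrative.

```r
set.seed(1)
n <- 200
A <- cbind(sin(1:n / 10), cos(1:n / 10)) + rnorm(2 * n, sd = 0.1)
B <- rbind(matrix(0, 5, 2), A[1:(n - 5), ])   # B copies A with a 5-step delay

rec_at_lag <- function(x, y, k, radius = 0.5) {
  idx <- seq(max(1, 1 - k), min(nrow(x), nrow(y) - k))
  d   <- sqrt(rowSums((x[idx, , drop = FALSE] - y[idx + k, , drop = FALSE])^2))
  mean(d < radius)                            # recurrence rate at this lag
}

lags <- -10:10
rr   <- sapply(lags, function(k) rec_at_lag(A, B, k))
lags[which.max(rr)]                           # +5: B trails A, i.e., A leads
```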
The potential of preregistration in psychology: Assessing preregistration producibility and preregistration-study consistency
Study preregistration has become increasingly popular in psychology, but its potential to restrict researcher degrees of freedom has not yet been empirically verified. We used an extensive protocol to assess the producibility (i.e., the degree to which a study can be properly conducted based on the available information) of preregistrations and the consistency between preregistrations and their corresponding papers for 300 psychology studies. We found that preregistrations often lack methodological details and that undisclosed deviations from preregistered plans are frequent. These results highlight that biases due to researcher degrees of freedom remain possible in many preregistered studies. More comprehensive registration templates typically yielded more producible preregistrations. We did not find that the producibility and consistency of preregistrations differed over time or between original and replication studies. Furthermore, we found that operationalizations of variables were generally preregistered more producibly and consistently than other study parts. Inconsistencies between preregistrations and published studies were mainly encountered for data collection procedures, statistical models, and exclusion criteria. Our results indicate that, to unlock the full potential of preregistration, researchers in psychology should aim to write more producible preregistrations, adhere to these preregistrations more faithfully, and more transparently report any deviations from their preregistrations. This could be facilitated by training and education to improve preregistration skills, as well as the development of more comprehensive templates. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Comments on the measurement of effect sizes for indirect effects in Bayesian analysis of variance
Bayesian analysis of variance (BANOVA), implemented through R packages, offers a Bayesian approach to analyzing experimental data. A published tutorial documents BANOVA extensively. This note critically examines a method for evaluating mediation using partial eta-squared as an effect size measure within the BANOVA framework. We first identify an error in the formula for partial eta-squared and propose a corrected version. Subsequently, we discuss limitations in the interpretability of this effect size measure, drawing on previous research, and argue for its potential unsuitability in assessing indirect effects in mediation analysis. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
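For reference, the conventional definition of partial eta-squared, the quantity whose BANOVA formula the note corrects (the corrected formula itself appears in the article), is

$$\eta_p^2 = \frac{SS_\text{effect}}{SS_\text{effect} + SS_\text{error}}.$$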
Item response theory-based continuous test norming
In norm-referenced psychological testing, an individual's performance is expressed in relation to a reference population using a standardized score, like an intelligence quotient score. The reference population can depend on a continuous variable, like age. Current continuous norming methods transform the raw score into an age-dependent standardized score. Such methods have the shortcoming of relying solely on the raw test scores, ignoring valuable information in the individual item responses. Instead of modeling the raw test scores, we propose modeling the item scores with a Bayesian two-parameter logistic (2PL) item response theory model with age-dependent mean and variance of the latent trait distribution, 2PL-norm for short. Norms are then derived using the estimated latent trait score and the age-dependent distribution parameters. Simulations show that 2PL-norms are overall more accurate than those from the most popular raw score-based norming methods, cNORM and generalized additive models for location, scale, and shape (GAMLSS). Furthermore, the credible intervals of 2PL-norm exhibit clearly superior coverage over the confidence intervals of the raw score-based methods. The only issue with 2PL-norm is its slightly lower performance at the tails of the norms. Among the raw score-based norming methods, GAMLSS outperforms cNORM. For empirical practice, this suggests using 2PL-norm if the model assumptions hold. If not, or if interest lies solely in point estimates at extreme trait positions, GAMLSS-based norming is a better alternative. The use of the 2PL-norm is illustrated and compared with GAMLSS and cNORM using empirical data, and code is provided so that users can readily apply 2PL-norm to their normative data. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
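A sketch of the model structure described above, with the caveat that the authors' exact parameterization and priors may differ: item responses follow a 2PL whose latent trait distribution depends on age,

$$P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}, \qquad \theta_i \sim N\!\big(\mu(\text{age}_i),\ \sigma^2(\text{age}_i)\big),$$

where $a_j$ and $b_j$ are the discrimination and difficulty of item $j$.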
Percentage of variance accounted for in structural equation models: The rediscovery of the goodness of fit index
This article delves into the often-overlooked metric of percentage of variance accounted for in structural equation models (SEM). The goodness of fit index (GFI) provides the percentage of variance of the sum of squared covariances explained by the model. Despite being introduced over four decades ago, the GFI has been overshadowed in favor of fit indices that prioritize distinctions between close and nonclose fitting models. Similar to R² in regression, the GFI should not be used for this aim but rather to quantify the model's utility. The central aim of this study is to reintroduce the GFI, presenting a novel approach to computing it using mean and mean-and-variance corrected test statistics, specifically designed for nonnormal data. We use an extensive simulation study to evaluate the precision of inferences on the GFI, including point estimates and confidence intervals. The findings demonstrate that the GFI can be very accurately estimated, even with nonnormal data, and that confidence intervals exhibit reasonable accuracy across diverse conditions, including large models and nonnormal data scenarios. The article provides methods and code for estimating the GFI in any SEM, urging researchers to reconsider the reporting of the percentage of variance accounted for as an essential tool for model assessment and selection. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
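For orientation, the unweighted least squares form of the GFI matches the sum-of-squared-covariances interpretation given above (the article's novel contribution lies in the corrected test statistics, not in this basic formula):

$$\mathrm{GFI} = 1 - \frac{\operatorname{tr}\!\big[(\mathbf{S} - \hat{\boldsymbol{\Sigma}})^2\big]}{\operatorname{tr}\!\big[\mathbf{S}^2\big]},$$

where $\mathbf{S}$ is the sample covariance matrix and $\hat{\boldsymbol{\Sigma}}$ the model-implied covariance matrix.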
Bayesian estimation and comparison of idiographic network models
Idiographic network models are estimated on time series data of a single individual and allow researchers to investigate person-specific associations between multiple variables over time. The most common approach for fitting graphical vector autoregressive (GVAR) models uses least absolute shrinkage and selection operator (LASSO) regularization to estimate a contemporaneous and a temporal network. However, estimation of idiographic networks can be unstable in relatively small data sets typical for psychological research. This bears the risk of misinterpreting differences in estimated networks as spurious heterogeneity between individuals. As a remedy, we evaluate the performance of a Bayesian alternative for fitting GVAR models that allows for regularization of parameters while accounting for estimation uncertainty. We also develop a novel test, implemented in the tsnet package in R, which assesses whether differences between estimated networks are reliable based on matrix norms. We first compare Bayesian and LASSO approaches across a range of conditions in a simulation study. Overall, LASSO estimation performs well, while a Bayesian GVAR without edge selection may perform better when the true network is dense. In an additional simulation study, the novel test is conservative, keeping false-positive rates below the nominal level. Finally, we apply Bayesian estimation and testing in an empirical example using daily data on clinical symptoms for 40 individuals. We additionally provide functionality to estimate Bayesian GVAR models in Stan within tsnet. Overall, Bayesian GVAR modeling facilitates the assessment of estimation uncertainty, which is important for studying interindividual differences of intraindividual dynamics. In doing so, the novel test serves as a safeguard against premature conclusions of heterogeneity. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
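A minimal base R sketch of the GVAR data-generating structure discussed above (illustrative parameter values; this is not the tsnet interface): the temporal network corresponds to the lag-1 coefficient matrix, and the contemporaneous network to the partial correlations implied by the innovation precision matrix.

```r
set.seed(7)
p <- 3; n <- 200
B <- matrix(c(0.3, 0.1, 0.0,
              0.0, 0.3, 0.1,
              0.1, 0.0, 0.3), p, p, byrow = TRUE)   # temporal (lag-1) effects
Sigma <- diag(p); Sigma[1, 2] <- Sigma[2, 1] <- 0.4 # innovation covariance

X <- matrix(0, n, p)
L <- chol(Sigma)
for (t in 2:n) X[t, ] <- B %*% X[t - 1, ] + drop(rnorm(p) %*% L)

K <- solve(Sigma)                                   # precision matrix
pcor <- -K / sqrt(tcrossprod(diag(K)))              # contemporaneous partial correlations
diag(pcor) <- 1
round(pcor, 2)
```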
Multiple imputation of missing data in large studies with many variables: A fully conditional specification approach using partial least squares
Multiple imputation (MI) is one of the most popular methods for handling missing data in psychological research. However, many imputation approaches are poorly equipped to handle a large number of variables, which are a common sight in studies that employ questionnaires to assess psychological constructs. In such a case, conventional imputation approaches often become unstable and require that the imputation model be simplified, for example, by removing variables or combining them into composite scores. In this article, we propose an alternative method that extends the fully conditional specification approach to MI with dimension reduction techniques such as partial least squares. To evaluate this approach, we conducted a series of simulation studies, in which we compared it with other approaches that were based on variable selection, composite scores, or dimension reduction through principal components analysis. Our findings indicate that this novel approach can provide accurate results even in challenging scenarios, where other approaches fail to do so. Finally, we also illustrate the use of this method in real data and discuss the implications of our findings for practice. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
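The following base R sketch illustrates a single dimension-reduced imputation step in the spirit of fully conditional specification; principal components stand in for partial least squares here purely for brevity (the article compares both), and variable names and dimensions are illustrative.

```r
set.seed(3)
n <- 300; p <- 40
X <- matrix(rnorm(n * p), n, p)                        # many predictors
y <- drop(X[, 1:5] %*% rep(0.4, 5) + rnorm(n))
y[sample(n, 60)] <- NA                                 # 20% missing at random

obs  <- !is.na(y)
pcs  <- prcomp(X, rank. = 10)$x                        # reduce 40 columns to 10 scores
fit  <- lm(y[obs] ~ pcs[obs, ])
pred <- cbind(1, pcs[!obs, ]) %*% coef(fit)
y[!obs] <- pred + rnorm(sum(!obs), sd = sigma(fit))    # impute with residual noise
# (a proper MI step would also draw the coefficients from their posterior)
```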
Data integrity in an online world: Demonstration of multimodal bot screening tools and considerations for preserving data integrity in two online social and behavioral research studies with marginalized populations
Internet-based studies are widely used in social and behavioral health research, yet bots and fraud from "survey farming" bring significant threats to data integrity. For research centering marginalized communities, data integrity is an ethical imperative, as fraudulent data at a minimum poses a threat to scientific integrity, and worse could even promulgate false, negative stereotypes about the population of interest. Using data from two online surveys of sexual and gender minority populations (young men who have sex with men and transgender women of color), we (a) demonstrate the use of online survey techniques to identify and mitigate internet-based fraud, (b) differentiate techniques for and identify two different types of "survey farming" (i.e., bots and false responders), and (c) demonstrate the consequences of those distinct types of fraud on sample characteristics and statistical inferences, if fraud goes unaddressed. We provide practical recommendations for internet-based studies in psychological, social, and behavioral health research to ensure data integrity and discuss implications for future research testing data integrity techniques. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Thinking clearly about time-invariant confounders in cross-lagged panel models: A guide for choosing a statistical model from a causal inference perspective
Many statistical models have been proposed to examine reciprocal cross-lagged causal effects from panel data. The present article aims to clarify how these various statistical models control for unmeasured time-invariant confounders, helping researchers understand the differences in the statistical models from a causal inference perspective. Assuming that the true data generation model (i.e., causal model) has time-invariant confounders that were not measured, we compared different statistical models (e.g., dynamic panel model and random-intercept cross-lagged panel model) in terms of the conditions under which they can provide a relatively accurate estimate of the target causal estimand. Based on the comparisons and realistic plausibility of these conditions, we made some practical suggestions for researchers to select a statistical model when they are interested in causal inference. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
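For concreteness, here is generic, textbook-style lavaan syntax for a three-wave random-intercept cross-lagged panel model, one of the models compared above, in which the random intercepts absorb stable time-invariant confounding. This is not the article's own code; mydata is a hypothetical data set with variables x1-x3 and y1-y3.

```r
library(lavaan)
riclpm <- '
  # stable between-person components (absorb time-invariant confounding)
  RIx =~ 1*x1 + 1*x2 + 1*x3
  RIy =~ 1*y1 + 1*y2 + 1*y3
  # occasion-specific within-person components
  wx1 =~ 1*x1; wx2 =~ 1*x2; wx3 =~ 1*x3
  wy1 =~ 1*y1; wy2 =~ 1*y2; wy3 =~ 1*y3
  x1 ~~ 0*x1; x2 ~~ 0*x2; x3 ~~ 0*x3
  y1 ~~ 0*y1; y2 ~~ 0*y2; y3 ~~ 0*y3
  # within-person autoregressive and cross-lagged effects
  wx2 ~ wx1 + wy1;  wy2 ~ wx1 + wy1
  wx3 ~ wx2 + wy2;  wy3 ~ wx2 + wy2
  # contemporaneous associations; intercepts uncorrelated with within parts
  wx1 ~~ wy1; wx2 ~~ wy2; wx3 ~~ wy3
  RIx ~~ 0*wx1 + 0*wy1; RIy ~~ 0*wx1 + 0*wy1
'
# fit <- sem(riclpm, data = mydata, meanstructure = TRUE)
```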
A computationally efficient and robust method to estimate exploratory factor analysis models with correlated residuals
A critical assumption in exploratory factor analysis (EFA) is that manifest variables are no longer correlated after the influences of the common factors are controlled. The assumption may not be valid in some EFA applications; for example, when questionnaire items share other characteristics in addition to their relations to the common factors. We present a computationally efficient and robust method to estimate EFA with correlated residuals. We provide details on the implementation of the method with both ordinary least squares estimation and maximum likelihood estimation. We demonstrate the method using empirical data and conduct a simulation study to explore its statistical properties. The results are (a) that the new method encountered far fewer convergence problems than the existing method; (b) that the EFA model with correlated residuals produced a more satisfactory model fit than the conventional EFA model; and (c) that the EFA model with correlated residuals and the conventional EFA model produced very similar estimates for factor loadings. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Mixture multigroup structural equation modeling: A novel method for comparing structural relations across many groups
Behavioral scientists often examine the relations between two or more latent variables (e.g., how emotions relate to life satisfaction), and structural equation modeling (SEM) is the state-of-the-art for doing so. When comparing these "structural relations" among many groups, they likely differ across the groups. However, it is equally likely that some groups share the same relations so that clusters of groups emerge. Latent variables are measured indirectly by questionnaires and, for validly comparing their relations among groups, the measurement of the latent variables should be invariant across the groups (i.e., measurement invariance). However, across many groups, often at least some measurement parameters differ. Restricting these measurement parameters to be invariant, when they are not, causes the structural relations to be estimated incorrectly and invalidates their comparison. We propose mixture multigroup SEM (MMG-SEM) to gather groups with equivalent structural relations in clusters while accounting for the reality of measurement noninvariance. Specifically, MMG-SEM obtains a clustering of groups focused on the structural relations by making them cluster-specific, while capturing measurement noninvariances with group-specific measurement parameters. In this way, MMG-SEM ensures that the clustering is valid and unaffected by differences in measurement. This article proposes an estimation procedure built around the R package "lavaan" and evaluates MMG-SEM's performance through two simulation studies. The results demonstrate that MMG-SEM successfully recovers the group-clustering as well as the cluster-specific relations and the partially group-specific measurement parameters. To illustrate its empirical value, we apply MMG-SEM to cross-cultural data on the relations between experienced emotions and life satisfaction. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
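To give a flavor of the ingredients MMG-SEM combines, the sketch below shows a plain multigroup SEM in lavaan (the package the authors build on) with invariant loadings except one freed, group-specific loading; the mixture clustering of groups into classes with shared structural relations is the authors' contribution and is not shown. All variable and group names are hypothetical.

```r
library(lavaan)
model <- '
  emo =~ e1 + e2 + e3
  sat =~ s1 + s2 + s3
  sat ~ emo          # structural relation whose clustering is of interest
'
# fit <- sem(model, data = mydata, group = "country",
#            group.equal   = "loadings",        # invariance where tenable
#            group.partial = "emo =~ e3")       # free a noninvariant loading
```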
The Bayesian reservoir model of psychological regulation
Social and behavioral scientists are increasingly interested the dynamics of the processes they study. Despite the wide array of processes studied, a fairly narrow set of models are applied to characterize dynamics within these processes. For social and behavioral research to take the next step in modeling dynamics, a wider variety of models need to be considered. The reservoir model is one model of psychological regulation that helps expand the models available (Deboeck & Bergeman, 2013). The present article implements the Bayesian reservoir model for both single time series and multilevel data. Simulation 1 compares the performance of the original version of the reservoir model fit using structural equation modeling (Deboeck & Bergeman, 2013) to the proposed Bayesian estimation approach. Simulation 2 expands this to a multilevel data scenario and compares this to the single-level version. The Bayesian estimation approach performs substantially better than the original estimation approach and produces low-bias estimates even with time series as short as 25 observations. Combining Bayesian estimation with a multilevel modeling approach allows for relatively unbiased estimation with sample sizes as small as 15 individuals and/or with time series as short as 15 observations. Finally, a substantive example is presented that applies the Bayesian reservoir model to perceived stress, examining how the model parameters relate to psychological variables commonly expected to relate to resilience. The current expansion of the reservoir model demonstrates the benefits of leveraging the combined strengths of Bayesian estimation and multilevel modeling, with new dynamic models that have been tailored to match the process of psychological regulation. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Statistical power and optimal design for randomized controlled trials investigating mediation effects
Mediation analyses in randomized controlled trials (RCTs) can unpack potential causal pathways between interventions and outcomes and help the iterative improvement of interventions. When designing RCTs investigating these mechanisms, two key considerations are (a) the sample size needed to achieve adequate statistical power and (b) the efficient use of resources. The current study has developed closed-form statistical power formulas for RCTs investigating mediation effects with and without covariates under the Sobel and joint significance tests. The power formulas are functions of sample size, sample allocation between treatment conditions, effect sizes in the treatment-mediator and mediator-outcome paths, and other common parameters (e.g., significance level, one- or two-tailed test). The power formulas allow us to assess how covariates impact the magnitude of mediation effects and statistical power. Accounting for the potential unequal sampling costs between treatment conditions, we have further developed an optimal design framework to identify optimal sample allocations that provide the maximum statistical power under a fixed budget or use the minimum resources to achieve a target power. Illustrations show that the proposed method can identify more efficient and powerful sample allocations than conventional designs with an equal number of individuals in each treatment condition. We have implemented the methods in the R package odr to improve the accessibility of the work. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
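The logic of the Sobel-test power calculation can be sketched in a few lines of R under a normal approximation; the article's formulas are richer (covariates, allocation ratios, budget constraints) and are implemented in the odr package, so the function below is only an illustrative simplification.

```r
sobel_power <- function(a, b, se_a, se_b, alpha = 0.05) {
  se_ab  <- sqrt(a^2 * se_b^2 + b^2 * se_a^2)   # Sobel standard error of a*b
  lambda <- abs(a * b) / se_ab                  # standardized indirect effect
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(lambda - z_crit) + pnorm(-lambda - z_crit)  # two-tailed power
}
sobel_power(a = 0.3, b = 0.3, se_a = 0.08, se_b = 0.08)  # illustrative values
```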
Latent growth mixture models as latent variable multigroup factor models: Comment on McNeish et al. (2023)
McNeish et al. argue for the general use of covariance pattern growth mixture models because these models do not involve the assumption of random effects, demonstrate high rates of convergence, and are most likely to identify the correct number of latent subgroups. We argue that the covariance pattern growth mixture model is a single random intercept model. It and other models considered in their article are special cases of a general model involving slope and intercept factors. We argue growth mixture models are multigroup invariance hypotheses based on unknown subgroups. Psychometric models in which trajectories are modeled using slope factor loadings which vary by latent subgroup are often conceptually preferable. Convergence rates for mixture models can be substantially improved by using a variance component start value taken from analyses with one fewer class and by specifying multifactor models in orthogonal form. No single latent growth model is appropriate across all research contexts and, instead, the most appropriate latent mixture model must be "right-sized" to the data under consideration. Reanalysis of a real-world longitudinal data set of posttraumatic stress disorder symptomatology reveals a three-group model involving exponential decline, further suggesting that the four-group "cat's cradle" pattern frequently reported is artefactual. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Clustering methods: To optimize or to not optimize?
Many clustering problems are associated with a particular objective criterion that is sought to be optimized. There are often several methods that can be used to tackle the optimization problem, and one or more of them might guarantee a globally optimal solution. However, it is quite possible that, relative to one or more suboptimal solutions, a globally optimal solution might be less interpretable from the standpoint of psychological theory or be less in accordance with some known (i.e., true) cluster structure. For example, in simulation experiments, it has sometimes been observed that there is not a perfect correspondence between the optimized clustering criterion and recovery of the underlying known cluster structure. This can lead to the misconception that clustering methods with a tendency to produce suboptimal solutions might, in some instances, be preferable to superior methods that provide globally optimal (or at least better locally optimal) solutions. In this article, we present results from simulation studies in the context of p-median clustering where departure from global optimality was carefully controlled. Although the results showed that suboptimal solutions sometimes produced marginally better recovery for experimental cells where the known cluster structure was less well-defined, capriciously accepting inferior solutions is an unwise practice. However, there are instances in which some sacrifice in the optimization criterion value to meet certain desirable constraints or to improve the value of one or more other relevant criteria is principled. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
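The criterion-versus-recovery distinction is easy to demonstrate in base R, with k-means standing in for the article's p-median experiments: more restarts yield a better (lower) criterion value, and recovery of the true structure can be checked against it. Data and settings are illustrative.

```r
set.seed(11)
X    <- rbind(matrix(rnorm(100, 0), 50, 2),     # cluster 1
              matrix(rnorm(100, 1.5), 50, 2))   # cluster 2
true <- rep(1:2, each = 50)

poor <- kmeans(X, 2, nstart = 1)      # single start: risk of a local optimum
good <- kmeans(X, 2, nstart = 50)     # many starts: near-global optimum

c(poor = poor$tot.withinss, good = good$tot.withinss)   # optimized criterion
table(true, good$cluster)                               # recovery cross-tab
```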
Investigating the effects of congruence between within-person associations: A comparison of two extensions of response surface analysis
Response surface analysis (RSA) allows researchers to study whether the degree of congruence between two predictor variables is related to a potential psychological outcome. Here, we adapt RSA to the case in which the two predictor variables whose congruence is of interest refer to individual differences in within-person associations (WPAs) between variables that fluctuate over time. For example, a WPA-congruence hypothesis in research on romantic relationships could posit that partners are happier when they have similar social reactivities, that is, when they have similarly strong WPAs between the quantity of their social interactions and their momentary well-being. One method for testing a WPA-congruence hypothesis is a two-step approach in which the individuals' WPAs are first estimated as random slopes in respective multilevel models, and then these estimates are used as predictors in a regular RSA. As an alternative, we suggest combining RSA with multilevel structural equation modeling (MSEM) by specifying the WPAs as random slopes in the structural equation and using their latent second-order terms to predict the outcome on Level 2. We introduce both approaches and provide and explain their corresponding computer code templates. We also compared the two approaches with a simulation study and found that the MSEM model, despite its complexities (e.g., nonlinear functions of latent slopes), has advantages over the two-step approach. We conclude that the MSEM approach should be used in practice. We demonstrate its application using data from a daily diary study and offer guidance for important decisions (e.g., about standardization). (PsycInfo Database Record (c) 2024 APA, all rights reserved).
A computational method to reveal psychological constructs from text data
When starting to formalize psychological constructs, researchers traditionally rely on two distinct approaches: the quantitative approach, which defines constructs as part of a testable theory based on prior research and domain knowledge, often deploying self-report questionnaires, or the qualitative approach, which gathers data mostly in the form of text and bases construct definitions on exploratory analyses. Quantitative research might lead to an incomplete understanding of the construct, while qualitative research is limited due to challenges in systematic data processing, especially at large scale. We present a new computational method that combines the comprehensiveness of qualitative research and the scalability of quantitative analyses to define psychological constructs from semistructured text data. Based on structured questions, participants are prompted to generate sentences reflecting instances of the construct of interest. We apply computational methods to calculate embeddings as numerical representations of the sentences, which we then run through a clustering algorithm to arrive at groupings of sentences as psychologically relevant classes. The method includes steps for the measurement and correction of bias introduced by the data generation, and the assessment of cluster validity according to human judgment. We demonstrate the applicability of our method on an example from emotion regulation. Based on short descriptions of emotion regulation attempts collected through an open-ended situational judgment test, we use our method to derive classes of emotion regulation strategies. Our approach shows how machine learning and psychology can be combined to provide new perspectives on the conceptualization of psychological processes. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
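The clustering step alone can be sketched in base R; real sentence embeddings would come from a language model, so a random matrix stands in for them here, and the number of classes is an illustrative choice rather than the authors' procedure.

```r
set.seed(5)
emb <- matrix(rnorm(500 * 64), 500, 64)         # placeholder 64-d sentence embeddings

emb <- emb / sqrt(rowSums(emb^2))               # L2-normalize (cosine geometry)
km  <- kmeans(emb, centers = 8, nstart = 25)    # candidate construct classes
table(km$cluster)                               # class sizes for inspection
```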
Cross-lagged panel modeling with binary and ordinal outcomes
To date, cross-lagged panel modeling has been studied only for continuous outcomes. This article presents methods that are suitable also when there are binary and ordinal outcomes. Modeling, testing, identification, and estimation are discussed. A two-part ordinal model is proposed for ordinal variables with strong floor effects often seen in applications. An example considers the interaction between stress and alcohol use in an alcohol treatment study. Extensions to multiple-group analysis and modeling in the presence of trends are discussed. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Scaling and estimation of latent growth models with categorical indicator variables
Although the interest in latent growth models (LGMs) with categorical indicator variables has recently increased, difficulties remain regarding the selection of estimation methods and the interpretation of model estimates. Many of these difficulties can be avoided by understanding the scaling process. Depending on which parameter constraint methods are selected at each step of the scaling process, the scale applied to the model changes, which can produce substantial differences in the estimation results and their interpretation. In other words, if a different method is chosen for any step in the scaling process, the estimation results will not be comparable. This study organizes the scaling process and its relationship with estimation methods for categorical LGMs, laying out the parameter constraint methods involved at each step and considering extensively how the constraints at each step affect the meaning of the estimates. This study also provides evidence for the scale suitability and interpretability of model estimates through a simple illustration. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
How should we model the effect of "change"-Or should we?
There have been long and bitter debates between those who advocate residualized change as the foundation of longitudinal models and those who use difference scores. However, these debates have focused primarily on modeling change in the outcome variable. Here, we extend these same ideas to the covariate side of the change equation, finding similar issues arise when using lagged versus difference scores as covariates of interest in models of change. We derive a system of relationships that emerge across models differing in how time-varying covariates are represented, and then demonstrate how the set of logical transformations emerges in applied longitudinal settings. We conclude by considering the practical implications of a synthesized understanding of the effects of difference scores as both outcomes and predictors, with specific consequences for mediation analysis within multivariate longitudinal models. Our results suggest that there is reason for caution when using difference scores as time-varying covariates, given their propensity for inducing apparent inferential inversions within different analyses. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
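In its simplest bivariate form, one member of the system of relationships referred to above (a sketch; the article covers richer multivariate cases): modeling the difference score is a lagged model with the lag coefficient fixed to one,

$$y_2 - y_1 = \beta_0 + \beta_1 x + \varepsilon \quad\Longleftrightarrow\quad y_2 = \beta_0 + 1 \cdot y_1 + \beta_1 x + \varepsilon,$$

whereas residualized change frees that coefficient, $y_2 = \beta_0 + \rho\, y_1 + \beta_1 x + \varepsilon$; the same contrast recurs when the covariate itself is a difference score rather than a lagged score.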
Solving variables with Monte Carlo simulation experiments: A stochastic root-solving approach
Despite their popularity and flexibility, questions remain regarding how to optimally solve particular unknown variables of interest using Monte Carlo simulation experiments. This article reviews two common approaches based on either performing deterministic iterative searches with noisy objective functions or by constructing interpolation estimates given fitted surrogate functions, highlighting the inefficiencies and inferential concerns of both methods. To address these limitations, and to fill a gap in existing Monte Carlo experimental methodology, a novel algorithm termed the probabilistic bisection algorithm with bolstering and interpolations (ProBABLI) is presented with the goal of providing efficient, consistent, and unbiased estimates (with associated confidence intervals) for the stochastic root equations found in Monte Carlo simulation research. Properties of the ProBABLI approach are demonstrated using practical sample size planning applications for independent samples tests and structural equation models given target power rates, precision criteria, and expected power functions that incorporate prior beliefs. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
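A generic probabilistic bisection sketch in base R conveys the flavor of the approach (this is not ProBABLI itself, which adds bolstering and interpolation): locate the sample size at which noisy simulated power crosses .80, assuming each noisy comparison is correct with probability q > .5. The test, grid, and q are illustrative assumptions.

```r
set.seed(9)
noisy_power <- function(n, reps = 200)       # noisy Monte Carlo power estimate
  mean(replicate(reps, t.test(rnorm(n, mean = 0.5))$p.value < 0.05))

grid <- 20:80                                # candidate sample sizes
dens <- rep(1 / length(grid), length(grid))  # prior density over the root
q    <- 0.7                                  # assumed P(noisy sign is correct)

for (i in 1:50) {
  m  <- grid[which(cumsum(dens) >= 0.5)[1]]  # query at the posterior median
  up <- noisy_power(m) < 0.80                # does the root lie above m?
  w  <- if (up) ifelse(grid > m, q, 1 - q) else ifelse(grid <= m, q, 1 - q)
  dens <- dens * w / sum(dens * w)           # Bayes update of the density
}
grid[which.max(dens)]                        # estimated n* with power ~ .80
```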
Uses of uncertain statistical power: Designing future studies, not evaluating completed studies
Statistical power is a topic of intense interest as part of proposed methodological reforms to improve the defensibility of psychological findings. Power has been used in disparate ways-some that follow and some that do not follow from definitional features of statistical power. We introduce a taxonomy on three uses of power (comparing the performance of different procedures, designing or planning studies, and evaluating completed studies) in the context of new developments that consider uncertainty due to sampling variability. This review first describes fundamental concepts underlying power, new quantitative developments in power analysis, and the application of power analysis in designing studies. To facilitate the pedagogy of using power for design, we provide web applications to illustrate these concepts and examples of power analysis using newly developed methods. We also describe why using power for evaluating completed studies can be counterproductive. We conclude with a discussion of future directions in quantitative research on power analysis and provide recommendations for applying power in substantive research. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
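To illustrate the design-stage use of power under uncertainty, the base R sketch below propagates uncertainty about the effect size into a distribution of power values rather than a single number; the prior on the effect size is an illustrative assumption, not one of the article's examples.

```r
set.seed(2)
d_draws <- rnorm(5000, mean = 0.4, sd = 0.1)    # uncertain standardized effect size
pow <- sapply(d_draws, function(d)
  power.t.test(n = 100, delta = d, sd = 1, sig.level = 0.05)$power)

quantile(pow, c(0.1, 0.5, 0.9))                 # power is a distribution, not a point
mean(pow >= 0.80)                               # "assurance": P(power >= .80)
```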
An overview of alternative formats to the Likert format: A comment on Wilson et al. (2022)
Wilson et al. (2022) compared the Likert response format to an alternative format, which they called the Guttman response format. Using a Rasch modeling approach, they found that the Guttman response format had better properties relative to the Likert response format. We agree with their analyses and conclusions. However, they failed to mention many existing articles that have sought to overcome the disadvantages of the Likert format through the use of an alternative format. For example, the so-called "Guttman response format" is essentially the same as the Expanded format, which was proposed by Zhang and Savalei (2016) as a way to overcome the disadvantages of the Likert format. Similar alternative formats have been investigated since the 1960s. In this short response article, we review several alternative formats, explaining in detail the key characteristics of each format designed to overcome the problems with the Likert format. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Practical implications of equating equivalence tests: Reply to Campbell and Gustafson (2022)
Linde et al. (2021) compared the "two one-sided tests," the "highest density interval-region of practical equivalence," and the "interval Bayes factor" approaches to establishing equivalence in terms of power and Type I error rate using typical decision thresholds. They found that the interval Bayes factor approach exhibited a higher power but also a higher Type I error rate than the other approaches. In response, Campbell and Gustafson (2022) showed that the performances of the three approaches can approximate one another when they are calibrated to have the same Type I error rate. In this article, we argue that these results have little bearing on how these approaches are used in practice; a concrete example is used to highlight this important point. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
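For concreteness, here is a bare-bones "two one-sided tests" (TOST) equivalence check in base R using the kind of conventional thresholds at issue in this exchange; the data and equivalence bounds are illustrative.

```r
set.seed(4)
x <- rnorm(50, mean = 0.05)                     # data whose mean may be "equivalent" to 0
low <- -0.3; high <- 0.3                        # equivalence region

p_lower <- t.test(x, mu = low,  alternative = "greater")$p.value
p_upper <- t.test(x, mu = high, alternative = "less")$p.value
max(p_lower, p_upper) < 0.05                    # TRUE => declare equivalence
```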
Correction to "Comparing theories with the Ising model of explanatory coherence" by Maier et al. (2023)
Reports an error in "Comparing theories with the Ising model of explanatory coherence" by Maximilian Maier, Noah van Dongen and Denny Borsboom (Advanced Online Publication, Mar 02, 2023, np). In the article, the copyright attribution was incorrectly listed, and the Creative Commons CC BY 4.0 license disclaimer was incorrectly omitted from the author note. The correct copyright is "© 2023 The Author(s)," and the omitted disclaimer is below: Open Access funding provided by University College London: This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0). This license permits copying and redistributing the work in any medium or format, as well as adapting the material for any purpose, even commercially. (The following abstract of the original article appeared in record 2023-50323-001.) Theories are among the most important tools of science. Lewin (1943) already noted "There is nothing as practical as a good theory." Although psychologists discussed problems of theory in their discipline for a long time, weak theories are still widespread in most subfields. One possible reason for this is that psychologists lack the tools to systematically assess the quality of their theories. Thagard (1989) developed a computational model for formal theory evaluation based on the concept of explanatory coherence. However, there are possible improvements to Thagard's (1989) model and it is not available in software that psychologists typically use. Therefore, we developed a new implementation of explanatory coherence based on the Ising model. We demonstrate the capabilities of this new Ising model of Explanatory Coherence (IMEC) on several examples from psychology and other sciences. In addition, we implemented it in the R-package IMEC to assist scientists in evaluating the quality of their theories in practice. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
Causal definitions versus casual estimation: Reply to Valente et al. (2022)
In this response to Valente et al. (2022), I discuss the plausibility and applicability of the proposed mediation model and its causal effects estimation for single case experimental designs (SCEDs). I will focus on the underlying assumptions that the authors use to identify the causal effects. These assumptions include the particularly problematic assumption of sequential ignorability or no-unmeasured confounders. First, I will discuss the plausibility of the assumption in general and then particularly for SCEDs by providing an analytic argument and a reanalysis of the empirical example in Valente et al. (2022). Second, I will provide a simulation that reproduces the design by Valente et al. (2022) with the exception that, for a more realistic depiction of empirical data, an unmeasured confounder affects the mediator and outcome variables. The results of this simulation study indicate that even minor violations will lead to Type I error rates up to 100% and coverage rates as low as 0% for the defined causal direct and indirect effects. Third, using historical data on the effect of birth control on stork populations and birth rates, I will show that mediation models like the proposed method can lead to surprising artifacts. These artifacts can hardly be identified with statistical means, including methods such as sensitivity analyses. (PsycInfo Database Record (c) 2024 APA, all rights reserved).