Unveiling Fall Triggers in Older Adults: A Machine Learning Graphical Model Analysis
While existing research has identified diverse fall risk factors in adults aged 60 and older across various areas, comprehensively examining the interrelationships between all factors can enhance our knowledge of complex mechanisms and ultimately prevent falls. This study employs a novel approach, a mixed undirected graphical model (MUGM), to unravel the interplay between sociodemographics, mental well-being, body composition, self-assessed and performance-based fall risk assessments, and physical activity patterns. Using a parameterized joint probability density, MUGMs specify the higher-order dependence structure and reveal the underlying graphical structure of heterogeneous variables. An MUGM consisting of mixed types of variables (continuous and categorical) has versatile applications that provide innovative and practical insights, as it is equipped to transcend the limitations of traditional correlation analysis and uncover sophisticated interactions within a high-dimensional data set. Our study included 120 older adults from central Florida whose 37 fall risk factors were analyzed using an MUGM. Among the identified features, 34 exhibited pairwise relationships, while COVID-19-related factors and housing composition remained conditionally independent of all others. The results from our study serve as a foundational exploration, and future research investigating the longitudinal aspects of these features will play a pivotal role in enhancing our knowledge of the dynamics contributing to fall prevention in this population.
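For readers unfamiliar with mixed graphical models, a common pairwise parameterization (in the style of Lee and Hastie; the paper's exact specification may differ) writes the joint density of continuous variables x and categorical variables y as

```latex
p(x, y) \propto \exp\Big( \sum_{s} \alpha_s x_s
  + \tfrac{1}{2}\sum_{s,t} \beta_{st}\, x_s x_t
  + \sum_{s,j} \rho_{sj}(y_j)\, x_s
  + \sum_{j,r} \phi_{jr}(y_j, y_r) \Big),
```

where a zero pairwise term (β_st, ρ_sj, or φ_jr) corresponds to conditional independence between the two variables, i.e., a missing edge in the graph.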
Generalized Linear Models with Covariate Measurement Error and Zero-Inflated Surrogates
Epidemiological studies often encounter a challenge due to exposure measurement error when estimating an exposure-disease association. A surrogate variable may be available for the true unobserved exposure variable. However, zero-inflated data are frequently encountered in the surrogate variable. For example, many nutrient or physical activity measures may take a zero value (or a low detectable value) in a group of individuals. In this paper, we investigate regression analysis when the observed surrogate may take zero values for some individuals of the study cohort. A naive regression calibration that does not take into account the probability mass of the surrogate variable at 0 (or a low detectable value) will be biased. We develop a regression calibration estimator which typically has smaller bias than the naive regression calibration estimator, and we propose an expected estimating equation estimator which is consistent under the zero-inflated surrogate regression model. Extensive simulations show that the proposed estimators perform well in terms of bias correction. These methods are applied to a physical activity intervention study.
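As a minimal sketch of the setting (the paper's model may be richer, e.g., the zero-inflation probability may depend on covariates), the observed surrogate W for the true exposure X can be written as

```latex
W =
\begin{cases}
0, & \text{with probability } \pi,\\[2pt]
X + U, & \text{with probability } 1 - \pi,
\end{cases}
\qquad U \perp X,\quad E[U] = 0 .
```

Regression calibration replaces X in the outcome model with E[X | W]; a naive calibration computed as if W were continuous everywhere ignores the point mass at zero, where the surrogate carries no classical-error information about X, and is therefore biased.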
Asymptotic Properties for Cumulative Probability Models for Continuous Outcomes
Regression models for continuous outcomes frequently require a transformation of the outcome, which is often specified a priori or estimated from a parametric family. Cumulative probability models (CPMs) nonparametrically estimate the transformation by treating the continuous outcome as if it is ordered categorically. They thus represent a flexible analysis approach for continuous outcomes. However, it is difficult to establish asymptotic properties for CPMs due to the potentially unbounded range of the transformation. Here we show asymptotic properties for CPMs when applied to slightly modified data where bounds, one lower and one upper, are chosen and the outcomes outside the bounds are set as two ordinal categories. We prove the uniform consistency of the estimated regression coefficients and of the estimated transformation function between the bounds. We also describe their joint asymptotic distribution, and show that the estimated regression coefficients attain the semiparametric efficiency bound. We show with simulations that results from this approach and those from using the CPM on the original data are very similar when a small fraction of the data are modified. We reanalyze a dataset of HIV-positive patients with CPMs to illustrate and compare the approaches.
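For context, a CPM links the outcome to covariates through an unspecified monotone transformation:

```latex
P(Y \le y \mid X) = G\big(\alpha(y) - \beta^{\top} X\big),
\qquad\text{equivalently}\qquad
\alpha(Y) = \beta^{\top} X + \varepsilon,\quad \varepsilon \sim G,
```

where G is a specified link function (e.g., logistic or probit) and the nondecreasing transformation α(·) is estimated nonparametrically by treating each distinct outcome value as an ordinal category. The modification studied here bins outcomes below the lower bound and above the upper bound into two boundary categories, so that α(·) only needs to be estimated on a bounded range.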
A Mechanistic Model for Long COVID Dynamics
Long COVID, a long-lasting disorder following an acute infection of COVID-19, represents a significant public health burden at present. In this paper, we propose a new mechanistic model based on differential equations to investigate the population dynamics of long COVID. By connecting long COVID with acute infection at the population level, our modeling framework emphasizes the interplay between COVID-19 transmission, vaccination, and long COVID dynamics. We conducted a detailed mathematical analysis of the model. We also validated the model using numerical simulation with real data from the US state of Tennessee and the UK.
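To make the modeling idea concrete, below is a minimal illustrative compartmental sketch in Python of how long COVID prevalence can be coupled to acute transmission. The compartments, parameter names, and values are illustrative assumptions, not the paper's model or its Tennessee/UK calibration.

```python
from scipy.integrate import solve_ivp

# Illustrative compartments: S (susceptible), I (acute infection),
# L (long COVID), R (recovered). Names and values are assumptions.
def rhs(t, y, beta, gamma, q, rho, nu):
    S, I, L, R = y
    N = S + I + L + R
    dS = -beta * S * I / N + nu * R        # nu: waning of immunity
    dI = beta * S * I / N - gamma * I
    dL = q * gamma * I - rho * L           # fraction q develops long COVID
    dR = (1 - q) * gamma * I + rho * L - nu * R
    return [dS, dI, dL, dR]

y0 = [990_000.0, 10_000.0, 0.0, 0.0]
sol = solve_ivp(rhs, (0, 365), y0, args=(0.3, 0.1, 0.2, 0.02, 0.005),
                dense_output=True)
print(sol.y[2, -1])  # long COVID prevalence at day 365
```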
Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework
Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e., they first estimate genotypes and then regress the phenotype on the estimated genotypes and other variables. Direct testing methods based on next-generation sequencing (NGS) data without genotype calling have been proposed and have shown advantages over genotype-based methods in scenarios where genotype calling is not accurate. NGS data-based single-variant testing methods have been proposed, including our previously proposed UNC combo method [1]. We have also proposed NGS data-based group testing methods for continuous phenotypes using a linear model framework [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses, especially binary responses, which are commonly faced in association studies. We have conducted extensive simulation studies to evaluate the performance of different estimators and compare our estimators with their corresponding genotype-based methods. We found that all methods control Type I error, and that our NGS data-based testing methods outperform their corresponding genotype-based methods in the literature for other types of responses, including binary responses (logistic regression) and count responses (Poisson regression), especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary and count responses) in addition to continuous responses. Our methods fill a gap in the literature and show advantages over their corresponding genotype-based methods.
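The general shape of such NGS data-based testing (details of the paper's estimators may differ) is to regress the phenotype on posterior expected genotypes rather than called genotypes:

```latex
g\big(E[Y_i]\big) = Z_i^{\top}\gamma + \sum_{j=1}^{m} \beta_j\, E[G_{ij} \mid D_i],
```

where g is the GLM link (logit for binary traits, log for counts), Z_i are covariates, D_i denotes the sequencing reads for individual i, and E[G_ij | D_i] is the genotype dosage computed from genotype likelihoods. The group test is then of H0: β1 = ... = βm = 0, which avoids the hard genotype calls that become unreliable at low sequencing depth.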
A Flexible Method for Diagnostic Accuracy with Biomarker Measurement Error
Diagnostic biomarkers are often measured with error due to imperfect lab conditions or analytic variability of the assay. The ability of a diagnostic biomarker to discriminate between cases and controls is often measured by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, among other measures. Ignoring measurement error can cause biased estimation of a diagnostic accuracy measure, which results in misleading interpretation of the efficacy of a diagnostic biomarker. Existing assays are either research grade or clinical grade. Research assays are cost effective and often multiplex, but they may be associated with moderate measurement errors, leading to poorer diagnostic performance. In comparison, clinical assays may provide better diagnostic ability, but at higher cost, since they are usually developed by industry. Correction-for-attenuation methods are often valid when biomarkers follow a normal distribution, but may be biased for skewed biomarkers. In this paper, we develop a flexible method based on skew-normal biomarker distributions to correct for bias in estimating diagnostic performance measures including the AUC, sensitivity, and specificity. Finite sample performance of the proposed method is examined via extensive simulation studies. The methods are applied to a pancreatic cancer biomarker study.
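To see why ignoring measurement error biases discrimination measures, consider the normal special case, which the paper generalizes to skew-normal distributions. If the true biomarker is X0 ~ N(μ0, σ0²) in controls and X1 ~ N(μ1, σ1²) in cases, and the assay reports W = X + U with independent error U ~ N(0, σe²), then

```latex
\mathrm{AUC}_{\text{true}} = \Phi\!\left(\frac{\mu_1-\mu_0}{\sqrt{\sigma_0^2+\sigma_1^2}}\right),
\qquad
\mathrm{AUC}_{\text{obs}} = \Phi\!\left(\frac{\mu_1-\mu_0}{\sqrt{\sigma_0^2+\sigma_1^2+2\sigma_e^2}}\right)
\le \mathrm{AUC}_{\text{true}},
```

so the observed AUC is attenuated toward 0.5. Correction for attenuation inverts this relation, which is exactly the step that fails when the normality assumption does not hold.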
Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models
High-dimensional data applications often entail the use of various statistical and machine-learning algorithms to identify an optimal signature, based on biomarkers and other patient characteristics, that predicts the desired clinical outcome in biomedical research. Both the composition and the predictive performance of such biomarker signatures are critical in various biomedical research applications. In the presence of a large number of features, however, a conventional regression analysis approach fails to yield a good prediction model. A widely used remedy is to introduce regularization in fitting the relevant regression model. In particular, an L1 (lasso) penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. This L1-based regularization tends to generate a parsimonious prediction model with promising prediction performance, i.e., feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model depend on the choice of the penalty parameter used in the regularization. The penalty parameter is often chosen by K-fold cross-validation. However, such an algorithm tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure in this algorithm tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application via a real dataset.
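A minimal sketch of the Monte Carlo idea, stabilizing the cross-validated penalty by aggregating over repeated random fold assignments (this illustrates the general technique, not necessarily the authors' exact algorithm):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def monte_carlo_penalty(X, y, n_reps=50, k=10):
    """Repeat K-fold CV over many random fold assignments and
    aggregate the selected penalty to stabilise the choice."""
    penalties = []
    for rep in range(n_reps):
        folds = KFold(n_splits=k, shuffle=True, random_state=rep)
        fit = LassoCV(cv=folds).fit(X, y)
        penalties.append(fit.alpha_)
    return np.median(penalties)  # robust aggregate across runs

# An outer cross-validation wrapper would rerun this selection on
# training folds only and evaluate the refitted model on held-out
# folds, giving an uninflated estimate of predictive performance.
```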
Modeling the Interplay between HDV and HBV in Chronic HDV/HBV Patients
Hepatitis D virus (HDV) is an infectious subviral agent that can only propagate in people infected with hepatitis B virus (HBV). In this study, we modified and further developed a recent model for early HDV and HBV kinetics to better reproduce the HDV and HBV kinetics measured in infected patients during anti-HDV treatment. Analytical solutions were provided to highlight the new features of the modified model. The improved model offered significantly better prospects for modeling HDV and HBV interactions.
Advances in Parameter Estimation and Learning from Data for Mathematical Models of Hepatitis C Viral Kinetics
Mathematical models, some of which incorporate both intracellular and extracellular hepatitis C virus (HCV) kinetics, have been advanced in recent years for studying HCV-host dynamics, the mode of action of antivirals, and their efficacy. The standard ordinary differential equation (ODE) HCV kinetic model keeps track of uninfected cells, infected cells, and free virus. In multiscale models, a fourth partial differential equation (PDE) accounts for the intracellular viral RNA (vRNA) kinetics in an infected cell. The PDE multiscale model is substantially more difficult to solve than the standard ODE model, with governing differential equations that are stiff. In previous contributions, we developed and implemented stable and efficient numerical methods for the multiscale model, for both the solution of the model equations and parameter estimation. In this contribution, we perform sensitivity analysis on the model parameters to gain insight into important properties and to ensure that our numerical methods can be safely used for HCV viral dynamic simulations. Furthermore, we generate in-silico patients using the multiscale models to perform machine learning on the data, which enables us to remove HCV measurements on certain days and still estimate meaningful observations with a sufficiently small error.
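For reference, the standard ODE model referred to here is the classic target-cell-limited model, and the multiscale extension adds an age-structured equation for intracellular vRNA (the exact treatment-efficacy factors vary across papers):

```latex
\frac{dT}{dt} = s - dT - \beta V T,\qquad
\frac{dI}{dt} = \beta V T - \delta I,\qquad
\frac{dV}{dt} = (1-\varepsilon)\, p I - c V;
\qquad
\frac{\partial R}{\partial t} + \frac{\partial R}{\partial a}
 = \alpha(1-\varepsilon_\alpha) - \big(\rho(1-\varepsilon_s) + \mu\big) R,
```

where T, I, and V are uninfected cells, infected cells, and free virus, and R(a, t) is the vRNA level in a cell infected for time a, with production rate α, export rate ρ, and degradation rate μ. The added age dimension is what makes the multiscale system substantially harder to solve.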
An Integrated Workflow for Building Digital Twins of Cardiac Electromechanics-A Multi-Fidelity Approach for Personalising Active Mechanics
Personalised computer models of cardiac function, referred to as cardiac digital twins, are envisioned to play an important role in clinical precision therapies of cardiovascular diseases. A major obstacle hampering clinical translation is the significant computational cost involved in the personalisation of biophysically detailed mechanistic models, which requires the identification of high-dimensional parameter vectors. An important aspect to identify in electromechanics (EM) models is the set of active mechanics parameters that govern cardiac contraction and relaxation. In this study, we present a novel, fully automated, and efficient approach for personalising biophysically detailed active mechanics models using a two-step multi-fidelity solution. In the first step, active mechanical behaviour in a given 3D EM model is represented by a purely phenomenological, low-fidelity model, which is personalised at the organ scale by calibration to clinical cavity pressure data. Then, in the second step, median traces of nodal cellular active stress, intracellular calcium concentration, and fibre stretch are generated and utilised to personalise the desired high-fidelity model at the cellular scale using a 0D model of cardiac EM. Our novel approach was tested on a cohort of seven human left ventricular (LV) EM models, created from patients treated for aortic coarctation (CoA). Goodness of fit, computational cost, and robustness of the algorithm against uncertainty in the clinical data and variations of initial guesses were evaluated. We demonstrate that our multi-fidelity approach facilitates the personalisation of a biophysically detailed active stress model within only a few (2 to 4) expensive 3D organ-scale simulations, a computational effort compatible with clinical model applications.
Application of Machine Learning to Study the Association between Environmental Factors and COVID-19 Cases in Mississippi, USA
Because of the large-scale impact of COVID-19 on human health, several investigations are being conducted to understand the underlying mechanisms affecting the spread and transmission of the disease. The present study aimed to assess the effects of selected environmental factors such as temperature, humidity, dew point, wind speed, pressure, and precipitation on the daily increase in COVID-19 cases in Mississippi, USA, during the period from January 2020 to August 2021. A machine learning model was used to predict COVID-19 cases and implement preventive measures if necessary. A statistical analysis using Python programming showed that the humidity ranged from 56% to 78%, and COVID-19 cases increased from 634 to 3546. Negative correlations were found between temperature and the COVID-19 incidence rate (-0.22) and between humidity and the COVID-19 incidence rate (-0.15). The linear regression model showed the model's linear coefficients to be 0.92 and -1.29, respectively, with an intercept of 55.64. For the test dataset, the R² score was 0.053. The statistical analysis and machine learning show that there is no linear dependence of temperature and humidity with the COVID-19 incidence rate.
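An analysis of this shape can be reproduced in outline with scikit-learn; the file and column names below are hypothetical placeholders for the daily Mississippi weather/case data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# "ms_covid_weather.csv" and the column names are hypothetical
# placeholders for the daily weather and case-count data.
df = pd.read_csv("ms_covid_weather.csv")
X = df[["temperature", "humidity"]]
y = df["daily_cases"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print(model.coef_, model.intercept_)        # fitted linear coefficients
print(r2_score(y_te, model.predict(X_te)))  # out-of-sample R^2
```

A near-zero out-of-sample R² score, as reported here, indicates that the linear model explains almost none of the variation in daily cases.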
Evaluation of Surrogate Endpoints Using Information-Theoretic Measure of Association Based on Havrda and Charvat Entropy
Surrogate endpoints have been used to assess the efficacy of a treatment and can potentially reduce the duration and/or the number of patients required for clinical trials. Using information theory, Alonso et al. (2007) proposed a unified framework based on Shannon entropy and a new definition of surrogacy that departed from the hypothesis-testing framework. In this paper, a new family of surrogacy measures under Havrda and Charvat (H-C) entropy is derived, which contains Alonso's definition as a particular case. Furthermore, we extend our approach to a new model, based on the information-theoretic measure of association using H-C entropy, for a longitudinally collected continuous surrogate endpoint for a binary clinical endpoint of a clinical trial. The new model is illustrated through the analysis of data from a completed clinical trial, and it demonstrates the advantages of H-C entropy-based surrogacy measures in evaluating the scheduling of longitudinal biomarker visits for a phase 2 randomized controlled clinical trial for the treatment of multiple sclerosis.
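For a discrete distribution p = (p1, ..., pn), the Havrda-Charvat entropy of order α can be written (up to a normalization constant that differs across authors) as

```latex
H_{\alpha}(p) = \frac{1}{\alpha - 1}\Big(1 - \sum_{i} p_i^{\alpha}\Big),
\qquad \alpha > 0,\ \alpha \neq 1,
\qquad \lim_{\alpha \to 1} H_{\alpha}(p) = -\sum_i p_i \log p_i,
```

recovering Shannon entropy in the limit. Surrogacy measures of the information-gain type then quantify the fraction of uncertainty about the true endpoint T removed by the surrogate S, schematically (H_α(T) − E[H_α(T | S)]) / H_α(T); the Shannon special case corresponds to Alonso's framework.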
A Surrogate Measure for Time-Varying Biomarkers in Randomized Clinical Trials
Clinical trials with rare or distal outcomes are usually designed to be large in size and long in duration, and their resource-demanding and time-consuming nature limits the feasibility and efficiency of the studies. This motivates replacing rare or distal clinical endpoints with reliable surrogate markers that can be collected earlier and more easily. However, statistical challenges remain in evaluating and ranking potential surrogate markers. In this paper, we define a generalized proportion of treatment effect for survival settings. The measure's definition and estimation do not rely on any model assumption, and it is equipped with a consistent and asymptotically normal non-parametric estimator. Under proper conditions, the measure reflects the proportion of the average treatment effect mediated by the surrogate marker among the group that would survive to the marker measurement time under both the intervention and control arms.
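For context, the classical model-based version of this idea is Freedman's proportion of treatment effect explained,

```latex
\mathrm{PTE} = 1 - \frac{\beta_S}{\beta},
```

where β is the treatment coefficient in a regression model without the surrogate and β_S is the coefficient after adjusting for it. Its well-known drawbacks, including model dependence and estimates falling outside [0, 1], are what motivate a model-free, non-parametrically estimable generalization for survival settings such as the one defined here.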
Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization
Non-negative matrix factorization (NMF) is a relatively new method of matrix decomposition which factors an m×n data matrix X into an m×k matrix W and a k×n matrix H, so that X≈W×H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF on synthetic data. Second, we examine the effect of normalization on the accuracy of this estimate in empirical data. In synthetic data with orthogonal underlying components, methods based on PCA and Brunet's cophenetic correlation coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate. For any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If the underlying components are known to be orthogonal, then Velicer's MAP or Minka's Laplace-PCA method may be best. However, when orthogonality of the underlying components is unknown, none of the methods seemed preferable.
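One simple heuristic in the family of methods compared here is to track the reconstruction error over candidate ranks and look for an elbow; a minimal sketch with scikit-learn (an illustration of the general idea, not a method endorsed by the paper):

```python
from sklearn.decomposition import NMF

def reconstruction_errors(X, k_max=10):
    """Fit NMF at each candidate rank k and record the Frobenius
    reconstruction error ||X - WH||_F; an elbow in this curve is
    one heuristic for choosing k."""
    errors = {}
    for k in range(1, k_max + 1):
        model = NMF(n_components=k, init="nndsvda",
                    random_state=0, max_iter=500)
        model.fit(X)
        errors[k] = model.reconstruction_err_
    return errors
```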
A Mathematical Analysis of HDV Genotypes: From Molecules to Cells
Hepatitis D virus (HDV) is classified into eight genotypes. The various genotypes are included in the HDVdb database, where each HDV sequence is specified by its genotype. In this contribution, a mathematical analysis is performed on RNA sequences in HDVdb. The predicted RNA folding structures of the GenBank HDV genome sequences in HDVdb are classified according to their coarse-grained tree-graph representation. The analysis allows the vast majority of the sequences, which exhibit a rod-like structure that is important for virus replication, to be discarded in a simple and efficient way, in order to attempt to discover other biological functions by structural consideration. After the filtering, only a small number of sequences remain, which can be checked for additional stem-loops besides the main one known to be responsible for virus replication. A few sequences are found to contain an additional stem-loop responsible for RNA editing or other possible functions. These few sequences fall into two main classes: one, well known experimentally, belonging to genotype 3 from patients in South America and associated with RNA editing; and the other, not known at present, belonging to genotype 7 from patients in Cameroon. The possibility that a function besides virus replication, reminiscent of the editing mechanism in HDV genotype 3, exists in HDV genotype 7 has not been explored before and is predicted by eigenvalue analysis. Finally, a comparison of native and shuffled sequences shows that HDV sequences of all genotypes exhibit pronounced mutational robustness and thermodynamic stability compared with other viruses that have been subjected to such an analysis.
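In coarse-grained tree-graph representations of RNA secondary structure, spectral quantities of the graph Laplacian L = D − A are the usual discriminators of topology, which is presumably the kind of eigenvalue analysis meant here (the paper's exact spectral quantity may differ). A rod-like structure corresponds to a path graph P_n, whose Laplacian eigenvalues are

```latex
\lambda_k = 4\sin^{2}\!\Big(\frac{k\pi}{2n}\Big), \qquad k = 0, 1, \dots, n-1,
```

and among all trees on n vertices the path minimizes the algebraic connectivity λ2, so branched structures carrying extra stem-loops separate cleanly from rod-like ones in the spectrum.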
Nonlinear Dynamics of the Introduction of a New SARS-CoV-2 Variant with Different Infectiousness
Several variants of the SARS-CoV-2 virus have been detected during the COVID-19 pandemic. Some of these new variants have been of public health concern due to their higher infectiousness. We propose a theoretical mathematical model based on differential equations to study the effect of introducing a new, more transmissible SARS-CoV-2 variant in a population. The mathematical model is formulated in such a way that it takes into account the higher transmission rate of the new SARS-CoV-2 strain and the subpopulation of asymptomatic carriers. We find the basic reproduction number using the method of the next-generation matrix. This threshold parameter is crucial since it indicates which parameters play an important role in the outcome of the COVID-19 pandemic. We study the local stability of the infection-free and endemic equilibrium states, which are potential outcomes of a pandemic. Moreover, using a suitable Lyapunov functional and the LaSalle invariance principle, it is proved that if the basic reproduction number is less than unity, the infection-free equilibrium is globally asymptotically stable. Our study shows that the new, more transmissible SARS-CoV-2 variant will prevail and that the prevalence of the preexistent variant will decrease and eventually disappear. We perform numerical simulations to support the analytic results and to show some effects of a new, more transmissible SARS-CoV-2 variant in a population.
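Recall that the next-generation matrix method defines the basic reproduction number as the spectral radius

```latex
R_0 = \rho\big(F V^{-1}\big),
```

where F holds the rates of new infections and V the transition rates between infected compartments, both linearized at the infection-free equilibrium. In two-strain models of this type, R0 is typically the maximum of the strain-specific reproduction numbers, which is why the more transmissible variant drives the outcome.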
Diffusion in Sephadex Gel Structures: Time Dependency Revealed by Multi-Sequence Acquisition over a Broad Diffusion Time Range
It has been increasingly reported that in biological tissues diffusion-weighted MRI signal attenuation deviates from mono-exponential decay, especially at high b-values. A number of diffusion models have been proposed to characterize this non-Gaussian diffusion behavior. One of these models is the continuous-time random-walk (CTRW) model, which introduces two new parameters: a fractional order time derivative α and a fractional order spatial derivative β. These new parameters have been linked to intravoxel diffusion heterogeneities in time and space, respectively, and are believed to depend on diffusion times. Studies on this time dependency are limited, largely because the diffusion time cannot vary over a broad range in a conventional spin-echo echo-planar imaging sequence due to the accompanying T2 decay. In this study, we investigated the time dependency of the CTRW model in Sephadex gel phantoms across a broad diffusion time range by employing oscillating-gradient spin-echo, pulsed-gradient spin-echo, and pulsed-gradient stimulated-echo sequences. We also performed Monte Carlo simulations to help understand our experimental results. It was observed that the diffusion process fell into the Gaussian regime at extremely short diffusion times, whereas the CTRW parameters exhibited a strong time dependency at longer diffusion times.
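In the diffusion-MRI CTRW literature the signal attenuation is commonly parameterized with a Mittag-Leffler function (notation varies slightly across papers):

```latex
\frac{S}{S_0} = E_{\alpha}\!\big( -(b\, D_m)^{\beta} \big),
```

where D_m is an anomalous diffusion coefficient and α and β are the temporal and spatial fractional orders. Setting α = β = 1 recovers the mono-exponential decay S/S0 = e^{-bD}, i.e., the Gaussian regime observed here at very short diffusion times.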
Bijective Mapping Analysis to Extend the Theory of Functional Connections to Non-Rectangular 2-Dimensional Domains
This work presents an initial analysis of using bijective mappings to extend the Theory of Functional Connections to non-rectangular two-dimensional domains. Specifically, this manuscript proposes three different mapping techniques: (a) complex mapping, (b) projection mapping, and (c) polynomial mapping. In that respect, an accurate least-squares approximated inverse mapping is also developed for those mappings with no closed-form inverse. Advantages and disadvantages of using these mappings are highlighted and a few examples are provided. Additionally, the paper shows how to replace boundary constraints expressed in terms of a piece-wise sequence of functions with a single function, which is compatible with, and required by, the Theory of Functional Connections already developed for rectangular domains.
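The least-squares inverse mapping can be sketched as follows (a generic construction; the manuscript's basis choice may differ). Given a forward map (x, y) = f(ξ, η) with no closed-form inverse, each inverse coordinate is expanded in a basis on the physical domain,

```latex
\xi(x, y) \approx \sum_{k} c_k\, \phi_k(x, y),
\qquad
\hat{c} = \arg\min_{c} \sum_{i} \Big( \xi_i - \sum_k c_k\, \phi_k(x_i, y_i) \Big)^{2},
```

with the training pairs (ξ_i, η_i) ↦ (x_i, y_i) generated by evaluating the forward map on a grid; the same is done for η(x, y).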
Efficient Methods for Parameter Estimation of Ordinary and Partial Differential Equation Models of Viral Hepatitis Kinetics
Parameter estimation in mathematical models that are based on differential equations is known to be of fundamental importance. For sophisticated models such as age-structured models that simulate biological agents, parameter estimation that makes use of all available data points presents a formidable challenge, and efficiency considerations need to be employed in order for the method to become practical. In the case of age-structured models of viral hepatitis dynamics under antiviral treatment, which involve partial differential equations, a fully numerical parameter estimation method was developed that does not require an analytical approximation of the solution to the multiscale model equations, avoiding the need to derive the long-term approximation for each model. However, that method is considerably slow because of precision problems in estimating derivatives with respect to the parameters near their boundary values, making it almost impractical for general use. To overcome this limitation, two steps have been taken that reduce the running time by orders of magnitude and thereby lead to a practical method. First, constrained optimization is used, letting the user add constraints on the boundary values of each parameter before the method is executed. Second, optimization is performed by derivative-free methods, eliminating the need to evaluate expensive numerical derivative approximations. The newly efficient methods that were developed as a result of this approach are described for hepatitis C virus kinetic models during antiviral therapy. Illustrations are provided using a user-friendly simulator that incorporates the efficient methods for both the ordinary and partial differential equation models.
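A minimal sketch of the two efficiency ideas, box constraints plus derivative-free search, applied to the standard ODE kinetic model (the synthetic data and parameter values are illustrative only, and the actual simulator covers the PDE model as well):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Standard HCV kinetic model: target cells T, infected cells I, virus V.
def rhs(t, y, beta, delta, p, c, eps, s, d):
    T, I, V = y
    return [s - d * T - beta * V * T,
            beta * V * T - delta * I,
            (1 - eps) * p * I - c * V]

def loss(theta, t_obs, logv_obs, y0, fixed):
    """Sum of squared errors on log10 viral load."""
    eps, delta, c = theta
    sol = solve_ivp(rhs, (0, t_obs[-1]), y0, t_eval=t_obs,
                    args=(fixed["beta"], delta, fixed["p"], c, eps,
                          fixed["s"], fixed["d"]),
                    method="LSODA", rtol=1e-8)
    logv = np.log10(np.maximum(sol.y[2], 1e-12))
    return np.sum((logv - logv_obs) ** 2)

# Illustrative synthetic data and fixed parameters (not from the paper).
fixed = dict(beta=3e-7, p=10.0, s=1e4, d=0.01)
y0 = [1e6, 1e5, 1e6]
t_obs = np.array([0.0, 0.25, 0.5, 1, 2, 4, 7, 14])
sol = solve_ivp(rhs, (0, 14), y0, t_eval=t_obs,
                args=(fixed["beta"], 0.2, fixed["p"], 6.0, 0.95,
                      fixed["s"], fixed["d"]),
                method="LSODA", rtol=1e-8)
rng = np.random.default_rng(1)
logv_obs = np.log10(sol.y[2]) + rng.normal(0, 0.1, t_obs.size)

# Derivative-free, box-constrained optimization: bounds are supported
# by Nelder-Mead in SciPy >= 1.7, and no numerical derivatives of the
# ODE solution with respect to the parameters are ever evaluated.
res = minimize(loss, x0=np.array([0.8, 0.5, 3.0]),
               args=(t_obs, logv_obs, y0, fixed),
               method="Nelder-Mead",
               bounds=[(0.0, 0.999), (0.01, 2.0), (0.5, 30.0)])
print(res.x)  # estimated (eps, delta, c)
```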
Within-Host Phenotypic Evolution and the Population-Level Control of Chronic Viral Infections by Treatment and Prophylaxis
Chronic viral infections can persist for decades, spanning thousands of viral generations and leading to a highly diverse population of viruses with its own complex evolutionary history. We propose an expandable mathematical framework for understanding how the emergence of genetic and phenotypic diversity affects the population-level control of those infections by both non-curative treatment and chemo-prophylactic measures. Our framework allows both neutral and phenotypic evolution, and we consider the specific evolution of contagiousness, resistance to therapy, and efficacy of prophylaxis. We compute the controlled and uncontrolled population-level basic reproduction numbers, accounting for the within-host evolutionary process in which new phenotypes emerge and are lost in infected persons, and we extend these results to include both treatment and prophylactic control efforts. We use these results to discuss the conditions under which prophylactic methods of control are superior to therapeutic ones. Finally, we give expressions for the endemic equilibrium of these models for certain constrained versions of the within-host evolutionary model, providing a potential method for estimating within-host evolutionary parameters from population-level genetic sequence data.
A Mathematical Model for Early HBV and HDV Kinetics during Anti-HDV Treatment
Hepatitis delta virus (HDV) is an infectious subviral agent that can only propagate in people infected with hepatitis B virus (HBV). HDV/HBV infection is considered to be the most severe form of chronic viral hepatitis. In this contribution, a mathematical model for the interplay between HDV and HBV under anti-HDV treatment is presented. Previous models were not designed to account for the observation that HBV rises when HDV declines with HDV-specific therapy. In the simple model presented here, HDV and HBV kinetics are coupled, giving rise to an improved viral kinetic model that simulates the early interplay of HDV and HBV during anti-HDV therapy.
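Purely as an illustration of what "coupled" can mean here (this is not the authors' published system; I, p, c, K, and ε are generic kinetic placeholders), one way to let HDV suppress HBV production, so that HBV rebounds as anti-HDV therapy drives HDV down, is

```latex
\frac{dV_D}{dt} = (1-\varepsilon)\, p_D I - c_D V_D,
\qquad
\frac{dV_B}{dt} = \frac{p_B I}{1 + V_D/K} - c_B V_B,
```

where ε is the anti-HDV drug efficacy and the factor 1/(1 + V_D/K) encodes HDV interference with HBV replication: as V_D falls under treatment, HBV production recovers and V_B rises, matching the clinical observation the model is designed to capture.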