HUMAN GENETICS

A genome-wide scan of non-coding RNAs and enhancers for refractive error and myopia
Tedja MS, Swierkowska-Janc J, Enthoven CA, Meester-Smoor MA, Hysi PG, Felix JF, Cowan CS, Cherry TJ, van der Spek PJ, Ghanbari M, Erkeland SJ, Barakat TS, Klaver CCW and Verhoeven VJM
Refractive error (RE) and myopia are complex polygenic conditions with the majority of genome-wide associated genetic variants in non-exonic regions. Given this, and the onset during childhood, gene regulation is expected to play an important role in their pathogenesis. This prompted us to explore beyond traditional gene-finding approaches. We performed a genetic association study between variants in non-coding RNAs and enhancers, and RE and myopia. We obtained single-nucleotide polymorphisms (SNPs) in microRNA (miRNA) genes, miRNA-binding sites, long non-coding RNA (lncRNA) genes and enhancers from publicly available databases: miRNASNPv2, PolymiRTS, VISTA Enhancer Browser, FANTOM5 and lncRNASNP2. We investigated whether SNPs overlapping these elements were associated with RE and myopia, using results from a large GWAS meta-analysis (N = 160,420). With genetic risk scores (GRSs) per element, we investigated the joint effect of associated variants on RE, axial length (AL)/corneal radius (CR), and AL progression in an independent child cohort, the Generation R Study (N = 3638 children). We constructed a score for biological plausibility per SNP in highly confident miRNA-binding sites and enhancers in chromatin accessible regions. We found that SNPs in two miRNA genes, 14 enhancers and 81 lncRNA genes in chromatin accessible regions and 54 highly confident miRNA-binding sites, were in RE and myopia-associated loci. GRSs from SNPs in enhancers were significantly associated with RE, AL/CR and AL progression. GRSs from lncRNAs were significantly associated with all AL/CR and AL progression. GRSs from miRNAs were not associated with any ocular biometric measurement. GRSs from miRNA-binding sites showed suggestive but inconsistent significance. We prioritized candidate miRNA-binding sites and candidate enhancers for future functional validation. Pathways of target and host genes of highly ranked variants included eye development (BMP4, MPPED2), neurogenesis (DDIT4, NTM), extracellular matrix (ANTXR2, BMP3), photoreceptor metabolism (DNAJB12), photoreceptor morphogenesis (CHDR1), neural signaling (VIPR2) and TGF-beta signaling (ANAPC16). This is the first large-scale study of non-coding RNAs and enhancers for RE and myopia. Enhancers and lncRNAs could be of major importance as they are associated with childhood myopia. We provide a confident blueprint for future functional validation by prioritizing candidate miRNA-binding sites and candidate enhancers.
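The per-element genetic risk scores mentioned in this abstract can be illustrated with a minimal sketch, assuming the common definition of a GRS as a weighted sum of risk-allele dosages. The function name, SNP identifiers, weights and genotype below are hypothetical and are not taken from the study.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for SNPs: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical SNP weights grouped by regulatory element class
# (for example, GWAS effect sizes per risk allele).
weights_by_element = {
    "enhancer": {"snp_a": 0.5, "snp_b": 0.25},
    "lncRNA": {"snp_c": 0.125},
}

# Hypothetical genotype of one child: risk-allele count at each SNP.
child = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

# One score per element class, mirroring the idea of a GRS per element.
scores = {
    element: genetic_risk_score(child, weights)
    for element, weights in weights_by_element.items()
}
print(scores)
```

Each score could then be used as a predictor of an ocular measurement in a regression model; that step is omitted here.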
Human organoids for rapid validation of gene variants linked to cochlear malformations
Zafeer MF, Ramzan M, Duman D, Mutlu A, Seyhan S, Kalcioglu MT, Fitoz S, DeRosa BA, Guo S, Dykxhoorn DM and Tekin M
Developmental anomalies of the hearing organ, the cochlea, are diagnosed in approximately one-fourth of individuals with congenital deafness. The majority of patients with cochlear malformations remain etiologically undiagnosed due to insufficient knowledge about underlying genes or the inability to make conclusive interpretations of identified genetic variants. We used exome sequencing for the genetic evaluation of hearing loss associated with cochlear malformations in three probands from unrelated families. We subsequently generated monoclonal induced pluripotent stem cell (iPSC) lines bearing patient-specific knockins and knockouts using CRISPR/Cas9 to assess the pathogenicity of candidate variants. We detected FGF3 (p.Arg165Gly) and GREB1L (p.Cys186Arg), variants of uncertain significance in two recognized genes for deafness, and PBXIP1 (p.Trp574*) in a candidate gene. Upon differentiation of iPSCs towards inner ear organoids, we observed developmental aberrations in knockout lines compared to their isogenic controls. Patient-specific single nucleotide variants (SNVs) showed abnormalities similar to those of the knockout lines, functionally supporting their causality in the observed phenotype. Therefore, we present human inner ear organoids as a potential tool to validate the pathogenicity of DNA variants associated with cochlear malformations.
Genetic landscape in undiagnosed patients with syndromic hearing loss revealed by whole exome sequencing and phenotype similarity search
Mutai H, Miya F, Nara K, Yamamoto N, Inoue S, Murakami H, Namba K, Shitara H, Minami S, Nakano A, Arimoto Y, Morimoto N, Kawasaki T, Wasano K, Fujioka M, Uchida Y, Kaga K, Yamazawa K, Kikkawa Y, Kosaki K, Tsunoda T and Matsunaga T
There are hundreds of rare syndromic diseases involving hearing loss, many of which are not targeted for clinical genetic testing. We systematically explored the genetic causes of undiagnosed syndromic hearing loss using a combination of whole exome sequencing (WES) and a phenotype similarity search system called PubCaseFinder. Fifty-five families with syndromic hearing loss of unknown cause were analyzed using WES after prescreening of several deafness genes depending on patient clinical features. Causative genes were identified in 22 families, including both established genes associated with syndromic hearing loss (PTPN11, CHD7, KARS1, OPA1, DLX5, MITF, SOX10, MYO7A, and USH2A) and those associated with nonsyndromic hearing loss (STRC, EYA4, and KCNQ4). Association of a DLX5 variant with incomplete partition type I (IP-I) anomaly of the inner ear was identified in a patient with cleft lip and palate and acetabular dysplasia. The study identified COL1A1, CFAP52, and NSD1 as causative genes through phenotype similarity search or by analogy. ZBTB10 was proposed as a novel candidate gene for syndromic hearing loss with IP-I. A mouse model with homozygous Zbtb10 frameshift variant resulted in embryonic lethality, suggesting the importance of this gene for early embryonic development. Our data highlight a wide spectrum of rare causative genes in patients with syndromic hearing loss, and demonstrate that WES analysis combined with phenotype similarity search is a valuable approach for clinical genetic testing of undiagnosed disease.
CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs)
Aspromonte MC, Del Conte A, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE and Leonardi E
The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by neurodevelopmental disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. Here, we assess the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and their causal variants. We also evaluated predictions for possible genetic causes in patients without a clear genetic diagnosis. As in the previous ID Panel challenge in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (Pathogenic/Likely Pathogenic, Variants of Uncertain Significance and Risk Factors) were provided. The phenotypic traits and variant data of 150 patients from the CAGI5 ID Panel Challenge were provided as a training set for predictors. The CAGI6 challenge confirms the CAGI5 finding that predicting phenotypes from gene panel data is highly challenging, with AUC values close to random and no method able to predict relevant variants with both high accuracy and precision. However, a significant improvement is noted for the best method, with recall increasing from 66% to 82%. Several groups also successfully predicted difficult-to-detect variants, emphasizing the importance of variants initially excluded by the Padua NDD Lab.
Integrating transcriptomic and polygenic risk scores to enhance predictive accuracy for ischemic stroke subtypes
Cai X, Li H, Cao X, Ma X, Zhu W, Xu L, Yang S, Yu R and Huang P
Ischemic stroke (IS), characterized by complex etiological diversity, is a significant global health challenge. Recent advancements in genome-wide association studies (GWAS) and transcriptomic profiling offer promising avenues for enhanced risk prediction and understanding of disease mechanisms. GWAS summary statistics from the GIGASTROKE Consortium and genetic and phenotypic data from the UK Biobank (UKB) were used. Transcriptome-Wide Association Studies (TWAS) were conducted using FUSION to identify genes associated with IS and its subtypes across eight tissues. Colocalization analysis identified shared genetic variants influencing both gene expression and disease risk. Sum Transcriptome-Polygenic Risk Scores (STPRS) models were constructed by combining polygenic risk scores (PRS) and polygenic transcriptome risk scores (PTRS) using logistic regression. The predictive performance of STPRS was evaluated using the area under the curve (AUC). A Phenome-wide association study (PheWAS) explored associations between STPRS and various phenotypes. TWAS identified 34 susceptibility genes associated with IS and its subtypes. Colocalization analysis revealed 18 genes with a posterior probability (PP) H4 > 75% for joint expression quantitative trait loci (eQTL) and GWAS associations, highlighting their genetic relevance. The STPRS models demonstrated superior predictive accuracy compared to conventional PRS, showing significant associations with numerous UKB phenotypes, including atrial fibrillation and blood pressure. Integrating transcriptomic data with polygenic risk scores through STPRS enhances predictive accuracy for IS and its subtypes. This approach refines our understanding of the genetic and molecular landscape of stroke and paves the way for tailored preventive and therapeutic strategies.
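The idea of combining a PRS and a PTRS through logistic regression and judging the result by AUC can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the coefficients are fixed hypothetical values (in practice they would be fitted on training data), and the toy scores and labels are invented solely to make the example runnable.

```python
import math


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random
    control, with ties counted as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def combined_risk(prs, ptrs, intercept, b_prs, b_ptrs):
    """Predicted probability from a logistic model with two predictors."""
    linear = intercept + b_prs * prs + b_ptrs * ptrs
    return 1.0 / (1.0 + math.exp(-linear))


# Toy data: 1 = case, 0 = control (hypothetical values).
labels = [0, 0, 1, 1]
prs = [0.2, 0.9, 0.4, 0.7]
ptrs = [0.1, 0.2, 0.8, 0.9]

# Hypothetical coefficients standing in for a fitted logistic regression.
stprs = [combined_risk(p, t, -1.0, 1.0, 1.0) for p, t in zip(prs, ptrs)]

auc_prs = auc(prs, labels)
auc_stprs = auc(stprs, labels)
print(auc_prs, auc_stprs)
```

The toy numbers say nothing about the study's results; they only show how the two AUC values would be compared.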
An augmented transformer model trained on protein family specific variant data leads to improved prediction of variants of uncertain significance
Joshi D, Pradhan S, Sajeed R, Srinivasan R and Rana S
Variants of uncertain significance (VUS) represent variants that lack sufficient evidence to be confidently associated with a disease, thus posing a challenge in the interpretation of genetic testing results. Here we report an improved method for predicting the impact of VUS in the arylsulfatase A (ARSA) gene as part of the Critical Assessment of Genome Interpretation challenge (CAGI6). Our method uses a transfer learning approach that leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency is known to cause a rare genetic disorder, metachromatic leukodystrophy. Our framework combines zero-shot log odds scores and embeddings from ESM, an evolutionary scale model, as features for training a supervised model on gene variants functionally related to the ARSA gene. The zero-shot log odds score feature captures the generic properties of proteins that the model learned during pre-training on millions of sequences in the UniProt data, while the ESM embeddings for the proteins in the ARSA family capture features specific to the family. We also tested our approach on another enzyme, N-acetyl-glucosaminidase (NAGLU), that belongs to the same superfamily as ARSA. Our results demonstrate that the performance of our family models (augmented ESM models) is comparable to or better than that of the ESM models. The ARSA model compares favorably with the majority of state-of-the-art predictors on the area under the precision-recall curve (AUPRC) performance metric, and the NAGLU model outperforms all pathogenicity predictors evaluated in this study on the AUPRC metric. The improved AUPRC has relevance in a diagnostic setting where variant prioritization generally entails identifying a small number of pathogenic variants from a larger number of benign variants. Our results also indicate that, for genes that have sparse or no experimental variant impact data, the family variant data can serve as proxy training data for making accurate predictions. Attention analysis of active sites and binding sites in ARSA and NAGLU proteins sheds light on probable mechanisms of pathogenicity for positions that are highly attended.
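A zero-shot log odds score of the kind described here is commonly computed as the log probability a language model assigns to the mutant residue minus the log probability it assigns to the wild-type residue at the same position. The sketch below assumes that definition; the probability table and the embedding vector are hypothetical stand-ins for real model output, and no ESM API is called.

```python
import math


def zero_shot_log_odds(residue_probs, wild_type, mutant):
    """log p(mutant) - log p(wild type) at one sequence position.
    Negative values mean the model finds the mutant less plausible."""
    return math.log(residue_probs[mutant]) - math.log(residue_probs[wild_type])


# Hypothetical model probabilities over residues at a single position.
residue_probs = {"P": 0.5, "A": 0.25, "L": 0.125, "G": 0.125}

score = zero_shot_log_odds(residue_probs, wild_type="P", mutant="L")

# Hypothetical embedding for the same variant; concatenating it with the
# score gives a feature vector for a downstream supervised model.
embedding = [0.1, -0.3]
features = [score] + embedding
print(features)
```

Here the substitution from P to L scores log(0.25), about -1.39, so the model considers it less likely than the wild-type residue.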
Integrative analysis of transcriptome and proteome wide association studies prioritized functional genes for obesity
Zhao QG, Ma XL, Xu Q, Song ZT, Bu F, Li K, Han BX, Yan SS, Zhang L, Luo Y and Pei YF
Genome-wide association studies have identified dozens of genomic loci for obesity. However, functional genes and their detailed genetic mechanisms underlying these loci are mainly unknown. In this study, we conducted an integrative study to prioritize plausibly functional genes by combining information from genome-, transcriptome- and proteome-wide association analyses.
Conventional and genetic association between migraine and stroke with druggable genome-wide Mendelian randomization
Wang X, Pang W, Hu X, Shu T, Luo Y, Li J, Feng L, Qiu K, Rao Y, Song Y, Mao M, Zhang Y, Ren J and Zhao Y
The genetic relationship between migraine and stroke remains underexplored, particularly in the context of druggable targets. Previous studies have been limited by small sample sizes and a lack of focus on genetically targeted therapies for these conditions. We analyzed the association and causality between migraine and stroke using multivariable logistic regression in the UK Biobank cohort and Mendelian randomization (MR) analyses based on genome-wide association study (GWAS) data. Integrating expression quantitative trait loci (eQTL) data from blood and brain regions, we explored the phenotypic and genetic links between migraine medications, drug targets, and stroke. Additionally, we explored novel druggable genes for migraine and evaluated their effects on migraine signaling molecules and stroke risk. Migraine was significantly associated with stroke, particularly ischemic stroke (IS) and intracerebral hemorrhage (ICH), with MR analysis confirming a causal link to ICH. HTR1A emerged as a potential link between antidepressants (preventive medications for migraine) and stroke. We identified 17 migraine-related druggable genes, with 5 genes (HMGCR, TGFB1, TGFB3, KCNK5, IMPDH2) associated with nine existing drugs. Further MR analysis identified a correlation of CELSR3 and IMPDH2 with the cGMP pathway marker PRKG1, and identified KCNK5, PLXNB1, and MDK as novel migraine-associated druggable genes significantly linked to stroke risk. These findings establish the phenotypic and genetic link between migraine, its medication and stroke, identify potential targets for single and dual-purpose therapies for migraine and stroke, and emphasize the need for further research to validate these associations.
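For readers unfamiliar with Mendelian randomization, the sketch below shows two standard estimators computed from GWAS summary statistics: the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. The abstract does not state which MR estimators were used, so this is an assumption for illustration, and all effect sizes and standard errors are hypothetical.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from a single genetic instrument."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate: a weighted regression of outcome
    effects on exposure effects through the origin, with weights
    equal to 1 / se_outcome**2."""
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in triples)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in triples)
    return numerator / denominator


# Hypothetical summary statistics for two instruments.
beta_exposure = [0.1, 0.2]    # SNP effect on the exposure (e.g. migraine)
beta_outcome = [0.05, 0.1]    # SNP effect on the outcome (e.g. stroke)
se_outcome = [0.01, 0.02]     # standard error of the outcome effect

ratios = [wald_ratio(bx, by) for bx, by in zip(beta_exposure, beta_outcome)]
estimate = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(ratios, estimate)
```

Because both toy instruments imply the same ratio, the IVW estimate coincides with it; with real data the instruments disagree and the weights matter.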
Further evidence of biallelic NAV3 variants associated with recessive neurodevelopmental disorder with dysmorphism, developmental delay, intellectual disability, and behavioral abnormalities
Kakar N, Mascarenhas S, Ali A, Azmatullah, Ijlal Haider SM, Badiger VA, Ghofrani MS, Kruse N, Hashmi SN, Pozojevic J, Balachandran S, Toft M, Malik S, Händler K, Fatima A, Iqbal Z, Shukla A, Spielmann M and Radhakrishnan P
Neuron navigators (NAVs) are cytoskeleton-associated proteins well known for their role in axonal guidance, neuronal migration, and neurite growth necessary for neurodevelopment. Neuron navigator 3 (NAV3) is one of the three NAV proteins highly expressed in the embryonic and adult brain. However, the role of the NAV3 gene in human disease is not well studied. Recently, five bi-allelic and three mono-allelic variants in NAV3 were reported in 12 individuals from eight unrelated families with neurodevelopmental disorder (NDD). Here, we report five patients from three unrelated consanguineous families segregating autosomal recessive NDD. Patients have symptoms of dysmorphism, intellectual disability, developmental delay, and behavioral abnormalities. Exome sequencing (ES) was performed on two affected individuals from one large family, and one affected individual from each of the other two families. ES revealed two homozygous nonsense variants, c.6325C > T; p.(Gln2109Ter) and c.6577C > T; p.(Arg2193Ter), and a homozygous splice-site variant (c.243 + 1G > T) in NAV3 (NM_001024383.2). Analysis of single-cell sequencing datasets from embryonic and young adult human brains revealed that NAV3 is highly expressed in excitatory neurons, inhibitory neurons, and microglia, consistent with its role in neurodevelopment. In conclusion, we further validate biallelic protein-truncating variants in NAV3 as a cause of NDD, expanding the spectrum of pathogenic variants in this newly discovered NDD gene.
Biallelic germline DDX41 variants in a patient with bone dysplasia, ichthyosis, and dysmorphic features
Sharma P, McFadden JR, Frost FG, Markello TC, Grange DK, Introne WJ, Gahl WA and Malicdan MCV
DDX41 (DEAD‑box helicase 41) is a member of the largest family of RNA helicases. The DEAD-box RNA helicases share a highly conserved core structure and regulate all aspects of RNA metabolism. The functional role of DDX41 in innate immunity is also highly conserved. DDX41 acts as a sensor of viral DNA and activates the STING-TBK1-IRF3-type I IFN signaling pathway. Germline heterozygous variants in DDX41 have been reported in familial myelodysplasia syndrome (MDS)/acute myeloid leukemia (AML) patients; most patients also acquired a somatic variant in the second DDX41 allele. Here, we report a patient who inherited compound heterozygous DDX41 variants and presented with bone dysplasia, ichthyosis, and dysmorphic features. Functional analyses of the patient-derived dermal fibroblasts revealed a reduced abundance of DDX41 and abrogated activation of the IFN genes through the STING-type I interferon pathway. Genome-wide transcriptome analyses in the patient's fibroblasts revealed significant gene dysregulation and changes in the RNA splicing events. The patient's fibroblasts also displayed upregulation of periostin mRNA expression. Using an RNA binding protein assay, we identified DDX41 as a novel regulator of periostin expression. Our results suggest that functional impairment of DDX41, along with dysregulated periostin expression, likely contributes to this patient's multisystem disorder.
The MorbidGenes panel: a monthly updated list of diagnostically relevant rare disease genes derived from diverse sources
Jauss RT, Popp B, Bachmann J, Abou Jamra R and Platzer K
With exome sequencing now standard, diagnostic labs need a list of genes associated with rare diseases that is, in principle, accurate to the day. Manual curation efforts are slow and often disease specific, while efforts relying on single sources are too inaccurate and may result in false-positive or false-negative genes.
Polymorphic pseudogenes in the human genome - a comprehensive assessment
Lopes-Marques M, Peixoto MJ, Cooper DN, Prata MJ, Azevedo L and Castro LFC
Over the past decade, variations of the coding portion of the human genome have become increasingly evident. In this study, we focus on polymorphic pseudogenes, a unique and relatively unexplored type of pseudogene whose inactivating mutations have not yet been fixed in the human genome at the global population level. Thus, polymorphic pseudogenes are characterized by the presence in the population of both coding alleles and non-coding alleles originating from Loss-of-Function (LoF) mutations. These alleles can be found both in heterozygosity and in homozygosity in different human populations and thus represent pseudogenes that have not yet been fixed in the population.
Germline copy number variants and endometrial cancer risk
Stylianou CE, Wiggins GAR, Lau VL, Dennis J, Shelling AN, Wilson M, Sykes P, Amant F, Annibali D, De Wispelaere W, Easton DF, Fasching PA, Glubb DM, Goode EL, Lambrechts D, Pharoah PDP, Scott RJ, Tham E, Tomlinson I, Bolla MK, Couch FJ, Czene K, Dörk T, Dunning AM, Fletcher O, García-Closas M, Hoppe R, Jernström H, Kaaks R, Michailidou K, Obi N, Southey MC, Stone J, Wang Q, Spurdle AB, O'Mara TA, Pearson J and Walker LC
Known risk loci for endometrial cancer explain approximately one third of familial endometrial cancer. However, the association of germline copy number variants (CNVs) with endometrial cancer risk remains relatively unknown. We conducted a genome-wide analysis of rare CNVs overlapping gene regions in 4115 endometrial cancer cases and 17,818 controls to identify functionally relevant variants associated with disease. We identified a 1.22-fold greater number of CNVs in DNA samples from cases compared to DNA samples from controls (p = 4.4 × 10). Under three models of putative CNV impact (deletion, duplication, and loss of function), genome-wide association studies identified 141 candidate gene loci associated (p < 0.01) with endometrial cancer risk. Pathway analysis of the candidate loci revealed an enrichment of genes involved in the 16p11.2 proximal deletion syndrome, driven by a large recurrent deletion (chr16:29,595,483-30,159,693) identified in 0.15% of endometrial cancer cases and 0.02% of control participants. Together, these data provide evidence that rare copy number variants have a role in endometrial cancer susceptibility and that the proximal 16p11.2 BP4-BP5 region contains 25 candidate risk gene(s) that warrant further analysis to better understand their role in human disease.
Interpreting the actionable clinical role of rare variants associated with short QT syndrome
Martínez-Barrios E, Greco A, Cruzalegui J, Cesar S, Díez-Escuté N, Cerralbo P, Chipa F, Zschaeck I, Slanovic L, Mangas A, Toro R, Brugada J, Sarquella-Brugada G and Campuzano O
Genetic testing is recommended in the diagnosis of short QT syndrome. This rare, inherited, lethal entity is characterized by structurally normal hearts with short QT intervals in the electrocardiogram. Few families diagnosed with this arrhythmogenic disease have been reported worldwide so far, impeding a comprehensive understanding of this syndrome. Unraveling the origin of the disease helps in the early identification of genetic carriers at risk. However, only rare variants with a definite deleterious role should be actionable in clinical practice. Our aim was to perform a comprehensive update and reinterpretation, according to the American College of Medical Genetics and Genomics recommendations, of all rare variants currently associated with short QT syndrome. We identified 34 rare variants. Reanalysis showed that only nine variants played a deleterious role associated with a definite short QT syndrome phenotype. These variants were located in the four main genes: KCNQ1, KCNH2, KCNJ2 or SLC4A3. Additional rare variants located in other genes were associated with other conditions with phenotypically shortened QT intervals, but not with a definite diagnosis of short QT syndrome. Periodic updating of rare variants, especially those previously classified as unknown, helps to clarify the role of rare variants and translate genetic data into clinical practice.
Advancements and limitations in polygenic risk score methods for genomic prediction: a scoping review
Jayasinghe D, Eshetie S, Beckmann K, Benyamin B and Lee SH
This scoping review aims to identify and evaluate the landscape of Polygenic Risk Score (PRS)-based methods for genomic prediction from 2013 to 2023, highlighting their advancements, key concepts, and existing gaps in knowledge, research, and technology. Over the past decade, various PRS-based methods have emerged, each employing different statistical frameworks aimed at enhancing prediction accuracy, processing speed and memory efficiency. Despite notable advancements, challenges persist, including unrealistic assumptions regarding sample sizes and the polygenicity of traits necessary for accurate predictions, as well as limitations in exploring hyper-parameter spaces and considering environmental interactions. We included studies focusing on PRS-based methods for risk prediction that underwent methodological evaluations using valid approaches and released computational tools/software. Additionally, we restricted our selection to studies involving human participants that were published in the English language. This review followed the standard protocol recommended by the Joanna Briggs Institute Reviewer's Manual, systematically searching the Ovid MEDLINE, Ovid Embase, Scopus and Web of Science databases. Additionally, searches included grey literature sources such as the pre-print server bioRxiv, and articles recommended by experts, to ensure comprehensive and diverse coverage of relevant records. This study identified 34 studies detailing 37 genomic prediction methods, the majority of which rely on linkage disequilibrium (LD) information and necessitate hyper-parameter tuning. Nine methods integrate functional/gene annotation, while 12 are suitable for cross-ancestry genomic prediction, with only one considering gene-environment (GxE) interaction. While some methods require individual-level data, most leverage summary statistics, offering flexibility. Despite progress, challenges remain. These include computational complexity and the need for large sample sizes for high prediction accuracy. Furthermore, recent methods exhibit varying effectiveness across traits, with absolute accuracies often falling short of clinical utility. Transferability across ancestries varies, influenced by trait heritability and diversity of training data, while handling admixed populations remains challenging. Additionally, the absence of standard error measurements for individual PRSs, crucial in clinical settings, underscores a critical gap. Another issue is the lack of customizable graphical visualization tools among current software packages. While genomic prediction methods have advanced significantly, there is still room for improvement. Addressing current challenges and embracing future research directions will lead to the development of more universally applicable, robust, and clinically relevant genomic prediction tools.
Genetic analysis of preaxial polydactyly: identification of novel variants and the role of ZRS duplications in a Chinese cohort of 102 cases
Pu S, Wang Z, Tang X, Wang D, Yang X, Jiang J, Deng Y, Xiang B, Yang J, Wang X, Guo X, Sun M, Wang B and Chen J
Preaxial polydactyly (PPD) is a congenital limb malformation, previously reported to be caused primarily by variants in the ZRS and upstream preZRS regions. This study investigated genetic variations associated with PPD, focusing on point variants and copy number variations (CNVs) in the ZRS and preZRS regions. Comprehensive genetic analyses were conducted on 102 patients with PPD, including detailed clinical examinations and Sanger sequencing of the ZRS and preZRS regions. Additionally, real-time quantitative PCR (qPCR) was used to detect CNVs in the ZRS region. The evolutionary conservation and population frequencies of identified variants were also evaluated. Six point variants were identified, among which four are likely pathogenic novel variants: 93G > T (g.156584477G > T), 106G > A (g.156584464G > A), 278G > A (g.156584292G > A), and 409A > C (g.156585378A > C). Additionally, qPCR analysis revealed that 66.67% of patients exhibited ZRS duplications. Notably, these duplications were also present in cases with newly identified potential pathogenic point variants. These findings suggest the possible interaction of point variants in ZRS and preZRS through a common pathogenic mechanism, leading jointly to PPD. The findings expand the variant spectrum associated with non-syndromic polydactyly and highlight that, despite different classifications, anterior polydactyly caused by variants in ZRS and nearby regions may share common pathogenic mechanisms. The incorporation of various variant types in genetic screening can effectively enhance the rate of pathogenic variant detection and contribute to the cost-effectiveness of genetic testing for limb developmental defects, thereby promoting healthy births.
Assessing the predicted impact of single amino acid substitutions in calmodulin for CAGI6 challenges
Turina P, Dal Cortivo G, Enriquez Sandoval CA, Alexov E, Ascher DB, Babbi G, Bakolitsa C, Casadio R, Fariselli P, Folkman L, Kamandula A, Katsonis P, Li D, Lichtarge O, Martelli PL, Panday SK, Pires DEV, Portelli S, Pucci F, Rodrigues CHM, Rooman M, Savojardo C, Schwersensky M, Shen Y, Strokach AV, Sun Y, Woo J, Radivojac P, Brenner SE, Dell'Orco D and Capriotti E
Recent thermodynamic and functional studies have been conducted to evaluate the impact of amino acid substitutions on Calmodulin (CaM). The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Verona (Italy) measured the melting temperature (Tm) and the percentage of unfolding (%unfold) of a set of CaM variants (CaM challenge dataset). Thermodynamic measurements for the equilibrium unfolding of CaM were obtained by monitoring far-UV Circular Dichroism as a function of temperature. These measurements were used to determine the Tm and the percentage of protein remaining unfolded at the highest temperature. The CaM challenge dataset, comprising a total of 15 single amino acid substitutions, was used to evaluate the effectiveness of computational methods in predicting the Tm and unfolding percentages associated with the variants, and categorizing them as destabilizing or not. For the sixth edition of CAGI, nine independent research groups from four continents (Asia, Australia, Europe, and North America) submitted over 52 sets of predictions, derived from various approaches. In this manuscript, we summarize the results of our assessment to highlight the potential limitations of current algorithms and provide insights into the future development of more accurate prediction tools. By evaluating the thermodynamic stability of CaM variants, this study aims to enhance our understanding of the relationship between amino acid substitutions and protein stability, ultimately contributing to more accurate predictions of the effects of genetic variants.
Homozygosity for a hypomorphic mutation in frizzled class receptor 5 causes syndromic ocular coloboma with microcornea in humans
Cortés-González V, Rodriguez-Morales M, Ataliotis P, Mayer C, Plaisancié J, Chassaing N, Lee H, Rozet JM, Cavodeassi F and Fares Taie L
Ocular coloboma (OC) is a congenital disorder caused by the incomplete closure of the embryonic ocular fissure. OC can present as a simple anomaly or, in more complex forms, be associated with additional ocular abnormalities. It can occur in isolation or as part of a broader syndrome, exhibiting considerable genetic heterogeneity. Diagnostic yield for OC remains below 30%, indicating the need for further genetic exploration. Mutations in the Wnt receptor FZD5, which is expressed throughout eye development, have been linked to both isolated and complex forms of coloboma. These mutations often result in a dominant-negative effect, where the mutated FZD5 protein disrupts WNT signaling by sequestering WNT ligands. Here, we describe a case of syndromic bilateral OC with additional features such as microcornea, bone developmental anomalies, and mild intellectual disability. Whole exome sequencing revealed a homozygous rare missense variant in FZD5. Consistent with a loss-of-function effect, overexpression of fzd5 mRNA harboring the missense variant in zebrafish embryos does not influence embryonic development, whereas overexpression of wild-type fzd5 mRNA results in body axis duplications. However, in vitro TOPFlash assays revealed that the missense variant caused only partial loss of function, behaving as a hypomorphic mutation. We further showed that the mutant protein still localized to the cell membrane and maintained proper conformation when modeled in silico, suggesting that the impairment lies in signal transduction. This hypothesis is further supported by the fact that the variant affects a highly conserved amino acid known to be crucial for protein-protein interactions.
Exome variant prioritization in a large cohort of hearing-impaired individuals indicates IKZF2 to be associated with non-syndromic hearing loss and guides future research of unsolved cases
Velde HM, Vaseghi-Shanjani M, Smits JJ, Ramakrishnan G, Oostrik J, Wesdorp M, Astuti G, Yntema HG, Hoefsloot L, Lanting CP, Huynen MA, Lehman A, Turvey SE, , Pennings RJE and Kremer H
Although more than 140 genes have been associated with non-syndromic hereditary hearing loss (HL), at least half of the cases remain unexplained in medical genetic testing. One reason is that pathogenic variants are located in 'novel' deafness genes. A variant prioritization approach was used to identify novel (candidate) genes for HL. Exome-wide sequencing data were assessed for subjects with presumed hereditary HL that remained unexplained in medical genetic testing by gene-panel analysis. Cases in group AD had presumed autosomal dominantly inherited HL (n = 124), and in group AR, presumed autosomal recessive HL (n = 337). Variants in known and candidate deafness genes were prioritized based on allele frequencies and predicted effects. Selected variants were tested for their co-segregation with HL. Two cases were solved by variants in recently identified deafness genes (ABHD12, TRRAP). Variant prioritization also revealed potentially causative variants in candidate genes associated with recessive and X-linked HL. Importantly, missense variants in IKZF2 were found to co-segregate with dominantly inherited non-syndromic HL in three families. These variants specifically affected Zn-coordinating cysteine or histidine residues of the zinc finger motifs 2 and 3 of the encoded protein Helios. This finding indicates a complex genotype-phenotype correlation for IKZF2 defects, as this gene was previously associated with non-syndromic dysfunction of the immune system and ICHAD syndrome, including HL. The designed strategy for variant prioritization revealed that IKZF2 variants can underlie non-syndromic HL. The large number of candidate genes for HL, and of variants therein, stresses the importance of including family members in variant prioritization.
Age-dependent somatic expansion of the ATXN3 CAG repeat in the blood and buccal swab DNA of individuals with spinocerebellar ataxia type 3/Machado-Joseph disease
Sidky AM, Melo ARV, Kay TT, Raposo M, Lima M and Monckton DG
Spinocerebellar ataxia type 3/Machado-Joseph disease (SCA3/MJD) is caused by the expansion of a genetically unstable polyglutamine-encoding CAG repeat in ATXN3. Longer alleles are generally associated with earlier onset, and frequent intergenerational expansions mediate the anticipation observed in this disorder. Somatic expansion of the repeat has also been implicated in disease onset, and slowing the rate of somatic expansion has been proposed as a therapeutic strategy. Here, we utilised high-throughput ultra-deep MiSeq amplicon sequencing to precisely define the number and sequence of the ATXN3 repeat, determine the genotype of an adjacent single nucleotide variant, and quantify somatic expansion in blood and buccal swab DNA of a cohort of individuals with SCA3 from the Azores islands (Portugal). We revealed systematic mis-sizing of the ATXN3 repeat and high levels of inaccuracy of the traditional fragment length analysis that have important implications for attempts to identify modifiers of clinical and molecular phenotypes. Quantification of somatic expansion in blood DNA and multivariate regression revealed the expected effects of age at sampling and CAG repeat length, although the effect of repeat length was surprisingly modest, with much stronger associations with age. We also observed an association of the downstream rs12895357 single nucleotide variant with the rate of somatic expansion, and a higher level of somatic expansion in buccal swab DNA compared to blood. These data suggest that somatic expansion at the ATXN3 locus in blood or buccal swab DNA of SCA3 patients might serve as a good biomarker for clinical trials testing suppressors of somatic expansion with peripheral exposure.
Methodologies underpinning polygenic risk scores estimation: a comprehensive overview
Ndong Sima CAA, Step K, Swart Y, Schurz H, Uren C and Möller M
Polygenic risk scores (PRS) have emerged as a promising tool for predicting disease risk and treatment outcomes using genomic data. Thousands of genome-wide association studies (GWAS), primarily involving populations of European ancestry, have supported the development of PRS models. However, these models have not been adequately evaluated in non-European populations, raising concerns about their clinical validity and predictive power across diverse groups. Addressing this issue requires developing novel risk prediction frameworks that leverage genetic characteristics across diverse populations, considering host-microbiome interactions and a broad range of health measures. One of the key aspects in evaluating PRS is understanding the strengths and limitations of various methods for constructing them. In this review, we analyze the strengths and limitations of different methods for constructing PRS, including traditional weighted approaches and new methods such as Bayesian and frequentist penalized regression approaches. Finally, we summarize recent advances in the development of PRS calculation methods and highlight key areas for future research, including the development of models that are robust across diverse populations by accounting for the complex interplay between genetic variants across diverse ancestral backgrounds in disease risk as well as treatment response prediction. PRS hold great promise for improving disease risk prediction and personalized medicine; therefore, their implementation must be guided by careful consideration of their limitations, biases, and ethical implications to ensure that they are used in a fair, equitable, and responsible manner.
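As an illustration of the traditional weighted approach mentioned in the abstract above, a PRS is commonly computed as the sum of effect-allele dosages multiplied by per-variant effect sizes taken from GWAS summary statistics. The following is a minimal sketch under that assumption only; the function name `weighted_prs`, the variant identifiers, and the weights are hypothetical and are not taken from the review.

```python
from typing import Dict


def weighted_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages for one individual.

    dosages: effect-allele count (0, 1, or 2, or an imputed value) per variant.
    weights: per-variant effect size, e.g. a GWAS beta (assumed input).
    Variants without a weight are skipped.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0, "rs_d": 1}  # rs_d has no weight

score = weighted_prs(person, weights)
print(score)
```

Penalized regression and Bayesian methods differ mainly in how the per-variant weights are estimated; the final scoring step is typically still a weighted sum of this form.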