How a Medical Student Found Himself in a Human Genome Free for All
In this short memoir, I recount the series of improbable interactions and events that led me from medical school to a leadership role in the Human Genome Project.
Deep Learning Sequence Models for Transcriptional Regulation
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
Polygenic Risk Scores Driving Clinical Change in Glaucoma
Glaucoma is a clinically heterogeneous disease and the world's leading cause of irreversible blindness. Therapeutic intervention can prevent blindness but relies on early diagnosis, and current clinical risk factors are limited in their ability to predict who will develop sight-threatening glaucoma. The high heritability of glaucoma makes it an ideal substrate for genetic risk prediction, with the bulk of risk being polygenic in nature. Here, we summarize the foundations of glaucoma genetic risk, the development of polygenic risk prediction instruments, and emerging opportunities for genetic risk stratification. Although challenges remain, genetic risk stratification will significantly improve glaucoma screening and management.
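As a minimal illustration of the idea behind polygenic risk prediction, a score is commonly computed as a weighted sum of an individual's risk-allele dosages, with weights taken from association-study effect sizes. The sketch below assumes this simple additive form. The variant names, weights, and the function `polygenic_risk_score` are hypothetical and are not taken from any glaucoma risk instrument.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Return the weighted sum of risk-allele dosages.

    Only variants present in both mappings contribute; a variant with a
    weight but no genotype is skipped rather than imputed.
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical per-variant effect sizes (assumption: illustration only).
weights = {"variant_a": 0.25, "variant_b": 0.5, "variant_c": 0.125}

# Hypothetical risk-allele counts (0, 1, or 2) for one individual.
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(dosages, weights)
print(score)
```

In practice a raw score like this is usually compared against a reference population distribution before it is used for risk stratification; that step is omitted here.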
Causes and Consequences of Varying Transposable Element Activity: An Evolutionary Perspective
Transposable elements (TEs) are genomic parasites found in nearly all eukaryotes, including humans. This evolutionary success of TEs is due to their replicative activity, involving insertion into new genomic locations. TE activity varies at multiple levels, from between taxa to within individuals. The rapidly accumulating evidence of the influence of TE activity on human health, as well as the rapid growth of new tools to study it, motivated an evaluation of what we know about TE activity thus far. Here, we discuss why TE activity varies, and the consequences of this variation, from an evolutionary perspective. By studying TE activity in nonhuman organisms in the context of evolutionary theories, we can shed light on the factors that affect TE activity. While the consequences of TE activity are usually deleterious, some have lasting evolutionary impacts by conferring benefits on the host or affecting other evolutionary processes.
Benefit-Sharing by Design: A Call to Action for Human Genomics Research
The ethical standards for the responsible conduct of human research have come a long way; however, concerns surrounding equity remain in human genetics and genomics research. Addressing these concerns will help society realize the full potential of human genomics research. One outstanding concern is the fair and equitable sharing of benefits from research on human participants. Several international bodies have recognized that benefit-sharing can be an effective tool for ethical research conduct, but international laws, including the Convention on Biological Diversity and its Nagoya Protocol on Access and Benefit-Sharing, explicitly exclude human genetic and genomic resources. These agreements face significant challenges that must be considered and anticipated if similar principles are applied in human genomics research. We propose that benefit-sharing from human genomics research can be a bottom-up effort and embedded into the existing research process. We propose the development of a "benefit-sharing by design" framework to address concerns of fairness and equity in the use of human genomic resources and samples and to learn from the aspirations and decade of implementation of the Nagoya Protocol.
Clinical and Therapeutic Implications of Clonal Hematopoiesis
Clonal hematopoiesis (CH) is an age-related process whereby hematopoietic stem and progenitor cells (HSPCs) acquire mutations that lead to a proliferative advantage and clonal expansion. The most commonly mutated genes are epigenetic regulators, DNA damage response genes, and splicing factors, which are essential to maintain functional HSPCs and are frequently involved in the development of hematologic malignancies. Established risk factors for CH, including age, prior cytotoxic therapy, and smoking, increase the risk of acquiring CH and/or may increase CH fitness. CH has emerged as a novel risk factor in many age-related diseases, such as hematologic malignancies, cardiovascular disease, diabetes, and autoimmune disorders, among others. Future characterization of the mechanisms driving CH evolution will be critical to develop preventative and therapeutic approaches.
PIK3CA-Related Disorders: From Disease Mechanism to Evidence-Based Treatments
Recent advances in genetic sequencing are transforming our approach to rare-disease care. Initially identified in cancer, gain-of-function mutations of the PIK3CA gene are also detected in malformation mosaic diseases categorized as PIK3CA-related disorders (PRDs). Over the past decade, new approaches have enabled researchers to elucidate the pathophysiology of PRDs and uncover novel therapeutic options. In just a few years, owing to vigorous global research efforts, PRDs have been transformed from incurable diseases to chronic disorders accessible to targeted therapy. However, new challenges for both medical practitioners and researchers have emerged. Areas of uncertainty remain in our comprehension of PRDs, especially regarding the relationship between genotype and phenotype, the mechanisms underlying mosaicism, and the processes involved in intercellular communication. As the clinical and biological landscape of PRDs is constantly evolving, this review aims to summarize current knowledge regarding PIK3CA and its role in nonmalignant human disease, from molecular mechanisms to evidence-based treatments.
Population Diversity at the Single-Cell Level
Population-scale single-cell genomics is a transformative approach for unraveling the intricate links between genetic and cellular variation. This approach is facilitated by cutting-edge experimental methodologies, including the development of high-throughput single-cell multiomics and advances in multiplexed environmental and genetic perturbations. Examining the effects of natural or synthetic genetic variants across cellular contexts provides insights into the mutual influence of genetics and the environment in shaping cellular heterogeneity. The development of computational methodologies further enables detailed quantitative analysis of molecular variation, offering an opportunity to examine the respective roles of stochastic, intercellular, and interindividual variation. Future opportunities lie in leveraging long-read sequencing, refining disease-relevant cellular models, and embracing predictive and generative machine learning models. These advancements hold the potential for a deeper understanding of the genetic architecture of human molecular traits, which in turn has important implications for understanding the genetic causes of human disease.
The Genetics of Human Sleep and Sleep Disorders
Healthy sleep is vital for humans to achieve optimal health and longevity. Poor sleep and sleep disorders are strongly associated with increased morbidity and mortality. However, the importance of good sleep continues to be underrecognized. Mechanisms regulating sleep and its functions in humans remain mostly unclear even after decades of dedicated research. Advancements in gene sequencing techniques and computational methodologies have paved the way for various genetic analysis approaches, which have provided some insights into human sleep genetics. This review summarizes our current knowledge of the genetic basis underlying human sleep traits and sleep disorders. We also highlight the use of animal models to validate genetic findings from human sleep studies and discuss potential molecular mechanisms and signaling pathways involved in the regulation of human sleep.
Genome-Wide Screening Approaches for Biochemical Reactions Independent of Cell Growth
Genome-wide screening is a potent approach for comprehensively understanding the molecular mechanisms of biological phenomena. However, despite its widespread use in the past decades across various biological targets, its application to biochemical reactions with temporal and reversible biological outputs remains a formidable challenge. To uncover the molecular machinery underlying various biochemical reactions, we have recently developed the revival screening method, which combines flow cytometry-based cell sorting with library reconstruction from collected cells. Our refinements to the traditional genome-wide screening technique have proven successful in revealing the molecular machinery of biochemical reactions of interest. In this article, we elucidate the technical basis of revival screening, focusing on its application to CRISPR-Cas9 single guide RNA (sgRNA) library screening. Finally, we also discuss the future of genome-wide screening while describing recent achievements from in vitro and in vivo screening.
Toward Realizing the Promise of AI in Precision Health Across the Spectrum of Care
Significant progress has been made in augmenting clinical decision-making using artificial intelligence (AI) in the context of secondary and tertiary care at large academic medical centers. For such innovations to have an impact across the spectrum of care, additional challenges must be addressed, including inconsistent use of preventative care and gaps in chronic care management. The integration of additional data, including genomics and data from wearables, could prove critical in addressing these gaps, but technical, legal, and ethical challenges arise. On the technical side, approaches for integrating complex and messy data are needed. Data and design imperfections like selection bias, missing data, and confounding must be addressed. In terms of legal and ethical challenges, while AI has the potential to aid in leveraging patient data to make clinical care decisions, we also risk exacerbating existing disparities. Organizations implementing AI solutions must carefully consider how they can improve care for all and reduce inequities.
The Role of Cilia and the Complex Genetics of Congenital Heart Disease
Congenital heart disease (CHD) can affect up to 1% of live births, and despite abundant evidence of a genetic etiology, the genetic landscape of CHD is still not well understood. A large-scale mouse chemical mutagenesis screen for mutations causing CHD yielded a preponderance of cilia-related genes, pointing to a central role for cilia in CHD pathogenesis. The genes uncovered by the screen included genes that regulate ciliogenesis and cilia-transduced cell signaling as well as many that mediate endocytic trafficking, a cell process critical for both ciliogenesis and cell signaling. The clinical relevance of these findings is supported by whole-exome sequencing analysis of CHD patients that showed enrichment for pathogenic variants in ciliome genes. Surprisingly, among the ciliome CHD genes recovered were many that encoded direct protein-protein interactors. Assembly of the CHD genes into a protein-protein interaction network yielded a tight interactome that suggested this protein-protein interaction may have functional importance and that its disruption could contribute to the pathogenesis of CHD. In light of these and other findings, we propose that an interactome enriched for ciliome genes may provide the genomic context for the complex genetics of CHD and its often-observed incomplete penetrance and variable expressivity.
Somatic Gene Therapy: Ethics and Access
Manipulation of a patient's genome for therapeutic ends is being attempted through numerous methods, some of which have resulted in disease-modifying interventions. The much anticipated promise of somatic gene therapy is starting to pay off; however, there remain many scientific unknowns, including concerns about safety and durability. A significant ethical concern is that of access to these novel interventions, an issue that is normally framed in terms of the high costs of approved products. I describe how access issues permeate gene therapy long before there is any commercial product and how even upstream decisions (such as choices of indication to pursue, viral vector, and where to site a trial) have significant implications for access to resultant products in both the developmental and commercial stages.
The Genetics and Functional Genomics of Osteoarthritis
Osteoarthritis is the most prevalent whole-joint degenerative disorder, and is characterized by the degradation of articular cartilage and the underlying bone structures. Almost 600 million people are affected by osteoarthritis worldwide. No curative treatments are available, and management strategies focus mostly on pain relief. Here, we provide a comprehensive overview of the available human genetic and functional genomics studies for osteoarthritis to date and delineate how these studies have helped shed light on disease etiopathology. We highlight genetic discoveries from genome-wide association studies and provide a detailed overview of molecular-level investigations in osteoarthritis tissues, including methylation-, transcriptomics-, and proteomics-level analyses. We review how functional genomics data from different molecular levels have helped to prioritize effector genes that can be used as drug targets or drug-repurposing opportunities. Finally, we discuss future directions with the potential to drive a step change in osteoarthritis research.
Mapping Human Immunity and the Education of Waldeyer's Ring
The development and deployment of single-cell genomic technologies have driven a resolution revolution in our understanding of the immune system, providing unprecedented insight into the diversity of immune cells present throughout the body and their function in health and disease. Waldeyer's ring is the collective name for the lymphoid tissue aggregations of the upper aerodigestive tract, comprising the palatine, pharyngeal (adenoids), lingual, and tubal tonsils. These tonsils are the first immune sentinels encountered by ingested and inhaled antigens and are responsible for mounting the first wave of adaptive immune response. An effective mucosal immune response is critical to neutralizing infection in the upper airway and preventing systemic spread, and dysfunctional immune responses can result in ear, nose, and throat pathologies. This review uses Waldeyer's ring to demonstrate how single-cell technologies are being applied to advance our understanding of the immune system and highlight directions for future research.
The Myriad Decision at 10
A decade ago, the US Supreme Court decided Association for Molecular Pathology v. Myriad Genetics, Inc., concluding that isolated genes were not patentable subject matter. Beyond being a mere patent dispute, the case was a political and cultural phenomenon, viewed as a harbinger for the health of the biotechnology industry. With a decade of perspective, though, Myriad's impact seems much narrower. The law surrounding patentable subject matter, while greatly transformed, centered on Myriad only in small part. The case had only a modest impact on patenting practices both in and outside the United States. And persistent efforts to legislatively overturn the decision have not borne fruit. The significance of Myriad thus remains, even a decade later, hidden by larger developments in science and law that have occurred since the case was decided.
Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
RNA Sequencing in Disease Diagnosis
RNA sequencing (RNA-seq) enables the accurate measurement of multiple transcriptomic phenotypes for modeling the impacts of disease variants. Advances in technologies, experimental protocols, and analysis strategies are rapidly expanding the application of RNA-seq to identify disease biomarkers, tissue- and cell-type-specific impacts, and the spatial localization of disease-associated mechanisms. Ongoing international efforts to construct biobank-scale transcriptomic repositories with matched genomic data across diverse population groups are further increasing the utility of RNA-seq approaches by providing large-scale normative reference resources. The availability of these resources, combined with improved computational analysis pipelines, has enabled the detection of aberrant transcriptomic phenotypes underlying rare diseases. Further expansion of these resources, across both somatic and developmental tissues, is expected to soon provide unprecedented insights to resolve disease origin, mechanism of action, and causal gene contributions, suggesting the continued high utility of RNA-seq in disease diagnosis.
Genomic Interactions Between Mycobacterium tuberculosis and Humans
Mycobacterium tuberculosis is considered by many to be the deadliest microbe, with the estimated annual cases numbering more than 10 million. The bacteria, including Mycobacterium africanum, are classified into nine major lineages and hundreds of sublineages, each with different geographical distributions and levels of virulence. The phylogeographic patterns can be a result of recent and early human migrations as well as coevolution between the bacteria and various human populations, which may explain why many studies on human genetic factors contributing to tuberculosis have not been replicable in different areas. Moreover, several studies have revealed the significance of interactions between human genetic variations and bacterial genotypes in determining the development of tuberculosis, suggesting coadaptation. The increased availability of whole-genome sequence data from both humans and bacteria has enabled a better understanding of these interactions, which can inform the development of vaccines and other control measures.
Integrating Large-Scale Protein Structure Prediction into Human Genetics Research
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Sickle Cell Disease: From Genetics to Curative Approaches
Sickle cell disease (SCD) is a monogenic blood disease caused by a point mutation in the gene coding for β-globin. The abnormal hemoglobin [sickle hemoglobin (HbS)] polymerizes under low-oxygen conditions and causes red blood cells to sickle. The clinical presentation varies from very severe (with acute pain, chronic pain, and early mortality) to normal (few complications and a normal life span). The variability of SCD might be due (in part) to various genetic modulators. First, we review the main genetic factors, polymorphisms, and modifier genes that influence the expression of globin or otherwise modulate the severity of SCD. Considering SCD as a complex, multifactorial disorder is important for the development of appropriate pharmacological and genetic treatments. Second, we review the characteristics, advantages, and disadvantages of the latest advances in gene therapy for SCD, from lentiviral-vector-based approaches to gene-editing strategies.