The COVID-19 High-Performance Computing Consortium
In March 2020, recognizing the potential of high-performance computing (HPC) to accelerate understanding and the pace of scientific discovery in the fight to stop COVID-19, the HPC community assembled the largest collection of HPC resources in the world to enable COVID-19 researchers everywhere to advance their critical efforts. Remarkably, the COVID-19 HPC Consortium was formed within one week through the joint effort of the Office of Science and Technology Policy (OSTP), the U.S. Department of Energy (DOE), the National Science Foundation (NSF), and IBM, creating a unique public-private partnership between government, industry, and academic leaders. This article is the Consortium's story: how the Consortium was created, its founding members, what it provides, how it works, and its accomplishments. We reflect on the lessons learned from the creation and operation of the Consortium and describe how its features could be sustained as a National Strategic Computing Reserve to ensure the nation is prepared for future crises.
Discrete-Time Modeling of COVID-19 Propagation in Argentina with Explicit Delays
We present a new deterministic discrete-time compartmental model of COVID-19 that explicitly accounts for the relevant delays associated with the stages of the disease and with its diagnosis and reporting system, and that allows the presence of imported cases to be represented. In addition to developing the model equations, we describe an automatic parameter-fitting mechanism using official data on the spread of the virus in Argentina. The resulting model consistently reflects the behavior of the disease with respect to its characteristic times: latency, infectious period, and reporting of cases (confirmed and dead), and it automatically detects changes in the reproduction number and in the mortality factor. We also analyze the model's predictive capability and present simulation results under different future scenarios. We discuss use of the model in a closed-loop control scheme, where the explicit presence of delays plays a key role in projecting more realistic dynamics than those of classic continuous-time models.
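The core idea, a discrete-time compartmental update in which latency, infectious period, and reporting delay appear as explicit day-count lags rather than as continuous rates, can be sketched as follows. This is a minimal illustrative SEIR-like recursion, not the authors' actual model equations; all parameter values here are placeholders.

```python
def simulate(days, N=1_000_000, R0=2.5, latency=3, infectious=7,
             report_delay=9, seed=10):
    """Toy discrete-time epidemic with explicit delays (illustrative only).

    Individuals become infectious exactly `latency` days after exposure,
    recover exactly `infectious` days after onset, and appear in official
    counts exactly `report_delay` days after exposure.
    """
    beta = R0 / infectious          # per-day transmission rate
    S, I = N - seed, float(seed)    # susceptible, currently infectious
    new_exposed = [0.0] * days      # daily incidence buffers indexed by day
    new_onset = [0.0] * days
    reported = [0.0] * days
    for t in range(days):
        inc = min(beta * S * I / N, S)        # new exposures today
        new_exposed[t] = inc
        onset = new_exposed[t - latency] if t >= latency else 0.0
        recov = new_onset[t - infectious] if t >= infectious else 0.0
        new_onset[t] = onset
        S -= inc
        I += onset - recov
        if t >= report_delay:                 # cases surface only after the lag
            reported[t] = new_exposed[t - report_delay]
    return reported
```

Because the delays are explicit buffer lookups, the reported curve is a shifted copy of the true incidence curve, which is what makes delay-aware fitting and closed-loop control more realistic than rate-based continuous-time analogues.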
Supercomputing Pipelines Search for Therapeutics Against COVID-19
The urgent search for drugs to combat SARS-CoV-2 has included the use of supercomputers. General-purpose graphics processing units (GPUs), massive parallelism, and new software for high-performance computing (HPC) have allowed researchers to search the vast chemical space of potential drugs faster than ever before. We developed a new drug discovery pipeline using the Summit supercomputer at Oak Ridge National Laboratory to help pioneer this effort, with new platforms that incorporate GPU-accelerated simulation and allow billions of potential drug compounds to be screened virtually for their ability to inhibit SARS-CoV-2 proteins in days rather than weeks or months. This effort will accelerate the development of drugs to combat the current COVID-19 pandemic and other diseases.
Computational Decision Support for the COVID-19 Healthcare Coalition
The COVID-19 Healthcare Coalition was established as a private sector-led response to the COVID-19 pandemic. Its purpose was to bring together healthcare organizations, technology firms, nonprofits, academia, and startups to preserve the healthcare delivery system and help protect U.S. populations by providing data-driven, real-time insights that improve outcomes. This required the coalition to obtain, align, and orchestrate many heterogeneous data sources and present the data on dashboards in a format that was understandable and useful to decision makers. To do this, the coalition employed an ensemble approach to analysis, combining machine learning algorithms with theory-based simulations to provide computational decision support rooted in science and engineering.
Trustworthy Computational Evidence Through Transparency and Reproducibility
Many high-performance computing applications are of high consequence to society. Global climate modeling is a historic example of this. In 2020, the societal issue of greatest concern, the still-raging COVID-19 pandemic, saw a legion of computational scientists turning their endeavors to new research projects in this direction. Applications of such high consequence highlight the need for building trustworthy computational models.
Corrections to "Visual Analytics for Decision-Making During Pandemics"
[This corrects the article DOI: 10.1109/MCSE.2020.3023288.]
Visual Analytics for Decision-Making During Pandemics
We introduce a transdisciplinary collaboration between researchers, healthcare practitioners, and community health partners in the Southwestern U.S. to enable improved management of, response to, and recovery from the current pandemic and future health emergencies. Our center's work enables effective and efficient decision-making through interactive, human-guided analytical environments. We discuss our PanViz 2.0 system, a visual analytics application for supporting pandemic preparedness through a tightly coupled epidemiological model and interactive interface. We describe our framework, current work, and plans to extend the system with exploration of what-if scenarios, interactive machine learning for model parameter inference, and analysis of mitigation strategies to facilitate decision-making during public health crises.
Cloud Computing for COVID-19: Lessons Learned From Massively Parallel Models of Ventilator Splitting
A patient-specific airflow simulation was developed to help address the pressing need to expand ventilator capacity in response to the COVID-19 pandemic. The computational model provides guidance on how to split a ventilator between two or more patients with differing respiratory physiologies. Fast deployment and identification of optimal patient-specific tuning required simulating hundreds of millions of clinically relevant parameter combinations in a short time. This task, driven by the dire circumstances, presented unique computational and research challenges. We present here the guiding principles and lessons learned in designing and deploying a large-scale, robust cloud instance within 24 hours, in which 800,000 compute hours were utilized over a 72-hour period. We discuss the design choices that enabled a quick turnaround of the model, execution of the simulation, and creation of an intuitive and interactive interface.
Scalable Analysis of Authentic Viral Envelopes on FRONTERA
Enveloped viruses, such as SARS-CoV-2, infect cells via fusion of their envelope with the host membrane. By employing molecular simulations to characterize viral envelopes, researchers can gain insights into key determinants of infection. Here, the Frontera supercomputer is leveraged for large-scale modeling and analysis of authentic viral envelopes, whose lipid compositions are complex and realistic. Visual Molecular Dynamics (VMD) with support for MPI is employed, overcoming previous computational limitations and enabling investigation into virus biology at an unprecedented scale. The techniques applied here to an authentic HIV-1 envelope at two levels of spatial resolution (29 million particles and 280 million atoms) are broadly applicable to the study of other viruses. The authors are actively employing these techniques to develop and characterize an authentic SARS-CoV-2 envelope. A general framework for carrying out scalable analysis of simulation trajectories on Frontera is presented, expanding the utility of the machine in humanity's ongoing fight against infectious diseases.
Biomolecular Simulations in the Time of COVID-19, and After
COVID-19 has changed life for people worldwide. Despite lockdowns around the globe, computational research has pressed on, with researchers working remotely and collaborating virtually on questions about COVID-19 and the virus that causes it, SARS-CoV-2. Molecular simulations can help characterize the function of viral and host proteins and have the potential to contribute to the search for vaccines and treatments. Changes in the working practices of research groups include broader adoption of preprint servers, earlier and more open sharing of methods, models, and data, the use of social media to rapidly disseminate information, online seminars, and cloud-based virtual collaboration. Research funders and computing providers worldwide recognized the need to provide rapid and significant access to computational resources. In this review, we discuss how the interplay of all of these factors is influencing the impact, both potential and realized, of biomolecular simulations in the fight against SARS-CoV-2.
Discovering Geometry in Data Arrays
Modern technologies produce a deluge of complicated data. In neuroscience, for example, minimally invasive experimental methods can take recordings of large populations of neurons at high resolution under a multitude of conditions. Such data arrays possess non-trivial interdependencies along each of their axes. Insights into these data arrays may lay the foundations of advanced treatments for nervous system disorders. The potential impacts of such data, however, will not be fully realized unless the techniques for analyzing them keep pace. Specifically, there is an urgent, growing need for methods for estimating the low-dimensional structure and geometry in big and noisy data arrays. This article reviews a framework for identifying complicated underlying patterns in such data and also recounts the key role that the Department of Energy Computational Sciences Graduate Fellowship played in setting the stage for this work to be done by the author.
How to Model for a Living: The CSGF as a Catalyst for Supermodels
Models are ubiquitous and uniting tools for computational scientists across disciplines. As a computational biophysical chemist, I apply multiple models to understand and predict how molecules recognize and interact with each other in complex, dynamic biological environments. The Department of Energy Computational Science Graduate Fellowship (DOE CSGF) cultivates interest in engaging with models from a multidisciplinary perspective and enables junior scientists to see how computational modeling is a creative and collaborative process. Below, I describe ways, based in part on my own experiences as a CSGF recipient, in which modeling can be used both to understand the molecular world and to excite others about computational science.
A PyMOL snippet library for Jupyter to boost researcher productivity
Snippets, code templates of one line or longer, boost researcher productivity because they are faster to insert than writing the code from scratch and because they reduce debugging time. Several extensions support the use of snippets in Jupyter. We developed a Python version of the pymolsnips library and customized it for use with the jupyterlab-snippets-multimenus extension for JupyterLab. The extension provides access to the snippets through pull-down menus. Each snippet performs one task, and each task often requires many lines of code. This library's availability in Jupyter enables PyMOL users to run PyMOL efficiently inside Jupyter while storing the code and the associated molecular graphics images next to each other in one notebook document. This proximity of code and images supports reproducible research in structural biology, and the use of a single computer file facilitates collaboration.
FluoRender Script: A Case Study of Lingua Franca in Translational Computer Science
FluoRender is a software program used for the visualization and analysis of 3-D biological image data, particularly from fluorescence microscopy. We examine FluoRender's script system to demonstrate its translation process. In this article, we borrow the concept of lingua franca from linguistics. We designed a connecting language between the source and target domains for translation, thereby augmenting understanding and acceptance. In FluoRender's script system, the lingua franca consists of the mapping between the control of the media player and the computational and interactive subroutines of an analysis workflow. Workflows supporting automatic, semiautomatic, and manual operations were made available and easily accessible to end users. The formalization of the lingua franca as a technique for translational computer science provides guidance for future development.
Exascale Computing: A New Dawn for Computational Biology
As biologists discover and learn to embrace the complexity of biological systems, computational data analysis and modeling have become critical for furthering our understanding. Exascale computing will enable the development of new predictive multiscale models, transforming how we study the behaviors of organisms and ecosystems, ultimately leading to new innovations and discoveries.
Revealing the mechanism of SARS-CoV-2 spike protein binding with ACE2
A large portion of the world's population has been infected by COVID-19. Understanding the mechanisms of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is important for the management and treatment of COVID-19. In the infection process, one of the most important proteins in SARS-CoV-2 is the spike (S) protein, which binds to human Angiotensin-Converting Enzyme 2 (ACE2) and initiates entry into the host cell. In this study, we implemented multiscale computational approaches to study the electrostatic features of the interfaces between the SARS-CoV-2 S protein Receptor Binding Domain (RBD) and ACE2. The simulations and analyses were performed on high-performance computing resources at the Texas Advanced Computing Center (TACC). Our study identified key residues on the SARS-CoV-2 S protein that can be used as targets for future drug design. The results shed light on future drug design and therapeutic targets for COVID-19.
Discovering Metamorphic Relations for Scientific Software From User Forums
Scientific software can be used for decades and is constantly evolving. Recently, metamorphic testing, a property-based testing technique, has been shown to be effective in testing scientific software; the necessary properties are expressed as metamorphic relations. However, developing metamorphic relations is difficult: it requires considerable practical expertise from the software tester. In this article, we report our experience of uncovering metamorphic relations from questions posted to the user forum of the United States Environmental Protection Agency's Storm Water Management Model (SWMM). Our study not only illustrates the wealth of end users' expertise in interpreting software results, but also demonstrates the usefulness of classifying the user-oriented metamorphic relations into a nominal, ordinal, and functional hierarchy, viewed mainly from the software-output perspective.
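The technique works by checking a relation between outputs on an original input and a transformed input, so no exact oracle for the output is needed. A minimal generic driver, with a well-known mathematical relation standing in for a domain-specific SWMM relation (which would involve, say, rainfall inputs and runoff outputs), might look like this; the function and relation here are illustrative, not drawn from the article:

```python
import math
import random

def check_relation(f, transform, relation, inputs):
    """Metamorphic testing driver: instead of knowing f's exact expected
    output, verify that `relation` holds between f(x) and f(transform(x))."""
    return [x for x in inputs
            if not relation(f(x), f(transform(x)))]

# Example metamorphic relation for sin: sin(pi - x) should equal sin(x).
# No table of "correct" sine values is required to run the test.
random.seed(42)
inputs = [random.uniform(-10, 10) for _ in range(1000)]
failures = check_relation(
    math.sin,
    transform=lambda x: math.pi - x,
    relation=lambda y1, y2: math.isclose(y1, y2, abs_tol=1e-9),
    inputs=inputs,
)
```

An SWMM-style relation would plug a simulation run in for `f` and a physically motivated input change (e.g., uniformly increased rainfall) in for `transform`, with `relation` encoding the expected direction of the output change.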
Early COVID-19 pandemic modeling: Three compartmental model case studies from Texas, USA
The novel coronavirus (SARS-CoV-2) emerged in late 2019 and spread globally in early 2020. Initial reports suggested the associated disease, COVID-19, produced rapid epidemic growth and caused high mortality. As the virus sparked local epidemics in new communities, health systems and policy makers were forced to make decisions with limited information about the spread of the disease. We developed a compartmental model to project COVID-19 healthcare demands that combined information regarding SARS-CoV-2 transmission dynamics from international reports with local COVID-19 hospital census data to support response efforts in three Metropolitan Statistical Areas (MSAs) in Texas, USA: Austin-Round Rock, Houston-The Woodlands-Sugar Land, and Beaumont-Port Arthur. Our model projects that strict stay-home orders and other social distancing measures could suppress the spread of the pandemic. Our capacity to provide rapid decision-support in response to emerging threats depends on access to data, validated modeling approaches, careful uncertainty quantification, and adequate computational resources.
Random Sampling using k-vector
This work introduces two new techniques for random number generation with any prescribed nonlinear distribution, based on the k-vector methodology. The first approach is based on inverse transform sampling, using the optimal k-vector to generate the samples by inverting the cumulative distribution. The second approach generates samples by performing random searches in a large database previously built by massive inversion of the prescribed nonlinear distribution using the k-vector. Both methods are shown to be suitable for the massive generation of random samples. Examples are provided to clarify these methodologies.
ANARI: A 3-D Rendering API Standard
ANARI is a new 3-D rendering API, an emerging Khronos standard that enables visualization applications to leverage state-of-the-art rendering techniques across diverse hardware platforms and rendering engines. Visualization applications have historically embedded custom-written renderers to provide the necessary combination of features, performance, and visual fidelity required by their users. As computing power, rendering algorithms, dedicated rendering hardware acceleration, and associated low-level APIs have advanced, the effort and costs of maintaining renderers within visualization applications have risen dramatically. This rising cost and complexity of renderer development creates an undesirable barrier that prevents visualization applications from fully benefiting from the latest rendering methods and hardware. ANARI directly addresses these challenges by providing a high-level, visualization-oriented API that abstracts low-level rendering algorithms and hardware acceleration details while providing easy and efficient access to diverse ANARI implementations, thereby enabling visualization applications to support state-of-the-art rendering capabilities.