COMPUTING IN SCIENCE & ENGINEERING

The COVID-19 High-Performance Computing Consortium
Brase J, Campbell N, Helland B, Hoang T, Parashar M, Rosenfield M, Sexton J and Towns J
In March of 2020, recognizing the potential of high-performance computing (HPC) to accelerate understanding and the pace of scientific discovery in the fight to stop COVID-19, the HPC community assembled the largest collection of HPC resources in the world to enable COVID-19 researchers to advance their critical efforts. Remarkably, the COVID-19 HPC Consortium was formed within one week through the joint effort of the Office of Science and Technology Policy (OSTP), the U.S. Department of Energy (DOE), the National Science Foundation (NSF), and IBM, creating a unique public-private partnership among government, industry, and academic leaders. This article is the Consortium's story: how the Consortium was created, its founding members, what it provides, how it works, and its accomplishments. We reflect on the lessons learned from the creation and operation of the Consortium and describe how its features could be sustained as a National Strategic Computing Reserve to ensure the nation is prepared for future crises.
Discrete-Time Modeling of COVID-19 Propagation in Argentina with Explicit Delays
Bergonzi M, Pecker-Marcosig E, Kofman E and Castro R
We present a new deterministic discrete-time compartmental model of COVID-19 that explicitly takes into account relevant delays related to the stages of the disease and to its diagnosis and reporting system, allowing it to represent the presence of imported cases. In addition to developing the model equations, we describe an automatic parameter-fitting mechanism using official data on the spread of the virus in Argentina. The result consistently reflects the behavior of the disease with respect to its characteristic times: latency, infectious period, and reporting of cases (confirmed and dead), and allows for automatically detecting changes in the reproductive number and in the mortality factor. We also analyze the model's prediction capability and present simulation results assuming different future scenarios. We discuss use of the model in a closed-loop control scheme, where the explicit presence of delays plays a key role in projecting more realistic dynamics than those of classic continuous-time models.
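The core idea of a discrete-time compartmental model with explicit delays can be sketched as follows. This is an illustrative toy, not the authors' model: all parameter values, the single infection pathway, and the fixed reporting delay are simplifying assumptions made here for exposition.

```python
def simulate(days, N=1_000_000, beta=0.3, latency=3, infectious=7,
             report_delay=5, i0=10.0):
    """Toy delayed discrete-time model.

    Returns (daily new infections, daily reported cases). Individuals
    infected on day t become infectious after `latency` days, stay
    infectious for `infectious` days, and appear in the case report
    `report_delay` days after infection.
    """
    new = [i0]          # new[t] = infections occurring on day t
    reported = [0.0]    # reported[t] = cases confirmed on day t
    S = N - i0
    for t in range(1, days):
        # Infectious on day t: infected between `latency` and
        # `latency + infectious` days ago.
        lo, hi = t - latency - infectious, t - latency
        active = sum(new[max(lo, 0):max(hi, 0)])
        inf_t = min(S, beta * active * S / N)
        S -= inf_t
        new.append(inf_t)
        # Confirmed cases lag infections by the reporting delay.
        reported.append(new[t - report_delay] if t >= report_delay else 0.0)
    return new, reported
```

The explicit delays show up as index offsets into the history of daily new infections, in contrast to continuous-time models where delays must be approximated by chains of extra compartments.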
Supercomputing Pipelines Search for Therapeutics Against COVID-19
Vermaas JV, Sedova A, Baker MB, Boehm S, Rogers DM, Larkin J, Glaser J, Smith MD, Hernandez O and Smith JC
The urgent search for drugs to combat SARS-CoV-2 has included the use of supercomputers. General-purpose graphics processing units (GPUs), massive parallelism, and new software for high-performance computing (HPC) have allowed researchers to search the vast chemical space of potential drugs faster than ever before. We developed a new drug discovery pipeline using the Summit supercomputer at Oak Ridge National Laboratory to help pioneer this effort, with new platforms that incorporate GPU-accelerated simulation and allow billions of potential drug compounds to be virtually screened for their ability to inhibit SARS-CoV-2 proteins in days rather than weeks or months. This effort will accelerate the process of developing drugs to combat the current COVID-19 pandemic and other diseases.
Computational Decision Support for the COVID-19 Healthcare Coalition
Tolk A, Glazner C and Ungerleider J
The COVID-19 Healthcare Coalition was established as a private sector-led response to the COVID-19 pandemic. Its purpose was to bring together healthcare organizations, technology firms, nonprofits, academia, and startups to preserve the healthcare delivery system and help protect U.S. populations by providing data-driven, real-time insights that improve outcomes. This required the coalition to obtain, align, and orchestrate many heterogeneous data sources and present the data on dashboards in a format that was understandable and useful to decision makers. To do this, the coalition employed an ensemble approach to analysis, combining machine learning algorithms with theory-based simulations, allowing its prognoses to provide computational decision support rooted in science and engineering.
Trustworthy Computational Evidence Through Transparency and Reproducibility
Barba LA
Many high-performance computing applications are of high consequence to society. Global climate modeling is a historic example of this. In 2020, the societal issue of greatest concern, the still-raging COVID-19 pandemic, saw a legion of computational scientists turning their endeavors to new research projects in this direction. Applications of such high consequence highlight the need for building trustworthy computational models.
Corrections to "Visual Analytics for Decision-Making During Pandemics"
Reinert A, Snyder LS, Zhao J, Fox AS, Hougen DF, Nicholson C and Ebert DS
[This corrects the article DOI: 10.1109/MCSE.2020.3023288.]
Cloud Computing for COVID-19: Lessons Learned From Massively Parallel Models of Ventilator Splitting
Kaplan M, Kneifel C, Orlikowski V, Dorff J, Newton M, Howard A, Shinn D, Bishawi M, Chidyagwai S, Balogh P and Randles A
A patient-specific airflow simulation was developed to help address the pressing need for an expansion of ventilator capacity in response to the COVID-19 pandemic. The computational model provides guidance regarding how to split a ventilator between two or more patients with differing respiratory physiologies. To enable fast deployment and identification of optimal patient-specific tuning, hundreds of millions of different clinically relevant parameter combinations had to be simulated in a short time. This task, driven by the dire circumstances, presented unique computational and research challenges. We present here the guiding principles and lessons learned in designing and deploying a large-scale and robust cloud instance within 24 hours and utilizing 800,000 compute hours in a 72-hour period. We discuss the design choices that enabled a quick turnaround of the model, execution of the simulation, and creation of an intuitive and interactive interface.
Visual Analytics for Decision-Making During Pandemics
Reinert A, Snyder LS, Zhao J, Fox AS, Hougen DF, Nicholson C and Ebert DS
We introduce a trans-disciplinary collaboration between researchers, healthcare practitioners, and community health partners in the Southwestern U.S. to enable improved management of, response to, and recovery from the current pandemic and future health emergencies. Our Center's work enables effective and efficient decision-making through interactive, human-guided analytical environments. We discuss our PanViz 2.0 system, a visual analytics application for supporting pandemic preparedness through a tightly coupled epidemiological model and interactive interface. We discuss our framework, current work, and plans to extend the system with exploration of what-if scenarios, interactive machine learning for model parameter inference, and analysis of mitigation strategies to facilitate decision-making during public health crises.
Hands-on with IBM Visual Insights
Luo S and Kindratenko V
Scalable Analysis of Authentic Viral Envelopes on FRONTERA
González-Arias F, Reddy T, Stone JE, Hadden-Perilla JA and Perilla JR
Enveloped viruses, such as SARS-CoV-2, infect cells via fusion of their envelope with the host membrane. By employing molecular simulations to characterize viral envelopes, researchers can gain insights into key determinants of infection. Here, the Frontera supercomputer is leveraged for large-scale modeling and analysis of authentic viral envelopes, whose lipid compositions are complex and realistic. Visual Molecular Dynamics (VMD) with support for MPI is employed, overcoming previous computational limitations and enabling investigation into virus biology at an unprecedented scale. The techniques applied here to an authentic HIV-1 envelope at two levels of spatial resolution (29 million particles and 280 million atoms) are broadly applicable to the study of other viruses. The authors are actively employing these techniques to develop and characterize an authentic SARS-CoV-2 envelope. A general framework for carrying out scalable analysis of simulation trajectories on Frontera is presented, expanding the utility of the machine in humanity's ongoing fight against infectious diseases.
Discovering Geometry in Data Arrays
Chi EC
Modern technologies produce a deluge of complicated data. In neuroscience, for example, minimally invasive experimental methods can take recordings of large populations of neurons at high resolution under a multitude of conditions. Such data arrays possess non-trivial interdependencies along each of their axes. Insights into these data arrays may lay the foundations of advanced treatments for nervous system disorders. The potential impacts of such data, however, will not be fully realized unless the techniques for analyzing them keep pace. Specifically, there is an urgent, growing need for methods for estimating the low-dimensional structure and geometry in big and noisy data arrays. This article reviews a framework for identifying complicated underlying patterns in such data and also recounts the key role that the Department of Energy Computational Sciences Graduate Fellowship played in setting the stage for this work to be done by the author.
FluoRender Script: A Case Study of Lingua Franca in Translational Computer Science
Wan Y, Holman HA and Hansen C
FluoRender is a software program used for the visualization and analysis of 3-D biological image data, particularly from fluorescence microscopy. We examine FluoRender's script system to demonstrate its translation process. In this article, we borrow the concept of lingua franca from linguistics. We designed a connecting language between the source and target domains for translation, thereby augmenting understanding and acceptance. In FluoRender's script system, the lingua franca consists of the mapping between the control of the media player and the computational and interactive subroutines of an analysis workflow. Workflows supporting automatic, semiautomatic, and manual operations were made available and easily accessible to end users. The formalization of the lingua franca as a technique for translational computer science provides guidance for future development.
How to Model for a Living: The CSGF as a Catalyst for Supermodels
Radhakrishnan ML
Models are ubiquitous and uniting tools for computational scientists across disciplines. As a computational biophysical chemist, I apply multiple models to understand and predict how molecules recognize and interact with each other in complex, dynamic biological environments. The Department of Energy Computational Science Graduate Fellowship (DOE CSGF) cultivates interest in engaging with models from a multidisciplinary perspective and enables junior scientists to see how computational modeling is a creative and collaborative process. Below, I describe ways, based in part on my own experiences as a CSGF recipient, in which modeling can be used both to understand the molecular world and to excite others about computational science.
Biomolecular Simulations in the Time of COVID-19, and After
Amaro RE and Mulholland AJ
COVID-19 has changed life for people worldwide. Despite lockdowns globally, computational research has pressed on, working remotely and collaborating virtually on research questions in COVID-19 and the virus that causes it, SARS-CoV-2. Molecular simulations can help to characterize the function of viral and host proteins and have the potential to contribute to the search for vaccines and treatments. Changes in the working practices of research groups include broader adoption of preprint servers; earlier and more open sharing of methods, models, and data; the use of social media to rapidly disseminate information; online seminars; and cloud-based virtual collaboration. Research funders and computing providers worldwide recognized the need to provide rapid and significant access to computational architectures. In this review, we discuss how the interplay of all of these factors is influencing the impact, both potential and realized, of biomolecular simulations in the fight against SARS-CoV-2.
A PyMOL snippet library for Jupyter to boost researcher productivity
Mooers BHM
Snippets, code templates of one or more lines, boost researcher productivity because they are faster to insert than writing the code from scratch and because they reduce debugging time. Several extensions support the use of snippets in Jupyter. We developed a Python version of the pymolsnips library and customized it for use in the jupyterlab-snippets-multimenus extension for JupyterLab. The extension provides access to the snippets through pull-down menus. Each snippet performs one task, which often requires many lines of code. This library's availability in Jupyter enables PyMOL users to run PyMOL efficiently inside Jupyter while storing the code and the associated molecular graphics images next to each other in one notebook document. This proximity of code and images supports reproducible research in structural biology, and the use of a single computer file facilitates collaborations.
Revealing the mechanism of SARS-CoV-2 spike protein binding with ACE2
Xie Y, Du D, Karki CB, Guo W, Lopez-Hernandez AE, Sun S, Juarez BY, Li H, Wang J and Li L
A large population of the world has been infected by COVID-19. Understanding the mechanisms of Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) is important for the management and treatment of COVID-19. In the infection process, one of the most important proteins of SARS-CoV-2 is the spike (S) protein, which binds to human Angiotensin-Converting Enzyme 2 (ACE2) and initiates entry into the host cell. In this study, we implemented multiscale computational approaches to study the electrostatic features of the interfaces of the SARS-CoV-2 S protein Receptor Binding Domain (RBD) and ACE2. The simulations and analyses were performed on high-performance computing resources at the Texas Advanced Computing Center (TACC). Our study identified key residues on the SARS-CoV-2 S protein that can be used as targets for future drug design. The results shed light on future drug design and therapeutic targets for COVID-19.
Early COVID-19 pandemic modeling: Three compartmental model case studies from Texas, USA
Pierce KA, Ho E, Wang X, Pasco R, Du Z, Zynda G, Song J, Wells G, Fox SJ and Meyers LA
The novel coronavirus (SARS-CoV-2) emerged in late 2019 and spread globally in early 2020. Initial reports suggested the associated disease, COVID-19, produced rapid epidemic growth and caused high mortality. As the virus sparked local epidemics in new communities, health systems and policy makers were forced to make decisions with limited information about the spread of the disease. We developed a compartmental model to project COVID-19 healthcare demands that combined information regarding SARS-CoV-2 transmission dynamics from international reports with local COVID-19 hospital census data to support response efforts in three Metropolitan Statistical Areas (MSAs) in Texas, USA: Austin-Round Rock, Houston-The Woodlands-Sugar Land, and Beaumont-Port Arthur. Our model projects that strict stay-home orders and other social distancing measures could suppress the spread of the pandemic. Our capacity to provide rapid decision-support in response to emerging threats depends on access to data, validated modeling approaches, careful uncertainty quantification, and adequate computational resources.
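The kind of compartmental projection described above can be sketched with a simple SEIR-style model that reduces transmission once a stay-home order takes effect. This is a generic illustration, not the study's model: all rates, the hospitalization fraction, and the intervention date are placeholder assumptions.

```python
def project_hospital_demand(days, N=2_000_000, beta=0.5, sigma=1/3,
                            gamma=1/7, hosp_frac=0.05, lockdown_day=30,
                            lockdown_factor=0.4, e0=100.0):
    """Toy SEIR projection of a daily hospital-census proxy.

    Returns (daily census proxy, final (S, E, I, R) compartments).
    Transmission is scaled by `lockdown_factor` from `lockdown_day` on.
    """
    S, E, I, R = N - e0, e0, 0.0, 0.0
    census = []
    for t in range(days):
        b = beta * (lockdown_factor if t >= lockdown_day else 1.0)
        new_exp = b * S * I / N       # susceptibles exposed today
        new_inf = sigma * E           # exposed becoming infectious
        new_rec = gamma * I           # infectious recovering
        S, E, I, R = (S - new_exp, E + new_exp - new_inf,
                      I + new_inf - new_rec, R + new_rec)
        census.append(hosp_frac * I)  # crude hospitalization proxy
    return census, (S, E, I, R)
```

Comparing a run with `lockdown_factor=0.4` against one with `lockdown_factor=1.0` illustrates the abstract's qualitative claim: reduced contact rates flatten the peak demand on hospitals.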
ANARI: A 3-D Rendering API Standard
Stone JE, Griffin KS, Amstutz J, DeMarle DE, Sherman WR and Günther J
ANARI is a new 3-D rendering API, an emerging Khronos standard that enables visualization applications to leverage state-of-the-art rendering techniques across diverse hardware platforms and rendering engines. Visualization applications have historically embedded custom-written renderers to enable them to provide the necessary combination of features, performance, and visual fidelity required by their users. As computing power, rendering algorithms, dedicated rendering hardware acceleration operations, and associated low-level APIs have advanced, the effort and costs associated with maintaining renderers within visualization applications have risen dramatically. The rising cost and complexity associated with renderer development creates an undesirable barrier for visualization applications to be able to fully benefit from the latest rendering methods and hardware. ANARI directly addresses these challenges by providing a high-level, visualization-oriented API that abstracts low-level rendering algorithms and hardware acceleration details while providing easy and efficient access to diverse ANARI implementations, thereby enabling visualization applications to support state-of-the-art rendering capabilities.
Comparing the Use of Research Resource Identifiers and Natural Language Processing for Citation of Databases, Software, and Other Digital Artifacts
Hsu CN, Bandrowski AE, Gillespie TH, Udell J, Lin KW, Ozyurt IB, Grethe JS and Martone ME
The Research Resource Identifier (RRID) was introduced in 2014 to better identify biomedical research resources and track their use across the literature, including key digital resources such as databases and software. Authors include an RRID after the first mention of any resource used. Here, we provide an overview of RRIDs and analyze their use for digital resource identification. We quantitatively compare the output of our RRID curation workflow with the outputs of automated text mining systems used to identify resource mentions in text. The results show that authors follow RRID reporting guidelines well, and that our natural language processing based text mining was able to identify nearly all of the resources identified by RRIDs as well as thousands more. Finally, we demonstrate how RRIDs and text mining can complement each other to provide a scalable solution to digital resource citation.
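The RRID identification task lends itself to simple pattern matching, since authors cite resources in a regular form such as "RRID:SCR_003070". The sketch below uses a deliberately simplified regular expression; the full RRID syntax covers more source authorities and formats than this pattern captures.

```python
import re

# Simplified pattern: "RRID:" (optionally followed by a space), then an
# authority prefix, an underscore, and an accession. Illustrative only.
RRID_RE = re.compile(r"RRID:\s?([A-Z][A-Za-z]*_[A-Za-z0-9][A-Za-z0-9_-]*)")

def find_rrids(text):
    """Return the deduplicated RRIDs mentioned in a passage, in order."""
    seen, out = set(), []
    for m in RRID_RE.finditer(text):
        rid = m.group(1)
        if rid not in seen:
            seen.add(rid)
            out.append(rid)
    return out
```

A rule-based extractor like this only finds resources that carry an explicit RRID; the text-mining comparison in the article matters precisely because many resource mentions in the literature lack one.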
Discovering Metamorphic Relations for Scientific Software From User Forums
Lin X, Simon M, Peng Z and Niu N
Scientific software can be used for decades and is constantly evolving. Recently, metamorphic testing, a property-based testing technique, has been shown to be effective in testing scientific software; the necessary properties are expressed as metamorphic relations. However, the development of metamorphic relations is difficult: it requires considerable practical expertise from the software tester. In this article, we report our experience of uncovering metamorphic relations from user-forum questions about the United States Environmental Protection Agency's Storm Water Management Model (SWMM). Our study not only illustrates a wealth of end users' expertise in interpreting software results, but also demonstrates the usefulness of classifying the user-oriented metamorphic relations into a nominal, ordinal, and functional hierarchy, mainly from the software output perspective.
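The idea behind metamorphic testing is that, even without a ground-truth oracle, one can check relations between the outputs of related runs. The toy below is not SWMM: the `runoff` function and its relations are hypothetical stand-ins chosen to illustrate the ordinal and functional relation categories the article describes.

```python
def runoff(rainfall_mm, infiltration_mm=10.0):
    """Hypothetical model: rainfall beyond infiltration becomes runoff."""
    return [max(0.0, r - infiltration_mm) for r in rainfall_mm]

def check_metamorphic_relations(model, rainfall):
    """Check two example metamorphic relations against `model`."""
    base = model(rainfall)
    # MR1 (ordinal): increasing every rainfall value must never
    # decrease the runoff at any time step.
    heavier = model([r + 5.0 for r in rainfall])
    assert all(h >= b for h, b in zip(heavier, base))
    # MR2 (functional): zero rainfall must yield zero runoff everywhere.
    assert all(v == 0.0 for v in model([0.0] * len(rainfall)))
    return True
```

Neither relation requires knowing the correct runoff values, which is what makes metamorphic relations attractive for scientific codes whose exact outputs are hard to verify.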
The GA4GH Task Execution Application Programming Interface: Enabling Easy Multicloud Task Execution
Kanitz A, McLoughlin MH, Beckman L, Malladi VS and Ellrott K
The Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) application programming interface (API) is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks to a variety of compute environments, including on-premises high-performance computing and high-throughput computing systems, cloud computing platforms, and hybrid environments. The TES API is designed to be flexible and extensible, allowing it to be adapted to a wide range of use cases, such as "bringing compute to the data" solutions for federated and distributed data analysis, or load balancing across multicloud infrastructures. This API has been adopted by numerous different service providers and is utilized by several workflow engines, yielding a single abstracted interface for developers and researchers. Using its capabilities, genome research institutes are building extensible hybrid compute systems to study life science.
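To give a flavor of the standardized schema, the sketch below builds a minimal TES task description as a Python dict. The field names follow the TES schema, but the bucket URLs, container image, and command are placeholder values invented here, and the endpoint comment is a simplification of the submission flow.

```python
import json

# Minimal TES task: run one containerized command over one input file,
# writing one output file. All URLs and the image are placeholders.
task = {
    "name": "md5-checksum",
    "inputs": [
        {"url": "s3://example-bucket/input.txt", "path": "/data/input.txt"},
    ],
    "outputs": [
        {"url": "s3://example-bucket/sum.txt", "path": "/data/sum.txt"},
    ],
    "executors": [
        {
            "image": "alpine:3.19",
            "command": ["sh", "-c", "md5sum /data/input.txt > /data/sum.txt"],
        },
    ],
    "resources": {"cpu_cores": 1, "ram_gb": 1.0},
}

# A client would POST this JSON to a TES service's task-creation endpoint
# and then poll the returned task id for its state.
payload = json.dumps(task)
```

Because the same description works against any conforming TES service, a workflow engine can submit this task to an on-premises cluster or a cloud backend without changing the task itself.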