INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS

Free energy perturbation-based large-scale virtual screening for effective drug discovery against COVID-19
Li Z, Wu C, Li Y, Liu R, Lu K, Wang R, Liu J, Gong C, Yang C, Wang X, Zhan CG and Luo HB
As a theoretically rigorous and accurate method, FEP-ABFE (Free Energy Perturbation-Absolute Binding Free Energy) calculations have shown great potential in drug discovery, but their practical application has been difficult due to high computational cost. To rapidly discover antiviral drugs targeting SARS-CoV-2 Mpro and TMPRSS2, we performed FEP-ABFE-based virtual screening for ∼12,000 protein-ligand binding systems on a new-generation Tianhe supercomputer. A task management tool was developed specifically to automate the whole process, which involved more than 500,000 MD tasks. In further experimental validation, 50 out of 98 tested compounds showed significant inhibitory activity towards Mpro, and one representative inhibitor, dipyridamole, showed remarkable outcomes in subsequent clinical trials. This work not only demonstrates the potential of FEP-ABFE in drug discovery but also provides an excellent starting point for further development of anti-SARS-CoV-2 drugs. In addition, the ∼500 TB of data generated in this work will accelerate the further development of FEP-related methods.
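The identity at the heart of free energy perturbation is the Zwanzig relation, which recovers a binding free energy difference from energy differences sampled in a reference state. A minimal stand-alone sketch follows; the kT value, the Gaussian sample distribution, and the function names are illustrative assumptions, not code from the paper:

```python
import math
import random

def fep_estimate(delta_u_samples, kT=0.593):
    # Zwanzig free energy perturbation identity:
    #   dG = -kT * ln < exp(-dU / kT) >_0
    # where dU are energy differences sampled in the reference ensemble.
    avg = sum(math.exp(-du / kT) for du in delta_u_samples) / len(delta_u_samples)
    return -kT * math.log(avg)

# Toy example: Gaussian-distributed perturbation energies (kcal/mol scale).
random.seed(0)
samples = [random.gauss(1.0, 0.5) for _ in range(100_000)]
dG = fep_estimate(samples)
```

For Gaussian-distributed dU this estimator converges to mu - sigma^2/(2 kT), illustrating why the exponential average weights low-energy configurations more heavily than the plain mean.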
Data-driven scalable pipeline using national agent-based models for real-time pandemic response and decision support
Bhattacharya P, Chen J, Hoops S, Machi D, Lewis B, Venkatramanan S, Wilson ML, Klahn B, Adiga A, Hurt B, Outten J, Adiga A, Warren A, Baek YY, Porebski P, Marathe A, Xie D, Swarup S, Vullikanti A, Mortveit H, Eubank S, Barrett CL and Marathe M
This paper describes an integrated, data-driven operational pipeline based on national agent-based models to support federal and state-level pandemic planning and response. The pipeline consists of (i) an automatic semantic-aware scheduling method that coordinates jobs across two separate high performance computing systems; (ii) a data pipeline to collect, integrate and organize national and county-level disaggregated data for initialization and post-simulation analysis; (iii) a digital twin of national social contact networks made up of 288 million individuals and 12.6 billion time-varying interactions covering the US states and DC; and (iv) an extension of a parallel agent-based simulation model to study epidemic dynamics and associated interventions. This pipeline can run 400 replicates of national runs in less than 33 h, and reduces the need for human intervention, resulting in faster turnaround times and higher reliability and accuracy of the results. Scientifically, the work has led to significant advances in real-time epidemic sciences.
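The simulation component of such a pipeline can be illustrated with a toy discrete-time SIR agent-based model. This sketch is a drastic simplification with made-up parameters, random (rather than empirically derived) contacts, and no relation to the paper's 288-million-agent digital twin:

```python
import random

def simulate_sir(n_agents, contacts_per_step, p_transmit, p_recover, steps, seed=1):
    """Toy agent-based SIR model: each step, every infectious agent meets
    a few random agents and transmits with probability p_transmit, then
    recovers with probability p_recover.  States: 0=S, 1=I, 2=R."""
    rng = random.Random(seed)
    state = [0] * n_agents
    state[0] = 1  # one index case
    history = []
    for _ in range(steps):
        infectious = [i for i, s in enumerate(state) if s == 1]
        for i in infectious:
            for _ in range(contacts_per_step):
                j = rng.randrange(n_agents)
                if state[j] == 0 and rng.random() < p_transmit:
                    state[j] = 1
            if rng.random() < p_recover:
                state[i] = 2
        history.append((state.count(0), state.count(1), state.count(2)))
    return history

history = simulate_sir(n_agents=2000, contacts_per_step=4,
                       p_transmit=0.08, p_recover=0.2, steps=60)
```

A production model replaces the random-mixing assumption with the time-varying contact network, which is precisely what makes national-scale runs an HPC problem.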
#COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol
Dommer A, Casalino L, Kearns F, Rosenfeld M, Wauer N, Ahn SH, Russo J, Oliveira S, Morris C, Bogetti A, Trifan A, Brace A, Sztain T, Clyde A, Ma H, Chennubhotla C, Lee H, Turilli M, Khalid S, Tamayo-Mendoza T, Welborn M, Christensen A, Smith DG, Qiao Z, Sirumalla SK, O'Connor M, Manby F, Anandkumar A, Hardy D, Phillips J, Stern A, Romero J, Clark D, Dorrell M, Maiden T, Huang L, McCalpin J, Woods C, Gray A, Williams M, Barker B, Rajapaksha H, Pitts R, Gibbs T, Stone J, Zuckerman DM, Mulholland AJ, Miller T, Jha S, Ramanathan A, Chong L and Amaro RE
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action
Trifan A, Gorgun D, Salim M, Li Z, Brace A, Zvyagin M, Ma H, Clyde A, Clark D, Hardy DJ, Burnley T, Huang L, McCalpin J, Emani M, Yoo H, Yin J, Tsaris A, Subbiah V, Raza T, Liu J, Trebesch N, Wells G, Mysore V, Gibbs T, Phillips J, Chennubhotla SC, Foster I, Stevens R, Anandkumar A, Vishwanath V, Stone JE, Tajkhorshid E, A Harris S and Ramanathan A
The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication-transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescales to capture important dynamics of this molecular machine. Consequently, we develop an innovative workflow that bridges the gap between these resolutions, using mesoscale fluctuating finite element analysis (FFEA) continuum simulations and a hierarchy of AI methods that continually learn and infer features to maintain consistency between AAMD and FFEA simulations. We leverage a multi-site distributed workflow manager to orchestrate AI, FFEA, and AAMD jobs, providing optimal resource utilization across HPC centers. Our study provides unprecedented access to the SARS-CoV-2 RTC machinery, while providing a general capability for AI-enabled multi-resolution simulations at scale.
Digital transformation of droplet/aerosol infection risk assessment realized on "Fugaku" for the fight against COVID-19
Ando K, Bale R, Li C, Matsuoka S, Onishi K and Tsubokura M
The fastest supercomputer in 2020, Fugaku, has not only achieved a digital transformation of epidemiology by allowing, for the first time, end-to-end, detailed quantitative modeling of COVID-19 transmission, but has also transformed the behavior of the entire Japanese public through its detailed analysis of transmission risks in a multitude of high-risk societal situations. A novel aerosol simulation methodology was synthesized around a new CFD method meeting industrial demands, implemented in the solver CUBE (Jansson et al., 2019). This approach not only allowed the simulations to scale massively at the high resolution required for micrometer virus-containing aerosol particles, but also enabled extremely rapid time-to-solution, since digital twins representing a multitude of societal situations can be generated in a matter of minutes, attaining true overall application high performance. Such simulations have been running on Fugaku for the past 1.5 years, cumulatively consuming top supercomputer-class resources; their results have been widely communicated by the media and have become the basis for official public policies.
Language models for the prediction of SARS-CoV-2 inhibitors
Blanchard AE, Gounley J, Bhowmik D, Chandra Shekar M, Lyngaas I, Gao S, Yin J, Tsaris A, Wang F and Glaser J
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ∼9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
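The generate-and-score loop described above can be sketched as a small genetic algorithm. In the paper's workflow the score comes from the fine-tuned language model's predicted binding affinity and the candidates are molecules; here both are stand-ins (a character alphabet and a counting objective), so this shows only the search skeleton, not the chemistry:

```python
import random

def genetic_search(score, alphabet, length=12, pop_size=40,
                   generations=30, mutation_rate=0.15, seed=7):
    """Toy generate-and-score genetic algorithm: keep the fittest half,
    refill the population via one-point crossover plus point mutations."""
    rng = random.Random(seed)
    pop = [[rng.choice(alphabet) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]           # elitist selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.choice(alphabet) if rng.random() < mutation_rate else c
                     for c in child]              # point mutations
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# Stand-in objective: count of 'C' symbols (placeholder for model affinity).
best = genetic_search(score=lambda s: s.count("C"), alphabet=list("CNOS"))
```

Elitism makes the best-ever candidate monotonically improve, which matters when each scoring call is an expensive model inference.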
AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics
Casalino L, Dommer AC, Gaieb Z, Barros EP, Sztain T, Ahn SH, Trifan A, Brace A, Bogetti AT, Clyde A, Ma H, Lee H, Turilli M, Khalid S, Chong LT, Simmerling C, Hardy DJ, Maia JD, Phillips JC, Kurth T, Stern AC, Huang L, McCalpin JD, Tatineni M, Gibbs T, Stone JE, Jha S, Ramanathan A and Amaro RE
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
Comparing perturbation models for evaluating stability of neuroimaging pipelines
Kiar G, de Oliveira Castro P, Rioux P, Petit E, Brown ST, Evans AC and Glatard T
With an increase in awareness regarding a troubling lack of reproducibility in analytical software tools, the degree of validity in scientific derivatives and their downstream results has become unclear. The nature of reproducibility issues may vary across domains, tools, data sets, and computational infrastructures, but numerical instabilities are thought to be a core contributor. In neuroimaging, unexpected deviations have been observed when varying operating systems, software implementations, or adding negligible quantities of noise. In the field of numerical analysis, these issues have recently been explored through Monte Carlo Arithmetic, a method involving the instrumentation of floating-point operations with probabilistic noise injections at a target precision. Exploring multiple simulations in this context allows the characterization of the result space for a given tool or operation. In this article, we compare various perturbation models to introduce instabilities within a typical neuroimaging pipeline, including (i) targeted noise, (ii) Monte Carlo Arithmetic, and (iii) operating system variation, to identify the significance and quality of their impact on the resulting derivatives. We demonstrate that even low-order models in neuroimaging, such as the structural connectome estimation pipeline evaluated here, are sensitive to numerical instabilities, suggesting that stability is a relevant axis upon which tools are compared, alongside more traditional criteria such as biological feasibility, computational efficiency, or, when possible, accuracy. Heterogeneity was observed across participants, which clearly illustrates a strong interaction between the tool and data set being processed, requiring that the stability of a given tool be evaluated with respect to a given cohort.
We identify use cases for each perturbation method tested, including quality assurance, pipeline error detection, and local sensitivity analysis, and make recommendations for the evaluation of stability in a practical and analytically focused setting. Identifying how these relationships and recommendations scale to higher-order computational tools and distinct data sets, and their implications for biological feasibility, remain exciting avenues for future work.
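The Monte Carlo Arithmetic idea — instrumenting floating-point operations with random noise at a target virtual precision and studying the spread of repeated runs — can be sketched in a few lines. This is a simplified stand-in (uniform half-ulp noise on a dot product), not the MCA instrumentation used in the paper:

```python
import math
import random

def mca_perturb(x, precision_bits=24, rng=random):
    """Add uniform noise at the given virtual precision, mimicking
    Monte Carlo Arithmetic's random rounding of each operation."""
    if x == 0.0:
        return 0.0
    exponent = math.floor(math.log2(abs(x)))
    ulp = 2.0 ** (exponent - precision_bits + 1)  # unit in the last place
    return x + (rng.random() - 0.5) * ulp

def mca_dot(a, b, precision_bits=24, seed=None):
    """Dot product with every intermediate operation randomly perturbed."""
    rng = random.Random(seed)
    acc = 0.0
    for x, y in zip(a, b):
        prod = mca_perturb(x * y, precision_bits, rng)
        acc = mca_perturb(acc + prod, precision_bits, rng)
    return acc

# Repeating the computation yields a distribution whose spread estimates
# the numerical stability of the operation at the chosen precision.
a = [1.0 / (i + 1) for i in range(100)]
b = [(-1.0) ** i for i in range(100)]
results = [mca_dot(a, b, seed=s) for s in range(50)]
spread = max(results) - min(results)
```

A tight spread around the exact value indicates a numerically stable operation; a wide or biased spread flags the kind of instability the paper measures in full pipelines.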
Application performance analysis and efficient execution on systems with multi-core CPUs, GPUs and MICs: A case study with microscopy image analysis
Teodoro G, Kurc T, Andrade G, Kong J, Ferreira R and Saltz J
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core, MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of the computing devices and with the data access patterns, computational complexities, and parallelization forms of the operations. The results show significant variability in the performance of operations with respect to the device used. The performance of operations with regular data access is comparable on a MIC to, and sometimes better than, that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations.
High performance virtual drug screening on many-core processors
McIntosh-Smith S, Price J, Sessions RB and Ibarra AA
Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures
Saltz J, Teodoro G, Pan T, Cooper L, Kong J, Klasky S and Kurc T
Analysis of large sensor datasets for structural and functional features has applications in many domains, including weather and climate modeling, characterization of subsurface reservoirs, and biomedicine. The vast amount of data obtained from state-of-the-art sensors and the computational cost of analysis operations create a barrier to such analyses. In this paper, we describe middleware system support to take advantage of large clusters of hybrid CPU-GPU nodes to address the data and compute-intensive requirements of feature-based analyses in large spatio-temporal datasets.
HPC and Grid computing for integrative biomedical research
Kurc T, Hastings S, Kumar V, Langella S, Sharma A, Pan T, Oster S, Ervin D, Permar J, Narayanan S, Gil Y, Deelman E, Hall M and Saltz J
Integrative biomedical research projects query, analyze, and integrate many different data types and make use of datasets obtained from measurements or simulations of structure and function at multiple biological scales. With the increasing availability of high-throughput and high-resolution instruments, integrative biomedical research imposes many challenging requirements on software middleware systems. In this paper, we examine some of these requirements using example research pattern templates. We then discuss how middleware systems, which incorporate Grid and high-performance computing, could be employed to address the requirements.
Automatic generation of FFT for translations of multipole expansions in spherical harmonics
Kurzak J, Mirkovic D, Pettitt BM and Johnsson SL
The fast multipole method (FMM) is an efficient algorithm for calculating electrostatic interactions in molecular simulations and a promising alternative to Ewald summation methods. Translation of multipole expansions in spherical harmonics is the most important operation of the fast multipole method, and fast Fourier transform (FFT) acceleration of this operation is among the fastest methods of improving its performance. The technique relies on highly optimized implementations of fast Fourier transform routines for the desired expansion sizes, which need to incorporate knowledge of the symmetries and zero elements in the input arrays. Here a method is presented for the automatic generation of such highly optimized routines.
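The FFT acceleration rests on the convolution theorem: a convolution-like translation becomes a pointwise product in Fourier space. A generic, stdlib-only illustration follows, using a naive O(n^2) DFT for self-containment; the paper's generated routines are specialized FFTs that additionally exploit the symmetries and zero patterns of the expansion arrays:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse DFT, normalized by n."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def circular_convolve(a, b):
    # Convolution theorem: conv(a, b) = IDFT(DFT(a) * DFT(b)),
    # turning an O(n^2) convolution into transforms plus a pointwise product.
    A, B = dft(a), dft(b)
    return idft([x * y for x, y in zip(A, B)])

a = [1.0, 2.0, 3.0, 0.0]
b = [0.5, 0.0, 0.0, 0.0]
c = circular_convolve(a, b)
```

With a real FFT the transforms cost O(n log n), which is the source of the speedup when translation operators are applied many times per FMM pass.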
The Virtual Instrument: Support for Grid-enabled MCell simulations
Casanova H, Berman F, Bartol T, Gokcay E, Sejnowski T, Birnbaum A, Dongarra J, Miller M, Ellisman M, Faerman M, Obertelli G, Wolski R, Pomerantz S and Stiles J
Ensembles of widely distributed, heterogeneous resources, or Grids, have emerged as popular platforms for large-scale scientific applications. In this paper we present the Virtual Instrument project, which provides an integrated application execution environment that enables end-users to run and interact with running scientific simulations on Grids. This work is performed in the specific context of MCell, a computational biology application. While MCell provides the basis for running simulations, its capabilities are currently limited in terms of scale, ease-of-use, and interactivity. These limitations preclude usage scenarios that are critical for scientific advances. Our goal is to create a scientific "Virtual Instrument" from MCell by allowing its users to transparently access Grid resources while being able to steer running simulations. In this paper, we motivate the Virtual Instrument project and discuss a number of relevant issues and accomplishments in the area of Grid software development and application scheduling. We then describe our software design and report on the current implementation. We verify and evaluate our design via experiments with MCell on a real-world Grid testbed.