Finding the semantic similarity in single-particle diffraction images using self-supervised contrastive projection learning
Single-shot coherent diffraction imaging of isolated nanosized particles has seen remarkable success in recent years, yielding in-situ measurements with ultra-high spatial and temporal resolution. The progress of high-repetition-rate sources for intense X-ray pulses has further enabled the recording of datasets containing millions of diffraction images, which are needed for dynamic experiments and for the structure determination of specimens with greater structural variety. The size of the datasets, however, represents a monumental problem for their analysis. Here, we present an automated approach for finding semantic similarities in coherent diffraction images without relying on human expert labeling. By introducing the concept of projection learning, we extend self-supervised contrastive learning to the context of coherent diffraction imaging and achieve a dimensionality reduction producing semantically meaningful embeddings that align with physical intuition. The method yields substantial improvements compared to previous approaches, paving the way toward real-time and large-scale analysis of coherent diffraction experiments at X-ray free-electron lasers.
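To make the contrastive idea concrete, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor embedding. It is not the authors' projection-learning method: the function names, the temperature value, and the toy two-dimensional embeddings are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss: negative log-softmax of the positive pair's similarity.

    The temperature of 0.5 is an arbitrary illustrative choice.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    shift = max(logits)  # subtract the maximum for numerical stability
    log_sum = shift + math.log(sum(math.exp(l - shift) for l in logits))
    return log_sum - logits[0]

# Toy embeddings (made up): the positive is close to the anchor, the negatives are not.
anchor = [1.0, 0.1]
positive = [0.9, 0.2]
negatives = [[-1.0, 0.3], [0.0, 1.0]]

loss_good = info_nce(anchor, positive, negatives)
# Swapping in a dissimilar vector as the "positive" should give a larger loss.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

Minimizing such a loss pulls embeddings of related views together and pushes unrelated ones apart, which is the general mechanism behind the semantically meaningful embeddings described above.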
Infrared-active phonons in one-dimensional materials and their spectroscopic signatures
Dimensionality provides a clear fingerprint on the dispersion of infrared-active, polar-optical phonons. For these phonons, the local dipoles parametrized by the Born effective charges drive the LO-TO splitting of bulk materials; this splitting actually breaks down in two-dimensional materials. Here, we develop the theory for one-dimensional (1D) systems: nanowires, nanotubes, and atomic and polymeric chains. Combining an analytical model with the implementation of density-functional perturbation theory in 1D boundary conditions, we show that the dielectric splitting in the dispersion relations collapses at the zone center. The present work links the dielectric properties and the radius of the 1D materials to these red shifts, opening infrared and Raman characterization avenues.
A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing
The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, using 2.4 million materials science abstracts; it outperforms other baseline models in three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available at polymerscholar.org, which can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.
Thermal conductivity of glasses: first-principles theory and applications
Predicting the thermal conductivity of glasses from first principles has hitherto been a very complex problem. The established Allen-Feldman and Green-Kubo approaches employ approximations with limited validity (the former neglects anharmonicity, the latter misses the quantum Bose-Einstein statistics of vibrations) and require atomistic models that are very challenging for first-principles methods. Here, we present a protocol to determine from first principles the thermal conductivity κ(T) of glasses above the plateau (i.e., above the temperature-independent region appearing almost without exception in the κ(T) of all glasses at cryogenic temperatures). The protocol combines the Wigner formulation of thermal transport with convergence-acceleration techniques, and accounts comprehensively for the effects of structural disorder, anharmonicity, and Bose-Einstein statistics. We validate this approach in vitreous silica, showing that models containing fewer than 200 atoms can already reproduce κ(T) in the macroscopic limit. We discuss the effects of anharmonicity and the mechanisms determining the trend of κ(T) at high temperature, reproducing experiments at temperatures where radiative effects remain negligible.
Systematic Coarse-graining of Epoxy Resins with Machine Learning-Informed Energy Renormalization
A persistent challenge in predictive molecular modeling of thermoset polymers is to capture the effects of chemical composition and degree of crosslinking (DC) on dynamical and mechanical properties with high computational efficiency. We established a new coarse-graining (CG) approach that combines the energy renormalization method with Gaussian process surrogate models of the molecular dynamics simulations. This allows a machine-learning-informed functional calibration of DC-dependent CG force field parameters. Taking versatile epoxy resins consisting of Bisphenol A diglycidyl ether combined with a curing agent of either 4,4-Diaminodicyclohexylmethane or polyoxypropylene diamines, we demonstrated excellent agreement between all-atom and CG predictions for density, Debye-Waller factor, Young's modulus, and yield stress at any DC. We further introduce a surrogate-model-enabled simplification of the functional forms of 14 non-bonded calibration parameters by quantifying the uncertainty of a candidate set of high-dimensional/flexible calibration functions. The framework established provides an efficient methodology for chemistry-specific, large-scale investigations of the dynamics and mechanics of epoxy resins.
Quantum-accurate machine learning potentials for metal-organic frameworks using temperature driven active learning
Understanding the structural flexibility of metal-organic frameworks (MOFs) via molecular dynamics simulations is crucial to designing better MOFs. Density functional theory (DFT) and quantum-chemistry methods provide highly accurate molecular dynamics, but their computational overhead limits their use in long time-dependent simulations. In contrast, classical force fields struggle with the description of coordination bonds. Here we develop DFT-accurate machine-learning spectral neighbor analysis potentials for two representative MOFs. Their structural and vibrational properties are then studied and closely compared with available experimental data. Most importantly, we demonstrate an active-learning algorithm, based on mapping the relevant internal coordinates, which drastically reduces the amount of training data to be computed at the DFT level. Thus, the workflow presented here appears as an efficient strategy for the study of flexible MOFs with DFT accuracy, but at a fraction of the DFT computational cost.
The energy landscape of magnetic materials
Magnetic materials can display many solutions to the electronic-structure problem, corresponding to different local or global minima of the energy functional. In Hartree-Fock or density-functional theory, different single-determinant solutions lead to different magnetizations, ionic oxidation states, hybridizations, and inter-site magnetic couplings. The vast majority of these states can be fingerprinted through their projection on the atomic orbitals of the magnetic ions. We have devised an approach that provides effective control over these occupation matrices, allowing us to systematically explore the landscape of the potential energy surface. We showcase the emergence of a complex zoology of self-consistent states, even more so when semi-local density-functional theory is augmented (and typically made more accurate) by Hubbard corrections. Such extensive explorations allow us to robustly identify the ground state of magnetic systems, and to assess the accuracy (or not) of current functionals and approximations.
Projectability disentanglement for accurate and automated electronic-structure Hamiltonians
Maximally-localized Wannier functions (MLWFs) are broadly used to characterize the electronic structure of materials. Generally, one can construct MLWFs describing isolated bands (e.g. valence bands of insulators) or entangled bands (e.g. valence and conduction bands of insulators, or metals). Obtaining accurate and compact MLWFs often requires chemical intuition and trial and error, a challenging step even for experienced researchers and a roadblock for high-throughput calculations. Here, we present an automated approach, projectability-disentangled Wannier functions (PDWFs), that constructs MLWFs spanning the occupied bands and their complement for the empty states, providing a tight-binding picture of optimized atomic orbitals in crystals. Key to the algorithm is a projectability measure for each Bloch state onto atomic orbitals, determining if that state should be kept identically, discarded, or mixed into the disentanglement. We showcase the accuracy on a test set of 200 materials, and the reliability by constructing 21,737 Wannier Hamiltonians.
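The three-way decision described above (keep, discard, or mix into the disentanglement) can be sketched as a simple threshold rule on the projectability of each Bloch state. This is only an illustration: the function name and the two threshold values are assumptions chosen for the example, not values taken from the abstract.

```python
def classify_state(projectability, p_min=0.01, p_max=0.95):
    """Classify a Bloch state by its projectability onto atomic orbitals.

    p_min and p_max are illustrative thresholds (assumptions for this sketch).
    """
    if not 0.0 <= projectability <= 1.0:
        raise ValueError("projectability must lie in [0, 1]")
    if projectability >= p_max:
        return "keep"          # well described by atomic orbitals: kept identically
    if projectability < p_min:
        return "discard"       # essentially no atomic-orbital character
    return "disentangle"       # intermediate: mixed in the disentanglement step

# Made-up projectabilities for four states at one k-point.
labels = [classify_state(p) for p in (0.99, 0.60, 0.005, 0.95)]
```

In this sketch the classification is purely per state; how the "disentangle" states are actually mixed is outside the scope of the example.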
Localization and segmentation of atomic columns in supported nanoparticles for fast scanning transmission electron microscopy
To accurately capture the dynamic behavior of small nanoparticles in scanning transmission electron microscopy, high-quality data and advanced data processing are needed. The fast scan rate required to observe structural dynamics inherently leads to very noisy data, for which machine learning tools are essential for unbiased analysis. In this study, we develop a workflow based on two U-Net architectures to automatically localize and classify atomic columns at particle-support interfaces. The model is trained on non-physical image simulations, achieves sub-pixel localization precision and high classification accuracy, and generalizes well to experimental data. We test our model on both in situ and ex situ experimental time series recorded at 5 frames per second of small Pt nanoparticles supported on CeO₂(111). The processed movies show sub-second dynamics of the nanoparticles and reveal site-specific movement patterns of individual atomic columns.
Ab initio framework for deciphering trade-off relationships in multi-component alloys
While first-principles methods have been successfully applied to characterize individual properties of multi-principal element alloys (MPEA), their use in searching for optimal trade-offs between competing properties is hampered by high computational demands. In this work, we present a framework to explore Pareto-optimal compositions by integrating advanced ab initio-based techniques into a Bayesian multi-objective optimization workflow, complemented by a simple analytical model providing straightforward analysis of trends. We benchmark the framework by applying it to solid solution strengthening and ductility of refractory MPEAs, with the parameters of the strengthening and ductility models being efficiently computed using a combination of the coherent-potential approximation method, accounting for finite-temperature effects, and actively-learned moment-tensor potentials parameterized with ab initio data. Properties obtained from ab initio calculations are subsequently used to extend predictions of all relevant material properties to a large class of refractory alloys with the help of the analytical model validated by the data and relying on a few element-specific parameters and universal functions that describe bonding between elements. Our findings offer crucial insights into the traditional strength-vs-ductility dilemma of refractory MPEAs. The proposed framework is versatile and can be extended to other materials and properties of interest, enabling a predictive and tractable high-throughput screening of Pareto-optimal MPEAs over the entire composition space.
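As a small illustration of what "Pareto-optimal" means in a strength-versus-ductility trade-off, the sketch below filters a list of candidates down to the non-dominated ones, assuming both objectives are to be maximized. The candidate values are made up, and this brute-force filter is not the Bayesian multi-objective workflow described above.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points not dominated by any other point (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (strength, ductility) scores for five compositions, in arbitrary units.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.5, 0.5)]
front = pareto_front(candidates)
```

The filter compares every pair of points, which is fine for a handful of candidates but scales quadratically with their number.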
Internal consistency of multi-tier GW+EDMFT
The multi-tier GW+EDMFT scheme is an ab-initio method for calculating the electronic structure of correlated materials. While the approach is free from ad-hoc parameters, it requires a selection of appropriate energy windows for describing low-energy and strongly correlated physics. In this study, we test the consistency of the multi-tier description by considering different low-energy windows for a series of cubic SrXO₃ (X = V, Cr, Mn) perovskites. Specifically, we consider the 3-orbital t2g model, the 5-orbital t2g + eg model, the 12-orbital t2g + Op model, and (in the case of SrVO₃) the 14-orbital t2g + eg + Op model, and compare the results to available photoemission and X-ray absorption measurements. The multi-tier method yields consistent results for the t2g and t2g + eg low-energy windows, while the models with Op states produce stronger correlation effects and mostly agree well with experiment, especially in the unoccupied part of the spectrum. We also discuss the consistency between the fermionic and bosonic spectral functions and the physical origin of satellite features, and present momentum-resolved charge susceptibilities.
A unified moment tensor potential for silicon, oxygen, and silica
Si and its oxides have been extensively explored in theoretical research due to their technological importance. Simultaneously describing interatomic interactions within both Si and SiO₂ without the use of ab initio methods is considered challenging, given the charge transfers involved. Herein, this challenge is overcome by developing a unified machine learning interatomic potential describing the Si/SiO₂/O system, based on the moment tensor potential (MTP) framework. This MTP is trained using a comprehensive database generated using density functional theory (DFT) simulations, encompassing diverse crystal structures, point defects, extended defects, and disordered structures. Extensive testing of the MTP is performed, indicating it can describe static and dynamic features of very diverse Si, O, and SiO₂ atomic structures with a degree of fidelity approaching that of DFT.
MEPO-ML: a robust graph attention network model for rapid generation of partial atomic charges in metal-organic frameworks
Accurate computation of the gas adsorption properties of MOFs is usually bottlenecked by the DFT calculations required to generate partial atomic charges. Therefore, large virtual screenings of MOFs often use the QEq method, which is rapid but of limited accuracy. Recently, machine learning (ML) models have been trained to generate charges in much better agreement with DFT-derived charges than the QEq models. Previous ML charge models for MOFs have all used training sets with fewer than 3000 MOFs obtained from the CoRE MOF database, which has recently been shown to have high structural error rates. In this work, we developed a graph attention network model for predicting DFT-derived charges in MOFs, using the ARC-MOF database, which contains 279,632 MOFs and over 40 million charges. This model, which we call MEPO-ML, predicts charges with a mean absolute error of 0.025e on our test set of over 27 K MOFs. Other ML models reported in the literature were also trained using the same dataset and descriptors, and MEPO-ML was shown to give the lowest errors. The gas adsorption properties evaluated using MEPO-ML charges are found to be in significantly better agreement with those obtained from the reference DFT-derived charges than those obtained from the empirical charges, for both polar and non-polar gases. Using only a single CPU core on our benchmark computer, MEPO-ML charges can be generated in less than two seconds on average (including all computations required to apply the model) for MOFs in the test set of 27 K MOFs.
Hyperactive learning for data-driven interatomic potentials
Data-driven interatomic potentials have emerged as a powerful tool for approximating ab initio potential energy surfaces. The most time-consuming step in creating these interatomic potentials is typically the generation of a suitable training database. To aid this process, hyperactive learning (HAL), an accelerated active learning scheme, is presented as a method for rapid automated training database assembly. HAL adds a biasing term to a physically motivated sampler (e.g. molecular dynamics), driving atomic structures towards uncertainty and in turn generating unseen or valuable training configurations. The proposed HAL framework is used to develop atomic cluster expansion (ACE) interatomic potentials for the AlSi10 alloy and polyethylene glycol (PEG) polymer starting from roughly a dozen initial configurations. The HAL-generated ACE potentials are shown to be able to determine macroscopic properties, such as melting temperature and density, with close to experimental accuracy.
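One way to picture a biasing term that drives sampling towards uncertainty is to subtract a multiple of the predicted uncertainty from the predicted energy, so that uncertain configurations look energetically favorable to the sampler. The sketch below assumes exactly that form, with the uncertainty estimated as the spread of a small committee of model predictions. The functional form, the parameter `tau`, and the labeling tolerance are assumptions made for illustration and are not stated in the abstract.

```python
import statistics

def biased_energy(committee_energies, tau):
    """Return (mean - tau * sigma, sigma) for a committee of energy predictions.

    Assumed form: the bias lowers the energy where the committee disagrees.
    """
    mean = statistics.mean(committee_energies)
    sigma = statistics.pstdev(committee_energies)  # population standard deviation
    return mean - tau * sigma, sigma

def needs_reference_calculation(sigma, tolerance):
    """Flag a configuration for labeling once its uncertainty exceeds a tolerance."""
    return sigma > tolerance

# Hypothetical committee predictions (eV) for two configurations.
e_known, s_known = biased_energy([1.0, 1.0, 1.0], tau=1.0)
e_novel, s_novel = biased_energy([0.9, 1.0, 1.1], tau=1.0)
```

With `tau = 0` the sampler reduces to the unbiased one; larger values push it harder towards configurations the committee disagrees on.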
The rule of four: anomalous distributions in the stoichiometries of inorganic compounds
Why are materials with specific characteristics more abundant than others? This is a fundamental question in materials science and one that is traditionally difficult to tackle, given the vastness of compositional and configurational space. We highlight here the anomalous abundance of inorganic compounds whose primitive unit cell contains a number of atoms that is a multiple of four. This occurrence, named here the rule of four, has to our knowledge not previously been reported or studied. Here, we first highlight the rule's existence, especially notable when restricting oneself to experimentally known compounds, and explore its possible relationship with established descriptors of crystal structures, from symmetries to energies. We then investigate this relative abundance by looking at structural descriptors, both of global (packing configurations) and local (the smooth overlap of atomic positions) nature. Contrary to intuition, the overabundance does not correlate with low-energy or high-symmetry structures; in fact, structures which obey the rule of four are characterized by low symmetries and loosely packed arrangements maximizing the free volume. We are able to correlate this abundance with local structural symmetries, and visualize the results using a hybrid supervised-unsupervised machine learning method.
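The basic measurement behind such an observation is simple: count how many primitive cells in a dataset contain a number of atoms divisible by four. The sketch below does this for a made-up list of cell sizes; the function name and the data are hypothetical and do not come from any real database.

```python
def fraction_multiple_of(atom_counts, k=4):
    """Fraction of unit cells whose atom count is a multiple of k."""
    if not atom_counts:
        raise ValueError("atom_counts must not be empty")
    hits = sum(1 for n in atom_counts if n % k == 0)
    return hits / len(atom_counts)

# Made-up numbers of atoms per primitive unit cell for ten compounds.
cell_sizes = [4, 8, 12, 5, 2, 16, 20, 3, 24, 6]
share = fraction_multiple_of(cell_sizes)
```

As a rough point of comparison (an assumption, not a claim from the abstract), atom counts spread evenly over the residues modulo four would give a share near 0.25.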
A rule-free workflow for the automated generation of databases from scientific literature
In recent times, transformer networks have achieved state-of-the-art performance in a wide range of natural language processing tasks. Here we present a workflow based on the fine-tuning of BERT models for different downstream tasks, which results in the automated extraction of structured information from unstructured natural language in scientific literature. Contrary to existing methods for the automated extraction of structured compound-property relations from similar sources, our workflow does not rely on the definition of intricate grammar rules. Hence, it can be adapted to a new task without requiring extensive implementation efforts and knowledge. We test our data-extraction workflow by automatically generating a database for Curie temperatures and one for band gaps. These are then compared with manually curated datasets and with those obtained with a state-of-the-art rule-based method. Furthermore, in order to showcase the practical utility of the automatically extracted data in a material-design workflow, we employ them to construct machine-learning models to predict Curie temperatures and band gaps. In general, we find that automatically extracted datasets, although noisier, can grow fast in volume, and that such volume partially compensates for the inaccuracy in downstream tasks.
Not yet defect-free: the current landscape for women in computational materials research
Solids that are also liquids: elastic tensors of superionic materials
Superionics are fascinating materials displaying both solid- and liquid-like characteristics: as solids, they respond elastically to shear stress; as liquids, they display fast-ion diffusion at normal conditions. In addition to such scientific interest, superionics are technologically relevant for energy, electronics, and sensing applications. Characterizing and understanding their elastic properties is, for example, urgently needed to address their feasibility as solid-state electrolytes in all-solid-state batteries. However, static approaches to elasticity assume well-defined reference positions around which atoms vibrate, in contrast with the quasi-liquid motion of the mobile ions in fast ionic conductors. Here, we derive the elastic tensors of superionics from ensemble fluctuations in the isobaric-isothermal ensemble, exploiting extensive Car-Parrinello simulations. We apply this approach to paradigmatic Li-ion conductors, and complement it with a block analysis to compute statistical errors. Static approaches sampled over the trajectories often overestimate the response, highlighting the importance of a dynamical treatment in determining elastic tensors in superionics.
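A block analysis of the kind mentioned above estimates the statistical error of a time average from correlated samples by splitting the trajectory into blocks and looking at the scatter of the block means. The following is a minimal sketch of that generic technique; the function name, the choice of block count, and the toy data are assumptions, and nothing here is specific to elastic tensors.

```python
import math

def block_average(samples, n_blocks):
    """Return (mean, standard error) from the scatter of n_blocks block means.

    Trailing samples that do not fill a complete block are ignored.
    """
    block_len = len(samples) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least 2 blocks with at least 1 sample each")
    means = [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    grand_mean = sum(means) / n_blocks
    # Unbiased variance of the block means, then standard error of their mean.
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_blocks - 1)
    return grand_mean, math.sqrt(variance / n_blocks)

# Toy "trajectory": a slow drift between two plateaus, so samples are correlated.
trajectory = [0.0] * 10 + [2.0] * 10
mean, error = block_average(trajectory, n_blocks=2)
```

In practice the block length is increased until the estimated error stops growing, which indicates that the blocks have become longer than the correlation time.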
An interpretable deep learning approach for designing nanoporous silicon nitride membranes with tunable mechanical properties
The high permeability and strong selectivity of nanoporous silicon nitride (NPN) membranes make them attractive in a broad range of applications. Despite their growing use, the strength of NPN membranes needs to be improved to further extend their biomedical applications. In this work, we implement a deep learning framework to design NPN membranes with improved or prescribed strength values. We examine the predictions of our framework using physics-based simulations. Our results confirm that the proposed framework can not only predict the strength of NPN membranes with a wide range of microstructures, but also design NPN membranes with prescribed or improved strength. Our simulations further demonstrate that the microstructural heterogeneity that our framework suggests for the optimized design lowers the stress concentration around the pores and improves the strength of NPN membranes compared to conventional membranes with homogeneous microstructures.
Topological representations of crystalline compounds for the machine-learning prediction of materials properties
Accurate theoretical predictions of desired properties of materials play an important role in materials research and development. Machine learning (ML) can accelerate materials design by building a model from input data. For complex datasets, such as those of crystalline compounds, a vital issue is how to construct low-dimensional representations for input crystal structures with chemical insights. In this work, we introduce an algebraic topology-based method, called atom-specific persistent homology (ASPH), as a unique representation of crystal structures. The ASPH can capture both pairwise and many-body interactions and reveal the topology-property relationship of a group of atoms at various scales. Combined with composition-based attributes, the ASPH-based ML model provides a highly accurate prediction of the formation energy calculated by density functional theory (DFT). After training with more than 30,000 different structure types and compositions, our model achieves a mean absolute error of 61 meV/atom in cross-validation, which outperforms previous representations such as Voronoi tessellations and the Coulomb matrix method using the same ML algorithm and datasets. Our results indicate that the proposed topology-based method provides a powerful computational tool for predicting materials properties that improves on previous approaches.
Enhanced spin Hall ratio in two-dimensional semiconductors
The conversion efficiency from charge current to spin current via the spin Hall effect is evaluated by the spin Hall ratio (SHR). Through state-of-the-art ab initio calculations involving both charge conductivity and spin Hall conductivity, we report the SHRs of the III-V monolayer family, revealing an ultrahigh ratio of 0.58 in the hole-doped GaAs monolayer. In order to find more promising 2D materials, a descriptor for high SHR is proposed and applied to a high-throughput database, which provides the fully relativistic band structures and Wannier Hamiltonians of 216 exfoliable monolayer semiconductors and has been released to the community. Among potential candidates for high SHR, the MXene monolayer Sc₂CCl₂ is identified with the proposed descriptor and confirmed by computation, demonstrating the descriptor's validity for the discovery of high-SHR materials.