Space-Time Distribution of Trichloroethylene Groundwater Concentrations: Geostatistical Modeling and Visualization
This paper describes a geostatistical approach to model and visualize the space-time distribution of groundwater contaminants. It is illustrated using data from one of the world's largest plumes of trichloroethylene (TCE) contamination, extending over 23 km, which has polluted drinking water wells in northern Michigan. A total of 613 TCE concentrations were recorded at 36 wells between May 2003 and October 2018. To account for the non-stationarity of the spatial covariance, the data were first projected into a new space using multidimensional scaling. During this spatial deformation the domain is stretched in regions of relatively lower spatial correlation (i.e., higher spatial dispersion), while being contracted in regions of higher spatial correlation. The range of temporal autocorrelation is 43 months, while the spatial range is 11 km. The sample semivariogram was fitted using three different types of non-separable space-time models, and their prediction performance was compared using cross-validation. The sum-metric and product-sum semivariogram models performed equally well, with a mean absolute error of prediction corresponding to 23% of the mean TCE concentration. The observations were then interpolated every 6 months onto the nodes of a grid with 150 m spacing covering the study area, and the results were visualized using a three-dimensional space-time cube. This display highlights how TCE concentrations increased over time in the northern part of the study area, as the plume flows toward the so-called Chain of Lakes.
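As an illustration of the two best-performing models, the sketch below writes the product-sum and sum-metric space-time semivariograms as plain functions. The spherical components and all parameter values are assumptions made here for illustration, loosely guided by the reported ranges (11 km in space, 43 months in time); this is not the authors' fitted model.

```python
# Minimal sketch of the two best-performing space-time semivariogram models.
# All sills, ranges, k and kappa are illustrative assumptions.
import numpy as np

def sph(h, rng, sill):
    """Spherical semivariogram model."""
    r = np.minimum(np.asarray(h, float) / rng, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def product_sum(h, u, k=0.5):
    """Product-sum model: g_s(h) + g_t(u) - k * g_s(h) * g_t(u)."""
    gs, gt = sph(h, rng=11_000.0, sill=1.0), sph(u, rng=43.0, sill=1.0)
    return gs + gt - k * gs * gt

def sum_metric(h, u, kappa=250.0):
    """Sum-metric model: g_s(h) + g_t(u) + g_joint(sqrt(h^2 + (kappa*u)^2))."""
    return (sph(h, 11_000.0, 0.4) + sph(u, 43.0, 0.4)
            + sph(np.hypot(h, kappa * u), 11_000.0, 0.2))

print(product_sum(5_000.0, 12.0), sum_metric(5_000.0, 12.0))
```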
From Natural Resources Evaluation to Spatial Epidemiology: 25 Years in the Making
When, in the winter of 1994, under the supervision of my post-doc adviser André Journel, I started writing "Geostatistics for Natural Resources Evaluation" in the bedroom of a tiny Palo Alto apartment, little did I know that 25 years later I would be conducting NIH-funded research on medical geostatistics from a lakefront office nestled in the Irish Hills of Michigan. The professional and personal path that led me to trade the mapping of heavy metal concentrations in the topsoil of the Swiss Jura for the geostatistical analysis of cancer data was anything but planned, yet André's help and guidance were instrumental early on. Looking back, shifting scientific interest from the characterization of contaminated sites to human health made sense, as the field of epidemiology is increasingly concerned with the concept of the exposome, which comprises all environmental exposures (e.g., air, soil, drinking water) that a person experiences from conception throughout the life course. Although both environmental and epidemiological data exhibit space-time variability, the latter have specific characteristics that required the adaptation of traditional geostatistical tools, such as the semivariogram and kriging. Challenges include: (i) the heteroscedasticity of disease rate data (i.e., larger uncertainty of disease rates computed from small populations), (ii) their uneven spatial support (e.g., rates recorded for administrative units of different size and shape), and (iii) the limitations of Euclidean metrics to embody proximity when dealing with data that pertain to human mobility. Most of these challenges were addressed by borrowing concepts developed in adjacent fields, stressing the value of interdisciplinary research and intellectual curiosity, something I learned as a fresh PhD in agronomical sciences joining André's research group at the Stanford Center for Reservoir Forecasting in the early nineties.
Development and Evaluation of Geostatistical Methods for Non-Euclidean-Based Spatial Covariance Matrices
Customary and routine practice of geostatistical modeling assumes that inter-point distances are Euclidean (i.e., as the crow flies) when characterizing spatial variation. There are many real-world settings, however, in which the use of a non-Euclidean distance is more appropriate, for example in complex bodies of water. Yet if such a distance is used with current semivariogram functions, the resulting spatial covariance matrices are no longer guaranteed to be positive-definite. Previous attempts to address this issue for geostatistical prediction (i.e., kriging) transform the non-Euclidean space into a Euclidean one, for example through multi-dimensional scaling (MDS). However, these attempts estimate spatial covariances only after the distances are scaled. An alternative method is proposed that re-estimates a spatial covariance structure originally based on a non-Euclidean distance metric so as to ensure validity. This method is compared to the standard use of Euclidean distance, as well as to a previously utilized MDS method. All methods are evaluated using cross-validation assessments on both simulated and real-world experiments. Results show a high, previously unreported level of bias in the prediction variance of the earlier MDS method. Conversely, the proposed method offers a preferable tradeoff between prediction accuracy and prediction variance, and at times outperforms the existing methods on both sets of metrics. Overall, the results indicate that the proposed method can provide improved geostatistical predictions while ensuring valid results when the use of non-Euclidean distances is warranted.
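The positive-definiteness problem and the MDS workaround can be sketched in a few lines. The distance matrix below is synthetic (it is not even guaranteed to satisfy the triangle inequality), and the exponential covariance model is an arbitrary choice; the point is only that a covariance built on raw non-Euclidean distances may be indefinite, while one built on MDS-embedded coordinates is valid.

```python
# Sketch of the issue and of the classical-MDS workaround; all values synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 30
D = rng.uniform(1, 10, (n, n)); D = (D + D.T) / 2; np.fill_diagonal(D, 0)

cov = np.exp(-D / 5.0)                    # exponential covariance on raw "distances"
print("min eigenvalue (raw):", np.linalg.eigvalsh(cov).min())  # may be negative

# Classical MDS: embed D in Euclidean space, then rebuild the covariance there.
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D**2) @ J                 # double-centred squared distances
w, V = np.linalg.eigh(B)
X = V[:, w > 0] * np.sqrt(w[w > 0])       # coordinates in Euclidean space
D_mds = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
cov_mds = np.exp(-D_mds / 5.0)
print("min eigenvalue (MDS):", np.linalg.eigvalsh(cov_mds).min())  # non-negative
```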
A multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration
This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty-two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R = 0.798) on four metals (Fe, As, Sn and W) known to be mostly associated with the paragenesis of gold in the region. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of the regression estimates, to supplement the hard indicator coding of the gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot spots and cold spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization.
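The hard/soft indicator coding step might look like the following sketch, in which the thresholds and the Gaussian model for the regression error are assumptions made here for illustration rather than values from the paper.

```python
# Sketch of hard indicator coding of measurements and soft coding of
# regression estimates; thresholds and error model are assumed.
import numpy as np
from scipy.stats import norm

thresholds = np.array([0.5, 1.0, 2.0])        # assumed Au cutoffs (ppm)

def hard_indicators(z):
    """Hard coding of a measured gold value: I(z <= z_k) in {0, 1}."""
    return (z <= thresholds).astype(float)

def soft_indicators(z_hat, sigma):
    """Soft coding of a regression estimate: Prob(Z <= z_k), assuming a
    Gaussian model for the regression error."""
    return norm.cdf((thresholds - z_hat) / sigma)

print(hard_indicators(0.8))          # e.g. [0., 1., 1.]
print(soft_indicators(0.8, 0.4))     # probabilities in (0, 1)
```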
Time-Lapse Analysis of Methane Quantity in the Mary Lee Group of Coal Seams Using Filter-Based Multiple-Point Geostatistical Simulation
The success of coal seam degasification is important for controlling methane, and thus for the health and safety of coal miners. During the course of degasification, the properties of coal seams change. Changes in coal reservoir conditions and in-place gas content, as well as the potential for methane emission into mines, should therefore be evaluated by examining time-dependent changes together with the presence of major heterogeneities and geological discontinuities in the field. In this work, time-lapse reservoir and fluid storage properties of the New Castle coal seam, Mary Lee/Blue Creek seam, and Jagger seam of the Black Warrior Basin, Alabama, were determined from gas and water production history matching and production forecasting of vertical degasification wellbores. These properties were combined with isotherm and other data to compute gas-in-place (GIP) and its change with time at borehole locations. Time-lapse training images (TIs) of GIP and of GIP difference, corresponding to each coal and date, were generated from these point-wise data using Voronoi decomposition on the TI grid, which included faults as discontinuities during the expansion of Voronoi regions. Filter-based multiple-point geostatistical simulations, preferred in this study because of the anisotropies and discontinuities in the area, were used to predict time-lapse GIP distributions within the study area, as well as to map the associated uncertainties. The systematic approach presented in this paper is, to our knowledge, the first in the literature in which history matching, TIs of GIP, and filter simulations are combined for degasification performance evaluation and for assessing GIP for mining safety. Results showed that production history matching of coalbed methane wells yields time-lapse reservoir data from which spatial GIP and representative GIP TIs can be computed through Voronoi decomposition, and that filter simulations using the point-wise data and TIs can predict methane quantity in coal seams subjected to degasification. The material balance of gas produced by wellbores and the GIP reductions predicted using filter simulations compared very well, demonstrating the success of filter simulations for continuous variables in this case study. Quantitatively, the simulations showed that GIP was reduced from an initial ~73 Bcf (median) to ~46 Bcf in 2011, a 37% decrease that varies spatially through degasification, and an additional reduction of ~2 Bcf in methane quantity is forecast between 2011 and 2015. These results show that the applied methodology and techniques can be used to map GIP and its change within coal seams after degasification, which can further support ventilation design for methane control in coal mines.
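The Voronoi decomposition used to turn point-wise GIP values into a training image can be sketched as a nearest-borehole assignment on the TI grid; the fault-barrier constraint on region growth is omitted here, and all coordinates and values are synthetic.

```python
# Sketch of a Voronoi-decomposed training image from point-wise GIP values.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
wells = rng.uniform(0, 100, (15, 2))          # borehole locations (synthetic)
gip = rng.uniform(0.5, 5.0, 15)               # GIP computed at each borehole

# TI grid: each node takes the GIP of its nearest borehole (its Voronoi region).
xx, yy = np.meshgrid(np.arange(0, 100, 1.0), np.arange(0, 100, 1.0))
nodes = np.column_stack([xx.ravel(), yy.ravel()])
_, nearest = cKDTree(wells).query(nodes)
ti = gip[nearest].reshape(xx.shape)           # Voronoi-decomposed TI
print(ti.shape, ti.min(), ti.max())
```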
Combining Areal and Point Data in Geostatistical Interpolation: Applications to Soil Science and Medical Geography
A common issue in spatial interpolation is the combination of data measured over different spatial supports. For example, information available for mapping disease risk typically includes point data (e.g. patients' and controls' residences) and aggregated data (e.g. socio-demographic and economic attributes recorded at the census tract level). Similarly, soil measurements at discrete locations in the field are often supplemented with choropleth maps (e.g. soil or geological maps) that model the spatial distribution of soil attributes as the juxtaposition of polygons (areas) with constant values. This paper presents a general formulation of kriging that allows the combination of both point and areal data through the use of area-to-area, area-to-point, and point-to-point covariances in the kriging system. The procedure is illustrated using two data sets: (1) a geological map and heavy metal concentrations recorded in the topsoil of the Swiss Jura, and (2) incidence rates of late-stage breast cancer diagnosis per census tract and the locations of patient residences for three counties in Michigan. In the second case, the kriging system includes an error variance term derived from the binomial distribution to account for the varying degree of reliability of incidence rates, which depends on the total number of cases recorded in each tract. Except under the binomial kriging framework, area-and-point (AAP) kriging ensures the coherence of the prediction, so that the average of interpolated values within each mapping unit equals the original areal datum. The relationships between binomial kriging, Poisson kriging, and indicator kriging are discussed under different scenarios for the population size and spatial support. Sensitivity analysis demonstrates the smaller smoothing and greater prediction accuracy of the new procedure over ordinary kriging and traditional residual kriging based on the assumption that the local mean is constant within each mapping unit.
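The area-to-area and area-to-point covariances entering such a kriging system are commonly obtained by discretizing each areal support and averaging point-support covariances, as in the sketch below; the exponential covariance model and the discretization are illustrative assumptions.

```python
# Sketch of covariance regularization over areal supports by discretization.
import numpy as np

def point_cov(p, q, sill=1.0, rng=10.0):
    """Point-to-point exponential covariance (illustrative model)."""
    return sill * np.exp(-np.linalg.norm(np.asarray(p) - np.asarray(q)) / rng)

def area_to_point_cov(area_pts, p):
    """Average point covariance between a discretized area and a point."""
    return np.mean([point_cov(q, p) for q in area_pts])

def area_to_area_cov(a_pts, b_pts):
    """Average point covariance between two discretized areas."""
    return np.mean([[point_cov(q, r) for r in b_pts] for q in a_pts])

square = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]  # 4x4 cells
print(area_to_point_cov(square, (10.0, 2.0)))
print(area_to_area_cov(square, [(x + 20, y) for x, y in square]))
```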
High-Order Sequential Simulation via Statistical Learning in Reproducing Kernel Hilbert Space
The present work proposes a new high-order simulation framework based on statistical learning. The training data consist of the sample data together with a training image, and the learning target is the underlying random field model of spatial attributes of interest. The learning process attempts to find a model with expected high-order spatial statistics that coincide with those observed in the available data, while the learning problem is approached within the statistical learning framework in a reproducing kernel Hilbert space (RKHS). More specifically, the required RKHS is constructed via a spatial Legendre moment (SLM) reproducing kernel that systematically incorporates the high-order spatial statistics. The target distributions of the random field are mapped into the SLM-RKHS to start the learning process, where solutions of the random field model amount to solving a quadratic programming problem. Case studies with a known data set in different initial settings show that sequential simulation under the new framework reproduces the high-order spatial statistics of the available data and resolves the potential conflicts between the training image and the sample data. This is due to the characteristics of the spatial Legendre moment kernel and the generalization capability of the proposed statistical learning framework. A three-dimensional case study at a gold deposit shows practical aspects of the proposed method in real-life applications.
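One plausible reading of the SLM kernel is the reproducing kernel of Legendre polynomials up to a given order, taken as a product over the nodes of a data event, as sketched below; this construction is an assumption made here for illustration, not the authors' exact implementation.

```python
# Sketch of a spatial Legendre moment (SLM) kernel, read as the reproducing
# kernel of Legendre polynomials up to order W on [-1, 1], multiplied over
# the nodes of a data event. The construction is an assumption.
import numpy as np
from numpy.polynomial import legendre

def legendre_kernel_1d(x, y, order=10):
    """k(x, y) = sum_w (2w+1)/2 * P_w(x) * P_w(y), for x, y in [-1, 1]."""
    total = 0.0
    for w in range(order + 1):
        c = np.zeros(w + 1); c[w] = 1.0       # coefficients selecting P_w
        total += (2 * w + 1) / 2 * legendre.legval(x, c) * legendre.legval(y, c)
    return total

def slm_kernel(event_a, event_b, order=10):
    """Product kernel over the attribute values of two matched data events."""
    return np.prod([legendre_kernel_1d(a, b, order)
                    for a, b in zip(event_a, event_b)])

print(slm_kernel([0.2, -0.5, 0.9], [0.1, -0.4, 0.8]))
```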
Error Propagation in Isometric Log-ratio Coordinates for Compositional Data: Theoretical and Practical Considerations
Compositional data, as they typically appear in geochemistry as concentrations of chemical elements in soil samples, need to be expressed in log-ratio coordinates before traditional statistical tools are applied if the relative structure of the data is of primary interest. There are different possibilities for this purpose, such as centered log-ratio coefficients or isometric log-ratio coordinates. Both approaches involve geometric means of the compositional parts, and it is unclear how measurement errors or detection-limit problems affect the representation in coordinates. This problem is investigated theoretically by making use of the theory of error propagation. Due to certain limitations of this approach, the effect of error propagation is also studied by means of simulations. This makes it possible to provide recommendations for practitioners on the amount of error and on the expected distortion of the results, depending on the purpose of the analysis.
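The two representations, and a brute-force simulation check of error propagation, can be sketched as follows; the composition, the multiplicative error model and its magnitude are synthetic choices made here for illustration.

```python
# Sketch of clr coefficients, pivot (ilr) coordinates, and a simulation-based
# look at how multiplicative measurement error propagates into coordinates.
import numpy as np

def clr(x):
    """Centred log-ratio coefficients: log(x_i / geometric mean of x)."""
    lx = np.log(x)
    return lx - lx.mean()

def ilr_pivot(x):
    """Pivot (isometric log-ratio) coordinates of a D-part composition."""
    D, lx = len(x), np.log(x)
    return np.array([np.sqrt((D - i - 1) / (D - i)) *
                     (lx[i] - lx[i + 1:].mean()) for i in range(D - 1)])

rng = np.random.default_rng(2)
x = np.array([0.6, 0.3, 0.08, 0.02])                   # "true" composition
noisy = x * np.exp(rng.normal(0, 0.05, (10_000, 4)))   # multiplicative error
noisy /= noisy.sum(axis=1, keepdims=True)              # re-close compositions
print("ilr spread per coordinate:",
      np.std([ilr_pivot(row) for row in noisy], axis=0))
```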
Compression-based Facies Modelling
Simple object- or pixel-based facies models use facies proportions as the constraining input parameter to be honored in the output model. The resultant interconnectivity of the facies bodies is an unconstrained output property of the modelling, and if the objects being modelled are geometrically representative in three dimensions, commonly available methods will produce well-connected facies when the model net:gross ratio exceeds about 30%. Geological processes have more degrees of freedom, and facies in high net:gross natural systems often have much lower connectivity than can be achieved by object-based methods or common implementations of pixel-based forward modelling. The compression method decouples facies proportion from facies connectivity in the modelling process and allows systems to be generated in which both are defined independently at input. The two-step method first generates a model with the correct connectivity but incorrect facies proportions using a conventional method, and then applies a geometrical transform to scale the model to the correct facies proportions while retaining the connectivity of the original model. The method and its underlying parameters are described and illustrated using examples representative of low- and high-connectivity geological systems.
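A one-dimensional caricature of the two-step idea is sketched below: the foreground runs from the step-one model are kept intact (preserving their connectivity), and only the background runs are rescaled to reach the target proportion. This is a deliberate simplification of the geometrical transform, for illustration only.

```python
# 1D sketch of the compression idea: keep foreground (1) runs, rescale
# background (0) runs so the final net:gross matches the target.
import numpy as np
from itertools import groupby

def rescale_background(column, target_ntg):
    """Stretch/shrink background runs to hit the target net:gross."""
    runs = [(k, sum(1 for _ in g)) for k, g in groupby(column)]
    sand = sum(n for k, n in runs if k == 1)
    bg_target = sand * (1 - target_ntg) / target_ntg   # background cells needed
    bg_now = sum(n for k, n in runs if k == 0)
    out = []
    for k, n in runs:
        n_new = n if k == 1 else max(1, round(n * bg_target / bg_now))
        out += [k] * n_new
    return np.array(out)

col = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 1])     # step-one model, NTG = 0.7
print(rescale_background(col, target_ntg=0.4))     # same sand runs, lower NTG
```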
Simultaneous Stochastic Optimization of Mining Complexes and Mineral Value Chains
Recent developments in modelling and optimization approaches for the production of mineral and energy resources have resulted in new simultaneous stochastic optimization frameworks and related digital technologies. A mining complex is a type of value chain whereby raw materials (minerals) extracted from various mineral deposits are transformed into a set of sellable products, using the available processing streams. The supply of materials extracted from a group of mines represents a major source of uncertainty in mining operations and mineral value chains. The simultaneous stochastic optimization of mining complexes, presented herein, aims to address major limitations of past approaches by modelling and optimizing several interrelated aspects of the mineral value chain in a single model. This single optimization model integrates material extraction from a set of sources along with their uncertainty, the related risk management, blending, stockpiling, non-linear transformations that occur in the available processing streams, the utilization of processing streams, and, finally, the transportation of products to customers. Uncertainty in materials extracted from the related mineral deposits of a mining complex is represented by a group of stochastic simulations. This paper presents a two-stage stochastic mixed integer nonlinear programming formulation for modelling and optimizing a mining complex, along with a metaheuristic-based solver that facilitates the practical optimization of exceptionally large mathematical formulations. The distinct advantages of the approach presented herein are demonstrated through two case studies, where the stochastic framework is compared to past approaches that ignore uncertainty. Results demonstrate major improvements both in meeting forecast production targets and in net present value. Concepts and methods presented in this paper for the simultaneous stochastic optimization of mining complexes may be adopted and applied to the optimization of smart oil fields.
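Schematically, the two-stage objective has the following form, where x denotes first-stage extraction decisions and y_s the recourse decisions under simulated orebody scenario s; this display is a generic sketch of a two-stage stochastic program, not the paper's full mixed integer nonlinear formulation.

```latex
% Generic two-stage stochastic objective (a sketch): d is the discount rate,
% S the number of orebody scenarios, and dev_{j,t,s} the deviation from
% production target j in period t under scenario s, penalized at cost c_j.
\max_{x,\; y_1,\dots,y_S}\;
\sum_{t=1}^{T} \frac{1}{(1+d)^{t}} \,\frac{1}{S} \sum_{s=1}^{S}
\Bigl( \mathrm{Revenue}_{t}(x, y_{s})
     - \mathrm{Cost}_{t}(x, y_{s})
     - \sum_{j} c_{j}\, \mathrm{dev}_{j,t,s} \Bigr)
```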
Vertical Stacking Statistics of Multi-facies Object-Based Models
Equations describing facies proportions and amalgamation ratios are derived for randomly placed objects belonging to two or three foreground facies embedded in a background facies, as a function of the volume fractions and object thicknesses of independent facies models combined in a stratigraphically meaningful order. The equations are validated using one-dimensional continuum models. Evaluation of the equations reveals a simple relationship between an effective facies proportion and an effective amalgamation ratio, both measured as a function only of the facies in question and the background facies. This relationship provides a firm analytical basis for applying the compression algorithm to multi-facies object-based models. A set of two-dimensional cross-sectional models illustrates the approach, which allows models to be generated with realistic object stacking characteristics defined independently for each facies in a multi-facies object-based model.
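The one-dimensional continuum experiment used for validation can be sketched as follows: objects of a single foreground facies are dropped at random along a line, the facies proportion is read from the union coverage, and the amalgamation ratio is counted as the fraction of object bases landing directly on earlier objects. The definitions and all parameters here are assumptions made for illustration.

```python
# 1D sketch of measuring facies proportion and an amalgamation ratio from
# randomly placed objects of one foreground facies; parameters illustrative.
import numpy as np

rng = np.random.default_rng(3)
L, n_obj, thick = 1_000.0, 80, 6.0
z = np.zeros(int(L))                       # unit cells, 0 = background

amalgamated = 0
for _ in range(n_obj):
    base = rng.integers(0, int(L - thick))
    # base rests on an earlier object if the cell just below is foreground
    amalgamated += z[max(base - 1, 0)] == 1
    z[base:base + int(thick)] = 1

print("facies proportion:", z.mean())
print("amalgamation ratio:", amalgamated / n_obj)
```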
A Study of the Influence of Measurement Volume, Blending Ratios and Sensor Precision on Real-Time Reconciliation of Grade Control Models
The mining industry continuously struggles to keep produced tonnages and grades aligned with targets derived from model-based expectations. Deviations often result from the inability to characterise short-term production units accurately based on sparsely distributed exploration data. During operation, the characterisation of short-term production units can be significantly improved when deviations are monitored and integrated back into the underlying grade control model. A previous contribution introduced a novel simulation-based geostatistical approach to repeatedly update the grade control model based on online data from a production monitoring system. The added value of the presented algorithm results from its ability to handle inaccurate observations made on blended material streams originating from two or more extraction points. This contribution extends that work by studying the relation between system control parameters and algorithm performance. A total of 125 experiments are conducted to quantify the effects of variations in measurement volume, blending ratio and sensor precision. Based on the outcome of the experiments, recommendations are formulated for the optimal operation of the monitoring system, guaranteeing the best possible algorithm performance.
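A generic stand-in for this kind of updating is sketched below: an ensemble of grade realizations at two extraction points is conditioned on one imprecise observation of their blend via a Kalman-type update. This is not the paper's algorithm, and all numbers (blending ratio, sensor precision, covariances) are synthetic.

```python
# Ensemble-Kalman-style sketch of updating grade realizations from one
# noisy observation of a blended stream; everything here is synthetic.
import numpy as np

rng = np.random.default_rng(4)
ens = rng.multivariate_normal([1.2, 0.8], [[0.04, 0.01], [0.01, 0.09]], 500).T

blend = np.array([0.6, 0.4])               # blending ratio of the two streams
sigma_obs = 0.05                           # sensor precision (std dev)
y_obs = 1.02                               # observed blended grade

hx = blend @ ens                           # blended grade of each realization
K = np.cov(ens, hx)[:2, 2] / (hx.var(ddof=1) + sigma_obs**2)  # Kalman gain
innov = y_obs + rng.normal(0, sigma_obs, hx.size) - hx        # perturbed obs
ens_updated = ens + K[:, None] * innov
print(ens.mean(axis=1), "->", ens_updated.mean(axis=1))
```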
High-Order Spatial Simulation Using Legendre-Like Orthogonal Splines
High-order sequential simulation techniques for complex non-Gaussian spatially distributed variables have been developed over the last few years. The high-order simulation approach does not require any transformation of the initial data and makes no assumptions about any probability distribution function, while it introduces complex spatial relations to the simulated realizations via high-order spatial statistics. This paper presents a new extension in which a conditional probability density function (cpdf) is approximated using Legendre-like orthogonal splines. The coefficients of the spline approximation are estimated using high-order spatial statistics inferred from the available sample data, additionally complemented by a training image. The advantages of using orthogonal splines over the previously used Legendre polynomials include their ability to better approximate a multidimensional probability density function, to reproduce the high-order spatial statistics, and to provide a generalization of high-order simulations using Legendre polynomials. The performance of the new method is first tested on a completely known image and compared to both the high-order simulation approach using Legendre polynomials and the conventional sequential Gaussian simulation method. Then, an application to a gold deposit demonstrates the advantages of the proposed method in terms of the reproduction of histograms, variograms, and high-order spatial statistics, including connectivity measures. The C++ source code of the high-order simulation implementation presented herein, along with an example demonstrating its utilization, is provided online as supplementary material.
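The density-approximation idea underlying this family of methods can be sketched with plain Legendre polynomials (rather than the Legendre-like splines of the paper): on [-1, 1], the series coefficients are sample averages of the polynomials, as below; the sample distribution is synthetic.

```python
# Sketch: approximate a pdf on [-1, 1] by a Legendre series whose
# coefficients c_w = (2w+1)/2 * E[P_w(X)] are estimated from samples.
import numpy as np
from numpy.polynomial import legendre

def legendre_pdf(samples, order=8):
    coeffs = np.zeros(order + 1)
    for w in range(order + 1):
        basis = np.zeros(w + 1); basis[w] = 1.0       # select P_w
        coeffs[w] = (2 * w + 1) / 2 * legendre.legval(samples, basis).mean()
    return lambda x: legendre.legval(x, coeffs)

rng = np.random.default_rng(5)
samples = np.clip(rng.normal(0.3, 0.2, 5_000), -1, 1)
f = legendre_pdf(samples)
print(f(np.array([-0.5, 0.0, 0.3, 0.8])))   # approximate density values
```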
Optimizing Infill Drilling Decisions Using Multi-Armed Bandits: Application in a Long-Term, Multi-Element Stockpile
Mining operations face a decision regarding additional drilling several times during their lifetime. The two questions that always arise upon making this decision are whether more drilling is required and, if so, where the additional drill holes should be located. The method presented in this paper addresses both questions through optimization in a multi-armed bandit (MAB) framework. The MAB identifies the best infill drilling pattern while taking geological uncertainty into account by using multiple conditional simulations of the deposit under consideration. The proposed method is applied to a long-term, multi-element stockpile that is part of a gold mining complex. The stockpiles in this mining complex are of particular interest due to difficult-to-meet blending requirements. In several mining periods, grade targets for deleterious elements at the processing plant can only be met by using large amounts of stockpiled material. The best pattern is defined as the one causing the most material-type changes for the blocks in the stockpile. Material-type changes are the driver of changes in the extraction sequence, which ultimately defines the value of a mining operation. The results of the proposed method demonstrate its practical aspects and its effectiveness for the optimization of infill drilling schemes.
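The bandit loop might be sketched as follows, with UCB1 standing in for whichever selection rule the method uses; the arms represent candidate drilling patterns, and the synthetic binary reward stands for whether a pattern induces material-type changes across the simulations.

```python
# Sketch of a multi-armed bandit loop with the UCB1 rule; rewards synthetic.
import numpy as np

rng = np.random.default_rng(6)
true_reward = np.array([0.30, 0.45, 0.55, 0.40])     # hidden arm means
n_arms = len(true_reward)
counts, sums = np.zeros(n_arms), np.zeros(n_arms)

for t in range(1, 2_001):
    if t <= n_arms:                                  # play each arm once
        arm = t - 1
    else:                                            # UCB1 index
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.binomial(1, true_reward[arm])       # stochastic payoff
    counts[arm] += 1; sums[arm] += reward

print("estimated best pattern:", int(np.argmax(sums / counts)))
```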
A New Computational Model of High-Order Stochastic Simulation Based on Spatial Legendre Moments
Multiple-point simulations have been introduced over the past decade to overcome the limitations of second-order stochastic simulations in dealing with geologic complexity, curvilinear patterns, and non-Gaussianity. A limitation, however, is that they sometimes fail to generate results that comply with the statistics of the available data while maintaining the consistency of high-order spatial statistics. As an alternative, high-order stochastic simulations based on spatial cumulants or spatial moments have been proposed; these, however, are computationally demanding, which limits their applicability. The present work derives a new computational model to numerically approximate the conditional probability density function (cpdf) as a multivariate Legendre polynomial series based on the concept of spatial Legendre moments. The advantage of this method is that no explicit computations of moments (or cumulants) are needed in the model. The approximation of the cpdf is simplified to the computation of a unified empirical function. Moreover, the new computational model computes the cpdfs within a local neighborhood without storing the high-order spatial statistics through a predefined template. With this computational model, the algorithm for the estimation of the cpdf is developed in such a way that the conditional cumulative distribution function (ccdf) can be computed conveniently through another recursive algorithm. In addition to a significant reduction of computational cost, the new algorithm maintains higher numerical precision than the original version of high-order simulation. A new method is also proposed to deal with the replicates in the simulation algorithm, reducing the impact of conflicting statistics between the sample data and the training image (TI). A brief description of the implementation is provided and, for comparison and verification, a set of case studies is conducted and compared with the results of the well-established multiple-point simulation algorithm FILTERSIM. This comparison demonstrates that the proposed high-order simulation algorithm can generate spatially complex geological patterns while also reproducing the high-order spatial statistics of the sample data.
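The recursive flavor of the ccdf computation can be illustrated as follows: Bonnet's recursion generates the Legendre polynomials on the fly, and the identity ∫_{-1}^{x} P_n(t) dt = (P_{n+1}(x) − P_{n−1}(x))/(2n+1), valid for n ≥ 1, integrates the series term by term. The coefficients below are placeholders, and the sketch is not the paper's algorithm.

```python
# Sketch: recursive evaluation of F(x) = int_{-1}^{x} sum_n c_n P_n(t) dt.
import numpy as np

def ccdf_from_legendre(coeffs, x):
    x = np.asarray(x, float)
    p_prev, p_curr = np.ones_like(x), x.copy()       # P_0(x), P_1(x)
    total = coeffs[0] * (x + 1.0)                    # integral of P_0 is x + 1
    for n in range(1, len(coeffs)):
        # Bonnet recursion: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        p_next = ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
        total += coeffs[n] * (p_next - p_prev) / (2 * n + 1)
        p_prev, p_curr = p_curr, p_next
    return total

print(ccdf_from_legendre(np.array([0.5, 0.1, 0.05]),
                         np.array([-1.0, 0.0, 1.0])))  # F(-1)=0, F(1)=1
```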
A New Non-stationary High-order Spatial Sequential Simulation Method
A new non-stationary, high-order sequential simulation method is presented herein, aiming to accommodate complex curvilinear patterns when modelling non-Gaussian, spatially varying attributes of natural phenomena. The proposed approach employs spatial templates, training images and a set of sample data. At each step of a multi-grid approach, a template consisting of several data points and a simulation node located at its center is selected. To account for the non-stationarity exhibited by the samples, the data events defined by the conditioning data are used to calibrate the importance of the related replicates. Sliding the template over the training image generates a set of training patterns, and for each pattern a weight is calculated. The weight of each training pattern is determined by a similarity measure, defined herein, between the data event of the training pattern and that of the simulation pattern. This results in a non-stationary spatial distribution of weights over the training patterns. The proposed similarity measure is constructed from the high-order statistics of data events in the available data set, compared with those of their corresponding training patterns. In addition, this high-order statistical measure allows the effective detection of similar patterns in different orientations, as the high-order statistics conform to the commutativity property. The proposed method is robust against the addition of more training images due to its non-stationary aspect; it only uses replicates from the pattern database with the most similar local high-order statistics to simulate each node. Examples demonstrate the key aspects of the method.
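The weighting step can be caricatured as follows; the Gaussian similarity used here is an assumption made for illustration, whereas the paper constructs its measure from high-order statistics of the data events.

```python
# Sketch: weight each training replicate by similarity between its data
# event and the data event at the node being simulated (Gaussian stand-in).
import numpy as np

def pattern_weights(sim_event, training_events, bandwidth=0.5):
    """Normalized similarity weights for the training replicates."""
    d2 = np.sum((training_events - sim_event) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth**2))
    return w / w.sum()

rng = np.random.default_rng(7)
training_events = rng.normal(size=(200, 5))   # 200 replicates, 5-node template
sim_event = training_events[17] + rng.normal(0, 0.1, 5)
w = pattern_weights(sim_event, training_events)
print("top replicate:", int(np.argmax(w)), "weight:", w.max())
```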
Solving Geophysical Inversion Problems with Intractable Likelihoods: Linearized Gaussian Approximations Versus the Correlated Pseudo-marginal Method
A geophysical Bayesian inversion problem may target the posterior distribution of geological or hydrogeological parameters given geophysical data. To account for the scatter in the petrophysical relationship linking the target parameters to the geophysical properties, this study treats the intermediate geophysical properties as latent (unobservable) variables. To perform inversion in such a latent variable model, the intractable likelihood function of the (hydro)geological parameters given the geophysical data needs to be estimated. This can be achieved by approximation with a Gaussian probability density function based on local linearization of the geophysical forward operator, thereby accounting for the noise in the petrophysical relationship by a corresponding addition to the data covariance matrix. The new approximate method is compared against the general correlated pseudo-marginal method, which estimates the likelihood by Monte Carlo averaging over samples of the latent variable. First, the performance of the two methods is tested on a synthetic example, in which a multivariate Gaussian porosity field is inferred using crosshole ground-penetrating radar first-arrival travel times. For this example with rather small petrophysical uncertainty, the two methods provide near-identical estimates, while an inversion that ignores petrophysical uncertainty leads to biased estimates. A sensitivity analysis then suggests that the linearized Gaussian approach, while attractive due to its relative computational speed, suffers from decreasing accuracy with increasing scatter in the petrophysical relationship. The computationally more expensive correlated pseudo-marginal method performs very well even for settings with high petrophysical uncertainty.
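A one-dimensional toy version of the two estimators is sketched below, assuming a linear forward operator: the linearized Gaussian form folds the petrophysical scatter into the data covariance, while the pseudo-marginal estimate averages the likelihood over samples of the latent variable. All numbers are synthetic.

```python
# Toy contrast of the two likelihood estimators for a scalar latent variable.
import numpy as np
from scipy.stats import norm

g = lambda v: 2.0 * v                  # linear forward operator (assumption)
sigma_petro, sigma_data = 0.1, 0.05    # petrophysical scatter, data noise
m, y = 1.0, 2.1                        # target parameter, observed datum

# Linearized Gaussian: add J * C_petro * J^T to the data covariance.
J = 2.0                                # dg/dv
lik_gauss = norm.pdf(y, g(m), np.sqrt(sigma_data**2 + (J * sigma_petro)**2))

# Pseudo-marginal: Monte Carlo average over samples of the latent variable.
rng = np.random.default_rng(8)
v = m + rng.normal(0, sigma_petro, 100_000)
lik_pm = norm.pdf(y, g(v), sigma_data).mean()

print(lik_gauss, lik_pm)               # near-identical in this linear case
```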
Training Image Free High-Order Stochastic Simulation Based on Aggregated Kernel Statistics
A training-image-free, high-order sequential simulation method is proposed herein, based on the efficient inference of high-order spatial statistics from the available sample data. A statistical learning framework in kernel space is adopted to develop the proposed simulation method. Specifically, a new concept of aggregated kernel statistics is proposed to enable sparse data learning. The conditioning data in the proposed high-order sequential simulation method appear as data events corresponding to the attribute values associated with so-called spatial templates of various geometric configurations. The replicates of the data events act as the training data in the learning framework for inference of the conditional probability distribution and generation of simulated values. These replicates are mapped into spatial Legendre moment kernel spaces, and the kernel statistics are computed thereafter, encapsulating the high-order spatial statistics of the available data. To utilize the incomplete information from replicates that only partially match the spatial template of a given data event, the aggregated kernel statistics combine the ensemble of elements in different kernel subspaces for statistical inference, embedding the high-order spatial statistics of the replicates associated with various spatial templates into the same kernel subspace. The aggregated kernel statistics are incorporated into a learning algorithm to obtain the target probability distribution in the underlying random field, while preserving in the simulations the high-order spatial statistics of the available data. The proposed method is tested using a synthetic dataset, showing the reproduction of the high-order spatial statistics of the sample data. A comparison with the corresponding high-order simulation method using training images (TIs) emphasizes the generalization capacity of the proposed method for sparse data learning.
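The aggregation idea might be sketched as follows: replicates matching only part of the template contribute through the nodes they share with the data event, so their kernel statistics can be pooled in one subspace. The per-node Legendre feature map and the simple pooling used here are assumptions made for illustration.

```python
# Sketch: pool kernel statistics of full and partial replicates over the
# nodes they share with the data event (here, nodes 0 and 1 of a 4-node
# template). Feature map and pooling are illustrative assumptions.
import numpy as np
from numpy.polynomial import legendre

def feature_map(values, order=3):
    """Stack P_0..P_order evaluated at each node value of a replicate."""
    basis = np.eye(order + 1)
    return np.concatenate([[legendre.legval(v, basis[w])
                            for w in range(order + 1)] for v in values])

rng = np.random.default_rng(9)
full = rng.uniform(-1, 1, (50, 4))        # 50 full replicates, 4-node template
partial = rng.uniform(-1, 1, (120, 2))    # 120 replicates matching nodes 0, 1

phi_full = np.array([feature_map(r[:2]) for r in full])   # restrict to shared nodes
phi_part = np.array([feature_map(r) for r in partial])
aggregated = np.vstack([phi_full, phi_part]).mean(axis=0)  # pooled kernel statistic
print(aggregated.shape)
```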
A Graph Clustering Approach to Localization for Adaptive Covariance Tuning in Data Assimilation Based on State-Observation Mapping
An original graph clustering approach for the efficient localization of error covariances is proposed within an ensemble-variational data assimilation framework. Here, the term localization is used generically, referring to the idea of breaking up a global assimilation into subproblems. This unsupervised localization technique, based on a linearized state-observation measure, is general and does not rely on any prior information such as relevant spatial scales, empirical cutoff radii or homogeneity assumptions. Localization is performed via graph theory, a branch of mathematics emerging as a powerful approach to capturing complex and highly interconnected Earth and environmental systems in computational geosciences. The novel approach automatically segregates the state and observation variables into an optimal number of clusters, and it is more amenable to scalable data assimilation. The application of this method does not require underlying block-diagonal structures of the prior covariance matrices. To address intercluster connectivity, two alternative data adaptations are proposed. Once the localization is completed, a covariance diagnosis and tuning are performed within each cluster, and each cluster's contribution is sequentially integrated into the entire covariance matrix. Numerical twin experiments show the reduced cost and added flexibility of this approach compared to global covariance tuning, and that more accurate results are obtained for both observation- and background-error parameter tuning.
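A minimal sketch of the localization step follows: a joint affinity over states and observations is assembled from a linearized observation operator H, and off-the-shelf spectral clustering splits the assimilation into subproblems. H, the affinity construction, and the fixed cluster count are synthetic simplifications; the paper determines the number of clusters automatically.

```python
# Sketch: cluster state and observation variables jointly from a linearized
# state-observation mapping H; everything here is synthetic.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(10)
n_obs, n_state = 12, 20
H = np.zeros((n_obs, n_state))
for i in range(n_obs):                        # each obs sees a few nearby states
    j = rng.integers(0, n_state - 2)
    H[i, j:j + 3] = rng.uniform(0.5, 1.0, 3)

# Joint affinity over [states, observations] from the mapping strengths.
n = n_state + n_obs
A = np.zeros((n, n))
A[n_state:, :n_state] = np.abs(H)
A[:n_state, n_state:] = np.abs(H).T
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(A + 1e-6)
print("state clusters:", labels[:n_state])
print("obs clusters:  ", labels[n_state:])
```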
Blind Source Separation for Compositional Time Series
Many geological phenomena are regularly measured over time to follow developments and changes. For many of these phenomena, the absolute values are not of interest, but rather the relative information, which means that the data are compositional time series. Thus, both the serial nature and the compositional geometry should be considered when analyzing the data. Multivariate time series are already challenging, especially if they are higher-dimensional, and latent variable models are a popular way to deal with this kind of data. Blind source separation techniques are well-established latent factor models for time series, with many variants covering quite different time series models. Here, several such methods and their assumptions are reviewed, and it is shown how they can be applied to high-dimensional compositional time series. Also, a novel blind source separation method is suggested that is quite flexible regarding assumptions on the latent time series. The methodology is illustrated using simulations and an application to light absorbance data from water samples taken from a small stream in Lower Austria.
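A minimal sketch of such a pipeline follows: compositions are expressed in (pivot) ilr coordinates, and a simple second-order blind source separation step (AMUSE-style: whitening, then an eigendecomposition of the symmetrized lag-1 autocovariance) recovers latent series. The data are simulated, and this stand-in is far simpler than the methods reviewed or proposed in the paper.

```python
# Sketch: ilr-transform a compositional time series, then apply an
# AMUSE-type second-order blind source separation; data simulated.
import numpy as np

def ilr_pivot(X):
    """Row-wise pivot (ilr) coordinates of compositions in the rows of X."""
    D, lX = X.shape[1], np.log(X)
    return np.column_stack([np.sqrt((D - i - 1) / (D - i)) *
                            (lX[:, i] - lX[:, i + 1:].mean(axis=1))
                            for i in range(D - 1)])

rng = np.random.default_rng(11)
s = np.cumsum(rng.normal(size=(500, 3)), axis=0)       # latent series
X = np.exp(s @ rng.normal(size=(3, 4)))                # mixed, positive parts
X /= X.sum(axis=1, keepdims=True)                      # close to compositions

Z = ilr_pivot(X); Z -= Z.mean(axis=0)
w, V = np.linalg.eigh(np.cov(Z.T))
Y = Z @ V / np.sqrt(w)                                  # whitened coordinates
C1 = (Y[1:].T @ Y[:-1]) / (len(Y) - 1)                  # lag-1 autocovariance
_, U = np.linalg.eigh((C1 + C1.T) / 2)                  # AMUSE rotation
sources = Y @ U
print(sources.shape)                                    # recovered latent series
```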