This case study is meant to demonstrate how to use gstlearn for non-linear geostatistics with the Gaussian model.
We consider three supports:
The support of the samples, considered as points and denoted \(x\),
The support of the selection, the block, denoted \(v\),
The support of the reporting, the panel, denoted \(V\).
The variable of interest \(z\) is defined over the domain \(D \subset \mathbb{R}^d\) with \(d = 2\). The domain is uniformly divided into panels, and each panel is regularly subdivided into blocks, hence defining two regular grids: the grid of panels and the grid of blocks.
Hence, we will have a data set and two grids.
The regionalized variable \(z(x)\) for \(x \in D \subset \mathbb{R}^d\) is modeled as a transform of a stationary Gaussian random function, \(Z(x) = \phi(Y(x))\), where
\(E\{Y(x)\} = 0\) and \(Cov\{Y(x), Y(x+h)\} = \rho_Y(h) = 1 -\gamma_Y (h)\)
As \(Var(Z) < +\infty\), \(\phi \in L^2(g)\) and it can be expressed as a linear combination of Hermite polynomials, \(\phi(y) = \sum_{n = 0}^{+\infty} \phi_n H_n(y)\).
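As a reminder (standard expansion, with the \(H_n\) assumed orthonormal for the standard Gaussian density \(g\); this convention is not spelled out in the text), the coefficients and the variance read

\[
\phi_n = E\{\phi(Y)\, H_n(Y)\} = \int \phi(y)\, H_n(y)\, g(y)\, dy ,
\qquad
Var(Z) = \sum_{n \geq 1} \phi_n^2 .
\]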
Non-linear geostatistics implements non-linear estimators of non-linear transforms of the variable of interest \(Z\). Two issues are addressed, selection and change of support, through tasks that are common in geosciences (e.g., in a mining context or for environmental studies).
For example, recovered mineral resources will be defined by the selected ore at a given cutoff, \(T(z_c) = 1_{Z \geq z_c}\), and the metal contained in the selected ore, \(Q(z_c) = Z \times 1_{Z \geq z_c}\).
The average value of \(Z\) over the volume \(v\) is denoted \(Z(v) = \frac{1}{|v|} \int_v Z(u)\, du\).
A first task is to predict the marginal distribution of \(Z(v)\), i.e. its histogram, knowing the histogram of \(Z\) and its spatial structure. A second task is to predict the conditional distribution of \(Z(v)\) given the prior spatial model and some observations. The first question is referred to as the global recoverable resources, and the second one as the local recoverable resources.
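For reference, the selectivity curves used below can be written, for a cutoff \(z_c\) and the block support \(v\) (standard definitions, recalled here for convenience and not taken from the original text): the recovered tonnage \(T_v(z_c) = P\{Z(v) \geq z_c\}\), the recovered metal \(Q_v(z_c) = E\{Z(v)\, 1_{Z(v) \geq z_c}\}\), and the mean grade above cutoff \(m_v(z_c) = Q_v(z_c) / T_v(z_c)\).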
Three estimators will be illustrated:
the conditional expectation (EC)
the disjunctive kriging (DK)
the uniform conditioning (UC)
EC and DK can be used to evaluate point recovery functions at an unobserved point \(x_0\), \(1_{Z(x_0) \geq z_c}\) and \(Z(x_0) \times 1_{Z(x_0) \geq z_c}\). They can also be used to evaluate block recovery functions for a block \(v\), \(1_{Z(v) \geq z_c}\) and \(Z(v) \times 1_{Z(v) \geq z_c}\). DK can also evaluate recovery functions averaged over a larger support, e.g. \(\frac{1}{N} \sum_{i=1}^{N} 1_{Z(v_i) \geq z_c}\) is the recovered ore over the panel \(V = \cup_{i=1}^{N} v_i\).
UC computes block recovery functions averaged on a panel.
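For illustration, the conditional expectation of a point recovery function has a closed form in the Gaussian framework (a reminder, assuming the conditional distribution of \(Y(x_0)\) given the data is Gaussian with simple kriging mean \(Y^{SK}(x_0)\) and standard deviation \(\sigma_{SK}(x_0)\)):

\[
E\{1_{Z(x_0) \geq z_c} \mid \text{data}\}
= 1 - G\!\left(\frac{y_c - Y^{SK}(x_0)}{\sigma_{SK}(x_0)}\right),
\qquad y_c = \phi^{-1}(z_c),
\]

where \(G\) denotes the standard Gaussian cdf.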
Two change of support models are available for the Gaussian model; they are denoted DGM-1 and DGM-2.
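In DGM-1, for instance, the block variable is written \(Z(v) = \phi_v(Y_v)\), where \(Y_v\) is a standard Gaussian variable and the block anamorphosis is deduced from the point one through the change of support coefficient \(r\) (a standard formulation, recalled here for convenience):

\[
\phi_v(y) = \sum_{n \geq 0} \phi_n\, r^n H_n(y),
\qquad
Var\{Z(v)\} = \sum_{n \geq 1} \phi_n^2\, r^{2n},
\]

the coefficient \(r\) being calibrated so that the right-hand side matches the block variance deduced from the variogram of \(Z\).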
Initially, we generate a realization of the variable on a fine grid, modeling the regionalized variable defined on the point support. This realization, or simulation, is then sampled (\(np\) points drawn uniformly over the domain) to define the input data set.
We will use the Geostatistics library gstlearn.
# Set the Seed for the Random Number generator
law_set_random_seed(32131)
## NULL
# Defining the format for dumping statistics of variables in a Db
dbfmt = DbStringFormat()
dbfmt$setFlags(flag_resume=TRUE,flag_vars=FALSE,flag_locator=TRUE)
## NULL
# Defining global options
flag.debug = FALSE
Setting the trace: this option dumps all the elements used to check the calculations when the rank of the target is equal to 1 (first block, first panel, …). This option is useful for debugging but it produces a lot of printout.
OptDbg_reset()
## NULL
if (flag.debug) OptDbg_setReference(1)
Defining color scales common to all future representations of estimated quantities:
ZEstMin = -3.
ZEstMax = +3.
TEstMin = 0.
TEstMax = 1.
TStdMin = 0.
TStdMax = 1.
QEstMin = 0.
QEstMax = 3.
QStdMin = 0.
QStdMax = 2.
Three grids are defined:
The grid of the samples. It represents a reference realization of the regionalized variable. This realization is sampled to define the data set of the observations. This data set is then used to evaluate the ability of the non-linear techniques to reproduce the selectivity curves computed on the reference simulation.
The grid of the panels
The grid of the blocks
# grid of samples
nx_S = c(100,100)
dx_S = c(0.01, 0.01)
# grid of panels
dx_P = 0.25
# grid of blocs
nx_B = 5
dx_B = dx_P / nx_B
# Generate initial grid
grid = DbGrid_create(nx = nx_S, dx = dx_S)
grid$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 3
## Total number of samples = 10000
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.010 0.010
## Number : 100 100
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## NULL
# Create grid of panels covering the simulated area
panels = DbGrid_createCoveringDb(grid, dx=c(dx_P,dx_P))
panels$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 2
## Total number of samples = 25
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.250 0.250
## Number : 5 5
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL
# Discretization with a grid of blocks which covers the simulated area
blocs = DbGrid_createCoveringDb(grid, dx=c(dx_B,dx_B))
blocs$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 2
## Total number of samples = 441
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.050 0.050
## Number : 21 21
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL
A lognormal model is defined and a simulation is performed on the grid of samples.
The regionalized variable \(z(x)\) is modeled using a lognormal model \(Z(x) = m \, e^{\sigma Y(x) - \frac{\sigma^2}{2}}\) with
\(Y\) a stationary Gaussian random function with an exponential variogram, with a range equal to \(0.1\) and a sill equal to \(1.0\). The mean of \(Y\) is zero.
The lognormal transform is parametrized by its mean \(m\) and the dispersion coefficient \(\sigma\). The first two moments are \(E\{Z\} = m\) and \(Var\{Z\} = m^2 \, (e^{\sigma^2} - 1)\).
# Simulation of the Gaussian variable
model_init = Model_createFromParam(ECov_EXPONENTIAL(), range=0.1, sill=1.)
err = simtub(NULL, grid, model_init, namconv=NamingConvention("Y"))
# Nonlinear transform (lognormal)
m_Z = 1.5
s_Z = 0.5
grid["Z"] = m_Z * exp(s_Z * grid["Y"] - 0.5*s_Z^2)
p = ggplot()
p = p + plot.grid(grid, "Z")
ggPrint(p)
p = ggplot()
p = p + plot.hist(grid, name = "Y", bins=100, col='gray', fill='skyblue')
p = p + plot.decoration(xlab = "Y", title = "Simulated samples (Y variable)")
ggPrint(p)
p = ggplot()
p = p + plot.hist(grid, name = "Z", bins = 100, col = "gray", fill = "orange")
p = p + plot.decoration(xlab = "Z", title = "Simulated samples (Z variable)")
ggPrint(p)
opers = EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
stats = dbStatisticsMono(grid, c("Y","Z"), opers=opers)
knitr::kable(stats$toTL(), digits=2,
caption = "Statistics of the simulated variables on the point support")
|   | Number | Minimum | Maximum | Mean | St. Dev. |
|---|---|---|---|---|---|
| Y | 10000 | -4.30 | 3.65 | 0.07 | 1.01 |
| Z | 10000 | 0.15 | 8.20 | 1.56 | 0.85 |
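The selectivity of the reference simulation can also be checked directly. The following sketch (base R, using an arbitrary cutoff chosen for illustration only, not part of the original workflow) computes the point-support recovered tonnage and metal:
# Empirical point-support selectivity at an arbitrary cutoff (illustration only)
zc = 1.5
z = grid["Z"]
T_emp = mean(z >= zc) # recovered tonnage: proportion of points above the cutoff
Q_emp = mean(z * (z >= zc)) # recovered metal: mean of Z restricted to the selection
c(T = T_emp, Q = Q_emp)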
We create a new data set by extracting a few samples from the previous grid.
np = 500
data = Db_createSamplingDb(grid, number=np, names=c("x1","x2","Y","Z"))
data$setLocator("Z", ELoc_Z())
## NULL
data$display()
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 500
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Y - Locator = NA
## Column = 4 - Name = Z - Locator = z1
## NULL
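Before plotting, the statistics of the extracted samples can be compared with those computed on the full grid (a sketch reusing the statistical options opers defined above):
# Statistics of the extracted data set (same options as for the grid)
stats_data = dbStatisticsMono(data, c("Y","Z"), opers=opers)
knitr::kable(stats_data$toTL(), digits=2,
caption = "Statistics of the extracted samples")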
p = ggplot()
p = p + plot.point(data, nameColor = "Z", nameSize = "Z", flagLegendSize = TRUE)
ggPrint(p)