This case study demonstrates how to use gstlearn for non-linear geostatistics with the Gaussian model.
We consider three supports:
The support of the samples, considered as points and denoted \(x\),
The support of the selection, the block, denoted \(v\),
The support of the reporting, the panel, denoted \(V\).
The variable of interest \(z\) is defined over the domain \(D \subset \mathbb{R}^d\) with \(d = 2\). The domain is uniformly divided into panels and each panel is regularly subdivided into blocks, hence defining two regular grids: the grid of panels and the grid of blocks.
Hence, we will work with a data set and two grids.
The regionalized variable \(z(x)\) for \(x \in D \subset \mathbb{R}^d\) is modeled as a transform of a stationary Gaussian random function, \(Z(x) = \phi(Y(x))\), where
\(\mathrm{E}\{Y(x)\} = 0\) and \(\mathrm{Cov}\{Y(x), Y(x+h)\} = \rho_Y(h) = 1 - \gamma_Y(h).\)
As \(\mathrm{Var}(Z) < +\infty\), \(\phi \in L^2(g)\) and it can be expanded into Hermite polynomials, \(\phi(y) = \sum_{n = 0}^{+\infty} \phi_n H_n(y)\).
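To fix ideas, the normalized Hermite polynomials can be computed with a three-term recurrence. The base-R sketch below (the function hermite is hypothetical, not part of gstlearn) assumes the geostatistical convention \(H_0(y) = 1\) and \(H_1(y) = -y\) used by gstlearn:
# normalized Hermite polynomials H_0, ..., H_{nbpoly-1} at a scalar y, using
# the recurrence H_{n+1}(y) = -(y H_n(y) + sqrt(n) H_{n-1}(y)) / sqrt(n+1)
hermite = function(y, nbpoly) {
H = numeric(nbpoly)
H[1] = 1
if (nbpoly > 1) H[2] = -y
if (nbpoly > 2) for (n in 1:(nbpoly - 2))
H[n + 2] = -(y * H[n + 1] + sqrt(n) * H[n]) / sqrt(n + 1)
H
}
# Monte-Carlo check of the orthonormality: E{H_m(Y) H_n(Y)} = delta_mn for Y ~ N(0,1)
y = rnorm(1e5)
round(crossprod(t(sapply(y, hermite, nbpoly = 4))) / length(y), 2)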
Non-linear geostatistics implements non-linear estimators of non-linear transforms of the variable of interest \(Z\). Two issues are addressed, selection and change of support, with tasks common in geosciences (e.g., in a mining context or for environmental studies).
For example, recovered mineral resources are defined by the selected ore at a given cutoff, \(T(z_c) = 1_{Z \geq z_c}\), and the metal contained in the selected ore, \(Q(z_c) = Z \times 1_{Z \geq z_c}\).
The average value over the volume \(v\) is denoted \(Z(v) = \frac{1}{|v|} \int_v Z(u) du\).
A first task is to predict the marginal distribution of \(Z(v)\), or its histogram, knowing the histogram of \(Z\) and its spatial structure. A second task is to predict the conditional distribution of \(Z(v)\) given the prior spatial model and some observations. The first question is referred to as the global recoverable resources, and the second one as the local recoverable resources.
Three estimators will be illustrated:
the conditional expectation (CE)
the disjunctive kriging (DK)
the uniform conditioning (UC)
CE and DK can be used to evaluate point recovery functions at an unobserved point \(x_0\), \(1_{Z(x_0) \geq z_c}\) and \(Z(x_0) \times 1_{Z(x_0) \geq z_c}\). They can also be used to evaluate block recovery functions for a block \(v\), \(1_{Z(v) \geq z_c}\) and \(Z(v) \times 1_{Z(v) \geq z_c}\). DK can also evaluate recovery functions averaged over a larger support, e.g. \(\frac{1}{N} \sum_{i=1}^{N} 1_{Z(v_i) \geq z_c}\) is the recovered ore over the panel \(V = \cup_{i=1}^{N}v_i\).
UC computes block recovery functions averaged over a panel.
Two change of support models are available for the Gaussian model; they are respectively denoted \(DGM-1\) and \(DGM-2\).
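For later reference, the change-of-support coefficient \(r\) of the DGM models is obtained by matching the block variance, \(\mathrm{Var}\{Z(v)\} = \sum_{n \geq 1} \phi_n^2 \, r^{2n}\). A minimal base-R sketch of this principle (an illustration only, not the gstlearn implementation; the coefficients phi and the block variance var_Zv are assumed given):
# solve sum_{n>=1} phi_n^2 r^(2n) = Var{Z(v)} for the coefficient r in (0, 1)
dgm_r = function(phi, var_Zv) {
f = function(r) sum(phi[-1]^2 * r^(2 * seq_along(phi[-1]))) - var_Zv
uniroot(f, interval = c(1e-6, 1))$root
}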
Initially, we generate a realization of the variable on a fine grid, modeling the regionalized variable defined on the point support. This realization, or simulation, is sampled (\(np\) points uniformly drawn over the domain) to define the input data set.
We will use the Geostatistics library gstlearn.
# Set the Seed for the Random Number generator
law_set_random_seed(32131)
## NULL
# Defining the format for dumping statistics of variables in a Db
dbfmt = DbStringFormat()
dbfmt$setFlags(flag_resume=TRUE,flag_vars=FALSE,flag_locator=TRUE)
## NULL
# Defining global options
flag.debug = FALSE
Setting the trace: this option dumps all the elements needed to check the calculations when the rank of the target is equal to 1 (first block, first panel, …). This option is useful for debugging but it generates a large amount of output.
OptDbg_reset()
## NULL
if (flag.debug) OptDbg_setReference(1)
Defining color scales common to all future representations of estimated quantities:
ZEstMin = -3.
ZEstMax = +3.
TEstMin = 0.
TEstMax = 1.
TStdMin = 0.
TStdMax = 1.
QEstMin = 0.
QEstMax = 3.
QStdMin = 0.
QStdMax = 2.
Three grids are defined:
The grid of the samples. It represents a reference realization of the regionalized variable. This realization is sampled to define the data set of observations. This data set is then used to evaluate the ability of the non-linear techniques to reproduce the selectivity curves computed on the reference simulation.
The grid of the panels
The grid of the blocs
# grid of samples (reference simulation): 100 x 100 nodes with mesh 0.01
nx_S = c(100,100)
dx_S = c(0.01, 0.01)
# grid of panels: mesh 0.25
dx_P = 0.25
# grid of blocks: 5 x 5 blocks per panel
nx_B = 5
dx_B = dx_P / nx_B
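A quick sanity check of the nested geometry (each panel must contain exactly nx_B x nx_B blocks):
# the blocks tile the panels exactly
stopifnot(isTRUE(all.equal(nx_B * dx_B, dx_P)))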
# Generate initial grid
grid = DbGrid_create(nx = nx_S, dx = dx_S)
grid$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 3
## Total number of samples = 10000
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.010 0.010
## Number : 100 100
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## NULL
# Create grid of panels covering the simulated area
panels = DbGrid_createCoveringDb(grid, dx=c(dx_P,dx_P))
panels$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 2
## Total number of samples = 25
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.250 0.250
## Number : 5 5
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL
# Discretization with a grid of blocks which covers the simulated area
blocs = DbGrid_createCoveringDb(grid, dx=c(dx_B,dx_B))
blocs$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 2
## Total number of samples = 441
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.050 0.050
## Number : 21 21
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL
A lognormal model is defined and a simulation is performed on the grid of samples.
The regionalized variable \(z(x)\) is modeled using a lognormal model \(Z(x) = m \times e^{\sigma \, Y(x) - \frac{1}{2}\sigma^2}\) with
\(Y\) a stationary Gaussian random function with zero mean and an exponential variogram with a range equal to \(0.1\) and a sill equal to \(1.0\).
The lognormal transform is parametrized by the mean \(m\) and the dispersion coefficient \(\sigma\). The first two moments are \(\mathrm{E}\{Z\} = m\) and \(\mathrm{Var}\{Z\} = m^2\,(e^{\sigma^2} - 1)\).
# Simulation of the Gaussian variable
model_init = Model_createFromParam(ECov_EXPONENTIAL(), range=0.1, sill=1.)
err = simtub(NULL, grid, model_init, namconv=NamingConvention("Y"))
# Nonlinear transform (lognormal)
m_Z = 1.5
s_Z = 0.5
grid["Z"] = m_Z * exp(s_Z * grid["Y"] - 0.5*s_Z^2)
p = ggplot()
p = p + plot.grid(grid, "Z")
ggPrint(p)
p = ggplot()
p = p + plot.hist(grid, name = "Y", bins=100, col='gray', fill='skyblue')
p = p + plot.decoration(xlab = "Y", title = "Simulated samples (Y variable)")
ggPrint(p)
p = ggplot()
p = p + plot.hist(grid, name = "Z", bins = 100, col = "gray", fill = "orange")
p = p + plot.decoration(xlab = "Z", title = "Simulated samples (Z variable)")
ggPrint(p)
opers = EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
stats = dbStatisticsMono(grid, c("Y","Z"), opers=opers)
knitr::kable(stats$toTL(), digits=2,
caption = "Statistics of the simulated variables on the point support")
|   | Number | Minimum | Maximum | Mean | St. Dev. |
|---|---|---|---|---|---|
| Y | 10000 | -4.30 | 3.65 | 0.07 | 1.01 |
| Z | 10000 | 0.15 | 8.20 | 1.56 | 0.85 |
We create a new data set by extracting a few samples from the previous grid.
np = 500
data = Db_createSamplingDb(grid, number=np, names=c("x1","x2","Y","Z"))
data$setLocator("Z", ELoc_Z())
## NULL
data$display()
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 500
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Y - Locator = NA
## Column = 4 - Name = Z - Locator = z1
## NULL
p = ggplot()
p = p + plot.point(data, nameColor = "Z", nameSize = "Z", flagLegendSize = TRUE)
ggPrint(p)
p = ggplot()
p = p + plot.hist(data, name = "Y", bins = 100, col = "gray", fill = "skyblue")
p = p + plot.decoration(xlab = "Y", title = "Sampled data (Y variable)")
ggPrint(p)
p = ggplot()
p = p + plot.hist(data, name = "Z", bins = 100, col = "gray", fill = "orange")
p = p + plot.decoration(xlab = "Z", title = "Sampled data (Z variable)")
ggPrint(p)
stats = dbStatisticsMono(data, c("Y","Z"), opers=opers)
knitr::kable(stats$toTL(), digits=2,
caption = "Statistics of the sampled data on the point support")
|   | Number | Minimum | Maximum | Mean | St. Dev. |
|---|---|---|---|---|---|
| Y | 500 | -3.42 | 2.99 | 0.03 | 1.04 |
| Z | 500 | 0.24 | 5.92 | 1.54 | 0.84 |
# empirical variance of Z (squared standard deviation read from the statistics table)
varZ <- stats$getValue(1,4)^2
anam = AnamHermite_create(nbpoly=20)
err = anam$fit(data, "Z")
anam
##
## Hermitian Anamorphosis
## ----------------------
## Minimum absolute value for Y = -2.8
## Maximum absolute value for Y = 3.9
## Minimum absolute value for Z = 0.274228
## Maximum absolute value for Z = 5.99041
## Minimum practical value for Y = -2.8
## Maximum practical value for Y = 3.1
## Minimum practical value for Z = 0.274228
## Maximum practical value for Z = 5.86055
## Mean = 1.53861
## Variance = 0.709613
## Number of Hermite polynomials = 20
## Normalized coefficients for Hermite polynomials (punctual variable)
## [, 0] [, 1] [, 2] [, 3] [, 4] [, 5] [, 6]
## [ 0,] 1.539 -0.792 0.273 -0.061 -0.009 0.032 -0.031
## [ 7,] 0.010 0.018 -0.021 -0.001 0.016 -0.009 -0.007
## [ 14,] 0.012 0.000 -0.012 0.004 0.010 -0.006
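As an illustration, the fitted transform \(\phi\) can be evaluated by hand from the printed coefficients, here at \(y = 0\) (a hedged sketch reusing the hypothetical hermite() recurrence shown earlier; only the first seven coefficients are kept, so the value is approximate):
# phi(0) = sum_n phi_n H_n(0), using the first normalized coefficients above
phi <- c(1.539, -0.792, 0.273, -0.061, -0.009, 0.032, -0.031)
sum(phi * hermite(0, length(phi)))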
We focus on the Tonnage (T) and the Metal Quantity (Q) for a few cutoffs (Zcuts). The tables below also report the conventional benefit \(B(z_c) = Q(z_c) - z_c \, T(z_c)\) and the mean grade above cutoff \(M(z_c) = Q(z_c)/T(z_c)\).
Zcuts <- c(0.0, 0.5, 0.75, 1.0)
selectivity = Selectivity_createByKeys(
keys = c("T", "Q"), zcuts=Zcuts,
flag_est=TRUE, flag_std=TRUE)
# Global experimental selectivity, calculated from the experimental data set
table = selectivity$eval(data)
table$setTitle("Selectivity curves computed on data set")
## NULL
table
##
## Selectivity curves computed on data set
## ---------------------------------------
## Z-Cut T-estim Q-estim B-estim M-estim
## 0.000 1.000 1.539 1.539 1.539
## 0.500 0.972 1.527 1.041 1.571
## 0.750 0.858 1.453 0.810 1.694
## 1.000 0.718 1.330 0.612 1.853
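These experimental values can be cross-checked directly on the data: \(T\) is the proportion of samples above the cutoff and \(Q\) is the average of the selected grades (a base-R sketch, which should reproduce the last row of the table above):
# experimental T, Q, B, M at zc = 1.0, computed directly from the data
z = data["Z"]
zc = 1.0
T_emp = mean(z >= zc)
Q_emp = mean(z * (z >= zc))
c(T = T_emp, Q = Q_emp, B = Q_emp - zc * T_emp, M = Q_emp / T_emp)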
# Selectivity in the model, derived from the parameters contained in the Anamorphosis
table = selectivity$evalFromAnamorphosis(anam)
table$setTitle("Selectivity curves computed on anamorphosis")
## NULL
table
##
## Selectivity curves computed on anamorphosis
## -------------------------------------------
## Z-Cut T-estim Q-estim B-estim M-estim
## 0.000 1.000 1.539 1.539 1.539
## 0.500 0.972 1.527 1.041 1.570
## 0.750 0.867 1.461 0.810 1.684
## 1.000 0.716 1.327 0.611 1.853
data["Gaussian.Z"] <- VectorHelper_normalScore(data["Z"])
p = ggplot()
p = p + plot.hist(data, name = "Gaussian.Z", bins = 25, col = "gray", fill = "yellow")
p = p + plot.decoration(xlab = "Normal score")
ggPrint(p)
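VectorHelper_normalScore performs the empirical normal-score transform of the raw values. A base-R analogue is sketched below (an assumption: a rank-based transform with an \(n+1\) denominator; gstlearn's handling of ties and its denominator convention may differ):
# empirical normal-score transform: map the ranks to Gaussian quantiles
gaus = qnorm(rank(data["Z"]) / (length(data["Z"]) + 1))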
data$display()
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 6
## Total number of samples = 500
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Y - Locator = NA
## Column = 4 - Name = Z - Locator = z1
## Column = 5 - Name = Gaussian.Z - Locator = NA
## NULL
Define the variogram calculation parameters:
omni-directional variogram,
10 lags of 0.025.
varioparam = VarioParam_createOmniDirection(npas=10, dpas=0.025)
Compute the experimental variogram of the raw variable and fit a model:
# Computing the experimental variogram
err = data$setLocator("Z", ELoc_Z())
vario_raw = Vario_computeFromDb(varioparam, db=data)
# Fitting the variogram model on the experimental variogram
# (the total sill is constrained to the empirical variance of Z)
model_raw = Model_create()
err = model_raw$fit(vario_raw,
types = ECov_fromKeys(c("NUGGET", "EXPONENTIAL", "EXPONENTIAL")),
constraints=Constraints(varZ))
model_raw$display()
##
## Model characteristics
## =====================
## Space dimension = 2
## Number of variable(s) = 1
## Number of basic structure(s) = 2
## Number of drift function(s) = 0
## Number of drift equation(s) = 0
##
## Covariance Part
## ---------------
## Nugget Effect
## - Sill = 0.114
## Exponential
## - Sill = 0.597
## - Range = 0.124
## - Theo. Range = 0.041
## Total Sill = 0.711
## NULL
p = ggplot()
p = p + plot.varmod(vario_raw, model_raw)
p = p + plot.decoration(title = "Raw variable")
ggPrint(p)
# Computing the experimental variogram of the Gaussian variable
err = data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
vario = Vario_computeFromDb(varioparam, data)
# Fitting the model on the experimental variogram with the total sill constrained to one
model = Model()
err = model$fit(vario,
types = ECov_fromKeys(c("EXPONENTIAL", "EXPONENTIAL")),
constraints=Constraints(1.))
model$display()
##
## Model characteristics
## =====================
## Space dimension = 2
## Number of variable(s) = 1
## Number of basic structure(s) = 2
## Number of drift function(s) = 0
## Number of drift equation(s) = 0
##
## Covariance Part
## ---------------
## Exponential
## - Sill = 0.234
## - Range = 0.017
## - Theo. Range = 0.006
## Exponential
## - Sill = 0.766
## - Range = 0.140
## - Theo. Range = 0.047
## Total Sill = 1.000
## NULL
p = ggplot()
p = p + plot.varmod(vario, model)
p = p + plot.decoration(title = "Gaussian variable")
ggPrint(p)
# keep a copy of the model of the Gaussian variable before attaching the anamorphosis
model_Y = model$clone()
err = model$setAnam(anam)
Checking the experimental variogram of the raw variable against the model derived from the model of its Gaussian transform.
model$setActiveFactor(-1)
## NULL
p = ggplot()
p = p + plot.varmod(vario_raw, model)
p = p + plot.decoration(title = "Raw variable (model of the Gaussian)")
ggPrint(p)
model$setActiveFactor(0)
## NULL
nmini = 5
nmaxi = 5
radius = 1.
neigh = NeighMoving_create(nmaxi=nmaxi, radius=radius, nmini=nmini)
neigh
##
## Moving Neighborhood
## ===================
## Minimum number of samples = 5
## Maximum number of samples = 5
## Maximum horizontal distance = 1
A unified workflow for the nonlinear techniques available with the Gaussian model is proposed below. The estimators are: the conditional expectation (CE), the disjunctive kriging (DK), and the uniform conditioning (UC).
The workflow is:
Pre-processing of the input dataset using an anamorphosis. It is required for CE and DK.
CE: computation of the Gaussian values from the raw values
DK: computation of the values of the Hermite polynomials from the raw values
UC: No pre-processing is required.
Computing the kriging
CE: computation of simple kriging of the Gaussian variable (estimation value and standard deviation). The standard function kriging is used. Ordinary kriging may be used.
DK: computation of the kriging of the Hermite polynomials from the raw values. The function krigingFactors is used to loop over the factors and define the proper covariance function.
UC: computation of ordinary/simple kriging of the raw variable (estimation value and variance of the estimator). The standard function kriging is used.
Here, we have
kriging(data, target, model, ….), or krigingFactors(data, target, model, …)
Post-processing to calculate the estimator of the selectivity curves or any nonlinear transform from the kriging of the factors or Gaussian variable.
CE: dedicated function ConditionalExpectation(db, anam, selectivity, …)
DK: dedicated function DisjunctiveKriging(db, anam, selectivity, …)
UC: dedicated function UniformConditioning(db, anam, selectivity, …)
Two versions are now considered: the point estimate and the block estimate.
Two methods are presented:
The conditional expectation
The disjunctive kriging
The Conditional Expectation is used to estimate selectivity curves for a point support at the grid nodes (i.e., the centers of the blocks).
The Gaussian variable has been computed earlier.
Simple Kriging is used to estimate the point value of the Gaussian variable at the grid nodes (point support). The known mean is equal to zero.
# Simple kriging setup: remove the drift and set the known mean to zero
err = model$delDrift(0)
err = model$setMean(0.)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, blocs, model, neigh,
calcul = EKrigOpt_POINT(),
namconv= NamingConvention("G_Pts"))
p = ggplot()
p = p + plot(blocs, "G_Pts.Gaussian.Z.estim",
flagLegendRaster=TRUE, legendNameRaster = "Estimation")
p = p + plot.decoration(title = "Punctual Simple Kriging of Y")
ggPrint(p)
p = ggplot()
p = p + plot(blocs, "G_Pts.Gaussian.Z.stdev",
flagLegendRaster=TRUE, legendNameRaster = "Std. Dev.")
p = p + plot.decoration(title = "Punctual Simple Kriging of Y")
ggPrint(p)
Once kriging has been computed, the back transform is applied for the point support:
\[ \psi(Y(x_0))^{CE} = \int \psi(y(x_0)^{SK}+\sigma_{SK} \, u)\, g(u)\, du \] It is possible to check the computed values for the ore:
\[ (1_{Y(x_0)\geq y_c})^{CE} = 1 - G\left(\frac{y_c - y(x_0)^{SK}}{\sigma_{SK}}\right) \]
For change-of-support calculations, DGM kriging (krigdgm) should be used (to be checked).
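This back transform can be illustrated at a single target (a hedged sketch; y_sk, s_sk and yc stand for the simple-kriging estimate, its standard deviation and the Gaussian cutoff):
# CE of the ore indicator at one target: 1 - G((yc - y_sk)/s_sk)
y_sk = 0.2; s_sk = 0.6; yc = 0.5
1 - pnorm((yc - y_sk) / s_sk)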
err = ConditionalExpectation(db = blocs,
anam = anam,
selectivity = selectivity,
name_est="G_Pts*estim",
name_std="G_Pts*stdev",
namconv = NamingConvention("Pts_Recovery", FALSE))
blocs$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 20
## Total number of samples = 441
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.050 0.050
## Number : 21 21
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = z1
## NULL
# Graphics
# Metal (the anamorphosis is used)
p = ggplot()
p = p + plot(blocs, "Pts_Recovery.Q-estim-0",
flagLegendRaster=TRUE, legendNameRaster= "Q(0.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)
# Tonnage (the anamorphosis is not used)
p = ggplot()
p = p + plot(blocs, "Pts_Recovery.T-estim-0",
flagLegendRaster = TRUE, legendNameRaster = "T(0.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)
p = ggplot()
p = p + plot(blocs, "Pts_Recovery.T-estim-1",
flagLegendRaster=TRUE, legendNameRaster = "T(1.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)
Values computed by the anamorphosis are invalid… Is this still true (DR)? If not, remove the following QC paragraph.
# QC of Conditional Expectation
Ycuts <- anam$rawToGaussianVector(Zcuts)
# mask the targets where the kriging standard deviation is null
indsel = blocs["G_Pts.Gaussian.Z.stdev"] <= 0.
# Tonnage at zc = 1.0
blocs["T_1"] <- 1 - VectorHelper_pnormVec((Ycuts[4] - blocs["G_Pts.Gaussian.Z.estim"]) /
blocs["G_Pts.Gaussian.Z.stdev"])
blocs["T_1"][indsel] = NA
p = ggplot()
p = p + plot.correlation(blocs, namex="Pts_Recovery.T-estim-1", namey="T_1",
bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Computed value", ylab = "Theoretical value",
title = "Conditional Expectation: T(1.0)")
ggPrint(p)
print(paste0(">>> Sum of differences for T(1.00): ",
sum(blocs["T_1"] != blocs["Pts_Recovery.T-estim-1"],
na.rm=TRUE)))
## [1] ">>> Sum of differences for T(1.00): 0"
# Tonnage at zc = 0.75
blocs["T_0.75"] <- 1 - VectorHelper_pnormVec((Ycuts[3] -
blocs["G_Pts.Gaussian.Z.estim"]) / blocs["G_Pts.Gaussian.Z.stdev"])
blocs["T_0.75"][indsel] = NA
p = ggplot()
p = p + plot.correlation(blocs, namex="Pts_Recovery.T-estim-0.75", namey="T_0.75",
bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Computed value", ylab = "Theoretical value",
title = "Conditional Expectation: T(0.75)")
ggPrint(p)
print(paste0(">>> Sum of differences for T(0.75): ",
sum(blocs["T_0.75"] != blocs["Pts_Recovery.T-estim-0.75"],
na.rm=TRUE)))
## [1] ">>> Sum of differences for T(0.75): 0"
# model of the Gaussian variable
p = ggplot()
p = p + plot.varmod(vario, model)
p = p + plot.decoration(title = "Variogram of the Gaussian variable")
ggPrint(p)
# attach the anamorphosis to the model
err = model$setAnam(anam)
# model of the raw variable as the anamorphosis is attached to the model
err = model$setActiveFactor(-1)
p = ggplot()
p = p + plot.varmod(vario_raw, model)
p = p + plot.decoration(title = "Variogram of the raw variable")
ggPrint(p)
err = model$setActiveFactor(0)
Computing the point factors (i.e., the values of the Hermite polynomials). The following plot gives the proportion of the variance explained by the factors; it indicates the number of factors that should be retained in the subsequent calculations.
plot(1:(anam$getNFactor()-1),anam$cumulateVarianceRatio(1.),
ylim = c(0,1),
xlab = "Number of factors", ylab = "Variance proportion",
type = "b", pch = 19, col = "gray")
Plot the variogram models of the factors (\(\rho^n\) in the Gaussian model):
hmax = 0.2
nf = anam$getNFactor()
asCov = FALSE # problem if asCov = FALSE (the variogram of factor n should be 1 - rho^n, not (1 - rho)^n)
# model of the factors
p = ggplot()
for (i in 1:nf)
{
model$setActiveFactor(i)
p <- p + plot.model(model, hmax=hmax, asCov = asCov)
}
p = p + geom_hline(yintercept = 1.0, colour = "red")
p = p + plot.decoration(title = "Covariance of the different factors")
ggPrint(p)
# model of the raw variable
err = model$setActiveFactor(-1)
p = ggplot()
p = p + plot.model(model, hmax=hmax, asCov = asCov)
p = p + geom_hline(yintercept = varZ, colour = "red")
p = p + plot.decoration(title = "Covariance of the raw variable")
ggPrint(p)
Computation of the factors H1, H2 and H3. They correspond to the Hermite polynomials; their presence dictates the number of terms used in the Hermite expansion in the subsequent calculations.
nfactor = 3
err = anam$rawToFactor(db = data, nfactor = nfactor,
namconv = NamingConvention("F", FALSE))
Compute the simple kriging of the factors. Since kriging is a linear operator, either the point or the block estimate can be computed.
err = krigingFactors(dbin = data, # Input data set (a Db structure)
dbout = blocs, # Output data set (a DbGrid structure)
model = model, neigh, calcul= EKrigOpt_POINT(),
namconv = NamingConvention("DK_Pts"))
blocs$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 28
## Total number of samples = 441
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.050 0.050
## Number : 21 21
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = NA
## Column = 20 - Name = T_1 - Locator = NA
## Column = 21 - Name = T_0.75 - Locator = NA
## Column = 22 - Name = DK_Pts.F.1.estim - Locator = z1
## Column = 23 - Name = DK_Pts.F.2.estim - Locator = z2
## Column = 24 - Name = DK_Pts.F.3.estim - Locator = z3
## Column = 25 - Name = DK_Pts.F.1.stdev - Locator = NA
## Column = 26 - Name = DK_Pts.F.2.stdev - Locator = NA
## Column = 27 - Name = DK_Pts.F.3.stdev - Locator = NA
## NULL
Consistency between the simple kriging of the Gaussian variable and the kriging of the first factor: as \(F_1 = H_1(Y) = -Y\), its kriging should satisfy \((F_1)^{SK} = -(Y)^{SK}\).
# H_1 = -Y
p = ggplot()
p = p + plot.correlation(blocs,
namex = "G_Pts.Gaussian.Z.estim",
namey = "DK_Pts.F.1.estim",
bins=100, flagDiag = FALSE)
p = p + geom_abline(intercept = 0, slope = -1, colour = "red")
p = p + plot.decoration(xlab = "Gaussian variable",
ylab = "H_1 = -Y",
title = "Values of the Simple Kriging")
ggPrint(p)
p = ggplot()
p = p + plot.correlation(blocs,
namex = "G_Pts.Gaussian.Z.stdev",
namey = "DK_Pts.F.1.stdev",
bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Gaussian variable",
ylab = "H_1 = -Y",
title = "Standard Deviation of the Simple Kriging error")
ggPrint(p)
The final aim is to estimate a non-linear function of the Gaussian variable \[\psi(Y(x)) = \sum_{n=0}^{nf} \psi_n \, H_n(Y(x)).\] As kriging is a linear operator and the Hermite polynomials are orthogonal, \[(\psi(Y(x)))^{DK} = \sum_{n=0}^{nf} \psi_n \, (H_n(Y(x)))^{SK}\] and \[\mathrm{Var}\{\psi(Y(x)) - \psi(Y(x))^{DK}\} = \sum_{n=1}^{nf} \psi_n^2 \, \sigma^2_{SK}(n) + \sum_{n=nf+1}^{+\infty} \psi_n^2 = \mathrm{Var}\{\psi(Y(x))\} - \sum_{n=1}^{nf} \psi_n^2 \, (1-\sigma^2_{SK}(n)).\]
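In matrix form, this combination is a single matrix product (an illustrative base-R sketch with made-up numbers; the same combination is implemented by the compute_DK helper in the QC section below):
# DK combination at 10 targets for nf = 2 factors
psi = c(1.5, -0.8, 0.3) # psi_0, psi_1, psi_2
Hsk = matrix(rnorm(10 * 2), nrow = 10) # kriged factors H_1^SK, H_2^SK
dk_est = psi[1] + Hsk %*% psi[-1]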
We first have to calculate the Hermite coefficients of the selectivity curve, or of the non-linear transform to be evaluated, and then compute the linear combinations (a numerical check of these coefficients is sketched after the list below).
Raw variable, \(\psi(y) = \phi(y)\)
Hence, \(\psi_n = \phi_n\)
Recovered ore, or the indicator above the cutoff, \(\psi(y) = 1_{y \geq y_c}\),
Hence, \(\psi_0 = 1- G(y_c)\) and \(\psi_n = -\frac{g(y_c)}{\sqrt{n}} H_{n-1}(y_c)\) for \(n>0\).
Recovered metal above the cutoff, \(\psi(y) = \phi(y)\times 1_{y \geq y_c}\),
Hence, \(\psi_n = \sum_{p \geq 0} \phi_p \, \alpha_{n,p}(y_c)\) with \(\alpha_{n,p}(y_c) = \int_{y_c}^{+\infty} H_n(u)H_p(u) g(u)du\).
We should also be able to compute non-linear transforms such as:
The value over a limit, \(\psi(y) = \max (y_c, y)\),
Hence, \(\psi_0 = g(y_c) + y_c\, G(y_c)\), \(\psi_1 = G(y_c)-1\), and \(\psi_n = \frac{g(y_c)}{\sqrt{n\times(n-1)}} H_{n-2}(y_c)\) for \(n > 1\).
The lognormal transform, \(\psi(y) = m \times e^{\sigma\, y - \frac{1}{2} \sigma^2}\)
Hence, \(\psi_n = m \, \frac{(-\sigma)^n}{\sqrt{n!}}\) for \(n \geq 0\).
etc.
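As announced above, these coefficients can be recomputed in base R and compared with the gstlearn helpers hermiteCoefIndicator and hermiteLognormal used in the QC section below (a hedged sketch reusing the hypothetical hermite() recurrence introduced earlier):
# indicator above the cutoff: psi_0 = 1 - G(y_c), psi_n = -g(y_c)/sqrt(n) H_{n-1}(y_c)
coef_indicator = function(yc, nbpoly) {
psi = numeric(nbpoly)
psi[1] = 1 - pnorm(yc)
psi[-1] = -dnorm(yc) / sqrt(1:(nbpoly - 1)) * hermite(yc, nbpoly - 1)
psi
}
# lognormal transform: psi_n = m (-sigma)^n / sqrt(n!)
coef_lognormal = function(m, sigma, nbpoly) {
n = 0:(nbpoly - 1)
m * (-sigma)^n / sqrt(factorial(n))
}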
err = DisjunctiveKriging(db = blocs,
anam = anam,
selectivity = selectivity,
name_est="DK_Pts.F.*.estim",
name_std="DK_Pts.F.*.stdev",
namconv = NamingConvention("DK_Pts.Recovery", FALSE))
Block kriging of the factors can be used to estimate the average of the point selectivity curve over a block, e.g. \[ \left(\frac{1}{|v|}\int_v 1_{Y(u) \geq y_c}\, du\right)^{DK} = \sum_{n=0}^{nf} \psi_n \, \left\{\frac{1}{|v|}\int_v H_n(Y(u))\, du\right\}^{SK} \] The post-processing is identical, using DisjunctiveKriging.
# The number of estimated factors is defined by the number of selected variables
# in the input data base
ndisc_B = c(5, 5)
err = krigingFactors(dbin = data, # Input data set (a Db structure)
dbout = blocs, # Output data set (a DbGrid structure)
model = model, neigh, calcul= EKrigOpt_BLOCK(), ndisc = ndisc_B,
namconv = NamingConvention("DK_Blk"))
blocs$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 34
## Total number of samples = 441
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.050 0.050
## Number : 21 21
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = NA
## Column = 20 - Name = T_1 - Locator = NA
## Column = 21 - Name = T_0.75 - Locator = NA
## Column = 22 - Name = DK_Pts.F.1.estim - Locator = NA
## Column = 23 - Name = DK_Pts.F.2.estim - Locator = NA
## Column = 24 - Name = DK_Pts.F.3.estim - Locator = NA
## Column = 25 - Name = DK_Pts.F.1.stdev - Locator = NA
## Column = 26 - Name = DK_Pts.F.2.stdev - Locator = NA
## Column = 27 - Name = DK_Pts.F.3.stdev - Locator = NA
## Column = 28 - Name = DK_Blk.F.1.estim - Locator = z1
## Column = 29 - Name = DK_Blk.F.2.estim - Locator = z2
## Column = 30 - Name = DK_Blk.F.3.estim - Locator = z3
## Column = 31 - Name = DK_Blk.F.1.stdev - Locator = NA
## Column = 32 - Name = DK_Blk.F.2.stdev - Locator = NA
## Column = 33 - Name = DK_Blk.F.3.stdev - Locator = NA
## NULL
Comparing the point and block estimates and the standard deviations of the estimation errors (this comparison is performed on the first three factors).
p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*1.estim", namey = "DK_Blk*1.estim",
bins = 100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation",
title = "First factor (Gaussian model)")
ggPrint(p)
p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*2.estim", namey = "DK_Blk*2.estim",
bins = 100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation",
title = "Second factor (Gaussian model)")
ggPrint(p)
p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*3.estim", namey = "DK_Blk*3.estim",
bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation",
title = "Third factor (Gaussian model)")
ggPrint(p)
Block kriging can be used to estimate the average of the point selectivity curve over a panel.
XF: the factors should be used instead if DK has to be computed.
# The number of estimated factors is defined by the number of selected variables
# in the input data base
ndisc_P = c(10, 10)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(dbin = data, # Input data set (a Db structure).
dbout = panels, # Output data set (a DbGrid structure)
model = model, neigh, calcul= EKrigOpt_BLOCK(), ndisc = ndisc_P,
namconv = NamingConvention("DK_Blk"))
panels$display()
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 25
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 0.250 0.250
## Number : 5 5
##
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = DK_Blk.Gaussian.Z.estim - Locator = z1
## Column = 3 - Name = DK_Blk.Gaussian.Z.stdev - Locator = NA
## NULL
p = ggplot()
p = p + plot(panels, "DK_Blk.*.estim", flagLegendRaster = TRUE)
p = p + plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Estimate of the first factor (panels)")
ggPrint(p)
p = ggplot()
p = p + plot(panels, "DK_Blk.*.stdev", flagLegendRaster = TRUE)
p = p + plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Std. of the first factor estimate (panels)")
ggPrint(p)
Estimators are tested on the fine grid: the reference simulated values are estimated from the subset of 500 sampled points. Cross plots of the actual values versus the estimated values are drawn and the following scoring rules are evaluated:
Mean error \(ME = \frac{1}{N} \sum_{i = 1}^N (z_i - z_i^*)\)
Mean absolute error \(MAE = \frac{1}{N} \sum_{i = 1}^N |z_i - z_i^*|\)
Root mean square error \(RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^N (z_i - z_i^*)^2}\)
Mean standardized error \(MSE = \frac{1}{N} \sum_{i = 1}^N \frac{z_i - z_i^*}{\sigma_i^*}\)
Mean absolute standardized error \(MASE = \frac{1}{N} \sum_{i = 1}^N |\frac{z_i - z_i^*}{\sigma_i^*}|\)
Root mean square standardized error \(RMSSE = \sqrt{\frac{1}{N} \sum_{i = 1}^N (\frac{z_i - z_i^*}{\sigma_i^*})^2}\)
TODO:
results with the disjunctive kriging are dubious (see the cross plots for Z and the indicator),
k-fold cross validation to be implemented?
# cleaning the target data base
err = grid$deleteColumns("SK.*")
err = grid$deleteColumns("CE.*")
err = grid$deleteColumns("DK.*")
err = grid$deleteColumns("sel-.*")
# Selectivity curve : Z
zcuts <- round(quantile(data$getColumn("Z"), probs = c(0.5, 0.75)),3)
ycuts <- round(anam$rawToGaussianVector(zcuts),3)
Z_est = Selectivity_createByKeys(keys = c("Z"), zcuts=c(0.0),
flag_est=TRUE, flag_std=TRUE)
# Global experimental selectivity, calculated from the experimental data set
grid$setLocator("Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
table = Z_est$eval(grid)
table$setTitle("Selectivity curves computed on fine grid")
## NULL
table
##
## Selectivity curves computed on fine grid
## ----------------------------------------
## Z-Cut T-estim Q-estim B-estim M-estim
## 0.000 1.000 1.561 1.561 1.561
# Simple kriging of Z (the known mean is set to the sample mean)
err = model_raw$delDrift(0)
err = model_raw$setMean(mean(data["Z"]))
data$setLocator("Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, grid, model_raw, neigh,
calcul = EKrigOpt_POINT(),
namconv= NamingConvention("SK"))
# Conditional expectation
err = model_Y$delDrift(0)
err = model_Y$setMean(0.)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, grid, model_Y, neigh,
calcul = EKrigOpt_POINT(),
namconv= NamingConvention("SK"))
err = ConditionalExpectation(db = grid,
anam = anam,
selectivity = Z_est,
name_est="SK.Gaussian.Z.estim",
name_std="SK.Gaussian.Z.stdev",
namconv = NamingConvention("CE", FALSE))
grid$setName("CE.Z-estim", "CE.Z.estim")
## NULL
grid$setName("CE.Z-stdev", "CE.Z.stdev")
## NULL
# Disjunctive kriging
data$setLocator("F.*", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = krigingFactors(dbin = data, # Input data set (a Db structure)
dbout = grid, # Output data set (a DbGrid structure)
model = model, neigh, calcul= EKrigOpt_POINT(),
namconv = NamingConvention("SK"))
# Disjunctive kriging for Z
err = DisjunctiveKriging(db = grid,
anam = anam,
selectivity = Z_est ,
name_est="SK.F.*.estim",
name_std="SK.F.*.stdev",
namconv = NamingConvention("DK", FALSE))
err = grid$setName("DK.Z-estim", "DK.Z.estim")
err = grid$setName("DK.Z-stdev", "DK.Z.stdev")
# Disjunctive kriging for T and Q
Z_est = Selectivity_createByKeys(keys = c("T", "Q"), zcuts=zcuts,
flag_est=TRUE, flag_std=TRUE)
err = DisjunctiveKriging(db = grid,
anam = anam,
selectivity = Z_est ,
name_est="SK.F.*.estim",
name_std="SK.F.*.stdev",
namconv = NamingConvention("DK", FALSE))
grid$deleteColumns("DK.*.qc.*")
## NULL
N = 1000 # number of Hermite polynomials in the expansion
Y = seq(from = -3.5, to = 3.5, length.out = 1000) # discretization of the Gaussian axis
H = matrix(NaN, nrow = length(Y), ncol = N)
for (i in seq_along(Y)) {
H[i,] = hermitePolynomials(Y[i], 1.0, N)
}
# Lognormal
m = 1.0 ; sigma = 1.5
psi.LN = hermiteLognormal(mean = m, sigma = sigma, nbpoly = N)
plot(Y, H %*% psi.LN, type = "l", col = "red",
xlab = "Gaussian value", ylab = "Raw value",
main = paste0("Lognormal m=", m, " and sigma = ", sigma))
lines(Y, m * exp(sigma*Y - 1/2 * sigma^2), col = "blue", lwd = 2, lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"),
col = c("red", "blue"), lty = c(1,2))
# QC of the lognormal decomposition (OK)
# psi.LN.qc = rep(NaN, N)
# psi.LN.qc = m * (-sigma)^(0:(N-1)) / sqrt(factorial(0:(N-1)))
# plot(psi.LN, psi.LN.qc)
# abline(a = 0, b = 1, col = "red")
# Indicator
yc = ycuts[1]
psi.ind = hermiteCoefIndicator(yc = yc, nbpoly = N)
plot(Y, H %*% psi.ind, type = "l", col = "red",
xlab = "Gaussian value", ylab = "Raw value",
main = paste0("Indicator of ", yc))
lines (Y, as.numeric(Y >= yc), col = "blue", lty = 2)
abline(h = c(0,1), col = "gray", lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"),
col = c("red", "blue"), lty = c(1,2))
# QC of the indicator decomposition (OK)
# psi.ind.qc = rep(NaN, N)
# psi.ind.qc[1] = 1 - pnorm(yc)
# psi.ind.qc[-1] = -dnorm(yc)/sqrt(1:(N-1))*hermitePolynomials(yc, 1.0, N-1)
# plot(psi.ind, psi.ind.qc)
# abline(a = 0, b = 1, col = "red")
# Metal
yc = 2.0
psi.metal = hermiteCoefMetal(yc = yc, phi = psi.LN)
plot(Y, H %*% psi.metal, type = "l", col = "red",
xlab = "Gaussian value", ylab = "Raw value",
main = paste0("Metal for cutoff = ", yc))
lines (Y, m * exp(sigma * Y - 1/2 * sigma^2) * as.numeric(Y >= yc),
col = "blue", lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"),
col = c("red", "blue"), lty = c(1,2))
# QC of the metal decomposition (OK)
# psi.metal.qc = matrix(hermiteIncompleteIntegral(yc, N)$getValues(), N, N) %*% psi.LN
# plot(psi.metal, psi.metal.qc)
# abline(a = 0, b = 1, col = "red")
# generic function to test DK: recompute the estimate and its standard deviation
# from the kriged factors (Hn.est, Hn.std) and the coefficients psi, then compare
# with the variables produced by DisjunctiveKriging
compute_DK <- function(psi, name) {
N <- length(psi)
stopifnot(nfactor < N)
nm_estim <- paste0(name, ".estim")
nm_stdev <- paste0(name, ".stdev")
nm_qc_estim <- paste0(name, ".qc.estim")
nm_qc_stdev <- paste0(name, ".qc.stdev")
# DK estimate: psi_0 + sum_{n=1}^{nf} psi_n H_n^SK
grid[nm_qc_estim] <- psi[1] + Hn.est %*% psi[-1][1:nfactor]
# DK error std.: kriging variances of the first nf factors plus the tail variance
grid[nm_qc_stdev] <- sqrt(Hn.std^2 %*% (psi[-1][1:nfactor]^2) + sum(psi[-1][(nfactor+1):(N-1)]^2))
# Cross plots: gstlearn results vs. manual recomputation
p = ggplot()
p = p + plot.correlation(grid, namex = nm_estim, namey = nm_qc_estim,
bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = nm_estim, ylab = nm_qc_estim,
title = "Disjunctive Kriging - Estimation value")
ggPrint(p)
p = ggplot()
p = p + plot.correlation(grid, namex = nm_stdev, namey = nm_qc_stdev,
bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = nm_stdev, ylab = nm_qc_stdev,
title = "Disjunctive Kriging - Standard deviation")
ggPrint(p)
invisible()
}
# QC of the DK
Hn.est <- matrix(grid$getColumns(names = c("SK.F.*.estim"), useSel = FALSE),
ncol = nfactor, nrow = grid$getSampleNumber())
Hn.std <- matrix(grid$getColumns(names = c("SK.F.*.stdev"), useSel = FALSE),
ncol = nfactor, nrow = grid$getSampleNumber())
# estimation of Z using DK
psi <- anam$getPsiHns()
compute_DK(psi = psi, name = "DK.Z")
# estimation of the tonnage Z >= zcuts[1] (i.e. Y >= ycuts[1])
psi <- hermiteCoefIndicator(ycuts[1], anam$getNFactor())
print(paste0("Std(I) = ", round(sqrt(psi[1]*(1 - psi[1])),2)))
## [1] "Std(I) = 0.5"
print(paste0("Std(I)*= ", round(sqrt(sum(psi[-1]^2)),2)))
## [1] "Std(I)*= 0.47"
grid$setName(paste0("DK.T-estim-", zcuts[1]), "DK.T1.estim")
## NULL
grid$setName(paste0("DK.T-stdev-", zcuts[1]), "DK.T1.stdev")
## NULL
compute_DK(psi = psi, name = "DK.T1")
# estimation of the metal above zcuts[1] (i.e. Y >= ycuts[1])
psi <- hermiteCoefMetal(yc = ycuts[1], phi = anam$getPsiHns())
grid$setName(paste0("DK.Q-estim-", zcuts[1]), "DK.Q1.estim")
## NULL
grid$setName(paste0("DK.Q-stdev-", zcuts[1]), "DK.Q1.stdev")
## NULL
compute_DK(psi = psi, name = "DK.Q1")
# Evaluation: restrict the comparison to the targets where the kriging std. dev. exceeds 0.2
sel <- (grid["SK.Z.stdev"] > 0.2)
grid["sel"] <- sel
grid$setLocator("sel", ELoc_SEL(), cleanSameLocator=TRUE)
## NULL
nm <-c("Z",
paste0(c("SK", "DK", "CE"), ".Z.estim"),
paste0(c("SK", "DK", "CE"), ".Z.stdev")
)
# Correlation actual values vs. estimated values
for (est in c("SK", "DK", "CE")) {
p = ggplot()
p = p + plot.correlation(grid, namex=paste0(est, ".Z.estim"), namey="Z",
bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Estimated value", ylab = "Actual value",
title = paste(est, " of raw variable"))
ggPrint(p)
}
# Comparison of the estimators
p = ggplot()
p = p + plot.correlation(grid, namex="CE.Z.estim", namey="SK.Z.estim",
bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Conditional expectation", ylab = "Simple kriging",
title = "Point raw variable")
ggPrint(p)
p = ggplot()
p = p + plot.correlation(grid, namex="DK.Z.estim", namey="SK.Z.estim",
bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Disjunctive Kriging", ylab = "Simple kriging",
title = "Point raw variable")
ggPrint(p)
# Maps of the estimators
for (est in c("SK", "DK", "CE")){
for (type in c("estim", "stdev")){
p = ggplot()
p = p + plot(grid, paste0(est, ".Z.", type),
flagLegendRaster=TRUE, legendNameRaster=type)
p = p + plot.decoration(xlab = "Easting", ylab = "Northing",
title = paste("Estimator = ", est))
ggPrint(p)
if (type == "estim") col_fill = "orange"
if (type == "stdev") col_fill = "skyblue"
p = ggplot()
p = p + plot.hist(grid, name = paste0(est, ".Z.", type), bins = 50,
col = "gray", fill = col_fill)
p = p + plot.decoration(xlab = paste0(est, ".Z.", type),
title = paste("Estimator = ", est))
ggPrint(p)
}
}
# Mono variate statistics of the estimators
knitr::kable(dbStatisticsMono(grid, nm, opers=opers)$toTL(), digits=3,
caption = "Statistics of the estimated values")
|   | Number | Minimum | Maximum | Mean | St. Dev. |
|---|---|---|---|---|---|
| Z | 9500 | 0.154 | 8.198 | 1.562 | 0.846 |
| SK.Z.estim | 9500 | 0.591 | 4.733 | 1.553 | 0.480 |
| DK.Z.estim | 9500 | 0.499 | 4.178 | 1.555 | 0.452 |
| CE.Z.estim | 9500 | 0.510 | 4.349 | 1.554 | 0.457 |
| SK.Z.stdev | 9500 | 0.515 | 0.833 | 0.683 | 0.061 |
| DK.Z.stdev | 9500 | 0.538 | 0.831 | 0.699 | 0.054 |
| CE.Z.stdev | 9500 | 0.206 | 1.374 | 0.679 | 0.194 |
# Scoring rules
scoringRules <- function(db, nm_val, nm_est, nm_std){
val <- matrix(
db$getColumns(names = c(nm_val, nm_est, nm_std), useSel = TRUE),
nrow = db$getActiveSampleNumber(),
ncol = 3, byrow = FALSE)
err <- (val[,1] - val[,2])
nerr <- err / val[,3]
c(
mean(err), mean(abs(err)), sqrt(mean(err^2)),
mean(nerr),mean(abs(nerr)), sqrt(mean(nerr^2))
)
}
res <- matrix(c(
scoringRules(grid, "Z", "SK.Z.estim", "SK.Z.stdev"),
scoringRules(grid, "Z", "DK.Z.estim", "DK.Z.stdev"),
scoringRules(grid, "Z", "CE.Z.estim", "CE.Z.stdev")),
nrow = 3, ncol = 6, byrow = TRUE)
colnames(res) <- c("ME", "MAE", "RMSE", "MSE", "MASE", "RMSSE")
rownames(res) <- c("SK", "DK", "CE")
knitr::kable(res, digits=4,
caption = "Scores of point estimators of Z using 500 data points")
|   | ME | MAE | RMSE | MSE | MASE | RMSSE |
|---|---|---|---|---|---|---|
| SK | 0.0091 | 0.4849 | 0.6726 | 0.0157 | 0.7079 | 0.9784 |
| DK | 0.0075 | 0.4818 | 0.6712 | 0.0131 | 0.6868 | 0.9539 |
| CE | 0.0085 | 0.4804 | 0.6691 | -0.0072 | 0.6980 | 0.9122 |
TODO (XF, 2023-01-31): to be completed.
K <- 10 # Number of folds
res_xval <- data$clone() # duplicate the data base
# cleaning the target data base
res_xval$deleteColumns("SK.*")
res_xval$deleteColumns("CE.*")
res_xval$deleteColumns("DK.*")
res_xval$deleteColumns("code")
# definition of the folds
res_xval["code"] <- sample(x = 1:K, size = res_xval$getSampleNumber(), replace = TRUE)
res_xval$setLocator("code", ELoc_C(), cleanSameLocator=TRUE)
# generic function for k-fold cross validation
# inputs:
# - db: the data base, with a "code" variable defining the folds
# - fn_estim: an R function which defines the interpolation process; its prototype is err = fn_estim(dbin, dbout)
# - namconv: the naming convention
# outputs:
# - err: error code
# - db: the computed variables (estimated value and estimation Std.) are added to db
kfold_compute <- function(db, fn_estim, namconv = NamingConvention("kfold")){
# TODO
err = 0
err
}
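A possible implementation is sketched below. It is only a hedged proposal (the splitting via selections and the names of the output variables are assumptions): each fold is estimated in turn from the samples of the other folds, and the results are collected back into db.
kfold_compute = function(db, fn_estim, namconv = NamingConvention("kfold")) {
codes = db$getColumn("code")
n = length(codes)
est = rep(NA_real_, n)
std = rep(NA_real_, n)
for (k in sort(unique(codes))) {
train = db$clone()
test = db$clone()
train["insel"] = as.numeric(codes != k) # train on the other folds
test["insel"] = as.numeric(codes == k) # estimate fold k only
train$setLocator("insel", ELoc_SEL(), cleanSameLocator=TRUE)
test$setLocator("insel", ELoc_SEL(), cleanSameLocator=TRUE)
err = fn_estim(train, test)
if (err != 0) return(err)
# assumption: fn_estim writes "kfold.estim" / "kfold.stdev" into its target;
# note: getColumn conventions w.r.t. active selections may require adjustment
est[codes == k] = test$getColumn("kfold.estim")[codes == k]
std[codes == k] = test$getColumn("kfold.stdev")[codes == k]
}
db["kfold.estim"] = est
db["kfold.stdev"] = std
0
}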
# the evaluation of the estimator is:
# 1) define the fn_estim
# 2) compute the cross validation using the *kfold_compute* function
# 3) compute the scoring using the *scoringRules* function
# 4) display the results *knitr::kable(...))*
# TODO: not completed
Still to be done:
implement the cross validation and compare with the recoverable resources computed on the initial simulation,
use the results of DK, CE and UC to define recoverable resources over a polygon.