Introduction

This case study is meant to demonstrate how to use gstlearn for non-linear geostatistics with the Gaussian model.

We consider three supports:

The variable of interest \(z\) is defined over the domain \(S \subset \mathbb{R}^d\) with \(d = 2\). The domain is uniformly divided into panels and each panel is regularly subdivided into blocs, hence defining two regular grids, the grid of panels and the grid of blocs.

Hence, we will have a data set and two grids.

The regionalised variable \(z(x)\) for \(x \in D \subset \mathbb{R}^d\) is modeled as a transform of the stationary Gaussian Random Function, \(Z(x) = \phi(Y(x))\) where

\(E\{Y(x)\} = 0\) and \(Cov\{Y(x), Y(x+h)\} = \rho_Y(h) = 1 -\gamma_Y (h)\)

As \(Var(Z) < +\infty\), \(\phi \in L^2(g)\) and it can be expressed as a linear combination of Hermite polynomials \(\phi(y) = \sum_{n = 1}^{+\infty} \phi_n H_n(y)\).

Non linear Geostatistics implements non linear estimators of non linear transforms of the variable of interest \(Z\). Two issues are addressed, the selection and the change of support, with the following tasks common in geosciences (i.e., in a mining context or for environmental studies),

For example, recovered mineral resources will be defined by the selected ore at a given cutoff, \(T(z_c) = 1_{Z \geq z_c}\), and the metal contained in the selected ore, \(Q(z_c) = Z \times 1_{Z \geq z_c}\).

The average value of the volume \(v\) is noted \(Z(v) = \frac{1}{|v|} \int_v Z(u) du\).

A first task is to predict marginal distribution of \(Z(v)\), or the histogram, knowing the histogram of \(Z\) and its spatial structure. A second task is to predict the conditional distribution of \(Z(v)\) given the spatial the prior spatial model and some observations. The first question is referred to as the global recoverable resources, and the second one as the local recoverable resources.

Three estimators will be illustrated:

EC and DK can be used to evaluate point recovery functions at a non observed point \(x_0\), \(1_{Z(x_0) \geq z_c}\) and \(Z(x_0) \times 1_{Z(x_0) \geq z_c}\). They can also be used to evaluate bloc recovery functions for a block \(v\), \(1_{Z(v) \geq z_c}\) and \(Z(v) \times 1_{Z(v) \geq z_c}\). DK can also evaluate recovery functions average on a bigger support, e.g. \(\frac{1}{N} \sum_{i=1}^{N} 1_{Z(v_i) \geq z_c}\) is the recovered ore on the panels \(V = \cup_{i=1}^{N}v_i\).

UC computes block recovery functions averaged on a panel.

Two change of support models are available for a Gaussian model, they are noted respectively \(DGM-1\) and \(DGM-2\).

Initially, we generate a realization of the variable on a fine grid, modeling the regionalized variable defined on the point support. This realization, or simulation, is sampled to defined (\(np\) points uniformly sampled on the domain) the input data set.

Initialisation

We will use the Geostatistics library gstlearn.

# Set the Seed for the Random Number generator
law_set_random_seed(32131)
## NULL
# Defining the format for dumping statistics of variables in a Db
dbfmt = DbStringFormat()
dbfmt$setFlags(flag_resume=TRUE,flag_vars=FALSE,flag_locator=TRUE)
## NULL
# Defining global options
flag.debug = FALSE

Setting the trace: this option allows dumping all the elements to check calculations when the rank of the target is equal to 1 (first block, first panel, …). This option is important for debugging but it creates lots of printout.

OptDbg_reset()
## NULL
if (flag.debug) OptDbg_setReference(1)

Defining color scales common to all future representations of estimated quantities:

ZEstMin = -3.
ZEstMax = +3.
TEstMin = 0.
TEstMax = 1.
TStdMin = 0.
TStdMax = 1.
QEstMin = 0.
QEstMax = 3.
QStdMin = 0.
QStdMax = 2.

Generate initial grids

Three grids are defined:

  • The grid of the samples. It representes a reference realization of the regionalized variable. This realization is sampled to define the data set of the observations. This data set is then used to evaluate the ability of the non linear technique to reproduce the selectivity curves computed on the reference simulation.

  • The grid of the panels

  • The grid of the blocs

# grid of samples
nx_S = c(100,100)
dx_S = c(0.01, 0.01)
# grid of panels
dx_P = 0.25
# grid of blocs
nx_B = 5
dx_B = dx_P / nx_B

# Generate initial grid
grid = DbGrid_create(nx = nx_S, dx = dx_S)
grid$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 3
## Total number of samples      = 10000
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.010     0.010
## Number :        100       100
## 
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## NULL
# Create grid of panels covering the simulated area
panels = DbGrid_createCoveringDb(grid, dx=c(dx_P,dx_P))
panels$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 2
## Total number of samples      = 25
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.250     0.250
## Number :          5         5
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL
# Discretization with a grid of blocks which covers the simulated area
blocs = DbGrid_createCoveringDb(grid, dx=c(dx_B,dx_B))
blocs$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 2
## Total number of samples      = 441
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.050     0.050
## Number :         21        21
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## NULL

Simulation of the data set

A lognormal model is defined and a simulation is performed on the grid of samples.

The regionalized variable \(z(x)\) is modeled using a lognormal model \(Z(x) = m \times e^{\sigma \, Y(x) - 1/2 \sigma^2}\) with

  • \(Y\) a stationary Gaussian Random Function with an exponential variogram with a range equal to \(0.1\) and a sill equal to \(1.0\). The mean of \(Y\) is null.

  • The lognormal transform is parametrized by its mean \(m\) and the dispersion coefficient \(\sigma\). The first two moments are:

    • Mean \(E\{Z\} = m\)
    • Variance \(Var\{Z\} = m^2 (e^{\sigma^2}-1)\)
# Simulation of the Gaussian variable
model_init = Model_createFromParam(ECov_EXPONENTIAL(), range=0.1, sill=1.)
err = simtub(NULL, grid, model_init, namconv=NamingConvention("Y"))
# Nonlinear transform (lognormal)
m_Z = 1.5
s_Z = 0.5
grid["Z"] = m_Z * exp(s_Z * grid["Y"] - 0.5*s_Z^2)

p = ggplot()
p = p + plot.grid(grid, "Z")
ggPrint(p)

p = ggplot()
p = p + plot.hist(grid, name = "Y", bins=100, col='gray', fill='skyblue')
p = p + plot.decoration(xlab = "Y", title = "Simulated samples (Y variable)")
ggPrint(p)

p = ggplot()
p = p + plot.hist(grid, name = "Z", bins = 100, col = "gray", fill = "orange")
p = p + plot.decoration(xlab = "Z", title = "Simulated samples (Z variable)")
ggPrint(p)

opers = EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
stats = dbStatisticsMono(grid, c("Y","Z"), opers=opers)
knitr::kable(stats$toTL(), digits=2,
    caption = "Statistics of the simulated variables on the point support")
Statistics of the simulated variables on the point support
Number Minimum Maximum Mean St. Dev.
Y 10000 -4.30 3.65 0.07 1.01
Z 10000 0.15 8.20 1.56 0.85

Data extraction

We create a new data set by extracting few samples from the previous grid.

np = 500
data = Db_createSamplingDb(grid, number=np, names=c("x1","x2","Y","Z"))
data$setLocator("Z", ELoc_Z())
## NULL
data$display()
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of Columns            = 5
## Total number of samples      = 500
## 
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Y - Locator = NA
## Column = 4 - Name = Z - Locator = z1
## NULL
p = ggplot()
p = p + plot.point(data, nameColor = "Z", nameSize = "Z", flagLegendSize = TRUE)
ggPrint(p)

p = ggplot()
p = p + plot.hist(data, name = "Y", bins = 100, col = "gray", fill = "skyblue")
p = p + plot.decoration(xlab = "Y", title = "Sampled data (Y variable)")
ggPrint(p)

p = ggplot()
p = p + plot.hist(data, name = "Z", bins = 100, col = "gray", fill = "orange")
p = p + plot.decoration(xlab = "Z", title = "Sampled data (Z variable)")
ggPrint(p)

stats = dbStatisticsMono(data, c("Y","Z"), opers=opers)
knitr::kable(stats$toTL(), digits=2, 
             caption = "Statistics of the sampled data on the point support")
Statistics of the sampled data on the point support
Number Minimum Maximum Mean St. Dev.
Y 500 -3.42 2.99 0.03 1.04
Z 500 0.24 5.92 1.54 0.84
varZ <- stats$getValue(1,4)^2

Gaussian Anamorphosis with 20 coefficients

anam = AnamHermite_create(nbpoly=20)
err = anam$fit(data, "Z")
anam
## 
## Hermitian Anamorphosis
## ----------------------
## Minimum absolute value for Y  = -2.8
## Maximum absolute value for Y  = 3.9
## Minimum absolute value for Z  = 0.274228
## Maximum absolute value for Z  = 5.99041
## Minimum practical value for Y = -2.8
## Maximum practical value for Y = 3.1
## Minimum practical value for Z = 0.274228
## Maximum practical value for Z = 5.86055
## Mean                          = 1.53861
## Variance                      = 0.709613
## Number of Hermite polynomials = 20
## Normalized coefficients for Hermite polynomials (punctual variable)
##                [,  0]    [,  1]    [,  2]    [,  3]    [,  4]    [,  5]    [,  6]
##      [  0,]     1.539    -0.792     0.273    -0.061    -0.009     0.032    -0.031
##      [  7,]     0.010     0.018    -0.021    -0.001     0.016    -0.009    -0.007
##      [ 14,]     0.012     0.000    -0.012     0.004     0.010    -0.006

Selectivity curves

We focus on Tonnage (T) and Metal Quantity (Q) for few cutoffs (Zcuts).

Zcuts <- c(0.0, 0.5, 0.75, 1.0)
selectivity = Selectivity_createByKeys(
  keys = c("T", "Q"), zcuts=Zcuts,
  flag_est=TRUE, flag_std=TRUE)

# Global experimental selectivity, calculated form the experimental Data Set
table = selectivity$eval(data)
table$setTitle("Selectivity curves computed on data set")
## NULL
table
## 
## Selectivity curves computed on data set
## ---------------------------------------
##       Z-Cut    T-estim    Q-estim    B-estim    M-estim
##       0.000      1.000      1.539      1.539      1.539
##       0.500      0.972      1.527      1.041      1.571
##       0.750      0.858      1.453      0.810      1.694
##       1.000      0.718      1.330      0.612      1.853
# Selectivity in the model, derived from the parameters contained in the Anamorphosis
table = selectivity$evalFromAnamorphosis(anam)
table$setTitle("Selectivity curves computed on anamorphosis")
## NULL
table
## 
## Selectivity curves computed on anamorphosis
## -------------------------------------------
##       Z-Cut    T-estim    Q-estim    B-estim    M-estim
##       0.000      1.000      1.539      1.539      1.539
##       0.500      0.972      1.527      1.041      1.570
##       0.750      0.867      1.461      0.810      1.684
##       1.000      0.716      1.327      0.611      1.853

Transform Data into Gaussian variable

data["Gaussian.Z"] <- VectorHelper_normalScore(data["Z"])
p = ggplot()
p = p + plot.hist(data, name = "Gaussian.Z", bins = 25, col = "gray", fill = "yellow") 
p = p + plot.decoration(xlab = "Normal score")
ggPrint(p)

data$display()
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of Columns            = 6
## Total number of samples      = 500
## 
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Y - Locator = NA
## Column = 4 - Name = Z - Locator = z1
## Column = 5 - Name = Gaussian.Z - Locator = NA
## NULL

Variography

Define the variogram calculation parameters:

varioparam = VarioParam_createOmniDirection(npas=10, dpas=0.025)

Variography of the raw variable

Define the variogram calculation parameters:

# Computing the experimental variogram
err = data$setLocator("Z", ELoc_Z())
vario_raw = Vario_computeFromDb(varioparam, db=data)

# Fitting the variogram model on the experimental variogram
model_raw = Model_create()
err = model_raw$fit(vario_raw, 
                    types = ECov_fromKeys(c("NUGGET", "EXPONENTIAL", "EXPONENTIAL")), 
                    constraints=Constraints(varZ))
model_raw$display()
## 
## Model characteristics
## =====================
## Space dimension              = 2
## Number of variable(s)        = 1
## Number of basic structure(s) = 2
## Number of drift function(s)  = 0
## Number of drift equation(s)  = 0
## 
## Covariance Part
## ---------------
## Nugget Effect
## - Sill         =      0.114
## Exponential
## - Sill         =      0.597
## - Range        =      0.124
## - Theo. Range  =      0.041
## Total Sill     =      0.711
## NULL
p = ggplot()
p = p + plot.varmod(vario_raw, model_raw)
p = p + plot.decoration(title = "Raw variable")
ggPrint(p)

Variography of the Gaussian variable

# Computing of the experimental variogram
err = data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
vario = Vario_computeFromDb(varioparam, data)

# Fitting the Model on the experimental variogram with a sill equal to one.
model = Model()
err = model$fit(vario, 
                types = ECov_fromKeys(c("EXPONENTIAL", "EXPONENTIAL")), 
                constraints=Constraints(1.))
model$display()
## 
## Model characteristics
## =====================
## Space dimension              = 2
## Number of variable(s)        = 1
## Number of basic structure(s) = 2
## Number of drift function(s)  = 0
## Number of drift equation(s)  = 0
## 
## Covariance Part
## ---------------
## Exponential
## - Sill         =      0.234
## - Range        =      0.017
## - Theo. Range  =      0.006
## Exponential
## - Sill         =      0.766
## - Range        =      0.140
## - Theo. Range  =      0.047
## Total Sill     =      1.000
## NULL
p = ggplot()
p = p + plot.varmod(vario, model)
p = p + plot.decoration(title = "Gaussian variable")
ggPrint(p)

model_Y = model$clone()

err = model$setAnam(anam)

Checking the experimental variogram of the Raw variable against its Model derived from the Model of the Gaussian transform.

model$setActiveFactor(-1)
## NULL
p = ggplot()
p = p + plot.varmod(vario_raw, model)
p = p + plot.decoration(title = "Raw variable (model of the Gaussian)")
ggPrint(p)

model$setActiveFactor(0)
## NULL

Creating a Moving Neighborhood

nmini = 5
nmaxi = 5
radius = 1.
neigh = NeighMoving_create(nmaxi=nmaxi, radius=radius, nmini=nmini)
neigh
## 
## Moving Neighborhood
## ===================
## Minimum number of samples           = 5
## Maximum number of samples           = 5
## Maximum horizontal distance         = 1

Nonlinear estimates with the Gaussian model

A unified workflow for the nonlinear techniques available with the Gaussian model is proposed below. The estimators are: the conditional expectation (CE), the disjunctive kriging (DK), and the uniform conditioning (UC).

The workflow is:

  1. Pre-processing of the input dataset using an anamorphosis. It is required for CE and DK.

    • CE: computation of the Gaussian values from the raw values

    • DK: computation of the values of the Hermite polynomials from the raw values

    • UC: No pre-processing is required.

  2. Computing the kriging

    • CE: computation of simple kriging of the Gaussian variable (estimation value and standard deviation). The standard function kriging is used. Ordinary kriging may be used.

    • DK: computation of the kriging of the Hermite polynomials from the raw values. The function krigingFactors is used to loop over the factors and define the proper covariance function.

    • UC: computation of ordinary/simple kriging of the raw variable (estimation value and variance of the estimator). The standard function kriging is used.

    Here, we have

    kriging(data, target, model, ….), or krigingFactors(data, target, model, …)

  3. Post-processing to calculate the estimator of the selectivity curves or any nonlinear transform from the kriging of the factors or Gaussian variable.

    • CE: dedicated function ConditionExpectation(db, anam, selectivity, …)

    • DK: dedicated function DisjunctiveKriging(db, anam, selectivity, …)

    • UC: dedicated function UniformConditioning(db, anam, selectivity, …)

Two versions are now considered: the point estimate and the block estimate.

Point estimate

Two methods are presented:

Conditional Expectation

The Conditional Expectation is used to estimate selectivity curves for a point support at the grid nodes (i.e. the center of the blocs).

Preprocessing

The Gaussian variable has been computed earlier.

Kriging of the Gaussian variable

Simple Kriging is used to estimate the point value of the Gaussian variable at the grid nodes (point support). The known mean is equal to zero.

err = model$delDrift(0)
err = model$setMean(0.)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, blocs, model, neigh, 
              calcul = EKrigOpt_POINT(),
              namconv= NamingConvention("G_Pts"))

p = ggplot()
p = p + plot(blocs, nameRaster="G_Pts.Gaussian.Z.estim", 
         flagLegendRaster=TRUE, legendNameRaster = "Estimation")
p = p + plot.decoration(title = "Punctual Simple Kriging of Y")
ggPrint(p)

p = ggplot()
p = p + plot(blocs, nameRaster="G_Pts.Gaussian.Z.stdev", 
         flagLegendRaster=TRUE, legendNameRaster = "Std. Dev.")
p = p + plot.decoration(title = "Punctual Simple Kriging of Y")
ggPrint(p)

Calculating the conditional expectation on blocs

Kriging has been computed, the back transform is computed for a punctual suport:

\[ \psi(Y(x_0))^{CE} = \int \psi(y(x_0)^{SK}+\sigma_{SK} \, u)g(u)du \] It is possible to check the computed values for ore:

\[ (1_{Y(x_0)\geq y_c})^{CE} = 1 - G(\frac{y_c - y(x_0)^{SK}}{\sigma_{SK}}) \]

For change of support calculation DGM kriging (krigdgm) should be used. To be checked ?

err = ConditionalExpectation(db = blocs, 
                             anam = anam, 
                             selectivity = selectivity, 
                             name_est="G_Pts*estim", 
                             name_std="G_Pts*stdev",
                             namconv = NamingConvention("Pts_Recovery", FALSE))

blocs$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 20
## Total number of samples      = 441
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.050     0.050
## Number :         21        21
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = z1
## NULL
# Graphics
# Metal (the anamorphosis is used)
p = ggplot()
p = p + plot(blocs, nameRaster="Pts_Recovery.Q-estim-0",
         flagLegendRaster=TRUE, legendNameRaster= "Q(0.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)

# Tonnage (the anamorphosis is not used)
p = ggplot()
p = p + plot(blocs, nameRaster="Pts_Recovery.T-estim-0",
         flagLegendRaster = TRUE, legendNameRaster = "T(0.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)

p = ggplot()
p = p + plot(blocs, nameRaster="Pts_Recovery.T-estim-1",
     flagLegendRaster=TRUE, legendNameRaster = "T(1.0)")
p = p + plot.decoration(title = "Point conditional expectation")
ggPrint(p)

Values computed by the anamorphosis are invalid…. Est-ce toujours vrai (DR) ? Si non, supprimer le paragraphe QC suivant.

# QC of Conditional Expectation
Ycuts <- anam$rawToGaussianVector(Zcuts)
indsel = blocs["G_Pts.Gaussian.Z.stdev"] <= 0.

# Tonnage à zc = 1.0
blocs["T_1"] <- 1 - VectorHelper_pnormVec((Ycuts[4] - blocs["G_Pts.Gaussian.Z.estim"]) /
                                           blocs["G_Pts.Gaussian.Z.stdev"])
blocs["T_1"][indsel] = NA

p = ggplot()
p = p + plot.correlation(blocs, namex="Pts_Recovery.T-estim-1", namey="T_1", 
                         bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Computed value", ylab = "Theoretical value",
                title = "Conditional Expectation: T(1.0)")
ggPrint(p)

print(paste0(">>> Sum of differences for T(1.00): ",
             sum(blocs["T_1"] != blocs["Pts_Recovery.T-estim-1"], 
                 na.rm=TRUE)))
## [1] ">>> Sum of differences for T(1.00): 0"
# Tonnage à zc = 0.75

blocs["T_0.75"] <- 1 - VectorHelper_pnormVec((Ycuts[3] -
                   blocs["G_Pts.Gaussian.Z.estim"]) / blocs["G_Pts.Gaussian.Z.stdev"])
blocs["T_0.75"][indsel] = NA

p = ggplot()
p = p + plot.correlation(blocs, namex="Pts_Recovery.T-estim-0.75", namey="T_0.75", 
                        bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Computed value", ylab = "Theoretical value",
                title = "Conditional Expectation: T(0.75)")
ggPrint(p)

print(paste0(">>> Sum of differences for T(0.75): ",
             sum(blocs["T_0.75"] != blocs["Pts_Recovery.T-estim-0.75"],
                 na.rm=TRUE)))
## [1] ">>> Sum of differences for T(0.75): 0"

Disjunctive Kriging with the Gaussian model

Preprocessing

#mm <- model$clone()
# model of the Gaussian variable
p = ggplot()
p = p + plot.varmod(vario, model)
p = p + plot.decoration(title = "Variogram of the Gaussian variable")
ggPrint(p)

# attach the anamorphosis to the model
err = model$setAnam(anam)

# model of the raw variable as the anamorphosis is attached to the model
err = model$setActiveFactor(-1)
p = ggplot()
p = p + plot.varmod(vario_raw, model)
p = p + plot.decoration(title = "Variogram of the raw variable")
ggPrint(p)

err = model$setActiveFactor(0)

Computing the point factors (i.e., the values of the Hermite polynomials). The following plot gives the proportion of the variance explained by the factors. It indicates the relevant number of factors that should be considered afterwards.

plot(1:(anam$getNFactor()-1),anam$cumulateVarianceRatio(1.), 
     ylim = c(0,1),
     xlab = "Number of factors", ylab = "Variance proportion",
     type = "b", pch = 19, col = "gray")

Plot the variogram model of the factors (rho^n in the Gaussian model)

hmax = 0.2
nf = anam$getNFactor()
asCov = FALSE # Pb. if asCov = FALSE (should be 1 - rho^n and not (1-rho)^n)

# model of the factors
p = ggplot()
for (i in 1:nf) 
{
  model$setActiveFactor(i)
  p <- p + plot.model(model, hmax=hmax, asCov = asCov)
}
p = p + geom_hline(yintercept = 1.0, colour = "red")
p = p + plot.decoration(title = "Covariance of the different factors")
ggPrint(p)

# model of the raw variable
err = model$setActiveFactor(-1)
p = ggplot()
p = p + plot.model(model, hmax=hmax, asCov = asCov)
p = p + geom_hline(yintercept = varZ, colour = "red")
p = p + plot.decoration(title = "Covariance of the raw variable")
ggPrint(p)

Computation of the factors: H1, H2 et H3… They correspond to the Hermite polynomials. Their presence will dictate the number of terms used in the Hermite expansion in subsequent calculations.

nfactor = 3
err = anam$rawToFactor(db = data, nfactor = nfactor,
                       namconv = NamingConvention("F", FALSE))

Simple Point Kriging over the blocs

What is performed? Compute the (simple) kriging of the factors. Simple Kriging is used and point or block estimate can be used as kriging is a linear operator.

err = krigingFactors(dbin   = data,   # Input data set (a Db structure)
                     dbout  = blocs,  # Output data set (a DbGrid structure)
                     model  = model, neigh, calcul= EKrigOpt_POINT(),
                     namconv = NamingConvention("DK_Pts"))
blocs$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 28
## Total number of samples      = 441
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.050     0.050
## Number :         21        21
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = NA
## Column = 20 - Name = T_1 - Locator = NA
## Column = 21 - Name = T_0.75 - Locator = NA
## Column = 22 - Name = DK_Pts.F.1.estim - Locator = z1
## Column = 23 - Name = DK_Pts.F.2.estim - Locator = z2
## Column = 24 - Name = DK_Pts.F.3.estim - Locator = z3
## Column = 25 - Name = DK_Pts.F.1.stdev - Locator = NA
## Column = 26 - Name = DK_Pts.F.2.stdev - Locator = NA
## Column = 27 - Name = DK_Pts.F.3.stdev - Locator = NA
## NULL

Consistency between the simple kriging of the Gaussian variable and the kriging of F.1. As F.1 = -Y, its kriging should follow (F.1)^{SK} = (-Y)^{SK}

# H_1 = -Y
p = ggplot()
p = p + plot.correlation(blocs, 
                 namex = "G_Pts.Gaussian.Z.estim",
                 namey = "DK_Pts.F.1.estim", 
                 bins=100, flagDiag = FALSE)
p = p + geom_abline(intercept = 0, slope = -1, colour = "red")
p = p + plot.decoration(xlab = "Gaussian variable",
                ylab = "H_1 = -Y",
                title = "Values of the Simple Kriging")
ggPrint(p)

p = ggplot()
p = p + plot.correlation(blocs, 
                 namex = "G_Pts.Gaussian.Z.stdev",
                 namey = "DK_Pts.F.1.stdev", 
                 bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Gaussian variable",
                ylab = "H_1 = -Y",
                title = "Standard Deviation of the Simple Kriging error")
ggPrint(p)

Computation of the disjunctive kriging

The final aim is to estimate a nonlinear function of the Gaussian variable \[\psi(Y(x)) = \sum_{n=0}^{nf} \psi_n \, H_n(Y(x))\] As kriging is a linear operator and the Hermite polynomials are orthogonal, \[(\psi(Y(x))^{DK} = \sum_{n=0}^{nf} \psi_n \, H_n^{SK}(Y(x))\] and \[Var\{\psi(Y(x)) - \psi(Y(x))^{DK}\} = \sum_{n=1}^{nf} \psi_n^2 \, \sigma^2_{SK}(n)+ \sum_{n=nf+1}^{+\infty} \psi_n^2= Var\{\psi(Y(x))\}-\sum_{n=1}^{nf} \psi_n^2 \, (1-\sigma^2_{SK}(n))\]

We have first to calculate the Hermite coefficients of the selectivity curve or the nonlinear transform to be evaluated and then compute the linear combinations.

  • Raw variable, \(\psi(y) = \phi(y)\)

    Hence, \(\psi_n = \phi_n\)

  • Recovered ore, or the indicator above the cutoff, \(\psi(y) = 1_{y \geq y_c}\),

    Hence, \(\psi_0 = 1- G(y_c)\) and \(\psi_n = -\frac{g(y_c)}{\sqrt{n}} H_{n-1}(y_c)\) for \(n>0\).

  • Recovered metal above the cutoff, \(\psi(y) = \phi(y)\times 1_{y \geq y_c}\),

    Hence, \(\psi_n = \sum_{p \geq 0} \phi_p \, \alpha_{n,p}(y_c)\) with \(\alpha_{n,p}(y_c) = \int_{y_c}^{+\infty} H_n(u)H_p(u) g(u)du\).

We should also be able to compute non linear transforms such as:

  • The value over a limit, \(\psi(y) = \max (y_c, y)\),

    Hence, \(\psi_0 = g(y_c) + y_c\, G(y_c)\), \(\psi_1 = G(y_c)-1\), and \(\psi_n = \frac{g(y_c)}{\sqrt{n\times(n-1)}} H_{n-2}(y_c)\) for \(n > 1\).

  • The lognormal transform, \(\psi(y) = m \times e^{\sigma\, y - \frac{1}{2} \sigma^2}\)

    Hence, \(\psi_n = m \, e^{-\frac{1}{2}\sigma^2} \, \frac{(-\sigma)^n}{\sqrt{n!}}\) for \(n \geq 0\).

  • etc.

err = DisjunctiveKriging(db = blocs, 
                         anam = anam, 
                         selectivity = selectivity, 
                         name_est="DK_Pts.F.*.estim", 
                         name_std="DK_Pts.F.*.stdev",
                         namconv = NamingConvention("DK_Pts.Recovery", FALSE))

Simple Block Kriging over the blocs

Can be used to estimate the average of the point selectivity curve on a block, e.g. \[ (\frac{1}{|v|}\int_v 1_{Y(u) \geq y_c} du)^{DK} = \sum_{n=0}^{nf} \psi_n \, \{\frac{1}{|v|}\int_v H_n(Y(u)) du\}^{SK} \] The post-processing is identical using DisjunctiveKriging.

# The number of estimated factors is defined by the number of selected variables 
# in the input data base
ndisc_B = c(5, 5)
err = krigingFactors(dbin   = data,   # Input data set (a Db structure)
                     dbout  = blocs,  # Output data set (a DbGrid structure)
                     model  = model, neigh, calcul= EKrigOpt_BLOCK(), ndisc = ndisc_B,
                     namconv = NamingConvention("DK_Blk"))
blocs$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 34
## Total number of samples      = 441
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.050     0.050
## Number :         21        21
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = G_Pts.Gaussian.Z.estim - Locator = NA
## Column = 3 - Name = G_Pts.Gaussian.Z.stdev - Locator = NA
## Column = 4 - Name = Pts_Recovery.T-estim-0 - Locator = NA
## Column = 5 - Name = Pts_Recovery.T-estim-0.5 - Locator = NA
## Column = 6 - Name = Pts_Recovery.T-estim-0.75 - Locator = NA
## Column = 7 - Name = Pts_Recovery.T-estim-1 - Locator = NA
## Column = 8 - Name = Pts_Recovery.T-stdev-0 - Locator = NA
## Column = 9 - Name = Pts_Recovery.T-stdev-0.5 - Locator = NA
## Column = 10 - Name = Pts_Recovery.T-stdev-0.75 - Locator = NA
## Column = 11 - Name = Pts_Recovery.T-stdev-1 - Locator = NA
## Column = 12 - Name = Pts_Recovery.Q-estim-0 - Locator = NA
## Column = 13 - Name = Pts_Recovery.Q-estim-0.5 - Locator = NA
## Column = 14 - Name = Pts_Recovery.Q-estim-0.75 - Locator = NA
## Column = 15 - Name = Pts_Recovery.Q-estim-1 - Locator = NA
## Column = 16 - Name = Pts_Recovery.Q-stdev-0 - Locator = NA
## Column = 17 - Name = Pts_Recovery.Q-stdev-0.5 - Locator = NA
## Column = 18 - Name = Pts_Recovery.Q-stdev-0.75 - Locator = NA
## Column = 19 - Name = Pts_Recovery.Q-stdev-1 - Locator = NA
## Column = 20 - Name = T_1 - Locator = NA
## Column = 21 - Name = T_0.75 - Locator = NA
## Column = 22 - Name = DK_Pts.F.1.estim - Locator = NA
## Column = 23 - Name = DK_Pts.F.2.estim - Locator = NA
## Column = 24 - Name = DK_Pts.F.3.estim - Locator = NA
## Column = 25 - Name = DK_Pts.F.1.stdev - Locator = NA
## Column = 26 - Name = DK_Pts.F.2.stdev - Locator = NA
## Column = 27 - Name = DK_Pts.F.3.stdev - Locator = NA
## Column = 28 - Name = DK_Blk.F.1.estim - Locator = z1
## Column = 29 - Name = DK_Blk.F.2.estim - Locator = z2
## Column = 30 - Name = DK_Blk.F.3.estim - Locator = z3
## Column = 31 - Name = DK_Blk.F.1.stdev - Locator = NA
## Column = 32 - Name = DK_Blk.F.2.stdev - Locator = NA
## Column = 33 - Name = DK_Blk.F.3.stdev - Locator = NA
## NULL

Comparing Point and Block estimations and standard deviation of Estimation errors (this comparison is performed on the results of the first factors only).

p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*1.estim", namey = "DK_Blk*1.estim",
                     bins = 100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation", 
                title = "First factor (Gaussian model)")
ggPrint(p)

p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*2.estim", namey = "DK_Blk*2.estim",
                     bins = 100, flagDiag=TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation", 
                 title = "Second factor (Gaussian model)")
ggPrint(p)

p = ggplot()
p = p + plot.correlation(blocs, namex = "DK_Pts*3.estim", namey = "DK_Blk*3.estim",
                 bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Point estimation", ylab = "Block estimation", 
                 title = "Third factor (Gaussian model)")
ggPrint(p)

Simple Block Kriging over the panels

Can be used to estimate the average of the point selectivity curve on a panel.

XF: factors should be used instead if DK has to be computed.

# The number of estimated factors is defined by the number of selected variables 
# in the input data base
ndisc_P = c(10, 10)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(dbin   = data,   # Input data set (a Db structure).
              dbout  = panels, # Output data set (a DbGrid structure)
              model  = model, neigh, calcul= EKrigOpt_BLOCK(), ndisc = ndisc_P,
              namconv = NamingConvention("DK_Blk"))
panels$display()
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 4
## Total number of samples      = 25
## 
## Grid characteristics:
## ---------------------
## Origin :      0.000     0.000
## Mesh   :      0.250     0.250
## Number :          5         5
## 
## Variables
## ---------
## Column = 0 - Name = x1 - Locator = x1
## Column = 1 - Name = x2 - Locator = x2
## Column = 2 - Name = DK_Blk.Gaussian.Z.estim - Locator = z1
## Column = 3 - Name = DK_Blk.Gaussian.Z.stdev - Locator = NA
## NULL
p = ggplot()
p = p + plot(panels, nameRaster="DK_Blk.*.estim", flagLegendRaster = TRUE) 
p = p + plot.decoration(xlab = "Easting", ylab = "Northing", 
            title = "Estimate of the first factor (panels)")
ggPrint(p)

p = ggplot()
p = p + plot(panels, nameRaster="DK_Blk.*.stdev", flagLegendRaster = TRUE)
p = p + plot.decoration(xlab = "Easting", ylab = "Northing", 
              title = "Std. of the first factor estimate (panels)")
ggPrint(p)

Cross validation

Estimators are tested on the fine grid: the reference simulated values are estimated using the subset of 500 random points. Cross plot of the actual value versus the estimated value are plotted and scoring rules evaluated

  • Mean error \(ME = \frac{1}{N} \sum_{i = 1}^N (z_i - z_i^*))\)

  • Mean absolute error \(MAE = \frac{1}{N} \sum_{i = 1}^N |z_i - z_i^*|)\)

  • Root mean square error \(RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^N (z_i - z_i^*)^2}\)

  • Mean standardized error \(MSE = \frac{1}{N} \sum_{i = 1}^N \frac{z_i - z_i^*}{\sigma_i^*}\)

  • Mean absolute standardized error \(MASE = \frac{1}{N} \sum_{i = 1}^N |\frac{z_i - z_i^*}{\sigma_i^*}|\)

  • Root mean square standardized error \(RMSSE = \sqrt{\frac{1}{N} \sum_{i = 1}^N (\frac{z_i - z_i^*}{\sigma_i^*})^2}\)

TODO: - results with the disjunctive kriging are dubious (see xplot for Z and Ind) - k-fold cross validation to be implemented ?

Estimation on the initial fine grid

# cleaning the target data base
err = grid$deleteColumns("SK.*")
err = grid$deleteColumns("CE.*")
err = grid$deleteColumns("DK.*")
err = grid$deleteColumns("sel-.*")
# Selectivity curve : Z
zcuts <- round(quantile(data$getColumn("Z"), probs = c(0.5, 0.75)),3)
ycuts <- round(anam$rawToGaussianVector(zcuts),3)

Z_est = Selectivity_createByKeys(keys = c("Z"), zcuts=c(0.0),
                                 flag_est=TRUE, flag_std=TRUE)

# Global experimental selectivity, calculated form the experimental Data Set
grid$setLocator("Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
table = Z_est$eval(grid)
table$setTitle("Selectivity curves computed on fine grid")
## NULL
table
## 
## Selectivity curves computed on fine grid
## ----------------------------------------
##       Z-Cut    T-estim    Q-estim    B-estim    M-estim
##       0.000      1.000      1.561      1.561      1.561
# Simple kriging of Z
err = model_raw$delDrift(0)
err = model_raw$setMean(mean(data["Z"]))
data$setLocator("Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, grid, model_raw, neigh, 
              calcul = EKrigOpt_POINT(),
              namconv= NamingConvention("SK"))

# Conditional expectation
err = model_Y$delDrift(0)
err = model_Y$setMean(0.)
data$setLocator("Gaussian.Z", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = kriging(data, grid, model_Y, neigh, 
              calcul = EKrigOpt_POINT(),
              namconv= NamingConvention("SK"))
err = ConditionalExpectation(db = grid, 
                             anam = anam, 
                             selectivity = Z_est, 
                             name_est="SK.Gaussian.Z.estim", 
                             name_std="SK.Gaussian.Z.stdev",
                             namconv = NamingConvention("CE", FALSE))
grid$setName("CE.Z-estim", "CE.Z.estim")
## NULL
grid$setName("CE.Z-stdev", "CE.Z.stdev")
## NULL
# Disjunctive kriging
data$setLocator("F.*", ELoc_Z(), cleanSameLocator=TRUE)
## NULL
err = krigingFactors(dbin   = data,   # Input data set (a Db structure)
                     dbout  = grid,  # Ouput data set (a DbGrid structure)
                     model  = model, neigh, calcul= EKrigOpt_POINT(),
                     namconv = NamingConvention("SK"))

# Disjunctive kriging for Z
err = DisjunctiveKriging(db = grid, 
                         anam = anam, 
                         selectivity = Z_est , 
                         name_est="SK.F.*.estim", 
                         name_std="SK.F.*.stdev",
                         namconv = NamingConvention("DK", FALSE))
err = grid$setName("DK.Z-estim", "DK.Z.estim")
err = grid$setName("DK.Z-stdev", "DK.Z.stdev")

# Disjunctive kriging for T and Q
Z_est = Selectivity_createByKeys(keys = c("T", "Q"), zcuts=zcuts,
                                  flag_est=TRUE, flag_std=TRUE)
err = DisjunctiveKriging(db = grid, 
                         anam = anam, 
                         selectivity = Z_est , 
                         name_est="SK.F.*.estim", 
                         name_std="SK.F.*.stdev",
                         namconv = NamingConvention("DK", FALSE))

QC of the Disjunctive Kriging functions

grid$deleteColumns("DK.*.qc.*")
## NULL
N  = 1000
Y = seq(from = -3.5, to = 3.5, length.out = 1000)
H = matrix(NaN, nrow = length(Y), ncol = N)
for (i in seq_along(Y)) {
  H[i,] = hermitePolynomials(Y[i], 1.0, N)
}

# Lognormal
m = 1.0 ; sigma = 1.5
psi.LN = hermiteLognormal(mean = m, sigma = sigma, nbpoly = N)
plot(Y, H %*% psi.LN, type = "l", col = "red", 
     xlab = "Gaussian value", ylab = "Raw value", 
     main = paste0("Lognormal m=", m, " and sigma = ", sigma))
lines(Y, m * exp(sigma*Y - 1/2 * sigma^2), col = "blue", lwd = 2, lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"), 
       col = c("red", "blue"), lty = c(1,2))

# QC of the indicator decomposition (OK)
# psi.LN.qc = rep(NaN, N)
# psi.LN.qc = m * (-sigma)^(0:(N-1)) / sqrt(factorial(0:(N-1)))
# plot(psi.LN, psi.LN.qc)
# abline(a = 0, b = 1, col = "red")

# Indicator
yc = ycuts[1]
psi.ind = hermiteCoefIndicator(yc = yc, nbpoly = N)
plot(Y, H %*% psi.ind, type = "l", col = "red", 
     xlab = "Gaussian value", ylab = "Raw value", 
     main = paste0("Indicator of ", yc))
lines (Y, as.numeric(Y >= yc), col = "blue", lty = 2)
abline(h = c(0,1), col = "gray", lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"), 
       col = c("red", "blue"), lty = c(1,2))

# QC of the indicator decomposition (OK)
# psi.ind.qc = rep(NaN, N)
# psi.ind.qc[1] = 1 - pnorm(yc)
# psi.ind.qc[-1] = -dnorm(yc)/sqrt(1:(N-1))*hermitePolynomials(yc, 1.0, N-1)
# plot(psi.ind, psi.ind.qc)
# abline(a = 0, b = 1, col = "red")

# Metal
yc = 2.0
psi.metal = hermiteCoefMetal(yc = yc, phi = psi.LN)
plot(Y, H %*% psi.metal, type = "l", col = "red", 
     xlab = "Gaussian value", ylab = "Raw value", 
     main = paste0("Metal for cutoff = ", yc))
lines (Y, m * exp(sigma * Y - 1/2 * sigma^2) * as.numeric(Y >= yc), 
       col = "blue", lty = 2)
legend("topleft", legend = c("Hermite", "Theoretical"), 
       col = c("red", "blue"), lty = c(1,2))

# QC of the metal decomposition (OK)
# psi.metal.qc = matrix(hermiteIncompleteIntegral(yc, N)$getValues(), N, N) %*% psi.LN
# plot(psi.metal, psi.metal.qc)
# abline(a = 0, b = 1, col = "red")


# generic function to test DK
compute_DK <- function(psi, name) {
N   <- length(psi)
stopifnot(nfactor < N)

nm_estim    <- paste0(name, ".estim")
nm_stdev    <- paste0(name, ".stdev")

nm_qc_estim <- paste0(name, ".qc.estim")
nm_qc_stdev <- paste0(name, ".qc.stdev")

grid[nm_qc_estim] <- psi[1] + Hn.est %*% psi[-1][1:nfactor]
grid[nm_qc_stdev] <- sqrt(Hn.std^2 %*% (psi[-1][1:nfactor]^2)  + sum(psi[-1][(nfactor+1):(N-1)]^2))

# Raw variable
p = ggplot()
p = p + plot.correlation(grid, namex = nm_estim, namey = nm_qc_estim, 
                         bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = nm_estim, ylab = nm_qc_estim,
                title = "Disjunctive Kriging - Estimation value")
ggPrint(p)

p = ggplot()
p = p + plot.correlation(grid, namex = nm_stdev, namey = nm_qc_stdev, 
                         bins=100, flagDiag=TRUE)
p = p + plot.decoration(xlab = nm_stdev, ylab = nm_qc_stdev,
                title = "Disjunctive Kriging - Standard deviation")
ggPrint(p)
invisible()
}


# QC of the DK
Hn.est <- matrix(grid$getColumns(names = c("SK.F.*.estim"), useSel = FALSE),
                 ncol = nfactor, nrow = grid$getSampleNumber())
Hn.std <- matrix(grid$getColumns(names = c("SK.F.*.stdev"), useSel = FALSE),
                ncol = nfactor, nrow = grid$getSampleNumber())

# estimation of Z using DK
psi <- anam$getPsiHns()
compute_DK(psi = psi, name = "DK.Z")

# estimation of the tonnage Z > yc
psi <- hermiteCoefIndicator(ycuts[1], anam$getNFactor())
print(paste0("Std(I) = ", round(sqrt(psi[1]*(1 - psi[1])),2)))
## [1] "Std(I) = 0.5"
print(paste0("Std(I)*= ", round(sqrt(sum(psi[-1]^2)),2)))
## [1] "Std(I)*= 0.47"
grid$setName(paste0("DK.T-estim-", zcuts[1]), "DK.T1.estim")
## NULL
grid$setName(paste0("DK.T-stdev-", zcuts[1]), "DK.T1.stdev")
## NULL
compute_DK(psi = psi, name = "DK.T1")

# estimation of the metal Z > yc
psi <- hermiteCoefMetal(yc = ycuts[1], phi = anam$getPsiHns())
grid$setName(paste0("DK.Q-estim-", zcuts[1]), "DK.Q1.estim")
## NULL
grid$setName(paste0("DK.Q-stdev-", zcuts[1]), "DK.Q1.stdev")
## NULL
compute_DK(psi = psi, name = "DK.Q1")

Scores of the estimators

# Evaluation
sel <- (grid["SK.Z.stdev"] > 0.2)
grid["sel"] <- sel
grid$setLocator("sel", ELoc_SEL(), cleanSameLocator=TRUE)
## NULL
nm <-c("Z",
       paste0(c("SK", "DK", "CE"), ".Z.estim"),
       paste0(c("SK", "DK", "CE"), ".Z.stdev")
)

# Correlation actual values vs. estimated values
for (est in c("SK", "DK", "CE")) {
  p = ggplot()
  p = p + plot.correlation(grid, namex=paste0(est, ".Z.estim"), namey="Z", 
                           bins=100, flagDiag = TRUE)
  p = p + plot.decoration(xlab = "Estimated value", ylab = "Actual value",
                          title = paste(est, " of raw variable"))
  ggPrint(p)
}

# Comparison of the estimators
p = ggplot()
p = p + plot.correlation(grid, namex="CE.Z.estim", namey="SK.Z.estim", 
                         bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Conditional expectation", ylab = "Simple kriging",
                title = "Point raw variable")
ggPrint(p)

p = ggplot()
p = p + plot.correlation(grid, namex="DK.Z.estim", namey="SK.Z.estim", 
                         bins=100, flagDiag = TRUE)
p = p + plot.decoration(xlab = "Disjunctive Kriging", ylab = "Simple kriging",
                title = "Point raw variable")
ggPrint(p)

# Maps of the estimators
for (est in c("SK", "DK", "CE")){
  for (type in c("estim", "stdev")){
    
    p = ggplot()
    p = p + plot(grid, paste0(est, ".Z.", type),
                 flagLegendRaster=TRUE, legendNameRaster=type) 
    p = p + plot.decoration(xlab = "Easting", ylab = "Northing", 
                            title = paste("Estimator = ", est)) 
    ggPrint(p)
    
    if (type == "estim") col_fill = "orange"
    if (type == "stdev") col_fill = "skyblue"
    
    p = ggplot()
    p = p + plot.hist(grid, name = paste0(est, ".Z.", type), bins = 50, 
                      col = "gray", fill = col_fill)
    p = p + plot.decoration(xlab = paste0(est, ".Z.", type),
                            title = paste("Estimator = ", est))
    ggPrint(p)
  }
}

# Mono variate statistics of the estimators
knitr::kable(dbStatisticsMono(grid, nm, opers=opers)$toTL(), digits=3,
    caption = "Statistics of the estimated values")
Statistics of the estimated values
Number Minimum Maximum Mean St. Dev.
Z 9500 0.154 8.198 1.562 0.846
SK.Z.estim 9500 0.591 4.733 1.553 0.480
DK.Z.estim 9500 0.499 4.178 1.555 0.452
CE.Z.estim 9500 0.510 4.349 1.554 0.457
SK.Z.stdev 9500 0.515 0.833 0.683 0.061
DK.Z.stdev 9500 0.538 0.831 0.699 0.054
CE.Z.stdev 9500 0.206 1.374 0.679 0.194
# Scoring rules
scoringRules <- function(db, nm_val, nm_est, nm_std){
  val  <- matrix(
    db$getColumns(names = c(nm_val, nm_est, nm_std), useSel = TRUE),
    nrow = db$getActiveSampleNumber(),
    ncol = 3, byrow = FALSE)
  err  <- (val[,1] - val[,2])
  nerr <- err / val[,3]
  c(
    mean(err), mean(abs(err)),  sqrt(mean(err^2)),
    mean(nerr),mean(abs(nerr)), sqrt(mean(nerr^2))
  )
}

res <- matrix(c(
  scoringRules(grid, "Z", "SK.Z.estim", "SK.Z.stdev"),
  scoringRules(grid, "Z", "DK.Z.estim", "DK.Z.stdev"),
  scoringRules(grid, "Z", "CE.Z.estim", "CE.Z.stdev")),
  nrow = 3, ncol = 6, byrow = TRUE)
colnames(res) <- c("ME", "MAE", "RMSE", "MSE", "MASE", "RMSSE")
rownames(res) <- c("SK", "DK", "CE")
knitr::kable(res, digits=4,
             caption = "Scores of point estimators of Z using 500 data points")
Scores of point estimators of Z using 500 data points
ME MAE RMSE MSE MASE RMSSE
SK 0.0091 0.4849 0.6726 0.0157 0.7079 0.9784
DK 0.0075 0.4818 0.6712 0.0131 0.6868 0.9539
CE 0.0085 0.4804 0.6691 -0.0072 0.6980 0.9122

k-fold cross validation of the data set

TODO: 2023-01-31, it should be completed (XF)

K <- 10 # Number of folds
res_xval <- data$clone() # duplicate the data base
# cleaning the target data base
res_xval$deleteColumns("SK.*")
res_xval$deleteColumns("CE.*")
res_xval$deleteColumns("DK.*")
res_xval$deleteColumns("code")

# definition of the folds
res_xval["code"] <- sample(x = 1:K, size = res_xval$getSampleNumber(), replace = TRUE)
res_xval$setLocator("code", ELoc_C(), cleanSameLocator=TRUE)

# generic function for k-fold cross validation
# the db should have:
# - db: the data base with a "code" variable defining the folds
# - fn_estim: a R function which defines the interpolation process. Its prototype is int f(dbin, dbout)
# - nameconv: the naming convention 
# the outputs are
# - err: code of error
# - db: computed variables (estimated value and the estimation Std.)

kfold_compute <- function(db, fn_estim, namconv = NamingConvention("kfold")){
  # TODO
  err = 0
  err
}
# the evaluation of the estimator is: 
# 1) define the fn_estim
# 2) compute the cross validation using the *kfold_compute* function
# 3) compute the scoring using the *scoringRules* function
# 4) display the results *knitr::kable(...))*

# TODO: not completed

Discussion and further works

Still to be done:

References