Introduction

This notebook presents examples for the co-kriging of linear combinations. Starting from a set of variables \(\mathbf{Z}(\alpha) = (Z_i(\alpha))_{i \in 1:n}\), the task is to compute the co-kriging of a linear combination of the variable at location \(\mathbf{o}\): \[ (\beta^T \mathbf{Z}(\mathbf{o}))^{K} = \sum_{i \in 1:n} \sum_{\alpha \in A_i} \lambda_i^{\alpha} Z_i(\alpha) \] Kriging is linear, hence \((\beta^T \cdot \mathbf{Y}(\mathbf{o}))^{K} = \sum_{i \in 1:p} \beta_i \times Y_i(\mathbf{o})^{K}\) but the variance of the kriging error is not easily computable. However, the matLC option of the kriging function offers the possibility to compute directly the estimate and its kriging variance of any linear combinations. The method is evaluated on a simulated data set.

The following cases are considered:

In all cases, the results seem consistent with the theory.

Kriging systems when kriging a linear combination are given below:

Simple kriging with a second order stationary model (SK/SOS)

The expression of the estimator is: \[ (\beta^T \, \mathbf{Z_{(o)}})^{SK} = \beta^T \mathbf{m} +\sum_{i \in 1:n} \sum_{\alpha \in A_i} \lambda^{\alpha} (Z_i(\alpha) - m_i) \]

There is no bias as: \[ \mathbb{E}\{(\beta^T \, \mathbf{Z_{(o)}})^{SK}\} = \beta^T \mathbf{m} = \mathbb{E}\{\beta^T \, \mathbf{Z_{(o)}}\}. \]

The variance of the kriging error is: \[ \begin{array}{cl} \text{Var}\{\beta^T \, \mathbf{Z_{(o)}} - (\beta^T \, \mathbf{Z_{(o)}})^{SK}\} = & \beta^T \, \text{Var}\{\mathbf{Z_{(o)}}\} \, \beta \\ & - 2 \sum_{k,i \in 1:n} \sum_{\alpha \in A_i} \beta_k \, \lambda^{\alpha} \text{Cov}(Z_k(o),Z_i(\alpha)) \\ & + \sum_{i,j \in 1:n} \sum_{\alpha \in A_i, \alpha' \in A_j} \beta_k \, \lambda^{\alpha} \lambda^{\alpha'} \text{Cov}(Z_i(\alpha),Z_j(\alpha')) \end{array} \]

Minimizing this variance, we gets the kriging system of for the estimation of the linear combination: \[ \left( \text{Cov}(Z_{i}(\alpha),Z_{j}(\alpha')) \right) \Lambda = \sum_{k\in 1:n} \beta_k \text{Cov}(Z_k(o), Z_i(\alpha)) \]

The variance of the kriging error boils down to: \[ \text{Var}\{\beta^T \, \mathbf{Z_{(o)}} - (\beta^T \, \mathbf{Z_{(o)}})^{SK}\} = \beta^T \, \text{Var}\{\mathbf{Z_{(o)}}\} \, \beta - \sum_{k,i \in 1:n} \sum_{\alpha \in A_i} \beta_k \text{Cov}(Z_k(o),Z_i(\alpha)) \lambda^{\alpha} \]

ordinary kriging with a Intrinsic Random Fonction of order 0 (OK/IRF0)

The expression of the estimator is: \[ (\beta^T \, \mathbf{Z_{(o)}})^{OK} = \sum_{i \in 1:n} \sum_{\alpha \in A_i} \lambda^{\alpha} Z_i(\alpha) \]

The no-bias conditions are: \[ \sum_{\alpha \in A_i} \lambda^{\alpha} = \beta_i, \text{ for } \forall i \in 1:n. \]

The variance of the kriging error is: \[ \begin{array}{cl} \text{Var}\{\beta^T \, \mathbf{Z_{(o)}} - (\beta^T \, \mathbf{Z_{(o)}})^{OK}\} = & \beta^T \, \text{Var}\{\mathbf{Z_{(o)}}\} \, \beta \\ & - 2 \sum_{k\in 1:n} \sum_{i, \alpha \in A_i} \beta_k \text{Cov}(Z_k(o),Z_i(\alpha)) \, \lambda^{\alpha} \\ & + \sum_{i, \alpha \in A_i} \sum_{j, \alpha' \in A_j} \lambda^{\alpha} \text{Cov}(Z_i(\alpha),Z_j(\alpha')) \lambda^{\alpha'} \end{array} \]

Minimizing the Lagrangian, \[ \mathcal{L}(\Lambda, \mu) = \text{Var}\{\beta^T \, \mathbf{Z_{(o)}} - (\beta^T \, \mathbf{Z_{(o)}})^{OK}\} + \sum_{k \in 1:n} \mu_k (\sum_{\alpha \in A_k} \lambda^{\alpha} - \beta_k) \]

we gets the kriging system of for the estimation of the linear combination: \[ \left[ \begin{array}{cc} \text{Cov}(Z_i(\alpha), Z_j(\alpha')) & \mathbf{1}_{\alpha \in A_j} \\ \mathbf{1}_{\alpha' \in A_i} & \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \lambda \\ \mu \end{array} \right] = \left[ \begin{array}{c} \sum_{k \in 1:n} \beta_k\text{Cov}(Z_k(o), Z_i(\alpha)) \\ \beta_i \end{array} \right] \]

The variance of the kriging error boils down to: \[ \text{Var}\{\beta^T \, \mathbf{Z_{(o)}} - (\beta^T \, \mathbf{Z_{(o)}})^{OK}\} = \beta^T \, \text{Var}\{\mathbf{Z_{(o)}}\} \, \beta - \sum_{k \in 1:n} \sum_{i,\alpha \in A_i} \beta_k \text{Cov}(Z_k(o),Z_i(\alpha))\, \lambda^{\alpha} - \sum_{k \in 1:n} \beta_k \, \mu_k \]

Initialisation

Simulation of a reference data set

We create a reference data set using the the Turning Bands method (simtub):

  • the distribution is log normal,

  • the stationary covariance function is exponential.

5% of the two dimensional grid is sampled and considered as the reference data set.

# parameters for the simulation
m   = c(1, 2)
sig = c(0.25, 0.75)
rho = 0.7
CO = matrix(c(1, rho, rho, 1), 2, 2)
# initialization of the grid
grd = DbGrid_create(x0=c(0.0,0.0), dx=c(0.01,0.01), nx=c(100,100))

# simulation from a model
model = Model_createFromParam(type = ECov_EXPONENTIAL(), range=0.2, sills = CO)
err = simtub(dbin = NULL, dbout = grd, model = model, nbsimu = 1)
err = grd$setName("Simu.1", "Y1")
err = grd$setName("Simu.2", "Y2")
grd["Z1"] = m[1] * exp(sig[1] * grd["Y1"] - sig[1]**2 / 2)
grd["Z2"] = m[2] * exp(sig[2] * grd["Y2"] - sig[2]**2 / 2)

# Data set (ssp proportion of the grid)
data = Db_createFillRandom(ndat = ssp*grd$getSampleNumber(), nvar = 0)
err  = data$setName("x-1", "x1")
err  = data$setName("x-2", "x2")
err  = migrate(dbin = grd, name = "Z1", dbout = data, dmax = c(0.1, 0.1), 
               namconv = NamingConvention(""))
err  = migrate(dbin = grd, name = "Z2", dbout = data, dmax = c(0.1, 0.1), 
               namconv = NamingConvention(""))

# plots
p1 = ggDefaultGeographic() + 
  plot.grid(grd, nameRaster = "Z1", palette = "Spectral") +
  plot.point(data, colour = "red") +
  plot.decoration(title="Z1")
p2 = ggDefault() +
  plot.hist(grd, name='Z1',  bins = 25, bg = "orange", color= "gray") +
  plot.decoration(xlab = "Raw variable", title="Histogram of the initial variable Z1")
p3 = ggDefaultGeographic() + 
  plot.grid(grd, nameRaster = "Z2", palette = "Spectral") +
  plot.point(data, colour = "red") +
  plot.decoration(title="Z2")
p4 = ggDefault() +
  plot.hist(grd, name='Z2',  bins = 25, bg = "orange", color= "gray") +
  plot.decoration(xlab = "Raw variable", title="Histogram of the initial variable Z2")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)

# defining the linear combination
beta = c(1, -3) # coefficients and computing the linear combination
nm_var = c("Z1", "Z2")
nm_LC  = c("LC", "Z1")
matLC  = matrix(c(beta, 1, 0), nrow = 2, ncol = 2, byrow = TRUE)
colnames(matLC) <- nm_var
rownames(matLC) <- nm_LC

# computing the linear computation
data["LC"] = matrix(data$getColumns(colnames(matLC)), nrow = data$getSampleNumber(), 
                    ncol = dim(matLC)[2]) %*% t(matLC)[,1]
grd["LC"]  = matrix(grd$getColumns(colnames(matLC)),  nrow = grd$getSampleNumber(),  
                    ncol = dim(matLC)[2]) %*% t(matLC)[,1]

# statistics
tab = rbind(
  dbStatisticsMono(grd, names = c("Z*", "LC"), opers = opers)$toTL(),
  dbStatisticsMono(data, names = c("Z*", "LC"), opers = opers)$toTL())
rownames(tab) <- c(
  "grid-Z1", "grid-Z2", "grid-LC", 
  "samples-Z1", "samples-Z2", "samples-LC")
knitr::kable(tab, caption = "Statistics on the reference data set", digits = 4)
Statistics on the reference data set
Number Minimum Maximum Mean St. Dev.
grid-Z1 10000 0.4528 2.2903 1.0419 0.2619
grid-Z2 10000 0.1181 15.8685 1.9705 1.7018
grid-LC 10000 -47.0365 1.5622 -4.8696 5.2569
samples-Z1 50 0.7668 1.6995 1.1311 0.2404
samples-Z2 50 0.2648 4.6042 1.6170 1.1103
samples-LC 50 -12.9096 0.7581 -3.7199 3.4745

Global parameters for spatial modelling

# Variogram parameters (omnidirectional)
varioParam = VarioParam_createOmniDirection(npas=10, dpas=0.05)
# Variogram basic structures
types = ECov_fromKeys(c("NUGGET", "EXPONENTIAL", "EXPONENTIAL"))
# types = ECov_fromKeys(c("EXPONENTIAL"))
# weighted proportional to the number of pairs and inverse proportional to the distance
opt   = Option_AutoFit()
err   = opt$setWmode(2) 

#define neighborhood
neigh = NeighMoving_create(nmaxi=30, radius=0.5)

# Display the kriging system for a fixed target when using kriging
OptDbg_setReference(0)
## NULL
# function to compute the model of the linear combination (using a square matrix) 
# TODO: generalize to rectangular matrix?
model_of_LC <- function(model, A) {
  stopifnot(dim(A)[1] == dim(A)[2])
  nv = model$getVariableNumber()
  stopifnot(nv == dim(A)[1])
  ns = model$getCovaNumber()
  model_LC = model$clone()
  for (k in 1:ns) {
    CC = A %*% model$getSillValues(k-1)$toTL() %*% t(A)
    for (i in 1:nv) {
      for (j in 1:nv) {
        err = model_LC$setSill(icov = k-1, ivar=i-1, jvar=j-1, value = CC[i, j])
        }
      }
    }
  model_LC
}

Ordinary kriging (OK)

Intrinsic Random Function of order 0

# Locate Z
err = data$setLocators("Z*", locatorType = ELoc_Z(), cleanSameLocator = TRUE)
# Compute variogram
var_Z = Vario_computeFromDb(varioParam, data)
# fit model
mod_Z_IRF0 = Model()
err   = mod_Z_IRF0$fit(var_Z, types = types, mauto = opt)

# setting the drift "unknown mean" for Ordinary Kriging
err = mod_Z_IRF0$setDriftIRF(order = 0)

# plot
multi.varmod(var_Z, mod_Z_IRF0, flagLegend=TRUE)

if(flag.verbose) {
  mod_Z_IRF0
}

Ordinary kriging of the raw variables

err = grd$deleteColumns(paste("OK", nm_var, "*", sep = "."))
err = grd$deleteColumns(paste("LC", "OK", "*", sep = "."))
if(flag.verbose) {
  print(paste0(">>> Ordinary kriging of the raw variables"))
}
err = data$setLocators(c("Z1", "Z2"), ELoc_Z(), cleanSameLocator=TRUE)
err = kriging(data, grd, mod_Z_IRF0, neigh, 
              flag_est = TRUE, flag_std = TRUE,
              namconv = NamingConvention("OK"))

# computing the LC of kriging
grd["LC.OK.estim"]  = matrix(grd$getColumns(names = paste("OK", colnames(matLC), "estim", sep = ".")),
                    nrow = grd$getSampleNumber(),  ncol = dim(matLC)[2]) %*% t(matLC)[,1]

# statistics
vn  = c("Z1", paste("OK", "Z1", c("estim", "stdev"), sep = "."),
        "Z2", paste("OK", "Z2", c("estim", "stdev"), sep = "."))
tab = dbStatisticsMono(grd, vn, opers = opers)$toTL()
knitr::kable(tab, caption = "Statistics on the grid for Ordinary Kriging with the initial model", digits = 4)
Statistics on the grid for Ordinary Kriging with the initial model
Number Minimum Maximum Mean St. Dev.
Z1 10000 0.4528 2.2903 1.0419 0.2619
OK.Z1.estim 10000 0.7978 1.6791 1.1037 0.1053
OK.Z1.stdev 10000 0.0443 0.2663 0.2206 0.0273
Z2 10000 0.1181 15.8685 1.9705 1.7018
OK.Z2.estim 10000 0.2454 3.4234 1.7027 0.6338
OK.Z2.stdev 10000 0.7099 1.2544 0.8919 0.0659
# plot of estimated values
p1 = ggDefaultGeographic() + 
    plot.grid(grd, nameRaster =  paste("OK", "Z1", "estim", sep = "."),
            palette = "Spectral", flagLegendRaster = TRUE, legendNameRaster = "estim") +
    plot.decoration(xlab = "", ylab = "", title = paste0("Ordinary coK for Z1"))
p2 = ggDefaultGeographic() + 
    plot.grid(grd, nameRaster =  paste("OK", "Z1", "stdev", sep = "."),
            palette = "Spectral", flagLegendRaster = TRUE, legendNameRaster = "stdev") +
    plot.decoration(xlab = "", ylab = "", title = paste0("Std. of Ordinary coK for Z1"))
p3 = ggDefaultGeographic() + 
    plot.grid(grd, nameRaster =  paste("OK", "Z2", "estim", sep = "."),
            palette = "Spectral", flagLegendRaster = TRUE, legendNameRaster = "estim") +
    plot.decoration(xlab = "", ylab = "", title = paste0("Ordinary coK for Z2"))
p4 = ggDefaultGeographic() + 
    plot.grid(grd, nameRaster =  paste("OK", "Z2", "stdev", sep = "."),
            palette = "Spectral", flagLegendRaster = TRUE, legendNameRaster = "stdev") +
    plot.decoration(xlab = "", ylab = "", title = paste0("Std. of Ordinary coK for Z2"))
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)