Kriging¶
Let us suppose that
$Z(s_i) = X_i\beta + Y(s_i)$
where $Y$ is a second-order stationary random field with mean 0 and covariance function $C$.
If $Z$ denotes the vector of observations, we write $Z = X\beta + Y$, where $\Sigma$ is the covariance matrix of $Y$. For a target $Z_0 = Z(s_0)$, we denote by $X_0$ its drift row vector, by $\Sigma_0$ the vector of covariances between the data and the target, and by $\sigma_0^2 = \textrm{Var}(Z_0)$ its variance.
Simple Kriging¶
If $\beta$ is known, the simple kriging estimator of $Z_0$ is
$Z_0^{KS} = X_0\beta + \Sigma_0^t\Sigma^{-1}(Z-X\beta) = X_0\beta + \lambda_{KS}^t(Z-X\beta)$
where the simple kriging weights are
$\lambda_{KS}=\Sigma^{-1}\Sigma_0$
the variance of the estimator is
$\textrm{Var}(Z_0^{KS})=\lambda_{KS}^t\Sigma\lambda_{KS}=\lambda_{KS}^t\Sigma_0$
and the estimation variance is
$\sigma_{KS}^2 = \textrm{Var}(Z_0-Z_0^{KS}) = \sigma_0^2-\Sigma_0^t\Sigma^{-1}\Sigma_0=\sigma_0^2-\lambda_{KS}^t\Sigma_0$
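As a quick illustration, here is a minimal numpy sketch of these formulas. It is purely illustrative and independent of the gstlearn test code at the end of the notebook: the 1D locations, the constant drift and the exponential covariance of scale 2 are arbitrary choices.
import numpy as np
from scipy.spatial import distance_matrix
rng = np.random.default_rng(0)
s = rng.uniform(0, 10, size=(5, 1))                  # data locations (1D, illustrative)
s0 = np.array([[4.2]])                               # target location
X, X0 = np.ones((5, 1)), np.ones((1, 1))             # constant drift
beta = np.array([1.5])                               # known drift coefficient
C = lambda h: np.exp(-h / 2)                         # exponential covariance, scale 2
Sigma = C(distance_matrix(s, s))                     # Var(Z)
Sigma0 = C(distance_matrix(s, s0))                   # Cov(Z, Z0)
sigma0_2 = 1.0                                       # Var(Z0) = C(0)
lambda_KS = np.linalg.solve(Sigma, Sigma0)           # simple kriging weights
Z = X @ beta + rng.multivariate_normal(np.zeros(5), Sigma)  # one simulated dataset
Z0_KS = X0 @ beta + lambda_KS.T @ (Z - X @ beta)     # simple kriging estimate
var_KS = lambda_KS.T @ Sigma0                        # Var(Z0_KS)
sigma2_KS = sigma0_2 - lambda_KS.T @ Sigma0          # estimation variance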
Universal Kriging¶
If $\beta$ is unknown, it can be estimated by generalized least squares. Introducing the notation
$\Sigma_c = (X^t\Sigma^{-1}X)^{-1} $
we obtain
$\hat\beta = \Sigma_c X^t\Sigma^{-1}Z$
with variance
$\textrm{Var}(\hat\beta)=\Sigma_c$
The universal kriging estimator is obtained by first computing $\hat\beta$ and then plugging it into the simple kriging formula:
$Z^{KU}_0 = X_0\hat\beta + \Sigma_0^t\Sigma^{-1}(Z-X\hat\beta)= \Sigma_0^t\Sigma^{-1}Z + (X_0 - \Sigma_0^t\Sigma^{-1}X)\hat\beta$
We can rewrite everything as a linear combination of $Z$:
$Z^{KU}_0 = (\Sigma_0^t\Sigma^{-1} + (X_0 - \Sigma_0^t\Sigma^{-1}X)\Sigma_c X^t\Sigma^{-1})Z \\ =(\lambda_{KS}^t+(X_0-\lambda_{KS}^tX) \Sigma_c X^t\Sigma^{-1})Z\\ =\lambda_{KU}^tZ$
with
$\lambda_{KU}=\lambda_{KS}+\Sigma^{-1}X \Sigma_c(X_0^t-X^t\lambda_{KS})$
Then, using $X^t\Sigma^{-1}X=\Sigma_c^{-1}$, the variance of the estimator is
$\textrm{Var}(Z^{KU}_0) = \lambda_{KU}^t\Sigma\lambda_{KU} \\ =\textrm{Var}(Z^{KS}_0) +2\lambda_{KS}^tX \Sigma_c (X_0^t-X^t\lambda_{KS})+(X_0-\lambda_{KS}^tX)\Sigma_c X^t\Sigma^{-1}X\Sigma_c (X_0^t-X^t\lambda_{KS})\\ =\textrm{Var}(Z^{KS}_0) +2\lambda_{KS}^tX\Sigma_c (X_0^t-X^t\lambda_{KS})+(X_0-\lambda_{KS}^tX)\Sigma_c (X_0^t-X^t\lambda_{KS})\\ =\textrm{Var}(Z^{KS}_0)+(\lambda_{KS}^tX+X_0)\Sigma_c (X_0^t-X^t\lambda_{KS})\\ =\textrm{Var}(Z^{KS}_0)-\lambda_{KS}^tX\Sigma_c X^t\lambda_{KS}+X_0 \Sigma_c X_0^t$
and the estimation variance
$\textrm{Var}(Z_0 - Z^{KU}_0) = \sigma_0^2 - 2\textrm{Cov}(Z_0,Z^{KU}_0)+ \textrm{Var}(Z^{KU}_0)\\ = \sigma_0^2 -2\Sigma_0^t\lambda_{KU}+\textrm{Var}(Z^{KU}_0)\\ = \sigma_0^2 -2\Sigma_0^t(\lambda_{KS}+\Sigma^{-1}X \Sigma_c(X_0^t-X^t\lambda_{KS}))+\textrm{Var}(Z^{KS}_0)-\lambda_{KS}^tX \Sigma_c X^t\lambda_{KS}+X_0 \Sigma_c X_0^t\\ = \sigma_0^2 -\Sigma_0^t\lambda_{KS} -2\Sigma_0^t\Sigma^{-1}X \Sigma_c (X_0^t-X^t\lambda_{KS})-\lambda_{KS}^tX \Sigma_c X^t\lambda_{KS}+X_0 \Sigma_c X_0^t\\ =\sigma_{KS}^2-2\lambda_{KS}^tX \Sigma_c (X_0^t-X^t\lambda_{KS})-\lambda_{KS}^tX \Sigma_c X^t\lambda_{KS}+X_0 \Sigma_c X_0^t\\ =\sigma_{KS}^2+(X_0-\lambda_{KS}^tX) \Sigma_c (X_0^t-X^t\lambda_{KS}) $
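Continuing the numpy sketch from the simple kriging section (still illustrative), the GLS estimate $\hat\beta$, the universal kriging weights and the equivalence of the two forms of the estimator can be checked numerically:
Sigma_inv = np.linalg.inv(Sigma)
Sigma_c = np.linalg.inv(X.T @ Sigma_inv @ X)         # Var(beta_hat)
beta_hat = Sigma_c @ X.T @ Sigma_inv @ Z             # GLS estimate of beta
lambda_KU = lambda_KS + Sigma_inv @ X @ Sigma_c @ (X0.T - X.T @ lambda_KS)
Z0_KU = lambda_KU.T @ Z
# same value as the plug-in form of the estimator
assert np.allclose(Z0_KU, X0 @ beta_hat + lambda_KS.T @ (Z - X @ beta_hat))
sigma2_KU = sigma2_KS + (X0 - lambda_KS.T @ X) @ Sigma_c @ (X0.T - X.T @ lambda_KS)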
Summary¶
Simple Kriging¶
- the estimator
$Z_0^{KS} = X_0\beta + \lambda_{KS}^t(Z-X\beta) = \lambda_{KS}^tZ +(X_0 -\lambda_{KS}^tX)\beta$
where
$\lambda_{KS}=\Sigma^{-1}\Sigma_0$
- the variance of the estimator
$\textrm{Var}(Z_0^{KS})=\lambda_{KS}^t\Sigma\lambda_{KS}=\lambda_{KS}^t\Sigma_0$
- the variance of the estimation error
$\textrm{Var}(Z_0-Z_0^{KS}) = \sigma_0^2-\Sigma_0^t\Sigma^{-1}\Sigma_0=\sigma_0^2-\lambda_{KS}^t\Sigma_0$
Universal Kriging¶
- the estimated drift coefficients
$\hat\beta = \Sigma_c X^t\Sigma^{-1}Z$
$\textrm{Var}(\hat\beta)= \Sigma_c $
- the estimator
$Z^{KU}_0 =\lambda^t_{KS}Z + (X_0 - \lambda_{KS}^tX)\hat\beta= \lambda^t_{KU}Z$
$\lambda_{KU}=\lambda_{KS}+\Sigma^{-1}X \Sigma_c (X_0^t-X^t\lambda_{KS})$
$\mu_{KU}=\Sigma_c (X_0 - \lambda_{KS}^tX)^t$
(the Lagrange multipliers of the kriging system, so that $\lambda_{KU}=\lambda_{KS}+\Sigma^{-1}X\mu_{KU}$)
- the variance of the estimation error
$\textrm{Var}(Z_0 - Z^{KU}_0) =\sigma_{KS}^2+(X_0-\lambda_{KS}^tX) \Sigma_c (X_0^t-X^t\lambda_{KS}) $
- the variance of the estimator
$\textrm{Var}(Z^{KU}_0) = \textrm{Var}(Z^{KS}_0)-\lambda_{KS}^tX\Sigma_c X^t\lambda_{KS}+X_0\Sigma_c X_0^t$
Bayesian framework¶
In the Bayesian framework, we assume that
$\beta\sim\mathcal{N}(\beta_0,S)$
We obtain the posterior distribution
$\beta|Z\sim\mathcal{N}(\mu_c,\Sigma_c)$
with
$\Sigma_c = (X^t\Sigma^{-1}X+S^{-1})^{-1}$
and
$\mu_c=\Sigma_c(S^{-1}\beta_0+X^t\Sigma^{-1}Z)$
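This is the standard conjugate Gaussian update: the posterior density is proportional to
$\exp\left(-\frac{1}{2}\left[(Z-X\beta)^t\Sigma^{-1}(Z-X\beta)+(\beta-\beta_0)^tS^{-1}(\beta-\beta_0)\right]\right)$
and collecting the quadratic terms in $\beta$ gives the posterior precision $X^t\Sigma^{-1}X+S^{-1}$. When $S^{-1}\to 0$ (diffuse prior), $\Sigma_c$ and $\mu_c$ reduce to the universal kriging quantities $(X^t\Sigma^{-1}X)^{-1}$ and $\hat\beta$.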
Plugging $\mu_c$ and $\Sigma_c$ into the universal kriging formulas, we obtain the Bayesian quantities:
- the estimator
$Z^{Bayes}_0 =\lambda^t_{KS}Z + (X_0 - \lambda_{KS}^tX)\mu_c$
- the variance of the estimation error
$\textrm{Var}(Z_0 - Z^{Bayes}_0) =\sigma_{KS}^2+(X_0-\lambda_{KS}^tX)\Sigma_c(X_0^t-X^t\lambda_{KS}) $
- the variance of the estimator
$\textrm{Var}(Z^{Bayes}_0) = \textrm{Var}(Z^{KS}_0)-\lambda_{KS}^tX\Sigma_c X^t\lambda_{KS}+X_0\Sigma_c X_0^t$
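Continuing the numpy sketch (illustrative only; the prior mean and variance below are arbitrary choices), the Bayesian quantities follow directly:
beta0 = np.zeros(1)                                  # prior mean (assumed)
S = np.array([[100.0]])                              # prior variance, nearly diffuse
S_inv = np.linalg.inv(S)
Sigma_c_bayes = np.linalg.inv(X.T @ Sigma_inv @ X + S_inv)
mu_c = Sigma_c_bayes @ (S_inv @ beta0 + X.T @ Sigma_inv @ Z)
Z0_bayes = lambda_KS.T @ Z + (X0 - lambda_KS.T @ X) @ mu_c
# with this nearly diffuse prior, mu_c is close to beta_hat and the
# Bayesian estimate is close to the universal kriging one
The remainder of the notebook is a test script: over many configurations, it checks that the covariance and drift matrices built by gstlearn (evalCovMatrixSymmetric, evalCovMatrix, evalDriftMatrix) match a direct numpy construction.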
import gstlearn as gl
import gstlearn.test as gt
import numpy as np
from scipy.spatial import distance_matrix
def cova(x, sills=1):
    # Exponential covariance with scale parameter 2 (see flagRange=False below),
    # extended to several variables through the Kronecker product with the sills
    return np.kron(sills, np.exp(-x/2))
np.random.seed(1234)
A = np.random.normal(size=(3,3))
sills = gl.VectorDouble((A@A.T).reshape(1,-1)[0])
model = gl.Model.createFromParam(gl.ECov.EXPONENTIAL,range = 2.,flagRange=False,sills=sills)
nx = [10,10]
def matriceReduce(m, ind):
    # Extract the symmetric sub-matrix of m for the variable indices in ind
    M = gl.MatrixSquareSymmetric(len(ind))
    for i, j in enumerate(ind):
        for k in range(j, len(ind)):
            M.setValue(i, k, m.getValue(j, ind[k]))
    return M
def modelReduce(model, var):
    # Build a copy of model restricted to the variables listed in var
    ctxt = model.getContext()
    ctxt.setNVar(len(var))
    modeln = gl.Model.create(ctxt)
    covlist = model.getCovAnisoList()
    for i in range(covlist.getCovaNumber()):
        cova = covlist.getCova(i)
        sills = matriceReduce(cova.getSill(), var)
        covar = gl.CovAniso.createAnisotropicMulti(ctxt,
                                                   cova.getType(),
                                                   cova.getRanges(),
                                                   sills,
                                                   cova.getParam(),
                                                   cova.getAnisoAngles())
        modeln.addCov(covar)
    return modeln
def createDbIn(ndat, nvar, percent, ndim=2, selDbin=False, measurement_error=False, ndrift=0,
               flag_isotopic=False, seed=1234):
    db = gl.Db.create()
    np.random.seed(seed)
    for i in range(ndim):
        db["x"+str(i)] = np.random.uniform(size=ndat)
    db.setLocators(["x*"], gl.ELoc.X)
    # Optional selection: keep a random half of the samples
    indIn = np.arange(ndat)
    if selDbin:
        np.random.shuffle(indIn)
        indIn = np.sort(indIn[range(int(ndat/2))])
        sel = np.zeros(shape=ndat)
        sel[indIn] = 1
        db["sel"] = sel
        db.setLocator("sel", gl.ELoc.SEL)
    # Creation of a heterotopic data set
    indList = []
    for i in range(nvar):
        ind = np.arange(ndat)
        if not flag_isotopic and nvar > 1:
            np.random.shuffle(ind)
            ind = ind[range(int(ndat*percent[i]))]
        ind = np.sort(list(set(ind) & set(indIn)))
        indList += [ind]
        vect = np.array([None for i in range(ndat)])
        vect[ind] = np.random.normal(size=len(ind))
        db["z"+str(i)] = vect
    db.setLocators(["z*"], gl.ELoc.Z)
    # Indices of the defined samples in the stacked (variable-major) vector
    indF = []
    for i in range(nvar):
        indF += list(np.array(indList[i]) + ndat * i)
    if measurement_error:
        for i in range(nvar):
            db["err"+str(i)] = 0.1 * np.random.uniform(size=ndat)
        db.setLocators(["err*"], gl.ELoc.V)
    if ndrift > 0:
        for i in range(ndrift):
            db["ff"+str(i)] = np.random.normal(size=ndat)
        db.setLocator("ff*", gl.ELoc.F)
    return db, indF
def test_covmat(ndat, nx, nvar, percent, model, cova,
                irf=None, drift=False, measurement_error=False, compute_vars=True,
                selDbin=True, selDbout=True, flag_isotopic=True,
                seed=1234, tol=1e-12, eps=1e-3, test=True, verbose=False):
    np.random.seed(seed)
    ndrift = 1 if drift else 0
    modeln = modelReduce(model, range(nvar))
    #### Create the description of the case #####
    casetxt = "case:\n"
    inter = ""
    if nvar > 1:
        inter = "co-"
    if irf is None and drift:
        # external drift without an IRF specification: combination not tested
        return 0
    if irf is None and not drift:
        casetxt += "- simple " + inter + "kriging\n"
    else:
        if irf is not None:
            casetxt += "- KU with drift of degree " + str(irf) + "\n"
        if drift:
            casetxt += "- with external drift\n"
    if nvar > 1:
        casetxt += "- number of covariables for co-kriging " + str(nvar) + "\n"
        if flag_isotopic:
            casetxt += "- isotopic case\n"
        else:
            casetxt += "- heterotopic case\n"
    if measurement_error:
        casetxt += "- with measurement error\n"
    else:
        casetxt += "- without measurement error\n"
    if compute_vars:
        casetxt += "- no dual form\n"
    else:
        casetxt += "- dual\n"
    casetxt += "- selection on Dbin " + str(selDbin) + "\n"
    casetxt += "- selection on Dbout " + str(selDbout) + "\n"
    casetxt += "- number of data " + str(ndat) + "\n"
    casetxt += "- nx = [" + str(nx[0]) + "," + str(nx[1]) + "]\n"
    if verbose:
        print(casetxt)
    ##################################################
    db, indF = createDbIn(ndat, nvar, percent, 2, selDbin, measurement_error, ndrift,
                          flag_isotopic, seed)
    target = gl.DbGrid.create(nx=nx)
    # Optional selection: keep a random half of the grid nodes
    indOut = np.arange(target.getSampleNumber())
    if selDbout:
        np.random.shuffle(indOut)
        indOut = np.sort(indOut[range(int(target.getSampleNumber()/2))])
        sel = np.zeros(shape=target.getSampleNumber())
        sel[indOut] = 1
        target["sel"] = sel
        target.setLocator("sel", gl.ELoc.SEL)
    if irf is not None:
        modeln.setDriftIRF(irf)
    if drift:
        target["ff"] = np.random.normal(size=target.getSampleNumber())
        target.setLocator("ff", gl.ELoc.F)
        modeln.addDrift(gl.DriftF(0))
    # Reference matrices computed directly with numpy/scipy
    v = np.array([db["x0"], db["x1"]]).T
    v0 = np.array([target["x1"][indOut], target["x2"][indOut]]).T
    cov = cova(distance_matrix(v, v), modeln.getSillValues(0).toTL())[indF, :][:, indF]
    c0 = cova(distance_matrix(v, v0), modeln.getSillValues(0).toTL())[indF, :]
    # Creation of a db2 without selection to build the complete covariance matrix
    db2 = db.clone()
    db2.setLocator("sel")
    target2 = target.clone()
    target2.setLocator("sel")
    # Matrices computed by gstlearn
    covt = modeln.evalCovMatrixSymmetric(db).toTL()
    c0gl = modeln.evalCovMatrix(db, target).toTL()
    if measurement_error:
        # Add the measurement error variances on the diagonal
        err = db["err*"].T.reshape(-1,)
        np.fill_diagonal(cov, np.diag(cov) + err[indF])
    errcode = 0
    diff = np.max(np.abs(cov - covt))
    if diff > eps:
        errcode = errcode + 1
        print(f"Difference in Cov = {round(diff,4)}")
    diff = np.max(np.abs(c0 - c0gl))
    if diff > eps:
        errcode = errcode + 1
        print(f"Difference in Cov0 = {round(diff,4)}")
    if irf is not None or drift:
        # Compare the drift matrices as well
        driftd = np.kron(np.eye(nvar), modeln.getDrifts(db2, False))[:, indF]
        driftt = np.kron(np.eye(nvar), modeln.getDrifts(target, True))
        driftdgl = modeln.evalDriftMatrix(db).toTL().T
        drifttgl = modeln.evalDriftMatrix(target).toTL().T
        diff = np.max(np.abs(driftdgl - driftd))
        if diff > eps:
            errcode = errcode + 1
            print(f"Difference in X = {round(diff,4)}")
        diff = np.max(np.abs(drifttgl - driftt))
        if diff > eps:
            errcode = errcode + 1
            print(f"Difference in X0 = {round(diff,4)}")
    return errcode
percent = [0.5, 0.9, 1.]
nerr = 0
ntest = 0
# Run the test over all combinations of options
for irf in [None, 0, 1]:
    for drift in [False, True]:
        for measurement_error in [True, False]:
            for selDbin in [True, False]:
                for selDbout in [True, False]:
                    for nx in [[5, 5]]:
                        for nvar in [1, 2, 3]:
                            isolist = [True]
                            if nvar > 1:
                                isolist = [True, False]
                            for iso in isolist:
                                for cv in [False, True]:
                                    errcode = test_covmat(40, nx, nvar, percent,
                                                          model, cova, compute_vars=cv,
                                                          irf=irf, drift=drift,
                                                          measurement_error=measurement_error,
                                                          selDbin=selDbin, selDbout=selDbout,
                                                          flag_isotopic=iso,
                                                          seed=1234, tol=1e-8, eps=1e-3,
                                                          verbose=False)
                                    nerr = nerr + errcode
                                    ntest = ntest + 1
print(ntest,"test(s) have been performed with", nerr, "error(s)")
480 test(s) have been performed with 0 error(s)