In [1]:
import gstlearn as gl
import gstlearn.test as gt
import gstlearn.document as gdoc
import numpy as np
from IPython.display import Markdown
from scipy.spatial import distance_matrix
In [2]:
Markdown(gdoc.loadDoc("Kriging.md"))
Out[2]:

Kriging¶

Let suppose that

Z(si)=Xiβ+Y(si)

where Y is a second order stationary random field with mean 0 and covariance function C.

If Z is a vector of observations, we denote Z=Xβ+Y with Σ the covariance of Y

Simple Kriging¶

If β is known, we can obtain the simple kriging

ZSK0=X0β+Σt0Σ−1(Z−Xβ)=X0β+λtSK(Z−Xβ)

with:

  • the simple kriging weights

λSK=Σ−1Σ0

  • the variance of the estimator

Var(ZSK0)=λtSKΣλSK=λtSKΣ0

  • the estimation variance

σ2SK=Var(Z0−ZSK0)=σ20−Σt0Σ−1Σ0=σ20−λtSKΣ0

In matrix notation:¶

Simple Kriging System

[Σ]×[λSK]=[Σ0]

Estimation

ZSK0=[Z]t×[λSK]+m(1−∑λSK)

Variance of Estimation error

σ2SK=σ20−[λSK]t×[Σ0]

Variance of Estimator

Var(ZSK0)=[λSK]t×[Σ]×[λSK]=[λSK]t×[Σ0]

Universal kriging¶

If β is unknown, we can estimate it by

ˆβ=ΣcXtΣ−1Z

Introducing the notation

Σc=(XtΣ−1X)−1

then

ˆβ=ΣcXtΣ−1Z

Var(ˆβ)=Σc

The Universal kriging is obtained by first computing ˆβ and then pluging ˆβ in the simple kriging procedure.

ZUK0=X0ˆβ+Σt0Σ−1(Z−Xˆβ)=Σt0Σ−1Z+(X0−Σt0Σ−1X)ˆβ

We can rewrite everything with respect to Z

ZUK0=(Σt0Σ−1+(X0−Σt0Σ−1X)ΣcXtΣ−1)Z=(λtSK+(X0−λtSKX)ΣcXtΣ−1)Z=λtUKZ

with

  • the Universal Kriging Weights

λUK=λSK+Σ−1XΣc(Xt0−XtλSK)

  • the variance of the estimator is

Var(ZUK0)=λtUKΣλUK=Var(ZSK0)+2λtSKXΣcΣc(Xt0−XtλSK)+(X0−λtSKX)ΣcXtΣ−1XΣc(Xt0−XtλSK)=Var(ZSK0)+2λtSKXΣc(Xt0−XtλSK)+(X0−λtSKX)Σc(Xt0−XtλSK)=Var(ZSK0)+(λtSKX+X0)Σc(Xt0−XtλSK)=Var(ZSK0)−λtSKXΣcXtλSK+X0ΣcXt0

  • the estimation variance

σ2UK=σ20−2Cov(Z0,ZUK0)+Var(ZUK0)=σ20−2Σt0λUK+Var(ZUK0)=σ20−2Σt0(λSK+Σ−1XΣc(Xt0−XtλSK))+Var(ZSK0)−λtSKXΣcXtλSK+X0ΣcXt0=σ20−Σt0λSK−2Σt0Σ−1XΣc(Xt0−XtλSK)−λtSKXΣcXtλSK+X0ΣcXt0=σ2SK−2λtSKXΣc(Xt0−XtλSK)−λtSKXΣcXtλSK+X0ΣcXt0=σ2SK+(X0−λtSKX)Σc(Xt0−XtλSK)

In matrix notation:¶

Universal Kriging System

[ΣXXt0]×[λUK−μ]=[Σ0Xt0]

Estimation

ZUK0=[Z0]t×[λUK−μ]

Variance of estimation error

σ2UK=σ20−[λUK−μ]t×[Σ0Xt0]

Variance of estimator

Var(ZUK0)=[λUK]t×[Σ]×[λUK]

Summary¶

Simple Kriging¶

  • the estimator

ZSK0=X0β+λtSK(Z−Xβ)=λtSKZ+(X0−λtSKX)β

where

λSK=Σ−1Σ0

  • the variance of the estimator

Var(ZSK0)=λtSKΣλSK=λtSKΣ0

  • the variance of the estimation error

σ2SK=Var(Z0−ZSK0)=σ20−Σt0Σ−1Σ0=σ20−λtSKΣ0

Universal Kriging¶

ˆβ=ΣcXtΣ−1Z

Var(ˆβ)=Σc

  • the estimator

ZUK0=λtSKZ+(X0−λtSKX)ˆβ=λtUKZ

λUK=λSK+Σ−1XΣc(Xt0−XtλSK)

μUK=Σc(X0−λtSKX)t

  • the variance of the estimator

Var(ZUK0)=Var(ZSK0)−λtSKXΣcXtλSK+X0ΣcXt0

  • the variance of the estimation error

σ2UK=σ2SK+(X0−λtSKX)Σc(Xt0−XtλSK)

Collocated Option (Unique Neighborhood)¶

This is an interesting case for:

  • estimating a Target
  • in a multivariate case (say with N variables)
  • based on the information in the input Db (note that all 'N' variable(s) do not have to be known in the 'heterotopic' case)
  • when the Target contains information on some of the 'N' variables (these are the collocated variables)

When working in Unique Neighborhood, the relevant information (from the input data base) do not change for any of the targets. But the Collocated information changes at each target.

Hence the interest of benefiting of the inversion of the covariance matrix (restricted to the information of the Input data base).

In [3]:
Markdown(gdoc.loadDoc("Kriging_Bayesian.md"))
Out[3]:

Bayesian framework¶

In the Bayesian framework, we assume that

β∼N(β0,S)

We obtain

β|Z∼N(μc,Σc)

Σc=(XtΣ−1X+S−1)−1

and

μc=Σc(S−1β0+XtΣ−1Z)

We obtain the Bayesian quantities:

  • the estimator

ZBayes0=λtSKZ+(X0−λtSKX)μc

  • the variance of the estimator

Var(ZBayes0)=Var(ZSK0)−λtSKXΣcXtλSK+X0ΣcXt0

  • the variance of the estimation error

σ2Bayes=σ2SK+(X0−λtSKX)Σc(Xt0−XtλSK)

In [4]:
def cova(x,sills=1):
    return np.kron(sills,np.exp(-x/2))

np.random.seed(1234)
A = np.random.normal(size=(3,3))
sills = A@A.T
model = gl.Model.createFromParam(gl.ECov.EXPONENTIAL,range = 2.,flagRange=False,sills=sills)

nx = [10,10]

def matriceReduce(m,ind):
    M = gl.MatrixSymmetric(len(ind))
    for i,j in enumerate(ind):
        for k in range(j,len(ind)):
            M.setValue(i,k,m.getValue(j,ind[k]))
    return M

def modelReduce(model,var):
    ctxt=model.getContext()
    ctxt.setNVar(len(var))
    modeln = gl.Model.create(ctxt)
    covlist = model.getCovAnisoList()
    
    for i in range(covlist.getNCov()):
        cova = covlist.getCovAniso(i)
        sills = matriceReduce(cova.getSill(),var)
        covar = gl.CovAniso.createAnisotropicMulti(ctxt,
                                             cova.getType(),
                                             cova.getRanges(),
                                             sills,
                                             cova.getParam(),
                                             cova.getAnisoAngles())
        modeln.addCov(covar)
    return modeln

def createDbIn(ndat,nvar,percent,ndim=2,selDbin=False,measurement_error=False,ndrift = 0,
               flag_isotopic=False,seed=1234):
    db = gl.Db.create()
    np.random.seed(seed)
    for i in range(ndim):
        db["x" +str(i)] = np.random.uniform(size = ndat)
     
    db.setLocators(["x*"],gl.ELoc.X)
        
    indIn = np.arange(ndat)
    if selDbin:
        np.random.shuffle(indIn)
        indIn = np.sort(indIn)
        indIn = indIn[range(int(ndat/2))]
        sel = np.zeros(shape=ndat)
        sel[indIn] = 1
        db["sel"] = sel
        db.setLocator("sel",gl.ELoc.SEL)
      
    #Creation of an heterotopic data set
    indList = [] 
    for i in range(nvar):
        u = np.array([None for i in range(ndat)])
        ind = np.arange(ndat)
        if not flag_isotopic and nvar>1: 
            np.random.shuffle(ind)
            ind = ind[range(int(ndat*percent[i]))]
        ind = np.sort(list(set(ind) & set(indIn)))
        indList += [ind]
        vect = np.array([None for i in range(ndat)])
        vect[ind] = np.random.normal(size = len(ind))
        db["z"+str(i)]=vect
          
    db.setLocators(["z*"],gl.ELoc.Z)
    
    indF = []
    
    for i in range(nvar):
        indF += list(np.array(indList[i]) + ndat * i)
    
    if measurement_error :
        for i in range(nvar):
            db["err"+str(i)] = 0.1 * np.random.uniform(size = ndat)
            
        db.setLocators(["err*"],gl.ELoc.V)
    
    if ndrift>0:
        for i in range(ndrift):
            db["ff" + str(i)] = np.random.normal(size = ndat)
        db.setLocator("ff*",gl.ELoc.F)
    
    return db,indF

def test_covmat(ndat,nx,nvar,percent,model,cova,
                 irf=None,drift=False,measurement_error=False,compute_vars = True,
                 selDbin = True, selDbout = True,flag_isotopic=True,
                 seed=1234,tol=1e-12,eps=1e-3,test = True,verbose=False, debug=False):
    
    np.random.seed(seed)
    ndrift = 1 if drift else 0
    modeln = modelReduce(model,range(nvar))
    #### Create the description of the case #####
    casetxt = "case:\n"
    
    inter = ""
    if nvar > 1:
        inter = "co-"
        
    if irf is None and drift:
        return 0
    
    if irf is None and not drift:
        casetxt += "- simple "+ inter+ "kriging\n"
    else :
        if irf is not None :
            casetxt += "- KU with drift of degree " + str(irf) + "\n"
        if drift :
            casetxt +="- with external drift\n"
    if nvar > 1:
        casetxt +="- number of covariables for co-kriging " + str(nvar) + "\n"
        if flag_isotopic:
            casetxt += "- isotopic case\n"
        else:
            casetxt += "- heterotopic case\n"
    if measurement_error:
        casetxt += "- with measurement error\n"
    else:
        casetxt += "- without measurement error\n"
    if compute_vars:
        casetxt += "- no dual form\n"
    else:
        casetxt += "- dual\n"
    casetxt += "- selection on Dbin " + str(selDbin) + "\n"
    casetxt += "- selection on Dbout "+ str(selDbout) + "\n"
    casetxt += "- number of data " + str(ndat) + "\n"
    casetxt += "- nx = ["+str(nx[0]) +"," + str(nx[1]) + "]\n"
    
    if verbose:
        print(casetxt)
        
    ##################################################
    db,indF = createDbIn(ndat,nvar,percent,2,selDbin,measurement_error,ndrift,flag_isotopic,seed)
    
    target = gl.DbGrid.create(nx = nx)
   
    indOut = np.arange(target.getNSample())
    
    if selDbout:
        np.random.shuffle(indOut)
        indOut = indOut[range(int(target.getNSample()/2))]
        indOut = np.sort(indOut)
        sel = np.zeros(shape = target.getNSample())
        sel[indOut] = 1
        target["sel"] = sel
        target.setLocator("sel",gl.ELoc.SEL)
                  
    if irf is not None:
        modeln.setDriftIRF(irf)
    
    if drift :
        target["ff"] = np.random.normal(size = target.getNSample())
        
        target.setLocator("ff",gl.ELoc.F)
        modeln.addDrift(gl.DriftF(0))
      
    v = np.array([db["x0"],db["x1"]]).T
    v0 = np.array([target["x1"][indOut],target["x2"][indOut]]).T
    cov = cova(distance_matrix(v,v),modeln.getSills(0).toTL())[indF,:][:,indF]
    c0  = cova(distance_matrix(v,v0),modeln.getSills(0).toTL())[indF,:]
    #Creation of a db2 without selection to build the complete covariance matrix
    db2 = db.clone()
    db2.setLocator("sel")
    vect = gl.VectorDouble(nvar**2 * db2.getNSample()**2)
    
    target2 = target.clone()
    target2.setLocator("sel")
    covt = modeln.evalCovMatSym(db).toTL()
    c0gl = modeln.evalCovMat(db,target).toTL()
    
    if measurement_error:
        err = db["err*"].T.reshape(-1,)
        np.fill_diagonal(cov,np.diag(cov)+err[indF])
    
    vect = gl.VectorDouble(nvar**2 * db2.getNSample() * len(indOut))
    
    neigh = gl.NeighUnique()
    
    errcode = 0
    diff = np.max(np.abs(cov-covt))
    if diff > eps:
        errcode = errcode + 1
        print(f"Difference in Cov  = {round(diff,4)}")
        if debug:
            print("Reference", cov)
            print("Gstlearn", covt)
        
    diff = np.max(np.abs(c0-c0gl))
    if diff > eps:
        errcode = errcode + 1
        print(f"Difference in Cov0 = {round(diff,4)}")
        if debug:
            print("Reference", c0)
            print("Gstlearn", c0gl)
        
    if irf is not None or drift:
        driftd = np.kron(np.eye(nvar),modeln.getDrifts(db2, False))[:,indF]
        driftt = np.kron(np.eye(nvar),modeln.getDrifts(target, True))
        driftdgl = modeln.evalDriftMat(db).toTL().T
        drifttgl = modeln.evalDriftMat(target).toTL().T
        diff = np.max(np.abs(driftdgl-driftd))
        if diff > eps:
            errcode = errcode + 1
            print(f"Difference in X  = {round(diff,4)}")
            if debug:
                print("Reference", driftd)
                print("Gstlearn", driftdgl)
            
        diff = np.max(np.abs(drifttgl-driftt))
        if diff > eps:
            errcode = errcode + 1
            print(f"Difference in X0 = {round(diff,4)}")
            if debug:
                print("Reference", driftt)
                print("Gstlearn", drifttgl)

    return errcode
In [5]:
percent = [0.5,0.9,1.]
verbose = False
runAll = True
In [6]:
if runAll:
    nerr = 0
    ntest = 0
    ndat = 40
    nx = [5,5]
    for irf in [None,0,1]:
      for drift in [False,True]:
        for measurement_error in [True, False]:
          for selDbin in [True, False]:
            for selDbout in [True, False]:
                for nvar in [1,2,3]:
                    for cv in [False,True]:
                      errcode = test_covmat(ndat,nx,nvar,percent,
                                            model,cova,compute_vars=cv,
                                            irf=irf,drift=drift,
                                            measurement_error=measurement_error,
                                            selDbin=selDbin,selDbout=selDbout,
                                            flag_isotopic = False,
                                            seed=1234,tol=1e-8,eps=1e-3,
                                            verbose=verbose, debug=False)
                      nerr = nerr + errcode
                      ntest = ntest + 1
                  
    print(ntest,"test(s) have been performed with", nerr, "error(s)")
288 test(s) have been performed with 0 error(s)
In [7]:
if not runAll:
    ndat = 5
    nx = [2,2]
    irf = None
    drift = False
    measurement_error = True
    selDbin = False
    selDbout = True
    nvar = 1
    cv = False
    debug = True
    errcode = test_covmat(ndat,nx,nvar,percent,
                    model,cova,compute_vars=cv,
                                        irf=irf,drift=drift,
                                        measurement_error=measurement_error,
                                        selDbin=selDbin,selDbout=selDbout,
                                        flag_isotopic = False,
                                        seed=1234,tol=1e-8,eps=1e-3,verbose=verbose, debug=debug)