Discrete Disjunctive Kriging¶
import numpy as np
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
This notebook presents an example for Discrete-Disjunctive Kriging (DDK). The theory is not detailed here, see for example:
Rivoirard, J. (1994). Introduction to disjunctive kriging and non-linear geostatistics. Number
551.021 R626i. Clarendon Press
Simulation of a reference data set¶
We create a reference data set (lognormal distribution) based on a model that we define, using simtub (based on Turning Bands).
# parameters for the simulation
m = 1.
sig = 0.5
# initialization of the grid
grd = gl.DbGrid.create(x0=(0.0,0.0), dx=(0.01,0.01), nx=(100,100))
# simulation from a model
model = gl.Model.createFromParam(gl.ECov.EXPONENTIAL, range=0.2)
gl.simtub(dbin = None, dbout = grd, model = model, nbsimu = 1)
grd.setName("Simu", "Y")
grd["Z"] = m * np.exp(sig * grd["Y"].squeeze() - sig**2 / 2)
# Data set (10% of the grid)
data = gl.Db.createSamplingDb(grd, 0.1, flagAddSampleRank=0)
data.useSel = True
# plots
fig, (ax1,ax2) = plt.subplots(1,2, figsize=(15,6))
ax1.raster(grd,"Z")
ax1.decoration(title="Initial variable")
ax2.histogram(grd, name='Z', bins = 25, color="orange")
ax2.decoration(xlabel = "Raw variable", title="Histogram of the initial variable")
fig, ax3 = plt.subplots(figsize=(6,6))
ax3.raster(grd,"Z")
ax3.gstpoint(data,nameSize="Z", color='yellow')
ax3.decoration(title="Random data subsample")
Elementary statistics¶
We define cutoffs values corresponding to quantiles 0%, 30%, 50%, 70% and 90%.
zcut = np.quantile(data['Z'], q = [0.3, 0.5, 0.7, 0.9])
print("\n{:^40}".format("Coupures sur la variable Z"))
print("{:^10}{:^10}{:^10}{:^10}".format(0.3, 0.5, 0.7, 0.9))
print("{:^10.3f}{:^10.3f}{:^10.3f}{:^10.3f}".format(*zcut),'\n')
mylimits = gl.Limits(zcut, True) #defines limits based on the cutoff values
# True for addFromZero, so that we add the interval [0, z_1[
mylimits.display()
Coupures sur la variable Z 0.3 0.5 0.7 0.9 0.712 0.920 1.241 1.778 Bound( 1 ) : [ 0 ; 0.712209 [ Bound( 2 ) : [ 0.712209 ; 0.920089 [ Bound( 3 ) : [ 0.920089 ; 1.24076 [ Bound( 4 ) : [ 1.24076 ; 1.77795 [
Discretization of the variable on cutoff values¶
From the cutoff values defined above, the variable Z is discretized on the four intervals delimited by the cutoff values. The indicators of the intervals are $\mathbb{1}(z_i \le Z < z_{i+1})$. The fifth indicator ($\mathbb{1}(Z \ge z_{5})$) is not computed because it can be deducted from the four other ones as their sum equals to one.
# Compute the indicators (4 new variables)
iud_Indic = mylimits.toIndicator(data, name='Z', OptionIndicator=1)
# Compute the discretized version of the variable
iud_Mean = mylimits.toIndicator(data, name='Z', OptionIndicator=0)
# statistics on the indicators
w = gl.dbStatisticsMono(data, ["Indicator.Z.Class*"], [gl.EStatOption.MEAN]).getValues()
w = list(w) + [1 - np.sum(w)]
print("Proportions = ", np.round(w,3))
Nclass = len(w) # total number of indicators, including the fifth one which is not computed
Proportions = [0.3 0.2 0.2 0.2 0.1]
print(data)
ax = data.plot(nameColor="*Mean")
ax.decoration(title="Discretized variable")
Data Base Characteristics ========================= Data Base Summary ----------------- File is organized as a set of isolated points Space dimension = 2 Number of Columns = 10 Total number of samples = 1000 Variables --------- Column = 0 - Name = rank - Locator = NA Column = 1 - Name = x1 - Locator = x1 Column = 2 - Name = x2 - Locator = x2 Column = 3 - Name = Y - Locator = NA Column = 4 - Name = Z - Locator = NA Column = 5 - Name = Indicator.Z.Class.1 - Locator = NA Column = 6 - Name = Indicator.Z.Class.2 - Locator = NA Column = 7 - Name = Indicator.Z.Class.3 - Locator = NA Column = 8 - Name = Indicator.Z.Class.4 - Locator = NA Column = 9 - Name = Indicator.Z.Mean - Locator = z1
# Locate Z
data.clearLocators(gl.ELoc.Z)
data.setLocator("Z", gl.ELoc.Z)
# Variogram parameters (omnidirectional)
varioParam = gl.VarioParam.createOmniDirection(npas=10, dpas=0.05)
# Compute variogram
var_Z = gl.Vario.computeFromDb(varioParam, data)
# fit model
mod_Z = gl.Model()
opt=gl.Option_AutoFit()
opt.setWmode(2) # weighted proportional to the number of pairs and inverse proportional to the distance
mod_Z.fit(var_Z, [gl.ECov.NUGGET, gl.ECov.EXPONENTIAL, gl.ECov.EXPONENTIAL], mauto = opt)
# plot
ax = gp.varmod(var_Z, mod_Z, flagLegend=True)
ax.decoration(title = "Variogram of the data sample")
mod_Z.display()
Model characteristics ===================== Space dimension = 2 Number of variable(s) = 1 Number of basic structure(s) = 2 Number of drift function(s) = 0 Number of drift equation(s) = 0 Covariance Part --------------- Exponential - Sill = 0.152 - Range = 0.091 - Theo. Range = 0.030 Exponential - Sill = 0.207 - Range = 0.627 - Theo. Range = 0.209 Total Sill = 0.359 Known Mean(s) 0.000
Variogram of the discretized variable¶
data.clearLocators(gl.ELoc.Z)
data.setLocator("Indicator.Z.Mean", gl.ELoc.Z)
var_Z = gl.Vario.computeFromDb(varioParam, data)
mod_Z = gl.Model()
mod_Z.fit(var_Z, [gl.ECov.NUGGET, gl.ECov.EXPONENTIAL, gl.ECov.EXPONENTIAL], mauto = opt)
ax = gp.varmod(var_Z, mod_Z, flagLegend=True)
ax.decoration(title = "Variogram of the discretized variable")
mod_Z.display()
Model characteristics ===================== Space dimension = 2 Number of variable(s) = 1 Number of basic structure(s) = 3 Number of drift function(s) = 0 Number of drift equation(s) = 0 Covariance Part --------------- Nugget Effect - Sill = 0.016 Exponential - Sill = 0.054 - Range = 0.074 - Theo. Range = 0.025 Exponential - Sill = 0.079 - Range = 0.641 - Theo. Range = 0.214 Total Sill = 0.149 Known Mean(s) 0.000
Variograms of the Indicator variables¶
data.clearLocators(gl.ELoc.Z)
data.setLocator("Indicator.Z.Class*", gl.ELoc.Z)
var_Z = gl.Vario.computeFromDb(varioParam, data)
mod_Z = gl.Model()
mod_Z.fit(var_Z, [gl.ECov.NUGGET, gl.ECov.EXPONENTIAL, gl.ECov.EXPONENTIAL], mauto = opt)
ax = gp.varmod(var_Z, mod_Z)
gp.decoration(ax,title = "Simple and cross variograms of the indicators of Z", fontsize=20)