In this preamble, we load the gstlearn library, and clean the workspace.
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
Then we download the data base dat
.
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
We also create a Db
object containing a grid covering
the data points in the data base dat
. To do so, we start by
displaying the minimal and maximal coordinates of the points in
dat
using the getExtremas()
method from the
Db
class.
dat$getExtremas()
## [[1]]
## [1] 78.2 460.7
##
## [[2]]
## [1] 530.4 1208.9
The first (resp. second) element of the list contains the min and max
coordinates in the first (resp. second) space dimension. Based on this
information, we create a grid covering all the points using the
DbGrid_create
function. We specify the coordinates of the
origin (i.e. lower left corner) of the grid (argument x0
),
the step sizes in each dimension (argument dx
) and the
number of points in each dimension (argument nx
).
grid = DbGrid_create(x0=c(65,530),dx=c(4.94, 4.96),nx=c(82,138))
We then print a summary of the content of the grid using the
display
method of Db
class, which we supply
with a DbStringFormat
object specifying that we would like
information about the extent of the grid (argument
flag_extend
in the
DbStringFormat_createFromFlags
function).
dbfmt = DbStringFormat_createFromFlags(flag_extend=TRUE)
grid$display(dbfmt)
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 3
## Total number of samples = 11316
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 530.000
## Mesh : 4.940 4.960
## Number : 82 138
##
## Data Base Extension
## -------------------
## Coor #1 - Min = 65.000 - Max = 465.140 - Ext = 400.14
## Coor #2 - Min = 530.000 - Max = 1209.520 - Ext = 679.52
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## NULL
We compute the experimental variogram vario2dir (in 2 directions) (cf. Variography for more details).
varioParamMulti = VarioParam_createMultiple(ndir=2, npas=15, dpas=15.)
vario2dir = Vario(varioParamMulti)
err = vario2dir$compute(dat)
We then the fit a model fitmod
fitmod = Model()
types = ECov_fromKeys(c("NUGGET","EXPONENTIAL","GAUSSIAN"))
err = fitmod$fit(vario2dir,types=types)
ggplot() + plot.varmod(vario2dir, fitmod)
Let suppose that
\(Z(s_i) = X_i\beta + Y(s_i)\)
where \(Y\) is a second order stationary random field with mean 0 and covariance function \(C\).
If \(Z\) is a vector of observations, we denote \(Z = X\beta + Y\) with \(\Sigma\) the covariance of Y
If \(\beta\) is known, we can obtain the simple kriging
\(Z_0^{SK} = X_0\beta + \Sigma_0^t\Sigma^{-1}(Z-X\beta) = X_0\beta + \lambda_{SK}^t(Z-X\beta)\)
with:
\(\lambda_{SK}=\Sigma^{-1}\Sigma_0\)
\(\textrm{Var}(Z_0^{SK})=\lambda_{SK}^t\Sigma\lambda_{SK}=\lambda_{SK}^t\Sigma_0\)
\(\sigma_{SK}^2 = \textrm{Var}(Z_0-Z_0^{SK}) = \sigma_0^2-\Sigma_0^t\Sigma^{-1}\Sigma_0=\sigma_0^2-\lambda_{SK}^t\Sigma_0\)
Simple Kriging System
\[ \begin{bmatrix} \Sigma \end{bmatrix} \times \begin{bmatrix} \lambda_{SK} \end{bmatrix} = \begin{bmatrix} \Sigma_0 \end{bmatrix} \]
Estimation
\[ Z_0^{SK} = \begin{bmatrix} Z \end{bmatrix}^t \times \begin{bmatrix} \lambda_{SK} \end{bmatrix} + m ({ 1 - \sum{\lambda_{SK}}} ) \]
Variance of Estimation error
\[ \sigma_{SK}^2 = \sigma_0^2 - \begin{bmatrix} \lambda_{SK} \end{bmatrix}^t \times \begin{bmatrix} \Sigma_0 \end{bmatrix} \]
Variance of Estimator
\[ \textrm{Var}(Z_0^{SK}) = \begin{bmatrix} \lambda_{SK} \end{bmatrix}^t \times \begin{bmatrix} \Sigma \end{bmatrix} \times \begin{bmatrix} \lambda_{SK} \end{bmatrix} = \begin{bmatrix} \lambda_{SK} \end{bmatrix}^t \times \begin{bmatrix} \Sigma_0 \end{bmatrix} \]
To perform simple kriging, we use the function called
kriging
. We specify:
Db
object containing the data points (argument
dbin
) : the variable used for kriging is the (unique)
variable of the data base with a z
locator (i.e. it should
have locator z1
and the other variables should not have a
locator starting with z
)Db
object containing the target points, i.e. the
points where the kriging predictor will be computed (argument
dbout
)Model
object containing the model used to define
the kriging predictor (argument model
): in particular, the
mean used to define the predictor is the one set in the
Model
objectneigh
), eg. unique neighborhood (to use all the data points
for each predictor) or moving neighborhood (to use only the data points
in the vicinity of the target point in the prediction). This argument is
defined using a “neighborhood” object (see example below).Additionally, it is possible to specify whether we wish to compute,
at each target point, the kriging predictor (argument
flag_est
, default=TRUE
), the kriging
standard-deviation (argument flag_std
,
default=TRUE
) and the kriging variance (argument
flag_varz
, default=FALSE
).
The kriging
function then adds new variables to the
Db
entered in the dbout
argument corresponding
to these variables. The names of these newly created variables will
start by Kriging
, but this prefix can be changed using the
namconv
argument of the kriging function
In the next example, we perform a simple kriging prediction (with
unique neighborhood) of the variable January_temp
in the
dat
data base, on the grid defined in the grid
data base. To do so, we start by selecting the variable
January_temp
in the dat
data base (i.e. we
make ensure that it is the only variable with a z
locator).
dat$setLocator("January_temp",ELoc_Z())
## NULL
dat
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = January_temp - Locator = z1
We then create a “neighborhood” object corresponding to the
specification of a unique neighborhood: this is done using the
NeighUnique
function.
uniqueNeigh = NeighUnique()
We now call the kriging
function to perform the kriging
prediction. We use the model fitmod
that we previously
fitted on our data, require to compute the kriging predictor and its
standard-deviation (but not its variance), and change the prefix of the
newly created variables to “SK”.
err = kriging(dbin=dat, dbout=grid, model=fitmod,
neigh=uniqueNeigh,
flag_est=TRUE, flag_std=TRUE, flag_varz=FALSE,
namconv=NamingConvention("SK")
)
We see that the kriging predictor and its standard deviation have
been added to the grid
data base.
grid
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 11316
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 530.000
## Mesh : 4.940 4.960
## Number : 82 138
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = SK.January_temp.estim - Locator = z1
## Column = 4 - Name = SK.January_temp.stdev - Locator = NA
Finally, we plot the kriging prediction over the grid using the
plot.grid
function and the data points.
p = ggDefaultGeographic()
p = p + plot.grid(grid,
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Simple Kriging over whole Grid")
ggPrint(p)
By default, the plotting function plots the variable with locator
z1
, which in our case corresponds to the kriging predictor
(as the kriging
function automatically assigns the locator
z1
to it). To plot another variable, we can simply specify
their name.
For instance, we can plot the kriging standard deviation using the following code.
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="SK.January_temp.stdev",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Simple Kriging std-dev over whole Grid")
ggPrint(p)
As mentioned above, the mean used in the simple kriging predictor is
the one set in the Model
object supplied in the
kriging
function. By default, this mean is zero. It can be
changed using the setMean
method of the Model
object.
For instance, considering the model fitmod
previously
fitted on the data, we can clone it (using the clone
method), and assign it a new mean (equal to 4) as follows.
fitmodSK = fitmod$clone()
err = fitmodSK$setMean(mean=4)
Then, simple kriging is performed using the same command as before, but with the newly created model.
err = kriging(dbin=dat, dbout=grid, model=fitmodSK,
neigh=uniqueNeigh,
flag_est=TRUE, flag_std=TRUE, flag_varz=FALSE,
namconv=NamingConvention("Mean4_SK")
)
Finally, we plot the new kriging prediction over the grid and the data points.
p = ggDefaultGeographic()
p = p + plot.grid(grid,
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Simple Kriging over whole Grid: Mean=4")
ggPrint(p)
In gstlearn, ordinary kriging is seen as the formulation of a kriging
predictor on a model with a constant drift. Hence, to perform ordinary
kriging, we use the kriging
function with the same syntax
as for simple kriging, but call it with a model that includes a constant
drift. Adding a constant drift to a model is done with the
addDrift()
method.
Note: Removing a drift from a model can be done using the
delDrift
method (while specifying the index of the drift we
wish to remove) or using the delAllDrifts()
(to remove all
the drifts at once).
Let us go back to our running example. Considering the model
fitmod
previously fitted on the data, we can clone it
(using the clone
method), and add a constant drift as
follows.
fitmodOK = fitmod$clone()
err = fitmodOK$addDrift(DriftM())
Then, ordinary kriging is performed using the same command as before, but with the newly created model.
err = kriging(dbin=dat, dbout=grid, model=fitmodOK,
neigh=uniqueNeigh,
flag_est=TRUE, flag_std=TRUE, flag_varz=FALSE,
namconv=NamingConvention("OK")
)
Finally, we plot the new kriging prediction over the grid and the data points.
p = ggDefaultGeographic()
p = p + plot.grid(grid,
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Ordinary Kriging over whole Grid")
ggPrint(p)
We also plot the kriging standard deviations.
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="OK.January_temp.stdev",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Ordinary Kriging std-dev over whole Grid")
ggPrint(p)
Let us compare the results from the simple and ordinary kriging predictors. To do so, we create a correlation plot between the two predictors.
p = ggplot()
p = p + plot.correlation(grid,namex="OK.January_temp.estim",namey="SK.January_temp.estim",
flagBiss=TRUE, flagSameAxes=TRUE, bins=100)
p = p + plot.decoration(title="Estimation Simple vs. Ordinary",
xlab="Ordinary Kriging", ylab="Simple Kriging")
ggPrint(p)
We also compare the kriging standard-deviations obtained in both cases.
p = ggplot()
p = p + plot.correlation(grid,namex="OK.January_temp.stdev",namey="SK.January_temp.stdev",
flagBiss=TRUE, flagSameAxes=TRUE, bins=100)
p = p + plot.decoration(title="St. dev. Simple vs. Ordinary",
xlab="Ordinary Kriging", ylab="Simple Kriging")
ggPrint(p)
We now load new grid.
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
grid = DbGrid_createFromNF(fileNF)
grid$display(dbfmt)
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 11097
## Number of active samples = 3092
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 535.000
## Mesh : 4.938 4.963
## Number : 81 137
##
## Data Base Extension
## -------------------
## Coor #1 - Min = 65.000 - Max = 455.123 - Ext = 390.123
## Coor #2 - Min = 535.000 - Max = 1200.109 - Ext = 665.109
##
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel
## NULL
This new grid data base contains a variable called
inshore
, with a locator sel
. This indicates
that this variable is a selection, i.e. a binary variable allowing to
select some points, called active cells, in a data base (by
setting them to 1, while the other points will be 0). Selections allow
to restrict computations and operations made on a data base to only the
active cells. For, when plotting a data base with a selection, only the
active cells are represented.
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="Longitude",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Longitude values (restricted to the selection)")
ggPrint(p)
When calling the kriging
function on a data base that
contains a selection, the predictors are only computed on the active
cells (while the other cells are left undefined).
err = kriging(dbin=dat, dbout=grid, model=fitmodOK, neigh=uniqueNeigh,
flag_est=TRUE, flag_std=TRUE, flag_varz=FALSE,
namconv=NamingConvention("OK"))
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="OK*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Estimation by Ordinary Kriging (restricted to the selection)")
ggPrint(p)
The function xvalid
can be used to perform leave-one-out
cross-validation with kriging: each point of the data base is separately
predicted by kriging, using all the other points in the data base. This
function is called in a similar way as the kriging
function. We specify:
Db
object containing the data points on which we
want to perform the cross-validation (argument dbin
) : once
again, the variable used for the study is the (unique) variable of the
data base with a z
locatorModel
object containing the model used to define
the kriging predictor (argument model
)neigh
)flag_xvalid_est=-1
, returns a variable with name
ending with “estim”), the cross-validation error defined as the
difference between the predictor and the true value (argument
flag_xvalid_est=1
, returns a variable with name ending with
“esterr”) or neither (argument flag_xvalid_est=0
)flag_xvalid_std=-1
, returns a variable with name ending
with “stdev”), a standardized error defined as the ratio
(Cross-validation error)/(Kriging standard-deviation) (argument
flag_xvalid_std=1
, returns a variable with name ending with
“stderr”) or neither (argument flag_xvalid_std=0
)Going back to our running example, we perform a cross-validation on
our data base dat
. In particular we ask, at each point, for
the cross-validation error and for the standardized error described
above. We also specify, through the nameconv
that we do not
wish to modify the current locators in the data base (otherwise, the
locator z1
is “moved” to the variable containing the
cross-validation error).
err = xvalid(db=dat, model=fitmodOK, neigh=uniqueNeigh,
flag_xvalid_est=1, flag_xvalid_std=1,
namconv=NamingConvention_create("Xvalid", flag_locator = FALSE)
)
We plot the histogram of cross-validation errors.
p = ggplot()
p = p + plot.hist(dat,name="*esterr*",bins=30,fill="blue")
p = p + plot.decoration(xlab="Estimation Errors", title="Cross-Validation")
ggPrint(p)
We plot the histogram of standardized errors.
p = ggplot()
p = p + plot.hist(dat,name="*stderr*",bins=30,fill="blue")
p = p + plot.decoration(xlab="Standardized Errors", title="Cross-Validation")
ggPrint(p)
Finally, we compute a few statistics about these errors.
print(c("Mean cross-validation error:",round(mean(dat$getColumn("*esterr*"),na.rm=TRUE),5)))
## [1] "Mean cross-validation error:" "-0.00417"
print(c("Mean squared cross-validation error:",round(mean(dat$getColumn("*esterr*")^2,na.rm=TRUE),5)))
## [1] "Mean squared cross-validation error:"
## [2] "0.23937"
print(c("Mean standardized error:",round(mean(dat$getColumn("*stderr*")^2,na.rm=TRUE),5)))
## [1] "Mean standardized error:" "0.91178"
We now plot the absolute value of the cross-validation errors at each
point on top of the grid map. We use the argument
flagAbsSize = TRUE
in the plot.point
function
to specify that we want the size of the points to be proportional to the
absolute value of the variable specified in the nameSize
argument (here, the cross-validation errors).
p = ggDefaultGeographic()
p = p + plot.grid(grid,"inshore")
p = p + plot.point(dat,nameSize="*esterr",sizmax=3,flagAbsSize = TRUE)
p = p + plot.decoration(title="Cross-Validation scores")
ggPrint(p)
Up until now, we only considered kriging with a unique neighborhood.
To work with a moving neighborhood, we first need to define it by
creating “neighborhood” object describing its characteristics. This is
done using the NeighMoving_create
function. We can
specify:
radius
,
default=1.234e30): note that the default radius value is taken very
large so that the neighborhood radius can basically be seen as infinite
if the argument is not set by the usernmini
and nmaxi
): for a
given target point, if the number of data points within the neighborhood
radius is smaller that the specified minimum, then no prediction is
performed at this target (it is set to undefined)For instance, to design a small Moving Neighborhood with only 1 sample per neighborhood (irregardless of its distance to the target point), we use the following command:
smallNeigh = NeighMoving_create(nmini=1, nmaxi=1)
Then, (ordinary) kriging with moving neighborhood is performed using
the same commands as before, but replacing the unique neighborhood in
the neigh
argument by our custom moving neighborhood
object.
err = kriging(dbin=dat, dbout=grid, model=fitmodOK, neigh=smallNeigh,
flag_est=TRUE, flag_std=TRUE,
namconv=NamingConvention("Small"))
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="Small*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Estimation by Ordinary Kriging (Small Neigh.)")
ggPrint(p)
To create a moving neighborhood with radius 20 and containing between 1 and 10 points, we use the following command:
movingNeigh = NeighMoving_create(nmini=1, nmaxi=10, radius=20)
Running the Ordinary Kriging then gives:
err = kriging(dat,grid,fitmodOK,movingNeigh,
flag_est=TRUE, flag_std=TRUE,
namconv=NamingConvention("Reduced"))
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="Reduced*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Estimation by Ordinary Kriging (Reduced Moving Neigh.)")
ggPrint(p)
Note that some of the target sites are not predicted as no sample is found within their neighborhood.
Let us then consider a moving neighborhood with a bigger radius (150) and containing between 1 and 10 points.
movingNeigh = NeighMoving_create(nmini=1, nmaxi=10, radius=150)
Running the Ordinary Kriging then gives:
err = kriging(dat,grid,fitmodOK,movingNeigh,
flag_est=TRUE, flag_std=TRUE,
namconv=NamingConvention("Moving"))
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="Moving*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="Estimation by Ordinary Kriging (Moving Neigh.)")
ggPrint(p)
And plotting the associated kriging standard-deviations gives:
p = ggDefaultGeographic()
p = p + plot.grid(grid,nameRaster="Moving*stdev",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1.5)
p = p + plot.decoration(title="St. dev. by Ordinary Kriging (Moving Neigh.)")
ggPrint(p)
Finally, let us compare the results obtained with a unique and a moving neighborhood (through correlation plot). First, we compare the kriging predictions in both cases.
p = ggplot()
p = p + plot.correlation(grid,namex = "OK*estim",namey="Moving*estim",
bins=100, flagDiag=TRUE)
p = p + plot.decoration(title="Ordinary Kriging Estimation",
xlab="Unique", ylab="Moving")
ggPrint(p)
Then, we compare the kriging standard-deviations in both cases.
p = ggplot()
p = p + plot.correlation(grid,namex = "OK*stdev",namey="Moving*stdev",
bins=100, flagDiag=TRUE)
p = p + plot.decoration(title="Ordinary Kriging St. Dev.",
xlab="Unique", ylab="Moving")
ggPrint(p)