In this preamble, we load the gstlearn library and clean the workspace.
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
plot.setDefaultGeographic(dims=c(8,8))
We load two data bases:

- dat: contains point observations of two variables across Scotland: the elevation (Elevation) and the temperature (January_temp, later renamed Temperature);
- target: contains a grid of points covering Scotland, with a selection variable (inshore) selecting the points that are on land, and a variable (Elevation) giving the elevation at every point on land.

## Data points
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
### Rename the temperature variable
dat$setName("*temp", "Temperature")
## Target grid
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
target = DbGrid_createFromNF(fileNF)
The target data base contains a (grid) map of the elevation across Scotland.
target
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 11097
## Number of active samples = 3092
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 535.000
## Mesh : 4.938 4.963
## Number : 81 137
##
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel
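As a quick sanity check (not part of the original output), we can verify the grid geometry printed above: the total sample count is the product of the node counts along each direction, and the active sample count is the number of grid nodes selected by the inshore variable. A minimal sketch, assuming getColumn returns the raw values of the selection variable:
81 * 137                              # = 11097, the total number of samples
sum(target$getColumn("inshore") == 1) # = 3092, the number of active samples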
Let us plot its content.
ggDefaultGeographic() + plot(target, nameRaster="Elevation",
flagLegendRaster=TRUE,palette="Spectral",
legendNameRaster="Elevation (m)")
The dat data base contains 236 (point) samples of two variables across Scotland: elevation and temperature.
dat
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = Temperature - Locator = z1
We can use the dbStatisticsPrint function to compute statistics on variables of a Db. We specify:

- the Db of interest (argument db),
- the names of the variables of interest (argument names),
- the statistics we want to compute (argument opers). This last argument is set through the EStatOption_fromKeys function, which we call with a vector containing the (abbreviated) names of the statistics (run EStatOption_printAll() for the full list),
- a flag flagIso allowing us to specify whether we want to compute the statistics for each variable separately (flagIso=TRUE), or whether we want to compute "filtered" statistics (flagIso=FALSE). In the latter case, the statistics are computed using only the points where all the variables are defined.

For instance, let us count the number of observations of each variable using the dbStatisticsPrint function.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("NUM")),
flagIso = FALSE, title="Number of observations", radix="")
##
## Number of observations
## ----------------------
## Number
## Elevation 236
## Temperature 151
## NULL
Since the data base dat contains 236 samples, we can conclude that the Elevation variable is defined at every point, but not the Temperature variable. Similarly, we can compute the mean and the extreme values of each variable in the observation data base.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of observations", radix="")
##
## Statistics of observations
## --------------------------
## Minimum Maximum Mean
## Elevation 2.000 800.000 146.441
## Temperature 0.600 5.200 2.815
## NULL
Finally, we compute the mean and the extreme values of the elevation in the target data base.
dbStatisticsPrint(target, names = c("Elevation"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of target", radix="")
##
## Statistics of target
## --------------------
## Minimum Maximum Mean
## Elevation 0.000 1270.000 241.152
## NULL
We can then compute the filtered mean, minimum and maximum for the Elevation and Temperature variables as follows:
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = TRUE, title="Filtered statistics of observations", radix="")
##
## Filtered statistics of observations
## -----------------------------------
## Minimum Maximum Mean
## Elevation 3.000 387.000 87.974
## Temperature 0.600 5.200 2.815
## NULL
As explained above, the first row of the table contains the mean, the minimum and the maximum of the observations of the Elevation variable, over all the locations where both the Elevation and Temperature variables are defined (i.e., in our case, all the points where the temperature is defined). Hence, we see that the points where the temperature is observed are located at relatively low altitudes within the Scottish landscape.
To confirm that, we plot the points where the temperature is sampled on top of the elevation map of Scotland.
p = ggDefaultGeographic()
p = p + plot.grid(target, nameRaster="Elevation",
flagLegendRaster=TRUE,
palette="Spectral",
legendNameRaster="Elevation (m)")
#p = p + new_scale_color()
#p = p + new_scale("linetype")
p = p + plot.point(dat, nameSize="Temperature",
sizmin = 0.5, sizmax = 3,
flagLegendSize=TRUE,legendNameSize="°C")
ggPrint(p)
In this last plot, the bigger points (corresponding to higher temperatures) tend to be located where the elevation is lower: this suggests that the temperature is negatively correlated with the elevation. To corroborate this, we draw a correlation plot between the two variables.
p = ggplot()
p = p + plot.correlation(dat, namex="Elevation", namey="Temperature",
asPoint=TRUE, flagRegr=TRUE)
p = p + plot.decoration(title="Correlation between Temperature and Elevation")
ggPrint(p)
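As a complement to the correlation plot, we can also compute the correlation coefficient itself directly from the data base columns; a minimal sketch in base R, assuming getColumn returns each variable with NA at undefined samples:
elev = dat$getColumn("Elevation")
temp = dat$getColumn("Temperature")
cor(elev, temp, use="complete.obs") # expected to be negative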
In this course, we will introduce four methods for dealing with multivariate data: Cokriging, Residual modeling, Models with polynomial trends and finally Models with an external drift. As a baseline, we will compare the results obtained with each model to the results obtained with a univariate model.
We start by computing an experimental directional variogram vario_raw2dir of the "raw" temperature observations, along two directions (\(0^{\circ}\) and \(90^{\circ}\)).
## Create experimental variogram on raw data
varioparam = VarioParam_createMultiple(ndir=2, npas=30, dpas=10)
vario_raw2dir = Vario(varioparam)
err = vario_raw2dir$compute(dat)
We then fit a model on the experimental variogram of raw temperature observations. Since we want to use ordinary kriging, we add a constant drift to the model beforehand.
fitmod_raw = Model()
err = fitmod_raw$fit(vario_raw2dir,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod_raw$setDriftIRF(0,0)
p = ggplot()
p = p + plot.varmod(vario_raw2dir, fitmod_raw)
p = p + plot.decoration(title="Experimental and fitted variogram models - Raw temperature observations")
ggPrint(p)
We create a “neighborhood” object specifying a unique neighborhood, which we will use throughout the course.
uniqueNeigh = NeighUnique_create()
We now perform an ordinary kriging prediction of the temperature on the target grid using the model fitted on the raw observations, and compute statistics on the predicted values.
err = kriging(dbin=dat, dbout=target, model=fitmod_raw,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="OK"))
## Plot
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "OK*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - Ordinary Kriging")
ggPrint(p)
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=(c("OK.T*")), opers=opers,
title="Statistics on the Ordinary Kriging:", radix="")
##
## Statistics on the Ordinary Kriging:
## -----------------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.estim 3092 0.588 5.153 2.826 0.976
## OK.Temperature.stdev 3092 0.036 0.990 0.407 0.187
## NULL
We perform a cross-validation of the fitted model using Ordinary Kriging, and calculate the Mean Squared cross-validation and standardized errors. (For a well-calibrated model, the mean squared standardized error should be close to 1.)
## Compute cross-validation
err = xvalid(dat, model=fitmod_raw,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="CV_OK",flag_locator=FALSE))
mse=mean(dat$getColumn("CV_OK*esterr*")^2,na.rm=TRUE)
cat(c("Mean squared cross-validation error:",
round(mse,3),
"\n"))
## Mean squared cross-validation error: 0.281
mse=mean(dat$getColumn("CV_OK*stderr*")^2,na.rm=TRUE)
cat(c("Mean squared standardized error:",
round(mse,3),
"\n"))
## Mean squared standardized error: 2.635
To create and work with multivariate models, we simply need to work with Db objects containing more than one variable with a z locator. All the variables with a z locator will be considered as part of the multivariate model. Then, the same commands as in the univariate case can be used to create and fit experimental variograms, and to perform (co)kriging predictions.
Let us illustrate our point with our running example. We would now like to consider a bivariate model of the temperature and the elevation. To do so, we simply allocate, in the observation data base dat, a z locator to both variables using the setLocators method.
dat$setLocators(names=c("Temperature", "Elevation"),
locatorType=ELoc_Z())
## NULL
dat
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 7
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = z2
## Column = 4 - Name = Temperature - Locator = z1
## Column = 5 - Name = CV_OK.Temperature.esterr - Locator = NA
## Column = 6 - Name = CV_OK.Temperature.stderr - Locator = NA
To create experimental (directional) variograms and cross-variograms, we use the same commands as in the univariate case: since the data base dat now contains two variables with a z locator, the compute method automatically computes both the variograms and the cross-variograms for these variables.
varioexp2var = Vario_create(varioparam)
err = varioexp2var$compute(dat)
We can then plot the experimental variograms and cross-variograms with a simple command: the plot in the i-th row and j-th column corresponds to the cross-variogram between the variables with locators zi and zj (hence the diagonal plots correspond to the variograms of each variable).
multi.varmod(varioexp2var)
To fit a model on the experimental variograms and cross-variograms, we use the same commands as in the univariate case.
fitmod2var = Model()
err = fitmod2var$fit(varioexp2var,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod2var$setDriftIRF(0,0)
multi.varmod(varioexp2var, fitmod2var)
To compute predictions by (simple) cokriging on the grid, we use the same syntax as in the univariate case: a predictor for each variable in the multivariate model is produced. (Note: we keep the unique neighborhood to allow comparison with the predictors previously introduced.)
err = kriging(dbin=dat, dbout=target, model=fitmod2var,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="COK"))
We can then represent the cokriging predictor for the temperature.
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "COK.Temp*estim",
flagLegendRaster = TRUE,
palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - CoKriging")
ggPrint(p)
For this predictor, we get the following statistics:
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=(c("COK.T*")), opers=opers,
title="Statistics on the CoKriging predictions",
radix="")
##
## Statistics on the CoKriging predictions
## ---------------------------------------
## Number Minimum Maximum Mean St. Dev.
## COK.Temperature.estim 3092 0.200 5.094 2.671 0.970
## COK.Temperature.stdev 3092 0.231 0.948 0.448 0.109
## NULL
Finally, we compare the cokriging predictor to the ordinary kriging predictor.
p = ggplot()
p = p + plot.correlation(target, namex="OK.T*estim", namey="COK.T*estim",
flagBiss=TRUE, bins=100)
p = p + plot.decoration(xlab="Ordinary Kriging",ylab="CoKriging")
ggPrint(p)
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=c("OK.T*estim", "COK.T*estim"), opers=opers,
title="Comparison between Ordinary and Universal kriging predictions",
radix="")
##
## Comparison between Ordinary Kriging and CoKriging predictions
## --------------------------------------------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.estim 3092 0.588 5.153 2.826 0.976
## COK.Temperature.estim 3092 0.200 5.094 2.671 0.970
## NULL
Since cokriging can be time-consuming in a unique neighborhood, we create a small moving neighborhood for demonstration.
movingNeigh = NeighMoving_create(radius = 1000, nmaxi = 10)
To perform a cross-validation of the bivariate model using cokriging, we use the same commands as in the univariate case. Cross-validation errors are then computed for each variable of the multivariate model (hence for both the temperature and the elevation in our case).
err = xvalid(dat, model=fitmod2var,
neigh=movingNeigh,
namconv=NamingConvention_create(prefix="CV_COK",flag_locator=FALSE))
We obtain the following Mean Squared Errors for the temperature.
mse=mean(dat$getColumn("CV_COK.Temperature.esterr*")^2,na.rm=TRUE)
cat(c("Mean squared cross-validation error:",
round(mse,3),
"\n"))
## Mean squared cross-validation error: 0.279
mse=mean(dat$getColumn("CV_COK.Temperature.stderr*")^2,na.rm=TRUE)
cat(c("Mean squared standardized error:",
round(mse,3),
"\n"))
## Mean squared standardized error: 1.227
We obtain the following Mean Squared Errors for the elevation.
mse=mean(dat$getColumn("CV_COK.Elevation.esterr*")^2,na.rm=TRUE)
cat(c("Mean squared cross-validation error:",
round(mse,3),
"\n"))
## Mean squared cross-validation error: 17849.434
mse=mean(dat$getColumn("CV_COK.Elevation.stderr*")^2,na.rm=TRUE)
cat(c("Mean squared standardized error:",
round(mse,3),
"\n"))
## Mean squared standardized error: 1.206
In this section, we assume that the variable of interest \(Z\) is modeled (at each location \(x\)) as \[ Z(x) = b + a Y(x) + \varepsilon(x)\] where \(Y\) is an auxiliary variable known at every location, \(a\) and \(b\) are some (unknown) regression coefficients, and \(\varepsilon\) denotes stationary residuals. Our goal will be to model and work directly with the residuals \(\varepsilon\) (since they are the ones assumed to be stationary).
In our running example, the variable of interest \(Z\) will be the temperature, and the auxiliary variable \(Y\) will be the elevation. In the observation data base, we start by assigning the locator z to the Temperature variable (this is our variable of interest), and ensure that it is the only variable with a z locator.
## Set `z` locator to temperature
err = dat$setLocator("Temperature",ELoc_Z(),cleanSameLocator=TRUE)
To compute the coefficients \(a\) and \(b\) of the linear regression between the temperature and the elevation, we can use the regression function. We specify the name of the response variable (argument nameResp) and the names of the auxiliary variables (argument nameAux), and set the argument mode=0 to specify that we would like to compute a regression defined from the variables with the specified names. We also set the argument flagCst=TRUE to specify that we are looking for an affine regression model between the variables (i.e. one that includes the bias coefficient \(b\)).
## Fit regression model
regr = regression(dat, nameResp="Temperature", nameAux="Elevation", mode=0, flagCst=TRUE)
regr$display()
##
## Linear Regression
## -----------------
## - Calculated on 151 active values
## - Constant term = 3.61197
## - Explanatory Variable #1 = -0.0090641
## - Initial variance = 1.01979
## - Variance of residuals = 0.363298
## NULL
## Extract coefficients
b = regr$getCoeff(0)
a = regr$getCoeff(1)
From the regression coefficients obtained above, we can compute the residuals explicitly as \(\varepsilon(x) = Z(x) - (b + a Y(x))\). An alternative method consists in using the dbRegression function: this function fits a regression model, computes the residuals, and adds them directly to the data base containing the data. The dbRegression function is called in a similar way as the regression function.
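For illustration, here is a minimal sketch of the explicit computation, reusing the coefficients a and b extracted above together with the bracket indexing and addColumns methods used elsewhere in this course (the column name RegRes.manual is hypothetical):
## Compute the residuals explicitly (sketch)
res = dat["Temperature"] - (b + a * dat["Elevation"])
## Store them in the observation data base under a hypothetical name
uid = dat$addColumns(res, "RegRes.manual")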
In the next example, we compute the residuals of the linear regression between the temperature and the elevation and add them to the observation data base (with a name starting with "RegRes" and without changing the locators in the data base).
err = dbRegression(dat, nameResp="Temperature",nameAux="Elevation", mode=0,
flagCst = TRUE,
namconv = NamingConvention_create(prefix="RegRes",flag_locator=FALSE))
We then compute some statistics about these residuals.
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(dat, names=c("RegRes*"), opers=opers,
title="Statistics on the residuals",
radix="")
##
## Statistics on the residuals
## ---------------------------
## Number Minimum Maximum Mean St. Dev.
## RegRes.Temperature 151 -1.359 1.795 0.000 0.603
## NULL
Finally, we draw a correlation plot between the residuals and the regressor variable (i.e. the elevation).
ggplot() + plot.correlation(dat,namex="Elevation",namey="RegRes*",
flagRegr=TRUE,asPoint=TRUE)
Now that the residuals are explicitly computed and available in the observation data base, we can proceed to work with them as with any other variable.
We start by setting their locator to z to specify that they are now our variable of interest within the data base (instead of the raw temperature observations).
dat$setLocator("RegRes*",ELoc_Z(), cleanSameLocator=TRUE)
## NULL
Then we can compute an experimental variogram for the residuals and fit a model on them.
## Compute experimental variogram
varioexpR = Vario(varioparam)
err = varioexpR$compute(dat)
## Fit model
fitmodR = Model()
err = fitmodR$fit(varioexpR,
types=ECov_fromKeys(c("NUGGET","SPHERICAL","LINEAR")))
err = fitmodR$setDriftIRF(0,0)
p = ggplot()
p = p + plot.varmod(varioexpR, fitmodR)
p = p + plot.decoration(title="Experimental and fitted variogram models - Temperature Residual")
ggPrint(p)
Finally, we can compute an ordinary kriging prediction of the residuals and plot the results.
err = kriging(dbin=dat, dbout=target, model=fitmodR,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="ROK"))
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "ROK*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature residuals - Ordinary Kriging")
ggPrint(p)
Now that the residuals \(\varepsilon^{OK}\) are predicted everywhere on the grid (by ordinary kriging), we can compute a predictor \(Z^{\star}\) for the temperature by simply adding back the linear regression part of the model, i.e. by computing
\[
Z^{\star}(x) = b + a Y(x) + \varepsilon^{OK}(x)
\] We can compute this predictor by directly manipulating the variables of the target data base.
## Compute temperature predictor
ROK_estim = b + a * target["Elevation"] + target["ROK*estim"]
## Add it to data base
uid = target$addColumns(ROK_estim,"KR.Temperature.estim")
Let us plot the resulting temperature predictions.
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "KR.T*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - Ordinary Kriging of the residuals")
ggPrint(p)
Finally, we compare the predictor obtained by kriging of the residuals to the ordinary kriging predictor.
p = ggplot()
p = p + plot.correlation(target, namex="OK.T*estim", namey="KR.T*estim",
flagBiss=TRUE, bins=100)
p = p + plot.decoration(xlab="Ordinary Kriging",ylab="Kriging with Residuals")
ggPrint(p)
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=c("OK.T*estim", "KR.T*estim"), opers=opers,
title="Comparison between Ordinary and Residual kriging predictions",
radix="")
##
## Comparison between Ordinary and Residual kriging predictions
## ------------------------------------------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.estim 3092 0.588 5.153 2.826 0.976
## KR.Temperature.estim 3092 -8.097 5.108 1.445 1.906
## NULL
In this subsection, we move to a model for the data where the temperature \(Z\) at each location is considered to have a polynomial trend, i.e. we can write, for any location \(x\), \[ Z(x) = P(x) + \varepsilon(x)\] where \(P\) is a polynomial function of the coordinates \(x\), and \(\varepsilon\) denotes stationary residuals. This model can also be referred to as an intrinsic model.
As before, in the observation data base, we start by assigning the locator z to the Temperature variable (this is our variable of interest), and ensure that it is the only variable with a z locator.
err = dat$setLocator("Temperature",ELoc_Z(),cleanSameLocator=TRUE)
It is possible to directly compute the experimental variogram of the residuals \(\varepsilon\). To do so, we start by creating a Model object in which we specify, through the setDriftIRF method, the (maximal) degree of the polynomial trend \(P\).

Then, computing the (possibly directional) experimental variogram of the residuals is done in the same way as for an "ordinary" experimental variogram, the only difference being that we now pass to the compute method of the Vario class the Model object specifying the polynomial trend (argument model). The method then takes care of filtering out of the observations a polynomial trend of the specified degree, before computing the experimental variogram on the remaining residuals.
For instance, if we want to compute an experimental variogram of the residuals from a model with a linear trend (polynomial of degree 1), then we can run the following commands.
## Create model and set polynomial drift of degree 1
polDriftModel = Model_create()
err = polDriftModel$setDriftIRF(order=1)
## Compute variogram of residuals
vario_res2dir = Vario(varioparam)
err = vario_res2dir$compute(dat,model=polDriftModel)
When plotting the experimental variogram computed on the raw temperature observations, we notice that in one of the directions the variogram does not reach a plateau. This suggests that a non-stationary model could be better suited for the analysis of these temperature observations.
p = ggplot()
p = p + plot.varmod(vario_raw2dir)
p = p + plot.decoration(title="Temperature (°C)")
ggPrint(p)
We can now fit a model on the experimental variogram of residuals using the fit method in the usual way.
err = polDriftModel$fit(vario_res2dir,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC")))
p = ggplot()
p = p + plot.varmod(vario_res2dir, polDriftModel)
p = p + plot.decoration(title="Experimental and fitted variogram models - Residuals")
ggPrint(p)
Finally, note that it is possible to compute the coefficients of the polynomial trend from the intrinsic model defined above, using the regression function, which allows us to perform linear regressions on variables defined in a data base.
For instance, to fit a linear trend on the temperature observations in the dat data base, we specify the name of the response variable (argument nameResp) and the drift model (argument model), and set the argument mode=2 to specify that we would like to compute a regression defined from a drift model. The results are then printed using the display method.
regResults = regression(dat, nameResp="Temperature", model=polDriftModel, mode=2)
regResults$display()
##
## Linear Regression
## -----------------
## - Calculated on 151 active values
## - Explanatory Variable #1 = 3.52136
## - Explanatory Variable #2 = -0.00746599
## - Explanatory Variable #3 = 0.00197753
## - Initial variance = 1.01979
## - Variance of residuals = 0.735557
## NULL
Note: the regression results are stored in an object. We can access the coefficients through the slot coeffs, the initial variance of the data through the slot variance, and the variance of the residuals through the slot varres.
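For instance, under the slot names stated above, the fitted trend can be inspected as follows (a minimal sketch, assuming the slots are exposed exactly as described):
regResults$coeffs   # coefficients of the polynomial trend
regResults$variance # initial variance of the data
regResults$varres   # variance of the residuals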
To work with universal kriging, it suffices to call the kriging function (to compute predictions) or the xvalid function (to perform cross-validation) with a Model object that includes polynomial drift functions. In particular, we do not need to explicitly compute any type of residuals.

Let us compute a universal kriging prediction of the temperature on the target grid using the model fitted on the residuals.
err = kriging(dbin=dat, dbout=target, model=polDriftModel,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="UK"))
## Plot
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "UK*estim",
flagLegendRaster = TRUE,
palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - Universal Kriging")
ggPrint(p)
We compute some statistics on the predicted values.
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=c("UK.T*"), opers=opers,
title="Statistics on the Universal Kriging:", radix="")
##
## Statistics on the Universal Kriging:
## ------------------------------------
## Number Minimum Maximum Mean St. Dev.
## UK.Temperature.estim 3092 0.613 5.051 2.841 0.923
## UK.Temperature.stdev 3092 0.083 0.919 0.555 0.138
## NULL
Let us compare the predictions obtained when considering a polynomial trend in the model with the ordinary kriging predictor based on the "raw" temperature observations. We plot a correlation plot between the ordinary and universal kriging predictions, and compare their respective statistics.
p = ggplot()
p = p + plot.correlation(target, namex="OK*estim", namey="UK*estim",
flagBiss=TRUE, bins=100)
p = p + plot.decoration(xlab="Ordinary Kriging",ylab="Universal Kriging")
ggPrint(p)
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
dbStatisticsPrint(target, names=c("OK.T*estim", "UK.T*estim"), opers=opers,
title="Comparison between Ordinary and Universal kriging predictions:",
radix="")
##
## Comparison between Ordinary and Universal kriging predictions:
## --------------------------------------------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.estim 3092 0.588 5.153 2.826 0.976
## UK.Temperature.estim 3092 0.613 5.051 2.841 0.923
## NULL
To perform cross-validation using Universal Kriging, we simply call the xvalid function with the model we fitted on the residuals.
err = xvalid(dat, model=polDriftModel,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="CV_UK",flag_locator=FALSE))
The Mean Squared cross-validation and standardized errors of the universal kriging predictor are:
mse=mean(dat$getColumn("CV_UK*esterr*")^2,na.rm=TRUE)
cat(c("Mean squared cross-validation error:",
round(mse,3),
"\n"))
## Mean squared cross-validation error: 0.251
mse=mean(dat$getColumn("CV_UK*stderr*")^2,na.rm=TRUE)
cat(c("Mean squared standardized error:",
round(mse,3),
"\n"))
## Mean squared standardized error: 0.855
In this last section, we investigate how to specify and work with external drifts when modeling data. In a data base, external drifts are identified by allocating an f locator to them.
For instance, circling back to our running example, let us assume that we would like to model the temperature with an external drift given by the elevation. Then, in the observation data base, we simply need to allocate a z locator to the temperature variable and an f locator to the elevation variable using the setLocator method. Note: we use the flag cleanSameLocator=TRUE to make sure that only the variable we specify carries the locator.
## Set `z` locator to temperature
err = dat$setLocator("Temperature",ELoc_Z(), cleanSameLocator=TRUE)
## Set `f` locator to elevation
err = dat$setLocator("Elevation",ELoc_F(), cleanSameLocator=TRUE)
Then, defining and fitting the models on the one hand, and performing kriging predictions on the other hand, is done using the same approach as the one described earlier for models with a polynomial trend.
To create experimental (directional) variograms of the residuals from a model with external drift, we use the same approach as the one described for modeling data with polynomial trends.
First, we create a Model object in which we specify that we deal with external drifts. This is once again done through the setDriftIRF function, where we specify:

- the number of external drift variables (argument nfex): this is the number of variables with an f locator that we want to work with;
- the order of the polynomial trend (argument order): setting order=0 adds a constant drift to the model, and setting order to a value \(n\) adds all the monomial functions of the coordinates of order up to \(n\) as external drifts (on top of the nfex external drift variables).

Circling back to our example, we create a model with a single external drift (the elevation), and add a constant drift (which basically acts like a bias term).
EDmodel = Model_create()
err = EDmodel$setDriftIRF(order=0,nfex=1)
Then, to compute the experimental variogram, we use the same commands as in the case of polynomial trends: we create a Vario object from the data base containing the data, and call the compute method with the model we just created. The experimental variogram is computed on the residuals obtained after "filtering out" the (linear) effect of the drift variables (and possibly of a polynomial trend, if specified in the model).
varioKED = Vario(varioparam)
err = varioKED$compute(dat,model=EDmodel)
As a reference, we plot the experimental variograms computed on the raw temperature data (dashed lines) and on the residuals from the model with external drift (solid lines).
p = ggplot()
p = p + plot.varmod(vario_raw2dir, varioLinetype="dashed")
p = p + plot.varmod(varioKED, varioLinetype="solid")
p = p + plot.decoration(title="Temperature (°C)")
ggPrint(p)
Finally, we fit our model with external drift using the fit method (which we call on the experimental variogram of residuals).
err = EDmodel$fit(varioKED,
types=ECov_fromKeys(c("NUGGET","CUBIC","GAUSSIAN")))
p = ggplot()
p = p + plot.varmod(varioKED, EDmodel)
p = p + plot.decoration(title="Experimental and fitted variogram models - Residuals")
ggPrint(p)
Similarly to universal kriging, to perform cross-validation or predictions using kriging with external drifts, we simply call the xvalid and kriging functions with models in which external drifts are specified. For instance, we compute kriging predictions with external drift of the temperature on the target grid:
err = kriging(dbin=dat, dbout=target, model=EDmodel,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="KED"))
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "KED*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - Kriging with external drift")
ggPrint(p)
For this predictor, we get the following statistics:
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
err = dbStatisticsPrint(target, names=(c("KED.T*")), opers=opers,
title="Statistics on the Kriging with External Drift predictions",
radix="")
##
## Statistics on the Kriging with External Drift predictions
## ---------------------------------------------------------
## Number Minimum Maximum Mean St. Dev.
## KED.Temperature.estim 3092 -6.004 4.773 1.778 1.540
## KED.Temperature.stdev 3092 0.312 0.615 0.396 0.051
We can then compare these predictions to the ones obtained by ordinary kriging. We create a correlation plot between the ordinary kriging predictions, and the kriging with external drift (KED) predictions.
p = ggplot()
p = p + plot.correlation(target, namex="OK.T*estim", namey="KED.T*estim",
flagBiss=TRUE, bins=100)
p = p + plot.decoration(xlab="Ordinary Kriging", ylab="Kriging with External Drift")
ggPrint(p)
Note that negative estimates are present when using the external drift.
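To quantify this, one could count the negative predictions on the target grid; a small sketch (not part of the original analysis), assuming getColumn returns NA outside the selection:
ked = target$getColumn("KED.Temperature.estim")
sum(ked < 0, na.rm=TRUE) # number of grid nodes with a negative prediction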
To perform a cross-validation, we simply call the xvalid function.
err = xvalid(dat, model=EDmodel,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="CV_KED",flag_locator=FALSE))
The Mean Squared cross-validation and standardized errors of the resulting kriging predictor are:
mse=mean(dat$getColumn("CV_KED*esterr*")^2,na.rm=TRUE)
cat(c("Mean squared cross-validation error:",
round(mse,3),
"\n"))
## Mean squared cross-validation error: 0.172
mse=mean(dat$getColumn("CV_KED*stderr*")^2,na.rm=TRUE)
cat(c("Mean squared standardized error:",
round(mse,3),
"\n"))
## Mean squared standardized error: 1.143
We compare the Mean Squared cross-validation errors obtained for the different kriging predictions (UK = Universal kriging, OK = Ordinary kriging, COK = Cokriging, KED = Kriging with external drift, KR = Kriging of residuals).
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
err = dbStatisticsPrint(dat, names=(c("CV*.Temperature.esterr")), opers=opers,
title="Mean-squared cross-validation errors",
radix="")
##
## Mean-squared cross-validation errors
## ------------------------------------
## Number Minimum Maximum Mean St. Dev.
## CV_OK.Temperature.esterr 151 -1.411 1.644 -0.017 0.530
## CV_COK.Temperature.esterr 151 -1.759 1.648 -0.105 0.517
## CV_UK.Temperature.esterr 151 -1.713 1.477 -0.003 0.501
## CV_KED.Temperature.esterr 151 -1.577 1.001 -0.009 0.414
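The table above summarizes the distribution of the raw cross-validation errors. To obtain the mean squared errors themselves, one can aggregate the error columns directly; a minimal sketch, assuming the exact column names created by the previous xvalid calls:
for (m in c("CV_OK", "CV_COK", "CV_UK", "CV_KED")) {
  err2 = dat$getColumn(paste0(m, ".Temperature.esterr"))^2
  cat(m, ": MSE =", round(mean(err2, na.rm=TRUE), 3), "\n")
}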
We then compare various statistics computed for each predictor.
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
err = dbStatisticsPrint(target, names = (c("*.Temperature.estim")), opers=opers,
title="Statistics of the predictors",
radix="")
##
## Statistics of the predictors
## ----------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.estim 3092 0.588 5.153 2.826 0.976
## COK.Temperature.estim 3092 0.200 5.094 2.671 0.970
## ROK.RegRes.Temperature.estim 3092 -0.771 1.586 0.019 0.455
## KR.Temperature.estim 3092 -8.097 5.108 1.445 1.906
## UK.Temperature.estim 3092 0.613 5.051 2.841 0.923
## KED.Temperature.estim 3092 -6.004 4.773 1.778 1.540
Finally, we compare various statistics computed for the standard deviation of each predictor.
opers=EStatOption_fromKeys(c("NUM","MINI","MAXI","MEAN","STDV"))
err = dbStatisticsPrint(target, names = (c("*.Temperature.stdev")), opers=opers,
title="Statistics of the standard-deviation of each predictors",
radix="")
##
## Statistics of the standard deviation of each predictor
## -------------------------------------------------------
## Number Minimum Maximum Mean St. Dev.
## OK.Temperature.stdev 3092 0.036 0.990 0.407 0.187
## COK.Temperature.stdev 3092 0.231 0.948 0.448 0.109
## ROK.RegRes.Temperature.stdev 3092 0.304 0.504 0.362 0.031
## UK.Temperature.stdev 3092 0.083 0.919 0.555 0.138
## KED.Temperature.stdev 3092 0.312 0.615 0.396 0.051