In this preamble, we load the gstlearn library and clean the workspace.
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
plot.setDefaultGeographic(dims=c(8,8))
We load two data bases:
dat
containing point observations of two
variables across Scotland: the elevation (Elevation
) and
the temperature (January_temp
, then renamed as
Temperature
)target
containing a grid of points covering
Scotland with a selection variable (inshore
) selecting the
points that are on land, and a variable (Elevation
) giving
the elevation at every point on land## Data points
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
### Rename the temperature variable
dat$setName("*temp", "Temperature")
## Target grid
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
target = DbGrid_createFromNF(fileNF)
The target
data base a (grid) map of the elevation
across Scotland.
target
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 11097
## Number of active samples = 3092
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 535.000
## Mesh : 4.938 4.963
## Number : 81 137
##
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel
Let us plot its content.
ggDefaultGeographic() + plot(target, nameRaster="Elevation",
flagLegendRaster=TRUE,palette="Spectral",
legendNameRaster="Elevation (m)")
The dat
data base contains 236 (point) samples of two
variables across Scotland: elevation and temperature.
dat
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = Temperature - Locator = z1
We can use the dbStatisticsPrint
function to compute
statistics on variables of a Db
. We specify
Db
of interest (argument db
),names
),opers
).
This last argument is set through a EStatOption_fromKeys
function, which we call with a vector containing the (abbreviated) names
of the statistics (run EStatOption_printAll()
for the full
list)flagIso
allowing to specify whether we want to
compute the statistics for each variable separately
(flagIso=TRUE
), or whether we want to compute “filtered”
statistics (flagIso=FALSE
). In the latter case, we compute
the statistics while only using the points where all the variables are
defined.For instance, let us count the number of observations of each
variable using the dbStatisticsPrint
.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("NUM")),
flagIso = FALSE, title="Number of observations", radix="")
##
## Number of observations
## ----------------------
## Number
## Elevation 236
## Temperature 151
## NULL
Since the data base dat
contains 236 samples, we can
conclude that the Elevation
variable is defined at every
point, but not the Temperature
variable. Similarly, we can
compute the mean and the extreme values of each variable in the
observation data base.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of observations", radix="")
##
## Statistics of observations
## --------------------------
## Minimum Maximum Mean
## Elevation 2.000 800.000 146.441
## Temperature 0.600 5.200 2.815
## NULL
Finally, we compute the mean and the extreme values of the elevation in the target data base.
dbStatisticsPrint(target, names = c("Elevation"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of target", radix="")
##
## Statistics of target
## --------------------
## Minimum Maximum Mean
## Elevation 0.000 1270.000 241.152
## NULL
We can then compute the filtered mean, minimum and maximum for the
Elevation
and Temperature
variables as
follows:
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = TRUE, title="Filtered statistics of observations", radix="")
##
## Filtered statistics of observations
## -----------------------------------
## Minimum Maximum Mean
## Elevation 3.000 387.000 87.974
## Temperature 0.600 5.200 2.815
## NULL
As explained above, the first row of the table contains contains the
mean, the minimum and the maximum of the observations of the
Elevation
variable, over all the locations where both the
Elevation
na Temperature
variables are defined
(i.e. in our case, all the points where the temperature is defined).
Hence, we see that the points where the temperature is observed are
located at relatively low altitudes within the Scottish landscape.
To confirm that, we plot the points where the temperature is sampled on top of the elevation map of Scotland.
p = ggDefaultGeographic()
p = p + plot.grid(target, nameRaster="Elevation",
flagLegendRaster=TRUE,
palette="Spectral",
legendNameRaster="Elevation (m)")
#p = p + new_scale_color()
#p = p + new_scale("linetype")
p = p + plot.point(dat, nameSize="Temperature",
sizmin = 0.5, sizmax = 3,
flagLegendSize=TRUE,legendNameSize="°C")
ggPrint(p)
From observing this last plot, it seems like the bigger points (corresponding to high temperatures) are located where the elevation is smaller: this seems to hint (confirm?) that the temperature is negatively correlated with the elevation. To corroborate this, we plot a correlation plot between the two variables.
p = ggplot()
p = p + plot.correlation(dat, namex="Elevation", namey="Temperature",
asPoint=TRUE, flagRegr=TRUE)
p = p + plot.decoration(title="Correlation between Temperature and Elevation")
ggPrint(p)
In this course, we will introduce four methods allowing to deal with multivariate data: Cokriging, Residual modeling, Models with polynomial trends and finally Models with an external drift. As a baseline, we will compare the results obtained with each model with the results obtained with a univariate model.
We start by computing an experimental directional variogram
vario_raw2dir
of the “raw” temperature observations, along
two directions (\(0^{\circ}\text{C}\)
and \(90^{\circ}\text{C}\)).
## Create experimental variogram on raw data
varioparam = VarioParam_createMultiple(ndir=2, npas=30, dpas=10)
vario_raw2dir = Vario(varioparam)
err = vario_raw2dir$compute(dat)
We then fit a model on the experimental variogram of raw temperature observations. Since we want to use ordinary kriging, we add a constant drift to the model before.
fitmod_raw = Model()
err = fitmod_raw$fit(vario_raw2dir,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod_raw$setDriftIRF(0,0)
p = ggplot()
p = p + plot.varmod(vario_raw2dir, fitmod_raw)
p = p + plot.decoration(title="Experimental and fitted variogram models - Raw temperature observations")
ggPrint(p)
We create a “neighborhood” object specifying a unique neighborhood, which we will use throughout the course.
uniqueNeigh = NeighUnique_create()
We now perform an ordinary kriging prediction of the temperature on
the target
grid using the model fitted on the raw
observations, and compute statistics on the predicted values.
err = kriging(dbin=dat, dbout=target, model=fitmod_raw,
neigh=uniqueNeigh,
namconv=NamingConvention_create(prefix="OK"))
## Plot
p = ggDefaultGeographic()
p = p + plot.grid(target,nameRaster = "OK*estim",
flagLegendRaster = TRUE, palette="Spectral",legendNameRaster="°C")
p = p + plot.point(dat,flagCst = T,pch=18,cex=1)
p = p + plot.decoration(title="Temperature - Ordinary Kriging")
ggPrint(p)