Preamble

In this preamble, we load the gstlearn library and clean the workspace.

rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
plot.setDefaultGeographic(dims=c(8,8))

We load two data bases:

## Data points
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
### Rename the temperature variable
dat$setName("*temp", "Temperature")

## Target grid
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
target = DbGrid_createFromNF(fileNF)

Exploratory data analysis

The target data base a (grid) map of the elevation across Scotland.

target
## 
## Data Base Grid Characteristics
## ==============================
## 
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension              = 2
## Number of Columns            = 4
## Total number of samples      = 11097
## Number of active samples     = 3092
## 
## Grid characteristics:
## ---------------------
## Origin :     65.000   535.000
## Mesh   :      4.938     4.963
## Number :         81       137
## 
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel

Let us plot its content.

ggDefaultGeographic() + plot(target, nameRaster="Elevation",
                             flagLegendRaster=TRUE,palette="Spectral",
                             legendNameRaster="Elevation (m)")

The dat data base contains 236 (point) samples of two variables across Scotland: elevation and temperature.

dat
## 
## Data Base Characteristics
## =========================
## 
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension              = 2
## Number of Columns            = 5
## Total number of samples      = 236
## 
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = Temperature - Locator = z1

We can use the dbStatisticsPrint function to compute statistics on variables of a Db. We specify

  • the Db of interest (argument db),
  • a vector containing the names of the variables of interest (argument names),
  • the statistics we want to compute (argument opers). This last argument is set through a EStatOption_fromKeys function, which we call with a vector containing the (abbreviated) names of the statistics (run EStatOption_printAll() for the full list)
  • a flag flagIso allowing to specify whether we want to compute the statistics for each variable separately (flagIso=TRUE), or whether we want to compute “filtered” statistics (flagIso=FALSE). In the latter case, we compute the statistics while only using the points where all the variables are defined.

For instance, let us count the number of observations of each variable using the dbStatisticsPrint.

dbStatisticsPrint(dat, names = c("Elevation", "Temperature"), 
                  opers=EStatOption_fromKeys(c("NUM")),
                  flagIso = FALSE, title="Number of observations", radix="")
## 
## Number of observations
## ----------------------
##                 Number
## Elevation          236
## Temperature        151
## NULL

Since the data base dat contains 236 samples, we can conclude that the Elevation variable is defined at every point, but not the Temperature variable. Similarly, we can compute the mean and the extreme values of each variable in the observation data base.

dbStatisticsPrint(dat, names = c("Elevation", "Temperature"), 
                  opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
                  flagIso = FALSE, title="Statistics of observations", radix="")
## 
## Statistics of observations
## --------------------------
##                Minimum    Maximum       Mean
## Elevation        2.000    800.000    146.441
## Temperature      0.600      5.200      2.815
## NULL

Finally, we compute the mean and the extreme values of the elevation in the target data base.

dbStatisticsPrint(target, names = c("Elevation"), 
                  opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
                  flagIso = FALSE, title="Statistics of target", radix="")
## 
## Statistics of target
## --------------------
##              Minimum    Maximum       Mean
## Elevation      0.000   1270.000    241.152
## NULL

We can then compute the filtered mean, minimum and maximum for the Elevation and Temperature variables as follows:

dbStatisticsPrint(dat, names = c("Elevation", "Temperature"), 
                  opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
                  flagIso = TRUE, title="Filtered statistics of observations", radix="")
## 
## Filtered statistics of observations
## -----------------------------------
##                Minimum    Maximum       Mean
## Elevation        3.000    387.000     87.974
## Temperature      0.600      5.200      2.815
## NULL

As explained above, the first row of the table contains contains the mean, the minimum and the maximum of the observations of the Elevation variable, over all the locations where both the Elevation na Temperature variables are defined (i.e. in our case, all the points where the temperature is defined). Hence, we see that the points where the temperature is observed are located at relatively low altitudes within the Scottish landscape.

To confirm that, we plot the points where the temperature is sampled on top of the elevation map of Scotland.

p = ggDefaultGeographic()
p = p + plot.grid(target, nameRaster="Elevation",
                  flagLegendRaster=TRUE,
                  palette="Spectral",
                  legendNameRaster="Elevation (m)")
#p = p + new_scale_color()
#p = p + new_scale("linetype")
p = p + plot.point(dat, nameSize="Temperature",
                   sizmin = 0.5, sizmax = 3,
                   flagLegendSize=TRUE,legendNameSize="°C")
ggPrint(p)

From observing this last plot, it seems like the bigger points (corresponding to high temperatures) are located where the elevation is smaller: this seems to hint (confirm?) that the temperature is negatively correlated with the elevation. To corroborate this, we plot a correlation plot between the two variables.

p = ggplot()
p = p + plot.correlation(dat, namex="Elevation", namey="Temperature", 
                     asPoint=TRUE, flagRegr=TRUE)
p = p + plot.decoration(title="Correlation between Temperature and Elevation")
ggPrint(p)

Baseline univariate model

In this course, we will introduce four methods allowing to deal with multivariate data: Cokriging, Residual modeling, Models with polynomial trends and finally Models with an external drift. As a baseline, we will compare the results obtained with each model with the results obtained with a univariate model.

Model fitting

We start by computing an experimental directional variogram vario_raw2dir of the “raw” temperature observations, along two directions (\(0^{\circ}\text{C}\) and \(90^{\circ}\text{C}\)).

## Create experimental variogram on raw data
varioparam = VarioParam_createMultiple(ndir=2, npas=30, dpas=10)
vario_raw2dir = Vario(varioparam)
err = vario_raw2dir$compute(dat)

We then fit a model on the experimental variogram of raw temperature observations. Since we want to use ordinary kriging, we add a constant drift to the model before.

fitmod_raw = Model()
err = fitmod_raw$fit(vario_raw2dir,
                    types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod_raw$setDriftIRF(0,0)
p = ggplot()
p = p + plot.varmod(vario_raw2dir, fitmod_raw)
p = p + plot.decoration(title="Experimental and fitted variogram models - Raw temperature observations")
ggPrint(p)