In this preamble, we load the **gstlearn** library and clean the workspace.

```
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
```

`plot.setDefaultGeographic(dims=c(8,8))`

We load two data bases:

- a data base
`dat`

containing point observations of two variables across Scotland: the elevation (`Elevation`

) and the temperature (`January_temp`

, then renamed as`Temperature`

) - a data base
`target`

containing a grid of points covering Scotland with a selection variable (`inshore`

) selecting the points that are on land, and a variable (`Elevation`

) giving the elevation at every point on land

```
## Data points
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
### Rename the temperature variable
dat$setName("*temp", "Temperature")
## Target grid
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
target = DbGrid_createFromNF(fileNF)
```

The `target`

data base a (grid) map of the elevation across Scotland.

`target`

```
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 11097
## Number of active samples = 3092
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 535.000
## Mesh : 4.938 4.963
## Number : 81 137
##
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel
```

Let us plot its content.

```
ggDefaultGeographic() + plot(target, nameRaster="Elevation",
flagLegendRaster=TRUE,palette="Spectral",
legendNameRaster="Elevation (m)")
```

The `dat`

data base contains 236 (point) samples of two variables across Scotland: elevation and temperature.

`dat`

```
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = Temperature - Locator = z1
```

We can use the `dbStatisticsPrint`

function to compute statistics on variables of a `Db`

. We specify

- the
`Db`

of interest (argument`db`

), - a vector containing the names of the variables of interest (argument
`names`

), - the statistics we want to compute (argument
`opers`

). This last argument is set through a`EStatOption_fromKeys`

function, which we call with a vector containing the (abbreviated) names of the statistics (run`EStatOption_printAll()`

for the full list) - a flag
`flagIso`

allowing to specify whether we want to compute the statistics for each variable separately (`flagIso=TRUE`

), or whether we want to compute “filtered” statistics (`flagIso=FALSE`

). In the latter case, we compute the statistics while only using the points where all the variables are defined.

For instance, let us count the number of observations of each variable using the `dbStatisticsPrint`

.

```
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("NUM")),
flagIso = FALSE, title="Number of observations", radix="")
```

```
##
## Number of observations
## ----------------------
## Number
## Elevation 236
## Temperature 151
```

`## NULL`

Since the data base `dat`

contains 236 samples, we can conclude that the `Elevation`

variable is defined at every point, but not the `Temperature`

variable. Similarly, we can compute the mean and the extreme values of each variable in the observation data base.

```
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of observations", radix="")
```

```
##
## Statistics of observations
## --------------------------
## Minimum Maximum Mean
## Elevation 2.000 800.000 146.441
## Temperature 0.600 5.200 2.815
```

`## NULL`

Finally, we compute the mean and the extreme values of the elevation in the target data base.

```
dbStatisticsPrint(target, names = c("Elevation"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of target", radix="")
```

```
##
## Statistics of target
## --------------------
## Minimum Maximum Mean
## Elevation 0.000 1270.000 241.152
```

`## NULL`

We can then compute the filtered mean, minimum and maximum for the `Elevation`

and `Temperature`

variables as follows:

```
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = TRUE, title="Filtered statistics of observations", radix="")
```

```
##
## Filtered statistics of observations
## -----------------------------------
## Minimum Maximum Mean
## Elevation 3.000 387.000 87.974
## Temperature 0.600 5.200 2.815
```

`## NULL`

As explained above, the first row of the table contains contains the mean, the minimum and the maximum of the observations of the `Elevation`

variable, over all the locations where both the `Elevation`

na `Temperature`

variables are defined (i.e. in our case, all the points where the temperature is defined). Hence, we see that the points where the temperature is observed are located at relatively low altitudes within the Scottish landscape.

To confirm that, we plot the points where the temperature is sampled on top of the elevation map of Scotland.

```
p = ggDefaultGeographic()
p = p + plot.grid(target, nameRaster="Elevation",
flagLegendRaster=TRUE,
palette="Spectral",
legendNameRaster="Elevation (m)")
#p = p + new_scale_color()
#p = p + new_scale("linetype")
p = p + plot.point(dat, nameSize="Temperature",
sizmin = 0.5, sizmax = 3,
flagLegendSize=TRUE,legendNameSize="°C")
ggPrint(p)
```

From observing this last plot, it seems like the bigger points (corresponding to high temperatures) are located where the elevation is smaller: this seems to hint (confirm?) that the temperature is negatively correlated with the elevation. To corroborate this, we plot a correlation plot between the two variables.

```
p = ggplot()
p = p + plot.correlation(dat, namex="Elevation", namey="Temperature",
asPoint=TRUE, flagRegr=TRUE)
p = p + plot.decoration(title="Correlation between Temperature and Elevation")
ggPrint(p)
```

In this course, we will introduce four methods allowing to deal with multivariate data: Cokriging, Residual modeling, Models with polynomial trends and finally Models with an external drift. As a baseline, we will compare the results obtained with each model with the results obtained with a univariate model.

We start by computing an experimental directional variogram `vario_raw2dir`

of the “raw” temperature observations, along two directions (\(0^{\circ}\text{C}\) and \(90^{\circ}\text{C}\)).

```
## Create experimental variogram on raw data
varioparam = VarioParam_createMultiple(ndir=2, npas=30, dpas=10)
vario_raw2dir = Vario(varioparam)
err = vario_raw2dir$compute(dat)
```

We then fit a model on the experimental variogram of raw temperature observations. Since we want to use ordinary kriging, we add a constant drift to the model before.

```
fitmod_raw = Model()
err = fitmod_raw$fit(vario_raw2dir,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod_raw$setDriftIRF(0,0)
```

```
p = ggplot()
p = p + plot.varmod(vario_raw2dir, fitmod_raw)
p = p + plot.decoration(title="Experimental and fitted variogram models - Raw temperature observations")
ggPrint(p)
```