In this preamble, we load the **gstlearn** library and clean the workspace.

```
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
```

Then the necessary data set is downloaded and named **dat**: the target variable is **January_temp**

```
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
```

**Variogram Cloud**

The data is modeled as *samples of a regionalized* variable \(z\), i.e. as evaluations at locations \(x_1,..,x_n\) of a variable \(z\) defined across a spatial domain: \[\lbrace z_i = z(x_i) : i = 1, ..., n\rbrace.\]

The variogram cloud is the set of pair of points defined as \[ \big\lbrace \big( \Vert x_i - x_j\Vert, \big\vert z(x_i)-z(x_j)\big\vert^2 \big) \quad\text{where}\quad 1\le i\le j\le n \big\rbrace \]

In **gstlearn**, variogram clouds are computed as grids.

```
varioParamOmni = VarioParam_createOmniDirection(npas = 100)
grid.cloud = db_vcloud(dat, varioParamOmni)
grid.cloud$display()
```

```
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 10000
##
## Grid characteristics:
## ---------------------
## Origin : 0.000 0.000
## Mesh : 7.789 0.031
## Number : 100 100
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = x1 - Locator = x1
## Column = 2 - Name = x2 - Locator = x2
## Column = 3 - Name = Cloud.January_temp - Locator = NA
```

`## NULL`

```
p = ggplot()
p = p + plot.grid(grid.cloud, "Cloud.January*")
p = p + plot.geometry(asp=0)
ggPrint(p)
```

**Experimental Variogram**

The experimental (isotropic) variogram \(\gamma\) is a function defined as

\[\gamma(h)=\frac{1}{2\vert N(h)\vert}\sum_{(i,j) \in N(h)}\big\vert z(x_i)-z(x_j)\big\vert^2, \quad h\ge 0,\]

where \(N(h)\) is set of all pairs of data points separated by a distance \(h\) (called *lag*): \[ N(h) = \bigg\lbrace (i,j) : \Vert x_j-x_i\Vert = h\bigg\rbrace_{1\le i\le j\le n},\]

and \(\vert N(h)\vert\) is the cardinal of \(N(h)\). In practice, when computing \(\gamma(h)\), we look for pairs of data points separated by a distance \(h \pm \tau h\) where \(\tau > 0\) is a tolerance on the separation distance \(h\). In other words, \(N(h)\) is replaced by \[ \widehat N(h) = \bigg\lbrace (i,j) : (1-\tau)h \le \Vert x_j-x_i\Vert \le (1+\tau) h\bigg\rbrace_{1\le i\le j\le n}\]

To compute an experimental variogram, we start by creating a `VarioParam`

object containing the parameters of the variogram. This is done using the function `VarioParam_createOmniDirection`

. We can specify the number of lags \(h\) for which the experimental variogram is computed (argument `npas`

), and the distance between these lags (argument `dpas`

), as well as the tolerance \(\tau\) on the lags (argument `toldis`

).

Then, the experimental variogram is computed in two steps. First, a `Vario`

object is initialized from the `VarioParam`

object and the `Db`

containing the data points. Then, the values of the experimental variogram at the lags specified by the `VarioParam`

object are computed using the method `compute`

of the `Vario`

object (which returns an error code, `0`

meaning that no error was detected).

Note : The variable \(z\) for which we wish to define the experimental variogram should be the only variable in the `Db`

with a `z`

locator (i.e. it should have locator `z1`

and the other variables should not have a locator starting with `z`

). This can be done bu using the method `setLocator`

of the `Db`

object containing the data. If several variables with `z`

locators are present in the `Db`

, then cross-variograms between are also computed (this subject will be covered in the course on multivariate analysis).

In the next example, we compute an experimental variogram with \(40\) lags separated by a distance \(10\) (meaning that we take \(h =10i\) for \(i=0, ..., 39\)), and consider a tolerance \(\tau = 10\%\) for the variogram computations. We use the `Db`

`dat`

, and select the variable `January_temp`

as our variable of interest (by setting its locator to “z”).

```
varioParamOmni = VarioParam_createOmniDirection(npas=40, dpas=10,toldis = 0.1)
dat$setLocator("January_temp",ELoc_Z())
varioexp = Vario(varioParamOmni)
err = varioexp$compute(dat)
```

We now print the contents of the newly created experimental variogram. The \(40\) experimental variogram values are displayed (Column `Value`

), together with the number \(\vert \widehat N(h)\vert\) of pairs used to compute the value (Column `Npairs`

) and the average distance between the points forming these pairs (Column `Distance`

).

`varioexp`

```
##
## Variogram characteristics
## =========================
## Number of variable(s) = 1
## Number of direction(s) = 1
## Space dimension = 2
## Variance-Covariance Matrix 1.020
##
## Direction #1
## ------------
## Number of lags = 40
## Direction coefficients = 1.000 0.000
## Direction angles (degrees) = 0.000 0.000
## Tolerance on direction = 90.000 (degrees)
## Calculation lag = 10.000
## Tolerance on distance = 10.000 (Percent of the lag value)
##
## For variable 1
## Rank Npairs Distance Value
## 0 2.000 0.141 0.002
## 1 12.000 9.973 0.129
## 2 31.000 20.131 0.270
## 3 48.000 30.003 0.470
## 4 61.000 40.019 0.599
## 5 72.000 50.011 0.582
## 6 82.000 59.995 0.586
## 7 77.000 69.926 0.907
## 8 92.000 80.027 0.899
## 9 96.000 89.985 0.980
## 10 96.000 100.013 0.856
## 11 96.000 109.991 0.905
## 12 95.000 120.064 1.013
## 13 69.000 129.945 1.247
## 14 101.000 139.943 1.001
## 15 98.000 150.020 0.942
## 16 80.000 159.974 1.022
## 17 81.000 170.051 1.330
## 18 75.000 179.943 1.058
## 19 85.000 189.976 1.185
## 20 78.000 200.100 0.957
## 21 73.000 210.005 1.117
## 22 68.000 220.027 1.685
## 23 78.000 230.014 1.405
## 24 49.000 239.927 1.406
## 25 48.000 250.034 1.248
## 26 34.000 260.056 1.364
## 27 39.000 269.806 1.728
## 28 28.000 279.716 1.308
## 29 40.000 289.998 1.223
## 30 29.000 299.826 0.981
## 31 19.000 310.007 1.461
## 32 18.000 319.904 1.154
## 33 13.000 330.008 1.329
## 34 12.000 340.078 1.396
## 35 19.000 349.855 1.428
## 36 11.000 359.903 1.128
## 37 13.000 370.069 0.959
## 38 9.000 379.623 0.629
## 39 7.000 389.619 0.731
```

We now plot the experimental variogram. In the resulting figure, the experimental variogram is plotted in blue, and the dashed blacked line corresponds to the value of the variance of the data.

`ggplot() + plot.varmod(varioexp)`

We can also adapt the size of experimental variogram points in the plot so that it is proportional to the number of pairs of points used to compute the value.

`ggplot() + plot.varmod(varioexp,drawPsize=TRUE)`

Fitting a variogram model on an experimental variogram is done in two steps. First, we create `Model`

object. These objects aim at containing all the necessary information about the covariance structure of a random field. In particular, it is assumed that this covariance structure is a superposition of basic elementary covariance structures: the `Model`

objects then contains the covariance types and parameters of each one of these basic covariance structures.

In our case, we wish to build our `Model`

object from an experimental variogram, meaning that we want to find a composition of basic covariance structures which would result in a variogram “close” to the experimental variogram that we computed from the data. This is done by calling the method `fit`

of the `Model`

object, while providing it with the experimental variogram.

In the next example, we create a `Model`

object, that we fit on the experimental variogram the we computed earlier. We then plot both the experimental variogram and the variogram model resulting from the fitting using the `plot.varmod`

function. In the figure we obtain, In the figure above, the dashed blue line corresponds to the experimental variogram, and the solid blue line corresponds to the fitted variogram model.

```
fitmod = Model()
err = fitmod$fit(varioexp)
```

`ggplot() + plot.varmod(varioexp, fitmod)`

We now print the content of our newly created model. As we can see, only one basic covariance structure is used to define the model (namely, a Spherical covariance function whose range and sill are printed).

`fitmod`

```
##
## Model characteristics
## =====================
## Space dimension = 2
## Number of variable(s) = 1
## Number of basic structure(s) = 1
## Number of drift function(s) = 0
## Number of drift equation(s) = 0
##
## Covariance Part
## ---------------
## Spherical
## - Sill = 1.123
## - Range = 129.766
## Total Sill = 1.123
## Known Mean(s) 0.000
```

It is also possible to guide the model fitting by proposing a list of basic covariance structures from which the model is to be built. The list of available basic covariance structures is obtained by running the following command:

`ECov_printAll()`

```
## -2 - UNKNOWN : Unknown covariance
## -1 - FUNCTION : External covariance function
## 0 - NUGGET : Nugget effect
## 1 - EXPONENTIAL : Exponential
## 2 - SPHERICAL : Spherical
## 3 - GAUSSIAN : Gaussian
## 4 - CUBIC : Cubic
## 5 - SINCARD : Sine Cardinal
## 6 - BESSEL_J : Bessel J
## 7 - BESSEL_K : Bessel K
## 8 - GAMMA : Gamma
## 9 - CAUCHY : Cauchy
## 10 - STABLE : Stable
## 11 - LINEAR : Linear
## 12 - POWER : Power
## 13 - ORDER1_GC : First Order Generalized covariance
## 14 - SPLINE_GC : Spline Generalized covariance
## 15 - ORDER3_GC : Third Order Generalized covariance
## 16 - ORDER5_GC : Fifth Order Generalized covariance
## 17 - COSINUS : Cosine
## 18 - TRIANGLE : Triangle
## 19 - COSEXP : Cosine Exponential
## 20 - REG1D : 1-D Regular
## 21 - PENTA : Pentamodel
## 22 - SPLINE2_GC : Order-2 Spline
## 23 - STORKEY : Storkey covariance in 1-D
## 24 - WENDLAND0 : Wendland covariance (2,0)
## 25 - WENDLAND1 : Wendland covariance (3,1)
## 26 - WENDLAND2 : Wendland covariance (4,2)
## 27 - MARKOV : Markovian covariances
## 28 - GEOMETRIC : Geometric (Sphere only)
## 29 - POISSON : Poisson (Sphere only)
## 30 - LINEARSPH : Linear (Sphere only)
```

`## NULL`

In practice, we start by creating a list of basic structures using the `ECov_fromKeys`

function which we supply with a vector containing the names of the basic structures we would like to see in the model. To fit the model, we then once again call the `fit`

method and supply it with both the experimental variogram and the newly created list of basic structures (argument `types`

). Then the fitting procedures tries find the composition of models from the supplied list that best fits the experimental variogram.

Note that by default, the fitting algorithm tries to be parsimonious and can therefore “drop” some of the structures that we supply if it deems that a model with less structures provides a better fit. To force the fitting algorithm to keep all the structures from the list, we simply need to add the argument `optvar=Option_VarioFit(TRUE)`

to the `fit`

method.

In the next example, we once again define a model by fitting it on our experimental variogram. But this time, we specify that we want the resulting model to be a composition of basic structures restricted to these choices: a Nugget effect, a Cubic covariance and a Spherical covariance.

```
types = ECov_fromKeys(c("NUGGET","CUBIC","SPHERICAL"))
err = fitmod$fit(varioexp, types=types)
```

`ggplot() + plot.varmod(varioexp, fitmod)`

When printing the contents of the model, we now notice that it consists of a superposition of a Cubic covariance and a Spherical covariance, as intended. Note that the Nugget effect does not appear (it has been dropped by the fitting algorithm).

`fitmod`

```
##
## Model characteristics
## =====================
## Space dimension = 2
## Number of variable(s) = 1
## Number of basic structure(s) = 2
## Number of drift function(s) = 0
## Number of drift equation(s) = 0
##
## Covariance Part
## ---------------
## Cubic
## - Sill = 0.371
## - Range = 58.088
## Spherical
## - Sill = 0.904
## - Range = 237.071
## Total Sill = 1.275
## Known Mean(s) 0.000
```

It is possible to impose (in)equality constraints on the covariance parameters of the basic structures used in the model fitting procedure. This is done by creating a `Constraints`

object that is used to specify the constraints we wish to impose on the parameters of the different basic structures composing the model. To add a constraint to the object, we can use the method `addItemFromParamId`

, which takes as arguments the type of parameter for which the constraint applies (given as an `EConsElem`

object: run `EConsElem_printAll()`

for the list of available options), the index of the basic structure for which the constraint applies (argument `icov`

), the type of constraint we wish to apply (argument `type`

, given as an `EConsType`

object: run `EConsType_printAll()`

for the list of available options) and finally the numerical value (argument `value`

) defining the constraint.

In the next example, we start from a list of three basic structures (a Nugget effect, a Cubic covariance and a Spherical covariance), and create a `Constraints`

object containing two constraints. The first one applies to the basic structure of index \(1\) (the cubic structure), and sets an upper-bound of \(20\) for its range. The second one also applies to the basic structure of index \(1\) (the cubic structure), and sets an lower-bound of \(0.03\) for its sill. Finally, the `fit`

method is called to fit the model on the experimental variogram. Note that we also added the option `optvar=Option_VarioFit(TRUE)`

to force the fitting algorithm to keep the three basic structures that we supplied.

```
types = ECov_fromKeys(c("NUGGET","CUBIC","SPHERICAL"))
constraints = Constraints()
err = constraints$addItemFromParamId(EConsElem_RANGE(),icov=1,type=EConsType_UPPER(),value=20.)
err = constraints$addItemFromParamId(EConsElem_SILL(),icov=1,type=EConsType_LOWER(),value=0.03)
err = fitmod$fit(varioexp, types=types, constraints=constraints, optvar=Option_VarioFit(TRUE))
```

`ggplot() + plot.varmod(varioexp, fitmod)`