In this preamble, we load the gstlearn library and clean the workspace.
rm(list=ls())
library(gstlearn)
library(ggplot2)
library(ggpubr)
library(ggnewscale)
plot.setDefaultGeographic(dims=c(8,8))
We load two data bases:
dat
containing point observations of two variables across Scotland: the elevation (Elevation
) and the temperature (January_temp
, then renamed as Temperature
)target
containing a grid of points covering Scotland with a selection variable (inshore
) selecting the points that are on land, and a variable (Elevation
) giving the elevation at every point on land## Data points
fileNF = loadData("Scotland", "Scotland_Temperatures.NF")
dat = Db_createFromNF(fileNF)
### Rename the temperature variable
dat$setName("*temp", "Temperature")
## Target grid
fileNF = loadData("Scotland", "Scotland_Elevations.NF")
target = DbGrid_createFromNF(fileNF)
The target
data base a (grid) map of the elevation across Scotland.
target
##
## Data Base Grid Characteristics
## ==============================
##
## Data Base Summary
## -----------------
## File is organized as a regular grid
## Space dimension = 2
## Number of Columns = 4
## Total number of samples = 11097
## Number of active samples = 3092
##
## Grid characteristics:
## ---------------------
## Origin : 65.000 535.000
## Mesh : 4.938 4.963
## Number : 81 137
##
## Variables
## ---------
## Column = 0 - Name = Longitude - Locator = x1
## Column = 1 - Name = Latitude - Locator = x2
## Column = 2 - Name = Elevation - Locator = f1
## Column = 3 - Name = inshore - Locator = sel
Let us plot its content.
ggDefaultGeographic() + plot(target, nameRaster="Elevation",
flagLegendRaster=TRUE,palette="Spectral",
legendNameRaster="Elevation (m)")
The dat
data base contains 236 (point) samples of two variables across Scotland: elevation and temperature.
dat
##
## Data Base Characteristics
## =========================
##
## Data Base Summary
## -----------------
## File is organized as a set of isolated points
## Space dimension = 2
## Number of Columns = 5
## Total number of samples = 236
##
## Variables
## ---------
## Column = 0 - Name = rank - Locator = NA
## Column = 1 - Name = Longitude - Locator = x1
## Column = 2 - Name = Latitude - Locator = x2
## Column = 3 - Name = Elevation - Locator = NA
## Column = 4 - Name = Temperature - Locator = z1
We can use the dbStatisticsPrint
function to compute statistics on variables of a Db
. We specify
Db
of interest (argument db
),names
),opers
). This last argument is set through a EStatOption_fromKeys
function, which we call with a vector containing the (abbreviated) names of the statistics (run EStatOption_printAll()
for the full list)flagIso
allowing to specify whether we want to compute the statistics for each variable separately (flagIso=TRUE
), or whether we want to compute “filtered” statistics (flagIso=FALSE
). In the latter case, we compute the statistics while only using the points where all the variables are defined.For instance, let us count the number of observations of each variable using the dbStatisticsPrint
.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("NUM")),
flagIso = FALSE, title="Number of observations", radix="")
##
## Number of observations
## ----------------------
## Number
## Elevation 236
## Temperature 151
## NULL
Since the data base dat
contains 236 samples, we can conclude that the Elevation
variable is defined at every point, but not the Temperature
variable. Similarly, we can compute the mean and the extreme values of each variable in the observation data base.
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of observations", radix="")
##
## Statistics of observations
## --------------------------
## Minimum Maximum Mean
## Elevation 2.000 800.000 146.441
## Temperature 0.600 5.200 2.815
## NULL
Finally, we compute the mean and the extreme values of the elevation in the target data base.
dbStatisticsPrint(target, names = c("Elevation"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = FALSE, title="Statistics of target", radix="")
##
## Statistics of target
## --------------------
## Minimum Maximum Mean
## Elevation 0.000 1270.000 241.152
## NULL
We can then compute the filtered mean, minimum and maximum for the Elevation
and Temperature
variables as follows:
dbStatisticsPrint(dat, names = c("Elevation", "Temperature"),
opers=EStatOption_fromKeys(c("MEAN","MINI","MAXI")),
flagIso = TRUE, title="Filtered statistics of observations", radix="")
##
## Filtered statistics of observations
## -----------------------------------
## Minimum Maximum Mean
## Elevation 3.000 387.000 87.974
## Temperature 0.600 5.200 2.815
## NULL
As explained above, the first row of the table contains contains the mean, the minimum and the maximum of the observations of the Elevation
variable, over all the locations where both the Elevation
na Temperature
variables are defined (i.e. in our case, all the points where the temperature is defined). Hence, we see that the points where the temperature is observed are located at relatively low altitudes within the Scottish landscape.
To confirm that, we plot the points where the temperature is sampled on top of the elevation map of Scotland.
p = ggDefaultGeographic()
p = p + plot.grid(target, nameRaster="Elevation",
flagLegendRaster=TRUE,
palette="Spectral",
legendNameRaster="Elevation (m)")
#p = p + new_scale_color()
#p = p + new_scale("linetype")
p = p + plot.point(dat, nameSize="Temperature",
sizmin = 0.5, sizmax = 3,
flagLegendSize=TRUE,legendNameSize="°C")
ggPrint(p)
From observing this last plot, it seems like the bigger points (corresponding to high temperatures) are located where the elevation is smaller: this seems to hint (confirm?) that the temperature is negatively correlated with the elevation. To corroborate this, we plot a correlation plot between the two variables.
p = ggplot()
p = p + plot.correlation(dat, namex="Elevation", namey="Temperature",
asPoint=TRUE, flagRegr=TRUE)
p = p + plot.decoration(title="Correlation between Temperature and Elevation")
ggPrint(p)
In this course, we will introduce four methods allowing to deal with multivariate data: Cokriging, Residual modeling, Models with polynomial trends and finally Models with an external drift. As a baseline, we will compare the results obtained with each model with the results obtained with a univariate model.
We start by computing an experimental directional variogram vario_raw2dir
of the “raw” temperature observations, along two directions (\(0^{\circ}\text{C}\) and \(90^{\circ}\text{C}\)).
## Create experimental variogram on raw data
varioparam = VarioParam_createMultiple(ndir=2, npas=30, dpas=10)
vario_raw2dir = Vario(varioparam)
err = vario_raw2dir$compute(dat)
We then fit a model on the experimental variogram of raw temperature observations. Since we want to use ordinary kriging, we add a constant drift to the model before.
fitmod_raw = Model()
err = fitmod_raw$fit(vario_raw2dir,
types=ECov_fromKeys(c("NUGGET","EXPONENTIAL","CUBIC","LINEAR")))
err = fitmod_raw$setDriftIRF(0,0)
p = ggplot()
p = p + plot.varmod(vario_raw2dir, fitmod_raw)
p = p + plot.decoration(title="Experimental and fitted variogram models - Raw temperature observations")
ggPrint(p)