Introduction

In this tutorial, we show how to use the API SPDE. This is a free transcription in R of the python note book tuto_SPDE.

SPDE approach

Lindgren et al. (2011) defined an explicit link between Gaussian fields and Gaussian markov random fields. This approach is implemented in the SPDE module of \(gstlearn\). The random field is a weak solution of some stochastic partial differential equation solved using the finite element method. It follows that the standard tools of Geostatistics, kriging and stochastic simulation, can be rewritten using sparse linear algebra.

The SPDE model represents the domain by a mesh and random field by its values for each mesh cell \(\mathbf{Z} \sim \mathcal{N}(\mathbf{0,Q^{-1}})\) where the precision matrix \(\mathbf{Q = \Sigma^{-1}}\) is sparse.

The latent vector \(\mathbf{Z}\) can be linearly interpolated to:

the target locations \(\mathbf{Y_T = A_g \, Z}\), or
the observation locations \(\mathbf{Y = A_d \, Z + \tau W}\).

In the latter case, \(\tau\) is the standard deviation of the noise modelling the error of observation and the modelling error. The number of observations is \(p\).

In this case, the precision matrix of the vector \(\mathbf{(Z, Y_D)}\) is: \[ \mathbf{ \tilde{Q} = \tilde{\Sigma}^{-1}= \begin{bmatrix}\mathbf{Q+\tau^{-2}A_d^T A_d} & \mathbf{-\tau^{-2}A_d^T} \\ \mathbf{-\tau^{-2}A_d} & \mathbf{\tau^{-2}I_p} \end{bmatrix} } \] From this expression the kriging and the conditional simulation can be derived:

the Kriging of \(\mathbf{Z}\) is \(E\{\mathbf{Z|Y = y}\} = \tau^{-2}\mathbf{(Q + \tau^{-2}A_d^TA_d)^{-1}A_d^T y}\)
the conditional variance is \(Cov\{\mathbf{Z|Y = y}\} = \mathbf{(Q + \tau^{-2}A_d^TA_d)^{-1}}\)
the non conditional simulation is \(\mathbf{(S, S_D)\sim \mathcal{N}(0, \tilde{Q}^{-1})}\)
the conditional simulation of the latent vector is \(\mathbf{S_{|Y=y} = S + \tau^{-2} (Q+ \tau^{-2}A_d^TA_d)^{-1} A_d^T(y - S_D)}\)

The estimated or simulated latent vector can be linearly interpolated to any target grid.

Initialization of gstlearn

Parameters of the SPDE model

# Data
set.seed (123)
ndat = 100

# Model
rangev = 0.2
sill = 1.
tau = 0.01

# parameters of the final grid on [0,1]^2
nx = 200 * c(1,1)
dx = 1/(nx-1)
x0 = c(0,0)

# Grid meshing on [-0.25, 1.25]^2
nxm = 100 * c(1,1)
dxm = 1.5 / (nxm-1)
x0m = c(-0.25,-0.25)

# number of simulations for Monte-Carlo estimators
nsim = 20

Grid definition

grid    = DbGrid_create(nx,dx,x0)
gridExt = DbGrid_create(nxm,dxm,x0m)
mesh    = MeshETurbo(gridExt)

# limits
limits <- list(XP = 1.0*c(0, 1, 1, 0, 0),
               YP = 1.0*c(0, 0, 1, 1, 0))
p_lim   = PolyElem(limits$XP, limits$YP)
pol_lim = Polygons()
err     = pol_lim$addPolyElem(p_lim)

p = ggDefaultGeographic() + 
    plot.grid(gridExt) +
    plot.polygon(poly = pol_lim, color = "orange", fill = NA) +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Coarse grid used to compute the meshing")
ggPrint(p)

p = ggDefaultGeographic() + 
    plot.grid(grid) +
    plot.polygon(poly = pol_lim, color = "orange", fill = NA) +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Target grid")
ggPrint(p)

Model definition

We define a Matern’s stationary covariance function.

model = Model_createFromParam(ECov_BESSEL_K(), param=1,range=rangev,sill=sill)

This covariance function is converted into a SPDE model.

spde = SPDE_create(model, domain = gridExt, calcul = ESPDECalcMode_SIMUNONCOND(), 
                   mesh = mesh)

Possible values for the calcul parameter are:

KRIGING for “Kriging or Likelihood”,
SIMUCOND for “Conditional simulations”,
SIMUNONCOND for “Non conditional simulations”,

Non conditional simulations

The latent vector (of length \(n\)) \(\mathbf{Z \sim \mathcal{N}(0, Q^{-1})}\) is defined on the meshing by its sparse precision matrix \(\mathbf{Q = \Sigma^{-1}}\). \(\mathbf{Q}\) is factorized by the Cholesky method: \(\mathbf{Q = L\, L^{T}}\).

Thus, the Gaussian vector can be rewritten \(\mathbf{Z = (L^T)^{-1} \, U}\) with \(\mathbf{U \sim \mathcal{N}(0, I_n)}\).

Finally the Gaussian vector collecting the values of the random field at the grid nodes \(Y\) is achieved by the interpolation of \(\mathbf{Z}\) on the mesh \(\mathbf{Y_g = A_{g} \, Z}\).

To compute a non conditional simulation on the grid,

Compute the projection matrix of the latent vector \(\mathbf{Z}\) to the grid/

# projection matrix of the latent vector on the grid
A_g  = ProjMatrix(grid, mesh)$toTL()

Compute \(\mathbf{L}\) the Cholesky decomposition of the sparse precision matrix \(\mathbf{Q}\)

# Cholesky decomposition of precision matrix of the latent vector
Q = spde$getPrecisionOpCs()$getQ()$toTL()
cholQ = Matrix::Cholesky(Q, LDL = FALSE)

Compute the normal Gaussian vector \(\mathbf{U \sim N(0, I_n)}\),

U = rnorm(n = dim(Q)[1])

Compute the latent vector \(\mathbf{Z}\) solving the sparse linear system \(\mathbf{L^{T} \, Z = U}\),

#Z = (t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))@x
Z = as.numeric(t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))

Compute the interpolation of the latent vector to the target grid \(\mathbf{Y_g = A_{g} \, Z}\),

Y_g =(A_g %*% Z)[,1]

The final result is aded to the grid db.

err = grid$setColumn(tab = Y_g, name = "manual.simu", useSel = FALSE)
# plot
p = ggDefaultGeographic() + 
    plot.grid(grid, nameRaster = "manual.simu", useSel = TRUE, 
            flagLegendRaster = TRUE, legendNameRaster = "NC Simu", palette = "Spectral") +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Non conditional simulation")
ggPrint(p)

Kriging and conditional simulation distribution

Observations of the process

We define the observations from the initial non conditional simulation.

dat = Db_create()
err = dat$setColumn(tab = runif(n=ndat), name = "x")
err = dat$setColumn(tab = runif(n=ndat), name = "y")
err = dat$setLocators(names = c("x", "y"), locatorType = ELoc_X(), cleanSameLocator = TRUE)

# extraction of the simulated value at data location
err = grid$setLocator(name = "manual.simu", locatorType = ELoc_Z(), cleanSameLocator = TRUE)
err = migrate(dbin = grid, dbout = dat, name = "manual.simu", namconv = NamingConvention())
err = dat$setName("manual.simu.*", "Z")

# projection matrix of the latent vector on the data locations
A_d = ProjMatrix(dat, mesh)$toTL()

The \(p\) observations are defined by the vector \(\mathbf{Y_d = A_d \, Z + \tau \, W_d}\), where \(\mathbf{A_d}\) is the projection matrix from the latent vector \(\mathbf{Z}\) to the data location, \(\tau\) is the standard deviation of the modelling error, and \(\mathbf{W_d \sim \mathcal{N}(0, I_p)}\).

Kriging

Kriging is computed using the Cholesky factorization.

Q_TT = Q + 1/(tau^2) * t(A_d) %*% A_d
cholQ_TT =  Matrix::Cholesky(Q_TT, LDL = FALSE)

z_dat = dat["Z"]
kriging = as.numeric(A_g %*% solve(cholQ_TT, t(A_d) %*% z_dat/(tau^2)))

err = grid$setColumn(name = "manual.kriging", tab = kriging)

# plot of kriging
p = ggDefaultGeographic() + 
    plot.grid(grid, nameRaster = "manual.kriging", useSel = TRUE, palette = "Spectral", 
            flagLegendRaster = TRUE, legendNameRaster = "Kriging") +
    plot.point(dat, col = "red") +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Kriging")
ggPrint(p)

Conditional simulations

The conditional simulation is performed in three steps:

a non conditional simulation of the latent vector \(\mathbf{Z}\) is computed first,
then, the kriging of the error at observation locations is computed and added to the non conditional simulation,
finally, the simulated latent vector is interpolated on the grid

# computing non conditional simulations of the latent vector (on the mesh)
U = matrix(rnorm(n = dim(Q)[1]*nsim), nrow = dim(Q)[1], ncol = nsim)
Z = (t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))

# computing the kriging of the simulated errors and adding them to the non conditional simulation (on the mesh)
Zc = Z + solve(cholQ_TT, t(A_d) %*% (z_dat - A_d %*% Z)/(tau^2))

# computing the simulated value on the grid
S_g = (A_g %*% Zc)[,1:nsim]

# storing the results +  plot
err = grid$setColumn(tab = S_g[,1], name = "manual.cond.simu", useSel = TRUE)
p = ggDefaultGeographic() + 
    plot.grid(grid, nameRaster = "manual.cond.simu", useSel = TRUE, palette = "Spectral", 
            flagLegendRaster = TRUE, legendNameRaster = "CD. Simu") +
    plot.point(dat, col = "red") +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Conditional simulation")
ggPrint(p)

Monte-Carlo evaluation of the conditional expectation

# empirical conditional expectation
ce = apply(X = S_g, MARGIN = 1, FUN = mean)
err = grid$setColumn(tab = ce, name = "manual.cond.expectation", useSel = TRUE)

# empirical conditional standard deviation
ce = apply(X = S_g, MARGIN = 1, FUN = sd)
err = grid$setColumn(tab = ce, name = "manual.cond.std", useSel = TRUE)

# plots

p = ggDefaultGeographic() + 
    plot.grid(grid, nameRaster = "manual.cond.expectation", useSel = TRUE, 
              palette = "Spectral", 
              flagLegendRaster = TRUE, legendNameRaster = "CD. Exp.") +
    # plot.point(dat, col = "red") +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Conditional expectation")
ggPrint(p)

p = ggDefaultGeographic() + 
    plot.grid(grid, nameRaster = "manual.cond.std", useSel = TRUE, palette = "Spectral", 
              flagLegendRaster = TRUE, legendNameRaster = "CD. Std.") +
    # plot.point(dat, col = "red") +
    plot.decoration(xlab = "Easting", ylab = "Northing", 
                  title = "Conditional standard deviation")
ggPrint(p)

# cross plot: initial simulation vs. kriging
p = ggDefault() + 
    plot.correlation(db1 = grid, 
                     namex = "manual.kriging", 
                     namey = "manual.simu", asPoint = FALSE, bins=100,
                     flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
    plot.decoration(xlab = "Estimation (Kriging)", 
                    ylab = "Actual value (Initial simulation)", 
                    title = "Simulation vs. kriging")
ggPrint(p)

# cross plot: empirical conditional expectation vs. kriging
p = ggDefault() + 
    plot.correlation(db1 = grid, 
                     namex = "manual.cond.expectation", 
                     namey = "manual.kriging", asPoint = FALSE, bins=100,
                     flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
    plot.decoration(xlab = "Cond. Expectation", 
                    ylab = "Kriging", 
                    title = "Kriging vs. conditional expectation")
ggPrint(p)

Using the SPDE api defined in gstlearn

Non conditional simulations

err = grid$deleteColumns(names = "spde.simu.*")
# non conditional simulations
spde = SPDE(model = model, domain = gridExt, data = NULL, 
            calcul = ESPDECalcMode_SIMUNONCOND(), mesh = mesh)
iuid = spde$compute(dbout = grid, nbsimu = nsim, namconv = NamingConvention("spde"))

# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)

Conditional simulations

err = grid$deleteColumns(names = "spde.*")
# modeling the observation process
model_obs = model$clone()
err     = model_obs$addCovFromParam(ECov_NUGGET(),sill=(tau^2))
model_obs$display()

## 
## Model characteristics
## =====================
## Space dimension              = 2
## Number of variable(s)        = 1
## Number of basic structure(s) = 2
## Number of drift function(s)  = 0
## Number of drift equation(s)  = 0
## 
## Covariance Part
## ---------------
## K-Bessel (Third Parameter = 1)
## - Sill         =      1.000
## - Range        =      0.200
## - Theo. Range  =      0.058
## Nugget Effect
## - Sill         =      0.000
## Total Sill     =      1.000
## Known Mean(s)     0.000

## NULL

# conditional simulations
spde_obs = SPDE(model = model_obs, domain = gridExt, data = dat, 
                calcul = ESPDECalcMode_SIMUCOND(), mesh = mesh)
iuid = spde_obs$compute(dbout = grid, nbsimu = nsim, namconv = NamingConvention("spde"))

# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)

Kriging

err = grid$deleteColumns(names = c("spde.Z.estim", "spde.Z.stdev"))

# kriging value
spdeRes = SPDE(model = model_obs, domain = gridExt, data = dat, 
               calcul = ESPDECalcMode_KRIGING(), mesh = mesh)
iuid    = spdeRes$compute(grid, namconv = NamingConvention("spdeK"))

# computing the stdev by Monte Carlo
tab = matrix(grid$getColumns(names = "spde.Z.*", useSel = FALSE), 
             nrow = grid$getSampleNumber(FALSE), ncol = nsim)
val_std = apply(X = tab, MARGIN = 1, FUN = sd)
err = grid$setColumn(tab = val_std, name = "spdeK.Z.stdev", useSel = FALSE)

# plot of Z* ans Std. of the kriging error (Z - Z*)
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spdeK.Z.estim", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spdeK.Z.stdev", palette = "Spectral")
ggarrange(p1, p2, ncol = 2, nrow = 2)

# cross plot for kriging values
p = ggDefault() + 
    plot.correlation(db1 = grid, 
                     namex = "manual.kriging", 
                     namey = "spdeK.Z.estim", asPoint = FALSE, bins=100,
                     flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
    plot.decoration(xlab = "Manual", ylab = "SPDE", 
                  title = "Comparison for Kriging")
ggPrint(p)

Using the SPDE auxiliary functions defined in gstlearn

Non conditional simulations

err = grid$deleteColumns(names = "spde.fn.simu.*")
# non conditional simulations
err  = simulateSPDE(dbin = NULL, dbout = grid, model = model, nbsimu = nsim, 
                    namconv = NamingConvention("spde.fn"))

# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)

Conditional simulations

err = grid$deleteColumns(names = "spde.fn.Z.condSimu.*")

# conditional simulations
err  = simulateSPDE(dbin = dat, dbout = grid, model = model, nbsimu = nsim, 
                    namconv = NamingConvention("spde.fn"))

# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)

Kriging

err = grid$deleteColumns(names = c("spde.fn.Z.kriging", "spde.fn.Z.stdev", "spde.fn.Z.varz"))

# kriging value
err = krigingSPDE(dbin = dat, dbout = grid, model = model, 
                  flag_est = TRUE, flag_std = TRUE, nbMC=0, seed=0,
                  namconv = NamingConvention("spde.fn"))

# cross plot for kriging values
p = ggDefault() + 
    plot.correlation(db1 = grid, 
                     namex = "manual.kriging", 
                     namey = "spde.fn.Z.estim", asPoint = FALSE, bins=100,
                     flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
    plot.decoration(xlab = "Manual", ylab = "SPDE", 
                  title = "Comparison for Kriging")
ggPrint(p)

References

Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between gaussian fields and gaussian markov random fields: the spde approach (with discussion). JR 671 Stat Soc, Series B, 73:423–498.
Pereira, M., Desassis, N., & Allard, D. (2022). Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach. Journal of Data Science, 20(4), 512-532. doi:10.6339/22-JDS1075

A gentle introduction to SPDE in gstlearn

gstlearn Team

2023-09-06

Introduction

SPDE approach

Initialization of gstlearn

Parameters of the SPDE model

Grid definition

Model definition

Non conditional simulations

Kriging and conditional simulation distribution

Observations of the process

Kriging

Conditional simulations

Monte-Carlo evaluation of the conditional expectation

Using the SPDE api defined in gstlearn

Non conditional simulations

Conditional simulations

Kriging

Using the SPDE auxiliary functions defined in gstlearn

Non conditional simulations

Conditional simulations

Kriging

References