In this tutorial, we show how to use the API SPDE. This is a free transcription in R of the python note book tuto_SPDE.
Lindgren et al. (2011) defined an explicit link between Gaussian fields and Gaussian markov random fields. This approach is implemented in the SPDE module of \(gstlearn\). The random field is a weak solution of some stochastic partial differential equation solved using the finite element method. It follows that the standard tools of Geostatistics, kriging and stochastic simulation, can be rewritten using sparse linear algebra.
The SPDE model represents the domain by a mesh and random field by its values for each mesh cell \(\mathbf{Z} \sim \mathcal{N}(\mathbf{0,Q^{-1}})\) where the precision matrix \(\mathbf{Q = \Sigma^{-1}}\) is sparse.
The latent vector \(\mathbf{Z}\) can be linearly interpolated to:
the target locations \(\mathbf{Y_T = A_g \, Z}\), or
the observation locations \(\mathbf{Y = A_d \, Z + \tau W}\).
In the latter case, \(\tau\) is the standard deviation of the noise modelling the error of observation and the modelling error. The number of observations is \(p\).
In this case, the precision matrix of the vector \(\mathbf{(Z, Y_D)}\) is: \[ \mathbf{ \tilde{Q} = \tilde{\Sigma}^{-1}= \begin{bmatrix}\mathbf{Q+\tau^{-2}A_d^T A_d} & \mathbf{-\tau^{-2}A_d^T} \\ \mathbf{-\tau^{-2}A_d} & \mathbf{\tau^{-2}I_p} \end{bmatrix} } \] From this expression the kriging and the conditional simulation can be derived:
the Kriging of \(\mathbf{Z}\) is \(E\{\mathbf{Z|Y = y}\} = \tau^{-2}\mathbf{(Q + \tau^{-2}A_d^TA_d)^{-1}A_d^T y}\)
the conditional variance is \(Cov\{\mathbf{Z|Y = y}\} = \mathbf{(Q + \tau^{-2}A_d^TA_d)^{-1}}\)
the non conditional simulation is \(\mathbf{(S, S_D)\sim \mathcal{N}(0, \tilde{Q}^{-1})}\)
the conditional simulation of the latent vector is \(\mathbf{S_{|Y=y} = S + \tau^{-2} (Q+ \tau^{-2}A_d^TA_d)^{-1} A_d^T(y - S_D)}\)
The estimated or simulated latent vector can be linearly interpolated to any target grid.
# Data
set.seed (123)
ndat = 100
# Model
rangev = 0.2
sill = 1.
tau = 0.01
# parameters of the final grid on [0,1]^2
nx = 200 * c(1,1)
dx = 1/(nx-1)
x0 = c(0,0)
# Grid meshing on [-0.25, 1.25]^2
nxm = 100 * c(1,1)
dxm = 1.5 / (nxm-1)
x0m = c(-0.25,-0.25)
# number of simulations for Monte-Carlo estimators
nsim = 20
grid = DbGrid_create(nx,dx,x0)
gridExt = DbGrid_create(nxm,dxm,x0m)
mesh = MeshETurbo(gridExt)
# limits
limits <- list(XP = 1.0*c(0, 1, 1, 0, 0),
YP = 1.0*c(0, 0, 1, 1, 0))
p_lim = PolyElem(limits$XP, limits$YP)
pol_lim = Polygons()
err = pol_lim$addPolyElem(p_lim)
p = ggDefaultGeographic() +
plot.grid(gridExt) +
plot.polygon(poly = pol_lim, color = "orange", fill = NA) +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Coarse grid used to compute the meshing")
ggPrint(p)
p = ggDefaultGeographic() +
plot.grid(grid) +
plot.polygon(poly = pol_lim, color = "orange", fill = NA) +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Target grid")
ggPrint(p)
We define a Matern’s stationary covariance function.
model = Model_createFromParam(ECov_BESSEL_K(), param=1,range=rangev,sill=sill)
This covariance function is converted into a SPDE model.
spde = SPDE_create(model, domain = gridExt, calcul = ESPDECalcMode_SIMUNONCOND(),
mesh = mesh)
Possible values for the calcul parameter are:
KRIGING for “Kriging or Likelihood”,
SIMUCOND for “Conditional simulations”,
SIMUNONCOND for “Non conditional simulations”,
The latent vector (of length \(n\)) \(\mathbf{Z \sim \mathcal{N}(0, Q^{-1})}\) is defined on the meshing by its sparse precision matrix \(\mathbf{Q = \Sigma^{-1}}\). \(\mathbf{Q}\) is factorized by the Cholesky method: \(\mathbf{Q = L\, L^{T}}\).
Thus, the Gaussian vector can be rewritten \(\mathbf{Z = (L^T)^{-1} \, U}\) with \(\mathbf{U \sim \mathcal{N}(0, I_n)}\).
Finally the Gaussian vector collecting the values of the random field at the grid nodes \(Y\) is achieved by the interpolation of \(\mathbf{Z}\) on the mesh \(\mathbf{Y_g = A_{g} \, Z}\).
To compute a non conditional simulation on the grid,
# projection matrix of the latent vector on the grid
A_g = ProjMatrix(grid, mesh)$toTL()
# Cholesky decomposition of precision matrix of the latent vector
Q = spde$getPrecisionOpCs()$getQ()$toTL()
cholQ = Matrix::Cholesky(Q, LDL = FALSE)
U = rnorm(n = dim(Q)[1])
#Z = (t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))@x
Z = as.numeric(t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))
Y_g =(A_g %*% Z)[,1]
The final result is aded to the grid db.
err = grid$setColumn(tab = Y_g, name = "manual.simu", useSel = FALSE)
# plot
p = ggDefaultGeographic() +
plot.grid(grid, nameRaster = "manual.simu", useSel = TRUE,
flagLegendRaster = TRUE, legendNameRaster = "NC Simu", palette = "Spectral") +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Non conditional simulation")
ggPrint(p)
We define the observations from the initial non conditional simulation.
dat = Db_create()
err = dat$setColumn(tab = runif(n=ndat), name = "x")
err = dat$setColumn(tab = runif(n=ndat), name = "y")
err = dat$setLocators(names = c("x", "y"), locatorType = ELoc_X(), cleanSameLocator = TRUE)
# extraction of the simulated value at data location
err = grid$setLocator(name = "manual.simu", locatorType = ELoc_Z(), cleanSameLocator = TRUE)
err = migrate(dbin = grid, dbout = dat, name = "manual.simu", namconv = NamingConvention())
err = dat$setName("manual.simu.*", "Z")
# projection matrix of the latent vector on the data locations
A_d = ProjMatrix(dat, mesh)$toTL()
The \(p\) observations are defined by the vector \(\mathbf{Y_d = A_d \, Z + \tau \, W_d}\), where \(\mathbf{A_d}\) is the projection matrix from the latent vector \(\mathbf{Z}\) to the data location, \(\tau\) is the standard deviation of the modelling error, and \(\mathbf{W_d \sim \mathcal{N}(0, I_p)}\).
Kriging is computed using the Cholesky factorization.
Q_TT = Q + 1/(tau^2) * t(A_d) %*% A_d
cholQ_TT = Matrix::Cholesky(Q_TT, LDL = FALSE)
z_dat = dat["Z"]
kriging = as.numeric(A_g %*% solve(cholQ_TT, t(A_d) %*% z_dat/(tau^2)))
err = grid$setColumn(name = "manual.kriging", tab = kriging)
# plot of kriging
p = ggDefaultGeographic() +
plot.grid(grid, nameRaster = "manual.kriging", useSel = TRUE, palette = "Spectral",
flagLegendRaster = TRUE, legendNameRaster = "Kriging") +
plot.point(dat, col = "red") +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Kriging")
ggPrint(p)
The conditional simulation is performed in three steps:
a non conditional simulation of the latent vector \(\mathbf{Z}\) is computed first,
then, the kriging of the error at observation locations is computed and added to the non conditional simulation,
finally, the simulated latent vector is interpolated on the grid
# computing non conditional simulations of the latent vector (on the mesh)
U = matrix(rnorm(n = dim(Q)[1]*nsim), nrow = dim(Q)[1], ncol = nsim)
Z = (t(expand(cholQ)$P) %*% solve(cholQ, U, system = "Lt"))
# computing the kriging of the simulated errors and adding them to the non conditional simulation (on the mesh)
Zc = Z + solve(cholQ_TT, t(A_d) %*% (z_dat - A_d %*% Z)/(tau^2))
# computing the simulated value on the grid
S_g = (A_g %*% Zc)[,1:nsim]
# storing the results + plot
err = grid$setColumn(tab = S_g[,1], name = "manual.cond.simu", useSel = TRUE)
p = ggDefaultGeographic() +
plot.grid(grid, nameRaster = "manual.cond.simu", useSel = TRUE, palette = "Spectral",
flagLegendRaster = TRUE, legendNameRaster = "CD. Simu") +
plot.point(dat, col = "red") +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Conditional simulation")
ggPrint(p)
# empirical conditional expectation
ce = apply(X = S_g, MARGIN = 1, FUN = mean)
err = grid$setColumn(tab = ce, name = "manual.cond.expectation", useSel = TRUE)
# empirical conditional standard deviation
ce = apply(X = S_g, MARGIN = 1, FUN = sd)
err = grid$setColumn(tab = ce, name = "manual.cond.std", useSel = TRUE)
# plots
p = ggDefaultGeographic() +
plot.grid(grid, nameRaster = "manual.cond.expectation", useSel = TRUE,
palette = "Spectral",
flagLegendRaster = TRUE, legendNameRaster = "CD. Exp.") +
# plot.point(dat, col = "red") +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Conditional expectation")
ggPrint(p)
p = ggDefaultGeographic() +
plot.grid(grid, nameRaster = "manual.cond.std", useSel = TRUE, palette = "Spectral",
flagLegendRaster = TRUE, legendNameRaster = "CD. Std.") +
# plot.point(dat, col = "red") +
plot.decoration(xlab = "Easting", ylab = "Northing",
title = "Conditional standard deviation")
ggPrint(p)
# cross plot: initial simulation vs. kriging
p = ggDefault() +
plot.correlation(db1 = grid,
namex = "manual.kriging",
namey = "manual.simu", asPoint = FALSE, bins=100,
flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
plot.decoration(xlab = "Estimation (Kriging)",
ylab = "Actual value (Initial simulation)",
title = "Simulation vs. kriging")
ggPrint(p)
# cross plot: empirical conditional expectation vs. kriging
p = ggDefault() +
plot.correlation(db1 = grid,
namex = "manual.cond.expectation",
namey = "manual.kriging", asPoint = FALSE, bins=100,
flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
plot.decoration(xlab = "Cond. Expectation",
ylab = "Kriging",
title = "Kriging vs. conditional expectation")
ggPrint(p)
err = grid$deleteColumns(names = "spde.simu.*")
# non conditional simulations
spde = SPDE(model = model, domain = gridExt, data = NULL,
calcul = ESPDECalcMode_SIMUNONCOND(), mesh = mesh)
iuid = spde$compute(dbout = grid, nbsimu = nsim, namconv = NamingConvention("spde"))
# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
err = grid$deleteColumns(names = "spde.*")
# modeling the observation process
model_obs = model$clone()
err = model_obs$addCovFromParam(ECov_NUGGET(),sill=(tau^2))
model_obs$display()
##
## Model characteristics
## =====================
## Space dimension = 2
## Number of variable(s) = 1
## Number of basic structure(s) = 2
## Number of drift function(s) = 0
## Number of drift equation(s) = 0
##
## Covariance Part
## ---------------
## K-Bessel (Third Parameter = 1)
## - Sill = 1.000
## - Range = 0.200
## - Theo. Range = 0.058
## Nugget Effect
## - Sill = 0.000
## Total Sill = 1.000
## Known Mean(s) 0.000
## NULL
# conditional simulations
spde_obs = SPDE(model = model_obs, domain = gridExt, data = dat,
calcul = ESPDECalcMode_SIMUCOND(), mesh = mesh)
iuid = spde_obs$compute(dbout = grid, nbsimu = nsim, namconv = NamingConvention("spde"))
# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.Z.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
err = grid$deleteColumns(names = c("spde.Z.estim", "spde.Z.stdev"))
# kriging value
spdeRes = SPDE(model = model_obs, domain = gridExt, data = dat,
calcul = ESPDECalcMode_KRIGING(), mesh = mesh)
iuid = spdeRes$compute(grid, namconv = NamingConvention("spdeK"))
# computing the stdev by Monte Carlo
tab = matrix(grid$getColumns(names = "spde.Z.*", useSel = FALSE),
nrow = grid$getSampleNumber(FALSE), ncol = nsim)
val_std = apply(X = tab, MARGIN = 1, FUN = sd)
err = grid$setColumn(tab = val_std, name = "spdeK.Z.stdev", useSel = FALSE)
# plot of Z* ans Std. of the kriging error (Z - Z*)
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spdeK.Z.estim", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spdeK.Z.stdev", palette = "Spectral")
ggarrange(p1, p2, ncol = 2, nrow = 2)
# cross plot for kriging values
p = ggDefault() +
plot.correlation(db1 = grid,
namex = "manual.kriging",
namey = "spdeK.Z.estim", asPoint = FALSE, bins=100,
flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
plot.decoration(xlab = "Manual", ylab = "SPDE",
title = "Comparison for Kriging")
ggPrint(p)
err = grid$deleteColumns(names = "spde.fn.simu.*")
# non conditional simulations
err = simulateSPDE(dbin = NULL, dbout = grid, model = model, nbsimu = nsim,
namconv = NamingConvention("spde.fn"))
# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
err = grid$deleteColumns(names = "spde.fn.Z.condSimu.*")
# conditional simulations
err = simulateSPDE(dbin = dat, dbout = grid, model = model, nbsimu = nsim,
namconv = NamingConvention("spde.fn"))
# plot
p1 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.1", palette = "Spectral")
p2 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.2", palette = "Spectral")
p3 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.3", palette = "Spectral")
p4 = ggDefaultGeographic() + plot.grid(grid, nameRaster="spde.fn.Z.4", palette = "Spectral")
ggarrange(p1, p2, p3, p4, ncol = 2, nrow = 2)
err = grid$deleteColumns(names = c("spde.fn.Z.kriging", "spde.fn.Z.stdev", "spde.fn.Z.varz"))
# kriging value
err = krigingSPDE(dbin = dat, dbout = grid, model = model,
flag_est = TRUE, flag_std = TRUE, nbMC=0, seed=0,
namconv = NamingConvention("spde.fn"))
# cross plot for kriging values
p = ggDefault() +
plot.correlation(db1 = grid,
namex = "manual.kriging",
namey = "spde.fn.Z.estim", asPoint = FALSE, bins=100,
flagDiag = TRUE, diagColor = "red", flagSameAxes = FALSE) +
plot.decoration(xlab = "Manual", ylab = "SPDE",
title = "Comparison for Kriging")
ggPrint(p)
Lindgren, F., Rue, H., and Lindström, J. (2011). An explicit link between gaussian fields and gaussian markov random fields: the spde approach (with discussion). JR 671 Stat Soc, Series B, 73:423–498.
Pereira, M., Desassis, N., & Allard, D. (2022). Geostatistics for Large Datasets on Riemannian Manifolds: A Matrix-Free Approach. Journal of Data Science, 20(4), 512-532. doi:10.6339/22-JDS1075