Simulation post-processing¶
This test is meant to demonstrate the tool for post-processing simulation results.
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc
import numpy as np
import matplotlib.pyplot as plt
import gstlearn.plot as gp
import scipy.sparse as sc
gdoc.setNoScroll()
We construct several simulations independently. Here, the manner these simulations are constrcuted has no interest and we simply sample distributions (Uniform or Gaussian) with various parameters. We construct three simulations (called SimuA, SimuB and SimuC) with different occurence numbers (3, 2 and 2).
Then, for each cell of the output grid:
- a search for the samples belonging to the cell is performed
- a particular transformation (calculating their sum and their standard deviation) is applied to the outcomes of these samples
- the transformed values of these samples are upscaled (using the mean) to one single multivariate per cell
- Some statistics (mean and variance) are calculated on the results of the transformation per cell.
Creating a point Data Base
nech=10
db = gl.Db.createFillRandom(ndat=nech, ndim=2, nvar=0, seed = 3143233)
err = db.addColumns(np.arange(0,nech),"rank")
Creating a Grid Data Base covering the same area
nx = 5
dx = 1 / nx
x0 = dx / 2
dbgrid = gl.DbGrid.create(nx=[nx,nx], dx=[dx,dx], x0=[x0,x0])
Adding a first set of 3 simulation outcomes (sampled from Normal distribution)
nsimuA = 3
uid = db.addColumnsByConstant(nsimuA, 0., "SimuA")
for i in range(nsimuA):
vec = gl.VectorHelper.simulateGaussian(nech)
db.setColumnByUID(vec, uid + i)
Adding a second set of 2 simulations (sampled from Uniform distribution between 100 and 200)
nsimuB = 2
uid = db.addColumnsByConstant(nsimuB, 0., "SimuB")
for i in range(nsimuB):
vec = gl.VectorHelper.simulateUniform(nech, 100., 200.)
db.setColumnByUID(vec, uid + i)
Adding a third set of 2 simulations (sampled from Uniform distribution between -4 and -2)
nsimuC = 2
uid = db.addColumnsByConstant(nsimuC, 0., "SimuC")
for i in range(nsimuC):
vec = gl.VectorHelper.simulateUniform(nech, -4., -2.)
db.setColumnByUID(vec, uid + i)
In the following paragraph, we provide the complete dump of the contents of the Data Base to better understand the next steps.
dbfmt = gl.DbStringFormat.createFromFlags(flag_resume = False, flag_vars=False, flag_array=True)
gl.OptCst.defineByKey("NTROW",-1)
gl.OptCst.defineByKey("NTCOL",-1)
db.display(dbfmt)
Data Base Characteristics ========================= Data Base Contents ------------------ rank x-1 x-2 rank.1 SimuA-1 SimuA-2 SimuA-3 [ 0,] 1.000 0.502 0.135 0.000 1.489 -0.056 0.187 [ 1,] 2.000 0.693 0.185 1.000 0.519 -0.594 -0.438 [ 2,] 3.000 0.809 0.387 2.000 -2.893 0.909 0.171 [ 3,] 4.000 0.915 0.617 3.000 1.224 -1.397 0.647 [ 4,] 5.000 0.088 0.808 4.000 -0.171 -0.214 -0.783 [ 5,] 6.000 0.189 0.802 5.000 1.071 2.205 -0.047 [ 6,] 7.000 0.835 0.201 6.000 -0.722 -0.029 -0.781 [ 7,] 8.000 0.647 0.105 7.000 -0.462 -0.816 0.772 [ 8,] 9.000 0.907 0.070 8.000 1.315 -0.602 -1.556 [ 9,] 10.000 0.258 0.374 9.000 -0.496 -1.302 0.050 SimuB-1 SimuB-2 SimuC-1 SimuC-2 [ 0,] 196.812 114.333 -3.668 -3.635 [ 1,] 165.288 104.997 -3.142 -3.692 [ 2,] 155.281 124.644 -3.934 -3.673 [ 3,] 104.455 187.628 -3.087 -3.684 [ 4,] 167.794 100.972 -2.142 -2.837 [ 5,] 118.355 102.044 -2.891 -3.889 [ 6,] 127.263 114.671 -3.572 -2.395 [ 7,] 162.597 140.488 -3.040 -3.493 [ 8,] 172.682 151.285 -3.228 -2.714 [ 9,] 131.565 184.920 -2.987 -2.919
Display of the samples and overlay the grid of cells
ax = gp.setDefaultGeographic([10,10])
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)
We perform the simulations post-processing as described in the introduction.
In this first application, we bypass any transformation function.
A particular attention is brought to the cell (ranked 4 [1-based]) in order to check the work flow applied to the information.
err = gl.simuPost(db, dbgrid, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False,
upscale = gl.EPostUpscale.MEAN,
stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR],
check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post1"))
Simulation Post-Processing -------------------------- Multiplicity order for all variables - Variable 1 (SimuA*) = 3 - Variable 2 (SimuB*) = 2 - Variable 3 (SimuC*) = 2 Number of Iterations: 12 (using the 'product' criterion) Number of Statistics: 2 Number of Output Variables: 6 == Cell #4/25 (regrouping 2 samples) Mean Variance Var 1 -0.170 0.160 Var 2 143.343 462.942 Var 3 -3.342 0.068
dbgrid
Data Base Grid Characteristics ============================== Data Base Summary ----------------- File is organized as a regular grid Space dimension = 2 Number of Columns = 9 Total number of samples = 25 Grid characteristics: --------------------- Origin : 0.100 0.100 Mesh : 0.200 0.200 Number : 5 5 Variables --------- Column = 0 - Name = rank - Locator = NA Column = 1 - Name = x1 - Locator = x1 Column = 2 - Name = x2 - Locator = x2 Column = 3 - Name = Post1.Var1.Mean - Locator = NA Column = 4 - Name = Post1.Var1.Variance - Locator = NA Column = 5 - Name = Post1.Var2.Mean - Locator = NA Column = 6 - Name = Post1.Var2.Variance - Locator = NA Column = 7 - Name = Post1.Var3.Mean - Locator = NA Column = 8 - Name = Post1.Var3.Variance - Locator = NA
ax = gp.setDefaultGeographic([10,10])
ax = gp.literal(dbgrid, name="Post1.Var1.Variance", marker="")
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)
In this second trial, we use simuPostDemo where a stransformation has been embedded in the procedure: this transformation calculates the sum and the standard deviation of the simulation outcomes for eachsample of the target cell. For comparison sake, we still keep the focus on the same cell.
err = gl.simuPostDemo(db, dbgrid, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False,
upscale = gl.EPostUpscale.MEAN,
stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR],
check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post2"))
Simulation Post-Processing -------------------------- Multiplicity order for all variables - Variable 1 (SimuA*) = 3 - Variable 2 (SimuB*) = 2 - Variable 3 (SimuC*) = 2 Number of Iterations: 12 (using the 'product' criterion) Number of Statistics: 2 Number of Transform Variables: 2 Number of Output Variables: 4 == Cell #4/25 (regrouping 2 samples) Mean Variance Var 1 46.610 51.463 Var 2 83.789 154.267
ax = gp.setDefaultGeographic([10,10])
ax = gp.literal(dbgrid, name="Post2.Var1.Variance", marker="")
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)
In the next paragraph, we perform the simulation post-processing stage in place. This means that we do not provide any output Db and the Upscaling phase does not take place (however note that the upscaling rule must not be left empty). The results are stored in the Input Db.
err = gl.simuPost(db, None, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False,
upscale = gl.EPostUpscale.UNKNOWN,
stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR],
check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post1"))
Simulation Post-Processing -------------------------- Multiplicity order for all variables - Variable 1 (SimuA*) = 3 - Variable 2 (SimuB*) = 2 - Variable 3 (SimuC*) = 2 Number of Iterations: 12 (using the 'product' criterion) Number of Statistics: 2 Number of Output Variables: 6 == Cell #4/10 Mean Variance Var 1 0.158 1.380 Var 2 146.042 1886.664 Var 3 -3.386 0.097
db.display(dbfmt)
Data Base Characteristics ========================= Data Base Contents ------------------ rank x-1 x-2 rank.1 SimuA-1 SimuA-2 SimuA-3 [ 0,] 1.000 0.502 0.135 0.000 1.489 -0.056 0.187 [ 1,] 2.000 0.693 0.185 1.000 0.519 -0.594 -0.438 [ 2,] 3.000 0.809 0.387 2.000 -2.893 0.909 0.171 [ 3,] 4.000 0.915 0.617 3.000 1.224 -1.397 0.647 [ 4,] 5.000 0.088 0.808 4.000 -0.171 -0.214 -0.783 [ 5,] 6.000 0.189 0.802 5.000 1.071 2.205 -0.047 [ 6,] 7.000 0.835 0.201 6.000 -0.722 -0.029 -0.781 [ 7,] 8.000 0.647 0.105 7.000 -0.462 -0.816 0.772 [ 8,] 9.000 0.907 0.070 8.000 1.315 -0.602 -1.556 [ 9,] 10.000 0.258 0.374 9.000 -0.496 -1.302 0.050 SimuB-1 SimuB-2 SimuC-1 SimuC-2 *.Var1.Mean *1.Variance *.Var2.Mean [ 0,] 196.812 114.333 -3.668 -3.635 0.540 0.502 155.573 [ 1,] 165.288 104.997 -3.142 -3.692 -0.171 0.264 135.142 [ 2,] 155.281 124.644 -3.934 -3.673 -0.605 2.957 139.962 [ 3,] 104.455 187.628 -3.087 -3.684 0.158 1.380 146.042 [ 4,] 167.794 100.972 -2.142 -2.837 -0.389 0.085 134.383 [ 5,] 118.355 102.044 -2.891 -3.889 1.076 0.922 110.200 [ 6,] 127.263 114.671 -3.572 -2.395 -0.511 0.127 120.967 [ 7,] 162.597 140.488 -3.040 -3.493 -0.168 0.505 151.543 [ 8,] 172.682 151.285 -3.228 -2.714 -0.281 1.555 161.983 [ 9,] 131.565 184.920 -2.987 -2.919 -0.583 0.336 158.243 *2.Variance *.Var3.Mean *3.Variance [ 0,] 1855.304 -3.652 0.000 [ 1,] 991.390 -3.417 0.082 [ 2,] 255.980 -3.804 0.019 [ 3,] 1886.664 -3.386 0.097 [ 4,] 1217.776 -2.489 0.132 [ 5,] 72.553 -3.390 0.272 [ 6,] 43.240 -2.983 0.378 [ 7,] 133.306 -3.266 0.056 [ 8,] 124.859 -2.971 0.072 [ 9,] 776.385 -2.953 0.001
Same operation with a (built-in) transformation.
err = gl.simuPostDemo(db, None, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False,
upscale = gl.EPostUpscale.UNKNOWN,
stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR],
check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post2"))
Simulation Post-Processing -------------------------- Multiplicity order for all variables - Variable 1 (SimuA*) = 3 - Variable 2 (SimuB*) = 2 - Variable 3 (SimuC*) = 2 Number of Iterations: 12 (using the 'product' criterion) Number of Statistics: 2 Number of Transform Variables: 2 Number of Output Variables: 4 == Cell #4/10 Mean Variance Var 1 47.605 209.793 Var 2 85.271 628.676
db.display(dbfmt)
Data Base Characteristics ========================= Data Base Contents ------------------ rank x-1 x-2 rank.1 SimuA-1 SimuA-2 SimuA-3 [ 0,] 1.000 0.502 0.135 0.000 1.489 -0.056 0.187 [ 1,] 2.000 0.693 0.185 1.000 0.519 -0.594 -0.438 [ 2,] 3.000 0.809 0.387 2.000 -2.893 0.909 0.171 [ 3,] 4.000 0.915 0.617 3.000 1.224 -1.397 0.647 [ 4,] 5.000 0.088 0.808 4.000 -0.171 -0.214 -0.783 [ 5,] 6.000 0.189 0.802 5.000 1.071 2.205 -0.047 [ 6,] 7.000 0.835 0.201 6.000 -0.722 -0.029 -0.781 [ 7,] 8.000 0.647 0.105 7.000 -0.462 -0.816 0.772 [ 8,] 9.000 0.907 0.070 8.000 1.315 -0.602 -1.556 [ 9,] 10.000 0.258 0.374 9.000 -0.496 -1.302 0.050 SimuB-1 SimuB-2 SimuC-1 SimuC-2 *.Var1.Mean *1.Variance *.Var2.Mean [ 0,] 196.812 114.333 -3.668 -3.635 0.540 0.502 155.573 [ 1,] 165.288 104.997 -3.142 -3.692 -0.171 0.264 135.142 [ 2,] 155.281 124.644 -3.934 -3.673 -0.605 2.957 139.962 [ 3,] 104.455 187.628 -3.087 -3.684 0.158 1.380 146.042 [ 4,] 167.794 100.972 -2.142 -2.837 -0.389 0.085 134.383 [ 5,] 118.355 102.044 -2.891 -3.889 1.076 0.922 110.200 [ 6,] 127.263 114.671 -3.572 -2.395 -0.511 0.127 120.967 [ 7,] 162.597 140.488 -3.040 -3.493 -0.168 0.505 151.543 [ 8,] 172.682 151.285 -3.228 -2.714 -0.281 1.555 161.983 [ 9,] 131.565 184.920 -2.987 -2.919 -0.583 0.336 158.243 *2.Variance *.Var3.Mean *3.Variance *.Var1.Mean *1.Variance *.Var2.Mean *2.Variance [ 0,] 1855.304 -3.652 0.000 50.820 206.201 90.745 618.109 [ 1,] 991.390 -3.417 0.082 43.851 110.193 79.078 330.340 [ 2,] 255.980 -3.804 0.019 45.185 28.773 82.100 85.519 [ 3,] 1886.664 -3.386 0.097 47.605 209.793 85.271 628.676 [ 4,] 1217.776 -2.489 0.132 43.835 135.333 78.425 405.863 [ 5,] 72.553 -3.390 0.272 35.962 8.194 64.333 24.247 [ 6,] 43.240 -2.983 0.378 39.158 4.861 70.861 14.452 [ 7,] 133.306 -3.266 0.056 49.369 14.874 88.499 44.465 [ 8,] 124.859 -2.971 0.072 52.911 14.054 94.471 41.738 [ 9,] 776.385 -2.953 0.001 51.569 86.303 92.390 258.776