Simulation post-processing¶

This test is meant to demonstrate the tool for post-processing simulation results.

In [1]:
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc
import numpy as np
import matplotlib.pyplot as plt
import gstlearn.plot as gp
import scipy.sparse as sc

gdoc.setNoScroll()

We construct several simulations independently. Here, the manner these simulations are constrcuted has no interest and we simply sample distributions (Uniform or Gaussian) with various parameters. We construct three simulations (called SimuA, SimuB and SimuC) with different occurence numbers (3, 2 and 2).

Then, for each cell of the output grid:

  • a search for the samples belonging to the cell is performed
  • a particular transformation (calculating their sum and their standard deviation) is applied to the outcomes of these samples
  • the transformed values of these samples are upscaled (using the mean) to one single multivariate per cell
  • Some statistics (mean and variance) are calculated on the results of the transformation per cell.

Creating a point Data Base

In [2]:
nech=10
db = gl.Db.createFillRandom(ndat=nech, ndim=2, nvar=0, seed = 3143233)
err = db.addColumns(np.arange(0,nech),"rank")

Creating a Grid Data Base covering the same area

In [3]:
nx = 5
dx = 1 / nx
x0 = dx / 2
dbgrid = gl.DbGrid.create(nx=[nx,nx], dx=[dx,dx], x0=[x0,x0])

Adding a first set of 3 simulation outcomes (sampled from Normal distribution)

In [4]:
nsimuA = 3
uid = db.addColumnsByConstant(nsimuA, 0., "SimuA")
for i in range(nsimuA):
    vec = gl.VectorHelper.simulateGaussian(nech)
    db.setColumnByUID(vec, uid + i)

Adding a second set of 2 simulations (sampled from Uniform distribution between 100 and 200)

In [5]:
nsimuB = 2
uid = db.addColumnsByConstant(nsimuB, 0., "SimuB")
for i in range(nsimuB):
    vec = gl.VectorHelper.simulateUniform(nech, 100., 200.)
    db.setColumnByUID(vec, uid + i)

Adding a third set of 2 simulations (sampled from Uniform distribution between -4 and -2)

In [6]:
nsimuC = 2
uid = db.addColumnsByConstant(nsimuC, 0., "SimuC")
for i in range(nsimuC):
    vec = gl.VectorHelper.simulateUniform(nech, -4., -2.)
    db.setColumnByUID(vec, uid + i)

In the following paragraph, we provide the complete dump of the contents of the Data Base to better understand the next steps.

In [7]:
dbfmt = gl.DbStringFormat.createFromFlags(flag_resume = False, flag_vars=False, flag_array=True)
gl.OptCst.defineByKey("NTROW",-1)
gl.OptCst.defineByKey("NTCOL",-1)
db.display(dbfmt)
Data Base Characteristics
=========================

Data Base Contents
------------------
                 rank       x-1       x-2    rank.1   SimuA-1   SimuA-2   SimuA-3
     [  0,]     0.000     0.502     0.135     0.000     1.489    -0.056     0.187
     [  1,]     1.000     0.693     0.185     1.000     0.519    -0.594    -0.438
     [  2,]     2.000     0.809     0.387     2.000    -2.893     0.909     0.171
     [  3,]     3.000     0.915     0.617     3.000     1.224    -1.397     0.647
     [  4,]     4.000     0.088     0.808     4.000    -0.171    -0.214    -0.783
     [  5,]     5.000     0.189     0.802     5.000     1.071     2.205    -0.047
     [  6,]     6.000     0.835     0.201     6.000    -0.722    -0.029    -0.781
     [  7,]     7.000     0.647     0.105     7.000    -0.462    -0.816     0.772
     [  8,]     8.000     0.907     0.070     8.000     1.315    -0.602    -1.556
     [  9,]     9.000     0.258     0.374     9.000    -0.496    -1.302     0.050
              SimuB-1   SimuB-2   SimuC-1   SimuC-2
     [  0,]   196.812   114.333    -3.668    -3.635
     [  1,]   165.288   104.997    -3.142    -3.692
     [  2,]   155.281   124.644    -3.934    -3.673
     [  3,]   104.455   187.628    -3.087    -3.684
     [  4,]   167.794   100.972    -2.142    -2.837
     [  5,]   118.355   102.044    -2.891    -3.889
     [  6,]   127.263   114.671    -3.572    -2.395
     [  7,]   162.597   140.488    -3.040    -3.493
     [  8,]   172.682   151.285    -3.228    -2.714
     [  9,]   131.565   184.920    -2.987    -2.919
 

Display of the samples and overlay the grid of cells

In [8]:
ax = gp.setDefaultGeographic([10,10])
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)

We perform the simulations post-processing as described in the introduction.

In this first application, we bypass any transformation function.

A particular attention is brought to the cell (ranked 4 [1-based]) in order to check the work flow applied to the information.

In [9]:
err = gl.simuPost(db, dbgrid, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False, 
                  upscale = gl.EPostUpscale.MEAN,
                  stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR], 
                  check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post1"))
 Simulation Post-Processing
 --------------------------
 Multiplicity order for all variables
 - Variable 1 (SimuA*) = 3
 - Variable 2 (SimuB*) = 2
 - Variable 3 (SimuC*) = 2
 Number of Iterations: 12  (using the 'product' criterion)
 Number of Statistics: 2
 Number of Output Variables: 6
 
== Cell #4/25 (regrouping 2 samples)
             Mean   Variance
Var 1     -0.170      0.160
Var 2    143.343    462.942
Var 3     -3.342      0.068
 
In [10]:
dbgrid
Out[10]:
Data Base Grid Characteristics
==============================

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension              = 2
Number of Columns            = 9
Total number of samples      = 25

Grid characteristics:
---------------------
Origin :      0.100     0.100
Mesh   :      0.200     0.200
Number :          5         5

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = Post1.Var1.Mean - Locator = NA
Column = 4 - Name = Post1.Var1.Variance - Locator = NA
Column = 5 - Name = Post1.Var2.Mean - Locator = NA
Column = 6 - Name = Post1.Var2.Variance - Locator = NA
Column = 7 - Name = Post1.Var3.Mean - Locator = NA
Column = 8 - Name = Post1.Var3.Variance - Locator = NA
In [11]:
ax = gp.setDefaultGeographic([10,10])
ax = gp.literal(dbgrid, name="Post1.Var1.Variance", marker="")
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)

In this second trial, we use simuPostDemo where a stransformation has been embedded in the procedure: this transformation calculates the sumù and the standard deviation* of the simulation outcomes for eachsample of the target cell. For comparison sake, we still keep the focus on the same cell.

In [12]:
err = gl.simuPostDemo(db, dbgrid, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False, 
                      upscale = gl.EPostUpscale.MEAN,
                      stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR], 
                      check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post2"))
 Simulation Post-Processing
 --------------------------
 Multiplicity order for all variables
 - Variable 1 (SimuA*) = 3
 - Variable 2 (SimuB*) = 2
 - Variable 3 (SimuC*) = 2
 Number of Iterations: 12  (using the 'product' criterion)
 Number of Statistics: 2
 Number of Transform Variables: 2
 Number of Output Variables: 4
 
== Cell #4/25 (regrouping 2 samples)
             Mean   Variance
Var 1     46.610     51.463
Var 2     83.789    154.267
 
In [13]:
ax = gp.setDefaultGeographic([10,10])
ax = gp.literal(dbgrid, name="Post2.Var1.Variance", marker="")
ax = gp.grid(dbgrid, flagCell=True)
ax = gp.literal(db, name="rank", s=100)

In the next paragraph, we perform the simulation post-processing stage in place. This means that we do not provide any output Db and the Upscaling phase does not take place (however note that the upscaling rule must not be left empty). The results are stored in the Input Db.

In [14]:
err = gl.simuPost(db, None, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False, 
                  upscale = gl.EPostUpscale.UNKNOWN,
                  stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR], 
                  check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post1"))
 Simulation Post-Processing
 --------------------------
 Multiplicity order for all variables
 - Variable 1 (SimuA*) = 3
 - Variable 2 (SimuB*) = 2
 - Variable 3 (SimuC*) = 2
 Number of Iterations: 12  (using the 'product' criterion)
 Number of Statistics: 2
 Number of Output Variables: 6
 
== Cell #4/10
             Mean   Variance
Var 1      0.158      1.380
Var 2    146.042   1886.664
Var 3     -3.386      0.097
 
In [15]:
db.display(dbfmt)
Data Base Characteristics
=========================

Data Base Contents
------------------
                   rank         x-1         x-2      rank.1     SimuA-1     SimuA-2     SimuA-3
     [  0,]       0.000       0.502       0.135       0.000       1.489      -0.056       0.187
     [  1,]       1.000       0.693       0.185       1.000       0.519      -0.594      -0.438
     [  2,]       2.000       0.809       0.387       2.000      -2.893       0.909       0.171
     [  3,]       3.000       0.915       0.617       3.000       1.224      -1.397       0.647
     [  4,]       4.000       0.088       0.808       4.000      -0.171      -0.214      -0.783
     [  5,]       5.000       0.189       0.802       5.000       1.071       2.205      -0.047
     [  6,]       6.000       0.835       0.201       6.000      -0.722      -0.029      -0.781
     [  7,]       7.000       0.647       0.105       7.000      -0.462      -0.816       0.772
     [  8,]       8.000       0.907       0.070       8.000       1.315      -0.602      -1.556
     [  9,]       9.000       0.258       0.374       9.000      -0.496      -1.302       0.050
                SimuB-1     SimuB-2     SimuC-1     SimuC-2 *.Var1.Mean *1.Variance *.Var2.Mean
     [  0,]     196.812     114.333      -3.668      -3.635       0.540       0.502     155.573
     [  1,]     165.288     104.997      -3.142      -3.692      -0.171       0.264     135.142
     [  2,]     155.281     124.644      -3.934      -3.673      -0.605       2.957     139.962
     [  3,]     104.455     187.628      -3.087      -3.684       0.158       1.380     146.042
     [  4,]     167.794     100.972      -2.142      -2.837      -0.389       0.085     134.383
     [  5,]     118.355     102.044      -2.891      -3.889       1.076       0.922     110.200
     [  6,]     127.263     114.671      -3.572      -2.395      -0.511       0.127     120.967
     [  7,]     162.597     140.488      -3.040      -3.493      -0.168       0.505     151.543
     [  8,]     172.682     151.285      -3.228      -2.714      -0.281       1.555     161.983
     [  9,]     131.565     184.920      -2.987      -2.919      -0.583       0.336     158.243
            *2.Variance *.Var3.Mean *3.Variance
     [  0,]    1855.304      -3.652       0.000
     [  1,]     991.390      -3.417       0.082
     [  2,]     255.980      -3.804       0.019
     [  3,]    1886.664      -3.386       0.097
     [  4,]    1217.776      -2.489       0.132
     [  5,]      72.553      -3.390       0.272
     [  6,]      43.240      -2.983       0.378
     [  7,]     133.306      -3.266       0.056
     [  8,]     124.859      -2.971       0.072
     [  9,]     776.385      -2.953       0.001
 

Same operation with a (built-in) transformation.

In [16]:
err = gl.simuPostDemo(db, None, ["SimuA*", "SimuB*", "SimuC*"], flag_match=False, 
                      upscale = gl.EPostUpscale.UNKNOWN,
                      stats = [gl.EPostStat.MEAN, gl.EPostStat.VAR], 
                      check_targets=[4], verbose=True, namconv=gl.NamingConvention("Post2"))
 Simulation Post-Processing
 --------------------------
 Multiplicity order for all variables
 - Variable 1 (SimuA*) = 3
 - Variable 2 (SimuB*) = 2
 - Variable 3 (SimuC*) = 2
 Number of Iterations: 12  (using the 'product' criterion)
 Number of Statistics: 2
 Number of Transform Variables: 2
 Number of Output Variables: 4
 
== Cell #4/10
             Mean   Variance
Var 1     47.605    209.793
Var 2     85.271    628.676
 
In [17]:
db.display(dbfmt)
Data Base Characteristics
=========================

Data Base Contents
------------------
                   rank         x-1         x-2      rank.1     SimuA-1     SimuA-2     SimuA-3
     [  0,]       0.000       0.502       0.135       0.000       1.489      -0.056       0.187
     [  1,]       1.000       0.693       0.185       1.000       0.519      -0.594      -0.438
     [  2,]       2.000       0.809       0.387       2.000      -2.893       0.909       0.171
     [  3,]       3.000       0.915       0.617       3.000       1.224      -1.397       0.647
     [  4,]       4.000       0.088       0.808       4.000      -0.171      -0.214      -0.783
     [  5,]       5.000       0.189       0.802       5.000       1.071       2.205      -0.047
     [  6,]       6.000       0.835       0.201       6.000      -0.722      -0.029      -0.781
     [  7,]       7.000       0.647       0.105       7.000      -0.462      -0.816       0.772
     [  8,]       8.000       0.907       0.070       8.000       1.315      -0.602      -1.556
     [  9,]       9.000       0.258       0.374       9.000      -0.496      -1.302       0.050
                SimuB-1     SimuB-2     SimuC-1     SimuC-2 *.Var1.Mean *1.Variance *.Var2.Mean
     [  0,]     196.812     114.333      -3.668      -3.635       0.540       0.502     155.573
     [  1,]     165.288     104.997      -3.142      -3.692      -0.171       0.264     135.142
     [  2,]     155.281     124.644      -3.934      -3.673      -0.605       2.957     139.962
     [  3,]     104.455     187.628      -3.087      -3.684       0.158       1.380     146.042
     [  4,]     167.794     100.972      -2.142      -2.837      -0.389       0.085     134.383
     [  5,]     118.355     102.044      -2.891      -3.889       1.076       0.922     110.200
     [  6,]     127.263     114.671      -3.572      -2.395      -0.511       0.127     120.967
     [  7,]     162.597     140.488      -3.040      -3.493      -0.168       0.505     151.543
     [  8,]     172.682     151.285      -3.228      -2.714      -0.281       1.555     161.983
     [  9,]     131.565     184.920      -2.987      -2.919      -0.583       0.336     158.243
            *2.Variance *.Var3.Mean *3.Variance *.Var1.Mean *1.Variance *.Var2.Mean *2.Variance
     [  0,]    1855.304      -3.652       0.000      50.820     206.201      90.745     618.109
     [  1,]     991.390      -3.417       0.082      43.851     110.193      79.078     330.340
     [  2,]     255.980      -3.804       0.019      45.185      28.773      82.100      85.519
     [  3,]    1886.664      -3.386       0.097      47.605     209.793      85.271     628.676
     [  4,]    1217.776      -2.489       0.132      43.835     135.333      78.425     405.863
     [  5,]      72.553      -3.390       0.272      35.962       8.194      64.333      24.247
     [  6,]      43.240      -2.983       0.378      39.158       4.861      70.861      14.452
     [  7,]     133.306      -3.266       0.056      49.369      14.874      88.499      44.465
     [  8,]     124.859      -2.971       0.072      52.911      14.054      94.471      41.738
     [  9,]     776.385      -2.953       0.001      51.569      86.303      92.390     258.776