Data Base management (Db)

Import packages

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc

gdoc.setNoScroll()

Define the Global variables

In [2]:
gl.OptCst.define(gl.ECst.NTCOL, 6)
gl.law_set_random_seed(13414)

Defining a Data set

The data set is built by drawing samples at random within a given box. The study is carried out in 2-D, but this is not a limitation.

In [3]:
nech = 500
mydb = gl.Db.createFromBox(nech, [0, 0], [100, 100])
mydb
Out[3]:
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 3
Total number of samples      = 500

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
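
The sampling step can be pictured with plain NumPy. The sketch below is an illustration only, not the gstlearn implementation: it assumes the coordinates are drawn independently and uniformly between the lower and upper corners of the box, and the helper name `sample_in_box` is hypothetical.

```python
import numpy as np


def sample_in_box(nech, coormin, coormax, seed=0):
    """Draw nech points uniformly inside the box [coormin, coormax].

    Hypothetical helper, not part of gstlearn.
    """
    rng = np.random.default_rng(seed)
    coormin = np.asarray(coormin, dtype=float)
    coormax = np.asarray(coormax, dtype=float)
    # One row per sample, one column per space dimension
    return rng.uniform(coormin, coormax, size=(nech, coormin.size))


points = sample_in_box(500, [0, 0], [100, 100])
print(points.shape)
```

Because the box corners are passed as arrays, the same call works for any space dimension.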

Displaying the Data set

In [4]:
res = gp.symbol(mydb)

We now draw a random vector of 0/1 integer values from a Bernoulli distribution with probability 0.2, and add this vector to the Data Base.

In [5]:
sel = gl.VectorHelper.simulateBernoulli(nech, 0.2)
gl.VectorHelper.dumpStats("Statistics on the Selection vector", sel)
iuid = mydb.addColumns(sel, "sel")
Statistics on the Selection vector
- Number of samples = 500 / 500
- Minimum  =       0.000
- Maximum  =       1.000
- Mean     =       0.186
- St. Dev. =       0.389
In [6]:
dbfmt = gl.DbStringFormat.createFromFlags(flag_stats=True, names=["sel"])
mydb.display(dbfmt)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 500

Data Base Statistics
--------------------
4 - Name sel - Locator NA
 Nb of data          =         500
 Nb of active values =         500
 Minimum value       =       0.000
 Maximum value       =       1.000
 Mean value          =       0.186
 Standard Deviation  =       0.389
 Variance            =       0.151

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = NA
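
The statistics printed above are tied together by a simple identity: for a vector of 0/1 values with mean m, the variance (normalized by the number of samples) is m(1 - m). The NumPy check below assumes 93 ones out of 500 samples, which is the count that matches the printed mean of 0.186; it does not reproduce the gstlearn random draw.

```python
import numpy as np

nech = 500
n_ones = 93  # assumed count, chosen so that the mean is 93 / 500 = 0.186

# Build a 0/1 vector with the assumed number of ones
sel = np.zeros(nech)
sel[:n_ones] = 1.0

mean = sel.mean()
variance = sel.var()  # population variance (divides by nech)
std = sel.std()

# For 0/1 data the variance equals mean * (1 - mean)
print(f"mean={mean:.3f} variance={variance:.3f} std={std:.3f}")
# mean=0.186 variance=0.151 std=0.389
```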
In [7]:
res = gp.symbol(mydb, nameColor="sel")

Extracting a new Data Base upon ranks

A new Data Base can be extracted from an input Data Base by specifying the ranks of the samples to keep.

In [8]:
ranks = gl.VectorHelper.sampleRanks(mydb.getNSample(), proportion=0.2)
print("Number of selected samples =", len(ranks))
Number of selected samples = 100
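
Selecting a proportion of ranks amounts to drawing sample indices without replacement. The sketch below uses NumPy rather than gstlearn; the helper `sample_ranks` is hypothetical and only mimics the behaviour seen above (500 samples, proportion 0.2, 100 ranks), with no claim about how `sampleRanks` draws or orders its output.

```python
import numpy as np


def sample_ranks(ntotal, proportion, seed=0):
    """Return sorted, distinct ranks covering a proportion of ntotal samples.

    Hypothetical helper, not the gstlearn implementation.
    """
    rng = np.random.default_rng(seed)
    nkeep = int(round(ntotal * proportion))
    # replace=False guarantees that no rank is drawn twice
    return np.sort(rng.choice(ntotal, size=nkeep, replace=False))


ranks = sample_ranks(500, 0.2)
print("Number of selected samples =", len(ranks))
```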
In [9]:
mydbred1 = gl.Db.createReduce(mydb, ranks=ranks)
In [10]:
gp.symbol(mydbred1)
gp.decoration(title="Extraction by Ranks")

Extracting a new Data Base upon selection

We now turn the variable 'sel' into a selection and create a new data set restricted to the active samples only.

In [11]:
mydb.setLocator("sel", gl.ELoc.SEL)
mydbred2 = gl.Db.createReduce(mydb)
mydbred2
Out[11]:
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 93
Number of active samples     = 93

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = sel
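
Reducing a data set by a selection is equivalent to applying a boolean mask. A minimal NumPy sketch, assuming that samples whose selection value is 1 are active and all others are masked off; the arrays are synthetic stand-ins, not the contents of `mydb`.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(500, 2))  # synthetic coordinates
sel = (rng.random(500) < 0.2).astype(int)        # synthetic 0/1 selection

# Keep only the rows where the selection is switched on
active = sel == 1
coords_reduced = coords[active]

print(coords_reduced.shape[0], "active samples out of", coords.shape[0])
```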
In [12]:
ax = gp.plot(mydbred2)
gp.decoration(title="Extraction by Selection")

Defining a Line Data set

The data set is defined in 2-D as a set of random lines. The number of lines is provided, and each line contains a number of samples drawn at random.

In [13]:
mydb = gl.DbLine.createFillRandom(ndim=2, nbline=10, nperline=30, delta=[1, -1])

Understanding the contents of the DbLine file created randomly

In [14]:
mydb.display()
Data Base Line Characteristics
==============================
Number of Lines = 10
Number of samples = 274
Line length = 24 / 29 / 30 / 26 / 26 / 22 / 23 / 35 / 23 / 36

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
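
The header can be cross-checked by hand: the line lengths must add up to the total number of samples. The sketch below copies the ten lengths printed above and builds a per-sample line index with NumPy. It assumes that samples are stored line after line; the variable names are illustrative and not part of gstlearn.

```python
import numpy as np

# Line lengths as printed by mydb.display()
line_lengths = np.array([24, 29, 30, 26, 26, 22, 23, 35, 23, 36])

total = int(line_lengths.sum())

# line_index[i] is the rank of the line that holds sample i
line_index = np.repeat(np.arange(line_lengths.size), line_lengths)

# Rank of the first sample of each line
line_starts = np.concatenate(([0], np.cumsum(line_lengths)[:-1]))

print(total)  # 274
```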

Displaying the Line Data set

In [15]:
res = gp.line(mydb, flagSample=True, flagAnnotateHeader=True)

Defining an Oriented Graph Data Set

The data is defined in 2-D as a set of samples which are joined by arcs in order to form a Graph.

In [16]:
x1 = np.array(
    [
        0.0,
        1.0,
        2.0,
        3.0,
        4.0,
        5.0,
        6.0,
        2.0,
        3.0,
        4.0,
        5.0,
        6.0,
        7.0,
        3.0,
        4.0,
        2.0,
        3.0,
        4.0,
        0.0,
        5.0,
        7.0,
    ]
)
x2 = np.array(
    [
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        0.0,
        1.0,
        1.0,
        1.0,
        1.0,
        1.0,
        1.0,
        2.0,
        2.0,
        3.0,
        3.0,
        3.0,
        4.0,
        5.0,
        4.0,
    ]
)
z1 = np.array(
    [
        1.2,
        2.5,
        3.6,
        1.4,
        0.3,
        0.2,
        8.2,
        0.3,
        3.2,
        1.2,
        0.4,
        0.1,
        0.3,
        3.2,
        4.5,
        1.2,
        5.2,
        1.2,
        1.1,
        2.2,
        3.3,
    ]
)
tab = np.concatenate((x1, x2, z1))
nech = len(x1)

arcs = gl.MatrixSparse(nech, nech)
arcs.setValue(0, 1, gl.law_uniform())
arcs.setValue(1, 2, gl.law_uniform())
arcs.setValue(2, 3, gl.law_uniform())
arcs.setValue(3, 4, gl.law_uniform())
arcs.setValue(4, 5, gl.law_uniform())
arcs.setValue(5, 6, gl.law_uniform())
arcs.setValue(2, 7, gl.law_uniform())
arcs.setValue(7, 8, gl.law_uniform())
arcs.setValue(8, 9, gl.law_uniform())
arcs.setValue(9, 10, gl.law_uniform())
arcs.setValue(10, 11, gl.law_uniform())
arcs.setValue(11, 12, gl.law_uniform())
arcs.setValue(8, 13, gl.law_uniform())
arcs.setValue(13, 14, gl.law_uniform())
arcs.setValue(14, 11, gl.law_uniform())
arcs.setValue(7, 15, gl.law_uniform())
arcs.setValue(15, 16, gl.law_uniform())
arcs.setValue(16, 17, gl.law_uniform())
dbgraphO = gl.DbGraphO.createFromMatrix(
    nech, gl.ELoadBy.COLUMN, tab, arcs, ["x1", "x2", "z1"], ["x1", "x2", "z1"]
)
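
The flat array `tab` stacks the three variables one after the other, which is why it is loaded with `gl.ELoadBy.COLUMN`. The NumPy sketch below shows the two possible layouts on the first three samples of the arrays above; reading `ELoadBy.SAMPLE` as the sample-by-sample ordering is an assumption based on its name.

```python
import numpy as np

# First three samples of the arrays defined above
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.0, 0.0, 0.0])
z1 = np.array([1.2, 2.5, 3.6])
nech, nvar = len(x1), 3

# Column order: all x1 values, then all x2 values, then all z1 values
tab_by_column = np.concatenate((x1, x2, z1))
table = tab_by_column.reshape(nvar, nech).T  # shape (nech, nvar)

# Sample order: (x1, x2, z1) of sample 0, then of sample 1, and so on
tab_by_sample = table.reshape(-1)

print(table)
print(tab_by_sample)
```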
In [17]:
gl.OptCst.define(gl.ECst.NTROW, -1)
gl.OptCst.define(gl.ECst.NTCOL, -1)
dbgraphO.display()
Data Base Oriented Graph Characteristics
========================================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = z1 - Locator = z1

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 21
- Number of rows    = 21
- Number of columns = 21
- Sparse Format
                 [,  0]     [,  1]     [,  2]     [,  3]     [,  4]     [,  5]     [,  6]
      [  0,]          .      0.066          .          .          .          .          .
      [  1,]          .          .      0.956          .          .          .          .
      [  2,]          .          .          .      0.423          .          .          .
      [  3,]          .          .          .          .      0.420          .          .
      [  4,]          .          .          .          .          .      0.064          .
      [  5,]          .          .          .          .          .          .      0.758
      [  6,]          .          .          .          .          .          .          .
      [  7,]          .          .          .          .          .          .          .
      [  8,]          .          .          .          .          .          .          .
      [  9,]          .          .          .          .          .          .          .
      [ 10,]          .          .          .          .          .          .          .
      [ 11,]          .          .          .          .          .          .          .
      [ 12,]          .          .          .          .          .          .          .
      [ 13,]          .          .          .          .          .          .          .
      [ 14,]          .          .          .          .          .          .          .
      [ 15,]          .          .          .          .          .          .          .
      [ 16,]          .          .          .          .          .          .          .
      [ 17,]          .          .          .          .          .          .          .
      [ 18,]          .          .          .          .          .          .          .
      [ 19,]          .          .          .          .          .          .          .
      [ 20,]          .          .          .          .          .          .          .
                 [,  7]     [,  8]     [,  9]     [, 10]     [, 11]     [, 12]     [, 13]
      [  0,]          .          .          .          .          .          .          .
      [  1,]          .          .          .          .          .          .          .
      [  2,]      0.618          .          .          .          .          .          .
      [  3,]          .          .          .          .          .          .          .
      [  4,]          .          .          .          .          .          .          .
      [  5,]          .          .          .          .          .          .          .
      [  6,]          .          .          .          .          .          .          .
      [  7,]          .      0.883          .          .          .          .          .
      [  8,]          .          .      0.703          .          .          .      0.520
      [  9,]          .          .          .      0.864          .          .          .
      [ 10,]          .          .          .          .      0.746          .          .
      [ 11,]          .          .          .          .          .      0.310          .
      [ 12,]          .          .          .          .          .          .          .
      [ 13,]          .          .          .          .          .          .          .
      [ 14,]          .          .          .          .      0.202          .          .
      [ 15,]          .          .          .          .          .          .          .
      [ 16,]          .          .          .          .          .          .          .
      [ 17,]          .          .          .          .          .          .          .
      [ 18,]          .          .          .          .          .          .          .
      [ 19,]          .          .          .          .          .          .          .
      [ 20,]          .          .          .          .          .          .          .
                 [, 14]     [, 15]     [, 16]     [, 17]     [, 18]     [, 19]     [, 20]
      [  0,]          .          .          .          .          .          .          .
      [  1,]          .          .          .          .          .          .          .
      [  2,]          .          .          .          .          .          .          .
      [  3,]          .          .          .          .          .          .          .
      [  4,]          .          .          .          .          .          .          .
      [  5,]          .          .          .          .          .          .          .
      [  6,]          .          .          .          .          .          .          .
      [  7,]          .      0.194          .          .          .          .          .
      [  8,]          .          .          .          .          .          .          .
      [  9,]          .          .          .          .          .          .          .
      [ 10,]          .          .          .          .          .          .          .
      [ 11,]          .          .          .          .          .          .          .
      [ 12,]          .          .          .          .          .          .          .
      [ 13,]      0.621          .          .          .          .          .          .
      [ 14,]          .          .          .          .          .          .          .
      [ 15,]          .          .      0.412          .          .          .          .
      [ 16,]          .          .          .      0.221          .          .          .
      [ 17,]          .          .          .          .          .          .          .
      [ 18,]          .          .          .          .          .          .          .
      [ 19,]          .          .          .          .          .          .          .
      [ 20,]          .          .          .          .          .          .          .

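The printed matrix can be read as an adjacency matrix: a non-zero entry at row i, column j is an arc from sample i to sample j, and its value is the weight drawn by `gl.law_uniform()`. Treating rows as origins and columns as destinations is an assumption. The sketch below uses a dense NumPy array with unit weights in place of `gl.MatrixSparse`, and lists the samples that take part in no arc.

```python
import numpy as np

nech = 21
# Arcs copied from the arcs.setValue(i, j, ...) calls above
edges = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
    (2, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12),
    (8, 13), (13, 14), (14, 11),
    (7, 15), (15, 16), (16, 17),
]

adjacency = np.zeros((nech, nech))
for i, j in edges:
    adjacency[i, j] = 1.0  # unit weight instead of a random one

out_degree = (adjacency != 0).sum(axis=1)  # arcs leaving each sample
in_degree = (adjacency != 0).sum(axis=0)   # arcs arriving at each sample
isolated = np.flatnonzero((out_degree + in_degree) == 0)

print("Number of arcs   =", len(edges))          # 18
print("Isolated samples =", isolated.tolist())   # [18, 19, 20]
print("In-degree of 11  =", int(in_degree[11]))  # 2
```

Samples 18, 19 and 20 carry coordinates and a value of `z1` but are not connected to any other sample.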
In [18]:
gp.plot(dbgraphO, flagSample=True, flagAnnotate=True)
gp.symbol(dbgraphO, nameSize="z1")
plt.show()

Defining a Meshing Data Set

We first consider Turbo Meshing, which is based on a regular grid.

Case of Turbo Meshing

In [19]:
nx = [12, 15]
dx = [1.3, 1.1]
tab = np.ones(nx[0] * nx[1])
dbmeshT = gl.DbMeshTurbo(
    nx,
    dx,
    gl.VectorDouble(),
    gl.VectorDouble(),
    gl.ELoadBy.SAMPLE,
    tab,
    ["var"],
    ["z1"],
)
dbmeshT
Out[19]:
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension              = 2
Number of Columns            = 2
Total number of samples      = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin :       0.000      0.000
Mesh   :       1.300      1.100
Number :          12         15
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 308
Number of Apices          = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4
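
The counts reported for the Turbo Meshing follow from the grid definition. Assuming each rectangular grid cell is split into two triangles (which is consistent with 3 apices per mesh and the 308 meshes printed above), the numbers can be recomputed as follows. This is a plain NumPy check, not a gstlearn call.

```python
import numpy as np

nx = np.array([12, 15])    # number of grid nodes per direction
dx = np.array([1.3, 1.1])  # grid mesh size per direction
x0 = np.zeros(2)           # grid origin

n_apices = int(np.prod(nx))          # one apex per grid node
n_meshes = int(2 * np.prod(nx - 1))  # two triangles per grid cell
extent_max = x0 + (nx - 1) * dx      # upper corner of the bounding box

print(n_apices, n_meshes)  # 180 308
print(extent_max)
```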
In [20]:
res = gp.mesh(dbmeshT, flagApex=True)

The next cell shows that the 'DbMesh' can be used as a standard 'DbGrid', in particular for adding new fields to the Data Base.

In [21]:
dbmeshT.addColumnsByConstant(1, 5.0, "NewVar", gl.ELoc.V)
dbmeshT
Out[21]:
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1
Column = 2 - Name = NewVar - Locator = v1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension              = 2
Number of Columns            = 3
Total number of samples      = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin :       0.000      0.000
Mesh   :       1.300      1.100
Number :          12         15
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 308
Number of Apices          = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4

Case of Standard Meshing

In [22]:
x1 = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.0, 0.0, 2.0, 2.0, 1.0, 2.0, 1.0])
apices = np.concatenate((x1, x2))

i1 = np.array([0, 1, 1, 3, 4, 1])
i2 = np.array([1, 2, 3, 4, 5, 4])
i3 = np.array([2, 3, 4, 5, 6, 6])
meshes = np.concatenate((i1, i2, i3))
In [23]:
dbmeshS = gl.DbMeshStandard.create(ndim=2, napexpermesh=3, apices=apices, meshes=meshes)
dbmeshS
Out[23]:
Data Base for Standard Meshing
==============================

Variables
---------
Column = 0 - Name = x-1 - Locator = x1
Column = 1 - Name = x-2 - Locator = x2

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 2
Total number of samples      = 7

Standard Meshing
================
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 6
Number of Apices          = 7

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:5
Dim #2 - Min:0 - Max:2
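
Both inputs are flat arrays: `apices` lists all first coordinates and then all second coordinates, and `meshes` lists the first, second and third apex of every triangle in turn. The bounding box printed above (second dimension between 0 and 2) is consistent with this column-wise reading. The NumPy sketch below rebuilds the two tables under that assumption and computes the triangle areas as a sanity check; it does not call gstlearn.

```python
import numpy as np

x1 = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([0.0, 0.0, 2.0, 2.0, 1.0, 2.0, 1.0])
i1 = np.array([0, 1, 1, 3, 4, 1])
i2 = np.array([1, 2, 3, 4, 5, 4])
i3 = np.array([2, 3, 4, 5, 6, 6])

coords = np.column_stack((x1, x2))         # shape (7, 2): one row per apex
triangles = np.column_stack((i1, i2, i3))  # shape (6, 3): one row per mesh

# Bounding box of the apices
box_min, box_max = coords.min(axis=0), coords.max(axis=0)

# Triangle areas from the 2-D cross product of two edge vectors
a, b, c = (coords[triangles[:, k]] for k in range(3))
ab, ac = b - a, c - a
areas = 0.5 * np.abs(ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0])

print(box_min, box_max)
print(areas)
```

A zero area would flag a degenerate triangle, and an apex index of 7 or more would point outside the apex table.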
In [24]:
res = gp.mesh(dbmeshS, flagApex=True)