Data Base management (Db)
Import packages
import numpy as np
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc
gdoc.setNoScroll()
Global variables
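gl.OptCst.define(gl.ECst.NTCOL,6)   # limit the number of columns shown in printed tables
gl.law_set_random_seed(13414)       # fix the seed so that the random draws are reproducible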
Defining a Data set
The data set is created by drawing samples at random within a given box. This example is performed in 2-D, but this is not a limitation: the same construction applies in any space dimension.
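As a point of comparison, the following minimal sketch draws nech points at random in an axis-aligned box using only the Python standard library. It is not gstlearn code: the helper sample_in_box is hypothetical, and uniform sampling is assumed here as the meaning of "at random within a given box". Because the box is described by two coordinate lists, the same code works unchanged in any space dimension.

```python
import random


def sample_in_box(nech, coormin, coormax, seed=None):
    """Return nech points drawn uniformly in the box [coormin, coormax].

    Hypothetical helper (not part of gstlearn): each coordinate is drawn
    independently and uniformly between its lower and upper bounds.
    """
    if len(coormin) != len(coormax):
        raise ValueError("coormin and coormax must have the same length")
    rng = random.Random(seed)  # local generator, seeded for reproducibility
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(coormin, coormax)]
        for _ in range(nech)
    ]


# Same settings as the Data Base below: 500 samples in [0,100] x [0,100]
box_points = sample_in_box(500, [0, 0], [100, 100], seed=13414)
```

The draws differ from those of gstlearn; only the construction is illustrated.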
nech = 500
mydb = gl.Db.createFromBox(nech, [0,0], [100, 100])
mydb
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 3
Total number of samples = 500

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Displaying the Data set
ax = mydb.plot()
We now draw a vector of 0/1 integer values at random, following a Bernoulli distribution with a probability of 0.2, and add it to the Data Base as a new variable named 'sel'.
sel = gl.VectorHelper.simulateBernoulli(nech, 0.2)
gl.VectorHelper.displayStats("Statistics on the Selection vector",sel)
iuid = mydb.addColumns(sel,"sel")
Statistics on the Selection vector
- Number of samples = 500 / 500
- Minimum = 0.000
- Maximum = 1.000
- Mean = 0.186
- St. Dev. = 0.389
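The selection vector can be imitated with the standard library alone. In this sketch (hypothetical helpers, not the gstlearn implementation), each value is 1 with probability p and 0 otherwise, and the statistics use the population standard deviation, which is an assumption about how displayStats computes them. For a 0/1 vector with mean m, the variance reduces to m(1 - m): with m = 0.186 this gives 0.186 × 0.814 ≈ 0.151 and a standard deviation of about 0.389, consistent with the displayed statistics, and the number of ones is 0.186 × 500 = 93. The gap between the observed mean 0.186 and the probability 0.2 is smaller than one standard error, sqrt(0.2 × 0.8 / 500) ≈ 0.018.

```python
import math
import random


def simulate_bernoulli(n, p, seed=None):
    """Return n integers, each equal to 1 with probability p, else 0."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def basic_stats(values):
    """Return (count, min, max, mean, population standard deviation)."""
    count = len(values)
    mean = sum(values) / count
    var = sum((v - mean) ** 2 for v in values) / count
    return count, min(values), max(values), mean, math.sqrt(var)


sel_sketch = simulate_bernoulli(500, 0.2, seed=13414)
count, vmin, vmax, mean, std = basic_stats(sel_sketch)
print(f"n = {count}, min = {vmin}, max = {vmax}, mean = {mean:.3f}, std = {std:.3f}")

# For a 0/1 variable, the population variance reduces to mean * (1 - mean)
assert abs(std ** 2 - mean * (1 - mean)) < 1e-12
```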
dbfmt = gl.DbStringFormat.createFromFlags(flag_stats=True, names=["sel"])
mydb.display(dbfmt)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 500

Data Base Statistics
--------------------
4 - Name sel - Locator NA
Nb of data = 500
Nb of active values = 500
Minimum value = 0.000
Maximum value = 1.000
Mean value = 0.186
Standard Deviation = 0.389
Variance = 0.151

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = NA
ax = mydb.plot(nameColor="sel")
Extracting a new Data Base upon ranks
We now show how to extract a new Data Base from an input Data Base by specifying the ranks of the samples to retain. Here, the ranks of 20% of the samples are drawn at random.
ranks = gl.VectorHelper.sampleRanks(mydb.getSampleNumber(), proportion=0.2)
print("Number of selected samples =", len(ranks))
Number of selected samples = 100
mydbred1 = gl.Db.createReduce(mydb, ranks=ranks)
ax = mydbred1.plot()
ax.decoration(title="Extraction by Ranks")
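Conceptually, extraction by ranks keeps the samples of the input table whose ranks are listed. The sketch below mimics the two steps with the standard library: choose round(proportion × n) distinct ranks without replacement (100 out of 500 here, matching the count printed above), then copy the corresponding samples. The helpers sample_ranks and reduce_by_ranks are hypothetical; sorting the ranks is a choice made for readability, not a claim about what sampleRanks returns.

```python
import random


def sample_ranks(n, proportion, seed=None):
    """Draw round(proportion * n) distinct ranks in [0, n), returned sorted."""
    rng = random.Random(seed)
    k = round(proportion * n)
    return sorted(rng.sample(range(n), k))  # sampling without replacement


def reduce_by_ranks(samples, ranks):
    """Return a new list holding only the samples at the given ranks."""
    return [samples[r] for r in ranks]


# Hypothetical table: one (x1, x2) pair per sample
rng = random.Random(0)
table = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]

ranks_sketch = sample_ranks(len(table), proportion=0.2, seed=1)
reduced = reduce_by_ranks(table, ranks_sketch)
print("Number of selected samples =", len(reduced))  # 100
```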
Extracting a new Data Base upon selection
We now turn the variable 'sel' into a selection and create a new data set restricted to the active samples only, i.e. those where 'sel' is equal to 1.
mydb.setLocator('sel', gl.ELoc.SEL)
mydbred2 = gl.Db.createReduce(mydb)
mydbred2
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 93
Number of active samples = 93

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = sel
ax = mydbred2.plot()
ax.decoration(title="Extraction by Selection")
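Extraction by selection differs in that the kept samples are those flagged as active by the selection variable, rather than an explicit list of ranks. A minimal sketch, assuming that a sample is active when its selection value is nonzero; the helper reduce_by_selection is hypothetical. With a 0/1 vector, the number of samples kept is simply the sum of the vector, which is why a mean of 0.186 over 500 samples yields the 93 active samples reported above.

```python
def reduce_by_selection(samples, selection):
    """Keep only the samples whose selection value is nonzero (active)."""
    if len(samples) != len(selection):
        raise ValueError("one selection value is needed per sample")
    return [s for s, flag in zip(samples, selection) if flag != 0]


# Hypothetical small example: 6 samples, 3 of them active
samples = ["s0", "s1", "s2", "s3", "s4", "s5"]
selection = [1, 0, 0, 1, 1, 0]
active = reduce_by_selection(samples, selection)
print(active)  # ['s0', 's3', 's4']
assert len(active) == sum(selection)
```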
Defining a Line Data set
The data set is now defined in 2-D as a set of randomly generated lines. The number of lines is provided (nbline=10), and each line contains a random number of samples, controlled by nperline.
mydb = gl.DbLine.createFillRandom(ndim=2, nbline=10, nperline=30, delta=[1,-1])
Displaying the contents of the randomly created DbLine
mydb.display()
Data Base Line Characteristics
==============================
Number of Lines = 10
Line length = 24 / 29 / 30 / 26 / 26 / 22 / 23 / 35 / 23 / 36

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
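A line data set can be pictured as one flat table of samples plus the number of samples on each line. The sketch below (plain Python, not the DbLine internals) takes the line lengths displayed above and derives, for each line, the range of sample ranks it occupies; the 10 lines add up to 274 samples. Storing the samples contiguously, line after line, is an assumption made for illustration, and line_of_sample is a hypothetical helper.

```python
from itertools import accumulate

# Line lengths as displayed by mydb.display() above
line_lengths = [24, 29, 30, 26, 26, 22, 23, 35, 23, 36]

# Rank of the first sample of each line, assuming lines are stored
# one after the other in a single flat table
starts = [0] + list(accumulate(line_lengths))[:-1]
nsample = sum(line_lengths)


def line_of_sample(rank):
    """Return the index of the line containing the sample of given rank."""
    for iline, (start, length) in enumerate(zip(starts, line_lengths)):
        if start <= rank < start + length:
            return iline
    raise IndexError(f"rank {rank} is out of range (0..{nsample - 1})")


print(nsample)  # 274
print(line_of_sample(0), line_of_sample(24), line_of_sample(273))  # 0 1 9
```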
Displaying the Line Data set
ax = mydb.plot(flagSample=True, flagAnnotateHeader=True)
Defining an Oriented Graph Data Set
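The data set is defined in 2-D as a set of samples joined by oriented arcs to form a graph. The arcs are stored in a sparse matrix: each nonzero entry (i, j) defines an arc joining sample i to sample j, with a weight drawn from a uniform distribution.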
x1 = np.array([ 0., 1., 2., 3., 4., 5., 6., 2., 3., 4., 5., 6., 7., 3., 4., 2., 3., 4., 0., 5., 7.])
x2 = np.array([ 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 4., 5., 4.])
z1 = np.array([1.2,2.5,3.6,1.4,0.3,0.2,8.2,0.3,3.2,1.2,0.4,0.1,0.3,3.2,4.5,1.2,5.2,1.2,1.1,2.2,3.3])
# Values are stored variable by variable (all x1, then all x2, then all z1),
# hence the ELoadBy.COLUMN option used below
tab = np.concatenate((x1, x2, z1))
nech = len(x1)

# Oriented arcs (i, j); each weight is drawn from a uniform law,
# in the order of the list
arc_list = [( 0, 1), ( 1, 2), ( 2, 3), ( 3, 4), ( 4, 5), ( 5, 6),
            ( 2, 7), ( 7, 8), ( 8, 9), ( 9,10), (10,11), (11,12),
            ( 8,13), (13,14), (14,11), ( 7,15), (15,16), (16,17)]
arcs = gl.MatrixSparse(nech, nech)
for i, j in arc_list:
    arcs.setValue(i, j, gl.law_uniform())

# The last two arguments give the names and the locators of the variables
dbgraphO = gl.DbGraphO.createFromMatrix(nech, gl.ELoadBy.COLUMN, tab, arcs,
                                        ["x1", "x2", "z1"], ["x1", "x2", "z1"])
# Print all rows and columns of the arc matrix
gl.OptCst.define(gl.ECst.NTROW,-1)
gl.OptCst.define(gl.ECst.NTCOL,-1)
dbgraphO.display()
Data Base Oriented Graph Characteristics
========================================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = z1 - Locator = z1

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 21

- Number of rows = 21
- Number of columns = 21
- Sparse Format
       [,  0] [,  1] [,  2] [,  3] [,  4] [,  5] [,  6]
[  0,]      .  0.066      .      .      .      .      .
[  1,]      .      .  0.956      .      .      .      .
[  2,]      .      .      .  0.423      .      .      .
[  3,]      .      .      .      .  0.420      .      .
[  4,]      .      .      .      .      .  0.064      .
[  5,]      .      .      .      .      .      .  0.758
[  6,]      .      .      .      .      .      .      .
[  7,]      .      .      .      .      .      .      .
[  8,]      .      .      .      .      .      .      .
[  9,]      .      .      .      .      .      .      .
[ 10,]      .      .      .      .      .      .      .
[ 11,]      .      .      .      .      .      .      .
[ 12,]      .      .      .      .      .      .      .
[ 13,]      .      .      .      .      .      .      .
[ 14,]      .      .      .      .      .      .      .
[ 15,]      .      .      .      .      .      .      .
[ 16,]      .      .      .      .      .      .      .
[ 17,]      .      .      .      .      .      .      .
[ 18,]      .      .      .      .      .      .      .
[ 19,]      .      .      .      .      .      .      .
[ 20,]      .      .      .      .      .      .      .

       [,  7] [,  8] [,  9] [, 10] [, 11] [, 12] [, 13]
[  0,]      .      .      .      .      .      .      .
[  1,]      .      .      .      .      .      .      .
[  2,]  0.618      .      .      .      .      .      .
[  3,]      .      .      .      .      .      .      .
[  4,]      .      .      .      .      .      .      .
[  5,]      .      .      .      .      .      .      .
[  6,]      .      .      .      .      .      .      .
[  7,]      .  0.883      .      .      .      .      .
[  8,]      .      .  0.703      .      .      .  0.520
[  9,]      .      .      .  0.864      .      .      .
[ 10,]      .      .      .      .  0.746      .      .
[ 11,]      .      .      .      .      .  0.310      .
[ 12,]      .      .      .      .      .      .      .
[ 13,]      .      .      .      .      .      .      .
[ 14,]      .      .      .      .  0.202      .      .
[ 15,]      .      .      .      .      .      .      .
[ 16,]      .      .      .      .      .      .      .
[ 17,]      .      .      .      .      .      .      .
[ 18,]      .      .      .      .      .      .      .
[ 19,]      .      .      .      .      .      .      .
[ 20,]      .      .      .      .      .      .      .

       [, 14] [, 15] [, 16] [, 17] [, 18] [, 19] [, 20]
[  0,]      .      .      .      .      .      .      .
[  1,]      .      .      .      .      .      .      .
[  2,]      .      .      .      .      .      .      .
[  3,]      .      .      .      .      .      .      .
[  4,]      .      .      .      .      .      .      .
[  5,]      .      .      .      .      .      .      .
[  6,]      .      .      .      .      .      .      .
[  7,]      .  0.194      .      .      .      .      .
[  8,]      .      .      .      .      .      .      .
[  9,]      .      .      .      .      .      .      .
[ 10,]      .      .      .      .      .      .      .
[ 11,]      .      .      .      .      .      .      .
[ 12,]      .      .      .      .      .      .      .
[ 13,]  0.621      .      .      .      .      .      .
[ 14,]      .      .      .      .      .      .      .
[ 15,]      .      .  0.412      .      .      .      .
[ 16,]      .      .      .  0.221      .      .      .
[ 17,]      .      .      .      .      .      .      .
[ 18,]      .      .      .      .      .      .      .
[ 19,]      .      .      .      .      .      .      .
[ 20,]      .      .      .      .      .      .      .
ax = dbgraphO.plot(flagSample=True, flagAnnotate=True)
ax = dbgraphO.point(nameSize="z1")
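The arcs above form a small oriented graph. The sketch below, in plain Python, rebuilds it as an adjacency list from the same 18 (i, j) pairs, under the assumption that entry (i, j) of the sparse matrix stands for an arc going from sample i to sample j; the weights are omitted. It also shows how an array loaded by column, such as tab, is indexed: variable v of sample i sits at position v × nech + i. The helper names are hypothetical.

```python
# Arcs defined above with arcs.setValue(i, j, ...), read here as i -> j
ARC_LIST = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
            (2, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12),
            (8, 13), (13, 14), (14, 11), (7, 15), (15, 16), (16, 17)]
NECH = 21


def build_adjacency(nech, arc_list):
    """Return, for each sample, the sorted list of its successors."""
    succ = [[] for _ in range(nech)]
    for i, j in arc_list:
        succ[i].append(j)
    return [sorted(s) for s in succ]


def column_index(v, i, nech):
    """Position of variable v for sample i in an array loaded by column."""
    return v * nech + i


succ = build_adjacency(NECH, ARC_LIST)
branching = [i for i, s in enumerate(succ) if len(s) > 1]
touched = {k for arc in ARC_LIST for k in arc}
isolated = [i for i in range(NECH) if i not in touched]
print("branching samples:", branching)  # [2, 7, 8]
print("isolated samples:", isolated)    # [18, 19, 20]
```

Samples 18, 19 and 20 carry coordinates and a z1 value but belong to no arc.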
Defining a Meshing Data Set
We first concentrate on Turbo Meshing, which is based on a regular grid.
Case of Turbo Meshing
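nx = [12,15]                   # number of grid nodes along each axis
dx = [1.3, 1.1]                # grid mesh (spacing) along each axis
tab = np.ones(nx[0] * nx[1])   # one value per grid node
# Empty vectors keep the default origin and rotation;
# the variable is named 'var' and receives the locator z1
dbmeshT = gl.DbMeshTurbo(nx, dx, gl.VectorDouble(), gl.VectorDouble(),
                         gl.ELoadBy.SAMPLE, tab, ["var"], ["z1"])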
dbmeshT
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension = 2
Number of Columns = 2
Total number of samples = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin : 0.000 0.000
Mesh : 1.300 1.100
Number : 12 15

Euclidean Geometry
Space Dimension = 2
Number of Apices per Mesh = 3
Number of Meshes = 308
Number of Apices = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4
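The counts in the summary above follow from simple arithmetic on the grid. Assuming each grid cell is split into two triangles (consistent with 3 apices per mesh), a grid of 12 × 15 nodes has 11 × 14 cells and thus 2 × 11 × 14 = 308 meshes, while its apices are the 12 × 15 = 180 grid nodes. With the origin at 0, the bounding box extends to (nx[k] - 1) × dx[k], i.e. 11 × 1.3 = 14.3 and 14 × 1.1 = 15.4. The hypothetical function below (not part of gstlearn) checks these figures.

```python
def turbo_mesh_summary(nx, dx, x0=None):
    """Counts and extent of a 2-D regular grid split into triangles.

    Assumes each rectangular cell is cut into two triangles.
    """
    if x0 is None:
        x0 = [0.0] * len(nx)
    ncell = (nx[0] - 1) * (nx[1] - 1)
    n_meshes = 2 * ncell        # two triangles per cell
    n_apices = nx[0] * nx[1]    # every grid node is an apex
    bbox = [(o, o + (n - 1) * d) for o, n, d in zip(x0, nx, dx)]
    return n_meshes, n_apices, bbox


n_meshes, n_apices, bbox = turbo_mesh_summary([12, 15], [1.3, 1.1])
print(n_meshes, n_apices)  # 308 180
print([(lo, round(hi, 6)) for lo, hi in bbox])  # [(0.0, 14.3), (0.0, 15.4)]
```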
ax = dbmeshT.plot()
ax = dbmeshT.point()
The next chunk demonstrates that a 'DbMesh' can be used like a standard 'DbGrid', in particular to add new variables to the Data Base: here, a variable named 'NewVar', set to the constant value 5 and assigned the locator V.
dbmeshT.addColumnsByConstant(1, 5., "NewVar", gl.ELoc.V)
dbmeshT
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1
Column = 2 - Name = NewVar - Locator = v1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension = 2
Number of Columns = 3
Total number of samples = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin : 0.000 0.000
Mesh : 1.300 1.100
Number : 12 15

Euclidean Geometry
Space Dimension = 2
Number of Apices per Mesh = 3
Number of Meshes = 308
Number of Apices = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4
Case of Standard Meshing
# Coordinates of the 7 apices
x1 = np.array([ 0., 1., 1., 2., 3., 4., 5.])
x2 = np.array([ 0., 0., 2., 2., 1., 2., 1.])
# All first coordinates, followed by all second coordinates
apices = np.concatenate((x1, x2))
# Triangle k is made of the apices of ranks i1[k], i2[k] and i3[k]
i1 = np.array([0,1,1,3,4,1])
i2 = np.array([1,2,3,4,5,4])
i3 = np.array([2,3,4,5,6,6])
meshes = np.concatenate((i1,i2,i3))
dbmeshS = gl.DbMeshStandard.create(ndim=2,napexpermesh=3,apices=apices,meshes=meshes)
dbmeshS
Data Base for Standard Meshing
==============================

Variables
---------
Column = 0 - Name = x-1 - Locator = x1
Column = 1 - Name = x-2 - Locator = x2

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 2
Total number of samples = 7

Standard Meshing
================
Euclidean Geometry
Space Dimension = 2
Number of Apices per Mesh = 3
Number of Meshes = 6
Number of Apices = 7

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:5
Dim #2 - Min:0 - Max:2
ax = dbmeshS.plot()
ax = dbmeshS.point()
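In the standard meshing, apices holds all first coordinates followed by all second coordinates, and meshes holds the first, then second, then third apex index of every triangle, so that triangle k is (i1[k], i2[k], i3[k]). The sketch below unpacks both arrays under that reading of the layout (an assumption inferred from how they are concatenated above) and computes the area of each triangle with the cross-product (shoelace) formula. The helpers unpack_by_column and triangle_area are hypothetical.

```python
def unpack_by_column(flat, ncol):
    """Split a column-wise flat list into rows of ncol values."""
    nrow = len(flat) // ncol
    return [tuple(flat[c * nrow + r] for c in range(ncol)) for r in range(nrow)]


def triangle_area(a, b, c):
    """Unsigned area of triangle abc (cross-product / shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2


# Same values as x1, x2 and i1, i2, i3 above, concatenated column-wise
apices_flat = [0., 1., 1., 2., 3., 4., 5., 0., 0., 2., 2., 1., 2., 1.]
meshes_flat = [0, 1, 1, 3, 4, 1, 1, 2, 3, 4, 5, 4, 2, 3, 4, 5, 6, 6]

apex_xy = unpack_by_column(apices_flat, 2)     # 7 apices (x, y)
triangles = unpack_by_column(meshes_flat, 3)   # 6 triangles (i, j, k)
areas = [triangle_area(*(apex_xy[i] for i in tri)) for tri in triangles]
print(triangles[0], triangles[-1])  # (0, 1, 2) (1, 4, 6)
print(areas)  # [1.0, 1.0, 1.5, 1.0, 1.0, 1.0]
```

The coordinate ranges of apex_xy (0 to 5 and 0 to 2) agree with the bounding box reported above.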