Data Base management (Db)

Import packages

In [1]:
import numpy as np
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc

gdoc.setNoScroll()

Define the Global variables

In [2]:
gl.OptCst.define(gl.ECst.NTCOL,6)
gl.law_set_random_seed(13414)

Defining a Data set

The data set is built by drawing samples at random within a given box. The study is performed in 2-D, but nothing in the approach is specific to two dimensions.

In [3]:
nech = 500
mydb = gl.Db.createFromBox(nech, [0,0], [100, 100])
mydb
Out[3]:
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 3
Total number of samples      = 500

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
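
As a rough illustration of what drawing samples at random within a box means, the plain-Python sketch below draws each coordinate uniformly between the lower and upper corners of the box. The helper name `sample_in_box` is hypothetical and this is not gstlearn's implementation: `gl.Db.createFromBox` may well use a different generator and a different ordering.

```python
import random

def sample_in_box(n, lower, upper, seed=None):
    """Draw n points uniformly in the box [lower, upper] (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]

# Same sizes as the notebook: 500 points in the square [0, 100] x [0, 100]
points = sample_in_box(500, [0, 0], [100, 100], seed=13414)
print(len(points), len(points[0]))  # 500 points, each with 2 coordinates
```

Because the corners are passed as lists, the same sketch works in any space dimension by giving longer `lower` and `upper` lists.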

Displaying the Data set

In [4]:
res = gp.symbol(mydb)

We now draw a vector of random 0/1 integer values from a Bernoulli distribution with probability 0.2, and add this vector to the Data Base as a new column.

In [5]:
sel = gl.VectorHelper.simulateBernoulli(nech, 0.2)
gl.VectorHelper.dumpStats("Statistics on the Selection vector",sel)
iuid = mydb.addColumns(sel,"sel")
Statistics on the Selection vector
- Number of samples = 500 / 500
- Minimum  =      0.000
- Maximum  =      1.000
- Mean     =      0.186
- St. Dev. =      0.389
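
The statistics above can be cross-checked by hand: for a 0/1 vector, the variance equals mean * (1 - mean), so a mean of 0.186 gives a standard deviation of sqrt(0.186 * 0.814), about 0.389, as printed. The sketch below illustrates this with a plain-Python Bernoulli draw. The helper `simulate_bernoulli` is hypothetical; it does not reproduce the random sequence of `gl.VectorHelper.simulateBernoulli`.

```python
import math
import random

def simulate_bernoulli(n, p, seed=None):
    """Return n values in {0, 1}, each equal to 1 with probability p (hypothetical helper)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

sel_sketch = simulate_bernoulli(500, 0.2, seed=0)
mean = sum(sel_sketch) / len(sel_sketch)
# Population variance (divide by n); for 0/1 data it equals mean * (1 - mean)
variance = sum((v - mean) ** 2 for v in sel_sketch) / len(sel_sketch)
print(mean, variance, math.sqrt(variance))
```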
In [6]:
dbfmt = gl.DbStringFormat.createFromFlags(flag_stats=True, names=["sel"])
mydb.display(dbfmt)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 500

Data Base Statistics
--------------------
4 - Name sel - Locator NA
 Nb of data          =        500
 Nb of active values =        500
 Minimum value       =      0.000
 Maximum value       =      1.000
 Mean value          =      0.186
 Standard Deviation  =      0.389
 Variance            =      0.151

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = NA
In [7]:
res = gp.symbol(mydb,nameColor="sel")

Extracting a new Data Base upon ranks

A new Data Base can be extracted from an input Data Base by specifying the ranks of the samples to keep.

In [8]:
ranks = gl.VectorHelper.sampleRanks(mydb.getNSample(), proportion=0.2)
print("Number of selected samples =", len(ranks))
Number of selected samples = 100
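
The idea of sampling ranks by proportion, then keeping only the matching rows, can be sketched in plain Python as follows. The helpers `sample_ranks` and `reduce_by_ranks` are hypothetical names; whether gstlearn returns the ranks sorted or rounds the count in the same way is an assumption made here for illustration.

```python
import random

def sample_ranks(n, proportion, seed=None):
    """Pick round(n * proportion) distinct ranks in [0, n) (hypothetical helper)."""
    rng = random.Random(seed)
    count = round(n * proportion)
    return sorted(rng.sample(range(n), count))

def reduce_by_ranks(rows, ranks):
    """Keep only the rows whose index appears in ranks (hypothetical helper)."""
    return [rows[r] for r in ranks]

ranks_sketch = sample_ranks(500, 0.2, seed=0)
rows = [(i, float(i), float(2 * i)) for i in range(500)]  # toy (rank, x1, x2) table
reduced = reduce_by_ranks(rows, ranks_sketch)
print(len(ranks_sketch), len(reduced))  # 100 100
```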
In [9]:
mydbred1 = gl.Db.createReduce(mydb, ranks=ranks)
In [10]:
gp.symbol(mydbred1)
gp.decoration(title="Extraction by Ranks")

Extracting a new Data Base upon selection

We now turn the variable 'sel' into a selection and create a new data set restricted to the active samples only.

In [11]:
mydb.setLocator('sel', gl.ELoc.SEL)
mydbred2 = gl.Db.createReduce(mydb)
mydbred2
Out[11]:
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 93
Number of active samples     = 93

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = sel
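
The count of 93 samples is consistent with the statistics printed earlier for 'sel': a 0/1 variable with mean 0.186 over 500 samples contains 0.186 * 500 = 93 ones. The sketch below shows the same filtering logic on a toy table; `reduce_by_selection` is a hypothetical helper, not a gstlearn function.

```python
def reduce_by_selection(rows, selection):
    """Keep the rows whose selection flag is non-zero (hypothetical helper)."""
    return [row for row, flag in zip(rows, selection) if flag]

# Toy table of (rank, x1, x2) rows with a 0/1 selection vector
rows = [(0, 1.0, 2.0), (1, 3.0, 4.0), (2, 5.0, 6.0), (3, 7.0, 8.0)]
selection = [1, 0, 0, 1]
kept = reduce_by_selection(rows, selection)
print([row[0] for row in kept])  # [0, 3]

# Consistency check with the reported mean of 'sel'
n_active = round(0.186 * 500)
print(n_active)  # 93
```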
In [12]:
ax = gp.plot(mydbred2)
gp.decoration(title="Extraction by Selection")

Defining a Line Data set

The data is defined in 2-D as a set of lines at random. The number of lines is provided. Each line contains a number of samples drawn at random.

In [13]:
mydb = gl.DbLine.createFillRandom(ndim=2, nbline=10, nperline=30, delta=[1,-1])

Inspecting the contents of the randomly generated DbLine

In [14]:
mydb.display()
Data Base Line Characteristics
==============================
Number of Lines = 10
Line length = 24 / 29 / 30 / 26 / 26 / 22 / 23 / 35 / 23 / 36

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
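
To make the "Line length" row easier to read, the sketch below rebuilds per-line sample ranks from the lengths printed above. It assumes, purely for illustration, that samples are stored line after line in a single table; the internal layout of `DbLine` is not shown in this notebook and may differ.

```python
from itertools import accumulate

# Line lengths copied from the display above
line_lengths = [24, 29, 30, 26, 26, 22, 23, 35, 23, 36]

# Assumed layout: all samples of line 0 first, then line 1, and so on
starts = [0] + list(accumulate(line_lengths))[:-1]
line_ranks = [list(range(s, s + n)) for s, n in zip(starts, line_lengths)]

total = sum(line_lengths)
print(total, total / len(line_lengths))  # 274 27.4
```

The individual lengths vary around the requested `nperline=30` rather than matching it exactly.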

Displaying the Line Data set

In [15]:
res = gp.line(mydb, flagSample=True, flagAnnotateHeader=True)

Defining an Oriented Graph Data Set

The data is defined in 2-D as a set of samples which are joined by arcs in order to form a Graph.

In [16]:
x1  = np.array([ 0., 1., 2., 3., 4., 5., 6., 2., 3., 4., 5., 6., 7., 3., 4., 2., 3., 4., 0., 5., 7.])
x2  = np.array([ 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 4., 5., 4.])
z1  = np.array([1.2,2.5,3.6,1.4,0.3,0.2,8.2,0.3,3.2,1.2,0.4,0.1,0.3,3.2,4.5,1.2,5.2,1.2,1.1,2.2,3.3])
tab = np.concatenate((x1, x2, z1))
nech = len(x1)

arcs = gl.MatrixSparse(nech,nech)
# Each (i, j) pair sets entry (i, j) of the sparse matrix to a uniform random weight
arc_pairs = [( 0, 1), ( 1, 2), ( 2, 3), ( 3, 4), ( 4, 5), ( 5, 6),
             ( 2, 7), ( 7, 8), ( 8, 9), ( 9,10), (10,11), (11,12),
             ( 8,13), (13,14), (14,11), ( 7,15), (15,16), (16,17)]
for i, j in arc_pairs:
    arcs.setValue(i, j, gl.law_uniform())

dbgraphO = gl.DbGraphO.createFromMatrix(nech, gl.ELoadBy.COLUMN, tab, arcs,
                                        ["x1", "x2", "z1"], ["x1", "x2", "z1"])
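
The flat array `tab` stores all x1 values, then all x2 values, then all z1 values, which is why it is loaded with `gl.ELoadBy.COLUMN`. The plain-Python sketch below shows how such a column-ordered flat list maps back to one tuple per sample; `split_columns` is a hypothetical helper used only for illustration.

```python
def split_columns(flat, nrows, ncols):
    """Split a flat list stored column by column into ncols lists (hypothetical helper)."""
    return [flat[j * nrows:(j + 1) * nrows] for j in range(ncols)]

# First three samples of the notebook's x1, x2 and z1 arrays
x = [0.0, 1.0, 2.0]
y = [0.0, 0.0, 0.0]
z = [1.2, 2.5, 3.6]
flat = x + y + z  # same layout as np.concatenate((x, y, z))

cols = split_columns(flat, nrows=3, ncols=3)
samples = list(zip(*cols))  # one (x1, x2, z1) tuple per sample
print(samples[1])  # (1.0, 0.0, 2.5)
```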
In [17]:
gl.OptCst.define(gl.ECst.NTROW,-1)
gl.OptCst.define(gl.ECst.NTCOL,-1)
dbgraphO.display()
Data Base Oriented Graph Characteristics
========================================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = z1 - Locator = z1

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 4
Total number of samples      = 21
- Number of rows    = 21
- Number of columns = 21
- Sparse Format
               [,  0]    [,  1]    [,  2]    [,  3]    [,  4]    [,  5]    [,  6]
     [  0,]         .     0.066         .         .         .         .         .
     [  1,]         .         .     0.956         .         .         .         .
     [  2,]         .         .         .     0.423         .         .         .
     [  3,]         .         .         .         .     0.420         .         .
     [  4,]         .         .         .         .         .     0.064         .
     [  5,]         .         .         .         .         .         .     0.758
     [  6,]         .         .         .         .         .         .         .
     [  7,]         .         .         .         .         .         .         .
     [  8,]         .         .         .         .         .         .         .
     [  9,]         .         .         .         .         .         .         .
     [ 10,]         .         .         .         .         .         .         .
     [ 11,]         .         .         .         .         .         .         .
     [ 12,]         .         .         .         .         .         .         .
     [ 13,]         .         .         .         .         .         .         .
     [ 14,]         .         .         .         .         .         .         .
     [ 15,]         .         .         .         .         .         .         .
     [ 16,]         .         .         .         .         .         .         .
     [ 17,]         .         .         .         .         .         .         .
     [ 18,]         .         .         .         .         .         .         .
     [ 19,]         .         .         .         .         .         .         .
     [ 20,]         .         .         .         .         .         .         .
               [,  7]    [,  8]    [,  9]    [, 10]    [, 11]    [, 12]    [, 13]
     [  0,]         .         .         .         .         .         .         .
     [  1,]         .         .         .         .         .         .         .
     [  2,]     0.618         .         .         .         .         .         .
     [  3,]         .         .         .         .         .         .         .
     [  4,]         .         .         .         .         .         .         .
     [  5,]         .         .         .         .         .         .         .
     [  6,]         .         .         .         .         .         .         .
     [  7,]         .     0.883         .         .         .         .         .
     [  8,]         .         .     0.703         .         .         .     0.520
     [  9,]         .         .         .     0.864         .         .         .
     [ 10,]         .         .         .         .     0.746         .         .
     [ 11,]         .         .         .         .         .     0.310         .
     [ 12,]         .         .         .         .         .         .         .
     [ 13,]         .         .         .         .         .         .         .
     [ 14,]         .         .         .         .     0.202         .         .
     [ 15,]         .         .         .         .         .         .         .
     [ 16,]         .         .         .         .         .         .         .
     [ 17,]         .         .         .         .         .         .         .
     [ 18,]         .         .         .         .         .         .         .
     [ 19,]         .         .         .         .         .         .         .
     [ 20,]         .         .         .         .         .         .         .
               [, 14]    [, 15]    [, 16]    [, 17]    [, 18]    [, 19]    [, 20]
     [  0,]         .         .         .         .         .         .         .
     [  1,]         .         .         .         .         .         .         .
     [  2,]         .         .         .         .         .         .         .
     [  3,]         .         .         .         .         .         .         .
     [  4,]         .         .         .         .         .         .         .
     [  5,]         .         .         .         .         .         .         .
     [  6,]         .         .         .         .         .         .         .
     [  7,]         .     0.194         .         .         .         .         .
     [  8,]         .         .         .         .         .         .         .
     [  9,]         .         .         .         .         .         .         .
     [ 10,]         .         .         .         .         .         .         .
     [ 11,]         .         .         .         .         .         .         .
     [ 12,]         .         .         .         .         .         .         .
     [ 13,]     0.621         .         .         .         .         .         .
     [ 14,]         .         .         .         .         .         .         .
     [ 15,]         .         .     0.412         .         .         .         .
     [ 16,]         .         .         .     0.221         .         .         .
     [ 17,]         .         .         .         .         .         .         .
     [ 18,]         .         .         .         .         .         .         .
     [ 19,]         .         .         .         .         .         .         .
     [ 20,]         .         .         .         .         .         .         .
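
The sparse matrix printed above can be read as an adjacency structure. The sketch below rebuilds it in plain Python from the same index pairs, reading entry (i, j) as an arc from sample i to sample j (an assumption about the orientation convention), and lists successors, predecessors, and the samples that take part in no arc.

```python
# Index pairs copied from the arcs.setValue(...) calls in the notebook
arc_list = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
            (2, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12),
            (8, 13), (13, 14), (14, 11), (7, 15), (15, 16), (16, 17)]
n_nodes = 21

successors = {i: [] for i in range(n_nodes)}
predecessors = {i: [] for i in range(n_nodes)}
for src, dst in arc_list:
    successors[src].append(dst)
    predecessors[dst].append(src)

# Samples with neither incoming nor outgoing arcs
isolated = [i for i in range(n_nodes)
            if not successors[i] and not predecessors[i]]

print(successors[8])     # [9, 13]
print(predecessors[11])  # [10, 14]
print(isolated)          # [18, 19, 20]
```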

In [18]:
gp.plot(dbgraphO,flagSample=True, flagAnnotate=True)
gp.symbol(dbgraphO, nameSize="z1")
plt.show()

Defining a Meshing Data Set

We first consider Turbo Meshing, which is based on a regular grid.

Case of Turbo Meshing

In [19]:
nx = [12,15]
dx = [1.3, 1.1]
tab = np.ones(12*15)
dbmeshT = gl.DbMeshTurbo(nx,dx,gl.VectorDouble(),gl.VectorDouble(),
                        gl.ELoadBy.SAMPLE,tab,["var"],["z1"])
dbmeshT
Out[19]:
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension              = 2
Number of Columns            = 2
Total number of samples      = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin :      0.000     0.000
Mesh   :      1.300     1.100
Number :         12        15
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 308
Number of Apices          = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4
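
The counts in this display can be recovered from the grid parameters alone. The sketch below assumes that each rectangular grid cell is split into two triangles, which is consistent with the 3 apices per mesh and the 308 meshes reported above, and that the grid origin is at (0, 0) as printed.

```python
nx = [12, 15]    # number of grid nodes along each axis
dx = [1.3, 1.1]  # grid mesh size along each axis

# One apex per grid node
n_apices = nx[0] * nx[1]

# Assumed: every grid cell is cut into two triangles
n_cells = (nx[0] - 1) * (nx[1] - 1)
n_meshes = 2 * n_cells

# Upper corner of the bounding box, with the origin at (0, 0)
extent = [(n - 1) * d for n, d in zip(nx, dx)]

print(n_apices, n_meshes)  # 180 308
print([round(e, 6) for e in extent])  # [14.3, 15.4]
```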
In [20]:
res = gp.mesh(dbmeshT, flagApex=True)

The next cell shows that the 'DbMesh' can be used like a standard 'DbGrid', in particular to add new fields to the Data Base.

In [21]:
dbmeshT.addColumnsByConstant(1, 5., "NewVar", gl.ELoc.V)
dbmeshT
Out[21]:
Data Base for Turbo Meshing
===========================

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = var - Locator = z1
Column = 2 - Name = NewVar - Locator = v1

Data Base Summary
-----------------
File is organized as a regular grid
Space dimension              = 2
Number of Columns            = 3
Total number of samples      = 180

Turbo Meshing
=============

Grid characteristics:
---------------------
Origin :      0.000     0.000
Mesh   :      1.300     1.100
Number :         12        15
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 308
Number of Apices          = 180

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:14.3
Dim #2 - Min:0 - Max:15.4

Case of Standard Meshing

In [22]:
x1  = np.array([ 0., 1., 1., 2., 3., 4., 5.])
x2  = np.array([ 0., 0., 2., 2., 1., 2., 1.])
apices = np.concatenate((x1, x2))

i1 = np.array([0,1,1,3,4,1])
i2 = np.array([1,2,3,4,5,4])
i3 = np.array([2,3,4,5,6,6])
meshes = np.concatenate((i1,i2,i3))
In [23]:
dbmeshS = gl.DbMeshStandard.create(ndim=2,napexpermesh=3,apices=apices,meshes=meshes)
dbmeshS
Out[23]:
Data Base for Standard Meshing
==============================

Variables
---------
Column = 0 - Name = x-1 - Locator = x1
Column = 1 - Name = x-2 - Locator = x2

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension              = 2
Number of Columns            = 2
Total number of samples      = 7

Standard Meshing
================
Euclidean Geometry
Space Dimension           = 2
Number of Apices per Mesh = 3
Number of Meshes          = 6
Number of Apices          = 7

Bounding Box Extension
----------------------
Dim #1 - Min:0 - Max:5
Dim #2 - Min:0 - Max:2
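
The display can be checked against the input arrays with a few lines of plain Python. The sketch assumes, as the concatenation order suggests, that the k-th mesh is the triangle (i1[k], i2[k], i3[k]) and that apex coordinates are stored column by column; it then computes each triangle's area and the bounding box.

```python
x1 = [0.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [0.0, 0.0, 2.0, 2.0, 1.0, 2.0, 1.0]
i1 = [0, 1, 1, 3, 4, 1]
i2 = [1, 2, 3, 4, 5, 4]
i3 = [2, 3, 4, 5, 6, 6]

# Assumed layout: triangle k is made of apices i1[k], i2[k], i3[k]
triangles = list(zip(i1, i2, i3))

def triangle_area(a, b, c):
    """Area of the triangle with apex ranks a, b, c (half the absolute cross product)."""
    ux, uy = x1[b] - x1[a], x2[b] - x2[a]
    vx, vy = x1[c] - x1[a], x2[c] - x2[a]
    return 0.5 * abs(ux * vy - uy * vx)

areas = [triangle_area(*t) for t in triangles]
bbox = [(min(x1), max(x1)), (min(x2), max(x2))]

print(len(triangles), len(x1))  # 6 7
print(areas)  # [1.0, 1.0, 1.5, 1.0, 1.0, 1.0]
print(bbox)   # [(0.0, 5.0), (0.0, 2.0)]
```

A zero area would flag a degenerate triangle, so this kind of check is a cheap way to validate a hand-written mesh before passing it to the library.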
In [24]:
res = gp.mesh(dbmeshS, flagApex=True)