%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
return false;
}
import numpy as np
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
Global variables
gl.OptCst.define(gl.ECst.NTCOL,6)
gl.law_set_random_seed(13414)
The data set is built by drawing samples at random within a given box. The study is carried out in 2-D, but the approach is not limited to two dimensions.
nech = 500
mydb = gl.Db.createFromBox(nech, [0,0], [100, 100])
mydb
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 3
Total number of samples = 500

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
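As a rough illustration of what "samples at random within a given box" means, the sketch below draws coordinates with NumPy only. It assumes independent uniform draws along each axis; this is an assumption about the sampling scheme, not a description of how `gl.Db.createFromBox` is implemented. The helper name `sample_in_box` is hypothetical.

```python
import numpy as np

def sample_in_box(n, lower, upper, seed=13414):
    """Draw n points uniformly inside the box [lower, upper] (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # One column per space dimension; bounds broadcast across the rows
    return rng.uniform(lower, upper, size=(n, lower.size))

coords = sample_in_box(500, [0, 0], [100, 100])
print(coords.shape)
```

Because the number of columns follows the length of the bounds, the same helper works unchanged in 3-D, e.g. `sample_in_box(10, [0, 0, 0], [1, 2, 3])`.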
Displaying the Data set
ax = mydb.plot()
We now draw a random vector of 0/1 integer values from a Bernoulli distribution with probability 0.2. This vector is added to the Data Base as a new column.
sel = gl.VectorHelper.simulateBernoulli(nech, 0.2)
gl.VectorHelper.displayStats("Statistics on the Selection vector",sel)
iuid = mydb.addColumns(sel,"sel")
Statistics on the Selection vector
- Number of samples = 500 / 500
- Minimum = 0.000
- Maximum = 1.000
- Mean = 0.186
- St. Dev. = 0.389
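The reported statistics can be cross-checked by hand: for a 0/1 vector with mean m, the population standard deviation is sqrt(m(1 - m)), so a mean of 0.186 gives about 0.389, as printed above. The NumPy sketch below illustrates this relation on an independent Bernoulli draw. It does not reproduce the gstlearn random stream, so its mean will differ from 0.186; `simulate_bernoulli` is a hypothetical stand-in, not the gstlearn function.

```python
import numpy as np

def simulate_bernoulli(n, p, seed=13414):
    """Return n independent 0/1 draws with P(1) = p (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return (rng.random(n) < p).astype(int)

sel = simulate_bernoulli(500, 0.2)
m = sel.mean()

# For 0/1 data the population standard deviation is fully determined by the mean
print("mean =", m)
print("std  =", sel.std())
print("sqrt(m * (1 - m)) =", np.sqrt(m * (1 - m)))
```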
dbfmt = gl.DbStringFormat.createFromFlags(flag_stats=True, names=["sel"])
mydb.display(dbfmt)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 500

Data Base Statistics
--------------------
4 - Name sel - Locator NA
 Nb of data = 500
 Nb of active values = 500
 Minimum value = 0.000
 Maximum value = 1.000
 Mean value = 0.186
 Standard Deviation = 0.389
 Variance = 0.151

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = NA
ax = mydb.plot(nameColor="sel")
We now show how to extract a new Data Base from an input Data Base by specifying the ranks of the samples to keep.
ranks = gl.VectorHelper.sampleRanks(mydb.getSampleNumber(), proportion=0.2)
print("Number of selected samples =", len(ranks))
Number of selected samples = 100
mydbred1 = gl.Db.createReduce(mydb, ranks=ranks)
ax = mydbred1.plot()
ax.decoration(title="Extraction by Ranks")
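The idea behind extraction by ranks can be sketched with NumPy alone: draw a fixed proportion of distinct row indices, then index the coordinate array with them. This is only an analogue; `sample_ranks` below is a hypothetical helper, and both the rounding of the sample count and the sorting of the ranks are assumptions rather than documented behaviour of `gl.VectorHelper.sampleRanks`.

```python
import numpy as np

def sample_ranks(ntotal, proportion, seed=13414):
    """Pick round(ntotal * proportion) distinct ranks in [0, ntotal) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    nsel = int(round(ntotal * proportion))
    # replace=False guarantees that no sample is selected twice
    return np.sort(rng.choice(ntotal, size=nsel, replace=False))

# Stand-in for the coordinates of the 500 samples
coords = np.random.default_rng(0).uniform(0, 100, size=(500, 2))

ranks = sample_ranks(len(coords), 0.2)
reduced = coords[ranks]          # keep only the rows at the selected ranks
print(len(ranks), reduced.shape)
```

Unlike a Bernoulli selection, this approach fixes the number of retained samples exactly (100 out of 500 here).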
We now turn the variable 'sel' into a selection and create a new data set restricted to the active samples only.
mydb.setLocator('sel', gl.ELoc.SEL)
mydbred2 = gl.Db.createReduce(mydb)
mydbred2
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 93
Number of active samples = 93

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = sel
ax = mydbred2.plot()
ax.decoration(title="Extraction by Selection")
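Extraction by selection amounts to filtering rows with a boolean mask: samples whose selection value is 1 are active and kept, the others are dropped. In the run above, 93 of the 500 samples survive, which matches the mean of 0.186 reported for 'sel' (0.186 × 500 = 93). The NumPy sketch below shows the same mechanism on independently simulated data; it is an analogue under those assumptions, not the gstlearn implementation, and its count of active samples will generally differ from 93.

```python
import numpy as np

rng = np.random.default_rng(13414)

# Stand-ins for the sample coordinates and the 0/1 selection variable
coords = rng.uniform(0, 100, size=(500, 2))
sel = (rng.random(500) < 0.2).astype(int)

# A sample is active when its selection value equals 1
active = sel == 1
reduced = coords[active]

print(reduced.shape[0], "active samples out of", len(coords))
```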