import numpy as np
import pandas as pd
import sys
import os
import matplotlib.pyplot as plt
import gstlearn as gl
import gstlearn.plot as gp
Global variables
The data set is built by drawing samples at random within a given box. The study is performed in 2-D, but the approach is not limited to two dimensions.
nech = 500
mydb = gl.Db.createFromBox(nech, [0,0], [100, 100])
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 3
Total number of samples = 500

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
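To make the construction concrete, here is a minimal sketch of drawing points at random in a box using only the Python standard library. It assumes independent uniform draws along each axis; `sample_box` is a hypothetical helper written for illustration and is not part of gstlearn, whose random generator may behave differently.

```python
import random

def sample_box(n, lower, upper, seed=None):
    """Draw n points uniformly in the box [lower, upper] (any dimension)."""
    rng = random.Random(seed)
    return [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(n)
    ]

# Same setting as above: 500 points in the square [0, 100] x [0, 100]
points = sample_box(500, [0, 0], [100, 100], seed=42)
print(len(points), len(points[0]))

# The helper does not depend on the dimension: a 3-D box works the same way
points_3d = sample_box(10, [0, 0, 0], [1, 2, 3], seed=42)
```

Because the corners are plain lists, moving to a higher dimension only requires longer `lower` and `upper` vectors, which illustrates why the 2-D setting is not a limitation.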
Displaying the Data set
ax = mydb.plot()
We now draw a random vector of 0/1 integer values from a Bernoulli distribution with probability 0.2. This vector is added to the Data Base.
sel = gl.VectorHelper.simulateBernoulli(nech, 0.2)
gl.VectorHelper.displayStats("Statistics on the Selection vector",sel)
iuid = mydb.addColumns(sel,"sel")
Statistics on the Selection vector
- Number of samples = 500 / 500
- Minimum = 0.000
- Maximum = 1.000
- Mean = 0.186
- St. Dev. = 0.389
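The Bernoulli draw can be sketched in plain Python as follows. This is an illustration under the assumption that each sample is flagged independently with probability p; `simulate_bernoulli` is a hypothetical stand-in, not the gstlearn implementation, and a different seed or generator gives different values.

```python
import random
import statistics

def simulate_bernoulli(n, p, seed=None):
    """Return n independent 0/1 values, each equal to 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

sel_sketch = simulate_bernoulli(500, 0.2, seed=1)

# Summary statistics comparable to the ones displayed above
mean = statistics.fmean(sel_sketch)
std = statistics.pstdev(sel_sketch)  # population standard deviation
print(f"mean = {mean:.3f}, st. dev. = {std:.3f}")
```

The mean fluctuates around 0.2 from one draw to the next, which is why the statistics above report 0.186 rather than exactly 0.2.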
dbfmt = gl.DbStringFormat.createFromFlags(flag_stats=True, names=["sel"])
mydb.display(dbfmt)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 500

Data Base Statistics
--------------------
4 - Name sel - Locator NA
Nb of data = 500
Nb of active values = 500
Minimum value = 0.000
Maximum value = 1.000
Mean value = 0.186
Standard Deviation = 0.389
Variance = 0.151

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = NA
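The reported statistics can be cross-checked by hand: for a variable that only takes the values 0 and 1, the population variance equals mean × (1 − mean). The short check below uses only the figures printed above and assumes the displayed variance is the population one (divided by n), which the numbers are consistent with.

```python
import math

n = 500
mean = 0.186                      # reported mean of 'sel'
n_ones = round(n * mean)          # number of samples flagged 1
variance = mean * (1.0 - mean)    # population variance of a 0/1 variable
std = math.sqrt(variance)

print(n_ones, round(variance, 3), round(std, 3))
# prints: 93 0.151 0.389
```

The count of 93 flagged samples is the number of active samples expected once 'sel' is used as a selection.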
ax = mydb.plot(nameColor="sel")
We now extract a new Data Base from the input Data Base by specifying the ranks of the samples to keep.
ranks = gl.VectorHelper.sampleRanks(mydb.getSampleNumber(), proportion=0.2)
print("Number of selected samples =", len(ranks))
Number of selected samples = 100
mydbred1 = gl.Db.createReduce(mydb, ranks=ranks)
ax = mydbred1.plot()
ax.decoration(title="Extraction by Ranks")
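The extraction by ranks can be pictured with the following plain-Python sketch. It assumes that ranks are drawn without replacement and that their number is the rounded product of the sample count and the proportion, which matches the 100 ranks reported above; `sample_ranks` and `reduce_by_ranks` are hypothetical helpers, not gstlearn functions.

```python
import random

def sample_ranks(n, proportion, seed=None):
    """Pick round(n * proportion) distinct ranks in [0, n), sorted."""
    rng = random.Random(seed)
    k = round(n * proportion)
    return sorted(rng.sample(range(n), k))

def reduce_by_ranks(rows, ranks):
    """Keep only the rows whose position is listed in ranks."""
    return [rows[i] for i in ranks]

# Toy table: (rank, x1, x2) for 500 samples
rows = [(i, float(i), float(2 * i)) for i in range(500)]

ranks_sketch = sample_ranks(len(rows), 0.2, seed=7)
reduced = reduce_by_ranks(rows, ranks_sketch)
print("Number of selected samples =", len(reduced))
```

Unlike the Bernoulli vector, where the number of flagged samples varies from draw to draw, this approach fixes the size of the extracted subset in advance.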
We now turn the variable 'sel' into a selection and create a new data set restricted to the active samples only.
mydb.setLocator('sel', gl.ELoc.SEL)
mydbred2 = gl.Db.createReduce(mydb)
Data Base Characteristics
=========================

Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 4
Total number of samples = 93
Number of active samples = 93

Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = sel - Locator = sel
ax = mydbred2.plot()
ax.decoration(title="Extraction by Selection")
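Conceptually, the extraction by selection keeps the samples whose selection value is 1 and drops the others. The sketch below illustrates this on a toy table in plain Python; `reduce_by_selection` is a hypothetical helper, and the sketch does not reproduce how gstlearn handles locators internally.

```python
def reduce_by_selection(rows, sel):
    """Keep the rows whose matching selection flag equals 1."""
    if len(rows) != len(sel):
        raise ValueError("rows and sel must have the same length")
    return [row for row, flag in zip(rows, sel) if flag == 1]

# Toy table of six samples (x1, x2) with a 0/1 selection vector
rows = [(10.0, 5.0), (20.0, 15.0), (30.0, 25.0),
        (40.0, 35.0), (50.0, 45.0), (60.0, 55.0)]
sel_toy = [0, 1, 0, 0, 1, 1]

active = reduce_by_selection(rows, sel_toy)
print(len(active), "active samples out of", len(rows))
```

The size of the reduced set equals the number of ones in the selection vector, which is why the reduced Data Base above holds 93 samples (500 × 0.186).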