%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
return false;
}
This case study demonstrates how to use gstlearn to migrate information from a Db to a DbGrid and vice versa. To exercise as many cases as possible, we add a selection to both data bases.
import gstlearn as gl
import gstlearn.plot as gp
import matplotlib.pyplot as plt
import numpy as np
Global parameters.
ndim = 2
gl.defineDefaultSpace(gl.ESpaceType.RN,ndim)
Generate initial data set
data = gl.Db.createFillRandom(ndat=20, ndim=ndim, nvar=1)
data.addSelectionByRanks(np.arange(2,18))
data.display()
Data Base Characteristics
=========================
Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 5
Maximum Number of UIDs = 5
Total number of samples = 20
Number of active samples = 16
Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = z - Locator = z1
Column = 4 - Name = NewSel - Locator = sel
gp.plot(data, name_size="z")
Create a grid over the [0,1] x [0,1] square and select the nodes located within the convex hull of the data.
grid = gl.DbGrid.create([50,50],dx=[0.02,0.02])
grid.addSelectionFromDbByConvexHull(data,0.05)
grid.display()
Data Base Grid Characteristics
==============================
Data Base Summary
-----------------
File is organized as a regular grid
Space dimension = 2
Number of Columns = 4
Maximum Number of UIDs = 4
Total number of samples = 2500
Number of active samples = 1738
Grid characteristics:
---------------------
Origin : 0.000 0.000
Mesh : 0.020 0.020
Number : 50 50
Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = Hull - Locator = sel
Migrate the variable z from data to grid. Each sample informs only the cell to which it belongs, so all other cells remain undefined.
err = gl.migrate(data, grid, "z", namconv=gl.NamingConvention("Migrate-NoFill",False))
ax = gp.grid(grid)
ax = gp.plot(data)
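The cell assignment behind this first migration can be sketched without gstlearn. The snippet below is a minimal illustration, not gstlearn's implementation. It assumes that each grid node sits at the center of its cell, and that a later sample overwrites an earlier one falling in the same cell (an arbitrary choice made here). The helpers `cell_of` and `migrate_no_fill` are hypothetical names.

```python
import math

def cell_of(point, origin, mesh, counts):
    """Return the cell indices containing `point`, or None if it lies outside.

    Assumption: grid nodes are cell centers, so cell i spans
    [origin + (i - 0.5) * mesh, origin + (i + 0.5) * mesh).
    """
    indices = []
    for x, x0, dx, n in zip(point, origin, mesh, counts):
        i = math.floor((x - x0) / dx + 0.5)
        if i < 0 or i >= n:
            return None
        indices.append(i)
    return tuple(indices)

def migrate_no_fill(samples, origin, mesh, counts):
    """Map each (x, y, value) sample to its cell; other cells stay absent."""
    grid = {}
    for x, y, value in samples:
        cell = cell_of((x, y), origin, mesh, counts)
        if cell is not None:
            grid[cell] = value  # a later sample overwrites an earlier one
    return grid

# Same geometry as the grid above: 50 x 50 nodes, mesh 0.02, origin (0, 0)
samples = [(0.101, 0.199, 1.5), (0.5, 0.5, -0.3), (2.0, 2.0, 9.9)]
filled = migrate_no_fill(samples, origin=(0.0, 0.0), mesh=(0.02, 0.02), counts=(50, 50))
```

Only the cells that contain a sample appear in `filled`, which is why the resulting map is mostly empty. The third sample falls outside the grid and is ignored.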
In this second attempt, we use the flag_fill option to fill the whole grid. To keep the filling local, we specify a maximum filling distance. Moreover, we make this maximum distance anisotropic in order to check that each direction is handled separately.
err = gl.migrate(data, grid, "z", flag_fill=True, dmax=[0.1,0.2],
namconv=gl.NamingConvention("Migrate-Fill",False))
ax = gp.grid(grid)
ax = gp.plot(data)
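The effect of an anisotropic maximum distance can be illustrated with a small stand-alone sketch. It is written under a stated assumption, not taken from gstlearn: offsets are divided by `dmax` along each axis, so a sample is eligible for a node when it lies inside the ellipse whose semi-axes are `dmax[0]` and `dmax[1]`. The function `fill_nearest` is a hypothetical helper.

```python
import math

def fill_nearest(nodes, samples, dmax):
    """Give each node the value of its nearest eligible sample, else None.

    Assumption: distances are scaled per axis by `dmax`, so eligibility
    corresponds to an ellipse with semi-axes dmax[0] and dmax[1].
    """
    result = []
    for nx, ny in nodes:
        best_value, best_dist = None, math.inf
        for sx, sy, value in samples:
            scaled = math.hypot((sx - nx) / dmax[0], (sy - ny) / dmax[1])
            if scaled <= 1.0 and scaled < best_dist:
                best_value, best_dist = value, scaled
        result.append(best_value)
    return result

samples = [(0.5, 0.5, 7.0)]
# Two nodes at distance 0.15 from the sample (one along each axis), one at 0.25
nodes = [(0.5, 0.65), (0.65, 0.5), (0.5, 0.75)]
values = fill_nearest(nodes, samples, dmax=(0.1, 0.2))
```

With `dmax=(0.1, 0.2)`, the node offset by 0.15 along the second axis is filled, while the node offset by the same amount along the first axis is not.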
A Ball Tree is a data structure that makes neighbor searches much more efficient. The principle is to build a Ball Tree from a first data base. Then, using this tree, it is easy to take a second data base and, for each of its samples, search for the closest neighbors among the samples of the first data base.
To demonstrate this facility, we consider the case where neither of the two data bases benefits from any specific organization (a grid, for example). Moreover (although this is a demonstration file, not a benchmarking one), we consider a dense data set for better legibility.
ndat = 10000
db = gl.Db.createFillRandom(ndat)
ax = gp.point(db)
ax.decoration(title="Location of samples from the First Data Base")
We build the corresponding Ball Tree structure.
ball = gl.Ball(db, 10, 1)
We now consider one target location and ask for the set of corresponding neighboring samples (here the 500 closest samples, as set by neigh_size). The target site is chosen as the center of the square.
neigh_size = 500
center = [0.5, 0.5]
knn1 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
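For reference, the outcome of such a query can be reproduced by an exhaustive search, which computes the distance to every sample. This is the cost a tree-based search is designed to avoid. The sketch below uses only the Python standard library; `brute_force_knn` is a hypothetical helper, and the Euclidean distance is assumed.

```python
import heapq
import math
import random

def brute_force_knn(points, target, k):
    """Return the indices of the k points closest to `target` (Euclidean),
    ordered from nearest to farthest."""
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(points[i], target))

random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
neighbors = brute_force_knn(points, (0.5, 0.5), k=10)
```

Such a baseline is also a convenient way to check the indices returned by a faster search on a small data set.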
The returned argument is a C++ object (not explicitly mapped for Python ... yet). However, some (static) functions are available to retrieve the contents of this object. We use getIndices, which returns the vector of ranks of the samples constituting the neighborhood of the target site.
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn1.getIndices(), "Selection")
As a consequence of the choice of the Euclidean distance, the samples neighboring the target (represented as a black square) are located within a circle centered on the target.
ax = gp.point(db, size=1)
ax = gp.point(target, color="yellow", flagCst=True, size=1)
ax = gp.sample(center, marker='s')
ax.decoration(title="Euclidean distance")
We can produce a similar figure changing the distance, from Euclidean to Manhattan.
ball = gl.Ball(db, 10, 0)
knn2 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn2.getIndices(), "Selection")
As a consequence of the choice of the Manhattan distance, the samples neighboring the target (represented as a black square) are located within a diamond centered on the target.
ax = gp.point(db, size=1)
ax = gp.point(target, color="yellow", flagCst=True, size=1)
ax = gp.sample(center, marker='s')
ax.decoration(title="Manhattan distance")
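The difference between the two neighborhood shapes comes down to how each distance ranks samples. The following minimal sketch, independent of gstlearn, uses two hand-picked points to show that the nearest neighbor can change with the metric. The function and variable names are illustrative only.

```python
def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def manhattan(p, q):
    """Sum of the absolute coordinate differences between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

target = (0.0, 0.0)
diagonal = (0.3, 0.3)  # Euclidean about 0.424, Manhattan 0.6
on_axis = (0.5, 0.0)   # Euclidean 0.5, Manhattan 0.5

candidates = [diagonal, on_axis]
nearest_euclidean = min(candidates, key=lambda p: euclidean(p, target))
nearest_manhattan = min(candidates, key=lambda p: manhattan(p, target))
```

The Manhattan distance penalizes diagonal offsets more than the Euclidean distance does, so `diagonal` is the nearest point in the Euclidean sense while `on_axis` is the nearest in the Manhattan sense. This is what turns the circle into a diamond.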