Migration Facility¶
This case study demonstrates how to use gstlearn to migrate information from a Db to a DbGrid and vice versa. To exercise as many cases as possible, we add a selection to both files.
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc
import matplotlib.pyplot as plt
import numpy as np
gdoc.setNoScroll()
General algorithm¶
Global parameters.
ndim = 2
gl.defineDefaultSpace(gl.ESpaceType.RN, ndim)
Generate initial data set
data = gl.Db.createFillRandom(ndat=20, ndim=ndim, nvar=1)
data.addSelectionByRanks(np.arange(2, 18))
data.display()
Data Base Characteristics
=========================
Data Base Summary
-----------------
File is organized as a set of isolated points
Space dimension = 2
Number of Columns = 5
Total number of samples = 20
Number of active samples = 16
Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x-1 - Locator = x1
Column = 2 - Name = x-2 - Locator = x2
Column = 3 - Name = z - Locator = z1
Column = 4 - Name = NewSel - Locator = sel
res = gp.plot(data, nameSize="z")
Create a grid over the unit square [0,1] x [0,1]
grid = gl.DbGrid.create([50, 50], dx=[0.02, 0.02])
grid.addSelectionFromDbByConvexHull(data, 0.05)
grid.display()
Data Base Grid Characteristics
==============================
Data Base Summary
-----------------
File is organized as a regular grid
Space dimension = 2
Number of Columns = 4
Total number of samples = 2500
Number of active samples = 1738
Grid characteristics:
---------------------
Origin : 0.000 0.000
Mesh : 0.020 0.020
Number : 50 50
Variables
---------
Column = 0 - Name = rank - Locator = NA
Column = 1 - Name = x1 - Locator = x1
Column = 2 - Name = x2 - Locator = x2
Column = 3 - Name = Hull - Locator = sel
Migrate the information from data to grid. The migration is limited to the cell to which each sample belongs, so all other cells remain undefined.
err = gl.migrate(data, grid, "z", namconv=gl.NamingConvention("Migrate-NoFill", False))
gp.raster(grid)
gp.plot(data)
plt.show()
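The cell-assignment step described above can be sketched with NumPy alone. This is an illustration, not the gstlearn implementation: the arrays `xy`, `z` and `grid_z` are hypothetical stand-ins for the notebook's `data` and `grid`, and the sketch assumes that grid nodes are cell centers and that the last sample wins when several fall in the same cell.

```python
import numpy as np

# Hypothetical stand-ins for the notebook objects: 20 random points in the
# unit square and a 50 x 50 grid with mesh 0.02 and origin (0, 0).
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(20, 2))
z = rng.normal(size=20)

nx = np.array([50, 50])
dx = np.array([0.02, 0.02])
x0 = np.array([0.0, 0.0])

# Index of the cell containing each sample.
# Assumption: nodes are cell centers, so cell i spans
# [x0 + (i - 0.5) * dx, x0 + (i + 0.5) * dx).
ij = np.floor((xy - x0) / dx + 0.5).astype(int)
inside = np.all((ij >= 0) & (ij < nx), axis=1)

# The grid variable starts undefined (NaN) and is set only in cells that
# contain a sample. If two samples share a cell, the last one wins here.
grid_z = np.full(nx, np.nan)
grid_z[ij[inside, 0], ij[inside, 1]] = z[inside]

n_filled = int(np.count_nonzero(~np.isnan(grid_z)))
print(f"{n_filled} cells informed out of {grid_z.size}")
```

Most of the grid stays undefined with this rule, which is why the raster above shows only a few isolated colored cells.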
In this second attempt, we use the flag_fill option to fill the whole grid. To avoid extrapolating too far from the data, we also specify a maximum filling distance (dmax). For checking purposes, this maximum distance is made anisotropic.
err = gl.migrate(
data,
grid,
"z",
flag_fill=True,
dmax=[0.1, 0.2],
namconv=gl.NamingConvention("Migrate-Fill", False),
)
gp.raster(grid)
gp.plot(data)
plt.show()
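The effect of an anisotropic maximum distance can be illustrated with a brute-force nearest-sample fill. This is a sketch under stated assumptions, not gstlearn's algorithm: it assumes that offsets are scaled per axis by `dmax`, that the nearest sample is chosen with this scaled distance, and that a node is filled only when the scaled distance is at most 1. The arrays `xy`, `z` and `filled` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 1.0, size=(20, 2))  # hypothetical sample locations
z = rng.normal(size=20)                   # hypothetical sample values

# Nodes of a 50 x 50 grid with mesh 0.02 and origin (0, 0)
gx, gy = np.meshgrid(np.arange(50) * 0.02, np.arange(50) * 0.02, indexing="ij")
nodes = np.column_stack([gx.ravel(), gy.ravel()])

dmax = np.array([0.1, 0.2])

# Offsets from every node to every sample, scaled per axis by dmax.
# Assumed convention: a sample is admissible when the scaled distance <= 1,
# i.e. when it lies inside an ellipse with half-axes dmax around the node.
scaled = (nodes[:, None, :] - xy[None, :, :]) / dmax
d2 = np.sum(scaled**2, axis=2)            # shape (n_nodes, n_samples)

nearest = np.argmin(d2, axis=1)
dmin2 = d2[np.arange(len(nodes)), nearest]

# Nodes with no admissible sample stay undefined (NaN)
filled = np.where(dmin2 <= 1.0, z[nearest], np.nan).reshape(50, 50)
```

With `dmax = [0.1, 0.2]`, the filled patch around an isolated sample is twice as tall as it is wide under this convention, which is the kind of shape to look for in the raster.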
Ball Tree¶
A Ball Tree is a spatial data structure which makes neighbor searches much more efficient. The principle is to build a Ball Tree from a first data set. Then, using this tree, we can take a second data base and efficiently search, for each of its samples, the closest neighbors among the samples of the first data base.
To demonstrate this facility, we consider the case where we cannot benefit from any specific organization (a grid for example) for either of the two data bases. Moreover (although this is a demonstration file, not a benchmarking one), we consider a dense data set for better legibility.
ndat = 10000
db = gl.Db.createFillRandom(ndat)
gp.symbol(db)
gp.decoration(title="Location of samples from the First Data Base")
We build the corresponding BallTree structure
ball = gl.Ball(db, None, 10, True, 1)
We now consider one target location and ask for the set of corresponding neighboring samples (here the neigh_size = 500 closest samples). The target site is selected as the center of the square.
neigh_size = 500
center = [0.5, 0.5]
knn1 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
The returned argument is a C object (not mapped explicitly for Python ... yet). However, some (static) functions are available to retrieve the contents of this object. We use getIndices, which returns the vector of ranks of the samples constituting the neighborhood of the target site.
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn1.getIndices(), "Selection")
As a consequence of the choice of the Euclidean distance, the samples neighboring the target (represented as a black square) lie within a circle centered on the target.
gp.symbol(db, s=1)
gp.symbol(target, c="yellow", flagCst=True, s=1)
gp.sample(center, marker="s")
gp.decoration(title="Euclidean distance")
We can produce a similar figure changing the distance, from Euclidean to Manhattan.
ball = gl.Ball(db, None, 10, True, 2)
knn2 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn2.getIndices(), "Selection")
As a consequence of the choice of the Manhattan distance, the samples neighboring the target (represented as a black square) lie within a diamond centered on the target.
gp.symbol(db, s=1)
gp.symbol(target, c="yellow", flagCst=True, s=1)
gp.sample(center, marker="s")
gp.decoration(title="Manhattan distance")
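The circle and diamond shapes can be checked independently of gstlearn with a brute-force search. The sketch below does not build a Ball Tree: it simply computes all distances and keeps the k smallest, which gives the same neighbors a tree-based search is designed to return faster. The helper `knn` and the array `pts` are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 1.0, size=(10000, 2))  # hypothetical first data base
center = np.array([0.5, 0.5])
k = 500

def knn(points, target, k, p):
    """Indices of the k points closest to target under the L_p norm,
    together with the distances of all points (brute force)."""
    d = np.linalg.norm(points - target, ord=p, axis=1)
    return np.argpartition(d, k - 1)[:k], d

idx_l2, d_l2 = knn(pts, center, k, p=2)  # Euclidean distance
idx_l1, d_l1 = knn(pts, center, k, p=1)  # Manhattan distance

# Radius of each neighborhood = distance to the k-th closest sample.
# Euclidean neighbors satisfy dx^2 + dy^2 <= r2^2 (a disc), while
# Manhattan neighbors satisfy |dx| + |dy| <= r1 (a diamond).
r2 = d_l2[idx_l2].max()
r1 = d_l1[idx_l1].max()
print(f"Euclidean radius: {r2:.3f}, Manhattan radius: {r1:.3f}")
```

For points spread uniformly, both neighborhoods cover roughly the same area, so the Manhattan radius comes out larger than the Euclidean one: a diamond of radius r has area 2r², against πr² for a disc.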
Simple exercise¶
In this exercise (due to Mike Pereira), we double-check the migration between a coarse and a fine regular 2-D grid (each possibly considered as a set of isolated points).
We first create the two grids: they overlap and have coinciding nodes:
- db1 stands for the fine grid
- db2 stands for the coarse grid
We convert them into Dbs containing the nodes considered as isolated points:
- db1b coincides with db1
- db2b coincides with db2
The variable which is systematically migrated is the sample rank (named 'rank'), which makes the test easy to understand.
nxs = 100
db1 = gl.DbGrid.create(nx=[nxs + 1, nxs + 1], dx=[1.0 / nxs, 1.0 / nxs])
Ns = 5
db2 = gl.DbGrid.create(nx=[Ns, Ns], dx=[0.2, 0.2], x0=[0.1, 0.1])
db2b = gl.Db()
db2b.addColumns(db2.getColumn("rank"), "rank")
db2b.addColumns(db2.getColumn("x1"), "x1", gl.ELoc.X, 0)
db2b.addColumns(db2.getColumn("x2"), "x2", gl.ELoc.X, 1)
db1b = gl.Db()
db1b.addColumns(db1.getColumn("rank"), "rank")
db1b.addColumns(db1.getColumn("x1"), "x1", gl.ELoc.X, 0)
db1b.addColumns(db1.getColumn("x2"), "x2", gl.ELoc.X, 1)
Plotting the different sets of information (as points for better legibility)
gp.symbol(db1b, c="red", flagCst=True, s=1)
gp.symbol(db2b, c="black", flagCst=True, s=2)
gp.decoration(title="Two sets overlaid")
In all subsequent tests, we migrate the information from the coarse grid (either considered as a grid or as a set of isolated points) onto the fine grid (either considered as a grid or as a set of isolated points).
For the representation of the results:
- if the fine grid is considered as a grid, we use the raster representation
- if the fine grid is considered as a set of isolated points, we use the color symbol representation
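Before running the four migrations, it can help to know what result to expect. The sketch below computes, with NumPy only, the rank of the closest coarse node for every fine node. It is not part of the original exercise and relies on assumptions: ranks are taken as 1-based with x1 varying fastest, ties between equidistant coarse nodes go to the lower index, and the name `expected_rank` is hypothetical.

```python
import numpy as np

nxs, ns = 100, 5

# Work in units of the fine mesh (1 / nxs) so that all coordinates are
# exact integers and no floating-point tie can occur by accident.
fine = np.arange(nxs + 1)             # fine nodes: 0, 1, ..., 100
coarse = 10 + 20 * np.arange(ns)      # coarse nodes: 10, 30, 50, 70, 90

# Closest coarse index along one axis (ties go to the lower index)
near = np.argmin(np.abs(fine[:, None] - coarse[None, :]), axis=1)

# On a regular grid, the nearest node under the Euclidean or Manhattan
# distance can be found axis by axis, since each axis term is minimized
# independently.
i1, i2 = np.meshgrid(near, near, indexing="ij")

# Assumed rank convention: 1-based, x1 varying fastest
expected_rank = 1 + i1 + ns * i2

print(expected_rank[10, 10], expected_rank[50, 50], expected_rank[90, 90])
```

Under these assumptions the expected picture is a 5 x 5 checkerboard of constant-rank blocks, each centered on a coarse node, which is what the four plots below should resemble wherever the fill reaches.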
Migrate from Grid to Grid¶
err = gl.migrate(
db2,
db1,
"rank",
dist_type=2,
flag_fill=True,
namconv=gl.NamingConvention("GridToGrid"),
)
gp.raster(db1, name="GridToGrid*")
gp.symbol(db2, c="white", flagCst=True, s=5)
gp.decoration(title="Migrate Grid to Grid")
Migrate from Grid to Point¶
err = gl.migrate(
db2,
db1b,
"rank",
dist_type=2,
flag_fill=True,
namconv=gl.NamingConvention("GridToPoint"),
)
gp.symbol(db1b, nameColor="GridToPoint*")
gp.symbol(db2, c="white", flagCst=True, s=5)
gp.decoration(title="Migrate Grid to Point")
Migrate from Point to Grid¶
err = gl.migrate(
db2b,
db1,
"rank",
dist_type=2,
flag_fill=True,
namconv=gl.NamingConvention("PointToGrid"),
)
gp.raster(db1, name="PointToGrid*")
gp.symbol(db2, c="white", flagCst=True, s=5)
gp.decoration(title="Migrate Point to Grid")
Migrate from Point to Point¶
err = gl.migrate(
db2b,
db1b,
"rank",
dist_type=2,
flag_fill=True,
namconv=gl.NamingConvention("PointToPoint"),
)
gp.symbol(db1b, nameColor="PointToPoint*")
gp.symbol(db2, c="white", flagCst=True, s=5)
gp.decoration(title="Migrate Point to Point")