# Migration Facility

This case study demonstrates how to use *gstlearn* to migrate information from a Db to a DbGrid and vice versa. To exercise all the possibilities, we add a selection to both data bases.

```
import gstlearn as gl
import gstlearn.plot as gp
import gstlearn.document as gdoc
import matplotlib.pyplot as plt
import numpy as np
gdoc.setNoScroll()
```

Global parameters.

```
ndim = 2
gl.defineDefaultSpace(gl.ESpaceType.RN,ndim)
```

Generate the initial data set: 20 random samples, of which a selection keeps 16 active.

```
data = gl.Db.createFillRandom(ndat=20, ndim=ndim, nvar=1)
data.addSelectionByRanks(np.arange(2,18))
data.display()
```

Data Base Characteristics

Data Base Summary

- File is organized as a set of isolated points
- Space dimension = 2
- Number of Columns = 5
- Total number of samples = 20
- Number of active samples = 16

Variables

- Column = 0 - Name = rank - Locator = NA
- Column = 1 - Name = x-1 - Locator = x1
- Column = 2 - Name = x-2 - Locator = x2
- Column = 3 - Name = z - Locator = z1
- Column = 4 - Name = NewSel - Locator = sel

```
gp.plot(data, nameSize="z")
```

Create a grid over the square [0,1] x [0,1], and restrict it with a selection based on the convex hull of the data.

```
grid = gl.DbGrid.create([50,50],dx=[0.02,0.02])
grid.addSelectionFromDbByConvexHull(data,0.05)
grid.display()
```

Data Base Grid Characteristics

Data Base Summary

- File is organized as a regular grid
- Space dimension = 2
- Number of Columns = 4
- Total number of samples = 2500
- Number of active samples = 1738

Grid characteristics

- Origin : 0.000 0.000
- Mesh : 0.020 0.020
- Number : 50 50

Variables

- Column = 0 - Name = rank - Locator = NA
- Column = 1 - Name = x1 - Locator = x1
- Column = 2 - Name = x2 - Locator = x2
- Column = 3 - Name = Hull - Locator = sel

Migrate the information from **data** to **grid**. The migration is limited to the cell to which each sample belongs.

```
err = gl.migrate(data, grid, "z", namconv=gl.NamingConvention("Migrate-NoFill",False))
ax = gp.grid(grid)
ax = gp.plot(data)
```
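The cell-limited migration can be pictured with plain NumPy. The sketch below is not gstlearn's implementation: it assumes that cells are centered on the grid nodes, that samples outside the grid are ignored, and that the last sample wins when several fall in the same cell. The helper name `migrate_to_cells` is hypothetical.

```python
import numpy as np

def migrate_to_cells(coords, values, origin, mesh, shape):
    """Assign each sample value to the grid cell that contains it.

    Cells are assumed to be centered on the grid nodes (an assumption of
    this sketch). Cells that receive no sample stay undefined (NaN).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    origin = np.asarray(origin, dtype=float)
    mesh = np.asarray(mesh, dtype=float)
    out = np.full(shape, np.nan)
    # Index of the nearest node along each axis
    idx = np.floor((coords - origin) / mesh + 0.5).astype(int)
    # Keep only the samples that fall inside the grid
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    for (i, j), v in zip(idx[inside], values[inside]):
        out[i, j] = v  # last sample wins when several share a cell
    return out

rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 0.98, size=(20, 2))
z = rng.normal(size=20)
cells = migrate_to_cells(xy, z, origin=[0, 0], mesh=[0.02, 0.02], shape=(50, 50))
print(np.count_nonzero(~np.isnan(cells)), "cells informed out of", cells.size)
```

With 20 samples and 2500 cells, at most 20 cells are informed, which is why the resulting map is almost entirely undefined.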

In this second attempt, we use the **flag_fill** option to fill the whole grid. However, we limit the filling by specifying a maximum filling distance. Moreover, we make this maximum distance anisotropic for checking purposes.

```
err = gl.migrate(data, grid, "z", flag_fill=True, dmax=[0.1,0.2],
                 namconv=gl.NamingConvention("Migrate-Fill",False))
ax = gp.grid(grid)
ax = gp.plot(data)
```
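The effect of an anisotropic maximum distance can be sketched with NumPy. This is an illustration under assumptions, not the gstlearn algorithm: each node takes the value of its nearest sample, "nearest" is measured after dividing the offsets by `dmax` along each axis, and a node stays undefined when that scaled distance exceeds 1 (an elliptical cut-off). The helper name `fill_nearest` is hypothetical.

```python
import numpy as np

def fill_nearest(nodes, coords, values, dmax):
    """Fill each node with its nearest sample value, within an ellipse.

    The ellipse has semi-axes dmax along each coordinate axis (an
    assumption of this sketch). Nodes too far from any sample get NaN.
    """
    nodes = np.asarray(nodes, dtype=float)
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dmax = np.asarray(dmax, dtype=float)
    # Offsets scaled per axis: a scaled distance <= 1 lies inside the ellipse
    diff = (nodes[:, None, :] - coords[None, :, :]) / dmax
    scaled = np.sqrt((diff ** 2).sum(axis=2))
    nearest = scaled.argmin(axis=1)
    out = values[nearest].copy()
    # Discard nodes whose nearest sample is beyond the maximum distance
    out[scaled[np.arange(len(nodes)), nearest] > 1.0] = np.nan
    return out

# Grid nodes matching the 50 x 50 grid with a 0.02 mesh used above
gx, gy = np.meshgrid(np.arange(50) * 0.02, np.arange(50) * 0.02, indexing="ij")
nodes = np.column_stack([gx.ravel(), gy.ravel()])
rng = np.random.default_rng(0)
xy = rng.uniform(size=(20, 2))
z = rng.normal(size=20)
filled = fill_nearest(nodes, xy, z, dmax=[0.1, 0.2])
print(np.count_nonzero(~np.isnan(filled)), "nodes filled out of", filled.size)
```

Because the second semi-axis is twice the first, the filled patches around each sample are elongated along the second coordinate.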

## Ball Tree

A Ball Tree is a search structure that makes neighbor searches much more efficient. The principle is to build the Ball Tree from a first data base. Then, using this tree, it is easy to take a second data base and, for each of its samples, search for the samples of the first data base which are its closest neighbors.
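To make the principle concrete, here is a minimal pure-NumPy ball tree with a k-nearest-neighbor query. It is a sketch under assumptions (median split along the widest axis, Euclidean distance, pruning by the triangle inequality) and says nothing about how gstlearn builds its tree internally. The names `build_ball_tree` and `query_ball_tree` are hypothetical.

```python
import heapq
import numpy as np

def build_ball_tree(points, indices=None, leaf_size=10):
    """Recursively enclose the points in nested balls (center + radius)."""
    if indices is None:
        indices = np.arange(len(points))
    subset = points[indices]
    center = subset.mean(axis=0)
    radius = np.sqrt(((subset - center) ** 2).sum(axis=1)).max()
    node = {"center": center, "radius": radius, "indices": indices, "children": None}
    if len(indices) > leaf_size:
        # Split at the median along the axis of greatest spread
        dim = np.argmax(subset.max(axis=0) - subset.min(axis=0))
        order = np.argsort(subset[:, dim])
        half = len(indices) // 2
        node["children"] = (
            build_ball_tree(points, indices[order[:half]], leaf_size),
            build_ball_tree(points, indices[order[half:]], leaf_size),
        )
    return node

def query_ball_tree(node, points, target, k, heap=None):
    """Return a heap of (-distance, index) for the k nearest points."""
    if heap is None:
        heap = []
    # No point of this ball can be closer than this lower bound
    lower = max(0.0, np.linalg.norm(target - node["center"]) - node["radius"])
    if len(heap) == k and lower >= -heap[0][0]:
        return heap  # the whole ball is pruned
    if node["children"] is None:
        for i in node["indices"]:
            d = np.linalg.norm(target - points[i])
            if len(heap) < k:
                heapq.heappush(heap, (-d, int(i)))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, int(i)))
    else:
        # Visit the closer child first so that pruning kicks in early
        for child in sorted(node["children"],
                            key=lambda c: np.linalg.norm(target - c["center"])):
            query_ball_tree(child, points, target, k, heap)
    return heap

rng = np.random.default_rng(0)
pts = rng.uniform(size=(10000, 2))
tree = build_ball_tree(pts, leaf_size=10)
neighbors = sorted(i for _, i in query_ball_tree(tree, pts, np.array([0.5, 0.5]), 500))
print(len(neighbors), "neighbors found")
```

The tree is built once from the first data set and can then be queried for as many targets as needed, which is where the gain over a brute-force scan comes from.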

To demonstrate this facility, we consider the case where we cannot benefit from any specific organization (a grid, for example) for either of the two data bases. Moreover (although this is a demonstration file, not a benchmarking one), we consider a dense data set for better legibility.

```
ndat = 10000
db = gl.Db.createFillRandom(ndat)
ax = gp.point(db)
ax.decoration(title="Location of samples from the First Data Base")
```

We build the corresponding Ball Tree structure.

```
ball = gl.Ball(db, None, 10, 1)
```

We now consider one target location and ask for the set of corresponding neighboring samples (here the 500 closest samples, as set by `neigh_size`). The target site is selected as the center of the square.

```
neigh_size = 500
center = [0.5, 0.5]
knn1 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
```

The returned argument is a C object (not mapped explicitly for Python ... yet). However, some (static) functions are available to retrieve the contents of this object. We use *getIndices*, which returns the vector of ranks of the samples constituting the neighborhood of the target site.

```
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn1.getIndices(), "Selection")
```

As a consequence of the choice of the *Euclidean distance*, the samples neighboring the target (represented as a black square) are located within a circle centered on the target.

```
ax = gp.point(db, size=1)
ax = gp.point(target, color="yellow", flagCst=True, size=1)
ax = gp.sample(center, marker='s')
ax.decoration(title="Euclidean distance")
```

We can produce a similar figure changing the distance, from Euclidean to Manhattan.

```
ball = gl.Ball(db, None, 10, 2)
knn2 = ball.queryOneAsVD(gl.VectorDouble(center), neigh_size)
target = db.clone()
target.deleteColumn("Selection")
iuid = target.addSelectionByRanks(knn2.getIndices(), "Selection")
```

As a consequence of the choice of the *Manhattan distance*, the samples neighboring the target (represented as a black square) are located within a diamond centered on the target.

```
ax = gp.point(db, size=1)
ax = gp.point(target, color="yellow", flagCst=True, size=1)
ax = gp.sample(center, marker='s')
ax.decoration(title="Manhattan distance")
```
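The circle and diamond shapes follow directly from the definition of each distance, which a brute-force NumPy check can confirm independently of gstlearn. In this sketch the `order` argument is the exponent of the norm (2 for Euclidean, 1 for Manhattan); it is unrelated to the integer codes passed to `gl.Ball` above, and the helper name `k_nearest` is hypothetical.

```python
import numpy as np

def k_nearest(points, target, k, order):
    """Brute-force k nearest points under the norm of the given order.

    Returns the indices of the neighbors and the distance to the
    farthest of them, which is the 'radius' of the neighborhood.
    """
    dist = np.linalg.norm(points - target, ord=order, axis=1)
    idx = np.argsort(dist)[:k]
    return idx, dist[idx].max()

rng = np.random.default_rng(42)
pts = rng.uniform(size=(10000, 2))
center = np.array([0.5, 0.5])

for name, order in (("Euclidean", 2), ("Manhattan", 1)):
    idx, radius = k_nearest(pts, center, 500, order)
    off = np.abs(pts[idx] - center)
    # Euclidean: dx^2 + dy^2 <= r^2 (circle); Manhattan: |dx| + |dy| <= r (diamond)
    print(name, "radius:", round(float(radius), 4),
          "largest offset along one axis:", round(float(off.max()), 4))
```

For the same number of neighbors, the diamond reaches farther along the axes than the circle does, since it has to enclose the same number of samples while excluding the diagonal corners.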