Random Variables¶
This file presents elementary material on the definition of a random variable.
import gstlearn.document as gdoc
from IPython.display import Markdown

# Disable output scrolling and display the course text shipped with gstlearn
gdoc.setNoScroll()
Markdown(gdoc.loadDoc("Random_Variable.md"))
Random variables¶
A random variable $Z$ is a function from a sample space $\Omega$ to an output space $H$.
$$ Z:\Omega\to H\\ \hspace{1.3cm}\omega \mapsto Z(\omega) $$
Random variables are described using probabilities.
A. Discrete variables¶
A.1. Probability mass function¶
To introduce the probability mass function, we start with a few examples.
A.1.1) Head or tail¶
We toss a coin. The result (still unknown) will be equal to:
$$Z(\omega) = \left\{\begin{array}{cc}0 &\textrm{ if head}\\ 1 &\textrm{ if tail}\\ \end{array}\right.$$
Here $H=\{0,1\}$.
Since we don't know $\omega$, we describe $Z$ using probabilities.
In case of binary variables, we just have to specify the probability that $Z=1$, which is denoted $P(Z=1)$.
More formally, $P$ is a measure over $\Omega$, it measures the size of the set $$Z^{-1}(\{1\})=\{\omega\in\Omega,Z(\omega)=1\}.$$
By definition:
$$P(Z=1) = P(Z^{-1}(\{1\})).$$
If $p=P(Z=1)$, then $P(Z=0) = 1-p$.
When $H$ only contains two values (0 or 1), the variable is binary and its distribution is called the Bernoulli distribution with parameter $p=P(Z=1)$.
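As a small illustration (an addition to the text, assuming numpy is available; the seed, $p$ and sample size are arbitrary choices), the sketch below simulates coin tosses and checks that the empirical frequency of 1's is close to $p$.
import numpy as np

# Simulate n coin tosses: Z = 1 (tail) with probability p, 0 (head) otherwise
rng = np.random.default_rng(123)
p, n = 0.5, 10000
Z = rng.binomial(1, p, size=n)          # Bernoulli(p) samples

# The empirical frequency of 1's should be close to p
print("Empirical P(Z=1):", Z.mean())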
A.1.2) Outcome of a die¶
$H=\{1,2,3,4,5,6\}$
The variable is characterized by the definition of $p_i=P(Z=i)$ for $i=1,\dots,6$.
We have $p_1+p_2+p_3+p_4+p_5+p_6=1$.
A.1.3) General case of a discrete variable¶
$Z$ is a random variable with values in a countable space $H$.
$Z$ is characterized by its probability mass function, $p(i)=p_i=P(Z=i)$ for all $i\in H$.
A.1.4) Uniform distribution over a finite output space $H = \{a_1,\dots,a_n\}$¶
For all $i=1,\dots,n$, $$p_i=\frac{1}{n}$$
A.1.5) Binomial distribution of parameters $(n,p)$¶
It models the sum of $n$ independent Bernoulli variables with parameter $p$.
$H = \{0,1,\dots,n\}$
$$p_i = \frac{n!}{i!(n-i)!}p^i(1-p)^{n-i}$$
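As a hedged illustration (not part of the original course; it assumes scipy is available and uses the arbitrary values $n=10$, $p=0.3$), the binomial probabilities $p_i$ can be evaluated and compared with a Monte Carlo estimate.
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
i = np.arange(n + 1)

# Theoretical probabilities p_i = n!/(i!(n-i)!) p^i (1-p)^(n-i)
pmf = binom.pmf(i, n, p)

# They sum to 1; compare one of them with a Monte Carlo estimate
rng = np.random.default_rng(0)
samples = rng.binomial(n, p, size=100000)    # sums of n Bernoulli(p) variables
print(pmf.sum())                             # ~ 1.0
print(pmf[3], np.mean(samples == 3))         # theoretical vs empirical P(Z=3)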
A.2. Cumulative distribution¶
Instead of working with the probability mass function, we can use the cumulative distribution function $F$ defined by $$F(i) = P(Z\leq i)$$
We can deduce $F$ from the probability mass function $p$ (here for an integer-valued variable such as the die): $$F(i) = \sum_{j=1}^i P(Z=j) = \sum_{j=1}^i p(j)$$
And we can also deduce $p$ from $F$: $$p(i) = P(Z=i) = P(Z\leq i) - P(Z\leq i-1) = F(i)-F(i-1)$$
A.3. Property:¶
$$P(a<Z \leq b) = F(b)-F(a)$$
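The relations between $p$ and $F$, and the property above, can be checked numerically. Here is a minimal sketch (an addition, assuming numpy and taking the fair die as example):
import numpy as np

# Fair die: p_i = 1/6 for i = 1,...,6
p = np.full(6, 1 / 6)

# Cumulative distribution F(i) = p_1 + ... + p_i
F = np.cumsum(p)

# Recover the pmf from F: p(i) = F(i) - F(i-1)
p_back = np.diff(np.concatenate(([0.0], F)))
print(p_back)

# Property A.3: P(a < Z <= b) = F(b) - F(a), e.g. a = 2, b = 5
a, b = 2, 5
print(F[b - 1] - F[a - 1])     # = 3/6
print(p[a:b].sum())            # same value obtained directly from the pmf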
A.4. Expectation:¶
The expectation of a random variable with probability mass function $p$ is given by
$$E[Z]=\sum_{i\in H}i \times P(Z=i)=\sum_{i\in H}i \times p_i$$
The expectation can be seen as the theoretical mean.
Example:
If $Z$ is a Bernoulli variable with parameter $p$, $$E[Z]= 0 \times (1-p) + 1 \times p = p$$
Expectation of a function: $$E[q(Z)]=\sum_{i\in H} q(i) \times P(Z=i)$$
A.5. Variance¶
$$\textrm{Var}(Z) = E[(Z-E[Z])^2]$$
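A minimal sketch (added here, assuming numpy and again taking the fair die as example) computing the expectation, the expectation of a function and the variance directly from the probability mass function:
import numpy as np

# Fair die: outcomes and their probabilities
i = np.arange(1, 7)
p = np.full(6, 1 / 6)

EZ = np.sum(i * p)                  # E[Z] = sum_i i * p_i = 3.5
Eq = np.sum(i**2 * p)               # E[q(Z)] with q(z) = z^2
VarZ = np.sum((i - EZ) ** 2 * p)    # Var(Z) = E[(Z - E[Z])^2]

print(EZ, VarZ, Eq - EZ**2)         # the last two coincide: Var(Z) = E[Z^2] - E[Z]^2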
A.6. Random vectors¶
If $Z_1$ is a random variable on a countable space $H_1$ and $Z_2$ is another random variable on a countable space $H_2$, then to fully describe the pair $(Z_1,Z_2)$ we must define the probabilities of all events $\{Z_1=i, Z_2=j\}$ for all $i\in H_1$ and all $j\in H_2$. We will denote $$p_{ij}=P(Z_1=i,Z_2=j).$$
Marginalisation
$$p_{i.} = P(Z_1=i) = \sum_{j\in H_2} p_{ij}$$
$$p_{.j}= P(Z_2=j) = \sum_{i\in H_1} p_{ij}$$
Example:
$Z_1$, the indicator of being a rich man, is a Bernoulli variable.
$Z_2$, the indicator of being a geostatistician, is a Bernoulli variable.
$$\begin{array}{c|c|c||c} Z_1\backslash Z_2 & 0 & 1 & \\ \hline 0 & p_{00} & p_{01} & p_{0.}\\ \hline 1 & p_{10} & p_{11} & p_{1.}\\ \hline & p_{.0} & p_{.1} & 1\\ \end{array}$$
Conditional distribution
$$P(Z_1=i|Z_2=j) = \frac{P(Z_1=i,Z_2=j)}{P(Z_2=j)} = \frac{p_{ij}}{\sum_{k\in H_1} p_{kj}} = \frac{p_{ij}}{p_{.j}}$$
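A small numerical sketch of marginalisation and conditioning (an addition, with a made-up joint table for two binary variables):
import numpy as np

# A made-up joint pmf p_ij = P(Z1 = i, Z2 = j) for two binary variables
p = np.array([[0.50, 0.20],     # row i = 0
              [0.05, 0.25]])    # row i = 1

p_i_dot = p.sum(axis=1)         # marginal of Z1: p_{i.}
p_dot_j = p.sum(axis=0)         # marginal of Z2: p_{.j}

# Conditional distribution of Z1 given Z2 = 1: P(Z1 = i | Z2 = 1) = p_{i1} / p_{.1}
cond = p[:, 1] / p_dot_j[1]

print(p_i_dot, p_dot_j, cond)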
B. Continuous random variable¶
The output space $H$ is continuous, e.g. $\mathbb{R}$ or an interval $[a,b]$.
To characterize the distribution, one can use the cumulative distribution function (c.d.f.) defined as $$F(z)=P(Z\leq z).$$
When $F$ is differentiable, $Z$ has a probability density function (p.d.f.) $f$ defined as $$f(z)=F'(z),$$ which satisfies $$\int_H f(t)dt =1$$
Then, $$F(z) = \int_{-\infty}^z f(t)dt$$
All the variables considered in this course will have a density.
Examples¶
- Gaussian distribution:
The Gaussian distribution with mean $m$ and variance $\sigma^2$ has density
$$f(x)=\frac{1}{\sqrt{2\pi}\sigma}\displaystyle e^{-\frac{(x-m)^2}{2\sigma^2}}$$
(see curves above)
- Uniform variable over an interval $[a,b]$
$$f(x) = \left\{\begin{array}{ccc}\frac{1}{b-a} & \textrm{ if } & a<x\leq b\\ 0 & \textrm{ otherwise} & \end{array}\right.$$
$$F(x) = \left\{\begin{array}{ccc}0 & \textrm{ if } & x\leq a \\ \frac{x-a}{b-a} & \textrm{ if } & a<x\leq b\\ 1 & \textrm{ if } & x> b\end{array}\right.$$
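These two densities and the uniform c.d.f. can be evaluated numerically; the sketch below is an addition that assumes scipy is available and uses arbitrary parameter values.
import numpy as np
from scipy.stats import norm, uniform

x = np.linspace(-4.0, 6.0, 6)    # a few evaluation points

# Gaussian density with mean m = 1 and standard deviation sigma = 2
m, sigma = 1.0, 2.0
print(norm.pdf(x, loc=m, scale=sigma))

# Uniform distribution on [a, b]: scipy parameterizes it with loc = a, scale = b - a
a, b = 0.0, 3.0
print(uniform.pdf(x, loc=a, scale=b - a))    # density f
print(uniform.cdf(x, loc=a, scale=b - a))    # cumulative F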
Expectation¶
The expectation plays the role of the mean for the random variable.
It is an average of the values weighted by the density:
$$E[Z] = \int_H tf(t)dt$$
Expectation of a function:
$$E[q(Z)] = \int_H q(t)f(t)dt$$
Variance¶
$$\textrm{Var}[Z] = E[(Z-E[Z])^2]$$
Note that if a random variable $Z$ is non-negative ($P(Z\geq 0)=1$), then $$E[Z]\geq 0$$
So the variance is always non-negative, as the expectation of a non-negative random variable (the squared deviation $(Z-E[Z])^2$).
More properties on expectation and variance can be found here.
Law of large numbers¶
The expectation of a random variable can be seen as the empirical average over an infinite number of realizations of this variable as stated by the (strong) law of large numbers:
Let $Z$ be a random variable over $H=\mathbb{R}$ with $E[Z]=m$. If $Z_1,\dots,Z_n,\dots$ is an infinite sequence of independent copies of $Z$, then the sequence of sample averages $$\bar{Z}_n = \frac{Z_1+\dots+Z_n}{n}$$ converges to $m$ when $n\to\infty$.
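This convergence can be observed numerically; the sketch below is an addition using numpy, with an arbitrary Gaussian variable of mean $m=2$ and an arbitrary sample size.
import numpy as np

rng = np.random.default_rng(42)

# Independent copies of a Gaussian variable with mean m = 2
m, sigma = 2.0, 1.0
Z = rng.normal(m, sigma, size=100000)

# Running sample averages Z_bar_n = (Z_1 + ... + Z_n) / n
running_mean = np.cumsum(Z) / np.arange(1, Z.size + 1)

# The averages get closer and closer to the expectation m
print(running_mean[[9, 99, 9999, 99999]])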
Let's consider the new (Bernoulli) variable $$1\!\!\!1_{a<Z \leq b}=\left\{\begin{array}{ccc}1 & \textrm{ if } & a<Z\leq b\\ 0 & \textrm{ otherwise} & \end{array}\right.$$
$$E[1\!\!\!1_{a<Z\leq b}] = P(a<Z\leq b)=\int_a^b f(t)dt$$
So, if we subdivide $H$ into small intervals, we expect that the histogram of a large sample of (independent) realizations of $Z$ is close to its density $f$.
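A quick check of this statement (an added sketch assuming numpy and scipy, with a standard Gaussian): the normalised histogram of a large sample is compared with the density at the bin centers.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
Z = rng.normal(0.0, 1.0, size=100000)    # standard Gaussian sample

# Normalised histogram over small intervals (bins)
hist, edges = np.histogram(Z, bins=np.linspace(-4, 4, 41), density=True)

# Compare the histogram heights with the density at the bin centers
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - norm.pdf(centers))))    # small discrepancy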
Bivariate distribution¶
If we have two random variables $X$ and $Y$, we can describe them separately, but we can also be interested in the link between them. We can do that by using a joint distribution. Here we will suppose that the random vector $(X,Y)$ has a density $f(x,y)$.
The density can be interpreted through the probability of an infinitesimal rectangle:
$$P(x\leq X \leq x+dx \textrm{ and } y\leq Y\leq y+dy) =f(x,y)dxdy$$
We have seen that the density of a single variable plays the role of the histogram computed over an infinite number of realizations.
Let's observe a large number of realizations from the previous bivariate distribution.
Let's compute the 2d histogram and compare with the theoretical distribution:
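One possible sketch for this comparison (an addition, assuming numpy and scipy, with an illustrative bivariate Gaussian of correlation 0.7):
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# An illustrative bivariate Gaussian with correlation 0.7
mean = [0.0, 0.0]
cov = [[1.0, 0.7],
       [0.7, 1.0]]
samples = rng.multivariate_normal(mean, cov, size=200000)

# Normalised 2d histogram of the realizations
H, xedges, yedges = np.histogram2d(samples[:, 0], samples[:, 1],
                                   bins=40, range=[[-3, 3], [-3, 3]],
                                   density=True)

# Theoretical density evaluated at the bin centers
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])
X, Y = np.meshgrid(xc, yc, indexing="ij")
f = multivariate_normal(mean, cov).pdf(np.dstack([X, Y]))

print(np.max(np.abs(H - f)))    # the two surfaces are close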
Marginalisation¶
We can retrieve the marginal distribution of each variable from the bivariate density:
$$f_X(x)=\int_{H_2}f(x,y)dy$$
$$f_Y(y)=\int_{H_1}f(x,y)dx$$
Conditional distributions¶
We have two variables $X$ and $Y$ with joint density $f(x,y)$. Suppose we have observed $X=x$ and we would like to know the distribution of $Y$ knowing this information.
It can be computed by
$$f_{Y|X=x}(y)=\frac{f(x,y)}{f_X(x)}$$
It can be interpreted as
$$P(y\leq Y\leq y+dy| x\leq X \leq x+dx) = f_{Y|X=x}(y)dy$$
The conditional expectation $$E[Y|X=x]=\int_{H_2}yf_{Y|X=x}(y)dy$$
is the expectation of $Y$ with the conditional distribution.
It is the best possible prediction of $Y$ knowing $X$, i.e., it is the function of $X$ which minimizes $$\textrm{Var}(Y-q(X))$$ amongst all possible functions $q$.
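This minimisation property can be illustrated numerically. In the hypothetical model below (an added sketch using numpy), $Y = X^2 + \varepsilon$ with $\varepsilon$ independent of $X$, so $E[Y|X=x]=x^2$ by construction; the error variance obtained with the conditional expectation is compared with the one obtained with another function of $X$.
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical model where E[Y | X = x] = x**2 by construction
n = 100000
X = rng.normal(0.0, 1.0, size=n)
Y = X**2 + rng.normal(0.0, 0.5, size=n)

# Error variance using the conditional expectation versus another function of X
var_best = np.var(Y - X**2)        # q(X) = E[Y | X]
var_other = np.var(Y - 2.0 * X)    # an arbitrary competitor q(X) = 2X

print(var_best, var_other)         # the first one is smaller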
To summarize bivariate distributions, one can use the covariance. See here.
Multivariate distributions¶
We can generalize to a set $X_1,\dots,X_p$ of variables by using multivariate densities $$f(x_1,\dots,x_p)$$
In geostatistics, we often use the multivariate Gaussian distribution.