This file gives elementary information on the definition of a random variable.
A random variable $Z$ is a function from a sample space $\Omega$ to an output space $H$:
$$Z:\ \Omega \to H, \qquad \omega \mapsto Z(\omega)$$
Random variables are described using probabilities.
To introduce the probability mass function, we will give some examples.
We toss a coin. The result (still unknown) will be equal to:
$$Z(\omega)=\begin{cases}0 & \text{if head}\\ 1 & \text{if tail}\end{cases}$$
Here H={0,1}.
Since we don’t know ω, we describe Z using probabilities.
In the case of binary variables, we just have to specify the probability that Z=1, which is denoted P(Z=1).
More formally, $P$ is a measure over $\Omega$: it measures the size of the set $Z^{-1}(\{1\})=\{\omega\in\Omega : Z(\omega)=1\}$.
By definition:
$$P(Z=1)=P\big(Z^{-1}(\{1\})\big).$$
If p=P(Z=1), then P(Z=0)=1−p.
When $H$ only contains two values (0 or 1), the variable is binary and its distribution is called Bernoulli with parameter $p=P(Z=1)$.
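As an illustration, here is a minimal Python/NumPy sketch (not part of the course material; the value of `p` and the sample size are arbitrary choices) that simulates coin tosses and estimates $P(Z=1)$ by the empirical frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                     # assumed Bernoulli parameter for the illustration
n = 10_000                  # number of simulated coin tosses

z = rng.random(n) < p       # Z = 1 ("tail") with probability p, else 0 ("head")
print("empirical P(Z=1):", z.mean())       # should be close to p
print("empirical P(Z=0):", 1 - z.mean())   # should be close to 1 - p
```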
Second example: we roll a die, so $H=\{1,2,3,4,5,6\}$.
The variable is characterized by the definition of $p_i=P(Z=i)$ for $i=1,\dots,6$.
We have $p_1+p_2+p_3+p_4+p_5+p_6=1$.
More generally, let $Z$ be a random variable with values in a countable space $H$.
$Z$ is characterized by its probability mass function, $p(i)=p_i=P(Z=i)$ for all $i\in H$.
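A small sketch (assuming a fair six-sided die, as in the example above) estimating the probability mass function $p_i=P(Z=i)$ from simulated rolls.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.arange(1, 7)                          # output space H = {1,...,6}
rolls = rng.integers(1, 7, size=100_000)     # fair die: each value has probability 1/6

# empirical probability mass function p_i = P(Z = i)
p_hat = np.array([(rolls == i).mean() for i in H])
print(dict(zip(H.tolist(), p_hat.round(3))))
print("sum of p_i:", p_hat.sum())            # must equal 1
```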
Discrete uniform distribution: for all $i=1,\dots,n$, $p_i=\frac{1}{n}$.
Binomial distribution: it models the sum of $n$ independent Bernoulli variables with parameter $p$.
$H=\{0,1,\dots,n\}$
$$p_i=\frac{n!}{i!\,(n-i)!}\,p^i(1-p)^{n-i}$$
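A sketch checking the binomial formula above against a direct simulation of the sum of $n$ independent Bernoulli variables (the values of `n` and `p` are arbitrary).

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
n, p = 10, 0.3                               # assumed parameters for the example

# theoretical p_i = C(n, i) p^i (1-p)^(n-i)
p_theo = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)])

# empirical version: sum of n independent Bernoulli(p) variables
z = (rng.random((100_000, n)) < p).sum(axis=1)
p_emp = np.array([(z == i).mean() for i in range(n + 1)])

print("theoretical:", p_theo.round(3))
print("empirical  :", p_emp.round(3))
print("sum of theoretical p_i:", p_theo.sum())   # equals 1
```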
Instead of working with the probability mass function, we can use the cumulative distribution function $F$ defined by $F(i)=P(Z\le i)$.
We can deduce $F$ from the probability mass function $p$:
$$F(i)=\sum_{j=1}^{i}P(Z=j)=\sum_{j=1}^{i}p(j)$$
And we can also deduce $p$ from $F$:
$$p(i)=P(Z=i)=P(Z\le i)-P(Z\le i-1)=F(i)-F(i-1)$$
$$P(a<Z\le b)=F(b)-F(a)$$
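The relations between $p$ and $F$ translate directly into cumulative sums and differences; a sketch using the fair die of the previous example.

```python
import numpy as np

p = np.full(6, 1 / 6)             # p_i for i = 1,...,6 (fair die)
F = np.cumsum(p)                  # F(i) = sum_{j<=i} p(j)
p_back = np.diff(F, prepend=0.0)  # p(i) = F(i) - F(i-1)

print("F:", F.round(3))
print("recovered p:", p_back.round(3))

a, b = 2, 5                       # P(a < Z <= b) = F(b) - F(a)
print("P(2 < Z <= 5):", F[b - 1] - F[a - 1])   # indices shifted: F[i-1] stores F(i)
```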
The expectation of a random variable with probability mass function p is given by
$$E[Z]=\sum_{i\in H} i\times P(Z=i)=\sum_{i\in H} i\times p_i$$
The expectation can be seen as the theoretical mean.
Example:
If $Z$ is a Bernoulli variable with parameter $p$, $E[Z]=0\times(1-p)+1\times p=p$.
Expectation of a function:
$$E[q(Z)]=\sum_{i\in H}q(i)\times P(Z=i)$$
Variance:
$$\mathrm{Var}(Z)=E\big[(Z-E[Z])^2\big]$$
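A sketch computing $E[Z]$, $E[q(Z)]$ and $\mathrm{Var}(Z)$ from a probability mass function, again for the fair die; the function $q$ is an arbitrary choice for the illustration.

```python
import numpy as np

H = np.arange(1, 7)
p = np.full(6, 1 / 6)                 # fair die

E_Z = np.sum(H * p)                   # E[Z] = sum_i i * p_i
q = lambda i: i**2                    # arbitrary function for the example
E_qZ = np.sum(q(H) * p)               # E[q(Z)] = sum_i q(i) * p_i
Var_Z = np.sum((H - E_Z)**2 * p)      # Var(Z) = E[(Z - E[Z])^2]

print("E[Z]   =", E_Z)                # 3.5
print("E[Z^2] =", E_qZ)               # 91/6
print("Var(Z) =", Var_Z)              # 35/12
```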
If $Z_1$ is a random variable on a countable space $H_1$ and $Z_2$ is another random variable on a countable space $H_2$, then to fully describe the pair $(Z_1,Z_2)$ we must define the probabilities of all events $\{Z_1=i, Z_2=j\}$ for all $i\in H_1$ and all $j\in H_2$. We write $p_{ij}=P(Z_1=i,Z_2=j)$.
Marginalisation
$$p_{i\cdot}=P(Z_1=i)=\sum_{j\in H_2}p_{ij}$$
$$p_{\cdot j}=P(Z_2=j)=\sum_{i\in H_1}p_{ij}$$
Example:
$Z_1$, the indicator of being a rich man, is a Bernoulli variable.
$Z_2$, the indicator of being a geostatistician, is a Bernoulli variable.
|          | $Z_2=0$ | $Z_2=1$ | marginal |
|----------|---------|---------|----------|
| $Z_1=0$  | $p_{00}$ | $p_{01}$ | $p_{0\cdot}$ |
| $Z_1=1$  | $p_{10}$ | $p_{11}$ | $p_{1\cdot}$ |
| marginal | $p_{\cdot 0}$ | $p_{\cdot 1}$ | $1$ |
Conditional distribution
$$P(Z_1=i\mid Z_2=j)=\frac{P(Z_1=i,Z_2=j)}{P(Z_2=j)}=\frac{p_{ij}}{\sum_{k\in H_1}p_{kj}}$$
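A sketch with an arbitrary 2×2 joint table $p_{ij}$ (the numbers are made up for the illustration), computing the marginals and a conditional distribution with the formulas above.

```python
import numpy as np

# joint probabilities p_ij = P(Z1 = i, Z2 = j); rows: Z1 in {0,1}, cols: Z2 in {0,1}
# (arbitrary values for the illustration; they must sum to 1)
p = np.array([[0.70, 0.05],
              [0.20, 0.05]])

p_i_dot = p.sum(axis=1)          # marginal of Z1: p_i. = sum_j p_ij
p_dot_j = p.sum(axis=0)          # marginal of Z2: p_.j = sum_i p_ij

# conditional distribution of Z1 given Z2 = 1
cond = p[:, 1] / p_dot_j[1]      # P(Z1 = i | Z2 = 1) = p_i1 / p_.1

print("marginal of Z1:", p_i_dot)
print("marginal of Z2:", p_dot_j)
print("P(Z1 = . | Z2 = 1):", cond)
```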
The output space $H$ is now continuous, e.g. $\mathbb{R}$ or an interval $[a,b]$.
To characterize the distribution, one can use the cumulative distribution function (c.d.f.) defined as $F(z)=P(Z\le z)$.
When $F$ is differentiable, $Z$ has a probability density function (p.d.f.) $f$ defined as $f(z)=F'(z)$, which satisfies $\int_H f(t)\,dt=1$.
Then,
$$F(z)=\int_{-\infty}^{z}f(t)\,dt$$
All the variables considered in this course will have a density.
The Gaussian distribution with mean $m$ and variance $\sigma^2$ has density
$$f(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-m)^2}{2\sigma^2}}$$
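A sketch evaluating the Gaussian density above on a grid and checking numerically that it integrates to 1 (the values of `m` and `sigma` are arbitrary).

```python
import numpy as np

m, sigma = 2.0, 1.5                       # arbitrary mean and standard deviation
x = np.linspace(m - 8 * sigma, m + 8 * sigma, 10_001)
f = np.exp(-(x - m)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

dx = x[1] - x[0]
print("numerical integral of f:", np.sum(f) * dx)           # close to 1
print("maximum of f:", f.max(), "=", 1 / (np.sqrt(2 * np.pi) * sigma))
```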
The uniform distribution on $[a,b]$ has density
$$f(x)=\begin{cases}\frac{1}{b-a} & \text{if } a<x\le b\\ 0 & \text{otherwise}\end{cases}$$
and cumulative distribution function
$$F(x)=\begin{cases}0 & \text{if } x\le a\\ \frac{x-a}{b-a} & \text{if } a<x\le b\\ 1 & \text{if } x>b\end{cases}$$
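A sketch of the uniform c.d.f. on an interval $[a,b]$ chosen arbitrarily, checking that $F(b')-F(a')$ matches the empirical frequency of $a'<Z\le b'$ over a large sample.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 4.0                                      # arbitrary interval for the example

F = lambda x: np.clip((x - a) / (b - a), 0.0, 1.0)   # uniform c.d.f. on [a, b]

z = rng.uniform(a, b, size=100_000)                  # realizations of Z ~ U(a, b)
a2, b2 = 1.5, 3.0
print("F(b') - F(a')      :", F(b2) - F(a2))
print("empirical frequency:", np.mean((z > a2) & (z <= b2)))
```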
The expectation plays the role of the mean for the random variable.
It is an average of the values weighted by the density:
$$E[Z]=\int_H t\,f(t)\,dt$$
Expectation of a function:
$$E[q(Z)]=\int_H q(t)\,f(t)\,dt$$
Variance:
$$\mathrm{Var}[Z]=E\big[(Z-E[Z])^2\big]$$
Note that if a random variable $Z$ is nonnegative ($P(Z\ge 0)=1$), then $E[Z]\ge 0$.
So, the variance is always nonnegative (it is the expectation of a nonnegative random variable).
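A sketch computing $E[Z]$ and $\mathrm{Var}[Z]$ by numerical integration of the density, here for the uniform distribution on an arbitrary interval $[a,b]$ (the closed-form values are $\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$).

```python
import numpy as np

a, b = 1.0, 4.0                         # arbitrary interval
t = np.linspace(a, b, 100_001)
f = np.full_like(t, 1 / (b - a))        # uniform density on [a, b]
dt = t[1] - t[0]

E_Z = np.sum(t * f) * dt                # E[Z] = integral of t f(t) dt
Var_Z = np.sum((t - E_Z)**2 * f) * dt   # Var[Z] = E[(Z - E[Z])^2]

print("E[Z]  :", E_Z,   "(exact:", (a + b) / 2, ")")
print("Var[Z]:", Var_Z, "(exact:", (b - a)**2 / 12, ")")
```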
More properties on expectation and variance can be found here.
The expectation of a random variable can be seen as the empirical average over an infinite number of realizations of this variable as stated by the (strong) law of large numbers:
Let $Z$ be a random variable over $H=\mathbb{R}$ with $E[Z]=m$. If $Z_1,\dots,Z_n,\dots$ is an infinite sequence of independent copies of $Z$, then the sample average $\bar{Z}_n=\frac{Z_1+\dots+Z_n}{n}$ converges (almost surely) to $m$ when $n\to\infty$.
(Figure: convergence of the sample average to the expectation.)
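A sketch illustrating the law of large numbers: the running average of independent realizations approaches the expectation (here a uniform variable on $[0,1]$, so $m=0.5$; the distribution is an arbitrary choice for the example).

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.random(100_000)                 # independent copies of Z ~ U(0, 1), m = 0.5

n = np.arange(1, z.size + 1)
running_mean = np.cumsum(z) / n         # bar{Z}_n = (Z_1 + ... + Z_n) / n

for k in (10, 100, 10_000, 100_000):
    print(f"n = {k:>6}: sample average = {running_mean[k - 1]:.4f}")
```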
Let's consider the new (Bernoulli) variable
$$\mathbb{1}_{a<Z\le b}=\begin{cases}1 & \text{if } a<Z\le b\\ 0 & \text{otherwise}\end{cases}$$
$$E[\mathbb{1}_{a<Z\le b}]=P(a<Z\le b)=\int_a^b f(t)\,dt$$
So, if we subdivide H into small intervals, we expect that the histogram of a large sample of (independent) realizations of Z is close to its density f.
(Figure: histogram of a large sample compared with the density.)
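A sketch comparing, for a Gaussian sample, the empirical frequency of each small interval with the probability of that interval computed from the density (a histogram without the plot); the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 0.0, 1.0
z = rng.normal(m, sigma, size=100_000)          # large sample of Z

edges = np.linspace(-3, 3, 13)                  # subdivision of H into small intervals
counts, _ = np.histogram(z, bins=edges)
freq = counts / z.size                          # empirical P(a < Z <= b) per interval

# theoretical probability of each interval from the density (midpoint approximation)
mid = 0.5 * (edges[:-1] + edges[1:])
width = np.diff(edges)
f_mid = np.exp(-(mid - m)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
print(np.column_stack([freq.round(3), (f_mid * width).round(3)]))
```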
If we have two random variables $X$ and $Y$, we can describe them separately, but we can also be interested in their relationship. We do that by using a joint distribution. Here we will suppose that the random vector $(X,Y)$ has a density $f(x,y)$.
(Figure: a bivariate density $f(x,y)$.)
The density can be interpreted through the probability of small intervals:
$$P(x\le X\le x+dx \text{ and } y\le Y\le y+dy)=f(x,y)\,dx\,dy$$
We have seen that the density of a single variable plays the role of the histogram computed over an infinite number of realizations.
Let’s observe a large number of realizations from the previous bivariate distribution.
(Figure: a sample of realizations from the bivariate distribution.)
Let’s compute the 2d histogram and compare with the theoretical distribution:
(Figure: 2D histogram of the sample compared with the theoretical density.)
We can retrieve the marginal distribution of each variable from the bivariate density:
$$f_X(x)=\int_{H_2} f(x,y)\,dy$$
$$f_Y(y)=\int_{H_1} f(x,y)\,dx$$
(Figure: the marginal densities.)
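A sketch recovering a marginal density numerically from a bivariate density evaluated on a grid, here a standard bivariate Gaussian with an arbitrarily chosen correlation of 0.7 (whose marginals are standard Gaussian).

```python
import numpy as np

rho = 0.7                                           # arbitrary correlation
x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
X, Y = np.meshgrid(x, y, indexing="ij")

# standard bivariate Gaussian density with correlation rho
f = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2))) \
    / (2 * np.pi * np.sqrt(1 - rho**2))

dy = y[1] - y[0]
fX = f.sum(axis=1) * dy                             # f_X(x) = integral of f(x, y) dy

# compare with the standard Gaussian density
fX_exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print("max error on f_X:", np.max(np.abs(fX - fX_exact)))
```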
We have two variables X and Y with joint density f(x,y). Suppose we have observed X=x and we would like to know the distribution of Y knowing this information.
It can be computed by
$$f_{Y|X=x}(y)=\frac{f(x,y)}{f_X(x)}$$
It can be interpreted as
$$P(y\le Y\le y+dy \mid x\le X\le x+dx)=f_{Y|X=x}(y)\,dy$$
(Figure: the conditional density of $Y$ given $X=x$.)
The conditional expectation
$$E[Y\mid X=x]=\int_{H_2} y\,f_{Y|X=x}(y)\,dy$$
is the expectation of $Y$ under the conditional distribution.
It is the best possible prediction of $Y$ knowing $X$, i.e., it is the function $q(X)$ of $X$ which minimizes $\mathrm{Var}(Y-q(X))$ amongst all possible functions $q$.
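A sketch estimating $E[Y\mid X=x]$ from simulated data by averaging the $Y$ values whose $X$ falls close to $x$; for the standard bivariate Gaussian with correlation $\rho$ assumed here, the theoretical conditional expectation is $\rho x$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                                    # arbitrary correlation for the example
cov = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
X, Y = xy[:, 0], xy[:, 1]

x0, dx = 1.0, 0.05
sel = np.abs(X - x0) < dx                    # realizations with X close to x0
print("empirical  E[Y | X = x0]:", Y[sel].mean())
print("theoretical (rho * x0)  :", rho * x0)
```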
To summarize bivariate distributions, one can use the covariance. See here.
We can generalize to a set $X_1,\dots,X_p$ of variables by using multivariate densities $f(x_1,\dots,x_p)$.
In geostatistics, we often use the multivariate Gaussian distribution.