This file gives elementary background on expectation, variance, and covariance of random variables.
If \(X\) is a random variable with output space \(H\) and density \(f(x)\), then
\[E[X]=\int_H x f(x)\, dx\]
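As a minimal numerical sketch (assuming Python with NumPy and SciPy; the exponential density below is just an illustrative choice), we can approximate this integral for \(f(x)=e^{-x}\) on \(H=[0,\infty)\), whose expectation is \(1\):

```python
import numpy as np
from scipy import integrate

# Illustrative choice of density: f(x) = exp(-x) on H = [0, infinity)
def f(x):
    return np.exp(-x)

# E[X] = integral over H of x * f(x) dx; the exact value is 1 for this density
expectation, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)
print(expectation)  # ~1.0
```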
If \(X\) and \(Y\) are two random variables with respective output spaces \(H\) and \(K\), marginal densities \(f\) and \(g\), joint density \(f(x,y)\), and finite expectations, then
\[E[X+Y]=E[X]+E[Y]\]
Indeed, let us consider the function \(q\) defined by \(q(x,y)=x+y\):
\(E[q(X,Y)] = \int_{H}\int_{K} q(x,y) f(x,y)\,dy\, dx\)
\(\phantom{E[q(X,Y)]} = \int_{H}\int_{K} (x+y) f(x,y)\,dy\, dx\)
\(\phantom{E[q(X,Y)]} = \int_{H}\int_{K} x f(x,y)\,dy\, dx + \int_{H}\int_{K} y f(x,y)\,dy\, dx\)
\(\phantom{E[q(X,Y)]} = \int_{H} x\left(\int_{K} f(x,y)\,dy\right) dx + \int_{K} y\left(\int_{H} f(x,y)\,dx\right) dy\)
\(\phantom{E[q(X,Y)]} = \int_{H} x f(x)\, dx + \int_{K} y\, g(y)\,dy\)
\(\phantom{E[q(X,Y)]} = E[X]+E[Y]\)
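As a quick check, here is a small Monte Carlo sketch (assuming NumPy; the distributions are arbitrary choices, and \(X\) and \(Y\) are deliberately made dependent):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Arbitrary, dependent pair: X uniform on [0, 2], Y built from X plus noise
x = rng.uniform(0, 2, size=n)
y = 3 * x + rng.normal(size=n)

# E[X + Y] matches E[X] + E[Y] even though X and Y are dependent
print((x + y).mean(), x.mean() + y.mean())  # both ~4.0
```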
We also have
\[E[aX]=aE[X]\]
If \(X\) is a nonnegative random variable, i.e. \[P(X\geq 0)=1\] then
\[E[X]\geq 0\]
Moreover, if \(X\) is a nonnegative random variable and \(E[X]=0\), then
\[P(X=0)=1\]
For a constant \(a\),
\[E[a] = a\]
\[\textrm{Cov}(X,Y) = E[(X-E[X])(Y-E[Y])]\]
\[\textrm{Cov}(X,Y)=\textrm{Cov}(Y,X)\]
The following formula is sometimes more convenient:
\[\textrm{Cov}(X,Y) = E[XY] - E[X]E[Y]\]
In particular
\[\textrm{Var}(X) = \textrm{Cov}(X,X) = E[X^2] - E[X]^2\]
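A small simulation sketch (assuming NumPy; the joint distribution is an arbitrary choice) checking both shortcut formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)   # correlated with x by construction

# Cov(X, Y) from the definition and from E[XY] - E[X]E[Y]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()
print(cov_def, cov_shortcut)                 # both ~2

# Var(X) = E[X^2] - E[X]^2
print(np.mean(x**2) - x.mean()**2, x.var())  # both ~1
```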
\[\textrm{Cov}(aX+bY,Z) = a\textrm{Cov}(X,Z)+b\textrm{Cov}(Y,Z)\]
\[\textrm{Cov}(X,X) = E[(X-E[X])(X-E[X])]=\textrm{Var}(X)\]
Note that the variance is always nonnegative (it is the expectation of the square of the random variable \(X-E[X]\)).
\[\textrm{Cov}(X,a) = 0\]
Consequence:
\[\textrm{Var}(a)=0\]
Conversely, if a random variable has variance equal to \(0\), then it is (almost surely) constant.
\[\textrm{Var}\left(\sum_{i=1}^n\lambda_i Z_i\right) = \sum_{i=1}^n\sum_{j=1}^n \lambda_i\lambda_j\textrm{Cov}(Z_i,Z_j)\]
\[\textrm{Var}(aX) = a^2 \textrm{Var}(X)\]
\[\textrm{Var}(aX+bY)=a^2\textrm{Cov}(X,X)+2ab\textrm{Cov}(X,Y)+b^2\textrm{Cov}(Y,Y)\]
\[\textrm{Var}(aX-bY)=a^2\textrm{Cov}(X,X)-2ab\textrm{Cov}(X,Y)+b^2\textrm{Cov}(Y,Y)\]
\[\textrm{Var}(X+a) = \textrm{Var}(X)\]
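The same kind of sketch (assuming NumPy; samples and coefficients are arbitrary choices) verifies the formula for \(\textrm{Var}(aX+bY)\) above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)
a, b = 3.0, -2.0

cov_xy = np.cov(x, y, ddof=0)[0, 1]
lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + 2 * a * b * cov_xy + b**2 * np.var(y)
print(lhs, rhs)  # the two values agree (up to floating point)
```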
Consider now a set of random variables \(Z_1,\dots,Z_n\). For each pair \((k,l)\), denote
\[c_{kl} = \textrm{Cov}(Z_k,Z_l)\]
We can store the \(c_{kl}\)’s in a matrix \[\Sigma = \left[\begin{array}{ccc}c_{11} &\dots & c_{1n}\\ c_{21} & \dots & c_{2n}\\ \vdots & \ddots & \vdots\\ c_{n1} & \dots & c_{nn}\end{array}\right]\]
\(\Sigma\) is named the covariance matrix of the random vector \[Z=\left[\begin{array}{c}Z_1\\ \vdots\\ Z_n\end{array}\right]\]
Note that we can rewrite
\[\textrm{Var}\left(\sum_{i=1}^n\lambda_iZ_i\right) = \lambda^T \Sigma \lambda\]
where \[\lambda =\left[\begin{array}{c}\lambda_1\\ \vdots\\\lambda_n\end{array}\right]\]
and \(^T\) denotes transposition:
\[\lambda^T =\left[\begin{array}{ccc}\lambda_1& \dots & \lambda_n\end{array}\right]\]
Since a variance is always nonnegative, the variance of any linear combination has to be nonnegative. Therefore, a covariance matrix is always positive semi-definite, i.e. for every \(\lambda\),
\[\lambda^T \Sigma \lambda\geq 0\]
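A sketch (assuming NumPy; the covariance matrix used to simulate \(Z\) is an arbitrary choice) illustrating both \(\textrm{Var}(\lambda^T Z)=\lambda^T\Sigma\lambda\) and the positive semi-definiteness of \(\Sigma\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Random vector Z with 3 correlated components, one sample per row
z = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2, 1, 0], [1, 3, 1], [0, 1, 1]],
                            size=n)
sigma = np.cov(z, rowvar=False, ddof=0)   # empirical covariance matrix

lam = np.array([1.0, -2.0, 0.5])
print(lam @ sigma @ lam)                  # lambda^T Sigma lambda
print(np.var(z @ lam))                    # Var(lambda^T Z): same value, >= 0

# All eigenvalues of a covariance matrix are >= 0
print(np.linalg.eigvalsh(sigma))
```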
Let us now consider two random vectors \(X=(X_1,\dots,X_n)\) and \(Y=(Y_1,\dots,Y_p)\).
We can form the cross-covariance matrix \(\textrm{Cov}(X,Y)\), whose element in row \(i\) and column \(j\) is \(\textrm{Cov}(X_i,Y_j)\).
If \(A\) and \(B\) are constant matrices (of compatible sizes), then
\[\textrm{Cov}(AX,BY) = A\textrm{Cov}(X,Y)B^T\]
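A numerical sketch of this rule (assuming NumPy; the matrices, dimensions and distributions below are arbitrary choices, and `cross_cov` is a small helper written for this example):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

def cross_cov(u, v):
    """Empirical cross-covariance matrix: entry (i, j) estimates Cov(U_i, V_j)."""
    return (u - u.mean(axis=0)).T @ (v - v.mean(axis=0)) / len(u)

# X has 3 components, Y has 2; they are dependent through shared noise
shared = rng.normal(size=(n, 2))
x = np.column_stack([shared[:, 0], shared.sum(axis=1), rng.normal(size=n)])
y = shared + rng.normal(size=(n, 2))

A = np.array([[1.0, 0.0, 2.0], [0.0, -1.0, 1.0]])   # 2 x 3
B = np.array([[3.0, 1.0], [0.0, 2.0]])              # 2 x 2

print(cross_cov(x @ A.T, y @ B.T))   # Cov(AX, BY), estimated directly
print(A @ cross_cov(x, y) @ B.T)     # A Cov(X, Y) B^T: same matrix
```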
Suppose that we want to estimate a quantity modeled by a random variable \(Z_0\) as a linear combination of known quantities \(Z_1,\dots, Z_n\) stored in a vector \[Z=\left[\begin{array}{c}Z_1\\ \vdots\\ Z_n\end{array}\right]\]
We denote this (random) estimator by \[Z_0^\star = \sum_{i=1}^n \lambda_i Z_i = \lambda^T Z\]
We know the covariance matrix of the full vector \((Z_0,Z_1,\dots,Z_n)\) that we write with blocks for convenience:
\[\left[\begin{array}{cc}\sigma_0^2 & c_0^T \\ c_0 & C\end{array}\right]\]
where \(\sigma_0^2 = \textrm{Var}(Z_0)\), \(c_0\) is the vector with entries \(\textrm{Cov}(Z_i,Z_0)\) for \(i=1,\dots,n\), and \(C\) is the covariance matrix of \(Z\).
Let us compute the variance of the error \[Z_0^\star-Z_0\]
\(\textrm{Var}(Z_0^\star-Z_0) = \textrm{Cov}(Z_0^\star-Z_0,Z_0^\star-Z_0)\)
\(\phantom{\textrm{Var}(Z_0^\star-Z_0)} = \textrm{Var}(Z_0^\star) -2 \textrm{Cov}(Z_0^\star,Z_0) + \textrm{Var}(Z_0)\)
\(\phantom{\textrm{Var}(Z_0^\star-Z_0)} = \textrm{Var}(\lambda^TZ) -2 \textrm{Cov}(\lambda^T Z,Z_0) + \sigma_0^2\)
\(\phantom{\textrm{Var}(Z_0^\star-Z_0)} = \lambda^T\textrm{Var}(Z)\lambda -2 \lambda^T\textrm{Cov}( Z,Z_0) + \sigma_0^2\)
\(\phantom{\textrm{Var}(Z_0^\star-Z_0)} = \lambda^TC\lambda -2 \lambda^Tc_0 + \sigma_0^2\)
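Below is a sketch (assuming NumPy) that evaluates this error variance for given weights \(\lambda\); the covariance blocks \(\sigma_0^2\), \(c_0\), \(C\) and the weights are made-up illustrative values, not taken from any real model:

```python
import numpy as np

# Illustrative covariance blocks for (Z_0, Z_1, Z_2): made-up values
sigma0_sq = 2.0                      # Var(Z_0)
c0 = np.array([1.2, 0.8])            # Cov(Z_i, Z_0), i = 1, 2
C = np.array([[2.0, 0.5],
              [0.5, 1.5]])           # covariance matrix of Z = (Z_1, Z_2)

lam = np.array([0.4, 0.3])           # weights of the estimator Z_0* = lam^T Z

# Var(Z_0* - Z_0) = lam^T C lam - 2 lam^T c_0 + sigma_0^2
err_var = lam @ C @ lam - 2 * lam @ c0 + sigma0_sq
print(err_var)  # ~1.135, necessarily >= 0 for a valid covariance matrix
```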
The covariance is a measure of the link between two variables. However, it depends on the scale of each variable. To obtain a similar measure that is invariant under rescaling, we can use the correlation coefficient:
\[\rho(X,Y)=\frac{\textrm{Cov}(X,Y)}{\sqrt{\textrm{Var}(X)\textrm{Var}(Y)}}\]
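A sketch (assuming NumPy; the samples are an arbitrary choice) showing that rescaling by positive constants changes the covariance but not the correlation:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.normal(size=n)
y = x + rng.normal(size=n)

def corr(u, v):
    # rho(U, V) = Cov(U, V) / sqrt(Var(U) Var(V))
    return np.cov(u, v, ddof=0)[0, 1] / np.sqrt(np.var(u) * np.var(v))

print(np.cov(x, y, ddof=0)[0, 1], np.cov(10 * x, 100 * y, ddof=0)[0, 1])  # covariance changes
print(corr(x, y), corr(10 * x, 100 * y))                                  # correlation is unchanged
```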
When the correlation coefficient is equal to \(1\) or \(-1\), we have
\[Y=aX+b\]
with \(a>0\) when \(\rho(X,Y)=1\) and \(a<0\) when \(\rho(X,Y)=-1\).
Note that \(\rho(X,Y)\) can be equal to \(0\) even if the variables are strongly linked.
The usual example is a variable \(X\) with an even density (\(f(-x)=f(x)\)) and \(Y=X^2\):
\[\textrm{Cov}(X,Y)=\textrm{Cov}(X,X^2)=E[X^3]-E[X]E[X^2]=E[X^3]=\int_{\mathbb{R}} x^3f(x)dx =0\]
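We can see this numerically (a sketch assuming NumPy, taking \(X\) standard normal so that its density is even):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1_000_000)    # even density: f(-x) = f(x)
y = x**2                          # Y is a deterministic function of X

# Covariance (hence correlation) is ~0 despite the strong dependence
print(np.cov(x, y, ddof=0)[0, 1])
print(np.corrcoef(x, y)[0, 1])
```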