# Chapter 5 Properties of the expectation

We will use the law of iterated expectations and the law of conditional variances to compute the expectation and variance of the sum of a random number of independent random variables, and the expectation and variance of a mixture. Before that, we recall the formulas for the expectation and variance of a linear combination of random variables.

The second part of the session is devoted to the moments of a random variable.

## 5.1 Expectation and variance of a linear combination of random variables

We recall from Session 4 that, given $$d$$ random variables $$X_1,X_2,\ldots,X_d$$ and real numbers $$a_1,a_2,\ldots,a_d$$, we have

\begin{align*} {\mathbb E}\left[\sum_{i=1}^d a_iX_i\right]&=\sum_{i=1}^d a_i{\mathbb E}[X_i]\\ {\rm Var}\left[\sum_{i=1}^d a_iX_i\right]&=\sum_{i=1}^d\sum_{j=1}^d a_ia_j{\rm Cov}[X_i,X_j]\\ &=\sum_{i=1}^d a_i^2{\rm Var}[X_i]+2\sum_{i<j}a_i a_j{\rm Cov}[X_i,X_j] \end{align*}
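The variance formula above can be checked numerically. Below is a minimal Monte Carlo sketch in Python (the coefficients and covariance matrix are arbitrary illustration values, not taken from the text):

```python
import numpy as np

# Check Var[a1*X1 + a2*X2] = a1^2 Var[X1] + a2^2 Var[X2] + 2 a1 a2 Cov[X1,X2]
rng = np.random.default_rng(0)
a1, a2 = 2.0, -3.0                      # arbitrary coefficients
cov = np.array([[1.0, 0.5],             # Var[X1]=1, Var[X2]=2, Cov[X1,X2]=0.5
                [0.5, 2.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)

theory = a1**2 * cov[0, 0] + a2**2 * cov[1, 1] + 2 * a1 * a2 * cov[0, 1]
empirical = np.var(a1 * x[:, 0] + a2 * x[:, 1])
print(theory, empirical)                 # theory is 16.0; empirical is close to it
```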

## 5.2 Conditional expectation

For any $$X,Y$$ random variables in the same probability space, the conditional expectation of $$X$$ given that $$Y$$ assumes value $$y$$, written as $${\mathbb E}[X|Y=y]$$, is a number which is computed as

• $$X$$ discrete $${\mathbb E}[X|Y=y]=\sum_{x}xp_{X|Y}(x|y)$$;
• $$X$$ continuous $${\mathbb E}[X|Y=y]=\int_{-\infty}^{\infty}xf_{X|Y}(x|y)dx$$.

In contrast, $${\mathbb E}[X|Y]$$ (with $$Y$$ not fixed to a value) is a random variable: it is a function of the random variable $$Y$$.

Law of iterated expectations

${\mathbb E}[X]={\mathbb E}\left[{\mathbb E}[X|Y]\right]$

• Discrete random variables ${\mathbb E}\left[{\mathbb E}[X|Y]\right]=\sum_y\sum_{x}xp_{X|Y}(x|y)p_Y(y)=\sum_x\sum_{y}xp_{X,Y}(x,y)=\sum_x xp_X(x)={\mathbb E}[X].$
• Continuous random variables ${\mathbb E}\left[{\mathbb E}[X|Y]\right]=\int_{-\infty}^\infty\int_{-\infty}^\infty xf_{X|Y}(x|y)f_Y(y)dxdy=\int_{-\infty}^\infty\int_{-\infty}^\infty xf_{X,Y}(x,y)dydx=\int_{-\infty}^\infty xf_X(x)dx={\mathbb E}[X].$
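The discrete identity can be verified on a small example. The following sketch uses an arbitrary joint pmf (not from the text) and checks that $${\mathbb E}[{\mathbb E}[X|Y]]={\mathbb E}[X]$$:

```python
import numpy as np

# Arbitrary joint pmf p(x, y) on x in {0,1,2}, y in {0,1}; entries sum to 1
p = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
xs = np.array([0.0, 1.0, 2.0])

p_y = p.sum(axis=0)                                  # marginal pmf of Y
e_x_given_y = (xs[:, None] * p).sum(axis=0) / p_y    # E[X|Y=y] for each y
lhs = (e_x_given_y * p_y).sum()                      # E[E[X|Y]]
rhs = (xs * p.sum(axis=1)).sum()                     # E[X] from the marginal of X
print(lhs, rhs)                                       # both equal 1.0 here
```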

Mixture distribution

• If $$X\sim F(x)=\sum_{i\in I}p_iF_i(x)$$ and $$X_i\sim F_i(x)$$, that is, $$X|Y=i\sim F_i(x)$$ for some (discrete) r.v. $$Y$$ with $$p_Y(i)=p_i$$, then $${\mathbb E}[X]=\sum_{i\in I} p_i{\mathbb E}[X|Y=i]=\sum_{i\in I}p_i{\mathbb E}[X_i]$$.

Example. If $$X_i\sim{\rm N}(\mu_i,\sigma_i)$$, then $${\mathbb E}[X]=\sum_{i\in I}p_i{\mathbb E}[X_i]=\sum_{i\in I}p_i\mu_i$$.

• If $$X\sim F(x)=\int_A \omega(a)F_a(x)da$$ and $$X_a\sim F_a(x)$$, that is, $$X|Y=a\sim F_a(x)$$ for some (continuous) r.v. $$Y$$ with density $$\omega$$, then $${\mathbb E}[X]=\int_A \omega(a){\mathbb E}[X_a]da$$.

Example. If $$X\sim{\rm N}(Y,\sigma)$$ with $$Y\sim{\rm U}(0,1)$$, then $${\mathbb E}[X]=\int_0^1{\mathbb E}[X|Y=y]dy=\int_0^1ydy=1/2$$.
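This example is easy to confirm by simulation. A minimal sketch (the value of $$\sigma$$ is arbitrary; the mixture mean does not depend on it):

```python
import numpy as np

# X | Y=y ~ N(y, sigma) with Y ~ U(0,1): the mixture mean should be 1/2
rng = np.random.default_rng(1)
sigma = 2.0                                  # arbitrary choice
y = rng.uniform(0.0, 1.0, size=1_000_000)
x = rng.normal(loc=y, scale=sigma)
print(x.mean())                              # close to 0.5
```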

Sum of a random number of independent random variables

Consider independent random variables $$X_1,X_2,\dots$$, each with the distribution of $$X$$, and let $$N$$ be a random natural number independent of $$X_1,X_2,\ldots$$; then

${\mathbb E}\left[\sum_{i=1}^N X_i\right]={\mathbb E}\left[{\mathbb E}\left[\sum_{i=1}^N X_i\biggr\rvert N\right]\right]={\mathbb E}\left[N{\mathbb E}[X]\right]={\mathbb E}[N]{\mathbb E}[X].$

Example. We play a game $$10$$ times. Each time we play, the probability that we win is $$0.5$$; the monetary prize at each game we win is $${\rm N}(6,1)$$-distributed, and we earn $$0$$ when we lose. The number of victories is $$N\sim{\rm B}(n=10,p=1/2)$$ and the prize at our $$i$$-th win is $$X_i\sim{\rm N}(6,1)$$. Our final earnings will be $$Y=\sum_{i=1}^N X_i$$ with mean ${\mathbb E}[Y]={\mathbb E}[N]{\mathbb E}[X]=5\times 6=30.$
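A quick Monte Carlo sketch of this game (vectorised with a mask over at most $$10$$ prizes per replication):

```python
import numpy as np

# Earnings Y = sum_{i=1}^N X_i with N ~ Binomial(10, 0.5) and X_i ~ N(6, 1)
rng = np.random.default_rng(2)
reps = 200_000
n = rng.binomial(10, 0.5, size=reps)
x = rng.normal(6.0, 1.0, size=(reps, 10))    # 10 potential prizes per replication
mask = np.arange(10) < n[:, None]            # keep only the first n[k] prizes
y = (x * mask).sum(axis=1)
print(y.mean())                              # close to E[N]E[X] = 5 * 6 = 30
```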

## 5.3 Conditional variance

For any $$X,Y$$ random variables in the same probability space, the conditional variance of $$X$$ given that $$Y$$ assumes value $$y$$, written as $${\rm Var}[X|Y=y]={\mathbb E}\left[(X-{\mathbb E}[X|Y=y])^2\,|\,Y=y\right]$$, is a number. We can think of it as $$g(y)$$.

As a function of random variable $$Y$$, the expression $${\rm Var}[X|Y]$$, which could be written as $$g(Y)$$, is a random variable that depends on $$Y$$.

Law of conditional variances

${\rm Var}[X]={\mathbb E}[{\rm Var}[X|Y]]+{\rm Var}[{\mathbb E}[X|Y]]$

\begin{align*} {\rm Var}[X]&={\mathbb E}[(X-{\mathbb E}[X|Y]+{\mathbb E}[X|Y]-{\mathbb E}[X])^2]\\ &={\mathbb E}[(X-{\mathbb E}[X|Y])^2]+{\mathbb E}[({\mathbb E}[X|Y]-{\mathbb E}[X])^2]+0\\ &={\mathbb E}\left[{\mathbb E}[(X-{\mathbb E}[X|Y])^2|Y]\right]+{\rm Var}\left[{\mathbb E}[X|Y]\right]\\ &={\mathbb E}\left[{\rm Var}[X|Y]\right]+{\rm Var}\left[{\mathbb E}[X|Y]\right], \end{align*}

where the cross term vanishes because, conditioning on $$Y$$ first, $${\mathbb E}[(X-{\mathbb E}[X|Y])({\mathbb E}[X|Y]-{\mathbb E}[X])]={\mathbb E}\left[({\mathbb E}[X|Y]-{\mathbb E}[X])\,{\mathbb E}[X-{\mathbb E}[X|Y]\,|\,Y]\right]=0$$.

Mixture distribution (discrete)

Assume $$X\sim F(x)=\sum_{i\in I}p_iF_i(x)$$ and $$X_i\sim F_i(x)$$. This means that for some (discrete) r.v. $$Y$$ it holds $$X|Y=i\sim F_i(x)$$ and then ${\rm Var}[X]=\sum_{i\in I} p_i{\rm Var}[X_i]+\sum_{i\in I}p_i({\mathbb E}[X_i]-{\mathbb E}[X])^2=\sum_{i\in I}p_i{\mathbb E}[X_i^2]-{\mathbb E}[X]^2.$

Example. If $$X_i\sim{\rm N}(\mu_i,\sigma_i)$$, then $${\rm Var}[X]=\sum_{i\in I}p_i(\mu_i^2+\sigma_i^2)-\mu^2$$, where $$\mu=\sum_{i\in I}p_i\mu_i$$.
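A sketch of this formula for a two-component normal mixture (the weights, means, and standard deviations are arbitrary illustration values), with a Monte Carlo check:

```python
import numpy as np

# Two-component normal mixture: arbitrary weights p, means mu, sds sd
p = np.array([0.3, 0.7])
mu = np.array([0.0, 4.0])
sd = np.array([1.0, 2.0])

mean = (p * mu).sum()                         # E[X] = sum_i p_i mu_i
var = (p * (mu**2 + sd**2)).sum() - mean**2   # sum_i p_i E[X_i^2] - E[X]^2
print(mean, var)                              # 2.8 and 6.46

# Monte Carlo check: draw the component, then draw X from it
rng = np.random.default_rng(3)
comp = rng.choice(2, size=1_000_000, p=p)
x = rng.normal(mu[comp], sd[comp])
print(x.mean(), x.var())
```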

Mixture distribution (continuous)

Assume $$X\sim F(x)=\int_A \omega(a)F_a(x)da$$ and $$X_a\sim F_a(x)$$. This means that for some (continuous) r.v. $$Y$$, it holds $$X|Y=a\sim F_a(x)$$, then ${\rm Var}[X]=\int_A \omega(a){\rm Var}[X_a]da+\int_A \omega(a)\left({\mathbb E}[X_a]-{\mathbb E}[X]\right)^2da=\int_A \omega(a){\mathbb E}[X_a^2]da-{\mathbb E}[X]^2.$

Example. If $$X\sim{\rm N}(Y,\sigma)$$ with $$Y\sim{\rm U}(0,1)$$, then $${\rm Var}[X]=\int_0^1(\sigma^2+y^2)dy-(1/2)^2=\sigma^2+1/12$$.
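This variance can also be confirmed by simulation, a minimal sketch with an arbitrary $$\sigma$$:

```python
import numpy as np

# X | Y=y ~ N(y, sigma), Y ~ U(0,1): Var[X] should be sigma^2 + 1/12
rng = np.random.default_rng(4)
sigma = 0.5                                  # arbitrary choice
y = rng.uniform(0.0, 1.0, size=1_000_000)
x = rng.normal(loc=y, scale=sigma)
print(x.var(), sigma**2 + 1/12)              # both close to 1/3
```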

Sum of a random number of independent random variables

Consider independent random variables $$X_1,X_2,\dots$$, each with the distribution of $$X$$, and let $$N$$ be a random natural number independent of $$X_1,X_2,\ldots$$; then

\begin{align*}{\rm Var}\left[\sum_{i=1}^N X_i\right]&={\mathbb E}\left[{\rm Var}\left[\sum_{i=1}^N X_i\biggr\rvert N\right]\right]+{\rm Var}\left[{\mathbb E}\left[\sum_{i=1}^N X_i\biggr\rvert N\right]\right]\\ &={\mathbb E}[N{\rm Var}[X]]+{\rm Var}[N{\mathbb E}[X]]\\ &={\mathbb E}[N]{\rm Var}[X]+{\mathbb E}[X]^2{\rm Var}[N]. \end{align*}

Example. We play a game $$10$$ times, as in the previous example. Our final earnings will be $$Y=\sum_{i=1}^N X_i$$ with variance ${\rm Var}[Y]={\mathbb E}[N]{\rm Var}[X]+{\mathbb E}[X]^2{\rm Var}[N]=5\times 1+6^2\times 10\times 0.5\times 0.5=95.$
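A Monte Carlo sketch of the variance of these earnings, using the same masking trick over at most $$10$$ prizes per replication:

```python
import numpy as np

# Var[Y] for Y = sum_{i=1}^N X_i, N ~ Binomial(10, 0.5), X_i ~ N(6, 1)
rng = np.random.default_rng(5)
reps = 400_000
n = rng.binomial(10, 0.5, size=reps)
x = rng.normal(6.0, 1.0, size=(reps, 10))    # 10 potential prizes per replication
y = (x * (np.arange(10) < n[:, None])).sum(axis=1)
print(y.var())                               # close to 5*1 + 36*2.5 = 95
```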

## 5.4 Moments of a random variable

If $$X$$ is a random variable, and $$k$$ a positive integer such that $${\mathbb E}|X|^k<\infty$$, then

• the $$k$$-th moment of $$X$$ is $$\mu_k={\mathbb E}[X^k]$$;
• the $$k$$-th centred moment of $$X$$ is $$m_k={\mathbb E}[(X-\mu_1)^k]$$.

First moment (mean) location

The first moment of an integrable random variable is its mean (location parameter)

$\mu_1=\mu={\mathbb E}[X],$

while the first centred moment is $$0$$,

$m_1={\mathbb E}[X-\mu]=0.$

Second moment (variance) scatter

The second moment of a random variable with $${\mathbb E}|X|^2<\infty$$ is

$\mu_2={\mathbb E}[X^2],$

while the second centred moment is its variance (scatter parameter),

$m_2={\mathbb E}[(X-\mu)^2]={\rm Var}[X]=\sigma^2.$

Third moment (skewness) symmetry

The third centred moment of a random variable can be used to obtain information about the asymmetry of its distribution $m_3={\mathbb E}[(X-\mu)^3].$

The coefficient of skewness is defined as

${\rm Skew}_X=\frac{m_3}{\sigma_X^3}=\frac{{\mathbb E}[(X-\mu)^3]}{{\mathbb E}[(X-\mu)^2]^{3/2}}.$

• The skewness of a symmetric distribution is $$0$$.
plot(dnorm,xlim=c(-3,3))

library(moments)
set.seed(1)
skewness(rnorm(1000))
## [1] -0.0191671
• The skewness of a right-skewed distribution (right tail is longer than the left tail) is positive.
dchi=function(x) dchisq(x,df=3)
plot(dchi,xlim=c(0,10))

set.seed(1); skewness(rchisq(1000,df=3))
## [1] 1.496328
• The skewness of a left-skewed distribution (left tail is longer than the right tail) is negative.
dchineg=function(x) dchisq(-x,df=3)
plot(dchineg,xlim=c(-10,0))

set.seed(1); skewness(-rchisq(1000,df=3))
## [1] -1.496328

Fourth moment (kurtosis) tails

The fourth centred moment of a random variable can be used to obtain information about how heavy the tails of its distribution are: $m_4={\mathbb E}[(X-\mu)^4].$ The kurtosis is defined as

${\rm Kurt}_X=\frac{m_4}{\sigma_X^4}=\frac{{\mathbb E}[(X-\mu)^4]}{{\mathbb E}[(X-\mu)^2]^2}$

set.seed(1)
kurtosis(rnorm(1000))
## [1] 2.998225

Excess kurtosis

The normal distribution is often taken as a gold standard, and the kurtosis of a random variable is compared with that of a normal random variable by means of the excess kurtosis

${\rm EKurt}_X=\frac{m_4}{\sigma_X^4}-3.$

• A mesokurtic distribution has tails which are as heavy as those of a normal distribution and its excess kurtosis is zero.
set.seed(1); x.binom=rbinom(10000,size=40,prob=0.5)
hist(x.binom,probability=T)

kurtosis(x.binom)-3
## [1] -0.1317299
• A leptokurtic distribution has heavier tails than a normal distribution and its excess kurtosis is positive.
set.seed(1); x.lap=sample(c(-1,1),1000,replace=T)*rexp(1000)
hist(x.lap,probability=T)
kurtosis(x.lap)-3
## [1] 2.371918

• A platykurtic distribution has thinner tails than a normal distribution and its excess kurtosis is negative.

set.seed(1); x.unif=runif(1000)
hist(x.unif,probability=T)
kurtosis(x.unif)-3
## [1] -1.184201

## 5.5 The moment generating function

The moment generating function of a random variable $$X$$ evaluated at $$t\in{\mathbb R}$$ is given by $M_X(t)={\mathbb E}[e^{tX}]\,.$

The moment generating function completely determines the distribution of the random variable $$X$$ (inversion property).

Moment generating function of some random variables

• If $$Y=aX+b$$, then $$M_Y(t)=e^{tb}M_X(at)$$
• If $$X$$ and $$Y$$ are independent $$M_{X+Y}(t)=M_X(t)M_Y(t)$$
• If $$X\sim pF_{X_1}+(1-p)F_{X_2}$$, then $$M_X(t)=pM_{X_1}(t)+(1-p)M_{X_2}(t)$$
• If $$X\sim{\rm B}(1,p)$$, then $$M_X(t)=1-p+pe^t$$
• If $$X\sim{\rm B}(n,p)$$, then $$M_X(t)=(1-p+pe^t)^n$$
• If $$X\sim{\mathcal P}(\lambda)$$, then $$M_X(t)=e^{\lambda(e^{t}-1)}$$
• If $$X\sim{\rm Exp}(\lambda)$$, then $$M_X(t)=\frac{\lambda}{\lambda-t}$$ for $$t<\lambda$$
• If $$X\sim{\rm N}(\mu,\sigma)$$, then $$M_X(t)=e^{\frac{\sigma^2 t^2}{2}+\mu t}$$

Moment generating function and moments

The $$k$$-th derivative of the moment generating function evaluated at $$0$$ equals the $$k$$-th moment of the random variable:

• $$M'_X(t)={\mathbb E}[Xe^{tX}]$$ and $$M'_X(0)={\mathbb E}[X]$$
• $$M''_X(t)={\mathbb E}[X^2e^{tX}]$$ and $$M''_X(0)={\mathbb E}[X^2]$$
• $$M^{(k)}_X(t)={\mathbb E}[X^ke^{tX}]$$ and $$M^{(k)}_X(0)={\mathbb E}[X^k]$$
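These derivative formulas can be verified symbolically. A sketch using SymPy on the normal MGF listed above, recovering $${\mathbb E}[X]=\mu$$ and $${\mathbb E}[X^2]=\mu^2+\sigma^2$$:

```python
import sympy as sp

# Recover moments of N(mu, sigma) by differentiating its MGF at t = 0
t, mu, sigma = sp.symbols('t mu sigma', positive=True)
M = sp.exp(sigma**2 * t**2 / 2 + mu * t)     # MGF of N(mu, sigma)

m1 = sp.diff(M, t).subs(t, 0)                # M'_X(0)  = E[X]
m2 = sp.diff(M, t, 2).subs(t, 0)             # M''_X(0) = E[X^2]
print(sp.simplify(m1))                       # mu
print(sp.expand(m2))                         # mu**2 + sigma**2
```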