Chapter 6 Limit Theorems

We will explore the limiting behaviour of sequences of random variables. We are specifically interested in the convergence of estimators (e.g. the sample mean) to a given number and in their approximate distributions when they are built from a large sample.

Before turning to convergence concepts, we comment on two useful probabilistic inequalities (Markov’s and Chebyshev’s) that relate probabilities to means and variances.

6.1 Markov and Chebyshev inequalities

Markov’s inequality

If \(X\) is a nonnegative random variable, then for any value \(a>0\), \[P(X\geq a)\leq\frac{{\mathbb E}[X]}{a}\,.\]

Given \(a>0\), define r.v. \(Y=\left\{\begin{array}{cl}1 &\textrm{if }X\geq a\\0&\textrm{otherwise}\end{array}\right.\).

Since \(X\) is nonnegative, \(X/a\geq 1=Y\) whenever \(X\geq a\) and \(X/a\geq 0=Y\) otherwise, so \(Y\leq X/a\), and then \(P(X\geq a)={\mathbb E}[Y]\leq{\mathbb E}[X]/a\).
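As a quick numerical illustration (an added sketch, not part of the original notes; the \({\rm Exp}(1)\) population and the threshold \(a=3\) are arbitrary choices), the bound can be checked by simulation:

# Empirical check of Markov's inequality for a nonnegative r.v.
# (illustrative sketch; Exp(1) population and threshold a = 3 are arbitrary choices)
set.seed(1)
x <- rexp(10^5, rate = 1)   # E[X] = 1
a <- 3
mean(x >= a)                # empirical P(X >= a), close to exp(-3) = 0.0498
mean(x) / a                 # Markov bound E[X]/a, approximately 1/3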

Example (Factory)

The number of items produced in a factory during a week is a random variable with mean 50.

What can you say about the probability that this week’s production will be at least 75?

Denote by \(X\) the production in a week, \[P(X\geq 75)\leq\frac{{\mathbb E}[X]}{75}=\frac{50}{75}=\frac{2}{3}.\]

  • What is the probability if \(X\sim{\rm U}(50-x,50+x)\)? Compute in terms of \(x\).
  • What is the mean of \(X\) if \(P(X=0)=1/3\) and \(P(X=75)=2/3\)?

Example (Pau Gasol)

Pau Gasol averaged \(10.065\) points per game during the NBA regular season 2017-18. What can we say about the proportion of games in which he scored at least \(20\) points?

Denote by \(Y\) the points Pau Gasol scores in a game,\[P(Y\geq 20)\leq\frac{{\mathbb E}[Y]}{20}=\frac{10.065}{20}=0.50325.\]

He actually scored \(20\) or more points in \(4\) games out of the \(77\) games he played during the regular season (\(4/77=0.052\)).

Is the bound very poor? Imagine Pau Gasol had scored \(20\) points in \(38\) games, \(15\) points in one game and \(0\) points in the remaining \(38\) games: the average would still be \(10.065\), yet he would have scored \(20\) or more points in \(49.35\%\) of the games he played.

Chebyshev’s inequality

If \(X\) is a random variable with finite mean \(\mu\) and variance \(\sigma^2\), then for any value \(k>0\), \[P(|X-\mu|\geq k)\leq\frac{\sigma^2}{k^2}\,.\]

The key step is to apply Markov’s inequality to the nonnegative random variable \((X-\mu)^2\) in order to obtain \[P(|X-\mu|\geq k)=P\left((X-\mu)^2\geq k^2\right)\leq\frac{{\mathbb E}[(X-\mu)^2]}{k^2}=\frac{\sigma^2}{k^2}\,.\]

Example (Factory)

The number of items produced in a factory during a week is a random variable with mean 50 and variance \(25\).

What can you say about the probability that this week’s production will be between 40 and 60?

Denote by \(X\) the production in a week, \[\begin{multline*}P(40\leq X\leq 60)=P(|X-50|\leq 10)=1-P(|X-50|>10)\\\geq 1-P(|X-50|\geq 10)\geq 1-\frac{\sigma^2_X}{10^2}=0.75.\end{multline*}\]
  • What is the probability if \(X\sim{\rm U}(50-5\sqrt{3},50+5\sqrt{3})\)?
  • What are the mean and variance of \(X\) if \(P(X=50)=3/4\) and \(P(X=40)=P(X=60)=1/8\)?

Example (Pau Gasol)

Pau Gasol averaged \(10.065\) points per game during the NBA regular season 2017-18 with variance \(31.6\).

What can we say about the proportion of games at which he scored between 3 and 18 points?

Denote by \(Y\) the points Pau Gasol scores in a game, \[\begin{multline*}P(3\leq Y\leq 18)=P(|Y-10.065|\leq 8)=1-P(|Y-10.065|>8)\\\geq 1-P(|Y-10.065|\geq 8)\geq 1-\frac{\sigma^2_Y}{8^2}=0.50625.\end{multline*}\]

He actually scored between \(3\) and \(18\) points in \(63\) games out of the \(77\) games he played (\(63/77=0.818\)).

Alternative expressions of Chebyshev’s inequality

\[\begin{align*} P(|X-\mu|\geq k)&\leq\frac{\sigma^2}{k^2}\\ P(|X-\mu|< k)&\geq 1-\frac{\sigma^2}{k^2} \end{align*}\]
\(k\)          \(\sigma^2/k^2\)   \(1-\sigma^2/k^2\)
\(\sigma\)     \(1\)              \(0\)
\(2\sigma\)    \(1/4\)            \(3/4\)
\(3\sigma\)    \(1/9\)            \(8/9\)
\(4\sigma\)    \(1/16\)           \(15/16\)
\(5\sigma\)    \(1/25\)           \(24/25\)
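The bounds above can be compared with exact tail probabilities when the distribution is known. The sketch below is an added illustration (not part of the original notes); the \({\rm N}(0,1)\) and \({\rm Exp}(1)\) populations, both with \(\sigma=1\), are arbitrary choices.

# Chebyshev bound sigma^2/k^2 versus exact P(|X - mu| >= k*sigma) for k = 1,...,5
# (illustrative; N(0,1) and Exp(1) populations, both with sigma = 1, are arbitrary choices)
ks <- 1:5
chebyshev <- 1 / ks^2
normal_exact <- 2 * pnorm(-ks)                            # N(0,1): mu = 0, sigma = 1
exp_exact <- pexp(1 + ks, rate = 1, lower.tail = FALSE)   # Exp(1): mu = sigma = 1, so
                                                          # |X - 1| >= k means X >= 1 + k
round(cbind(k = ks, chebyshev, normal_exact, exp_exact), 4)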

6.2 Weak LLN (convergence in probability)

A sequence of random variables \(\{X_n\}_n\) converges in probability to a constant \(a\in{\mathbb R}\) (\(X_n\xrightarrow{Pr} a\)) if for any \(\varepsilon>0\), \[\lim\limits_{n\rightarrow\infty} P(|X_n-a|\geq\varepsilon)=0\,.\]

Weak Law of Large Numbers

If \(\{X_n\}_n\) is a sequence of independent and identically distributed random variables with \({\mathbb E}[X_i]=\mu\), then for any \(\varepsilon>0\), \[\lim\limits_{n\rightarrow\infty} P(|\overline{X}_n-\mu|\geq\varepsilon)=0\,,\] where \(\overline{X}_n=\frac{1}{n}\sum_{i=1}^n X_i\).

WLLN for a r.v. with finite second moment

If \(\{X_n\}_n\) is a sequence of independent and identically distributed random variables with \({\mathbb E}[X_i]=\mu\) and \({\rm Var}[X_i]=\sigma^2\), then \[\begin{align*} {\mathbb E}[\overline{X}_n]&=\mu;\\ {\rm Var}[\overline{X}_n]&=\sigma^2/n. \end{align*}\] Apply now Chebyshev’s inequality to \(\overline{X}_n\) in order to obtain \[P(|\overline{X}_n-\mu|\geq\varepsilon)\leq\frac{{\rm Var}[\overline{X}_n]}{\varepsilon^2}=\frac{\sigma^2}{n\varepsilon^2}\xrightarrow[n\rightarrow\infty]{}0\,.\]
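A small Monte Carlo sketch (added here for illustration; the \({\rm Exp}(2)\) population, so \(\mu=0.5\), and \(\varepsilon=0.05\) are arbitrary choices) of how \(P(|\overline{X}_n-\mu|\geq\varepsilon)\) shrinks as \(n\) grows:

# Monte Carlo estimate of P(|Xbar_n - mu| >= eps) for increasing n
# (illustrative sketch; Exp(rate = 2) population, so mu = 0.5, and eps = 0.05 are arbitrary)
set.seed(1)
eps <- 0.05
for (n in c(10, 100, 1000, 10000)) {
  xbar <- replicate(2000, mean(rexp(n, rate = 2)))
  cat("n =", n, " estimated P(|Xbar - 0.5| >= 0.05):", mean(abs(xbar - 0.5) >= eps), "\n")
}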

Continuous mapping Theorem

If \(X_n\xrightarrow{Pr} a\), \(Y_n\xrightarrow{Pr} b\) and \(g:\mathbb{R}^2\mapsto{\mathbb R}\) is a continuous function, then \(g(X_n,Y_n)\xrightarrow{Pr} g(a,b)\).

Consider now \(X_n\xrightarrow{Pr} a\) and \(Y_n\xrightarrow{Pr} b\):

  • \(X_n+Y_n\xrightarrow{Pr} a+b\);
  • \(X_nY_n\xrightarrow{Pr} ab\);
  • \(X_n/Y_n\xrightarrow{Pr} a/b\) if \(b\neq 0\).

Convergence in probability to a random variable

\(\{X_n\}_n\) converges in probability to r.v. \(X\) (\(X_n\xrightarrow{Pr} X\)) if for any \(\varepsilon>0\), \[\lim\limits_{n\rightarrow\infty} P(|X_n-X|\geq\varepsilon)=0\,.\]

Consistency of estimators

Consider a random sample \(X_1,\ldots,X_n\) drawn from some population \(X\sim F_\theta\) whose distribution depends on a parameter \(\theta\). A statistic \(\hat{\theta}\) that is used to estimate (approximate) \(\theta\) is called an estimator of \(\theta\).

An estimator is (weakly) consistent if \(\hat{\theta}\xrightarrow{Pr} \theta\).

  • The sample mean is a consistent estimator of the population mean, \(\overline{X}_n\xrightarrow{Pr}\mu\).
  • The sample variance is a consistent estimator of the population variance, \(S^2_n\xrightarrow{Pr}\sigma^2\).
  • The sample proportion is a consistent estimator of the population proportion, \(\hat{p}\xrightarrow{Pr}p\).
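A sketch (added for illustration; the \({\rm U}(0,1)\) population, with \(\mu=0.5\), \(\sigma^2=1/12\) and \(p=P(X>0.75)=0.25\), is an arbitrary choice) showing the three estimators stabilising as \(n\) grows:

# Consistency sketch: sample mean, variance and proportion for increasing n
# (illustrative; U(0,1) population, so mu = 0.5, sigma^2 = 1/12 = 0.0833, p = P(X > 0.75) = 0.25)
set.seed(1)
for (n in c(10^2, 10^4, 10^6)) {
  x <- runif(n)
  cat("n =", n, " mean:", mean(x), " var:", var(x), " prop:", mean(x > 0.75), "\n")
}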

6.3 Central Limit Theorem (convergence in distribution)

A sequence of random variables \(\{X_n\}_n\) with cdfs \(F_n\) converges in distribution (or law) to r.v. \(X\) with cdf \(F\) (\(X_n\xrightarrow{d} X\)) if for every continuity point \(x\) of \(F\), \(\lim\limits_{n\rightarrow\infty} F_n(x)=F(x)\).

Central Limit Theorem (Lyapunov)

If \(\{X_n\}_n\) is a sequence of iid r.v.s with mean \(\mu\) and variance \(\sigma^2<\infty\), then \[\frac{\sum_{i=1}^n X_i-n\mu}{\sigma\sqrt{n}}\xrightarrow{d}Z\,,\] where \(Z\sim{\rm N}(0,1)\). Under some extra conditions, the identical distribution assumption on the \(X_i\)’s can be dropped, and if \({\mathbb E}[X_i]=\mu_i\) and \({\rm Var}[X_i]=\sigma_i^2\), \[\frac{\sum_{i=1}^n X_i-\sum_{i=1}^n\mu_i}{\sqrt{\sum_{i=1}^n\sigma_i^2}}\xrightarrow{d}Z\,.\]

Sketch of the proof of the CLT

If \(\mu=0\), \(\sigma=1\), and \(M\) is the MGF of \(X_i\), then \(M_{\sum_{i=1}^nX_i/\sqrt{n}}(t)=M(t/\sqrt{n})^n\).

Denote \(L(t)=\log M(t)\) and observe \(L(0)=0\), \(L'(0)=0\), and \(L''(0)=1\).

\[\begin{align*} \lim_{n\rightarrow\infty}\frac{L(t/\sqrt{n})}{n^{-1}}&=\lim_{n\rightarrow\infty}\frac{-L'(t/\sqrt{n})n^{-3/2}t}{-2n^{-2}}\\ &=\lim_{n\rightarrow\infty}\frac{L'(t/\sqrt{n})t}{2n^{-1/2}}\\ &=\lim_{n\rightarrow\infty}\frac{-L''(t/\sqrt{n})n^{-3/2}t^2}{-2n^{-3/2}}\\ &=\lim_{n\rightarrow\infty}\frac{L''(t/\sqrt{n})t^2}{2}=\frac{t^2}{2}.\\ \end{align*}\]

Here L’Hôpital’s rule has been applied twice. We conclude that \(\lim\limits_{n\rightarrow\infty}M(t/\sqrt{n})^n=e^{t^2/2}\), the MGF of the standard normal distribution.
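A numerical check of this limit (an added sketch; the \({\rm U}(-\sqrt{3},\sqrt{3})\) population, which has \(\mu=0\) and \(\sigma=1\), is an arbitrary choice):

# Numerical check of M(t/sqrt(n))^n -> exp(t^2/2)
# (illustrative; X_i ~ U(-sqrt(3), sqrt(3)), so mu = 0 and sigma = 1, is an arbitrary choice)
M <- function(t) sinh(sqrt(3) * t) / (sqrt(3) * t)    # MGF of U(-sqrt(3), sqrt(3)), t != 0
t <- 1.5
for (n in c(10, 100, 1000)) cat("n =", n, " M(t/sqrt(n))^n =", M(t / sqrt(n))^n, "\n")
exp(t^2 / 2)                                          # limit value, exp(1.125) = 3.0802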

Example (Factory)

The number of items produced in a factory during a week is a random variable with mean 50 and variance \(25\).

The factory is open 49 weeks every year. What can you say about the probability that this year’s production will be between 2380 and 2520?

Denote by \(X_i\) the production in the \(i\)-th week and by \(X=\sum_{i=1}^{49}X_i\) the total production.

Chebyshev’s inequality \[{\mathbb E}[X]=49\times 50=2450\,,\quad{\rm Var}[X]=49\times 25=1225=35^2\]

\[\begin{multline*} P(2380\leq X\leq 2520)=P(|X-2450|\leq 70)=1-P(|X-2450|>70)\\ \geq 1-P(|X-2450|\geq 70)\geq 1-\frac{\sigma^2_X}{70^2}=1-\frac{49\times 25}{4900}=0.75. \end{multline*}\]

Central Limit Theorem \[X\approx{\rm N}(2450,35)\]

\[P(2380\leq X\leq 2520)=P\left(\frac{2380-2450}{35}\leq Z\leq\frac{2520-2450}{35}\right)=P(-2\leq Z\leq 2)=0.9545.\]
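The normal probability above can be reproduced in R (a small check added here):

# P(2380 <= X <= 2520) under the CLT approximation X approx. N(2450, sd = 35)
diff(pnorm(c(2380, 2520), mean = 2450, sd = 35))
## [1] 0.9544997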

Normal approximation to the Binomial and Poisson distributions

  • If \(X\sim{\rm B}(n,p)\), then \(X\approx{\rm N}(np,\sqrt{np(1-p)})\) (good approximation if \(n\geq 50\) and \(0.4<p<0.6\) or \(np>5\) and \(n(1-p)>5\)).
  • If \(X\sim{\mathcal P}(\lambda)\), then \(X\approx{\rm N}(\lambda,\sqrt{\lambda})\) (good approximation if \(\lambda\geq 10\)).
  • Continuity corrections for the previous discrete distribution models. If \(k\) is an integer
    • \(P(X=k)=P(k-0.5<X<k+0.5)\)
    • \(P(X\leq k)=P(X<k+0.5)\)
    • \(P(X< k)=P(X<k-0.5)\)
    • \(P(X\geq k)=P(X>k-0.5)\)
    • \(P(X> k)=P(X>k+0.5)\)
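As an added illustration of the continuity correction (the \({\rm B}(100,0.5)\) model and \(k=55\) are arbitrary choices, not taken from the notes):

# Normal approximation to the Binomial with continuity correction
# (illustrative; X ~ B(n = 100, p = 0.5) and k = 55 are arbitrary choices)
n <- 100; p <- 0.5; k <- 55
pbinom(k, size = n, prob = p)                              # exact P(X <= k)
pnorm(k + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))   # approximation P(X < k + 0.5)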

Example (Potholes)

Past experience suggests that there are, on average, 2 potholes per mile of highway after a certain amount of usage, and that the random variable ‘number of potholes’ can be modeled by means of a Poisson distribution.

A group of workers is hired to repair 100 potholes. How many miles must be inspected so that with probability 0.95 at least 100 potholes are found?

Denote \(X\equiv\)’number of potholes in \(k\) miles’, \(X\sim{\mathcal P}(\lambda=2k)\), so \(X\approx{\rm N}(\mu=2k,\sigma=\sqrt{2k})\). We have the equation \(P(X\geq 100)=0.95\). \[P(X\geq 100)=P(X>99.5)=P(Z>(99.5-2k)/\sqrt{2k})=0.95\] In conclusion \((99.5-2k)/\sqrt{2k}=-1.645\), so \(k=58.65875\) miles must be inspected.

1-ppois(99,lambda=2*58.65875)   # exact Poisson check of P(X >= 100) with 2k = 117.3175
## [1] 0.9529045
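The value of \(k\) can also be obtained numerically from the normal approximation; the sketch below (added here) solves the same equation with uniroot:

# Solve P(X >= 100) = 0.95 under the approximation X approx. N(2k, sqrt(2k)),
# with continuity correction: P(Z > (99.5 - 2k)/sqrt(2k)) = 0.95
f <- function(k) pnorm((99.5 - 2 * k) / sqrt(2 * k), lower.tail = FALSE) - 0.95
uniroot(f, interval = c(50, 100))$root   # approximately 58.66 miles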

Asymptotic distribution of several estimators

Consider \(X_1,X_2,\ldots\) iid as r.v. \(X\) with mean \(\mu\) and variance \(\sigma^2<\infty\).

  • The sample mean is asymptotically normal (1/2) \[\frac{\overline{X}_n-\mu}{\sigma/\sqrt{n}}\xrightarrow{d} Z.\]

Consider a population in which a proportion \(p\) of individuals have some given characteristic, and a random sample for which \(\hat{p}\) stands for the sample proportion of individuals with the characteristic.

  • The sample proportion is asymptotically normal,\[\frac{\hat{p}-p}{\sqrt{p(1-p)/n}}\xrightarrow{d} Z.\]

In both statements, \(Z\sim{\rm N}(0,1)\).

Slutsky’s Theorem

If \(X_n\xrightarrow{d} X\) and \(Y_n\xrightarrow{Pr} a\), then

  • \(X_n+Y_n\xrightarrow{d} X+a\);
  • \(X_nY_n\xrightarrow{d} aX\);
  • \(X_n/Y_n\xrightarrow{d} X/a\) if \(a\neq 0\).

  • The sample mean is asymptotically normal (2/2) \[\sqrt{n}(\overline{X}_n-\mu)/S_n\xrightarrow{d} Z.\]
  • The sample variance is asymptotically normal \[\sqrt{n}(S^2_n-\sigma^2)/\sqrt{m_4-\sigma^4}\xrightarrow{d} Z\,,\] where \(m_4={\mathbb E}[(X-\mu)^4]\) is the fourth central moment.

Sample mean of an exponential population with unknown variance

# Simulate 1000 standardized sample means from an Exp(rate = 2) population,
# studentized with the sample standard deviation (Slutsky), and compare with N(0,1)
set.seed(1)
n <- 100; lambda <- 2; x <- vector(length = 1000)
for (i in 1:1000) {
  simul <- rexp(n, rate = lambda)
  x[i] <- (mean(simul) - 1/lambda) / sqrt(var(simul)/n)
}
hist(x, probability = TRUE)
t <- seq(-3, 3, by = .1)
lines(t, dnorm(t))
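A parallel sketch (added here) for the asymptotic normality of the sample variance, assuming an exponential population for which \(\sigma^2=1/\lambda^2\) and \(m_4=9/\lambda^4\) are known in closed form:

# Simulate 1000 standardized sample variances from an Exp(rate = 2) population
# (illustrative; for Exp(lambda), sigma^2 = 1/lambda^2 and m4 = 9/lambda^4, so m4 - sigma^4 = 8/lambda^4)
set.seed(1)
n <- 100; lambda <- 2; y <- vector(length = 1000)
for (i in 1:1000) {
  simul <- rexp(n, rate = lambda)
  y[i] <- sqrt(n) * (var(simul) - 1/lambda^2) / sqrt(8/lambda^4)
}
hist(y, probability = TRUE)   # roughly bell-shaped; the fit improves as n grows
t <- seq(-3, 3, by = .1)
lines(t, dnorm(t))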

6.4 Strong LLN (almost sure convergence)

A sequence of random variables \(\{X_n\}_n\) converges almost surely (or with probability 1) to a constant \(a\in{\mathbb R}\) if, \[P\left(\lim\limits_{n\rightarrow\infty}X_n=a\right)=1\,.\]

Strong Law of Large Numbers

If \(\{X_n\}_n\) is a sequence of independent and identically distributed random variables with \({\mathbb E}[X_i]=\mu\), then \[P\left(\lim\limits_{n\rightarrow\infty}\overline{X}_n=\mu\right)=1\,,\] where \(\overline{X}_n=\frac{1}{n}\sum_{i=1}^n X_i\).

SLLN for a r.v. with finite fourth moment

If \(\{X_n\}_n\) is a sequence of iid r.v.s with \({\mathbb E}[X_i]=0\) and \({\mathbb E}[X_i^4]=\mu_4<\infty\), then

\[\begin{align*} {\mathbb E}\left[\frac{\left(\sum_{i=1}^n X_i\right)^4}{n^4}\right]&=\left(n{\mathbb E}[X_i^4]+6{n\choose 2}{\mathbb E}[X_i^2]{\mathbb E}[X_j^2]\right)/n^4\\ &=\left(n\mu_4+3n(n-1)\mu_2^2\right)/n^4\\ &\leq\frac{3n-2}{n^3}\mu_4, \end{align*}\] where \(\mu_2={\mathbb E}[X_i^2]\) and the last step uses \(\mu_2^2\leq\mu_4\) (Cauchy–Schwarz).

Now \[{\mathbb E}\left[\sum_{n=1}^\infty\frac{\left(\sum_{i=1}^n X_i\right)^4}{n^4}\right]=\sum_{n=1}^\infty {\mathbb E}\left[\frac{\left(\sum_{i=1}^n X_i\right)^4}{n^4}\right]\leq\mu_4\sum_{n=1}^\infty\frac{3n-2}{n^3}<\infty.\] Then \(\sum\limits_{n=1}\limits^{\infty}\left(\sum\limits_{i=1}\limits^{n} X_i\right)^4/n^4<\infty\) a.s. and \(\lim\limits_{n\rightarrow\infty} \left(\sum\limits_{i=1}\limits^{n} X_i\right)^4/n^4=0\) a.s. We conclude \(\lim\limits_{n} \overline{X}_n=\lim\limits_{n} \sum\limits_{i=1}\limits^{n} X_i/n=0\) a.s.

Convergence in probability vs almost sure convergence

Convergence in probability is implied by almost sure convergence, while the reverse implication does not hold; hence the names Weak LLN and Strong LLN.

The almost sure convergence of an estimator to the value of the parameter is referred to as strong consistency.

Example of a sequence of random variables converging in probability, but not almost surely.

The sequence of independent r.v.s \(\{X_n\}_n\) with distributions \(P(X_n=1)=1/n\) and \(P(X_n=0)=1-1/n\) converges in probability to \(0\), since \(P(|X_n|\geq\varepsilon)\leq 1/n\rightarrow 0\), but it does not converge to \(0\) almost surely: by the second Borel–Cantelli lemma, since \(\sum_n 1/n=\infty\) and the \(X_n\) are independent, \(X_n=1\) for infinitely many \(n\) with probability \(1\).
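A simulation sketch (added for illustration): along a path, the events \(X_n=1\) become rarer and rarer, yet they keep occurring, so the path never settles at \(0\).

# Simulate X_n with P(X_n = 1) = 1/n: ones become rare (convergence in probability)
# but keep appearing along the path (consistent with the lack of a.s. convergence)
set.seed(1)
n <- 1:10^5
x <- rbinom(length(n), size = 1, prob = 1/n)
which(x == 1)   # indices n with X_n = 1; arbitrarily large ones keep showing up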

Almost sure convergence

# One path of the running relative frequency of heads in 1000 fair coin tosses:
# the single trajectory settles around 0.5 (almost sure convergence)
set.seed(10)
plot(cumsum(rbinom(1000, size = 1, prob = 0.5))/(1:1000),
     type = "l", ylab = "H freq", xlab = "n toss")
abline(h = c(0.53, 0.47))

Convergence in probability

# Overlay 30 independent trajectories: for each fixed large n, most paths lie close to 0.5
for (i in 1:30) { set.seed(i)
  points(cumsum(rbinom(1000, size = 1, prob = 0.5))/(1:1000),
         type = "l", col = i)
}