# Chapter 4 Variance reduction techniques

## 4.1 Antithetic variables

Simulation techniques are commonly used to estimate a value $$\theta$$ that is the mean of a rv, $$\theta={\mathbb E}[X]$$.

A sequence of observations of $$X$$ denoted $$X_1,X_2,\ldots,X_n$$ is generated and $$\theta$$ is estimated as the average value $$n^{-1}\sum_{i=1}^n X_i$$.

If the estimator is unbiased, the Mean Square Error of the estimate is its variance ${\rm MSE}\left[\frac{1}{n}\sum_{i=1}^n X_i\right]={\mathbb E}\left[\left(\frac{1}{n}\sum_{i=1}^n X_i-\theta\right)^2\right]=\frac{1}{n^2}{\rm Var}\left[\sum_{i=1}^n X_i\right].$

If the $$X_i$$s are independent, the MSE is $${\rm Var}[X]/n$$.

Consider the situation in which we simulate $$n/2$$ bivariate rvs $$(X_i,Y_i)$$, all with the same joint distribution, where the marginal distributions of the $$X_i$$s and the $$Y_i$$s coincide and $${\mathbb E}[X]=\theta$$. Then $${\rm Cov}(X_i,Y_i)$$ is the same for every $$i$$ and possibly different from $$0$$, while $${\rm Cov}(X_i,X_j)={\rm Cov}(Y_i,Y_j)=0$$ for $$i\neq j$$.

If $${\rm Cov}(X,Y)<0$$, then the variance of this estimator of $$\theta$$ is less than the variance of the estimator obtained out of $$n$$ independent rvs.
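Explicitly, averaging all $$n$$ simulated values, the variance of the paired estimator is $ {\rm Var}\left[\frac{1}{n}\sum_{i=1}^{n/2}(X_i+Y_i)\right]=\frac{1}{n^2}\cdot\frac{n}{2}\left(2{\rm Var}[X]+2{\rm Cov}(X,Y)\right)=\frac{{\rm Var}[X]+{\rm Cov}(X,Y)}{n}\,,$ which is smaller than $${\rm Var}[X]/n$$ exactly when $${\rm Cov}(X,Y)<0$$.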

If $$X$$ is obtained as a transformation of $$k$$ random numbers in $$(0,1)$$ as $X=h(U_1,U_2,\ldots,U_k),$ where the $$U_i\sim{\rm U}(0,1)$$ are mutually independent, then the $$1-U_i\sim{\rm U}(0,1)$$ are also mutually independent.

Consequently, $Y=h(1-U_1,1-U_2,\ldots,1-U_k)$ follows the same distribution as $$X$$.

Further, if $$h$$ is monotone in each of its arguments, the rvs $$X$$ and $$Y$$ are negatively correlated.
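A quick empirical check of this fact (a sketch, not part of the original notes), for a few monotone choices of $$h$$:

```r
# Empirical check that h(U) and h(1-U) are negatively correlated
# for monotone transformations h (illustrative sketch).
set.seed(2)
u <- runif(5000)
cor(u, 1 - u)            # exactly -1: h is the identity
cor(exp(u), exp(1 - u))  # negative for h = exp
cor(u^3, (1 - u)^3)      # negative for h(x) = x^3
```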

Example

$\int_0^1 e^x\,{\rm d}x=e-1=1.718282$

set.seed(1)
MC <- 2000
sim.u <- runif(MC)
sim.exp <- exp(sim.u)                  # crude Monte Carlo: e^U
sim.ant <- (exp(sim.u[1:(MC/2)])+exp(1-sim.u[1:(MC/2)]))/2   # antithetic pairs
sd(sim.exp)/sqrt(MC)                   # standard error, crude MC
## [1] 0.01114951
sd(sim.ant)/sqrt(MC/2)                 # standard error, antithetic (MC/2 pairs)
## [1] 0.001992196
plot(cumsum(exp(sim.u))/(1:MC),type="l",ylim=c(1.6,1.8))
bytwo <- seq(2,MC,by=2)
points(bytwo,cumsum(sim.ant)/(1:(MC/2)),type="l",col="blue")
abline(h=exp(1)-1,col="red")

• If the basic ingredient of our simulation algorithm is $$U\sim{\rm U}(0,1)$$, use the pair of negatively correlated and identically distributed rvs $$U,1-U$$.
• If the basic ingredient of our simulation algorithm is $$X\sim{\rm N}(\mu,\sigma)$$, use the pair of negatively correlated and identically distributed rvs $$X,2\mu-X$$.
• If the basic ingredient of our simulation algorithm has cdf $$F$$, use the pair of negatively correlated and identically distributed rvs $$F^{-1}(U),F^{-1}(1-U)$$.
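As an illustration of the last recipe (a sketch not in the original notes), consider antithetic sampling for $$X\sim{\rm Exp}(1)$$, whose inverse cdf is $$F^{-1}(u)=-\log(1-u)$$:

```r
# Antithetic estimate of E[X] = 1 for X ~ Exp(1) via the inverse cdf:
# F^{-1}(U) = -log(1-U) and F^{-1}(1-U) = -log(U) are negatively correlated.
set.seed(1)
n <- 1000
u <- runif(n)
x <- -log(1 - u)        # F^{-1}(U)
y <- -log(u)            # F^{-1}(1-U)
ant <- (x + y) / 2      # antithetic pairs
c(mean(ant), cor(x, y)) # estimate of E[X] = 1; the correlation is negative
```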

## 4.2 Control variates

Our goal is to estimate $$\theta={\mathbb E}[X]$$, where $$X$$ is a rv that we can simulate.

We can also simulate a rv $$Y$$ which is correlated with $$X$$ and whose mean $${\mathbb E}[Y]=\mu_Y$$ is known.

For any constant $$c$$, the mean of $$X+c(Y-\mu_Y)$$ is $$\theta$$, and its variance is ${\rm Var}[X+c(Y-\mu_Y)]={\rm Var}[X]+c^2{\rm Var}[Y]+2c{\rm Cov}(X,Y),$ which is minimized at $c^*=-\frac{{\rm Cov}(X,Y)}{{\rm Var}(Y)}$ and the resulting variance is ${\rm Var}[X+c^*(Y-\mu_Y)]={\rm Var}[X]-\frac{{\rm Cov}(X,Y)^2}{{\rm Var}(Y)}.$

Example

$\int_0^1 e^x\,{\rm d}x=e-1=1.718282$ We know $$\int_0^1 (1-x)\,{\rm d}x=1/2$$, and the rvs $$e^U$$ and $$1-U$$ are negatively correlated.

sim.cv <- exp(sim.u)+(1-sim.u-.5)    # control variate Y = 1-U, known mean 1/2, c = 1
mean(sim.cv)
## [1] 1.716722
sd(sim.cv)/sqrt(MC)
## [1] 0.004738436
plot(cumsum(exp(sim.u))/(1:MC),type="l",ylim=c(1.6,1.8))
points(bytwo,cumsum(sim.ant)/(1:(MC/2)),type="l",col="blue")
points(cumsum(sim.cv)/(1:MC),type="l",col="green")
abline(h=exp(1)-1,col="red")
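The coefficient $$c=1$$ used in the example is not optimal; $$c^*$$ can be estimated from the simulated sample itself, as in the following sketch (not part of the original example):

```r
# Estimate the optimal coefficient c* = -Cov(X,Y)/Var(Y) from the sample
# and use it with the control variate Y = 1-U (illustrative sketch).
set.seed(1)
n <- 2000
u <- runif(n)
x <- exp(u)                       # target: E[e^U] = e - 1
y <- 1 - u                        # control variate with known mean 1/2
c.star <- -cov(x, y) / var(y)     # roughly 1.69, instead of c = 1
cv <- x + c.star * (y - 0.5)
c(mean(cv), var(cv) / n)          # smaller variance than with c = 1
```

Estimating $$c^*$$ from the same sample introduces a small bias, usually negligible; a separate pilot sample can be used to avoid it.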

## 4.3 Stratified sampling

Our goal is to estimate $$\theta={\mathbb E}[X]$$.

• There is a discrete variable $$Y$$ with support $$\{y_1,y_2,\ldots,y_k\}$$ and known distribution $$P(Y=y_i)=p_i$$.
• For each $$i=1,\ldots,k$$ we can simulate $$X|Y=y_i$$.

The stratified sampling estimator of $$\theta$$ is obtained by taking $$np_i$$ observations of $$X|Y=y_i$$ for each $$i=1,\ldots,k$$ (a total of $$n$$ observations) and estimating $$\theta$$ as $$\sum_{i=1}^k\overline{X}_ip_i$$, where $$\overline{X}_i$$ is the sample mean of the $$np_i$$ observations taken from $$X|Y=y_i$$.

Observe that (by the iterated expectation formula) ${\mathbb E}\left[\sum_{i=1}^k\overline{X}_ip_i\right]=\sum_{i=1}^k{\mathbb E}[\overline{X}_i]p_i=\sum_{i=1}^k{\mathbb E}[X|Y=y_i]p_i={\mathbb E}[X]\,.$

Variance of the stratified sampling estimator

\begin{align*} {\rm Var}\left[\sum_{i=1}^k\overline{X}_ip_i\right]&=\sum_{i=1}^k p_i^2{\rm Var}[\overline{X}_i]\\ &=\sum_{i=1}^k p_i^2\frac{1}{np_i}{\rm Var}[X|Y=y_i]\\ &=\frac{1}{n}\sum_{i=1}^k p_i{\rm Var}[X|Y=y_i]\\ &=\frac{1}{n}{\mathbb E}[{\rm Var}[X|Y]]. \end{align*}

Apply the conditional variance formula ${\rm Var}[X]={\mathbb E}[{\rm Var}[X|Y]]+{\rm Var}[{\mathbb E}[X|Y]]$ to $$\overline{X}$$ to check that ${\rm Var}[\overline{X}]-{\rm Var}\left[\sum_{i=1}^k\overline{X}_ip_i\right]=\frac{1}{n}{\rm Var}[{\mathbb E}[X|Y]]\geq 0\,.$

Example

p <- c(1728,1346,1869,1822,1056)/7821   # strata probabilities
sim.claim <- function(n) {
  claim <- runif(n,min=400,max=800)     # stratum 5 by default
  sim.u <- runif(n)                     # random stratum assignment
  bin1 <- which(sim.u<p[1]) ; claim[bin1] <- runif(length(bin1),min=0,max=50)
  bin2 <- which(sim.u>p[1] & sim.u<sum(p[1:2])) ; claim[bin2] <- runif(length(bin2),min=50,max=100)
  bin3 <- which(sim.u>sum(p[1:2]) & sim.u<sum(p[1:3])) ; claim[bin3] <- runif(length(bin3),min=100,max=200)
  bin4 <- which(sim.u>sum(p[1:3]) & sim.u<sum(p[1:4])) ; claim[bin4] <- runif(length(bin4),min=200,max=400)
  return(claim)
}
sclaim <- sim.claim(7821)
c(mean(sclaim),var(sclaim)/7821)
## [1] 203.902862   4.736561
sclaim <- list(runif(p[1]*7821,min=0,max=50),
               runif(p[2]*7821,min=50,max=100),
               runif(p[3]*7821,min=100,max=200),
               runif(p[4]*7821,min=200,max=400),
               runif(p[5]*7821,min=400,max=800))
(p[1]*mean(sclaim[[1]])+p[2]*mean(sclaim[[2]])+p[3]*mean(sclaim[[3]])
+p[4]*mean(sclaim[[4]])+p[5]*mean(sclaim[[5]]))
## [1] 205.3114
(p[1]*var(sclaim[[1]])+p[2]*var(sclaim[[2]])+p[3]*var(sclaim[[3]])
+p[4]*var(sclaim[[4]])+p[5]*var(sclaim[[5]]))/7821
## [1] 0.370737
var(sample(100000,x=c(25,75,150,300,600),prob=p,replace=TRUE))/7821   # between-strata variance, approximated with the stratum midpoints
## [1] 4.320348
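The two sampled variances above can be checked against the exact conditional variance decomposition, since each stratum is uniform with known mean and variance (a sketch, not part of the original example):

```r
# Exact conditional variance decomposition for the claims example:
# Var[X] = E[Var[X|Y]] + Var[E[X|Y]], with uniform strata.
p <- c(1728, 1346, 1869, 1822, 1056) / 7821
lo <- c(0, 50, 100, 200, 400); hi <- c(50, 100, 200, 400, 800)
m <- (lo + hi) / 2                       # E[X | Y = y_i]
v <- (hi - lo)^2 / 12                    # Var[X | Y = y_i]
within  <- sum(p * v)                    # E[Var[X|Y]]
between <- sum(p * m^2) - sum(p * m)^2   # Var[E[X|Y]]
c(within, between, within + between) / 7821
```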

Example

$\int_0^1 e^x\,{\rm d}x=e-1=1.718282$

strata <- 10
# each block of MC/strata uniforms is mapped into its own subinterval ((i-1)/10, i/10)
sim.str <- exp(sim.u/strata + rep(seq(0,.9,by=.1), each=MC/strata))
mstr <- vector(length=strata) ; vstr <- vector(length=strata)
for(i in 1:strata){
  mstr[i] <- mean(sim.str[(1+(MC/strata)*(i-1)):((MC/strata)*i)])   # stratum mean
  vstr[i] <- var(sim.str[(1+(MC/strata)*(i-1)):((MC/strata)*i)])    # stratum variance
}
mean(mstr)       # stratified estimate of e - 1
mean(vstr)/MC    # variance of the stratified estimator

## 4.4 Importance sampling

Importance sampling is a variance reduction technique that is particularly useful for estimating small probabilities (rare events).

Assume we want to estimate a small probability $$p$$ through the proportion $$\hat{p}_n$$ of occurrences of the event in $$n$$ independent trials. The relative standard error of $$\hat{p}_n$$ is $$\sigma(\hat{p}_n)/p=\sqrt{p(1-p)/n}\,p^{-1}\approx 1/\sqrt{np}$$. If $$p=10^{-6}$$, then the relative standard error is $$10^3/\sqrt{n}$$, which is very large.

Consider a rv $$X$$ with dmf $$f$$. In order to estimate $${\mathbb E}[h(X)]=\int h(x)f(x)\,{\rm d}x$$, use an instrumental density function $$g$$ such that:

• $$f(x)=0$$ whenever $$g(x)=0$$

If $$Y$$ stands for a rv with dmf $$g$$, then ${\mathbb E}[h(X)]=\int h(x)f(x)\,{\rm d}x=\int h(x)\frac{f(x)}{g(x)}g(x)\,{\rm d}x={\mathbb E}[h(Y)\omega(Y)]\,,$ where $$\omega=f/g$$ is the weight function.

By sampling from $$g$$ and averaging the values $$h(y_i)\omega(y_i)$$ we obtain an unbiased estimate of $${\mathbb E}[h(X)]$$ whose variance is ${\rm Var}\left[\frac{1}{n}\sum_{i=1}^n h(Y_i)\omega(Y_i)\right]=\frac{1}{n}\left({\mathbb E}[h^2(X)\omega(X)]-{\mathbb E}[h(X)]^2\right)\,.$

When $$g=f$$ (crude Monte Carlo), the variance reduces to $$({\mathbb E}[h^2(X)]-{\mathbb E}[h(X)]^2)/n$$. In order to reduce the variance we need $$\omega$$ to be small – equivalently, $$g$$ to be large – at the values where $$h^2$$ is large (while $$\omega$$ can be large – $$g$$ small – where $$h^2$$ is small).

For instance, for $$Z\sim{\rm N}(0,1)$$, the tail probability $$P(Z>5)$$ is

1-pnorm(5)
## [1] 2.866516e-07

Example

set.seed(1) ; MC <- 10000
s.norm <- rnorm(MC)
c(mean(s.norm>5),var(s.norm>5)/MC)        # crude MC: the event {Z>5} is never observed
## [1] 0 0
ss.norm <- s.norm+5                       # sample from the instrumental density g = N(5,1)
is <- (ss.norm>5)*exp((25-10*ss.norm)/2)  # h(y)*w(y), with w(y) = phi(y)/phi(y-5) = exp((25-10y)/2)
c(mean(is),var(is)/MC)
## [1] 2.799445e-07 4.565146e-17

Tilting

The density function $f_t(x)=\frac{e^{tx}f(x)}{M(t)}\,,$ where $$M(t)={\mathbb E}[e^{tX}]$$ is the moment generating function of $$X$$, is called the tilted density of $$f$$, for $$-\infty<t<\infty$$.

Tilting is a natural way to obtain instrumental dmfs for importance sampling.

If $$X\sim{\rm N}(\mu,\sigma)$$, by tilting, we obtain $${\rm N}(\mu+t\sigma^2,\sigma)$$.
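To check the normal case, complete the square in the exponent: $e^{tx}\,e^{-(x-\mu)^2/(2\sigma^2)}=e^{t\mu+t^2\sigma^2/2}\,e^{-(x-(\mu+t\sigma^2))^2/(2\sigma^2)}\,,$ and since $$M(t)=e^{t\mu+t^2\sigma^2/2}$$ for the normal distribution, this factor cancels in $$f_t$$, leaving the density of $${\rm N}(\mu+t\sigma^2,\sigma)$$. In particular, the instrumental density $${\rm N}(5,1)$$ of the importance sampling example above is the standard normal tilted with $$t=5$$.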