
Resources for students in Stat 111 (Spring 2023). Managed by aL Xin.


Section 02 - Asymptotics and MLE properties

09 Feb 2023


Content

Asymptotics

Thm (Slutsky’s Theorem). If $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ are sequences of r.v.s such that $X_n$ converges in distribution to $X$ and $Y_n$ converges in probability to a constant $c$, then

  1. $X_n + Y_n \overset{d}{\to} X + c$
  2. $X_n Y_n \overset{d}{\to} cX$
  3. $X_n / Y_n \overset{d}{\to} X / c$ for $c \neq 0$
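
As a quick numerical illustration of statement 3 (a minimal sketch, not from the original notes; the choice of $\textrm{Expo}(1)$ data is arbitrary): by the CLT, $\sqrt n (\bar Y - \mu)/\sigma \overset{d}{\to} \mathcal{N}(0,1)$, and the sample standard deviation $S$ converges in probability to $\sigma$, so Slutsky says the studentized mean has the same Normal limit.

```r
# Studentized mean: sqrt(n) * (Ybar - mu) / S should be approx N(0, 1)
# because S -> sigma in probability (Slutsky, part 3).
set.seed(111)
n <- 200
reps <- 1e4
studentized <- replicate(reps, {
  y <- rexp(n, rate = 1)           # Expo(1): mu = 1, sigma = 1 (arbitrary choice)
  sqrt(n) * (mean(y) - 1) / sd(y)  # plug in S where sigma is unknown
})
qqnorm(studentized); qqline(studentized)  # should track the N(0, 1) line closely
```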

Thm (Continuous mapping theorem). If $X_1, X_2, \dots$ is a sequence of r.v.s and $g$ is a continuous function, then

  1. If $X_n \overset{d}{\to} X$, then $g(X_n) \overset{d}{\to} g(X)$
  2. If $X_n \overset{p}{\to} X$, then $g(X_n) \overset{p}{\to} g(X)$

Delta method. The delta method converts a Normal limit for $Y_n$ into a Normal limit for the transformed quantity $g(Y_n)$. Let $g$ be differentiable with $g'(\mu) \neq 0$. If

\[\frac{\sqrt n (Y_n - \mu)}{\sigma} \overset{d}{\to} \mathcal{N}(0, 1),\]

then

\[\frac{\sqrt n (g(Y_n) - g(\mu))}{\lvert g'(\mu)\rvert \sigma} \overset{d}{\to} \mathcal{N}(0, 1)\]

Check: Why do we use absolute value in the denominator?
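
As an illustrative example (not from the original notes): take $Y_n = \bar{Y}_n$, the sample mean of i.i.d. data with mean $\mu > 0$ and variance $\sigma^2$, and $g(y) = \log y$. Then $g'(\mu) = 1/\mu$, so the delta method gives

\[\sqrt n \left( \log \bar{Y}_n - \log \mu \right) \overset{d}{\to} \mathcal{N}\left(0, \frac{\sigma^2}{\mu^2}\right)\]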

Consistency

Def (Consistency). An estimator $\hat\theta$ is consistent for the estimand $\theta^\ast$ if $\hat\theta \overset{p}{\to} \theta^\ast$ as $n \to \infty$. For example, by the law of large numbers, the sample mean is consistent for the population mean.

Score function and Fisher information

Well-specified

A model is well-specified if the true data-generating distribution belongs to the assumed family, i.e. there is some $\theta^*$ in the parameter space such that the data were generated from the model with parameter $\theta^*$. The identities below assume a well-specified model and the usual regularity conditions.

Score function

Def (Score function). The score is the partial derivative of the log-likelihood with respect to the parameter:

\[s(\theta; \vec{Y}) = \frac{\partial \ell (\theta; \vec{Y})}{\partial \theta}\]

Evaluated at the true parameter $\theta^*$ (writing $s' = \partial s / \partial \theta$), the score satisfies

\[\begin{aligned} E(s(\theta^*; \vec{Y})) &= 0 \\ \textrm{Var}(s(\theta^*; \vec{Y})) &= -E(s'(\theta^*; \vec{Y})) \end{aligned}\]
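
As a quick check of these identities (an illustrative example, not part of the original notes), take a single observation $Y \sim \textrm{Expo}(\lambda)$ with density $f(y; \lambda) = \lambda e^{-\lambda y}$ for $y > 0$:

\[\begin{aligned} \ell(\lambda; y) &= \log \lambda - \lambda y, \qquad s(\lambda; y) = \frac{1}{\lambda} - y, \\ E(s(\lambda^*; Y)) &= \frac{1}{\lambda^*} - E(Y) = 0, \\ \textrm{Var}(s(\lambda^*; Y)) &= \textrm{Var}(Y) = \frac{1}{(\lambda^*)^2} = -E(s'(\lambda^*; Y)) \end{aligned}\]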

Fisher information

Def (Fisher information). The Fisher information is the variance of the score, evaluated at the true parameter:

\[\mathcal{I}_{\vec{Y}}(\theta^*) = \textrm{Var}(s(\theta^*; \vec{Y}))\]

Under regularity conditions it can also be computed as

\[\mathcal{I}_{\vec{Y}}(\theta^*) = -E(s'(\theta^*; \vec{Y})) = E(s(\theta^*; \vec{Y})^2)\]

For $n$ i.i.d. observations the information adds up: $\mathcal{I}_n(\theta) = n \mathcal{I}_1(\theta)$, where $\mathcal{I}_1$ is the information in a single observation. Under a reparametrization $\tau = g(\theta)$ with $g$ differentiable and one-to-one, the information transforms as

\[\mathcal{I}(\tau) = \frac{\mathcal{I}(\theta)}{g'(\theta)^2}\]
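
Continuing the $\textrm{Expo}(\lambda)$ example above (illustrative, not from the original notes): one observation has $\mathcal{I}_1(\lambda) = 1/\lambda^2$, so $n$ i.i.d. observations have $\mathcal{I}_n(\lambda) = n/\lambda^2$. Reparametrizing by the mean, $\tau = g(\lambda) = 1/\lambda$ with $g'(\lambda) = -1/\lambda^2$,

\[\mathcal{I}_1(\tau) = \frac{\mathcal{I}_1(\lambda)}{g'(\lambda)^2} = \frac{1/\lambda^2}{1/\lambda^4} = \lambda^2 = \frac{1}{\tau^2}\]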

Kullback-Leibler divergence

Def (Kullback-Leibler divergence). The Kullback-Leibler divergence of $G$ from $F$, also called K-L divergence or relative entropy (where $f$ and $g$ are the corresponding densities), is defined as

\[D(F\mid \mid G) = E_f \left[ \log \frac{f(\vec{X})}{g(\vec{X})} \right] = \int \log \frac{f(\vec{x})}{g(\vec{x})} \, f(\vec{x}) \, d\vec{x}\]

K-L divergence is nonnegative, is zero only when $F = G$, and is not symmetric in $F$ and $G$. It shows up here because, for large $n$, maximizing the log-likelihood amounts to choosing the distribution in the model closest to the true distribution in K-L divergence, which is the key idea behind the consistency of the MLE.
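
A minimal Monte Carlo sketch of this definition in R (an illustration, not from the original notes; the choices $F = \mathcal{N}(0, 1)$ and $G = \mathcal{N}(1, 4)$ are arbitrary): draw from $f$ and average the log density ratio.

```r
# Monte Carlo estimate of D(F || G) = E_f[ log f(X) - log g(X) ]
# with F = N(0, 1) and G = N(1, 2^2), chosen arbitrarily for illustration.
set.seed(111)
x <- rnorm(1e5, mean = 0, sd = 1)  # draws from f
kl_mc <- mean(dnorm(x, mean = 0, sd = 1, log = TRUE) -
              dnorm(x, mean = 1, sd = 2, log = TRUE))
# Closed form for two Normals: log(s1/s0) + (s0^2 + (m0 - m1)^2) / (2 s1^2) - 1/2
kl_exact <- log(2 / 1) + (1 + (0 - 1)^2) / (2 * 2^2) - 1/2
c(monte_carlo = kl_mc, exact = kl_exact)  # both around 0.44
```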

Cramer-Rao lower bound

Thm (Cramer-Rao Lower Bound, CRLB). Under regularity conditions, if $\hat\theta$ is unbiased for $\theta^*$, then

\[\textrm{Var}(\hat\theta) \geq \frac{1}{\mathcal{I}_n(\theta^*)} = \frac{1}{n \mathcal{I}_1(\theta^*)}\]

More generally, if $\hat\tau$ is unbiased for $\tau = g(\theta^*)$, then

\[\textrm{Var}(\hat\tau) \geq \frac{g'(\theta^*)^2}{n \mathcal{I}_1 (\theta^*)}\]
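
As a sanity check (an illustrative example, not from the original notes): for $Y_1, \dots, Y_n \overset{\textrm{i.i.d.}}{\sim} \textrm{Expo}(\lambda)$, the sample mean $\bar{Y}$ is unbiased for $\tau = g(\lambda) = 1/\lambda$ and attains the second bound:

\[\textrm{Var}(\bar{Y}) = \frac{1}{n\lambda^2}, \qquad \frac{g'(\lambda)^2}{n \mathcal{I}_1(\lambda)} = \frac{1/\lambda^4}{n/\lambda^2} = \frac{1}{n\lambda^2}\]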

Properties of MLE

Under regularity conditions and a well-specified model, the MLE is consistent and asymptotically Normal:

\[\sqrt n (\hat\theta - \theta^*) \overset{d}{\to} \mathcal{N}\left(0, \frac{1}{\mathcal{I}_1(\theta^*)}\right)\]

where $\mathcal{I}_1(\theta^*)$ is the Fisher information in a single observation. Equivalently, for large $n$, $\hat\theta$ is approximately $\mathcal{N}(\theta^*, 1/\mathcal{I}_n(\theta^*))$, so the MLE asymptotically attains the Cramer-Rao lower bound.
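
A minimal simulation sketch of this result in R (illustrative, not from the original notes; the $\textrm{Expo}(\lambda)$ model with $n = 50$ and $\lambda^* = 2$ are arbitrary choices). For this model the MLE is $\hat\lambda = 1/\bar{Y}$ and $\mathcal{I}_1(\lambda) = 1/\lambda^2$, so $\sqrt n (\hat\lambda - \lambda^*)$ should be approximately $\mathcal{N}(0, (\lambda^*)^2)$.

```r
# Sampling distribution of the MLE for Expo(lambda):
# sqrt(n) * (lambda_hat - lambda_star) should be approx N(0, lambda_star^2).
set.seed(111)
n <- 50
lambda_star <- 2
reps <- 1e4
lambda_hat <- replicate(reps, {
  y <- rexp(n, rate = lambda_star)
  1 / mean(y)                      # MLE of the rate parameter
})
z <- sqrt(n) * (lambda_hat - lambda_star)
hist(z, breaks = 50, freq = FALSE, main = "sqrt(n) (lambda_hat - lambda*)")
curve(dnorm(x, mean = 0, sd = lambda_star), add = TRUE, lwd = 2)
```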

Problems

1 Typo searching

Let $Y_j$ be the number of typos on page $j$ of a certain book, and suppose that the $Y_j$ are i.i.d. $Pois(\lambda)$, with $\lambda$ unknown. Let

\[\theta = P(Y_j \geq 1 \mid \lambda),\]

the probability of a page having at least one typo. The first $n$ pages of the book are proofread extremely carefully, so $Y_1,\dots,Y_n$ are observed.

1.1 MLE of $\lambda$

Find the MLE $\hat{\lambda}$ of $\lambda$.

1.2 Approximate distribution

Show that $\hat{\lambda}$ is approximately Normal for $n$ large, and give the parameters.

1.3 MLE of $\theta$

Find the MLE $\hat{\theta}$ of $\theta$.

1.4 Distribution of $\hat\theta$

Find the distribution of $\hat\theta$ in three different ways:

1.4.1 Fisher information

Use Fisher information and the result about the asymptotic distribution of the MLE.

1.4.2 Delta method

Use the delta method together with the approximate distribution of $\hat\lambda$ from 1.2.

1.4.3 Simulation

Use simulation for the case where $n=10$ and the true value is $\lambda^* = 1$, performing at least $10^4$ replications.
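
One possible starting point in R (a sketch, not an official solution): simulate the sampling distribution of $\hat\lambda$ using the sample mean as the estimator of $\lambda$ (swap in your MLE from 1.1 if it differs), then transform each draw with your answer to 1.3 to obtain draws of $\hat\theta$.

```r
# Starter sketch for 1.4.3: simulate Y_1, ..., Y_n ~ Pois(lambda_star),
# estimate lambda on each replication, and repeat many times.
set.seed(111)
n <- 10
lambda_star <- 1
reps <- 1e4
lambda_hat <- replicate(reps, mean(rpois(n, lambda = lambda_star)))
# Apply your expression for theta as a function of lambda (from 1.3) to each
# entry of lambda_hat to get theta_hat, then compare its histogram to the
# Normal approximations from 1.4.1 and 1.4.2.
hist(lambda_hat, breaks = 30, freq = FALSE)
```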

1.4.4 Compare

Compare the results of the three approaches above: how similar or different are they? If they differ substantially, which do you trust the most?

1.4.5 Discuss

Discuss the relative advantages and disadvantages of the three approaches above (e.g., in terms of accuracy, generality, and ease of computation).

2 Score and Fisher info of Uniform

Suppose we have $X_1, \dots, X_n \overset{\textrm{i.i.d.}}{\sim} \textrm{Unif}(0,\theta)$, with $\theta > 0$.

2.1 Finding the score

Let $n=1$ and write down the score function $s(\theta; x)$. Find $E[s(\theta^\ast; X_1)]$, where $\theta^\ast$ is the true parameter.

2.2 Checking properties of score

With $n=1$, find $\mathbb{E}\left[(s(\theta^\ast; X_1))^2\right]$, $\textrm{Var}(s(\theta^\ast; X_1))$, and $-\mathbb{E}\left[\frac{\partial s(\theta^\ast; X_1)}{\partial \theta^\ast}\right]$. Are they equal? Can you explain this?

2.3 Fisher w/o regularity

In fact, when the regularity conditions do not hold, we instead define the Fisher information as $\mathcal{I}_{\mathbf{X}}(\theta^\ast)=E[(s(\theta^\ast; \mathbf{X}))^2]$. Find $\mathcal{I}_{X_1}(\theta^\ast)$ in this setting.

2.4 More Fisher

From this part onward, consider a general $n > 0$. Find $\mathcal{I}_{\mathbf{X}}(\theta^\ast)$.

2.5 MLE

Let $\hat{\theta}$ be the MLE for $\theta$. For a constant $\epsilon>0$, find $\Pr(\theta-\hat{\theta}>\epsilon)$ and use this to prove the consistency of $\hat{\theta}$.

2.6 Bias

Show that the MLE is biased (“story proof” is fine). Propose an unbiased estimator of $\theta$ of the form $c\hat{\theta}$, where $c$ is a constant. What is the variance of this estimator?

2.7 Variance of (2.6)

How does the variance of your estimator in 2.6 compare to the inverse of the Fisher information?