
Resources for students in Stat 111 (Spring 2023). Managed by aL Xin.

View files on GitHub awqx/stat111-2023


Lecture 06 - Delta method, MLE properties

09 Feb 2023


Review from yesterday’s concepts

Examples

Ex. Let $Y_1, Y_2, \dots$ be i.i.d. with mean $\mu \neq 0$ and variance $\sigma^2$.

Define the statistic

\[T_n = \frac{\sqrt{n} (\bar Y_n - \mu)}{\bar Y_n^3 + \mu^3}.\]

We can handle the components separately:

\[\begin{aligned} \sqrt{n}(\bar Y_n - \mu) &\overset{d}{\to} \sigma Z, \quad Z \sim \mathcal{N}(0, 1) && \textrm{by the CLT} \\ \bar{Y}_n &\overset{p}{\to} \mu && \textrm{by the LLN} \\ \bar Y_n^3 + \mu^3 &\overset{p}{\to} 2 \mu^3 && \textrm{by the CMT} \\ T_n &\overset{d}{\to} \frac{\sigma Z}{2 \mu^3} \sim \mathcal{N}\left(0, \frac{\sigma^2}{4\mu^6}\right) && \textrm{by Slutsky} \end{aligned}\]
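As a quick numerical sanity check (a simulation sketch, not part of the notes), we can verify that the statistic above has limiting standard deviation $\sigma / (2\mu^3)$. The choice of Normal draws for the $Y_i$ is an assumption for illustration; any i.i.d. distribution with mean $\mu$ and variance $\sigma^2$ would work.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
n, reps = 1_000, 20_000

# Simulate reps copies of the statistic sqrt(n)(Ybar - mu) / (Ybar^3 + mu^3)
y = rng.normal(mu, sigma, size=(reps, n))
ybar = y.mean(axis=1)
t_n = np.sqrt(n) * (ybar - mu) / (ybar**3 + mu**3)

# Compare the empirical sd with the limiting sd, sigma / (2 mu^3)
print(t_n.std(), sigma / (2 * mu**3))
```

With $\mu = 2$ and $\sigma = 1.5$ the limiting sd is $1.5/16 \approx 0.094$, and the empirical sd should land close to it.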

Delta method

Thm (Delta method). Let $g$ be differentiable at $\theta$ with $g'(\theta) \neq 0$. Suppose $\sqrt{n} (T_n - \theta) \overset{d}{\to} \mathcal{N}(0, \sigma^2)$, where $T_n$ is an r.v. and $\theta$ is a constant. Then,

\[\sqrt{n} (g(T_n) - g(\theta)) \overset{d}{\to} \mathcal{N} \left( 0, (g'(\theta))^2 \sigma^2 \right).\]

Note that the hypothesis already gives $T_n \overset{p}{\to} \theta$, since

\[T_n - \theta = \frac{1}{\sqrt n} \cdot \sqrt n (T_n - \theta) \overset{p}{\to} 0\]

by Slutsky ($1/\sqrt{n} \to 0$ while the other factor converges in distribution).

Thm (Delta method restatement). If $T_n \dot\sim \mathcal{N}(\theta, \frac{\sigma^2}{n})$, then $g(T_n) \dot\sim \mathcal{N}\left(g(\theta), \frac{g'(\theta)^2 \sigma^2}{n}\right)$.
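To see the restatement in action, here is a hedged simulation sketch (the choices $g(x) = e^x$, $\theta = 1$, $\sigma = 2$ are assumptions for illustration). Since the sample mean of Normal data is exactly $\mathcal{N}(\theta, \sigma^2/n)$, we can draw $T_n$ directly:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n, reps = 1.0, 2.0, 5_000, 100_000

# For Normal data, the sample mean is exactly N(theta, sigma^2/n),
# so draw T_n directly instead of averaging raw samples.
t_n = rng.normal(theta, sigma / np.sqrt(n), size=reps)

# Delta method: g(T_n) is approximately N(g(theta), g'(theta)^2 sigma^2 / n)
g_tn = np.exp(t_n)                           # g(x) = e^x, so g'(x) = e^x
predicted_sd = np.exp(theta) * sigma / np.sqrt(n)
print(g_tn.std(), predicted_sd)
```

The empirical sd of $g(T_n)$ should closely match the delta-method prediction $|g'(\theta)| \sigma / \sqrt{n}$.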

Examples

Ex. Let $T \sim \textrm{Pois}(\lambda)$. For large $\lambda$ (preferably $\lambda \geq 100$), the Poisson distribution is $\approx \mathcal{N}(\lambda, \lambda)$.

Taking $g(x) = x^{1/2}$, so that $g'(x) = \frac{1}{2} x^{-1/2}$, the Delta method gives $\sqrt{T} \dot\sim \mathcal{N}\left(\sqrt{\lambda}, g'(\lambda)^2 \lambda\right) = \mathcal{N}(\sqrt{\lambda}, 1/4)$. This has the effect of stabilizing the variance: no matter how large $\lambda$ is, this particular transformation always results in a variance of $1/4$.
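The variance-stabilizing property is easy to check numerically (a simulation sketch, not from the notes; the particular $\lambda$ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 200_000

# Var(sqrt(T)) should be close to 1/4 regardless of lambda (for large lambda)
for lam in [100, 1_000, 10_000]:
    t = rng.poisson(lam, size=reps)
    print(lam, np.sqrt(t).var())
```

Each printed variance should hover near $0.25$ even as $\lambda$ grows by factors of ten.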

Properties of MLE

Under regularity conditions, $\hat\theta$ exists and is consistent (converges to the estimand in probability, i.e., $\hat\theta \overset{p}{\to} \theta^\ast$).

Note on notation: $\theta^\ast$ is the true estimand. Sometimes, $\theta$ (no asterisk) means “any” $\theta$ we could consider (like the statement of the Delta method). Sometimes it’s hard to be consistent, so ask if you’re confused.
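The consistency claim can be illustrated by simulation. A minimal sketch, using the Exponential rate as an assumed example (the MLE there is $\hat\lambda = 1/\bar Y$):

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 2.0

# MLE of the Exponential rate is 1 / Ybar; it should approach lam_true as n grows
for n in [10, 1_000, 100_000]:
    y = rng.exponential(1 / lam_true, size=n)
    print(n, 1 / y.mean())
```

The printed estimates should drift toward the true rate $\lambda^\ast = 2$ as $n$ increases.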

Thm (Asymptotic distribution of the MLE). The distribution of the MLE converges to

\[\sqrt{n}(\hat\theta_n - \theta^\ast) \overset{d}{\to} \mathcal{N}(0, ???)\]

where the variance (the ???) will be identified once we define the score and Fisher information.

Score and Fisher info

Def (Score). The score function is the derivative of the log-likelihood. If $L(\theta; y)$ is the likelihood and $\ell(\theta; y)$ is $\log L(\theta; y)$, we have that

\[s(\theta; y) = \frac{\partial}{\partial \theta} \ell(\theta; y),\]

(assuming that $\theta$ is one-dimensional).

Def (Fisher information). The Fisher information is

\[\mathcal{I}_Y(\theta) = \textrm{Var}(s(\theta; Y); \theta),\]

where $Y$ is random and the variance is computed under the assumption that the true data-generating estimand is $\theta$.
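As a concrete check (a sketch, not from the notes; the Poisson choice is an assumption for illustration): for $Y \sim \textrm{Pois}(\lambda)$ we have $\ell(\lambda; y) = y \log \lambda - \lambda - \log y!$, so $s(\lambda; y) = y/\lambda - 1$, and the Fisher information is $\textrm{Var}(Y)/\lambda^2 = 1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, reps = 5.0, 500_000

# For Y ~ Pois(lam): score s(lam; y) = y/lam - 1
y = rng.poisson(lam, size=reps)
score = y / lam - 1

# The empirical variance of the score should match 1/lam
print(score.var(), 1 / lam)
```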

NO BAYES STUFF THIS WEEK.

Thm (Score expectation). The expected value of the score is 0.

Proof: We can use differentiation under the integral sign (DUThIS), also called Leibniz's rule:

\[\frac{d}{d\theta} \int_{-\infty}^{\infty} h(x, \theta) dx = \int_{-\infty}^{\infty} \frac{d}{d\theta} h(x, \theta) dx.\]

Applying this to the expectation of the score, and using that the likelihood equals the density, $L(\theta; y) = f(y; \theta)$, we have

\[\begin{aligned} \mathbb{E}[s(\theta; Y)] &= \int_{-\infty}^{\infty} s(\theta; y) f(y; \theta) dy \\ &= \int_{-\infty}^{\infty} \frac{L'(\theta; y)}{L(\theta; y)} f(y; \theta) dy \\ &= \int_{-\infty}^{\infty} \frac{\partial}{\partial \theta} f(y; \theta) dy \\ &= \frac{\partial}{\partial \theta} \int_{-\infty}^{\infty} f(y; \theta) dy \\ &= \frac{\partial}{\partial \theta} 1 \\ &= 0. \end{aligned}\]
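The zero-mean property of the score can also be seen numerically. A minimal sketch under an assumed Poisson model, where $s(\lambda; y) = y/\lambda - 1$:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, reps = 5.0, 500_000

# Poisson score s(lam; y) = y/lam - 1; its mean under the true lam should be 0
y = rng.poisson(lam, size=reps)
score = y / lam - 1
print(score.mean())
```

The printed mean should be very close to zero (up to Monte Carlo error).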

Thm (Score-info). $\mathbb{E}[s'(\theta; Y)] = -\mathcal{I}_Y(\theta)$, where $s'$ denotes $\frac{\partial}{\partial \theta} s(\theta; y)$.
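This identity can be checked in the same assumed Poisson example (a sketch, not from the notes): $s(\lambda; y) = y/\lambda - 1$ gives $s'(\lambda; y) = -y/\lambda^2$, whose expectation is $-1/\lambda = -\mathcal{I}_Y(\lambda)$.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, reps = 5.0, 500_000

# Poisson: s(lam; y) = y/lam - 1, so s'(lam; y) = -y/lam^2
y = rng.poisson(lam, size=reps)
sprime = -y / lam**2

# Empirical mean of s' should match minus the Fisher info, -1/lam
print(sprime.mean(), -1 / lam)
```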