Thm (Invariance under reparameterization). Let $\psi = g(\theta)$, where $g$ is invertible (one-to-one). Then, $L(\psi; y) = L(\theta; y)$.
Proof: $L(\psi; y) = f(y; \psi) = f(y; \theta) = L(\theta; y)$. When evaluating at individual values, we have $L(\psi; y) = L(g(\theta); y)$ and $L(g^{-1}(\psi); y) = L(\theta; y)$.
Corollary: The MLE is also invariant. $\hat\psi = g(\hat\theta)$.
Ex (MLE of normal parameters). We have $\theta = (\mu, \sigma^2)$. We then have $(\hat\mu, \hat\sigma^2) = (\bar Y, \frac{1}{n} \sum_i (Y_i - \bar Y)^2 )$. We can use invariance to get $\hat\sigma = \sqrt{\hat\sigma^2}$.
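As a quick numerical sanity check, here is a minimal sketch of this example (assuming NumPy; the sample size, true parameters, and seed are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(111)
y = rng.normal(loc=2.0, scale=3.0, size=1000)  # hypothetical data

mu_hat = y.mean()                      # MLE of mu: the sample mean
sigma2_hat = np.mean((y - mu_hat)**2)  # MLE of sigma^2: divide by n, not n - 1
sigma_hat = np.sqrt(sigma2_hat)        # by invariance, the MLE of sigma

print(mu_hat, sigma2_hat, sigma_hat)
```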
Ex (Logit function). $Y \sim \textrm{Bin}(n, p), \theta = \textrm{logit}(p) = \log\frac{p}{1 - p}$ (log-odds).
We have $L(p) = p^y (1 - p)^{n - y}$, so the log-likelihood is $\ell(p) = y \log p + (n - y) \log(1 - p)$. The first derivative w.r.t. $p$ is $\ell'(p) = \frac{y}{p} - \frac{n - y}{1 - p}$. Setting this equal to zero and solving for $p$ gives $\hat p = \frac{Y}{n}$ (make sure to put a nice hat on it; capital $Y$ is the estimator, lowercase $y$ is the estimate).
By invariance, $\hat\theta = \textrm{logit}(\hat p)$.
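A minimal sketch of this in code (assuming NumPy; the particular $n$ and $y$ are made up for illustration):

```python
import numpy as np

n, y = 50, 18                            # hypothetical trials and successes
p_hat = y / n                            # MLE of p from setting the score to zero
theta_hat = np.log(p_hat / (1 - p_hat))  # by invariance, the MLE of logit(p)

print(p_hat, theta_hat)
```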
Ex (Bad unbiased estimators). $Y \sim \textrm{Pois}(\lambda)$. Our estimand is $\theta = e^{-3 \lambda}$.
Consider the estimator $\hat\theta = (-2)^Y$. Claim: this is unbiased.
\[\mathbb{E}[\hat\theta] = \sum_{k = 0}^\infty (-2)^k \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k = 0}^\infty \frac{(-2\lambda)^k}{k!} = e^{-\lambda} e^{-2\lambda} = e^{-3\lambda} = \theta.\]This is the only unbiased estimator! But it sucks because we'll regularly estimate invalid values (i.e., outside the range $[0, 1]$).
We also have that $\hat\lambda_{MLE} = Y$. By invariance, $\hat\theta_{MLE} = e^{-3 Y}$. This is biased but better.
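A short simulation sketch comparing the two (assuming NumPy; the true $\lambda$ and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.0                          # arbitrary true lambda
theta = np.exp(-3 * lam)           # estimand e^{-3 lambda}

y = rng.poisson(lam, size=100_000)  # one observation per replication
unbiased = (-2.0) ** y              # unbiased, but regularly far outside [0, 1]
mle_based = np.exp(-3.0 * y)        # biased, but always lands in (0, 1]

print("true theta:", theta)
print("unbiased:  mean", unbiased.mean(), " MSE", np.mean((unbiased - theta) ** 2))
print("MLE-based: mean", mle_based.mean(), " MSE", np.mean((mle_based - theta) ** 2))
```

For this choice of $\lambda$, the unbiased estimator averages out near $\theta$ but swings wildly outside $[0, 1]$, so its MSE is far larger than that of $e^{-3Y}$.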
Ex (MoM of Poisson). Let $Y_1, \dots, Y_n \overset{i.i.d.}{\sim} \textrm{Pois}(\theta)$. Find a method of moments estimator.
Using the first moment. We have $\theta = \mathbb{E}[Y_1]$. We then have $\hat\theta_{MoM} = \bar Y$. Note this is the same as the MLE. This estimator is unbiased.
Using the second moment. Since the variance of a Poisson also equals $\theta$, we can write $\theta = \textrm{Var}(Y_1) = \mathbb{E}(Y_1^2) - (\mathbb{E} Y_1)^2$. We then have $\hat\theta_{MoM2} = \frac{1}{n} \sum_{j = 1}^n Y_j^2 - \bar Y^2$. This estimator is biased.
Which is better in terms of MSE? We can answer this with theoretical calculations, simulation, or asymptotics; the first estimator, $\bar Y$, is better in terms of MSE.
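A quick simulation sketch of the comparison (assuming NumPy; the true $\theta$, sample size, and number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(111)
theta, n, reps = 2.0, 30, 10_000        # arbitrary true mean, sample size, replications

y = rng.poisson(theta, size=(reps, n))
ybar = y.mean(axis=1)
mom1 = ybar                             # first-moment estimator (also the MLE)
mom2 = (y**2).mean(axis=1) - ybar**2    # second-moment (variance-matching) estimator

print("MSE, first moment: ", np.mean((mom1 - theta) ** 2))
print("MSE, second moment:", np.mean((mom2 - theta) ** 2))
```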
Ex (MoM of paired data). Let $(X_j, Y_j)$, $j = 1, \dots, n$, be i.i.d. pairs (independence across pairs, not within pairs). Let us define an estimand
\[\beta = \frac{\textrm{Cov} (X, Y)}{ \textrm{Var}(X)} = \frac{\mathbb{E}(XY) - \mathbb{E}X \, \mathbb{E}Y}{\mathbb{E}(X^2) - (\mathbb{E} X)^2}.\](This is an important estimand in linear regression; more discussion is available on Ed. We will also discuss it later in the course.)
We can then make a MoM estimator
\[\hat\beta_{MoM} = \frac{\frac{1}{n} \sum_{j = 1}^n X_j Y_j - \bar X \bar Y} {\frac{1}{n} \sum_{j = 1}^n X_j^2 - \bar X^2}.\]
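A minimal plug-in computation of $\hat\beta_{MoM}$ (assuming NumPy; the data-generating process below, with true slope 2, is just an illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # hypothetical pairs with true slope 2

# sample moments plugged into beta = Cov(X, Y) / Var(X)
beta_hat = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean()**2)
print(beta_hat)                          # close to the least-squares slope of Y on X
```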