Resources for students in Stat 111 (Spring 2023). Managed by aL Xin.
Thm (Slutsky's Theorem). If $X_1, X_2, \dots$ and $Y_1, Y_2, \dots$ are sequences of r.v.s such that $X_n$ converges to $X$ in distribution and $Y_n$ converges to $c$ in probability, then
\[X_n + Y_n \overset{d}{\to} X + c \quad \textrm{and} \quad X_n Y_n \overset{d}{\to} cX.\]
Thm (Continuous mapping theorem). If $X_1, X_2, \dots$ is a sequence of r.v.s with $X_n \overset{d}{\to} X$ and $g$ is a continuous function, then
\[g(X_n) \overset{d}{\to} g(X),\]
and the analogous statement holds for convergence in probability.
Thm (Delta method). Let $g$ be differentiable at $\mu$ with $g'(\mu) \neq 0$. If the first convergence below holds, then so does the second:
\[\begin{aligned} \frac{\sqrt n (Y_n - \mu)}{\sigma} &\overset{d}{\to} \mathcal{N}(0, 1) \\ \frac{\sqrt n (g(Y_n) - g(\mu))}{\lvert g'(\mu)\rvert \sigma} &\overset{d}{\to} \mathcal{N}(0, 1) \end{aligned}\]
Check: Why do we use absolute value in the denominator?
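As a quick numeric check (not part of the handout), the sketch below simulates a hypothetical example in which $Y_n$ is the mean of $n$ i.i.d. Expo(1) draws, so $\mu = \sigma = 1$, and takes $g(x) = \log x$; the simulated spread of $\sqrt{n}\,(g(Y_n) - g(\mu))$ should be close to the $\lvert g'(\mu)\rvert \sigma$ that the delta method predicts.

```python
import numpy as np

# Hypothetical example (not from the handout): Y_n is the mean of n i.i.d.
# Expo(1) draws, so mu = sigma = 1, and g(x) = log(x) with g'(mu) = 1/mu.
rng = np.random.default_rng(111)
n, reps = 200, 10_000
mu, sigma = 1.0, 1.0

samples = rng.exponential(scale=1.0, size=(reps, n))
y_bar = samples.mean(axis=1)          # one Y_n per replication

# Delta method: sqrt(n) * (g(Y_n) - g(mu)) is approximately N(0, (g'(mu) * sigma)^2).
z = np.sqrt(n) * (np.log(y_bar) - np.log(mu))
print("simulated sd:", z.std(), " delta-method sd:", abs(1 / mu) * sigma)
```

Because the limit depends on $g'(\mu)$ only through $(g'(\mu))^2$, the sign of $g'(\mu)$ cannot matter, which is one way to start thinking about the Check question.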
Def (Consistency). An estimator $\hat\theta$ is consistent for the estimand $\theta^\ast$ if $\hat\theta \overset{p}{\to} \theta^\ast$ as $n \to \infty$.
Def (Fisher information). The Fisher information is the variance of the score function (the derivative of the log-likelihood with respect to $\theta$), evaluated at the true parameter:
\[\mathcal{I}_{\vec{Y}}(\theta^*) = \textrm{Var}(s(\theta^*; \vec{Y}))\]
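As a small worked example (added here, not in the original notes), take a single observation $Y \sim \textrm{Bern}(p)$. Then
\[\begin{aligned} s(p; Y) &= \frac{\partial}{\partial p}\left( Y \log p + (1-Y)\log(1-p) \right) = \frac{Y}{p} - \frac{1-Y}{1-p}, \\ \mathcal{I}_{Y}(p) &= \textrm{Var}(s(p; Y)) = \textrm{Var}(Y)\left(\frac{1}{p} + \frac{1}{1-p}\right)^2 = \frac{1}{p(1-p)}. \end{aligned}\]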
Def (Kullback-Leibler divergence). Kullback-Leibler divergence, also called K-L divergence or relative entropy, is defined as
\[D(F\mid \mid G) = E_f \left[ \log \frac{f(\vec{X})}{g(\vec{X})} \right] = \int \log \frac{f(\vec{x})}{g(\vec{x})}\, f(\vec{x})\, d\vec{x},\]
where $f$ and $g$ are the densities of $F$ and $G$, and the expectation is taken with $\vec{X} \sim F$.
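For instance (an added illustration, not from the handout), if $F = \textrm{Bern}(p)$ and $G = \textrm{Bern}(q)$, the integral becomes a sum and
\[D(F \mid\mid G) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q},\]
which is $0$ exactly when $p = q$ and positive otherwise.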
Thm (Cramér-Rao Lower Bound, CRLB). Under regularity conditions, if $\hat\theta$ is unbiased for $\theta$, then
\[\textrm{Var}(\hat\theta) \geq \frac{1}{n \mathcal{I}_1(\theta^*)},\]
where $\mathcal{I}_1(\theta^*)$ is the Fisher information from a single observation.

Let $Y_j$ be the number of typos on page $j$ of a certain book, and suppose that the $Y_j$ are i.i.d. $\textrm{Pois}(\lambda)$, with $\lambda$ unknown. Let
\[\theta = P(Y_j \geq 1 \mid \lambda),\]the probability of a page having at least one typo. The first $n$ pages of the book are proofread extremely carefully, so $Y_1,\dots,Y_n$ are observed.
Find the MLE $\hat{\lambda}$ of $\lambda$.
Show that $\hat{\lambda}$ is approximately Normal for $n$ large, and give the parameters.
Find the MLE $\hat{\theta}$ of $\theta$.
Find the distribution of $\hat\theta$ in three different ways:
Use Fisher information and the result about the asymptotic distribution of the MLE.
Use the delta method, starting from the asymptotic distribution of $\hat\lambda$.
Use simulation for the case where $n=10$ and the true value is $\lambda^* = 1$, performing at least $10^4$ replications (a minimal simulation sketch is included after this problem).
Compare the results of the three approaches above: how similar or different are they? If they differ substantially, which do you trust the most?
Discuss the relative advantages and disadvantages of the three approaches above (e.g., in terms of accuracy, generality, and ease of computation).
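The sketch below is one possible way to set up the simulation in the part above, assuming you have already found the Poisson MLE $\hat\lambda = \bar Y$ and used invariance to write $\hat\theta = 1 - e^{-\hat\lambda}$; treat it as a starting point rather than a full solution.

```python
import numpy as np

# Simulation sketch for the typo problem, assuming lambda_hat = Y_bar and
# theta_hat = 1 - exp(-lambda_hat) (the MLE of theta by invariance).
rng = np.random.default_rng(0)
n, reps, lam_star = 10, 10_000, 1.0

y = rng.poisson(lam=lam_star, size=(reps, n))
lam_hat = y.mean(axis=1)              # MLE of lambda in each replication
theta_hat = 1 - np.exp(-lam_hat)      # MLE of theta in each replication

theta_star = 1 - np.exp(-lam_star)
print("mean of theta_hat:", theta_hat.mean(), " true theta:", theta_star)
print("sd of theta_hat:  ", theta_hat.std())
```

Comparing the simulated standard deviation with the standard deviations given by the analytic approximations is one way to answer the comparison questions above.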
Suppose we have $X_1, \dots, X_n \overset{i.i.d.}{\sim} \textrm{Unif}(0,\theta)$, with $\theta > 0$.
Let $n=1$ and write down the score function $s(\theta; x)$. Find $E[s(\theta^\ast; X_1)]$, where $\theta^\ast$ is the true parameter.
With $n=1$, find $\mathbb{E}\left[(s(\theta^\ast; X_1))^2\right]$, $\textrm{Var}(s(\theta^\ast; X_1))$, and $-\mathbb{E}\left[\frac{\partial s(\theta^\ast; X_1)}{\partial \theta^\ast}\right]$. Are they equal? Can you explain this?
In fact, when the regularity conditions do not hold, we re-define the Fisher information as $\mathcal{I}_{\mathbf{X}}(\theta^\ast)=E[(s(\theta^\ast; \mathbf{X}))^2]$. Find $\mathcal{I}_{X_1}(\theta^\ast)$ in this setting.
From this part onward, consider a general $n>0$. Find $\mathcal{I}_{\mathbf{X}}(\theta^\ast)$.
Let $\hat{\theta}$ be the MLE for $\theta$. For a constant $\epsilon>0$, find $\Pr(\theta-\hat{\theta}>\epsilon)$ and use this to prove the consistency of $\hat{\theta}$.
Show that the MLE is biased (a “story proof” is fine). Propose an unbiased estimator of $\theta$ of the form $c\hat{\theta}$, where $c$ is a constant. What is the variance of this estimator?
How does the variance of your estimator in 2.5 compare to the inverse of the Fisher information? (A simulation sketch for checking your answers numerically follows below.)
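As with the typo problem, simulation is a useful sanity check. The sketch below (added, not part of the handout) assumes you have derived that the MLE is the sample maximum, $\hat\theta = \max_i X_i$; the constant `c` is a placeholder for your proposed answer to 2.5, so the code only reports empirical bias and variance for you to compare against your analytic results and against $1/\mathcal{I}_{\mathbf{X}}(\theta^\ast)$.

```python
import numpy as np

# Sanity-check sketch for the Unif(0, theta) problem, assuming the MLE is the
# sample maximum. Replace c with your proposed constant from part 2.5.
rng = np.random.default_rng(0)
theta_star, n, reps = 2.0, 5, 10_000
c = 1.0  # placeholder constant; plug in your answer

x = rng.uniform(low=0.0, high=theta_star, size=(reps, n))
theta_hat = x.max(axis=1)             # MLE in each replication

print("mean of theta_hat:    ", theta_hat.mean(), " (true theta =", theta_star, ")")
print("mean of c * theta_hat:", (c * theta_hat).mean())
print("var of c * theta_hat: ", (c * theta_hat).var())
# Compare the last number with 1 / I_X(theta*) from part 2.4.
```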