
Resources for students in Stat 111 (Spring 2023). Managed by aL Xin.

View files on GitHub: awqx/stat111-2023


Section 01 - Basics, MLE, MoM

02 Feb 2023


See solutions here

Table of contents

- Topics covered in section
  - Stat 110 material
  - Stat 111 basics
  - Key vocabulary
  - Benchmarking: Bias, variance, standard error
- Bayesian and frequentist views
  - Frequentist
  - Bayesian
- Likelihood

Maximum likelihood estimation

Ex: Calculate the MLE of $p$ for $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} \textrm{Geom}(p)$. To find $\hat{p}_{MLE}$, first determine the likelihood. We have

\[L(p; \vec{y}) = \prod_{j = 1}^n (1 - p)^{y_j} p = (1 - p)^{\sum_{j = 1}^n y_j} p^n.\]

Take the log of the above to find

\[\ell (p) = \sum_{j = 1}^n y_j \log (1 - p) + n \log (p).\]

Take the first derivative

\[\frac{\partial}{\partial p} \ell(p) = -\frac{\sum_{j = 1}^n y_j}{1 - p} + \frac{n}{p}\]

and the second derivative

\[\frac{\partial^2}{\partial p^2} \ell(p) = -\frac{\sum_{j = 1}^n y_j}{(1 - p)^2} - \frac{n}{p^2}.\]

Set the first derivative equal to 0 and solve for the critical point $p^*$ to find

\[p^* = \frac{n}{n + \sum_{j = 1}^n y_j}.\]

Since the second derivative is negative for all $p \in (0, 1)$, the log-likelihood is concave, so $p^*$ is a maximum and $\hat{p}_{MLE} = p^*$.
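As a quick sanity check, here is a minimal R sketch comparing the closed-form MLE with direct numerical maximization of the log-likelihood (the sample size, seed, and true parameter are illustrative assumptions; R's `rgeom` counts failures before the first success, matching the convention above).

```r
# Sketch: closed-form geometric MLE vs. numerical maximization (illustrative).
set.seed(111)
n <- 500
p_true <- 0.3                       # illustrative true parameter
y <- rgeom(n, p_true)               # failures before the first success

p_mle <- n / (n + sum(y))           # closed form from the derivation above

loglik <- function(p) sum(y) * log(1 - p) + n * log(p)
p_opt <- optimize(loglik, c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum

c(closed_form = p_mle, numerical = p_opt)  # should agree closely
```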

Method of moments estimation

Ex: Assume our data are i.i.d. Geometric. The mean of a geometric distribution is $\frac{1 - p}{p}$. The sample mean is $\bar{Y}$. We can set the two equal to find

\[\frac{1 - p}{p} = \bar{Y}.\]

Add 1 to both sides (using $\frac{1 - p}{p} = \frac{1}{p} - 1$) and take the reciprocal to derive

\[\frac{1}{\hat p} = \bar{Y} + 1 \to \hat p = \frac{1}{\bar{Y} + 1} = \frac{n}{n + \sum_{j = 1}^n Y_j}.\]

Note that the method-of-moments estimator coincides with the MLE found above.
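The same check for the method-of-moments estimator, again with illustrative simulated data:

```r
# Sketch: method-of-moments estimate on simulated geometric data (illustrative).
set.seed(111)
y <- rgeom(500, 0.3)
p_mom <- 1 / (mean(y) + 1)   # invert E[Y] = (1 - p) / p
p_mom                        # numerically identical to n / (n + sum(y))
```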

Practice problems

Problem 1 is sourced from Patrick Dickinson, COL 2022. Problems 2 and 3 are past HW questions.

Problem 1 (True/False questions)

  1. There exists an estimator $\hat\theta$ for which $\textrm{MSE}(\hat\theta) = (\textrm{Bias}(\hat\theta))^2$. (See the simulation sketch after this list.)
  2. If $\hat\theta$ is unbiased, then all other estimators are biased.
  3. The sample mean is unbiased under all models where a mean exists.
  4. The squared error loss of an estimator for its estimand is a random variable.
  5. When data comes from a discrete distribution, the likelihood function is also discrete.
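Not a solution to the items above, but a minimal R sketch of the decomposition $\textrm{MSE}(\hat\theta) = (\textrm{Bias}(\hat\theta))^2 + \textrm{Var}(\hat\theta)$ that item 1 turns on; the Normal model and all constants are illustrative assumptions.

```r
# Sketch: empirical check that MSE = Bias^2 + Var for the sample mean
# of n = 10 draws from Normal(mu, 1) (all values illustrative).
set.seed(111)
mu <- 2; n <- 10; reps <- 1e5
est  <- replicate(reps, mean(rnorm(n, mean = mu, sd = 1)))
mse  <- mean((est - mu)^2)
bias <- mean(est) - mu
c(mse = mse, bias_sq_plus_var = bias^2 + var(est))  # should agree closely
```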

Problem 2 (Medical testing)

This problem will be a review of conditioning.

We are trying to estimate the proportion of the Harvard population (staff, graduate students, and undergraduates) that has some unspecified pandemic illness. The administration conducts frequent tests to determine the prevalence and incidence of infection.

(a) Perfect test

Assume that the test is perfect and that whether a person tests positive is distributed i.i.d. Bernoulli with proportion $p$.

Determine the MLE and produce an estimate of $p$ given the observations. Assume the data is in the vector $\vec{x}$ of length $n$.

*Note: Why is i.i.d. Bernoulli a poor choice of model here?*

(b) Imperfect test

The sensitivity of a test is the probability that someone who has the condition tests positive.

The specificity of a test is the probability that someone who does not have the condition tests negative.

Let sensitivity be $a$ and specificity be $b$. Correct the likelihood function to account for error in the test.
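As a pointer toward the correction, note that a random person tests positive either as a true positive or as a false positive. A short R sketch with illustrative values for $a$, $b$, and $p$:

```r
# Sketch: P(test positive) under an imperfect test (all values illustrative).
a <- 0.95                                # sensitivity, P(+ | infected)
b <- 0.90                                # specificity, P(- | not infected)
p <- 0.10                                # true prevalence
p_positive <- a * p + (1 - b) * (1 - p)  # true positives + false positives
p_positive                               # 0.185
```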

Problem 3 (DNA sequence)

A DNA sequence can be composed of four different possible base pairs. For example, consider the following sequence:

CTACCTTCAATTGCTGGAACG
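For reference, a short R sketch tabulating the base counts in this sequence; these counts are the sufficient statistics for the multinomial model in part (a).

```r
# Sketch: count each base in the example sequence.
dna <- "CTACCTTCAATTGCTGGAACG"
table(strsplit(dna, "")[[1]])  # A: 5, C: 6, G: 4, T: 6
```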

(a) Multinomial model

For simplicity, assume that DNA base pairs are selected from a Multinomial distribution (ignore other properties of DNA base pairs).

Let the probabilities corresponding to the base pair selection be represented as $\theta = (p_a, p_c, p_g, p_t)$.

Write the log-likelihood of $\theta$.

(b) Markov chain

We can use a Markov chain model to account for possible violations of independence in the genome. Let us model the sequence as a lag-1 Markov model.

In a lag-1 Markov model, we have $\textrm{Pr}(X_s \mid X_{s - 1}, \dots, X_1) = \textrm{Pr}(X_s \mid X_{s - 1})$.

This model can be represented with a $4 \times 4$ transition matrix:

\[T = \begin{pmatrix} \tau_{aa} & \tau_{ac} & \tau_{ag} & \tau_{at} \\ \tau_{ca} & \tau_{cc} & \tau_{cg} & \tau_{ct} \\ \tau_{ga} & \tau_{gc} & \tau_{gg} & \tau_{gt} \\ \tau_{ta} & \tau_{tc} & \tau_{tg} & \tau_{tt} \end{pmatrix}\]

How many parameters does this matrix have?

Assume that the marginal distribution $\textrm{Pr}(X_1 = x), x \in \{t, c, g, a\}$ is known. Write the log-likelihood of this model.
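A companion R sketch tabulating the lag-1 transitions in the example sequence, which are the sufficient statistics for estimating $T$:

```r
# Sketch: tabulate (from, to) transition counts in the example sequence.
dna <- strsplit("CTACCTTCAATTGCTGGAACG", "")[[1]]
table(from = head(dna, -1), to = tail(dna, -1))
```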

Problem 4 (MLE and MoM)

Emily and her mom buy a Chihuahua from the pet store. The storekeeper, a statistician, mentions that Chihuahua weights can be reasonably modeled as i.i.d. Normal with a standard deviation of one pound. Unfortunately, the storekeeper has forgotten the average weight of a Chihuahua.

(a) MLE

Based on a single observation, $y$, of her new dog’s weight, Emily estimates the mean of the data-generating distribution via maximum likelihood. What estimator does she use? How would Emily estimate $\theta = \mu^2$, where $\mu$ is the mean?

(b) MoM

Emily’s mom prefers the method of moments. Using the first and second moments respectively, she derives estimators for the same two estimands that Emily estimated. What are Mom’s estimators?

(c) MSE

For any estimators that disagree, determine whether Emily or Mom gave the estimator with lower MSE. Can you determine this without actually computing the MSE?

Problem 5 (Gaussian mixture model)

Suppose $Y_1, \dots, Y_n$ are drawn i.i.d. from a Gaussian mixture model, with $n > 1$. In particular, the data come from a standard Normal distribution with probability $\frac{1}{2}$ and otherwise come from a Normal distribution with unknown mean and variance, $\mu \in \mathbb R$ and $\sigma^2 > 0$ respectively.

(a) Likelihood calculations

What is the likelihood? What is the log-likelihood?

(b) Arbitrary likelihood

Show that the likelihood can be made arbitrarily large. Hint: Set $\mu = Y_1$.
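A numerical illustration of the hint in R (the seed, sample size, and grid of $\sigma$ values are illustrative assumptions): fixing $\mu = Y_1$ and shrinking $\sigma$ drives the log-likelihood upward without bound.

```r
# Sketch: with mu = Y_1 fixed, the mixture log-likelihood grows without
# bound as sigma -> 0 (all simulation settings illustrative).
set.seed(111)
y <- rnorm(20)
loglik <- function(mu, sigma) {
  sum(log(0.5 * dnorm(y, 0, 1) + 0.5 * dnorm(y, mu, sigma)))
}
sapply(c(1, 0.1, 0.01, 0.001), function(s) loglik(y[1], s))
```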

(c) Non-arbitrary likelihood

Why would we not achieve arbitrarily large likelihood using the same technique if the model were $Y_i \overset{\text{i.i.d.}}{\sim} \mathcal N(\mu, \sigma^2)$?