Lecture 14 - Post-midterm, regression
09 Mar 2023
Midterm details
- For midterm regrade requests
- Get your physical midterm
- Write your request on the cover
- Turn in to Joe
- Do NOT modify your work
- Violation of Honor Code
- The EV of this is very negative
- Turn in before March 23rd (Thursday)
- Summary statistics of midterm
- Mean 71, SD 19; quartiles (58, 75, 85)
- Go to Joe’s OH to get midterm if you couldn’t pick it up in class
- Solutions posted on Canvas
- Overall course cutoffs
- >85: at least an A-, barring extreme circumstances
- 65-85: B level
- 50-64: C level
- <50: D level
Midterm review
Bird chirps
- Note that the most important regularity condition is violated
- The support depends on the parameter
- Need an indicator (or some other method) to handle parameter-dependent support; see the sketch below
- Don’t put the indicator in the exponent!
- Results in $0^0$, for example
- The indicator should be a scalar factor multiplying the likelihood
- If at least one observation falls outside the support, then that parameter value has likelihood zero and is ruled out
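- An illustration of correct indicator placement (using $\textrm{Unif}(0, \theta)$ as a generic stand-in for parameter-dependent support, not the actual chirps model):
\[L(\theta) = \prod_{j = 1}^n \frac{1}{\theta} I(0 \leq y_j \leq \theta) = \frac{1}{\theta^n} I\left(\max_j y_j \leq \theta\right) \quad \text{(for nonnegative data)}\]
- The indicator multiplies the likelihood as a whole; any $y_j > \theta$ makes $L(\theta) = 0$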
- Density category error
- $Y$ is continuous, so use PDF not probability
- $P(Y = y)$ is not valid!
- Illustrative case: $X \sim \textrm{Expo}(1), Y = 2X$
- $P(Y = y) = P(2X = y) = P(X = y/2) = e^{-y/2}$
- The final step plugs the Expo(1) density in where a probability belongs; each $P$ in the chain is actually $0$ (since $Y$ is continuous), while $e^{-y/2}$ integrates to $2$, so the chain amounts to $0 = 0 = 0 = 2$
- The correct approach works through the CDF
- $P(Y \leq y) = P(X \leq y/2) = 1 - e^{-y/2}$, which correctly goes from $0$ to $1$
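- Differentiating the CDF then recovers the correct density:
\[f_Y(y) = \frac{d}{dy} \left( 1 - e^{-y/2} \right) = \frac{1}{2} e^{-y/2}, \quad y > 0\]
- So $Y \sim \textrm{Expo}(1/2)$, and the density integrates to $1$ as it should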
- A statistic or estimator must be a function of the data only
- “Bigly = big and ugly”, used to describe a mess
- Do not put parameters into statistics
- Confidence interval endpoints must also be statistics (functions of the data, not of unknown parameters)
Gamma CI
- Sanity checks for confidence intervals
- Confidence interval should get narrower as sample size grows
- Confidence interval should depend on data
- Limit category error
- If you take a limit as $n \to \infty$, you should not have $n$ in your expression anymore
- RIP partial credit
- This is for your own good
- Know the difference between the asymptotic distribution and the approximate distribution for a large $n$
- $\sqrt n (\bar Y - \mu) \overset{d}{\to} \mathcal{N}(0, \sigma^2)$ as $n \to \infty$
- Note that there is no $n$ in the RHS
- $\bar Y \dot\sim \mathcal{N}(\mu, \sigma^2 / n)$ given some large $n$
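- A quick simulation sketch (an addition, not from lecture) of the large-$n$ approximation, using Expo(1) data so that $\mu = \sigma^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(111)
n, reps = 100, 10_000  # sample size and number of simulated datasets

# Each row is an i.i.d. Expo(1) sample, so mu = 1 and sigma^2 = 1
samples = rng.exponential(scale=1.0, size=(reps, n))
ybar = samples.mean(axis=1)  # one sample mean per dataset

print(ybar.mean(), ybar.var())  # close to mu = 1 and sigma^2 / n = 0.01
print(np.mean(np.sqrt(n) * (ybar - 1) <= 1.96))  # close to Phi(1.96), about 0.975
```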
Regression
- Basic setting: $Y = \beta_0 + \beta_1 X + \epsilon$
- Let $\epsilon \perp X$, where both are r.v.s
- We often want to consider the conditional distribution $Y \mid X$
- Or $\mathbb{E}[Y \mid X]$
- Quantiles are also well defined
- Logistic regression: special case where $Y \in \{0, 1\}$ (i.e., the data are binary)
- Poisson regression: $Y$ is count data (nonnegative integers)
Gaussian linear regression
- For now, assume $Y$ is Normal (called Gaussian in some courses/contexts)
\[\begin{aligned}
Y | X &\sim \mathcal{N}(\theta_0 + \theta_1 X, \sigma^2) \\
\epsilon &= Y - \theta_0 - \theta_1 X \\
\mathbb{E}[Y|X] &= \theta_0 + \theta_1 X
\end{aligned}\]
- $\epsilon$ is called error (unobservable)
- Not observable w/o info about $\theta_0, \theta_1$
- For plotting, we use the residual (observable)
- Residual: $\hat\epsilon = Y - \hat\theta_0 - \hat\theta_1 X$
- $\hat \theta_0 + \hat \theta_1 X$ is the fitted/predicted value
- Still need to show independence of $X$ and $\epsilon$
- $\epsilon \mid X \sim \mathcal{N}(0, \sigma^2)$
- Conditional distribution does not depend on $X$
- $X$ does not have to be Normal for the assumptions of this model
- For the rest of the problem, assume $Y = \theta X + \epsilon$ (no intercept)
- Estimators
- Assuming no intercept and mean of 0 for both $X, Y$
\[\hat\theta = \frac{
\sum_{j = 1}^n X_j Y_j
}{
\sum_{j = 1}^n X_j^2
}\]
\[\textrm{Cov}(Y, X) = \textrm{Cov}(\theta X, X) + \textrm{Cov}(\epsilon, X) = \theta \textrm{Var}(X) + 0 \implies \theta = \frac{\textrm{Cov}(X, Y)}{\textrm{Var}(X)}\]
- Here $\textrm{Cov}(\epsilon, X) = 0$ because $\epsilon \perp X$
- Plugging in the sample covariance and sample variance gives the estimator
\[\hat\theta = \frac{
\sum_{j = 1}^n (X_j - \bar X)(Y_j - \bar Y)
}{
\sum_{j = 1}^n (X_j - \bar X)^2
}\]
- With the mean-zero assumption ($\bar X = \bar Y = 0$), this reduces to the formula above
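- A minimal numerical check (an addition; data simulated for illustration) that the no-intercept formula agrees with a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(14)
n, theta, sigma = 500, 2.0, 1.0  # illustrative true values

x = rng.normal(size=n)  # X need not be Normal; Normal is just convenient here
y = theta * x + rng.normal(scale=sigma, size=n)  # Y = theta * X + epsilon

# Estimator from the notes: sum(X_j * Y_j) / sum(X_j^2)
theta_hat = np.sum(x * y) / np.sum(x ** 2)

# Same model fit with numpy's least-squares solver (one column, no intercept)
theta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(theta_hat, theta_lstsq)  # the two agree up to floating-point error
```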
Logistic regression
- Bernoulli is special: any binary data is automatically Bernoulli
- Unlike continuous data, which can follow many different distributions (which we then often try to coerce to Normal)
\[\mu(x) = \mathbb{E}[Y | X = x] = P(Y = 1 | X = x)\]
- The linear form $\theta_0 + \theta_1 X$ is not directly usable here because the RHS is not necessarily bounded between $0$ and $1$
- So what functions can we use to constrain this?
- A natural choice is a CDF: $\mu(x) = F(\theta_0 + \theta_1 x)$, where $F$ is some valid CDF
- Most common choice is logistic distribution (logit regression)
- We can also use $\Phi$ (probit regression)
- Results usually similar between logit and probit
- Probit is not in closed form ($\Phi$ has no closed-form expression)
- The logit of $p$ is the natural parameter of the Binomial NEF
\[\textrm{logit}(p) = \log \left( \frac{p}{1 - p} \right), \qquad \textrm{logit}^{-1}(x) = \frac{e^x}{1 + e^x}\]
- The inverse logit is also the sigmoid function
- This is also the logistic CDF
\[P(Y = 1 | X = x) = \frac{e^{\theta x}}{1 + e^{\theta x}}\]
- Likelihood function given as
\[L(\theta) =
\prod_{j = 1}^n
\left(
\frac{e^{\theta x_j}}{1 + e^{\theta x_j}}
\right)^{y_j}
\left(
\frac{1}{1 + e^{\theta x_j}}
\right)^{1 - y_j}\]
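- Taking logs simplifies the product (a standard algebra step, filled in here):
\[\ell(\theta) = \log L(\theta) = \sum_{j = 1}^n \left[ y_j \theta x_j - \log \left( 1 + e^{\theta x_j} \right) \right]\]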
- No closed-form solution for the MLE, so it is found numerically (see the sketch below)
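- A minimal numerical sketch (an addition; simulated data and a generic scipy optimizer, not a method prescribed in lecture):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
n, theta_true = 1000, 1.5  # illustrative true value

x = rng.normal(size=n)
p = 1 / (1 + np.exp(-theta_true * x))  # inverse logit of theta * x
y = rng.binomial(1, p)  # Bernoulli(p_j) responses

def neg_loglik(theta):
    # -l(theta) = -sum_j [y_j * theta * x_j - log(1 + exp(theta * x_j))]
    # np.logaddexp(0, z) computes log(1 + e^z) stably
    return -np.sum(y * theta * x - np.logaddexp(0.0, theta * x))

result = minimize_scalar(neg_loglik)  # 1-D numerical minimization
print(result.x)  # MLE; should land near theta_true = 1.5
```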