# Homework 3. Variational Inference
## 1. Evidence Lower Bound
$\newcommand{\bX}{\mathbf{X}}\newcommand{\by}{\mathbf{y}}\newcommand{\bI}{\mathbf{I}}$
Recall from Lab 8, our example of variational inference for a Bayesian linear regression model. Namely,
$$\begin{align*}
\by | \bX, \beta &\sim N(\bX\beta, \bI_n \sigma^2) \\
\beta &\sim N(0, \bI_p \sigma^2_b).
\end{align*}$$
We assume a mean-field variational family in which $Q$ factorizes as $$Q(\beta) = \prod_{j=1}^p Q_j(\beta_j).$$
### 1.1
Using the parameter definitions for each $Q_j$ derived in Lab 8, derive the *evidence lower bound* (ELBO) for this model.
### 1.2
Starting from the CAVI implementation in Lab 8 for the model above, evaluate the ELBO at each iteration rather than the mean squared error (MSE). The ELBO should *increase* monotonically across iterations; if it does not, there is likely a bug.
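As a point of comparison for your derivation in 1.1, here is a minimal sketch of an ELBO evaluator, assuming (as in a standard mean-field treatment of this model) that each $Q_j = N(\mu_j, s_j^2)$. The argument names are illustrative and should be matched to Lab 8's conventions.

```python
import numpy as np

def elbo(y, X, mu, s2, sigma2, sigma2_b):
    """ELBO for Bayesian linear regression with a mean-field Gaussian
    variational family Q_j = N(mu_j, s2_j). Sketch only; parameter
    names (mu, s2) are assumptions, to be matched against Lab 8."""
    n, p = X.shape
    resid = y - X @ mu
    # E_Q ||y - X beta||^2 = ||y - X mu||^2 + sum_j s2_j ||x_j||^2
    exp_sq_err = resid @ resid + np.sum(s2 * np.sum(X**2, axis=0))
    # E_Q[log p(y | X, beta)]
    log_lik = -0.5 * n * np.log(2 * np.pi * sigma2) - exp_sq_err / (2 * sigma2)
    # E_Q[log p(beta)] under the N(0, sigma2_b I) prior
    log_prior = (-0.5 * p * np.log(2 * np.pi * sigma2_b)
                 - np.sum(mu**2 + s2) / (2 * sigma2_b))
    # entropy of the Gaussian variational factors
    entropy = 0.5 * np.sum(np.log(2 * np.pi * np.e * s2))
    return log_lik + log_prior + entropy
```

Calling this after every CAVI update and asserting that the returned value never decreases is a cheap correctness check on the implementation.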
## 2. Bayesian Linear Regression Pt II
Here we assume a slightly different linear model, which is given by, $$\begin{align*}
\by | \bX, \beta &\sim N(\bX\beta, \bI_n \sigma^2) \\
\beta_j &\sim \text{Laplace}(0, b).
\end{align*}$$
We again assume a mean-field variational family in which $Q$ factorizes as $$Q(\beta) = \prod_{j=1}^p Q_j(\beta_j).$$ Rather than identify the optimal $Q_j$ through CAVI, we will first assume $Q_j := \text{Laplace}(\mu_j, b_j)$. Next, to identify updates for each $\mu_j, b_j$, we take the derivative of the ELBO with respect to each; however, the ELBO is an expectation with respect to $Q$, which itself depends on $\mu_j, b_j$, so we cannot simply differentiate through samples drawn from $Q_j$.
### 2.1
Re-write the ELBO so that the expectation is taken over a parameter-free noise variable, expressing $\beta_j$ as a deterministic transformation of that variable and of $\mu_j, b_j$ via location-scale rules (i.e., the reparameterization trick).
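The location-scale rule being invoked is that if $\epsilon \sim \text{Laplace}(0, 1)$, then $\mu_j + b_j \epsilon \sim \text{Laplace}(\mu_j, b_j)$, so sampling can be moved into a parameter-free noise variable. A sketch of such a reparameterized draw (function name and inverse-CDF sampler are illustrative):

```python
import numpy as np

def sample_beta(mu, b, rng):
    """Reparameterized draw from Laplace(mu, b): beta = mu + b * eps with
    eps ~ Laplace(0, 1) parameter-free noise. Because the transformation is
    deterministic in (mu, b), gradients of a downstream objective can flow
    through it. Sketch only; names here are illustrative."""
    u = rng.uniform(-0.5, 0.5, size=np.shape(mu))
    # inverse-CDF sample of Laplace(0, 1): eps = -sign(u) * log(1 - 2|u|)
    eps = -np.sign(u) * np.log1p(-2 * np.abs(u))
    return mu + b * eps
```

With this substitution the expectation in the ELBO is over $\epsilon$ alone, which no longer depends on $\mu_j, b_j$.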
### 2.2
Implement the above: perform stochastic variational inference, optimizing the reparameterized ELBO by gradient ascent on Monte Carlo samples.
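For orientation, a minimal single-sample stochastic-VI loop might look like the sketch below. It assumes $Q_j = \text{Laplace}(\mu_j, b_j)$ with $b_j = e^{\rho_j}$ to keep the scales positive, and uses the closed-form entropy $1 + \log(2 b_j)$ of each factor (whose $\rho_j$-gradient is simply $1$). All names, step sizes, and the $\exp$ parameterization are illustrative choices, not requirements of the assignment.

```python
import numpy as np

def svi_laplace(y, X, b_prior, sigma2, n_iters=5000, lr=1e-3, seed=0):
    """Stochastic VI sketch for the Laplace-prior linear model using
    single-sample reparameterized gradients. Q_j = Laplace(mu_j, b_j),
    b_j = exp(rho_j). Illustrative only: step size, iteration count, and
    parameterization are assumptions, not Lab 8's choices."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu, rho = np.zeros(p), np.full(p, np.log(0.1))
    for _ in range(n_iters):
        b = np.exp(rho)
        # reparameterized draw: beta = mu + b * eps, eps ~ Laplace(0, 1)
        u = rng.uniform(-0.5, 0.5, size=p)
        eps = -np.sign(u) * np.log1p(-2 * np.abs(u))
        beta = mu + b * eps
        # gradient of the log joint wrt beta (Gaussian likelihood + Laplace prior)
        g_beta = X.T @ (y - X @ beta) / sigma2 - np.sign(beta) / b_prior
        # chain rule: dbeta/dmu = 1, dbeta/drho = b * eps;
        # the entropy term 1 + log(2 b_j) = 1 + log 2 + rho_j contributes +1
        mu += lr * g_beta
        rho += lr * (g_beta * eps * b + 1.0)
    return mu, np.exp(rho)
```

In practice one would average several samples per step and/or use an adaptive optimizer to reduce gradient variance, but the single-sample estimator above is already unbiased.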