# Continuous Random Variables
A continuous random variable $X$ is a random variable that takes values in an *uncountably infinite* sample space. It is defined as having a CDF that can be written as:
$$F(x) = Pr(X \le x) = \int\limits_{-\infty}^x f(a)da,\ \ \forall x \in \mathbb{R}$$
for some *non-negative* function $f(x)$, which is called the **probability density function** (PDF). (It must be non-negative to ensure that the CDF is non-decreasing and never goes below 0.)
In this case, there cannot be a PMF, as individual values of $X$ are not assigned positive probability (in fact, $Pr(X=k) = 0$ for every $k$, because the CDF is continuous). *Ranges* of values, on the other hand, are assigned probabilities, so the CDF is still well defined. In place of a PMF, the probability density function $f(x)$ is used.
## Properties
### Continuous Distribution Function
The CDF of a continuous random variable is by definition continuous. In addition, the conditions of non-negativity, being non-decreasing, and having range $[0,1]$ hold just as they do for discrete random variables.
The PDF can be recovered from the CDF by taking the derivative (wherever $F$ is differentiable):
$$f(x) = \frac{dF(x)}{dx}$$
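As a quick check, a computer algebra system can recover a PDF by differentiating a CDF. A minimal sketch using sympy (the exponential CDF used as the test case is defined later in these notes):

```python
import sympy as sp

x, lam = sp.symbols("x lambda", positive=True)

# CDF of an exponential random variable, valid for x >= 0
F = 1 - sp.exp(-lam * x)

# Differentiating the CDF recovers the PDF
f = sp.diff(F, x)
print(f)  # lambda*exp(-lambda*x)
```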
The probability of a range is not dependent on which ends of the range are included:
$$
\begin{split}
Pr(X \in (x_1, x_2))& = Pr(X \in [x_1, x_2))\\
& = Pr(X \in (x_1, x_2])\\
& = Pr(X \in [x_1, x_2])\\
& = F(x_2) - F(x_1) \\
& = \int\limits_{-\infty}^{x_2} f(x)dx - \int\limits_{-\infty}^{x_1} f(x) dx\\
& = \int\limits_{x_1}^{x_2} f(x)dx
\end{split}
$$
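Numerically, the last identity is easy to verify for any valid PDF. A minimal sketch using scipy, with a toy density $f(x) = 2x$ on $[0,1]$ (a hypothetical choice, not from these notes):

```python
from scipy.integrate import quad

# Toy density on [0, 1]: f(x) = 2x (non-negative, integrates to 1)
def f(x):
    return 2.0 * x

def F(x):
    # CDF: integrate the density from the left edge of the support up to x
    return quad(f, 0.0, min(max(x, 0.0), 1.0))[0]

x1, x2 = 0.25, 0.75
print(F(x2) - F(x1))       # 0.5
print(quad(f, x1, x2)[0])  # 0.5, the same probability
```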
### Probability Density Function
The PDF is, by definition, the density of probability at $x$: for a small $\delta$, $Pr(x \lt X \le x + \delta) \approx f(x)\delta$.
## Continuous random variables vs. discrete
The PMF of a discrete random variable is always guaranteed to be in the range $[0,1]$, but a PDF can be any non-negative value.
The CDF of discrete random variables is not always continuous, whereas the CDF of a CRV is.
When applying a function to a DRV, the PMF carries over directly (summing the probabilities of all events that map to the same value). With a CRV, the transformation also stretches or compresses the range, so the density function itself changes.
## Transformation of CRVs
When converting a CRV $X$ with PDF $f_X(\cdot)$ to $Y = g(X)$, if the preimage $g^{-1}(y)$ is finite or countably infinite (and $g$ is differentiable with $g'(x_i) \neq 0$ at each preimage point $x_i$), then $Y$ is also a CRV and we can define its PDF as:
$$
f_Y(y) = \sum\limits_{x_i: g(x_i) = y} \frac{f_X(x_i)}{|g'(x_i)|}
$$
This is similar to the transformation of a DRV, where we sum up all of the probabilities which map to a specific $y$. However, we also need to account for any stretching of the density range by dividing by the derivative of the transformation function at those points.
In practice, if the function maps $x$ to $y$ one-to-one, then no summation needs to take place, and the derivation of $f_Y$ simplifies to:
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$$
~~Example: Transformations
For a CRV $X$ and $Y = aX + b$ (with $a \neq 0$), $f_Y$ follows from:
$$x = g^{-1}(y) = \frac{y-b}{a}$$
$$g'(x) = a$$
$$f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y-b}{a}\right)$$
Defining $Y = X^2$, we need to sum over both branches of the parabola:
$$g^{-1}(y) = \pm\sqrt{y}$$
$$g'(x) = 2x$$
$$f_Y(y) = \begin{cases}
0 & y \lt 0 \\
\frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y})+f_X(-\sqrt{y})\right) & y \gt 0
\end{cases}$$
~~
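The $Y = X^2$ case can be checked by simulation: sample $X$, square the samples, and compare their histogram to the formula above. A minimal sketch assuming $X$ is standard uniform, so that $f_X(\sqrt{y}) = 1$ and $f_X(-\sqrt{y}) = 0$ on $(0,1)$, giving $f_Y(y) = \frac{1}{2\sqrt{y}}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)  # X ~ Unif(0, 1)
y = x ** 2                                 # Y = X^2

# Empirical density of Y from a histogram
hist, edges = np.histogram(y, bins=50, range=(0.0, 1.0), density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Theoretical density: f_Y(y) = 1 / (2 * sqrt(y)) on (0, 1)
theory = 1.0 / (2.0 * np.sqrt(centers))

# Close agreement away from the integrable spike at y = 0
print(np.max(np.abs(hist[5:] - theory[5:])))
```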
## Families of CRVs
### Uniform
Uniform random variables have a constant density over the range on which they are defined, $[a,b]$, meaning all values in that range are equally likely. Their density function is of the form:
$$
f(x) = \begin{cases}
\frac{1}{b-a} & a \lt x \lt b \\
0 & \textrm{otherwise}
\end{cases}
$$
and CDF:
$$
F(x) = \begin{cases}
0 & x \lt a \\
\frac{x-a}{b-a} & a \le x \lt b \\
1 & b \le x
\end{cases}
$$
Uniform random variables are denoted $\textrm{Unif}(a, b)$, and a *standard uniform variable* is often denoted $U = \textrm{Unif}(0, 1)$.
#### Universality Theorem
For any function $G$ that satisfies the requirements of a CDF, we can define a random variable $X = G^{-1}(U)$, where $U$ is a standard uniform variable. This converts the uniform random variable into one whose CDF is exactly $G$; the new CDF $F_X$ can be found as:
$$
\begin{split}
F_X(x) & = \Pr(X \le x) \\
& = \Pr(G^{-1}(U) \le x) \\
& = \Pr(U \le G(x)) \\
& = G(x)
\end{split}
$$
In the last step, we use the properties of $U$: its PDF is 1 on $[0,1]$, so $\Pr(U \le G(x)) = G(x)$. Essentially, this means that we can construct a continuous variable $X$ with any desired CDF just by finding a transformation from the standard uniform variable to $X$.
This property is useful for generating samples of a random variable on a computer (e.g., in simulation and machine learning), since uniform samples are easy to produce.
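This is exactly how *inverse transform sampling* works: draw $u$ from $\textrm{Unif}(0,1)$ and return $G^{-1}(u)$. A minimal sketch with a numerical inverse (the helper name `sample_from_cdf` and the example CDF $G(x) = x^2$ are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import brentq

def sample_from_cdf(G, lo, hi, n, seed=0):
    """Draw n samples from the distribution with CDF G supported on [lo, hi]."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n)
    # Invert the CDF numerically: find the x where G(x) = u
    return np.array([brentq(lambda x: G(x) - ui, lo, hi) for ui in u])

# Example: G(x) = x^2 on [0, 1], so samples follow the density f(x) = 2x
samples = sample_from_cdf(lambda x: x ** 2, 0.0, 1.0, n=5)
print(samples)
```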
### Exponential
An exponential random variable is a RV with a PDF defined as an exponential function:
$$
f(x) = \begin{cases}
\lambda e^{-\lambda x} & x \ge 0 \\
0 & x \lt 0
\end{cases}
$$
and CDF:
$$
F(x) = \begin{cases}
1- e^{-\lambda x} & x \ge 0 \\
0 & x \lt 0
\end{cases}
$$
This distribution also has the memoryless property:
$$\Pr(X \gt s + t\ |\ X \gt t) = \Pr(X \gt s)$$
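This follows directly from the CDF, since $\Pr(X \gt t) = 1 - F(t) = e^{-\lambda t}$:
$$\Pr(X \gt s + t\ |\ X \gt t) = \frac{\Pr(X \gt s + t)}{\Pr(X \gt t)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} = e^{-\lambda s} = \Pr(X \gt s)$$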
These are denoted $\textrm{Exp}(\lambda)$.
Using the universality theorem, we can also define an exponential RV as:
$$X = -\frac{1}{\lambda}\log(1-U)$$
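A minimal sketch checking this formula against numpy's built-in exponential sampler (comparing the first two sample moments is just a quick diagnostic):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(0.0, 1.0, size=1_000_000)

x = -np.log(1.0 - u) / lam                        # X = -(1/lambda) * log(1 - U)
ref = rng.exponential(1.0 / lam, size=1_000_000)  # numpy parameterizes by the mean

print(x.mean(), ref.mean())  # both close to 1/lambda = 0.5
print(x.var(), ref.var())    # both close to 1/lambda^2 = 0.25
```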
### Gaussian (normal distribution)
A Gaussian random variable has a PDF:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This creates a normal distribution centered at $\mu$ with a standard deviation $\sigma$. It is parameterized by these two variables as $\textrm{Gauss}(\mu,\sigma^2)$. A *standard Gaussian* random variable is $Z = \textrm{Gauss}(0,1)$.
The CDF cannot be expressed in closed form. For the standard Gaussian, it is denoted $\Phi$:
$$F(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{x} e^{-\frac{a^2}{2}}da$$
For a general $\textrm{Gauss}(\mu,\sigma^2)$, the CDF is $\Phi\left(\frac{x-\mu}{\sigma}\right)$.
Transforming a Gaussian random variable is fairly simple: if $X = \textrm{Gauss}(\mu,\sigma^2)$ and $Y = aX+b$, then $Y$ is:
$$Y = \textrm{Gauss}(a\mu+b,a^2\sigma^2)$$
$$f_Y(y) = \frac{1}{|a|}f_X\left( \frac{y-b}{a} \right) = \frac{1}{\sqrt{2\pi a^2\sigma^2}}e^{-\frac{(y-(a\mu+b))^2}{2a^2\sigma^2}}$$
This makes sense, as $\mu$ is just an expectation and $\sigma^2$ is just a variance. It also means that any Gaussian RV can be written using the standard Gaussian variable as:
$$Y=\sigma Z + \mu $$
Recall the CDF of the standard Gaussian $Z$:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{x}e^{-\frac{a^2}{2}}da$$
This is extremely difficult to calculate analytically, but its values have been recorded for $x\ge 0$ as a *z-table*. Importantly, the distribution is symmetric around 0, so values for $x\lt 0$ follow from $\Phi(-x) = 1 - \Phi(x)$. These tabulated values can then easily be adjusted to get CDF values for any Gaussian distribution.
For example:
$$
\begin{align*}
\Pr(Y \in (a,b)) &= \Pr(a\lt Y\lt b) \\
&= \Pr(a \lt \sigma Z + \mu \lt b) \\
&= \Pr\left( \frac{a-\mu}{\sigma} \lt Z \lt \frac{b-\mu}{\sigma} \right) \\
&= \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)
\end{align*}
$$
~~Example: Grading on a Curve
Given a set of scores on a test are distributed as $Y = \sigma Z + \mu$, grades are assigned as:
- A: $Y \in (\mu + \sigma, \infty)$
- B: $Y \in (\mu, \mu + \sigma)$
- C: $Y \in (\mu - \sigma, \mu)$
- D: $Y \in (\mu - 2\sigma, \mu - \sigma)$
- F: $Y \in (-\infty, \mu - 2\sigma)$
~~
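Since every cutoff is expressed in standard-deviation units around $\mu$, the fraction of students receiving each grade depends only on $\Phi$. A minimal sketch using scipy.stats.norm (the bin boundaries come from the example above):

```python
from scipy.stats import norm

inf = float("inf")
# Grade cutoffs in standard-deviation units: Y = mu + k*sigma standardizes to Z = k
cuts = {"A": (1, inf), "B": (0, 1), "C": (-1, 0), "D": (-2, -1), "F": (-inf, -2)}

for grade, (lo, hi) in cuts.items():
    # Pr(lo < Z < hi) = Phi(hi) - Phi(lo)
    print(grade, norm.cdf(hi) - norm.cdf(lo))
# A: 0.1587  B: 0.3413  C: 0.3413  D: 0.1359  F: 0.0228
```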
## Limit Theorems
> Law of Large Numbers
> When taking a large number ($\to \infty$) of random measurements, the average of the measurements will approach the expectation.
> Central Limit Theorem
> When taking a large number $n$ of random measurements, the average of the measurements will behave like a Gaussian variable $\textrm{Gauss}\left(E[X], \frac{\textrm{Var}[X]}{n}\right)$.
These two properties allow us to model measurements in statistics as Gaussian variables with normal distributions, and use that to calculate things like confidence intervals.
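A minimal simulation sketch of both statements, using uniform measurements (the choice of $\textrm{Unif}(0,1)$ is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 10_000

# Each row is one experiment: n Unif(0, 1) measurements, averaged
means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

print(means.mean())  # LLN: close to E[X] = 0.5
print(means.std())   # CLT: close to sqrt(Var[X] / n) = sqrt((1/12) / 1000) ~ 0.0091
```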
Using this property, we can model a binomial variable as a sum of measurements that each take the value 1 or 0. This yields another way of approximating a binomial variable, which is useful when $np(1-p)$ is very large.
$$
Z \approx \frac{X - np}{\sqrt{np(1-p)}}
$$
The probability of a single value can be approximated with the standard $Z$ and a range of width 1 around that value (the *continuity correction*):
$$\Pr(X = x) \approx \Pr\left(x - \tfrac{1}{2} \lt X \lt x + \tfrac{1}{2}\right) = \Pr\left(\frac{x - \frac{1}{2} - np}{\sqrt{np(1-p)}} \lt Z \lt \frac{x + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right)$$
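A minimal sketch comparing the exact binomial PMF to this Gaussian approximation with the continuity correction (using scipy.stats):

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 100, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))

for x in (25, 30, 35):
    exact = binom.pmf(x, n, p)
    # Pr(X = x) ~ Pr(x - 1/2 < Gauss(mu, sd^2) < x + 1/2)
    approx = norm.cdf((x + 0.5 - mu) / sd) - norm.cdf((x - 0.5 - mu) / sd)
    print(x, exact, approx)  # the two columns agree closely
```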
## Expectation
Because of their continuity, the expectation of a continuous random variable is calculated with an integral rather than a sum:
$$
E[X] = \int\limits_{-\infty}^{\infty} xf(x)dx
$$
This is essentially the same definition as with a discrete RV, but changed into a continuous form.
For a transformed variable $Y = g(X)$, the expectation can be computed directly from the original density:
$$
E[Y] = E[g(X)] = \int\limits_{-\infty}^{\infty} g(x)f_X(x)dx
$$
However, the linear property still holds:
$$E[aX + b] = aE[X] + b$$
## Variance
For continuous random variables, variance is defined as:
$$\textrm{Var}[X] = E[(X - \mu)^2] = \int\limits_{-\infty}^{\infty} (x-\mu)^2f(x)dx$$
Similarly to discrete variables:
$$\textrm{Var}[X] = E[X^2] - (E[X])^2$$
$$\textrm{Var}[aX + b] = a^2\textrm{Var}[X]$$
## Expectation and Variance of Specific Random Variables
### Uniform
For a uniform CRV, $\textrm{Unif}(a,b)$, the expectation and variance are defined by:
$$E[X] = \frac{a+b}{2}$$
$$E[X^2] = \frac{a^2 + b^2 + ab}{3}$$
$$\textrm{Var}[X] = \frac{(b-a)^2}{12}$$
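These follow from direct integration of the density $\frac{1}{b-a}$; for example:
$$E[X^2] = \int\limits_a^b \frac{x^2}{b-a}dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + b^2 + ab}{3},\qquad \textrm{Var}[X] = \frac{a^2 + b^2 + ab}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{(b-a)^2}{12}$$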
Note that for the standard uniform variable, the expectation is $\frac{1}{2}$.
### Exponential
For an exponential CRV, $\textrm{Exp}(\lambda)$, the expectation and variance are defined by:
$$E[X] = \frac{1}{\lambda}$$
$$\textrm{Var}[X] = \frac{1}{\lambda^2}$$
### Gaussian
For a standard Gaussian CRV, $\textrm{Gauss}(0,1)$, the expectation is:
$$E[X] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} xe^{-\frac{x^2}{2}}dx = 0$$
This is because the integrand is odd (the PDF is symmetric around the origin). The second-order moment is:
$$E[X^2] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2e^{-\frac{x^2}{2}}dx$$
This requires integration by parts with $u = x$ and $dv = xe^{-\frac{x^2}{2}}dx$ (so that $v = -e^{-\frac{x^2}{2}}$), which reduces it to an integral of the PDF:
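$$E[X^2] = \frac{1}{\sqrt{2\pi}}\left[-xe^{-\frac{x^2}{2}}\right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{\infty} e^{-\frac{x^2}{2}}dx = 0 + 1$$
The boundary term vanishes and the remaining integral is that of the standard Gaussian PDF, which is 1, eventually yielding: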
$$E[X^2] = 1$$
Using this, the variance can be computed as:
$$\textrm{Var}[X] = E[X^2] - (E[X])^2 = 1$$
For a non-standard Gaussian RV, we can use the properties of expectation and variance to derive them from the standard:
$$E[Y] = E[\sigma X + \mu] = \sigma E[X] + \mu = \mu$$
$$\textrm{Var}[Y] = \textrm{Var}[\sigma X + \mu] = \sigma^2 \textrm{Var}[X] = \sigma^2$$
One interesting property of the Gaussian variable is that its distribution is *fully* described by the expectation and variance.
## Cauchy RVs
An interesting random variable is the Cauchy random variable, which has a PDF defined as:
$$
f_X(x) = \frac{\lambda}{\pi(\lambda^2 + x^2)}
$$
This RV is interesting because it has no expectation or higher-order moments. In addition, the central limit theorem does not hold for a Cauchy distribution, as the theorem assumes the RV has a finite expectation and variance.
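A minimal simulation sketch of this failure: running sample means of standard Cauchy draws do not settle down as the sample size grows, in contrast to the law of large numbers above (numpy's standard_cauchy corresponds to $\lambda = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Running sample means of standard Cauchy draws (lambda = 1)
samples = rng.standard_cauchy(size=1_000_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

# With a finite expectation these would converge; here they keep jumping around
print(running_mean[[999, 9_999, 99_999, 999_999]])
```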