In Lecture 13, we talked about why the normal distribution is useful and how to use it to calculate probabilities of events of interest. This was the last lecture in the probability section; after it, we moved on to the statistics section.
Normal (Gaussian) Distributions
We have already come across some basics of the normal distribution. Its PDF is:
$$f_X(x)=\cfrac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Expectation & Variance
As always, whenever we are given a PDF or PMF, we need to check that it describes a valid probabilistic model. On top of that, it is good to know the expectation and variance of the distribution. We can show that:
$$ \begin{aligned} \int_{-\infty}^{\infty}\cfrac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} \mathrm{d}x &= 1 \\[10pt] \mathbb{E}[X] &= \mu\\[10pt] \mathbb{V}\textmd{ar}(X) & = \sigma^2 \end{aligned} $$
The proofs are a bit complicated and unnecessary to show during the lecture, so I put them in the first Extra Reading Material. You can have a look if you are interested.
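If you do not want to go through the proofs, a quick numerical sanity check is also possible. The sketch below is only an illustration, not a proof: the helper names `normal_pdf` and `integrate`, the parameter values $\mu=1.5$ and $\sigma=2$, and the choice of a midpoint rule on $[\mu-12\sigma,\ \mu+12\sigma]$ are all my own assumptions.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (the PDF given at the top of this lecture)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Arbitrary illustrative parameters (assumptions, not from the lecture).
mu, sigma = 1.5, 2.0
# The density beyond 12 standard deviations is negligible, so we truncate there.
a, b = mu - 12 * sigma, mu + 12 * sigma

total = integrate(lambda x: normal_pdf(x, mu, sigma), a, b)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), a, b)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), a, b)

print(f"integral of the PDF: {total:.6f}")
print(f"expectation:         {mean:.6f}")
print(f"variance:            {var:.6f}")
```

The three printed numbers should be close to $1$, $\mu$ and $\sigma^2$ respectively, up to the small error of the numerical integration.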
Normal Probability Calculation
Nowadays there are many calculators and computer programs to help us calculate normal probabilities. The second Extra Reading Material has a detailed explanation of how to use the functions in spreadsheet software, such as Microsoft Excel and LibreOffice Calc, to calculate normal densities and probabilities. You need to know how to use the function NORM.DIST. It will be very helpful for finishing your homework and exams. In the old days, people used the standard normal table to look up probabilities from the standard normal distribution $\mathcal{N}(0,1)$. We will demonstrate how to use the table, but we are not going to spend too much time on it.
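If you prefer a programming language to a spreadsheet, the same numbers are easy to get there too. Here is a minimal Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later). The wrapper `norm_dist` is a hypothetical helper of mine that only imitates the argument order of NORM.DIST(x, mean, standard_dev, cumulative); it is not part of any spreadsheet or library.

```python
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative):
    """Hypothetical helper mimicking NORM.DIST(x, mean, standard_dev, cumulative).

    Returns the lower-tail probability P(X <= x) when `cumulative` is truthy,
    and the density f_X(x) otherwise.
    """
    dist = NormalDist(mu=mean, sigma=standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

# Lower-tail probability of the standard normal at x = 1.96.
print(norm_dist(1.96, 0, 1, True))
# Density of the standard normal at x = 0, i.e. 1 / sqrt(2 * pi).
print(norm_dist(0, 0, 1, False))
```

As in the spreadsheet function, the last argument switches between the cumulative probability and the density.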
The First Standard Normal Table
You can easily find many different versions of the standard normal table on the internet. Make sure you read the description and be careful about what the numbers actually represent. Some tables show the lower-tail probabilities that are basically the cumulative probabilities:
$$F_X(x) = \mathbb{P}(X \leqslant x) = \int_{-\infty}^{x} \cfrac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}} \mathrm{d}t$$
Other tables show the upper-tail probabilities:
$$1-F_X(x) = \mathbb{P}(X > x) = \int_{x}^{\infty} \cfrac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}} \mathrm{d}t$$
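To make the difference between the two kinds of table concrete, here is a small Python sketch that computes both tails for a few values of $x$. It relies on the standard identity $\mathbb{P}(X > x) = \frac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right)$ for $X \sim \mathcal{N}(0,1)$; the function names `lower_tail` and `upper_tail` are my own, chosen only for illustration.

```python
import math

def lower_tail(x):
    """P(X <= x) for X ~ N(0, 1): what a lower-tail table lists."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_tail(x):
    """P(X > x) for X ~ N(0, 1): what an upper-tail table lists."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (0.0, 1.0, 1.96):
    lo, up = lower_tail(x), upper_tail(x)
    # The two entries for the same x always add up to 1.
    print(f"x = {x:4.2f}   lower = {lo:.4f}   upper = {up:.4f}   sum = {lo + up:.4f}")
```

If a number you read from a table does not match what you expect, checking which of the two columns it corresponds to is usually the first thing to try.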
Have you ever wondered who actually created the first standard normal table? I put some interesting material in the References section if you are curious. People tend to think that the first table related to the standard normal distribution came from Christian Kramp, a mathematician who worked on astronomical refraction. Kramp did not actually construct the standard normal table that many people use today. Instead, his table contains values of the integral $\int_{x}^{\infty}e^{-t^2}\mathrm{d}t$. There is a very nice paper describing roughly how Kramp constructed the table using the Taylor expansion, and you can find it in the References.
Out of curiosity, let’s see how accurate Kramp’s table is by comparing it with the results from the NORM.DIST(x, 0, 1, 1) function in a spreadsheet program (I’m using Microsoft Excel). Before we do the check, some extra work needs to be done. Remember that NORM.DIST(x, 0, 1, 1) returns the following probability:
$$\int_{-\infty}^{x}\cfrac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\mathrm{d}y$$
where $y$ is the dummy variable. However, Kramp’s table shows the following probability:
$$\int_{x}^{\infty}e^{-t^2}\mathrm{d}t$$
where $t$ is the dummy variable. Therefore, we need a change of variables. Let $t = \frac{y}{\sqrt{2}}$, then $\mathrm{d}t = \frac{1}{\sqrt{2}}\mathrm{d}y$. When $t=x$, we have $y=\sqrt{2}x$; when $t \rightarrow \infty$, we have $y \rightarrow \infty$. Now we can rewrite Kramp’s integral as follows:
$$ \begin{aligned} \int_{x}^{\infty}e^{-t^2}\mathrm{d}t &= \int_{\sqrt{2}x}^{\infty}e^{-\frac{y^2}{2}} \cfrac{1}{\sqrt{2}}\,\mathrm{d}y = \cfrac{1}{\sqrt{2}} \int_{\sqrt{2}x}^{\infty}e^{-\frac{y^2}{2}}\,\mathrm{d}y\\[15pt] &= \cfrac{\sqrt{2\pi}}{\sqrt{2}} \cdot \cfrac{1}{\sqrt{2\pi}} \int_{\sqrt{2}x}^{\infty}e^{-\frac{y^2}{2}}\,\mathrm{d}y = \sqrt{\pi}\int_{\sqrt{2}x}^{\infty}\cfrac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,\mathrm{d}y \\[15pt] &= \sqrt{\pi} \left( 1 - {\color{blue} \int_{-\infty}^{\sqrt{2}x}\cfrac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,\mathrm{d}y } \right) \end{aligned} $$
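Before trusting the algebra, we can check the identity numerically. The sketch below compares a direct numerical integration of $e^{-t^2}$ with the right-hand side $\sqrt{\pi}\,\bigl(1-\Phi(\sqrt{2}x)\bigr)$, where $\Phi$ is the standard normal CDF. The function names, the midpoint rule and the cut-off of the upper limit at $t=10$ (beyond which $e^{-t^2}$ is negligible) are my own assumptions for illustration.

```python
import math
from statistics import NormalDist

def kramp_integral_numeric(x, upper=10.0, n=100_000):
    """Midpoint-rule approximation of the integral of exp(-t^2) over [x, upper]."""
    h = (upper - x) / n
    return h * sum(math.exp(-((x + (i + 0.5) * h) ** 2)) for i in range(n))

def kramp_integral_via_cdf(x):
    """Right-hand side of the derivation: sqrt(pi) * (1 - Phi(sqrt(2) * x))."""
    y = math.sqrt(2.0) * x
    return math.sqrt(math.pi) * (1.0 - NormalDist().cdf(y))

x = 0.76
print(f"direct integration: {kramp_integral_numeric(x):.9f}")
print(f"via the normal CDF: {kramp_integral_via_cdf(x):.9f}")
```

The two printed values should agree up to the small error of the numerical integration.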
Note that the blue part is the value of NORM.DIST(y, 0, 1, 1) evaluated at $y=\sqrt{2}x$. That is what we should compare against. Using the top 10 rows of the middle page from the lecture slide:
$x$ | $y=\sqrt{2}x$ | $[1 - \textmd{NORM.DIST}(y, 0, 1, 1)] \times \sqrt{\pi}$ | Kramp’s value |
---|---|---|---|
$0.76$ | $1.074802307$ | $0.250326533$ | $0.25032654$ |
$0.77$ | $1.088944443$ | $0.244756720$ | $0.24475673$ |
$0.78$ | $1.103086579$ | $0.239272023$ | $0.23927203$ |
$0.79$ | $1.117228714$ | $0.233872223$ | $0.23387223$ |
$0.80$ | $1.131370850$ | $0.228557067$ | $0.22855708$ |
$0.81$ | $1.145512986$ | $0.223326276$ | $0.22332629$ |
$0.82$ | $1.159655121$ | $0.218179539$ | $0.21817955$ |
$0.83$ | $1.173797257$ | $0.213116520$ | $0.21311653$ |
$0.84$ | $1.187939392$ | $0.208136851$ | $0.20813686$ |
$0.85$ | $1.202081528$ | $0.203240140$ | $0.20324015$ |
As you can see, the two columns agree to within about one unit in the eighth decimal place, which is amazing.
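The same comparison can be repeated without a spreadsheet. The Python sketch below uses the equivalent identity $\int_{x}^{\infty}e^{-t^2}\mathrm{d}t = \frac{\sqrt{\pi}}{2}\,\mathrm{erfc}(x)$ instead of NORM.DIST, so it is a swapped-in technique rather than the spreadsheet calculation shown in the table. Kramp’s values are copied from the last column of the table above; the names `kramp_rows` and `kramp_integral` are my own.

```python
import math

# (table argument, Kramp's tabulated value), copied from the table above.
kramp_rows = [
    (0.76, 0.25032654),
    (0.77, 0.24475673),
    (0.78, 0.23927203),
    (0.79, 0.23387223),
    (0.80, 0.22855708),
    (0.81, 0.22332629),
    (0.82, 0.21817955),
    (0.83, 0.21311653),
    (0.84, 0.20813686),
    (0.85, 0.20324015),
]

def kramp_integral(x):
    """Integral of exp(-t^2) from x to infinity, written as sqrt(pi)/2 * erfc(x)."""
    return 0.5 * math.sqrt(math.pi) * math.erfc(x)

diffs = []
for x, kramp_value in kramp_rows:
    computed = kramp_integral(x)
    diffs.append(abs(computed - kramp_value))
    print(f"{x:.2f}   computed = {computed:.9f}   Kramp = {kramp_value:.8f}")

max_diff = max(diffs)
print(f"largest absolute difference: {max_diff:.1e}")
```

Using `math.erfc` directly avoids the subtraction $1-\Phi(y)$, which loses precision when the tail probability becomes very small.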