# ProbabilityCommon Distributions

There are several specific probability distributions which arise commonly in real-world applications. Learning the properties of these distributions and how those properties lead to their frequent appearance can help us build random models and reason about properties about random systems that involve these distributions. In this section, we will explore several such distributions.

## Bernoulli distribution

Suppose we conduct an experiment with exactly two outcomes, which we will encode as 0 and 1. For example, consider the following scenarios

- You flip a coin and it comes up heads (1) or tails (0)
- Someone's position on a political issue is either positive (1) or negative (0)}
- Someone can either be healthy (1) or sick (0)
- In an online survey, a user answers either true (1) or false (0)

The distribution of the result of such an experiment is governed by a single parameter , which is the probability of the outcome encoded as 1. The probability of the other outcome is , since one of the two outcomes must occur. It is customary to think of the outcomes 1 and 0 as success and failure, respectively, in which case may be referred to as the success probability. A sequence of independent Bernoulli random variables with the same success probability is referred to as a sequence of **Bernoulli trials**.

We write to mean that is Bernoulli distributed with success probability . The expected value of a random variable is

and its variance is

**Exercise**

Consider a sum of 10 independent Bernoulli random variables with success probability .

- Find the mean and variance of .
- Find the value of which maximizes . Hint: write down an expression for and then use Julia to find its maximum value.

*Solution.*

- For let then Therefore,

and

- Observe that for we have if and only if there are successes. Now, there are ways in which we can have success and each of them occurs with probability , by independence. Therefore

We can use Julia's built-in `binomial`

function and an array comprehension as follows

maximum([binomial(10, k)*0.36^k*(1 - 0.36)^(10 - k) for k in 0:10])

to find that the value of that maximizes is and the maximum is approximately

## The binomial distribution

**Example**

What is the probability of rolling exactly 18 sixes in 100 independent rolls of a fair die?

*Solution.* There are many ways to roll 18 sixes. We could roll 18 sixes followed by 82 non-sixes, and that happens with probability

by independence. Similarly, the probability of rolling 2 non-sixes, then 9 sixes, then 14 non-sixes, then 9 more sixes, and finally 66 non-sixes also has probability given by . In fact, for every choice of 18 positions in which the sixes may fall, there is a an outcome with exactly 18 sixes whose probability is . Since there are of these outcomes, the probability that one of them occurs is

Generally, independent trials with success probability will lead to total successes with probability

This distribution is called the **binomial distribution** and is denoted .

**Exercise****Stirling's approximation** allows us to more easily manipulate factorial expressions algebraically. It says that

Suppose that is even and that . Use Stirling's approximation to show that times the probability mass assigned to 0 by the distribution converges to a finite, positive constant as . Find the value of this constant.

*Solution.* Let be the probability mass at 0. Substituting Stirling's approximation for the factorial expressions in tells us that

as . Simplifying the big mess of an expression on the left hand side tells us that as . Therefore, as .

## Geometric distribution

The **geometric distribution** with parameter is the distribution of the index of the first success in a sequence of independent Bernoulli trials.

The probability that the first success occurs on trial is equal to the probability that the first trials fail and the th trial succeeds. The probability of this event is . Therefore, the probability mass function of the geometric distribution is

**Exercise**

Use Monte Carlo to find the mean and variance of the geometric distribution with parameter .

Hint: you can sample from the geometric distribution using the definition: count the number of times you have to run `rand(Uniform(0, 1))`

until you get a result less than .

*Solution.* Here's an example solution:

using Statistics, Distributions function sample_geometric(p) k = 1 while true if rand(Uniform(0, 1)) < p return k else k += 1 end end end samples = [sample_geometric(1/3) for i=1:1_000_000] m = mean(samples) σ² = mean(x^2 for x in samples) - m^2 (m, σ²)

The pair returned by this block is very close to , leading us to conjecture that the mean and variance are 3 and 6, respectively.

Note: the superscript of 2 is part of the variable name. You can get this symbol at the Julia prompt using `\^2«tab»`

We can use Taylor series to work out exact expressions for the mean and variance. The mean is equal to

and we recognize all the terms except the first as times the derivative of

By the formula for the sum of a geometric series, this expression is equal to

and so the mean of the geometric distribution is

The variance can be worked in a similar but more tedious way, and the result is

These expressions do indeed evaluate to 3 and 6, respectively, when is substituted.

**Exercise**

Suppose that is geometric with success probability , and consider the random variable . What is the expected value of ?

*Solution.* The random variable is equal to with probability , for all positive integers . Therefore, the expected value is

So has infinite mean.

**Exercise**

Explain why `ceil(log(rand())/log(1-p))`

returns a random variable whose distribution is geometric with success probability .

*Solution.* Let define the ceiling function on The question is asking to show that if is uniformly distributed in , then

is geometrically distributed with success probability .

This is true because of the inverse cdf trick of Exercise . To show that this is indeed the case, it suffices to show that if is the cdf of a geometrically distributed random variable with success probability then the generalized inverse of is

for all Now, let be the cdf of a geometric random variable with success probability and denote the floor function on Then

where the last line follows from evaluating the geometric sum. The jumps in clearly occur at positive integer values. Therefore, if we let we find that the generalized inverse of is given by

for all But if then because Therefore, for all

Now if is uniformly distributed in then is also uniformly distributed in so is indeed geometrically distributed with success probability

**Exercise**

Every time you visit your favorite restaurant, you choose a meal uniformly at random from the 10 available meals. How many visits will it take on average before you've tried all 10 meals?

Hint: try letting be the number of visits from the time you try the th unique meal to the time when you try the st unique meal.

*Solution.* For let be the number of visits it takes to try the th unique meal after trying the th unique meal. Then the number of visits it takes to try all the meals is Now, for any non-negative integer $X_k = n$ if all the previous visits yielded the meals that have already been tried. Because the meals are chosen independently and uniformly at random, we find that

for all and any non-negative integer For notational simplicity, let Then

for all Now, as we recall from elementary calculus, the term-by-term differentiation theorem gives

Therefore,

for all and thus

We find that, on average, about visits are needed to try all the different meals.

## Poisson Distribution

The *Poisson distribution* arises as the number of 1's observed in a large number of low-probability Bernoulli random variables. This situation models a surprising variety of real-world scenarios:

- The number of calls received at a call center in a given hour. Each potential caller has a low probability of calling during that particular hour, and there are many potential callers who are acting independently.
- The number of meteorites which strike earth in a given year. There are many meteorites which might hit earth, and each one does so with low probability.
- The number of mutations on a strand of DNA. Each mutation occurs with low probability, but there are many potential sites for a mutation.
- The number of claims filed with an insurance company in a given month. There are many customers, and they file claims independently and with low probability each month.

**Exercise**

- Find the expected value of , where is a sum of 1000 independent Bernoulli random variables with success probability .
- Find the probability mass function of . Hint: find an expression representing the probability mass at each from 0 to 1000, and then use Julia to evaluate it. You will need to define
`n = big(1000)`

and`p = big(3)/1000`

because arbitrary precision arithmetic is required to avoid overflow issues. - Compare your results to the probability mass function defined on .

*Solution.* (i) The expected value of each Bernoulli random variable is , so by linearity of expectation the expected value of is .

(ii) Consider all possible length-1000 strings of 0's or 1's. Of these, there are with ones and zeros, and each of those strings has a probability of of being the result of independent sequence of random variables (where ). Therefore, the probability of the event is . We can obtain a vector of these probabilities as follows:

n = big(1000) p = big(3)/1000 massfunction = [binomial(n,k)*p^k*(1-p)^(n-k) for k=0:1000]

(iii) We can run `[3^big(k)/factorial(big(k))*exp(-3) for k=0:1000]`

to get the first 1001 values of the given probability mass function. We see that the values are quite similar. The first ten pairs of values are

```
(0.0495631, 0.0497871)
(0.149137, 0.149361)
(0.224154, 0.224042)
(0.224379, 0.224042)
(0.168284, 0.168031)
(0.100869, 0.100819)
(0.0503334, 0.0504094)
(0.0215065, 0.021604)
(0.00803259, 0.00810151)
(0.0026641, 0.0027005)
```

Inspired by this exercise, we make the following definition:

**Definition** (Poisson distribution)

The Poisson distribution with mean is the distribution whose probability mass function is

The expression in the definition of the Poisson distribution arises as a limit of the expression

In other words, we use a success probability of so that the expected number of successes remains constant as .

The connection between the Poisson and Bernoulli random variables may be used to obtain the mean and variance of the Poisson distribution. The average number of successes in Bernoulli() trials is , by linearity of expectation. Therefore, we expect that the mean of a Poisson random variable with parameter is equal to . Similarly, the variance of the number of successes in Bernoulli( ) trials is equal to . Taking , we predict that the variance of a Poisson random variable with parameter is also equal to . Both of these predictions are accurate:

**Theorem**

The mean and variance of a Poisson random variable with parameter are and , respectively.

**Exercise**

Suppose that the number of typos on a page is a Poisson random variable with mean .

- Provide an explanation for why the Poisson distribution might be a good approximation for the distribution of typos on a page.
- Find the probability that a particular page is typo-free.

*Solution.* (i) A typo opportunities on a page convert to actual typos with a small but roughly constant probability, there are quite a few of them, and different typos are (roughly) independent of one another. Thus the number of typos is a sum of independent Bernoulli random variables. (ii) The probability that a Poisson random variable with parameter is equal to 0 is

## Exponential distribution

The exponential distribution also emerges as a limit involving Bernoulli random variables: imagine placing a light bulbs activated by independent random variables at every multiple of on the positive real number line. Consider the position of the **leftmost lit bulb**. The probability that it occurs to the right of a point is equal to the probability that all of the bulbs to the left remain unlit:

This probability converges to as .

**Definition** (Exponential distribution)

Let . The exponential distribution with parameter is the probability measure on which assigns mass to the interval , for all .

Equivalently, the exponential distribution with parameter is the probability measure whose density is

**Exercise**

Find the mean of the exponential distribution with parameter .

*Solution.* We calculate

**Exercise**

Suppose that

*Solution.* Observing that

as required.

## Cauchy distribution

The Cauchy distribution spreads probability mass *way* out on the real number line.

**Definition** (Cauchy distribution)

The **Cauchy distribution** is the probability measure on

The amount of probability mass assigned by the Cauchy distribution to the interval

This mass goes to 0 so slowly that the Cauchy distribution doesn't even have a well-defined mean, let alone a variance. We say that the Cauchy distribution is **heavy-tailed**, and we will use it as an example when we want to study the effects of heavy tails on results like the law of large numbers or the central limit theorem.

**Exercise**

Show that the mean of the Cauchy distribution is not well-defined.

*Solution.* Let

Therefore

**Exercise**

Choose

*Solution.* Let

Then for all

Since

Now, by the fundamental theorem of calculus, we know that if

for all