Probability Approximation

Probability Level 5

"A jar contains a large number ( $n$ ) of balls. $m$ of these balls are red. We draw a relatively small number of balls: $k << m$ . What is the probability of drawing only red balls?"

The exact answer to this question is $\mathbb P = \binom m k / \binom n k.$ To save calculation work, and using $k << m$ , we may use the simple approximation $\mathbb P_1 \approx \left(\frac \mu \nu\right)^k$ with $\mu = m - \tfrac12(k-1)$ and $\nu = n - \tfrac12(k-1)$ . This approximation is not bad, but it can be improved by the following "correction": $\mathbb P_2 \approx \mathbb P_1 \cdot \left(1 - \frac 1 c \left(\frac 1{\mu^2} - \frac 1{\nu^2}\right)k(k^2-1)\right).$ How great is the integer $c$ ?

The answer is 24.

1 solution

Arjen Vreugdenhil
Jul 8, 2019

Assuming that $k$ is odd, let $k = 2\delta + 1$ and start with $\mathbb P = \frac{(\mu + \delta)(\mu + \delta -1)\cdots(\mu+1)\mu(\mu-1)\cdots(\mu-\delta)}{(\nu + \delta)(\nu + \delta -1)\cdots(\nu+1)\nu(\nu-1)\cdots(\nu-\delta)}.$ Combining positives and negatives, this becomes $= \frac{(\mu^2 - \delta^2)(\mu^2 - (\delta-1)^2)\cdots(\mu^2 - 1^2) \mu}{(\nu^2 - \delta^2)(\nu^2 - (\delta-1)^2)\cdots(\nu^2 - 1^2) \nu}.$ $= \frac{\mu^k - (\delta^2 + (\delta-1)^2 + \cdots + 1^2)\mu^{k-2} + O(\mu^{k-4})}{\nu^k - (\delta^2 + (\delta-1)^2 + \cdots + 1^2)\nu^{k-2} + O(\nu^{k-4})}$ $= \left(\frac{\mu}{\nu}\right)^k\left(\frac{1 - C \mu^{-2} + O(\mu^{-4})}{1 - C \nu^{-2} + O(\nu^{-4})}\right),$ with $C := \delta^2 + (\delta-1)^2 + \cdots + 1^2 = \frac{\delta(\delta+1)(2\delta+1)}6 = \frac{\tfrac12(k-1)\cdot\tfrac12(k+1)\cdot k}6 = \frac{k(k^2-1)}{24}.$

Now for the approximation, we drop the $O(\cdot)$ terms and use $1/(1-x) \approx 1+x$ for sufficiently small $x$ ; thus $\mathbb P \approx \left(\frac{\mu}{\nu}\right)^k\left(1 - C \left(\frac 1{\mu^2} - \frac 1{\nu^2}\right) \right).$ This is almost the desired expression for $\mathbb P_2$ ; we see that $C = \frac 1 c\cdot k(k^2-1) = \frac{k(k^2-1)}{24},$ showing that $c = \boxed{24}$ .

An alternative solution is simply by trying the formula for a typical case, say $n = 102$ , $m = 52$ , and $k = 5$ . On one hand, $\mathbb P = \frac{52\cdot 51\cdot 50\cdot 49\cdot 48}{102\cdot 101\cdot 100\cdot 99\cdot 98} = \frac{52\cdot 48}{2\cdot 101\cdot 2\cdot 99 \cdot 2} =\frac{2^3\cdot 13}{3\cdot 11\cdot 101} = \frac{104}{3333};$ on the other hand, with $\nu = 100$ , $\mu = 50$ , $k = 5$ , we have $\mathbb P \approx \left(\frac{50}{100}\right)^5\cdot\left(1 - \frac 1 c\left(\frac{1}{50^2} - \frac{1}{100^2}\right)\cdot 5\cdot (5^2-1)\right),$ $\frac{104}{3333} \approx \frac{1}{32}\cdot\left(1 - \frac 1c \cdot \frac{3}{10\:000}\cdot 120\right) = \frac{250 - 9/c}{8000};$ the solution is $c = \frac{3333\cdot 9}{3333\cdot 250 - 104\cdot 8000} = \frac{29\:997}{1250} \approx \frac{30\:000}{10\:000/8} = \boxed{24}.$

Extension: What is the probability that, when drawing $k+\ell << m$ balls, precisely $k$ of them are red and $\ell$ aren't?

First approximation: $\mathbb P_1 = \frac{\kappa^k \lambda^\ell}{\nu^{k+\ell}}\ \binom k \ell,$ with $\kappa = m - \tfrac12(k-1)$ , $\lambda = (n-m) - \tfrac12(\ell-1)$ , $\nu = n - \tfrac12(k+\ell-1)$ ;

second approximation: $\mathbb P_2 = \mathbb P_1\cdot \left(1 - \frac{1}{24}\left( \frac{k(k^2-1)}{\kappa^2} + \frac{\ell(\ell^2-1)}{\lambda^2} - \frac{(k+\ell)([k+\ell]^2-1)}{\nu^2}\right)\right).$

See how well this approximation works with $n = 100$ , $m = 60$ , $k = 5$ , $\ell =15$ : $\mathbb P = \frac{\binom{60}{5} \binom{40}{15}}{\binom{100}{20}} = 4.09884\cdot 10^{-4};$ $\mathbb P_1 = \frac{58^5\cdot 33^{15}}{(90\tfrac12)^{20}}\binom{20}5 = 4.49077\cdot 10^{-4};$ $\mathbb P_2 = \mathbb P_1\cdot (1-0.08945) = 4.08909\cdot 10^{-4}.$ The first approximation is nearly 10% off; the second, less than $\tfrac14$ %.

Arjen Vreugdenhil - 1 year, 11 months ago

Thing is, with numbers this small, you can just work out the probabilities as you did. I would argue that what matters is how well the approximation works when the numbers are too big to do this.

Joe Mansley - 1 year, 11 months ago

Try with $n = 3000$ , $m = 1000$ , $\ell =100$ , $k = 200$ : you get $\mathbb P_1 = 1.13223\cdot 10^{-35}$ , $\mathbb P_2 = 0.811186\cdot 10^{-35}$ . The exact value is $\mathbb P = 0.851482\cdot 10^{-35}$ , an error of approx. 5%.

Arjen Vreugdenhil - 1 year, 11 months ago

Probability Approximation

The answer is 24.

1 solution

0 pending reports