The Rich Get Richer

Probability Level 5

A bag initially contains one red ball and one blue ball. You select one of these balls at random and then place two balls of that same color back in the bag. For instance, if you drew a red ball, then you would put two red balls back into the bag so that the bag would contain two red balls and one blue bag. You repeat this until the bag contains 2016 balls. For the color that has the most balls present in the bag, what is the expected number of balls of this color? That is, after the bag contains 2016 balls, you open it and choose the color that has more balls (or choose either color in case of a tie); what is the expected number of balls of this color?

If the answer can be expressed as $\dfrac{p}{q}$ , where $p$ and $q$ are coprime positive integers, enter $p+q$ as your answer.

The answer is 3048191.

3 solutions

Ivan Koswara
Jan 14, 2016

Relevant wiki: Expected Value - Problem Solving

The answer is $\frac{3046176}{2015}$ , giving $p+q = 3048191$ .

We will prove the following surprising claim:

Claim : If the bag has $n$ balls, then the probability that it has $r$ red balls (and thus $n-r$ blue balls) is exactly $\frac{1}{n-1}$ , for all $n \ge 2, 1 \le r \le n-1$ .

We will prove this claim by induction on $n$ . For $n = 2$ , this is trivial (the bag starts with 1 red ball and 1 blue ball, so the probability that it has 1 red ball is $1 = \frac{1}{2-1}$ ).

Now, suppose we have proven the claim for $n = n_0$ , and we'd like to prove the claim for $n = n_0 + 1$ .

Fix $r$ with $1 \le r \le n_0$ . We want to compute the probability that when we have $n_0 + 1$ balls, we have $r$ red balls. Consider the last ball added into the bag. There are two cases:

Case 1 : This ball was red, so the previous bag has $r-1$ red balls out of $n_0$ balls. If $r \ge 2$ , the probability of this happening is $\frac{1}{n_0 - 1} \cdot \frac{r-1}{n_0}$ : the probability that we do have $r-1$ red balls when we have $n_0$ balls (inductive hypothesis), multiplied by the actual probability of taking a red ball in this case. If $r = 1$ , the probability is zero since we cannot have no red balls in the bag (the original red ball must always remain in), but this can also be represented as $\frac{1}{n_0-1} \cdot \frac{r-1}{n_0}$ because the second term is zero. So the probability that a red ball is taken to make our bag to have $r$ red balls out of $n_0 + 1$ balls is $\frac{1}{n_0 - 1} \cdot \frac{r - 1}{n_0}$ .

Case 2 : This ball was blue, so the previous bag has $r$ red balls (and thus $n_0 - r$ blue balls) out of $n_0$ balls. This case is completely analogous with the above; the probability of this case happening is $\frac{1}{n_0 - 1} \cdot \frac{n_0 - r}{n_0}$ .

In total, the probability that we have $r$ red balls out of $n_0 + 1$ balls is simply the sum of the probabilities contributed by the two cases above; that is, $\frac{1}{n_0 - 1} \cdot \frac{r - 1}{n_0} + \frac{1}{n_0 - 1} \cdot \frac{n_0 - r}{n_0} = \frac{1}{n_0}$ . This completes the proof.

Now that we have proven this, we can compute the expected value easily:

$\displaystyle\begin{aligned} E(\text{number of majority balls}) &= \sum_{r=1}^{2015} P(\text{\$r\$ red balls}) \cdot \text{number of majority balls} \\ &= \sum_{r=1}^{2015} \frac{1}{2015} \cdot \max \{r, 2016-r\} \\ &= \frac{1}{2015} \cdot \left( \sum_{r=1}^{2015} \max \{r, 2016-r\} \right) \\ &= \frac{1}{2015} \cdot \left( \sum_{r=1}^{1007} (2016-r) + \sum_{r=1008}^{2015} r \right) \\ &= \frac{1}{2015} \cdot \left( \sum_{r=1}^{1007} (2016-r) + \sum_{r=1}^{1008} (2016-r) \right) \\ &= \frac{1}{2015} \cdot ( 2016 \cdot 2015 - (1+2+\ldots+1007+1008+1007+\ldots+1) ) \\ &= \frac{1}{2015} \cdot (2016 \cdot 2015 - 1008^2) \\ &= \boxed{ \dfrac{3046176}{2015} } \end{aligned}$

The second equality uses our result, together with the fact that the number of majority balls is simply the larger of $r$ (number of red balls) and $2016-r$ (number of blue balls). The fourth equality uses the result that when $r < 1008$ , the bigger one is $2016-r$ , and when $r \ge 1008$ , the bigger one is $r$ . (Okay, they are equal when $r = 1008$ , but we can choose either one.) The fifth equality performs the replacement $r \to 2016-r$ (this also reverses the order of summation). The sixth equality takes everything apart, setting the 2016's to the side. The seventh equality uses the (easily proven) fact $1+2+\ldots+n+(n-1)+\ldots+1 = n^2$ .

Moderator note:

Knowing that this problem is Polya's Urn allows us to start out with the surprising claim.

A more simple consideration via tree diagram resulted in the expression $\frac{3}{4}n+1$ where n denotes the count of even samples. In our case n equals 2014 leading to an expected number $1511.5 = \frac{3023}{2}$ which means the solution is 3025.

Rem.: Indeed the difference between our expected numbers is approximately 0.25 which is less than one ball and therefore meaningless!

Andreas Wendler - 5 years, 4 months ago

If you were running a lottery paying out £1,000,000 to the lucky winner, but had only sold £1,200,000 of tickets, you would not regard an expected number of 1.25 winners as meaningless compared to an expected number of 1 winner. In the second case, you would probably make a profit. In the first case, you would probably go bankrupt. Certainly so if you did this every week!

Just because you are describing something that comes in integers, that does not mean that fractional parts of expectations aren't still very important! 2012 stats have the average number of children per family in the UK as 1.7; any demographic analysis (for example, projection of the need for primary school places in 2017) would be seriously adrift if we rounded 1.7 up to 2, and even worse if we rounded down to 1.

Mark Hennings - 5 years, 4 months ago

Indeed you have always to consider the RELATIVE deviation that is in my case immensely less than in your interesting constructive far-fetched funny examples!

Andreas Wendler - 5 years, 4 months ago

@Andreas Wendler – You have said that a deviation of approximately 0.25 in the expectation of an integer-valued random variable is meaningless. I have given you two examples where such a deviation is anything but meaningless. Funny? Thank you. Far-fetched? I am not so sure.

You are claiming that the relative size of the error is relevant. I disagree. What we are talking about here is the fact that fractional factors in expectations of integer-valued random variables (which have a precise mathematical meaning) are important, as a matter of precision. If we choose to round those answers later, when interpreting them for Joe Public, that is another matter. Your calculation leading to $\tfrac{3023}{2}$ is approximate, but inaccurate.

Mark Hennings - 5 years, 4 months ago

@Mark Hennings – When then for our problem surely sufficient because I think you are not able to deliver an empirical proof of your result! Have not heard anything of a tree diagram? Then you would realize that there is an equal distribution of the probabilities to get 1, 2, ..., 2015 balls of one color at the end fixing my result!

Andreas Wendler - 5 years, 4 months ago

@Andreas Wendler – Please read both my and Ivan's proof. We have both used the fact that the distribution of balls is uniform, and both obtained the correct answer. I suspect that you are calculating the expectation of the wring quantity. If you post yiur calculations, I can point out where you went wrong.

Mark Hennings - 5 years, 4 months ago

Kerry Soderdahl
Jan 30, 2016

It's easy to notice that the probability of any particular sequence of draws resulting in $r$ red balls and $b$ blue balls in the bag is $\displaystyle\frac{(r-1)!(b-1)!}{(n-1)!}$ (where $n$ is the total number of balls) regardless of the order of the draws.

The number of paths leading to $r$ red balls and $b$ blue balls, which is the same as the total number of possible sequences of $r-1$ red ball draws and $b-1$ blue ball draws, is simply $\displaystyle\frac{(n-2)!}{(r-1)!(b-1)!}$ .

It follows that the total probability of every sequence of draws leading to $r$ red balls and $b$ blue balls is the product of these:

$\frac{(r-1)!(b-1)!}{(n-1)!}\times\frac{(n-2)!}{(r-1)!(b-1)!}=\frac{(r-1)!(b-1)!(n-2)!}{(r-1)!(b-1)!(n-1)!}=\frac{1}{n-1}$

Knowing that the probability of each of the the 2015 outcomes resulting in 2016 total balls is uniformly distributed, calculating the expected value of the majority color is now very easy.

To account for the cases where the red ball is the majority: $\displaystyle\sum_{k=1009}^{2016}\frac{k}{2015}$

Similarly, to account for the blue ball majority cases: $\displaystyle\sum_{k=1009}^{2016}\frac{k}{2015}$

In the event of a tie: $\displaystyle\frac{1008}{2015}$

Summing these:

$\frac{1008}{2015}+2\sum_{k=1009}^{2016}\frac{k}{2015}=\frac{1008+2\sum_{k=1009}^{2016}{k}}{2015}=\frac{3046176}{2015}$

$3046176+2015=\boxed{3048191}$

Mark Hennings
Jan 16, 2016

This is another case of Polya's urn; the probability that there are $b+1$ blue balls after $2n$ ball choices is $\tfrac{1}{2n+1}$ for $0 \le b \le 2n$ . If there are $b+1$ blue balls after $2n$ choices, there are $\mathrm{max}(b+1,2n-b+1)$ balls of whichever is the greater colour. Thus the expected number of the number of balls of the larger number after $2n$ ball choices is $\begin{array}{rcl} M_n & = & \displaystyle \tfrac{1}{2n+1} \sum_{b=0}^{2n} \mathrm{max}(b+1,2n-b+1)\\ & = &\displaystyle \tfrac{1}{2n+1}\left(\sum_{b=0}^{n-1} (2n- b+1) + (n+1) + \sum_{b=n+1}^{2n} (b+1)\right) \\ & = & \displaystyle \tfrac{1}{2n+1}\left(n+1 + 2\sum_{b=n+1}^{2n} (b+1)\right) \; = \; \tfrac{1}{2n+1}\left(n+1 + 2n(n+1) + 2\sum_{b=1}^n b\right) \\ & = & \tfrac{1}{2n+1} \left( n+1 + 3n(n+1)\right) \; = \; \tfrac{(n+1)(3n+1)}{2n+1} \end{array}$ With $n = 1007$ we have $M_n \,=\, \tfrac{3046176}{2015}$ , and so the answer is $\boxed{3048191}$ .

Ah, yes, it's called Polya's urn. I didn't know its name.

Ivan Koswara - 5 years, 5 months ago

The Rich Get Richer

The answer is 3048191.

3 solutions

Moderator note:

0 pending reports