You have recently bought the Random Number Generator 4000 from your local hardware store. This remarkable machine outputs numbers randomly and uniformly in the range \([0,1]\). You decide to create a list of these numbers and find the total sum.
What is the expected number of times the machine should operate to generate a number before the total sum is at least 1?
Answer to 3 decimal places.
The expected value of \(N\) is \(\sum_k k\,P[N=k]\). Why should this equal \(\sum_k P[N \ge k]\)? That happens to be true in this case but usually it's not going to be true.
No, it is always true. This is a standard result for positive integer-valued discrete random variables. Here is the proof.
\[
\begin{aligned}
E[X] &= \sum_{k \ge 1} k\,P[X=k] = \sum_{k \ge 1} k\bigl(P[X \ge k] - P[X \ge k+1]\bigr) \\
&= \sum_{k \ge 1} k\,P[X \ge k] - \sum_{k \ge 2} (k-1)\,P[X \ge k] \\
&= P[X \ge 1] + \sum_{k \ge 2} \bigl(k-(k-1)\bigr)\,P[X \ge k] = \sum_{k \ge 1} P[X \ge k]
\end{aligned}
\]
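For anyone who wants a numerical sanity check of the identity, here is a quick Python sketch. The geometric distribution is my own arbitrary choice of a positive integer-valued example; the truncation point `K` is likewise an assumption.

```python
# Sanity check of E[X] = sum_{k>=1} P[X >= k] for a positive
# integer-valued X, using X ~ Geometric(p) as an example.
p = 0.3
K = 200  # truncation point; the tail beyond this is negligible

# P[X = k] = (1-p)^(k-1) * p  and  P[X >= k] = (1-p)^(k-1)
lhs = sum(k * (1 - p) ** (k - 1) * p for k in range(1, K + 1))
rhs = sum((1 - p) ** (k - 1) for k in range(1, K + 1))

print(lhs, rhs)  # both close to 1/p = 3.333...
```

Both truncated sums agree to many decimal places, and both match the known mean \(1/p\).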
That's a common alternative way to calculate the expected value, especially when the CDF has a nice form.
That's a handy little identity. Thanks for the proof!
As an aside, the continuous analogue of this identity is
\[
E[X] = \int_0^\infty x f(x)\,dx = \int_0^\infty \bigl[1 - F(x)\bigr]\,dx \;\longleftrightarrow\; \sum_k \bigl(1 - P[X < k]\bigr) = \sum_k P[X \ge k].
\]
Edit: You can prove this using Fubini, as pointed out by Mark.
@Calvin Lin – Well, I would rather prove it by Fubini/Tonelli, since I need not assume that \(f\) is continuous. Thus
\[
E[X] = \int_0^\infty x f(x)\,dx = \int_0^\infty \left( \int_0^x f(x)\,dy \right) dx = \int_0^\infty \left( \int_y^\infty f(x)\,dx \right) dy = \int_0^\infty \bigl(1 - F(y)\bigr)\,dy
\]
A further advantage of this method is that we only need to assume that \(E[X]\) exists. This tells us that \(x f(x) \in L^1(0,\infty)\). Fubini/Tonelli then tells us that \(1 - F \in L^1(0,\infty)\), and gives us the identity...
@Mark Hennings – IIRC (and I might be wrong here) u-substitution and Fubini are related because we're essentially integrating along a different axis, i.e. the Fubini change of variables shows us how to go from integrating \(dx\) to \(dy\). The u-substitution proof is that "area under the graph to the y-axis + area under the graph to the x-axis = difference of the two rectangles".
@Calvin Lin – Well, I would need to see your proof, but \(u\)-substitution, as I understand it (it is not a piece of UK terminology), is making a continuously differentiable change of variable, such as putting \(u = x^2\) in \(\int x e^{x^2}\,dx\). The two-rectangle trick enables us to relate the integral of the inverse \(f^{-1}\) of a function \(f\) to the integral of \(f\). Unless I am making a mistake, we are either assuming that \(F\) is continuously differentiable, so that \(f\) is continuous, or else that \(F\) is invertible. Neither of these is necessarily true for all RVs.
@Mark Hennings – Ah, my bad. I meant integration by parts , which does require the assumption of continuity.
I agree that Fubini is a much better approach, and have edited my comment.
I don't understand the question. Why isn't the answer simply 2?
\(E[X_1 + X_2] = 1\)
And you have a symmetric distribution around the mean.
Can you elaborate on what you mean?
Just because the expected value of \(X_1 + X_2\) is 1 doesn't mean that we expect it will take 2 variables on average to sum above 1.
E.g. the expected value of \(X_1\) is \(\tfrac12\), and we expect that it will take more than 1 variable to sum above \(\tfrac12\).
If I graph X1 on the x-axis and X2 on the y-axis, and graph the sum X1+X2 on the z-axis, exactly half the area of the unit square in the x-y axis will have z-value > 1 and half < 1. I'm obviously misunderstanding the question. Is my misunderstanding that the question asks when will the value be 'expected' to equal or exceed 1, and in my example the probability is exactly 1/2 rather than >1/2?
@Sean McCloskey – Great, glad to see that you figured this out.
To clarify it for others who might be in the same situation, if we let \(P(n)\) be the probability that \(n\) variables were needed to first exceed 1, then what you've shown is that \(P(1) = 0\), \(P(2) = \tfrac12\).
This doesn't yet give us enough information to calculate the expected value for the number of variables, which is given by \(\sum_n n\,P(n)\).
Never mind. I understand what it is asking.
For each \(y \in [0,1]\), let \(E(y)\) be the expected number of times the machine should generate a number before the total sum is at least \(y\).
If any number in \([y,1]\) is generated, we are done. Otherwise \(x \in [0,y]\) is generated with equal probability distribution, and it follows that
\[ E(y) = (1-y) + \int_{x=0}^{y} \bigl(1 + E(y-x)\bigr)\,dx \]
Integrating the 1 and making the substitution y − x = u , we get
\[ E(y) = 1 + \int_{u=0}^{y} E(u)\,du. \]
Take \(1 + \int_{u=0}^{y} E(u)\,du = f(y)\), so that \(E(y) = f'(y)\).
Thus \(f'(y) = f(y)\), which gives \(f(y) = Ce^y\). But clearly \(f(0) = 1 + \int_{u=0}^{0} E(u)\,du = 1\), so \(C = 1\). Therefore
\[ E(y) = f'(y) = e^y, \qquad\text{and}\qquad E(1) = e \approx 2.718. \]
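The fixed-point equation \(E(y) = 1 + \int_0^y E(u)\,du\) can also be verified numerically. The Python sketch below (step size \(h\) is my own choice, not part of the solution) marches the equation forward with a left-endpoint Riemann sum, which is equivalent to Euler's method for \(E' = E\):

```python
import math

# March E(y) = 1 + ∫₀^y E(u) du forward with a left-endpoint Riemann
# sum (equivalent to Euler's method for E' = E, E(0) = 1).
h = 1e-4              # step size (assumed small; my choice)
steps = int(1 / h)
E = 1.0               # E(0) = 1, matching f(0) = 1 above
integral = 0.0
for _ in range(steps):
    integral += E * h     # accumulate ∫₀^y E(u) du
    E = 1.0 + integral    # apply E(y) = 1 + ∫₀^y E(u) du

print(E, math.e)  # E is close to e, with O(h) error
```

Shrinking \(h\) drives the estimate toward \(e = 2.71828\ldots\), consistent with \(E(1) = e\).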
If any number in \([y,1]\) is generated, we are done. Otherwise \(x \in [0,y]\) is generated with equal probability distribution...
Oh woah, I didn't think of partitioning it this way. Such a simple and elegant idea! Thanks for sharing!
We may rephrase the problem: suppose that the \(n\)th number generated is \(a_n\). What is the expected number of times the machine will generate numbers such that \(a_n > a_{n-1}\) for \(n > 1\)?
This problem is slightly easier to deal with. Note that \(E(X) = 1\,Pr(1) + 2\,Pr(2) + 3\,Pr(3) + \dots\), where \(X\) is the event and \(Pr(k)\) denotes the probability of \(k\) numbers being drawn.
Suppose that the machine has already generated \(k-1\) numbers and we are to generate a new one. We will find the probability of success. There are exactly \(k!\) orderings of \(a_1, a_2, \dots, a_k\), and only one satisfies the condition that \(a_n < a_{n+1}\) for all \(1 \le n \le k-1\). Hence \(Pr(k) = \frac{1}{k!}\), or equivalently, \(k\,Pr(k) = \frac{1}{(k-1)!}\). From here, it is easy to see that the desired value is the famous constant \(e\).
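A quick Monte Carlo check of this reformulation (a sketch; the seed and trial count are my choices): draw uniforms until one fails to exceed its predecessor, and average the stopping index.

```python
import random

# Estimate the expected index of the first "descent": draw uniforms
# a_1, a_2, ... and stop at the first n with a_n < a_{n-1}.
# P[stop index > n] = P[a_1 < ... < a_n] = 1/n!, so the mean is e.
random.seed(0)

def first_descent():
    prev = random.random()
    n = 1
    while True:
        x = random.random()
        n += 1
        if x < prev:
            return n
        prev = x

trials = 500_000
avg = sum(first_descent() for _ in range(trials)) / trials
print(avg)  # close to e ≈ 2.71828
```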
I think some explanation as to why your reformulation of the problem is answering the same question would be useful. It is not obvious.
It's obviously 2.178 just use logic
Pre-solution nitpick: "randomly drawn" does not necessarily mean "randomly drawn from a uniform distribution" though our thinking seems to default to said distribution when no distribution is specified. I'm assuming a uniform distribution on [ 0 , 1 ] for this problem.
Let \(N\) be the number of random numbers drawn before we see a sum of at least 1. Then the event \([N > n]\) is equivalent to choosing \(n\) random numbers \(x_1, x_2, \dots, x_n\) such that their sum is less than 1. This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\).
Then the event \([N = n]\) happens when the sum of the first \(n-1\) numbers is less than 1, but the sum of the first \(n\) numbers is not less than 1. This happens with probability \(p_n = V_{n-1} - V_n = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}\). To find the expected value of \(N\) (after noticing that \(p_1 = 0\)), we sum the quantity \(n \cdot p_n\) over all \(n \ge 2\).
I.e., \(E[N] = \sum_{n=2}^{\infty} \frac{n(n-1)}{n!} = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} = \dots\)
(after re-indexing)
\[ E[N] = \sum_{k=0}^{\infty} \frac{1}{k!} = e \]
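The simplex-volume fact \(V_n = \frac{1}{n!}\) is easy to spot-check by Monte Carlo. A Python sketch (seed and trial count are my choices):

```python
import math
import random

# Monte Carlo check of V_n = P[x_1 + ... + x_n < 1] = 1/n!,
# the volume of the standard n-simplex, for small n.
random.seed(1)
trials = 200_000
for n in range(1, 5):
    hits = sum(
        sum(random.random() for _ in range(n)) < 1 for _ in range(trials)
    )
    print(n, hits / trials, 1 / math.factorial(n))
```

Each empirical frequency lands within sampling error of \(1/n!\).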
Good point. I've added the word "uniformly" into the question.
This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\).
Huh? What is this? Where can I learn this?
I used code to simulate the situation several times.
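(The code block itself did not survive the page export. A minimal Python sketch of such a simulation — seed and trial count are my own choices, not necessarily the original poster's — might look like this:)

```python
import random

# Repeatedly draw uniform [0, 1) numbers until the running total is at
# least 1, and average the number of draws over many trials.
random.seed(2024)

def draws_needed():
    total, count = 0.0, 0
    while total < 1.0:
        total += random.random()
        count += 1
    return count

trials = 1_000_000
estimate = sum(draws_needed() for _ in range(trials)) / trials
print(round(estimate, 3))  # close to e ≈ 2.718
```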
Yup, this is a Monte Carlo simulation. But is the final output accurate to 3 decimal places? Why or why not? Did you consider using some variance reduction technique to minimize the error?
My 'variance reduction technique' was to try it a few times and see if the 3rd decimal place changed. The first couple of times it did, so I added to the number of trials and it eventually stabilized.
Thinking about it a bit more, you could find the error by repeating the experiment several times and finding the standard deviation of the set of answers. With 4 trials I had a standard deviation of under 10^-4. Since e=2.71828, I would have to be about 2.8 standard deviations below the mean to be wrong, which would happen about 2.5% of the time. I haven't done stats in a while but I think that should be accurate. I don't know of any techniques to minimize error other than adding more trials though.
Let \(N\) be the number of times the number generator operates until the running total is 1 or more. Thus \(N\) is the smallest integer \(n\) such that \(X_1 + X_2 + \cdots + X_n \ge 1\), where each \(X_j\) is uniformly distributed over \((0,1)\), independently of all the others.
If we let \(F_n(t) = P[X_1 + X_2 + \cdots + X_n \le t]\), then \(F_1(t) = t\) for \(0 \le t \le 1\) and
\[ F_{n+1}(t) = \int_0^t F_n(t-u)\,du, \qquad n \ge 1,\; 0 < t \le 1, \]
and it is a simple induction to show that
\[ F_n(t) = \frac{t^n}{n!}, \qquad n \ge 1,\; 0 < t < 1. \]
Thus we deduce that
\[ P[N \ge k] = P[X_1 + X_2 + \cdots + X_{k-1} < 1] = F_{k-1}(1) = \frac{1}{(k-1)!} \]
for all \(k \ge 1\), and hence
\[ E[N] = \sum_{k \ge 1} P[N \ge k] = \sum_{k=1}^{\infty} \frac{1}{(k-1)!} = e. \]
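The induction \(F_n(t) = t^n/n!\) can be checked numerically as well. This Python sketch (grid size \(M\) is my choice) iterates \(F_{n+1}(t) = \int_0^t F_n(v)\,dv\) (substituting \(v = t-u\)) with a cumulative trapezoidal rule:

```python
import math

# Numeric check of F_n(t) = t^n/n! on (0, 1): iterate
# F_{n+1}(t) = ∫₀^t F_n(v) dv with a cumulative trapezoidal rule.
M = 1000
h = 1.0 / M
F = [i * h for i in range(M + 1)]  # F_1(t) = t on a uniform grid

for n in range(1, 4):
    G = [0.0] * (M + 1)
    for i in range(1, M + 1):
        G[i] = G[i - 1] + 0.5 * h * (F[i - 1] + F[i])
    F = G
    # F now holds F_{n+1}; compare its value at t = 1 with 1/(n+1)!
    print(n + 1, F[M], 1 / math.factorial(n + 1))
```

After three iterations the grid values of \(F_4(1)\) agree with \(1/4! \approx 0.041667\) to within the quadrature error.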