Random Number Generator

You have recently bought the Random Number Generator 4000 from your local hardware store. This remarkable machine outputs numbers randomly and uniformly in the range \([0, 1]\). You decide to create a list of these numbers and find the total sum.

What is the expected number of times the machine should operate to generate a number before the total sum is at least 1?

Answer to 3 decimal places.


The answer is 2.718.


5 solutions

Mark Hennings
Jun 14, 2017

Let \(N\) be the number of times the number generator operates until the running total is \(1\) or more. Thus \(N\) is the smallest integer \(n\) such that \(X_1 + X_2 + \cdots + X_n \ge 1\), where each \(X_j\) is uniformly distributed over \((0,1)\), independently of all the others.

If we let \(F_n(t) = \mathbf{P}[X_1 + X_2 + \cdots + X_n \le t]\), then \(F_1(t) = t\) for \(0 \le t \le 1\) and
\[ F_{n+1}(t) = \int_0^t F_n(t-u)\,du \qquad n \ge 1,\ t > 0 \]
and it is a simple induction to show that
\[ F_n(t) = \frac{t^n}{n!} \qquad n \ge 1,\ 0 < t < 1. \]
Thus we deduce that
\[ \mathbb{P}[N \ge k] = \mathbb{P}[X_1 + X_2 + \cdots + X_{k-1} < 1] = F_{k-1}(1) = \frac{1}{(k-1)!} \]
for all \(k \ge 1\), and hence
\[ \mathbb{E}[N] = \sum_{k \ge 1} \mathbb{P}[N \ge k] = \sum_{k=1}^\infty \frac{1}{(k-1)!} = \boxed{e} \]
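The tail-probability series converges very quickly. A quick numerical sanity check of it (this snippet and the class name are arbitrary, not part of the solution): the first ten terms already agree with \(e\) to 3 decimal places.

```java
// Partial sums of E[N] = sum_{k>=1} 1/(k-1)!, which converge to e.
public class ExpectationSeries {
    static double partialSum(int terms) {
        double sum = 0.0, factorial = 1.0;     // factorial holds (k-1)!
        for (int k = 1; k <= terms; k++) {
            sum += 1.0 / factorial;
            factorial *= k;                    // (k-1)! -> k!
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(partialSum(10));    // agrees with e to 3 decimal places
    }
}
```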

The expected value of \(N\) is \(\sum_k k\,\mathbb{P}[N=k]\). Why should this equal \(\sum_k \mathbb{P}[N \ge k]\)? That happens to be true in this case but usually it's not going to be true.

Richard Desper - 3 years, 11 months ago


No, it is always true. This is a standard result for positive integer-valued discrete random variables. Here is the proof.
\[ \begin{aligned} \mathbb{E}[X] & = \sum_{k \ge 1} k\,\mathbb{P}[X=k] = \sum_{k \ge 1} k\left(\mathbb{P}[X \ge k] - \mathbb{P}[X \ge k+1]\right) \\ & = \sum_{k \ge 1} k\,\mathbb{P}[X \ge k] - \sum_{k \ge 2}(k-1)\,\mathbb{P}[X \ge k] \\ & = \mathbb{P}[X \ge 1] + \sum_{k \ge 2}\big\{k - (k-1)\big\}\,\mathbb{P}[X \ge k] \\ & = \sum_{k \ge 1}\mathbb{P}[X \ge k] \end{aligned} \]
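The identity is easy to verify for any small concrete distribution. Here is a sketch for a fair six-sided die, where both forms should give 3.5 (class and method names are my own choices, not from the thread):

```java
// Check E[X] = sum_{k>=1} P[X >= k] for a fair six-sided die (E[X] = 3.5).
public class TailSumIdentity {
    static double bySum(double[] pmf) {        // pmf[k-1] = P[X = k]
        double e = 0.0;
        for (int k = 1; k <= pmf.length; k++) e += k * pmf[k - 1];
        return e;
    }

    static double byTails(double[] pmf) {
        double e = 0.0;
        for (int k = 1; k <= pmf.length; k++)
            for (int j = k; j <= pmf.length; j++) e += pmf[j - 1];  // adds P[X >= k]
        return e;
    }

    public static void main(String[] args) {
        double[] die = {1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0, 1/6.0};
        System.out.println(bySum(die) + " = " + byTails(die));  // both 3.5
    }
}
```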

Mark Hennings - 3 years, 11 months ago


That's a common alternative way to calculate the expected value, especially if the CDF has a nice form.

Calvin Lin Staff - 3 years, 11 months ago

That's a handy little identity. Thanks for the proof!

Richard Desper - 3 years, 11 months ago

As an aside, the continuous analogue of this identity is \(E[X] = \int_0^{\infty} x f(x)\,dx = \int_{0}^\infty [1 - F(x)]\,dx\), mirroring \(\sum_k (1 - P[X < k]) = \sum_k P[X \ge k]\).

Edit: You can prove this using Fubini, as pointed out by Mark.

Calvin Lin Staff - 3 years, 11 months ago


@Calvin Lin Well, I would rather prove it by Fubini/Tonelli, since I need not assume that \(f\) is continuous. Thus
\[ \begin{aligned} \mathbb{E}[X] & = \int_0^\infty x f(x)\,dx = \int_0^\infty \left(\int_0^x f(x)\,dy\right) dx \\ & = \int_0^\infty \left(\int_y^\infty f(x)\,dx\right) dy = \int_0^\infty (1 - F(y))\,dy \end{aligned} \]
A further advantage of this method is that we only need to assume that \(\mathbb{E}[X]\) exists. This tells us that \(x f(x) \in L^1(0,\infty)\). Fubini/Tonelli then tells us that \(1-F \in L^1(0,\infty)\), and gives us the identity...

Mark Hennings - 3 years, 11 months ago
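As a numerical sanity check of the continuous identity: for an Exponential(\(\lambda\)) random variable, \(1 - F(x) = e^{-\lambda x}\), so the tail integral should recover the mean \(1/\lambda\). The midpoint-rule snippet below is a sketch of mine, not part of the thread:

```java
// Numerical check of E[X] = ∫_0^∞ (1 - F(x)) dx for an Exponential(λ) RV,
// where 1 - F(x) = e^{-λx} and E[X] = 1/λ.
public class TailIntegral {
    static double tailIntegral(double lambda, double upper, int steps) {
        double h = upper / steps, sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double x = (i + 0.5) * h;          // midpoint rule
            sum += Math.exp(-lambda * x) * h;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(tailIntegral(2.0, 50.0, 1_000_000)); // close to 1/2
    }
}
```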


@Mark Hennings IIRC (and I might be wrong here) u-substitution and Fubini are related because we're essentially integrating along a different axis, i.e. the Fubini change of variables shows us how to go from integrating \(dx\) to \(dy\). The u-substitution proof is that "area under the graph to the y-axis + area under the graph to the x-axis = difference of the two rectangles".

Calvin Lin Staff - 3 years, 11 months ago


@Calvin Lin Well, I would need to see your proof, but \(u\)-substitution, as I understand it (it is not a piece of UK terminology), is making a continuously differentiable change of variable, such as putting \(u = x^2\) in \(\int x e^{x^2}\,dx\). The two-rectangle trick enables us to relate the integral of the inverse \(f^{-1}\) of a function \(f\) to the integral of \(f\). Unless I am making a mistake, we are either assuming that \(F\) is continuously differentiable, so that \(f\) is continuous, or else that \(F\) is invertible. Neither of these is necessarily true for all RVs.

Mark Hennings - 3 years, 11 months ago


@Mark Hennings Ah, my bad. I meant integration by parts, which does require the assumption of continuity.

I agree that Fubini is a much better approach, and have edited my comment.

Calvin Lin Staff - 3 years, 11 months ago

I don't understand the question. Why isn't the answer simply 2?

\(E[X_1 + X_2] = 1\)

And you have a symmetrical distribution around the mean.

Sean McCloskey - 3 years, 11 months ago


Can you elaborate on what you mean?

Just because the expected value of \(X_1 + X_2\) is 1 doesn't mean that we expect it will take 2 variables on average to sum above 1.

E.g. the expected value of \(X_1\) is \(\frac12\), and we expect that it will take more than 1 variable to sum above \(\frac12\).

Calvin Lin Staff - 3 years, 11 months ago


If I graph X1 on the x-axis and X2 on the y-axis, and graph the sum X1+X2 on the z-axis, exactly half the area of the unit square in the x-y plane will have z-value > 1 and half < 1. I'm obviously misunderstanding the question. Is my misunderstanding that the question asks when the value is 'expected' to equal or exceed 1, and in my example the probability is exactly 1/2 rather than > 1/2?

Sean McCloskey - 3 years, 11 months ago


@Sean McCloskey Great, glad to see that you figured this out.

To clarify it for others who might be in the same situation, if we let \(P(n)\) be the probability that \(n\) variables were needed to first exceed 1, then what you've shown is that \(P(1) = 0,\ P(2) = \frac{1}{2}\).

This doesn't yet give us enough information to calculate the expected value for the number of variables, which is given by \(\sum_n n\,P(n)\).

Calvin Lin Staff - 3 years, 11 months ago

Never mind. I understand what it is asking.

Sean McCloskey - 3 years, 11 months ago
Shourya Pandey
Jun 18, 2017

For each \(y\) in \([0,1]\), let \(E(y)\) be the expected number of times the machine should generate a number before the total sum is at least \(y\).

If any number in \([y,1]\) is generated, we are done. Otherwise some \(x\) in \([0,y]\) is generated uniformly, and it follows that

\[ E(y) = (1-y) + \int_{x=0}^{y} \big(1+E(y-x)\big)\,dx \]

Integrating the \(1\) and making the substitution \(y-x = u\), we get

\[ E(y) = 1 + \int_{u=0}^{y} E(u)\,du. \]

Take \(f(y) = 1+ \int_{u=0}^{y} E(u)\,du\), so that \(E(y) = f'(y)\).

Thus \(f'(y) = f(y)\), which gives \(f(y) = Ce^{y}\). But clearly \(f(0) = 1 + \int_{u=0}^{0} E(u)\,du = 1\), so \(C=1\). Therefore

\(E(y) = f'(y) = e^y\), and \(E(1) = e \approx \boxed{2.718}\).
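The integral equation \(E(y) = 1 + \int_0^y E(u)\,du\) can also be solved numerically on a grid; a left Riemann sum turns it into Euler's method for \(f' = f\), and \(E(1)\) lands near \(e\). This discretization is a sketch of mine, not part of the solution:

```java
// Discretize E(y) = 1 + ∫_0^y E(u) du on a grid and read off E(1),
// which should be close to e ≈ 2.718.
public class IntegralEquation {
    static double solveAtOne(int n) {
        double h = 1.0 / n;
        double[] e = new double[n + 1];
        double integral = 0.0;
        e[0] = 1.0;                            // E(0) = 1
        for (int i = 1; i <= n; i++) {
            integral += e[i - 1] * h;          // left Riemann sum of ∫_0^y E(u) du
            e[i] = 1.0 + integral;
        }
        return e[n];
    }

    public static void main(String[] args) {
        System.out.println(solveAtOne(1_000_000));
    }
}
```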

"If any number in \([y,1]\) is generated, we are done. Otherwise \(x\) in \([0,y]\) is generated with equal probability distribution..."

Oh woah, I didn't think of partitioning it this way. Such a simple and elegant idea! Thanks for sharing!

Pi Han Goh - 3 years, 11 months ago
Jeffrey Lu
Jun 18, 2017

We may rephrase the problem: suppose that the \(n\)th number generated is \(a_n\). What is the expected number of times the machine will generate numbers such that \(a_n > a_{n-1}\) for all \(n > 1\)?

This problem is slightly easier to deal with. Note that \(E(X) = 1\cdot Pr(1) + 2\cdot Pr(2) + 3\cdot Pr(3) + \cdots\), where \(X\) is the number of draws and \(Pr(k)\) denotes the probability of \(k\) numbers being drawn.

Suppose that the machine has already generated \(k - 1\) numbers and we are to generate a new one. We will find the probability of success. There are exactly \(k!\) orderings of \(a_1, a_2, \ldots, a_k\), and only one satisfies the condition that \(a_n < a_{n + 1}\) for all \(1 \le n \le k - 1\). Hence \(Pr(k) = \frac{1}{k!}\), or equivalently, \(k\,Pr(k) = \frac{1}{(k - 1)!}\). From here, it is easy to see that the desired value is the famous constant \(e\).

I think some explanation as to why your reformulation of the problem is answering the same question would be useful. It is not obvious.

Mark Hennings - 3 years, 11 months ago
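One way to connect the reformulation to a number is to simulate it directly: draw uniforms until one fails to exceed its predecessor, and average the index at which that happens. Since \(P(\text{first } k \text{ draws are increasing}) = 1/k!\), the mean stopping index is \(\sum_{k \ge 0} 1/k! = e\). A sketch (this simulation is mine, not from the thread):

```java
import java.util.Random;

// Simulate the reformulated process: draw uniforms until one fails to
// exceed its predecessor, and average the index at which that happens.
// P(first k draws are increasing) = 1/k!, so the mean should be near e.
public class IncreasingRun {
    static double meanStopIndex(int trials, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            double prev = rng.nextDouble();
            int n = 1;
            while (true) {
                double next = rng.nextDouble();
                n++;
                if (next <= prev) break;       // the run of increases is over
                prev = next;
            }
            total += n;
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        System.out.println(meanStopIndex(2_000_000, 42L));
    }
}
```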

It's obviously 2.178 just use logic

Luke Bennett - 3 years, 11 months ago
Richard Desper
Jun 20, 2017

Pre-solution nitpick: "randomly drawn" does not necessarily mean "randomly drawn from a uniform distribution", though our thinking seems to default to said distribution when no distribution is specified. I'm assuming a uniform distribution on \([0,1]\) for this problem.

Let \(N\) be the number of random numbers drawn before we see a sum of at least 1. Then the event \([N > n]\) is equivalent to choosing \(n\) random numbers \(x_1, x_2, \ldots, x_n\) such that their sum is less than 1. This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\).

Then the event \([N = n]\) happens when the sum of the first \(n-1\) numbers is less than 1, but the sum of the first \(n\) numbers is not less than 1. This happens with probability \(p_n = V_{n-1} - V_{n} = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}\). To find the expected value of \(N\) (after noticing that \(p_1 = 0\)), we sum the quantity \(n\,p_n\) over all \(n \ge 2\).

I.e., \[ E[N] = \sum_{n=2}^{\infty} \frac{n(n-1)}{n!} = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} \]

(after re-indexing)

\[ E[N] = \sum_{k=0}^{\infty} \frac{1}{k!} = e \]

Good point. I've added the word "uniformly" into the question.

"This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\)."

Huh? What is this? Where can I learn this?

Pi Han Goh - 3 years, 11 months ago
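The \(\frac{1}{n!}\) volume claim can also be checked empirically by sampling points uniformly in the unit cube and counting the fraction that land in the simplex \(\{x : x_1 + \cdots + x_n < 1\}\). A sketch (class name and check are mine, not part of the thread):

```java
import java.util.Random;

// Monte Carlo estimate of the volume of {x in [0,1]^n : x_1+...+x_n < 1},
// which should be close to 1/n!.
public class SimplexVolume {
    static double estimate(int n, int samples, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int s = 0; s < samples; s++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += rng.nextDouble();
            if (sum < 1.0) hits++;
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimate(3, 5_000_000, 7L)); // close to 1/6
    }
}
```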
Alex Li
Jun 22, 2017

I used code to simulate the situation several times.

public class RandomNumber {
    public static void main(String[] args) {
        //number of times to test
        int trials = 50000000;
        //total times that we have used the RNG4000
        long totalCount = 0;
        //go through each trial
        for (int i = 0; i < trials; i++) {
            //variable representing the sum of generated random numbers
            double curNum = 0;
            //generate a random number until curNum is greater than 1
            while (curNum < 1) {
                //Add the random number between 0 and 1 to curNum
                curNum += Math.random();
                //increment totalCount
                totalCount++;
            }
        }
        //the mean is total/num trials
        double result = (double)totalCount / trials;
        //print out the result
        System.out.println(result);
    }
}

Yup, this is a Monte Carlo simulation. But is the final output accurate to 3 decimal places? Why or why not? Did you consider using some variance reduction technique to minimize the error?

Pi Han Goh - 3 years, 11 months ago


My 'variance reduction technique' was to try it a few times and see if the 3rd decimal place changed. The first couple of times it did, so I added to the number of trials and it eventually stabilized.

Alex Li - 3 years, 11 months ago

Thinking about it a bit more, you could find the error by repeating the experiment several times and finding the standard deviation of the set of answers. With 4 trials I had a standard deviation of under 10^-4. Since e=2.71828, I would have to be about 2.8 standard deviations below the mean to be wrong, which would happen about 2.5% of the time. I haven't done stats in a while but I think that should be accurate. I don't know of any techniques to minimize error other than adding more trials though.

Alex Li - 3 years, 11 months ago


Try antithetic variates...

Pi Han Goh - 3 years, 11 months ago
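For readers curious about the suggestion: antithetic variates pair each uniform \(u\) with \(1-u\), run the process on both streams, and average the two results; the negative correlation between the paired paths reduces the variance of the estimate. A sketch of how it might look for this problem (this implementation is an illustration of mine, not a prescribed method):

```java
import java.util.Random;

// Antithetic variates sketch: each pregenerated uniform u is paired with
// 1 - u, the counting process runs on both streams, and the two counts
// are averaged before accumulating.
public class AntitheticCount {
    static int countDraws(double[] uniforms, boolean flip) {
        double sum = 0.0;
        int n = 0;
        while (sum < 1.0) {
            double u = uniforms[n];
            sum += flip ? 1.0 - u : u;
            n++;
        }
        return n;
    }

    static double estimate(int pairs, long seed) {
        Random rng = new Random(seed);
        double total = 0.0;
        double[] u = new double[64];           // far more draws than any path needs
        for (int p = 0; p < pairs; p++) {
            for (int i = 0; i < u.length; i++) u[i] = rng.nextDouble();
            total += 0.5 * (countDraws(u, false) + countDraws(u, true));
        }
        return total / pairs;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000, 123L));
    }
}
```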
