You have recently bought the Random Number Generator 4000 from your local hardware store. This remarkable machine outputs numbers randomly and uniformly in the range \([0,1]\). You decide to create a list of these numbers and find the total sum.
What is the expected number of times the machine should operate to generate a number before the total sum is at least 1?
Answer to 3 decimal places.
The expected value of \(N\) is \(\sum_k k\,P[N=k]\). Why should this equal \(\sum_k P[N \ge k]\)? That happens to be true in this case but usually it's not going to be true.
No, it is always true. This is a standard result for positive integer-valued discrete random variables. Here is the proof.
\[
\begin{aligned}
E[X] &= \sum_{k \ge 1} k\,P[X=k] = \sum_{k \ge 1} k\bigl(P[X \ge k] - P[X \ge k+1]\bigr) \\
&= \sum_{k \ge 1} k\,P[X \ge k] - \sum_{k \ge 2} (k-1)\,P[X \ge k] \\
&= P[X \ge 1] + \sum_{k \ge 2} \bigl(k-(k-1)\bigr)\,P[X \ge k] = \sum_{k \ge 1} P[X \ge k]
\end{aligned}
\]
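For anyone who wants a numerical sanity check of the identity, here is a quick Python sketch. The geometric distribution is my own arbitrary choice of a positive integer-valued example; the truncation point `K` is likewise an assumption.

```python
# Sanity check of E[X] = sum_{k>=1} P[X >= k] for a positive
# integer-valued X, using X ~ Geometric(p) as an example.
p = 0.3
K = 200  # truncation point; the tail beyond this is negligible

# P[X = k] = (1-p)^(k-1) * p  and  P[X >= k] = (1-p)^(k-1)
lhs = sum(k * (1 - p) ** (k - 1) * p for k in range(1, K + 1))
rhs = sum((1 - p) ** (k - 1) for k in range(1, K + 1))

print(lhs, rhs)  # both close to 1/p = 3.333...
```

Both truncated sums agree to many decimal places, and both match the known mean \(1/p\).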
That's a common alternative way to calculate the expected value, especially when the CDF has a nice form.
That's a handy little identity. Thanks for the proof!
As an aside, the continuous analogue of this identity is
\[
E[X] = \int_0^\infty x f(x)\,dx = \int_0^\infty \bigl[1 - F(x)\bigr]\,dx \;\longleftrightarrow\; \sum_k \bigl(1 - P[X < k]\bigr) = \sum_k P[X \ge k].
\]
Edit: You can prove this using Fubini, as pointed out by Mark.
@Calvin Lin – Well, I would rather prove it by Fubini/Tonelli, since I need not assume that \(f\) is continuous. Thus
\[
E[X] = \int_0^\infty x f(x)\,dx = \int_0^\infty \left( \int_0^x f(x)\,dy \right) dx = \int_0^\infty \left( \int_y^\infty f(x)\,dx \right) dy = \int_0^\infty \bigl(1 - F(y)\bigr)\,dy
\]
A further advantage of this method is that we only need to assume that \(E[X]\) exists. This tells us that \(x f(x) \in L^1(0,\infty)\). Fubini/Tonelli then tells us that \(1 - F \in L^1(0,\infty)\), and gives us the identity...
@Mark Hennings – IIRC (and I might be wrong here) u-substitution and Fubini are related because we're essentially integrating along a different axis, i.e. the Fubini change of variables shows us how to go from integrating \(dx\) to \(dy\). The u-substitution proof is that "area under the graph to the y-axis + area under the graph to the x-axis = difference of the two rectangles".
@Calvin Lin – Well, I would need to see your proof, but \(u\)-substitution, as I understand it (it is not a piece of UK terminology), is making a continuously differentiable change of variable, such as putting \(u = x^2\) in \(\int x e^{x^2}\,dx\). The two-rectangle trick enables us to relate the integral of the inverse \(f^{-1}\) of a function \(f\) to the integral of \(f\). Unless I am making a mistake, we are either assuming that \(F\) is continuously differentiable, so that \(f\) is continuous, or else that \(F\) is invertible. Neither of these is necessarily true for all RVs.
@Mark Hennings – Ah, my bad. I meant integration by parts , which does require the assumption of continuity.
I agree that Fubini is a much better approach, and have edited my comment.
I don't understand the question. Why isn't the answer simply 2?
\(E[X_1 + X_2] = 1\)
And you have a symmetric distribution around the mean.
Can you elaborate on what you mean?
Just because the expected value of \(X_1 + X_2\) is 1 doesn't mean that we expect it will take 2 variables on average to sum above 1.
E.g. the expected value of \(X_1\) is \(\tfrac12\), and we expect that it will take more than 1 variable to sum above \(\tfrac12\).
If I graph X1 on the x-axis and X2 on the y-axis, and graph the sum X1+X2 on the z-axis, exactly half the area of the unit square in the x-y axis will have z-value > 1 and half < 1. I'm obviously misunderstanding the question. Is my misunderstanding that the question asks when will the value be 'expected' to equal or exceed 1, and in my example the probability is exactly 1/2 rather than >1/2?
@Sean McCloskey – Great, glad to see that you figured this out.
To clarify it for others who might be in the same situation, if we let \(P(n)\) be the probability that \(n\) variables were needed to first exceed 1, then what you've shown is that \(P(1) = 0\), \(P(2) = \tfrac12\).
This doesn't yet give us enough information to calculate the expected value for the number of variables, which is given by \(\sum_n n\,P(n)\).
Never mind. I understand what it is asking.
For each \(y \in [0,1]\), let \(E(y)\) be the expected number of times the machine should generate a number before the total sum is at least \(y\).
If any number in \([y,1]\) is generated, we are done. Otherwise \(x \in [0,y]\) is generated with equal probability distribution, and it follows that
\[ E(y) = (1-y) + \int_{x=0}^{y} \bigl(1 + E(y-x)\bigr)\,dx \]
Integrating the 1 and making the substitution y − x = u , we get
\[ E(y) = 1 + \int_{u=0}^{y} E(u)\,du. \]
Take \(1 + \int_{u=0}^{y} E(u)\,du = f(y)\), so that \(E(y) = f'(y)\).
Thus \(f'(y) = f(y)\), which gives \(f(y) = Ce^y\). But clearly \(f(0) = 1 + \int_{u=0}^{0} E(u)\,du = 1\), so \(C = 1\). Therefore
\[ E(y) = f'(y) = e^y, \qquad\text{and}\qquad E(1) = e \approx 2.718. \]
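The fixed-point equation \(E(y) = 1 + \int_0^y E(u)\,du\) can also be verified numerically. The Python sketch below (step size \(h\) is my own choice, not part of the solution) marches the equation forward with a left-endpoint Riemann sum, which is equivalent to Euler's method for \(E' = E\):

```python
import math

# March E(y) = 1 + ∫₀^y E(u) du forward with a left-endpoint Riemann
# sum (equivalent to Euler's method for E' = E, E(0) = 1).
h = 1e-4              # step size (assumed small; my choice)
steps = int(1 / h)
E = 1.0               # E(0) = 1, matching f(0) = 1 above
integral = 0.0
for _ in range(steps):
    integral += E * h     # accumulate ∫₀^y E(u) du
    E = 1.0 + integral    # apply E(y) = 1 + ∫₀^y E(u) du

print(E, math.e)  # E is close to e, with O(h) error
```

Shrinking \(h\) drives the estimate toward \(e = 2.71828\ldots\), consistent with \(E(1) = e\).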
If any number in \([y,1]\) is generated, we are done. Otherwise \(x \in [0,y]\) is generated with equal probability distribution...
Oh woah, I didn't think of partitioning it this way. Such a simple and elegant idea! Thanks for sharing!
We may rephrase the problem: suppose that the \(n\)th number generated is \(a_n\). What is the expected number of times the machine will generate numbers such that \(a_n > a_{n-1}\) for \(n > 1\)?
This problem is slightly easier to deal with. Note that \(E(X) = 1\,Pr(1) + 2\,Pr(2) + 3\,Pr(3) + \dots\), where \(X\) is the event and \(Pr(k)\) denotes the probability of \(k\) numbers being drawn.
Suppose that the machine has already generated \(k-1\) numbers and we are to generate a new one. We will find the probability of success. There are exactly \(k!\) orderings of \(a_1, a_2, \dots, a_k\), and only one satisfies the condition that \(a_n < a_{n+1}\) for all \(1 \le n \le k-1\). Hence \(Pr(k) = \frac{1}{k!}\), or equivalently, \(k\,Pr(k) = \frac{1}{(k-1)!}\). From here, it is easy to see that the desired value is the famous constant \(e\).
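A quick Monte Carlo check of this reformulation (a sketch; the seed and trial count are my choices): draw uniforms until one fails to exceed its predecessor, and average the stopping index.

```python
import random

# Estimate the expected index of the first "descent": draw uniforms
# a_1, a_2, ... and stop at the first n with a_n < a_{n-1}.
# P[stop index > n] = P[a_1 < ... < a_n] = 1/n!, so the mean is e.
random.seed(0)

def first_descent():
    prev = random.random()
    n = 1
    while True:
        x = random.random()
        n += 1
        if x < prev:
            return n
        prev = x

trials = 500_000
avg = sum(first_descent() for _ in range(trials)) / trials
print(avg)  # close to e ≈ 2.71828
```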
I think some explanation as to why your reformulation of the problem is answering the same question would be useful. It is not obvious.
It's obviously 2.178 just use logic
Pre-solution nitpick: "randomly drawn" does not necessarily mean "randomly drawn from a uniform distribution" though our thinking seems to default to said distribution when no distribution is specified. I'm assuming a uniform distribution on [ 0 , 1 ] for this problem.
Let \(N\) be the number of random numbers drawn before we see a sum of at least 1. Then the event \([N > n]\) is equivalent to choosing \(n\) random numbers \(x_1, x_2, \dots, x_n\) such that their sum is less than 1. This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\).
Then the event \([N = n]\) happens when the sum of the first \(n-1\) numbers is less than 1, but the sum of the first \(n\) numbers is not less than 1. This happens with probability \(p_n = V_{n-1} - V_n = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}\). To find the expected value of \(N\) (after noticing that \(p_1 = 0\)), we sum the quantity \(n \cdot p_n\) over all \(n \ge 2\).
I.e., \(E[N] = \sum_{n=2}^{\infty} \frac{n(n-1)}{n!} = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} = \dots\)
(after re-indexing)
\[ E[N] = \sum_{k=0}^{\infty} \frac{1}{k!} = e \]
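The simplex-volume fact \(V_n = \frac{1}{n!}\) is easy to spot-check by Monte Carlo. A Python sketch (seed and trial count are my choices):

```python
import math
import random

# Monte Carlo check of V_n = P[x_1 + ... + x_n < 1] = 1/n!,
# the volume of the standard n-simplex, for small n.
random.seed(1)
trials = 200_000
for n in range(1, 5):
    hits = sum(
        sum(random.random() for _ in range(n)) < 1 for _ in range(trials)
    )
    print(n, hits / trials, 1 / math.factorial(n))
```

Each empirical frequency lands within sampling error of \(1/n!\).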
Good point. I've added the word "uniformly" into the question.
This region of \(n\)-space defines the \(n\)-dimensional simplex, whose volume is known to be \(\frac{1}{n!}\). Let us call this number \(V_n\).
Huh? What is this? Where can I learn this?
I used code to simulate the situation several times.
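(The code block itself did not survive the page export. A minimal Python sketch of such a simulation — seed and trial count are my own choices, not necessarily the original poster's — might look like this:)

```python
import random

# Repeatedly draw uniform [0, 1) numbers until the running total is at
# least 1, and average the number of draws over many trials.
random.seed(2024)

def draws_needed():
    total, count = 0.0, 0
    while total < 1.0:
        total += random.random()
        count += 1
    return count

trials = 1_000_000
estimate = sum(draws_needed() for _ in range(trials)) / trials
print(round(estimate, 3))  # close to e ≈ 2.718
```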
Yup, this is a Monte Carlo simulation. But is the final output accurate to 3 decimal places? Why or why not? Did you consider using some variance reduction technique to minimize the error?
My 'variance reduction technique' was to try it a few times and see if the 3rd decimal place changed. The first couple of times it did, so I added to the number of trials and it eventually stabilized.
Thinking about it a bit more, you could find the error by repeating the experiment several times and finding the standard deviation of the set of answers. With 4 trials I had a standard deviation of under 10^-4. Since e=2.71828, I would have to be about 2.8 standard deviations below the mean to be wrong, which would happen about 2.5% of the time. I haven't done stats in a while but I think that should be accurate. I don't know of any techniques to minimize error other than adding more trials though.
Let \(N\) be the number of times the number generator operates until the running total is 1 or more. Thus \(N\) is the smallest integer \(n\) such that \(X_1 + X_2 + \cdots + X_n \ge 1\), where each \(X_j\) is uniformly distributed over \((0,1)\), independently of all the others.
If we let \(F_n(t) = P[X_1 + X_2 + \cdots + X_n \le t]\), then \(F_1(t) = t\) for \(0 \le t \le 1\) and
\[ F_{n+1}(t) = \int_0^t F_n(t-u)\,du, \qquad n \ge 1,\; 0 < t \le 1, \]
and it is a simple induction to show that
\[ F_n(t) = \frac{t^n}{n!}, \qquad n \ge 1,\; 0 < t < 1. \]
Thus we deduce that
\[ P[N \ge k] = P[X_1 + X_2 + \cdots + X_{k-1} < 1] = F_{k-1}(1) = \frac{1}{(k-1)!} \]
for all \(k \ge 1\), and hence
\[ E[N] = \sum_{k \ge 1} P[N \ge k] = \sum_{k=1}^{\infty} \frac{1}{(k-1)!} = e. \]
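The induction \(F_n(t) = t^n/n!\) can be checked numerically as well. This Python sketch (grid size \(M\) is my choice) iterates \(F_{n+1}(t) = \int_0^t F_n(v)\,dv\) (substituting \(v = t-u\)) with a cumulative trapezoidal rule:

```python
import math

# Numeric check of F_n(t) = t^n/n! on (0, 1): iterate
# F_{n+1}(t) = ∫₀^t F_n(v) dv with a cumulative trapezoidal rule.
M = 1000
h = 1.0 / M
F = [i * h for i in range(M + 1)]  # F_1(t) = t on a uniform grid

for n in range(1, 4):
    G = [0.0] * (M + 1)
    for i in range(1, M + 1):
        G[i] = G[i - 1] + 0.5 * h * (F[i - 1] + F[i])
    F = G
    # F now holds F_{n+1}; compare its value at t = 1 with 1/(n+1)!
    print(n + 1, F[M], 1 / math.factorial(n + 1))
```

After three iterations the grid values of \(F_4(1)\) agree with \(1/4! \approx 0.041667\) to within the quadrature error.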