Unusual RNG

Computer Science Level 4

You have an infinite stream of digits from 0-9, each digit produced uniformly at random. You want to produce a random integer in the following, unusual manner:

First, take the first digit of the stream. Then, for each of the next digits, if it's larger than the last digit you took, ignore it. Otherwise, take it as well. This process ends when you take a 0 (since everything else will be 0). Your number is the sum of all the numbers taken.

For example, consider the following stream: 573493205714390...

First, you take the 5.
The next digit is a 7, so you ignore it.
The next digit is a 3, so you take it. Now the last digit you took is 3.
The next digit is a 4. Despite this being less than 5, you ignore it, since the last digit you took is 3 and 4 is larger than 3.
The next digit is a 9, which is also skipped.
The next digit is a 3. This is taken since you only skip strictly larger numbers.
The next digit is a 2, which is also taken.
The next digit is a 0, which ends the process since you're not going to take any other digit than 0. The resulting number produced by your process is $5+3+3+2 = 13$ .

This problem has two parts, both relating to this generator; you need to solve both of them.

It can be proven that not only the process will end, the expected number you'll get is also finite and a rational number. The expected number you get can be represented as $\frac{a}{b}$ with $a,b$ positive integers that are relatively prime.
There exists a smallest integer $N$ such that 99.99% of the time, the obtained number is strictly smaller than $N$ $($ and the obtained number is smaller than $N-1$ less than 99.99% of the time $).$

Find $10^6 a + 10^3 b + N$ .

The answer is 9001054.

3 solutions

Mark Hennings
Aug 12, 2017

I used exactly the same method as Ivan to develop $E[d] = d$ , and hence $a=9,b=1$ .

Any sequence that obtains the sum $n\ge1$ is a partition of the integer $n$ , and there is one valid sequence that yields the number $n$ for each partition of $n$ that has all digits less than $10$ . The probability of obtaining a particular sequence $a_1 \ge a_2 \ge \cdots \ge a_m > 0$ is $\frac{1}{10(a_1+1)(a_2+1)\cdots(a_m+1)}$ since we roll $a_1$ with probability $\tfrac{1}{10}$ , each $a_j$ with probability $\tfrac{1}{a_{j-1}+1}$ , and the final $0$ to terminate the sequence with probability $\tfrac{1}{a_m+1}$ . Thus, if $M$ is the total scored, then $\mathbb{P}[M=n] \; = \; \sum_{(a_1,a_2,...,a_m) \in \mathcal{P}_{10}(n)} \frac{1}{10(a_1+1)(a_2+1)\cdots(a_m+1)}$ where $\mathcal{P}_{10}(n)$ is the set of partitions of $n$ for which all digits are less than $10$ . Note that $\mathbb{P}[M=0] =\tfrac{1}{10}$ . Thus the cumulative distribution function of $M$ is $F(n) \; = \; \mathbb{P}[M \le n] \; = \; \tfrac1{10} + \sum_{m=1}^n \mathbb{P}[M=m] \hspace{2cm} n \ge 1$ Since Mathematica has an IntegerPartitions command, it is particularly easy to programme to calculate that $F(52)=0.99987$ and that $F(53) = 0.99991$ , so that $N=54$ , and the answer is $\boxed{9001054}$ .

That's weird, are you sure you didn't mistype $F(52) = 0.99987$ instead?

Your approach works, but I don't think it's efficient. If I analyzed it correctly, your algorithm would take roughly $O(n^d)$ time (I think there are around $O(n^{d-1})$ partitions, and $O(n)$ time to evaluate the fraction for each). My algorithm takes roughly $O(nd^2)$ time. The good thing is your runtime has a very small coefficient and the answer is also small, so it works out; but if I had asked something like $10^{-100}$ chance of going over, your algorithm might not finish quickly.

Ivan Koswara - 3 years, 10 months ago

Typo corrected.

Perhaps not the most efficient to calculate, but given the nature of your answer template, the value of $N$ was not likely to be large, and the coding to solve the problem only took me a couple of minutes.

Mark Hennings - 3 years, 10 months ago

You can make your method even faster by writing your recurrence relation as $(d+1)P[n,d] \; = \; dP[n,d-1] + P[n-d,d]$ That cuts us down to $O(3nd)$ operations. More intriguingly, this is a nonhomogeneous recurrence relation $(d+1)P[n,d] - P[n-d,d] \; =\; dP[n,d-1]$ and hence it is clear that $P[n,d]$ can be written as a linear combination of $\{\zeta_{j,k}\,:\, 1 \le k \le j \le d \}$ , where $\zeta_{j,k} \;(1 \le k \le j)$ are the $j$ complex $j$ th roots of $\tfrac{1}{j+1}$ . This raises the tantalising, but notationally complex, possibility of there being a closed form for $P[n,d]$

Mark Hennings - 3 years, 10 months ago

Can somebody explain why the solution for N is not N=53 given above probabilities, as F(53)=0.99991>99.99%, thus setting N=54 means that 0 to 53 will come slightly more than 99.99% of the time?

Btw. I got F(52)=0.99988719 and F(53)=0.99990995, thus shifted by 1 compared to Ivan's solution, but similar to Mark's solution, and I've used a dynamic programming approach similar to Ivan's solution.

Darko Simonovic - 3 years, 9 months ago

Because if $N = 53$ , you're saying that the sum being strictly less than 53 has 99.99% probability. As you see, that's not right (99.988%). You need $N = 54$ , so that the sum being strictly less than 54 (that is, 0 to 53) has 99.99% probability.

Ivan Koswara - 3 years, 9 months ago

Ivan Koswara
Aug 10, 2017

The biggest observation that will tremendously simplify everything is the following: if your last taken digit is $d$ , then you can effectively treat the stream as a stream of digits from 0 to $d$ , because you're going to skip all digits greater than $d$ anyway. (More over, each digit still appears with equal probability.) This means we can generalize the problem: instead of a stream of digits from 0 to 9, let's say we have a stream of digits from 0 to $d$ , and once we have a formula (or algorithm) to compute the answers to both problems for general $d$ , we'll just plug in $d = 9$ .

Despite being a computer science problem, the expected number can in fact be computed without resorting to computer. Let $E[d]$ be the expected value of the obtained number from a stream of 0 to $d$ . Clearly $E[0] = 0$ ; our stream is simply endless zeroes. What about the other $d$ ?

Suppose the first digit of the stream is $k$ . Then the expected value we'll get in this case is simply $k + E[k]$ : take the first digit, and by the important observation above, the rest of the stream can be simplified to a stream of 0 to $k$ , which has expected value $E[k]$ . Since our number is the sum of the two, by linearity of expectation we get the claim.

Since $k$ can be any of 0 to $d$ uniformly, we simply add them together:

$\displaystyle E[d] = \sum_{k=0}^d \frac{1}{d+1} \cdot (k + E[k])$

We can try to compute $E[1], E[2], E[3], \ldots$ , since each $E[d]$ only depends on the values of $E[k]$ for $0 \le k \le d$ ; the values of $E[k]$ for $0 \le k \le d-1$ are known, so the resulting equation is just a linear equation in $E[d]$ , which can be solved easily. After trying a few values, you'll notice that $E[d] = d$ ; now we can use strong induction to prove that this holds for all $d$ . Thus $E[9] = 9$ , and so $a = 9, b = 1$ .

The second part of the problem is more difficult and definitely requires computer. The trick is to flip the problem: instead of asking "given a probability $p$ , what is the smallest $N$ such that the probability that the generated integer is smaller than $N$ ?", we instead ask "given a number $N$ , what is the probability $p_N$ that the generated integer is smaller than $N$ ?" Clearly $p_N$ increases as $N$ increases, so we simply need to find some $N$ where $p_N \ge 0.9999$ to solve the problem.

For convenience, we'll instead define $P[n, d]$ to be the probability that the number is greater than or equal to $n$ , from a stream of 0 to $d$ . Then $N$ will be the number satisfying $P[N, 9] \le 0.0001 < P[N-1, 9]$ . We still need to figure out how to compute $P[n,d]$ , though. It's easy to make some base cases: if $n \le 0$ , then $P[n,d] = 1$ (the resulting number must be at least 0); and if $n > 0$ and $d = 0$ , then $P[n,d] = 0$ (when the stream is all zeroes, you can't do anything).

The trick is to compute $P[n,d]$ from simpler values of $n,d$ , similar to above. Let's say the first number in the stream is $k$ . Then the probability that the desired number is at least $n$ is easy to compute: we already have $k$ , so we need $n-k$ more, and the stream is cut to simply 0 to $k$ . This is exactly $P[n-k, k]$ . Since $k$ can be any of 0 to $d$ , and they all are mutually independent, we obtain

$\displaystyle P[n,d] = \sum_{k=0}^d \frac{1}{d+1} \cdot P[n-k, k]$

This is easy to compute. The only problem is that we don't know how high $n$ can be (in order to find the right $P[n,9]$ ), so we might potentially do this for a very long time. You can either just try it anyway (and be delighted that the answer is pretty small), or we can put a very rough upper bound at it:

Let $n = 9m$ . In order to have our number to be at least $n = 9m$ , we need to see at least $m$ nonzero digits (since each of them is at most 9). Even if we skip numbers, those must also be nonzero (since we never skip zeroes). In other words, the first $m$ digits of the stream must be nonzero. The probability that this is the case is simply $(9/10)^m$ ; each digit has probability $9/10$ to be nonzero, and there are $m$ independent events that must all succeed at the same time. Computing $m$ such that $(9/10)^m < 0.0001$ , we obtain $m \ge 88$ . Thus it's enough to test up to $n = 9 \cdot 88 = 792$ ; we must have $P[792, 9] < 0.0001$ . Hey, this is pretty small! We can indeed simulate it after all.

After we try it out, we find that $P[53, 9] \approx 0.0001128073, P[54, 9] \approx 0.0000900426$ . Thus $N = 54$ . The final answer is $\boxed{9001054}$ .

Arjen Vreugdenhil
Aug 21, 2017

For the first part, let $E(k)$ be the expected sum if we only take digits $i \leq k$ (the "ceiling").

Lemma : $E(k) = k$

Proof : By induction. Clearly, $E(0) = 0$ . Now assume that $E(i) = i$ for $i < k$ . Let $k$ be the "ceiling". Any digits greater than $k$ we ignore; the first digit we "take" is uniformly distributed over $\{0,\dots,k\}$ ; each digit in this set has a probability $1/(k+1)$ . The number we find will be $i +$ whatever we can expect with this new digit $i$ as the "ceiling". Thus $E(k) = \sum_{i = 0}^k \frac 1{k+1} \left(i + E(i)\right) \\ = \frac 1{k+1} \sum_{i=0}^k (2i) \\ = \frac 1{k+1} 2\cdot \tfrac12k(k+1) \\ = k\ \ \ \ \ \ \ \ \ \text{Q.E.D.}$

Thus the first answer is $E(9) = 9$ , or $a = 9$ and $b = 1$ .

For the second part, we must study the probabilities $P(k,n)$ of reaching a total of $n$ with "ceiling" $k$ . A recursion relation is easily established: $\begin{cases} P(0,0) = 1 \\ P(k,n) = \frac 1 {k+1} \sum_{i = 0}^k P(i,n-i) & k \geq 0, n \geq 0, n-i \geq 0 \end{cases}$

There is no simple expression for $P(k,n)$ , so we use a computer to generate this. To find the solution, we determine the cumulative distribution and find the smallest $N$ for which $\sum_{n = 0}^{N-1} P(9,n) \geq 0.9999.$

final int TOP = 1000;       
double p[][] = new double[10][TOP];

for (int k = 0; k <= 9; k++) {
    for (int n = 0; n < TOP; n++) 
        if (k == 0 && n == 0) p[k][n] = 1;
        else {
            p[k][n] = 0;
            for (int d = 0; d <= k && n-d >= 0; d++)
                p[k][n] += p[d][n-d]/(k+1.);
        }
}

double cumul = 0;
int n = 0;
while (n < TOP && cumul < 0.9999) {
    cumul += p[9][n]; n++;
}
System.out.println(n);

The output of this program is $N = 54$ .

Thus we post the solution $a\cdot 10^6 + b\cdot 10^3 + N = \boxed{9\,001\,054}$ .

Unusual RNG

The answer is 9001054.

3 solutions

0 pending reports