Self-describing Sequence

The sequence s = [ 1 , 2 , 2 , 1 , 1 , 2 , 1 , 2 , 2 , 1 , 2 , 2 , 1 , 1 , 2 , 1 , 1 , 2 , 2 , 1 , 2 , ] s = [ 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, \dots ] may be described as

1 one, 2 twos, 2 ones, 1 two, 1 one, 2 twos, 1 one, etc.

Remarkably, the counts that occur in this description are precisely the elements of s s itself! Thus, s s is a self-describing sequence.

1 1 2 2 2 1 1 2 2 1 1 1 2 2 2 1 1 2 2 2 1 1 2 2 1 1 1 2 \underbrace{1}_{1}\ \underbrace{2\ 2}_{2}\ \underbrace{1\ 1}_{2}\ \underbrace{2}_{1}\ \underbrace{1}_{1}\ \underbrace{2\ 2}_{2}\ \underbrace{1}_{1}\ \underbrace{2\ 2}_{2}\ \underbrace{1\ 1}_{2}\ \underbrace{2}_{1}\ \underbrace{1\ 1}_{2}\ \cdots

How many twos occur in the first 1,000,000 elements of this sequence?

Bonus : In the original version of this problem, no more than 2000 bytes of memory was to be used during the calculation.


Source : A computer programming contest between Dutch students and professionals a long time ago (1996 or 1997 if I remember correctly.) Our team was the only team to get this problem right, and that within the first five minutes of the contest. Can you beat our record? :)


The answer is 500014.

This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try refreshing the page, (b) enabling javascript if it is disabled on your browser and, finally, (c) loading the non-javascript version of this page . We're sorry about the hassle.

5 solutions

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
final int TOP = 35;

int val[] = new int[TOP];
boolean dbl[] = new boolean[TOP];

for (int i = 0; i < TOP; i++) { val[i] = 1; dbl[i] = false; }
dbl[TOP-1] = true;

int sum = 0;
for (int n = 0; n < 1000000; n++) {
    if (val[0] == 2) sum++;

    int lvl = 0;
    while (!dbl[lvl]) {
        val[lvl] = 3 - val[lvl]; lvl++;
    }

    for (int i = 0; i < lvl; i++)
        dbl[i] = val[i+1] == 2;

    dbl[lvl] = false;            
}

out.println(sum);

Output: 500014 \boxed{500014} .

Explanation: We generate dynamically the same sequence at different levels:

  • s 0 s_0 : The bottom-level sequence, in which we count the number of twos.

  • s 1 s_1 : The sequence that describes s 0 s_0 ; specifically, if an element in s 1 s_1 equals 2, an element in s 0 s_0 will be repeated.

  • s 2 s_2 : The sequence that describes s 1 s_1 ;

  • and so on.

It turns out that we need to keep track of only two tidbits of information at each level:

  • v a l i val_i : The digit at the current position in sequence s i s_i ;

  • d b l i dbl_i : Whether this digit in sequence s i s_i will be repeated or not.

How do we generate the next digit at any given level i i ?

  • If the flag d b l i dbl_i is set, we simply repeat the digit. The value of v a l i val_i needs no change. Only we must turn off the flag d b l i dbl_i , because no digit may be repeated more than once. There is no need to generate the value at any higher level.

  • If the flag d b l i dbl_i is not set, we must change the digit: v a l i 3 v a l i val_i\mapsto 3-val_i . To determine whether this digit must be repeated or not, we must generate the next digit at a higher level. If this equals 2, we set the d b l dbl flag; if not, we clear it. Note that generating the next digit at a higher level may in turn require generating a digit at the level above that, and so on.

The logic of my solution may be clearer if it is written recursively:

1
2
3
4
5
6
7
8
9
to generate the next digit at level (lvl):
   if dbl[lvl]:
      // do not change val[lvl]
      dbl[lvl] := false
   else
      val[lvl] = 3 - val[lvl]
      generate the next digit at level (lvl+1)
      if val[lvl+1] == 2: dbl[lvl] = true
      else: dbl[lvl] = false

At the very beginning, this will lead to endless recursion. Therefore I set the dbl[] flag at the top level (where we will never have to generate the next digit). This stops the recursion and gets the list really going.

Arjen Vreugdenhil - 5 years, 8 months ago
Snehal Shekatkar
Jun 12, 2016

A short snippet in Python3; took 0.5 seconds.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
x = [1, 2, 2]

i = 1
while len(x) < 1000000:
    i += 1
    t = x[-1]
    for j in range(x[i]):
        x.append(3-t)

print(x.count(2))

Michael Mendrin
Oct 5, 2015

The most interesting thing about this is how such a short program can generate a sequence that never repeats, i.e. quasi-random. I like that, the implications this suggests.

Technically speaking, this is not a finite state automata because it involves a "tag" that grows without limit.

The program I wrote is just a bit longer than Arjen's, and, no, I did not figure one out in 5 minutes. I used 2 pointers on the string, and depending on what one says, the other is moved, after adding a few things to the string. While this is happening, count the 2's being added, until the string is 1,000,000 terms long. "Very confusing but straightforward anyway--once it is put together".

Let me get the logic in a moment.

Let p1 and p2 (p1<p2) be positions on the string, which returns S(p1) and S(p2).

If S(p1)=1
....then if S(p2)=1
................then S(p2+1)=2
................otherwise S(p2+1)=1
....p2=p2+1



If S(p1)=2
....then if S(p2)=1
.................then S(ps+1)=1
.........................S(ps+2)=2
.................otherwise S(ps+1)=2
..................................S(ps+2)=1
....p2=p2+2

p1=p1+1

No effort was made to restrict "memory space to 2000 bytes".

What's interesting about the Thue-Morse sequence is that its recurrence plot is strikingly similar to the one for this sequence. It takes a sharp eye to distinguish between the two.

Rather interesting indeed. The Thue-Morse sequence comes to mind too. Do you have the details that led you to your solution?

Andrew Ellinor - 5 years, 8 months ago

Log in to reply

The program I wrote is just a bit longer than Arjen's, and, no, I did not figure one out in 5 minutes. I used 2 pointers on the string, and depending on what one says, the other is moved, after adding a few things to the string. While this is happening, count the 2's being added, until the string is 1,000,000 terms long. "Very confusing but straightforward anyway--once it is put together".

Let me get the logic in a moment.

Let p1 and p2 (p1<p2) be positions on the string, which returns S(p1) and S(p2).

If S(p1)=1
....then if S(p2)=1
................then S(p2+1)=2
................otherwise S(p2+1)=1
....p2=p2+1

If S(p1)=2
....then if S(p2)=1
.................then S(ps+1)=1
.........................S(ps+2)=2
.................otherwise S(ps+1)=2
..................................S(ps+2)=1
....p2=p2+2

p1=p1+1

No effort was made to restrict "memory space to 2000 bytes".

What's interesting about the Thue-Morse sequence is that its recurrence plot is strikingly similar to the one for this sequence. It takes a sharp eye to distinguish between the two.

Michael Mendrin - 5 years, 8 months ago

How do you show that this sequence never repeats?

Christopher Boo - 5 years ago
Bill Bell
Nov 18, 2015

Almost no time spent optimising. But then how often would I use this?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
from numpy import zeros, int8

def sequence():
    x=zeros((10**7+2),int8)
    x[1:6]=[1,2,2,1,1]
    step=1
    yield 1
    step=2
    yield 2
    yield 2
    step=3
    yield 1
    yield 1
    i=5
    while True:
        step+=1
        if step%2==1:
            for j in range(x[step]):
                yield 1
                i+=1
                x[i]=1
        else:
            for j in range(x[step]):
                yield 2
                i+=1
                x[i]=2

count=0
number=0
for k in sequence():
    count+=k==2
    number+=1
    if number==10**6:
        break

print count

Hun-Min Park
Oct 6, 2015

[Python 2.7.10]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
get = {(0, '1'):'1', (1, '1'):'2', (0, '2'):'11', (1, '2'):'22'} # {(j(mod 2), a[j]):terms}

seq = '12'
index_max = 1000000

while True:
    seq = ''.join([get[(j%2, seq[j])] for j, c in enumerate(seq)])

    if len(seq)>=index_max:
        subseq = seq[:index_max]
        print sum(map(int, subseq))-index_max

        break

... calculates the sequence recursively, starting from {1 2}, until the length of the sequence exceeds 1000000.

Not very efficient code. (spents about 3~4 seconds)

0 pending reports

×

Problem Loading...

Note Loading...

Set Loading...