Fix a finite alphabet,
, with
, such as
or
or
or the ascii-characters, etc. Let
be a
finite sequence
. Consider now a fictional universe. In this universe biological organisms guard access to their cells via gateways. These gateways operate like a finite automata and require input of the right key,
. To enter the key, a steady stream of molecules encoding letters in
are fed in one at a time. The automata is built such that as long as the input-sequence is a strict initial segment of
, the gateway remains in a passive
expecting
mode. Should one letter be wrong, the input string of molecules is destroyed (
rejected
) including the bad letter, and the input sequence starts from scratch with the next incoming molecules. If, on the other hand, the sequence,
, is achieved, the gateway transitions to an
accepted
state and access to the cell is granted.
Enter now the Brutomolecule , a life-form trying to access organisms in distant solar systems. It is an entity, which sends signals into the far reaches of the universe and tries to breach any organisms it encounters by inducing and transmitting a stead stream of randomised molecules . Given any fixed organism, let or simply denote the breaching time : the ‘time’ it takes to gain entrance through a gateway with key . Working with the above definitions, here are some simple examples with the alphabet and the key :
Here is a more involved example demonstrating the mercilessness of rejection. Let :
Note, that it does not hold, that , since the highlighted subsequence is never inputted into the gateway. This in turn is the case, as the sequence , which overlaps with the first occurrence of , is rejected upon the 9th letter and not the 8th, meaning that the subsequent sequence inputted does not come from but rather .
This example demonstrates, that in general the time it takes for the first occurrence of as a subsequence of .
Finally, to describe the Brutomolecule , we need to describe how exactly its randomised stream of molecules is built up. The sequence constitute iid (independent identically distributed) -valued random variables with for all , where is a fixed distribution, which the Brutomolecule determines. And again, denotes the corresponding infinite sequence .
Task. Determine a general expression for for and enter its value under the setup with distribution , and for .
Problem continues in Part II .
This problem is sort of motivated by brilliant.org ’s exercise but takes things in a completely different direction.
This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try
refreshing the page, (b) enabling javascript if it is disabled on your browser and,
finally, (c)
loading the
non-javascript version of this page
. We're sorry about the hassle.
For any finite sequence t ∈ Σ < ω let p ( t ) : = P [ X ⊃ t ] the probability that the random sequence X starts with t . Given the distribution of the X n , it is easy to see that p ( t ) = ∏ i < ∣ t ∣ p ( t ( i ) ) .
Now fix s ∈ { H , T } < ω . Given any fixed value X = x ∈ Σ ω there are ∣ s ∣ + 1 mutually exclusive possibilities:
Under case n < ∣ s ∣ , the input is rejected at position n + 1 and the sequence B ← n + 1 x awaits being inputted, where B is the backwards-shift operator, so T s ( x ) = n + 1 + T s ( B ← n + 1 x ) . Under case n = ∣ s ∣ , the first ∣ s ∣ molecules are accepted, so T s ( x ) = ∣ s ∣ .
It follows that
[ m c ] r c l E [ T s ] = = ∑ n < ∣ s ∣ P [ Case $n$ ] ⋅ E [ T s ∣ Case $n$ ] + P [ Case $|s|$ ] ⋅ E [ T s ∣ Case $|s|$ ] ∑ n < ∣ s ∣ ( p ( s ∣ n ) − p ( s ∣ ( n + 1 ) ) ) ⋅ ( n + 1 + E [ T s ( B ← n + 1 ( X ) ) ∣ Case $n$ ] ) + p ( s ) ⋅ ∣ s ∣
Since the X n are iid, it follows that the process X is memoryless and, in terms of its distribution, shift-invariant. Thus E [ T s ( B ← n + 1 ( X ) ) ∣ Case $n$ ] = E [ T s ( B ← n + 1 ( X ) ) ] = E [ T s ( X ) ] = E [ T s ] . This yields
[ m c ] r c l E [ T s ] = = = = p ( s ) ⋅ ∣ s ∣ + ∑ n < ∣ s ∣ ( p ( s ∣ n ) − p ( s ∣ ( n + 1 ) ) ) ⋅ ( n + 1 + E [ T s ] ) p ( s ) ∣ s ∣ + ∑ n < ∣ s ∣ n p ( s ∣ n ) − ( n + 1 ) p ( s ∣ ( n + 1 ) ) + ∑ n < ∣ s ∣ p ( s ∣ n ) + ∑ n < ∣ s ∣ ( p ( s ∣ n ) − p ( s ∣ ( n + 1 ) ) ) E [ T s ] p ( s ) ∣ s ∣ + 0 p ( s ∣ 0 ) − ∣ s ∣ p ( s ∣ ∣ s ∣ ) + ∑ n < ∣ s ∣ p ( s ∣ n ) + ( p ( s ∣ 0 ) − p ( s ∣ ∣ s ∣ ) ) E [ T s ] ∑ n < ∣ s ∣ p ( s ∣ n ) + ( 1 − p ( s ) ) E [ T s ]
Rearranging yields the general expression:
With the concrete inputs, one has under the setup Σ = { H , T } with distribution p ( H ) = 3 1 , p ( T ) = 3 2 and for s = T T H T H H T :