Brutomolecule: hacking biological defences (Part I)

Breach-times: randomly guessing finite sequences. \underline{\text{Breach-times: randomly guessing finite sequences.}}

Fix a finite alphabet, Σ \Sigma , with Σ > 1 |\Sigma|>1 , such as Σ = { G , C , A , T } \Sigma=\{G,C,A,T\} or { H , T } \{H,T\} or { a , b , c , , z } \{a,b,c,\ldots,z\} or the ascii-characters, etc. Let s { H , T } < ω s\in\{H,T\}^{<\omega} be a finite sequence . Consider now a fictional universe. In this universe biological organisms guard access to their cells via gateways. These gateways operate like a finite automata and require input of the right key, s s . To enter the key, a steady stream of molecules encoding letters in Σ \Sigma are fed in one at a time. The automata is built such that as long as the input-sequence is a strict initial segment of s s , the gateway remains in a passive expecting mode. Should one letter be wrong, the input string of molecules is destroyed ( rejected ) including the bad letter, and the input sequence starts from scratch with the next incoming molecules. If, on the other hand, the sequence, s s , is achieved, the gateway transitions to an accepted state and access to the cell is granted.

Enter now the Brutomolecule , a life-form trying to access organisms in distant solar systems. It is an entity, which sends signals into the far reaches of the universe and tries to breach any organisms it encounters by inducing and transmitting a stead stream of randomised molecules X = X 0 X 1 X 2 Σ ω X=X_{0}X_{1}X_{2}\ldots\in\Sigma^{\omega} . Given any fixed organism, let T s ( X ) T_{s}(X) or simply T s T_{s} denote the breaching time : the ‘time’ it takes to gain entrance through a gateway with key s Σ < ω s\in\Sigma^{<\omega} . Working with the above definitions, here are some simple examples with the alphabet Σ = { G , C , A , T } \Sigma=\{G,C,A,T\} and the key s = T T G s=TTG :

  • if X = T T G H T T T T G X=\overline{TTG}HTTTTG\ldots , then T s = 3 T_{s}=3 ;
  • if X = T T T T T G G T T T T G X=TTT\overline{TTG}GTTTTG\ldots , then T s = 6 T_{s}=6 .
  • if X = G G G G G T T G T T X=GGGGG\overline{TTG}TT\ldots , then T s = 8 T_{s}=8 ;

Here is a more involved example demonstrating the mercilessness of rejection. Let s = T A T T A T T T s=TATTATTT :

  • if X = G G G G T A T T T A T T A T T T G G G G T A T T A T T T G A X=GGGGTATTTATTATTTGGGG\overline{TATTATTT}GA\ldots , then T s = 29 T_{s}=29 .

Note, that it does not hold, that T s = 17 T_{s}=17 , since the highlighted subsequence G G G G T A T T T A T T A T T T G A GGGGTATT\overline{TATTATTT}GA\ldots is never inputted into the gateway. This in turn is the case, as the sequence G G G G T A T T T A T T A T T T G A GGGG\overline{TATTT}ATTATTTGA\ldots , which overlaps with the first occurrence of s s , is rejected upon the 9th letter and not the 8th, meaning that the subsequent sequence inputted does not come from T A T T A T T T G A TATTATTTGA\ldots but rather A T T A T T T G A ATTATTTGA\ldots .

This example demonstrates, that in general T s ≢ T_{s}\not\equiv the time it takes for the first occurrence of s s as a subsequence of X X .

Finally, to describe the Brutomolecule , we need to describe how exactly its randomised stream of molecules is built up. The sequence X 0 , X 1 , X_{0}, X_{1},\ldots constitute iid (independent identically distributed) Σ \Sigma -valued random variables with P [ X n = α ] = p ( α ) \mathbb{P}[X_{n}=\alpha]=p(\alpha) for all α Σ \alpha\in\Sigma , where p : Σ [ 0 , 1 ] p:\Sigma\longrightarrow[0,1] is a fixed distribution, which the Brutomolecule determines. And again, X X denotes the corresponding infinite sequence X : = X 0 X 1 X 2 Σ ω X:=X_{0}X_{1}X_{2}\ldots\in\Sigma^{\omega} .

Task. Determine a general expression for E [ T s ] \mathbb{E}[T_{s}] for s Σ < ω s\in\Sigma^{<\omega} and enter its value under the setup Σ = { H , T } \Sigma=\{H,T\} with distribution p ( H ) = 1 3 p(H)=\frac{1}{3} , p ( T ) = 2 3 p(T)=\frac{2}{3} and for s = T T H T H H T s=TTHTHHT .


Problem continues in Part II .

This problem is sort of motivated by brilliant.org ’s exercise but takes things in a completely different direction.


The answer is 328.3125.

This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try refreshing the page, (b) enabling javascript if it is disabled on your browser and, finally, (c) loading the non-javascript version of this page . We're sorry about the hassle.

1 solution

R Mathe
Jun 3, 2018

For any finite sequence t Σ < ω t\in\Sigma^{<\omega} let p ( t ) : = P [ X t ] p(t):=\mathbb{P}[X\supset t] the probability that the random sequence X X starts with t t . Given the distribution of the X n X_{n} , it is easy to see that p ( t ) = i < t p ( t ( i ) ) p(t)=\prod_{i<|t|}p(t(i)) .

Now fix s { H , T } < ω s\in\{H,T\}^{<\omega} . Given any fixed value X = x Σ ω X=x\in\Sigma^{\omega} there are s + 1 |s|+1 mutually exclusive possibilities:

  • Case n n , with 0 n < s 0\leq n<|s| : x s n x\supseteq s\vert n and x s ( n + 1 ) x\nsupseteq s\vert(n+1) . This case occurs with probability P [ X s n ] P [ X s ( n + 1 ) ] = p ( s n ) p ( s ( n + 1 ) ) \mathbb{P}[X\supseteq s\vert n]-\mathbb{P}[X\supseteq s\vert (n+1)]=p(s\vert n)-p(s\vert (n+1)) .
  • Case s |s| : x s x\supseteq s . This case occurs with probability P [ X s ] = p ( s ) \mathbb{P}[X\supseteq s]=p(s) .

Under case n < s n<|s| , the input is rejected at position n + 1 n+1 and the sequence B n + 1 x B_{\leftarrow}^{n+1}x awaits being inputted, where B B is the backwards-shift operator, so T s ( x ) = n + 1 + T s ( B n + 1 x ) T_{s}(x)=n+1+T_{s}(B_{\leftarrow}^{n+1}x) . Under case n = s n=|s| , the first s |s| molecules are accepted, so T s ( x ) = s T_{s}(x)=|s| .

It follows that

[ m c ] r c l E [ T s ] = n < s P [ Case $n$ ] E [ T s Case $n$ ] + P [ Case $|s|$ ] E [ T s Case $|s|$ ] = n < s ( p ( s n ) p ( s ( n + 1 ) ) ) ( n + 1 + E [ T s ( B n + 1 ( X ) ) Case $n$ ] ) + p ( s ) s \begin{array}{c}[mc]{rcl} \mathbb{E}[T_{s}] &= &\sum_{n<|s|}\mathbb{P}[\text{Case \$n\$}]\cdot\mathbb{E}[T_{s}\mid\text{Case \$n\$}] +\mathbb{P}[\text{Case \$|s|\$}]\cdot\mathbb{E}[T_{s}\mid\text{Case \$|s|\$}]\\ &= &\sum_{n<|s|}(p(s\vert n)-p(s\vert (n+1)))\cdot(n+1+\mathbb{E}[T_{s}(B_{\leftarrow}^{n+1}(X))\mid\text{Case \$n\$}]) +p(s)\cdot|s|\\ \end{array}

Since the X n X_{n} are iid, it follows that the process X X is memoryless and, in terms of its distribution, shift-invariant. Thus E [ T s ( B n + 1 ( X ) ) Case $n$ ] = E [ T s ( B n + 1 ( X ) ) ] = E [ T s ( X ) ] = E [ T s ] \mathbb{E}[T_{s}(B_{\leftarrow}^{n+1}(X))\mid\text{Case \$n\$}]=\mathbb{E}[T_{s}(B_{\leftarrow}^{n+1}(X))]=\mathbb{E}[T_{s}(X)]=\mathbb{E}[T_{s}] . This yields

[ m c ] r c l E [ T s ] = p ( s ) s + n < s ( p ( s n ) p ( s ( n + 1 ) ) ) ( n + 1 + E [ T s ] ) = p ( s ) s + n < s n p ( s n ) ( n + 1 ) p ( s ( n + 1 ) ) + n < s p ( s n ) + n < s ( p ( s n ) p ( s ( n + 1 ) ) ) E [ T s ] = p ( s ) s + 0 p ( s 0 ) s p ( s s ) + n < s p ( s n ) + ( p ( s 0 ) p ( s s ) ) E [ T s ] = n < s p ( s n ) + ( 1 p ( s ) ) E [ T s ] \begin{array}{c}[mc]{rcl} \mathbb{E}[T_{s}] &= &p(s)\cdot|s|+\sum_{n<|s|}(p(s\vert n)-p(s\vert (n+1)))\cdot(n+1+\mathbb{E}[T_{s}])\\ &= &p(s)|s|+\sum_{n<|s|}np(s\vert n)-(n+1)p(s\vert (n+1)) +\sum_{n<|s|}p(s\vert n) +\sum_{n<|s|}(p(s\vert n)-p(s\vert (n+1)))\mathbb{E}[T_{s}]\\ &= &p(s)|s|+0p(s\vert 0)-|s|p(s\vert |s|) +\sum_{n<|s|}p(s\vert n) +(p(s\vert 0)-p(s\vert |s|))\mathbb{E}[T_{s}]\\ &= &\sum_{n<|s|}p(s\vert n)+(1-p(s))\mathbb{E}[T_{s}]\\ \end{array}

Rearranging yields the general expression:

E [ T s ] = n < s p ( s n ) p ( s ) \mathbb{E}[T_{s}] = \dfrac{\sum_{n<|s|}p(s\vert n)}{p(s)}

With the concrete inputs, one has under the setup Σ = { H , T } \Sigma=\{H,T\} with distribution p ( H ) = 1 3 p(H)=\frac{1}{3} , p ( T ) = 2 3 p(T)=\frac{2}{3} and for s = T T H T H H T s=TTHTHHT :

1
2
3
4
5
6
pH=1/3;
pT=2/3;
ps = [pT,pT,pH,pT,pH,pH,pT]; % s=TTHTHHT
c = cumprod([1,ps]);
ExT = sum(c(1:end-1))/c(end);
disp(ExT); % —> expected time is 328.3125

0 pending reports

×

Problem Loading...

Note Loading...

Set Loading...