Brutomolecule: hacking biological defences (Part II)

Probability Level 5

$\underline{\text{Asymptotic growth of Breach-time under larger keys.}}$

Recall from Part I that $p$ denotes the distribution on the alphabet $\Sigma$ according to which the Brutomolecule produces its sequence of letters. Given a key $s\in\Sigma^{<\omega}$, we defined $T_{s}$ (which we shall here denote $T_{s;~p}$) to be the successful breaching time of the Brutomolecule, i.e. the time at which it first breaches the key $s$.

We extend the scenario of Part I as follows. Suppose all life-forms in a solar system deploy keys derived from a universal signature known only to them. This signature is a very long (idealised: infinite) sequence $x\in\Sigma^{\omega}$. In the course of evolution in the solar system, the biological security of cells grows stronger: the keys are all of the form $s=x\vert n$ (the prefix of $x$ of length $n$), where $n$ steadily grows (say, from life-form to life-form). If the Brutomolecule persists with its blind attack, its average breaching times $\mathbb{E}[T_{x|n;~p}]$ will steadily grow. In this problem we are going to investigate this growth.

Task 1. Suppose $p$ is non-trivial, that is, $0<p(\alpha)<1$ for all $\alpha\in\Sigma$. It can be shown that $\mathbb{E}[T_{x|n;~p}]$ does not grow linearly in $n$. Show that the sequence $\mathbb{E}[T_{x|n;~p}]$ in fact exhibits a form of exponential growth.

Task 2. Under what condition on $x$ (necessary and sufficient) does the limit $h(x;p):=\lim_{n\rightarrow\infty}\frac{\log(\mathbb{E}[T_{x|n;~p}])}{n}$ exist for every choice of $p$? Determine this expression explicitly!

Task 3. Consider the alphabet $\Sigma=\{0,1\}$. For which of the following sequences does $h(x;~p)$ fail to exist for some $p$?

1. $x=001001001001001\ldots$;
2. $x=001\,010\,100\,001\,010\,010\,100\ldots$, where blocks of the form $001$, $010$, $100$ are randomly strewn together;
3. $x=0^{n_{0}}1^{n_{1}}0^{n_{2}}1^{n_{3}}\ldots$, where the $n_{i}\in\{1,2,\ldots\}$ satisfy $\dfrac{\sum_{j<i}n_{j}}{n_{i}}\longrightarrow 0$;
4. a generic randomly chosen $x\in\Sigma^{\omega}$, where the letters of $x$ are chosen successively and independently according to a distribution $\mu:\Sigma\longrightarrow[0,1]$.

Please note: this is mathematics, so $\log$ is to base $e$, not base $10$. The problem continues in Part III.

Answer choices: none ($h(x)$ exists in all cases); 3; 4; 1, 3; 2, 4; 2; 1.


R Mathe
Jun 3, 2018

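The expression for $\mathbb{E}[T_{s;~p}]$, $s\in\Sigma^{<\omega}$, from Part I was as follows:

$\mathbb{E}[T_{s;~p}] = \dfrac{\sum_{n<|s|}p(s\vert n)}{p(s)}$

where $s\vert n$ is the prefix of $s$ of length $n$ and $p(t):=\prod_{i<|t|}p(t(i))$ for $t\in\Sigma^{<\omega}$ (in particular, the empty word has $p$-value $1$). Since the $n=0$ term of the numerator equals $1$ and each of its $|s|$ terms is at most $1$, one has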

$\dfrac{1}{p(s)}\leq\mathbb{E}[T_{s;~p}]\leq\dfrac{|s|}{p(s)}$
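To make the formula concrete, here is a minimal Python sketch, assuming (purely for illustration) that words are Python strings and that $p$ is a dictionary mapping letters to probabilities; `word_prob` and `expected_breach_time` are hypothetical helper names. Exact `fractions.Fraction` arithmetic lets the two bounds be checked without rounding.

```python
from fractions import Fraction
from math import prod


def word_prob(t, p):
    """p(t): product of the letter probabilities of the finite word t (1 for the empty word)."""
    return prod((p[letter] for letter in t), start=Fraction(1))


def expected_breach_time(s, p):
    """E[T_{s;p}] = (sum_{n<|s|} p(s|n)) / p(s), where s|n is the prefix of s of length n."""
    numerator = sum(word_prob(s[:n], p) for n in range(len(s)))
    return numerator / word_prob(s, p)


if __name__ == "__main__":
    p = {"0": Fraction(1, 3), "1": Fraction(2, 3)}  # illustrative non-trivial distribution
    s = "0010"
    e = expected_breach_time(s, p)
    # Check the bounds 1/p(s) <= E[T_{s;p}] <= |s|/p(s).
    assert 1 / word_prob(s, p) <= e <= len(s) / word_prob(s, p)
    print(e)  # prints 123/2
```

For $s=0010$ with $p(0)=\frac{1}{3}$, $p(1)=\frac{2}{3}$ the numerator is $1+\frac{1}{3}+\frac{1}{9}+\frac{2}{27}=\frac{41}{27}$ and $p(s)=\frac{2}{81}$, giving $\frac{123}{2}$, which indeed lies between $\frac{81}{2}$ and $162$.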

Taking logarithms yields

$\left|\dfrac{\log(\mathbb{E}[T_{s;~p}])}{|s|}-\frac{1}{|s|}\sum_{i<|s|}-\log(p(s(i)))\right| \leq \frac{\log(|s|)}{|s|}$

Setting $p_{i}:=p(x(i))$ for all $i\in\omega$, for the signature $x\in\Sigma^{\omega}$, and applying this to $s=x\vert n$ yields the simpler expression

$\left|\dfrac{\log(\mathbb{E}[T_{x\vert n;~p}])}{n}-\frac{1}{n}\sum_{i<n}-\log(p_{i})\right| \leq \frac{\log(n)}{n}$
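The same quantity can be computed in log space, which keeps long prefixes from underflowing. The sketch below assumes string words and a dictionary $p$ as before; the signature $001001\ldots$ and the values $p(0)=0.3$, $p(1)=0.7$ are arbitrary illustrative choices, and the function names are hypothetical. It checks the displayed bound numerically.

```python
import math
from itertools import cycle, islice


def log_expected_breach_time(x_prefix, p):
    """Return log E[T_{x|n;p}] for the word x_prefix = x|n (n >= 1), computed in log space.

    Uses log E = log(sum_{k<n} p(x|k)) - log p(x|n) with log p(x|k) = sum_{i<k} log p_i,
    so long prefixes do not underflow.
    """
    partial = [0.0]  # partial[k] = log p(x|k); the empty prefix has probability 1
    for letter in x_prefix:
        partial.append(partial[-1] + math.log(p[letter]))
    # Log-sum-exp over k < n, shifted by the largest term for numerical stability.
    top = max(partial[:-1])
    log_numerator = top + math.log(sum(math.exp(v - top) for v in partial[:-1]))
    return log_numerator - partial[-1]


def average_surprisal(x_prefix, p):
    """Return -(1/n) * sum_{i<n} log p_i for the word x_prefix = x|n."""
    return -sum(math.log(p[letter]) for letter in x_prefix) / len(x_prefix)


if __name__ == "__main__":
    p = {"0": 0.3, "1": 0.7}  # illustrative non-trivial distribution
    for n in (10, 100, 1000, 10000):
        prefix = "".join(islice(cycle("001"), n))  # x = 001001001..., cut to length n
        gap = abs(log_expected_breach_time(prefix, p) / n - average_surprisal(prefix, p))
        assert gap <= math.log(n) / n  # the bound displayed above
        print(n, gap, math.log(n) / n)
```

The gap is exactly $\frac{1}{n}\log\sum_{k<n}p(x\vert k)$, which lies in $[0,\frac{\log(n)}{n}]$ because the $k=0$ term is $1$ and every term is at most $1$.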

Since $\frac{\log(n)}{n}\longrightarrow 0$ as $n\longrightarrow\infty$, the asymptotic behaviour of $\dfrac{\log(\mathbb{E}[T_{x\vert n;~p}])}{n}$ coincides with that of the sequence of averages $-\frac{1}{n}\sum_{i<n}\log(p_{i})$. Since each $p_{i}>0$ and the $p_{i}$ take only finitely many values, these averages are bounded.

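Hence we have completed Task 1: for all $n\geq 1$,

$e^{L_{\min}n}\leq\dfrac{1}{p(x\vert n)}\leq\mathbb{E}[T_{x\vert n;~p}]\leq\dfrac{n}{p(x\vert n)}\leq n\,e^{L_{\max}n}$

where $L_{\min}:=\min\{-\log(p(\alpha)):\alpha\in\Sigma\}$ is $>0$ because all $p(\alpha)<1$ for non-trivial $p$, and $L_{\max}:=\max\{-\log(p(\alpha)):\alpha\in\Sigma\}$ is $<\infty$ because all $p(\alpha)>0$. So $\mathbb{E}[T_{x\vert n;~p}]$ grows exponentially in $n$, with exponential rate asymptotically between $L_{\min}$ and $L_{\max}$ (the extra factor $n$ does not affect the rate).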
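A quick numerical check of the Task 1 sandwich in logarithmic form, $L_{\min}n\leq\log(\mathbb{E}[T_{x\vert n;~p}])\leq\log(n)+L_{\max}n$. This is a sketch under the same illustrative assumptions as before (string words, dictionary $p$, hypothetical helper names); the three-letter alphabet, the distribution and the pseudo-random signature prefix are arbitrary.

```python
import math
import random


def log_expected_breach_time(x_prefix, p):
    """Return log E[T_{x|n;p}] = log(sum_{k<n} p(x|k)) - log p(x|n) for the word x|n (n >= 1)."""
    partial = [0.0]  # partial[k] = log p(x|k)
    for letter in x_prefix:
        partial.append(partial[-1] + math.log(p[letter]))
    # Every log p(x|k) is <= 0 and the k = 0 term is exactly 0, so no shift is needed.
    log_numerator = math.log(sum(math.exp(v) for v in partial[:-1]))
    return log_numerator - partial[-1]


def growth_rates(p):
    """Return (L_min, L_max), the smallest and largest surprisal -log p(alpha)."""
    surprisals = [-math.log(q) for q in p.values()]
    return min(surprisals), max(surprisals)


if __name__ == "__main__":
    p = {"0": 0.2, "1": 0.5, "2": 0.3}  # illustrative non-trivial distribution
    l_min, l_max = growth_rates(p)
    rng = random.Random(0)
    x = "".join(rng.choice("012") for _ in range(5000))  # arbitrary signature prefix
    for n in (1, 10, 100, 1000, 5000):
        log_e = log_expected_breach_time(x[:n], p)
        # Task 1 in log form: L_min * n <= log E <= log(n) + L_max * n.
        assert l_min * n <= log_e <= math.log(n) + l_max * n
        print(n, log_e / n, l_min, l_max)
```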

Towards Task 2: by the above, it is necessary and sufficient to investigate when $-\frac{1}{n}\sum_{i<n}\log(p_{i})$ converges. Grouping the positions $i<n$ by their letter $x(i)$ yields

$-\frac{1}{n}\sum_{i<n}\log(p_{i}) = -\sum_{\alpha\in\Sigma}\frac{|\{i<n:x(i)=\alpha\}|}{n}\log(p(\alpha)) = -\sum_{\alpha\in\Sigma}\frac{|A_{\alpha}\cap[0,n)|}{n}\log(p(\alpha))$

where $A_{\alpha}:=\{i\in\omega\mid x(i)=\alpha\}$. It is thus clear that a sufficient condition for convergence is:

$(\star)$ $\frac{|A_{\alpha}\cap[0,n)|}{n}$ converges for all $\alpha\in\Sigma$; that is, the sets $A_{\alpha}=\{i\in\omega\mid x(i)=\alpha\}$ are sets of asymptotic density, or in other words, each letter has an asymptotic frequency in the signature $x$.
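The regrouping above only reorders a finite sum, and condition $(\star)$ concerns the letter frequencies it exposes. A minimal Python check of the identity, with an arbitrary example word and distribution; the function names are hypothetical.

```python
import math
from collections import Counter


def average_by_position(x_prefix, p):
    """Left-hand side: -(1/n) * sum_{i<n} log p_i, summing over positions i."""
    return -sum(math.log(p[letter]) for letter in x_prefix) / len(x_prefix)


def average_by_letter(x_prefix, p):
    """Right-hand side: -sum_alpha (|A_alpha ∩ [0,n)| / n) * log p(alpha), grouping by letter."""
    n = len(x_prefix)
    counts = Counter(x_prefix)  # counts[alpha] = |A_alpha ∩ [0, n)|, 0 if alpha is absent
    return -sum(counts[alpha] / n * math.log(p[alpha]) for alpha in p)


if __name__ == "__main__":
    p = {"a": 0.1, "b": 0.6, "c": 0.3}  # illustrative 3-letter alphabet
    word = "abacabcbbca"
    print(average_by_position(word, p), average_by_letter(word, p))
```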

Under this condition one has $\frac{|A_{\alpha}\cap[0,n)|}{n}\longrightarrow d(A_{\alpha})$, where $d(A_{\alpha})$ is the asymptotic density of $A_{\alpha}$. Since $\Sigma$ is finite, the limit can be taken term by term, so the explicit expression for $h(x;~p)$ under this condition is

$h(x;~p)=-\lim_{n\rightarrow\infty}\sum_{\alpha\in\Sigma}\frac{|A_{\alpha}\cap[0,n)|}{n}\log(p(\alpha)) =-\sum_{\alpha\in\Sigma}\lim_{n\rightarrow\infty}\frac{|A_{\alpha}\cap[0,n)|}{n}\cdot\log(p(\alpha)) =-\sum_{\alpha\in\Sigma}d(A_{\alpha})\cdot\log(p(\alpha))$
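Given the letter densities, the formula for $h(x;~p)$ is a one-liner. The sketch below assumes the densities are supplied as a dictionary (for $x=001001\ldots$ they are $d(A_0)=\frac{2}{3}$, $d(A_1)=\frac{1}{3}$) and compares $h$ with the finite averages $-\frac{1}{n}\sum_{i<n}\log(p_i)$; the distribution is an arbitrary illustrative choice and `h_from_densities` is a hypothetical name.

```python
import math


def h_from_densities(densities, p):
    """h(x;p) = -sum_alpha d(A_alpha) * log p(alpha), given the letter densities d(A_alpha) of x."""
    return -sum(d * math.log(p[alpha]) for alpha, d in densities.items())


if __name__ == "__main__":
    p = {"0": 0.25, "1": 0.75}  # illustrative non-trivial distribution
    densities = {"0": 2 / 3, "1": 1 / 3}  # letter densities of x = 001001001...
    h = h_from_densities(densities, p)
    # The finite averages -(1/n) sum_{i<n} log p_i along prefixes of x approach h.
    for n in (10, 100, 1000):
        prefix = ("001" * n)[:n]
        avg = -sum(math.log(p[a]) for a in prefix) / n
        print(n, avg, h)
```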

Necessary condition. Is condition $(\star)$ necessary for $h(x;~p)$ to exist for all distributions $p$? In fact, we will show a stronger statement: the existence of $h(x;~p)$ for all non-trivial distributions $p$ already implies that every $A_{\alpha}$ is a density set. Fix any $\beta\in\Sigma$ and let $p$ be a non-trivial distribution. For $\alpha\neq\beta$ we have $-\log(p(\alpha))>0$ and $\frac{|A_{\alpha}\cap[0,n)|}{n}\in[0,1]$, so replacing these frequencies by $0$ and by $1$ gives lower and upper bounds. Then, since for all $n>0$

$-\frac{|A_{\beta}\cap[0,n)|}{n}\log(p(\beta)) \leq -\sum_{\alpha\in\Sigma}\frac{|A_{\alpha}\cap[0,n)|}{n}\log(p(\alpha)) \leq -\frac{|A_{\beta}\cap[0,n)|}{n}\log(p(\beta))-\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))$

and $-\sum_{\alpha\in\Sigma}\frac{|A_{\alpha}\cap[0,n)|}{n}\log(p(\alpha))\longrightarrow h(x;~p)$, taking the limit superior of the lower bound and the limit inferior of the upper bound yields

$-d^{+}(A_{\beta})\log(p(\beta))\leq h(x;~p)\leq -d^{-}(A_{\beta})\log(p(\beta))-\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))$

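where $d^{+}(C):=\limsup_{n}\frac{|C\cap[0,n)|}{n}\in[0,1]$ and $d^{-}(C):=\liminf_{n}\frac{|C\cap[0,n)|}{n}\in[0,1]$ are the upper and lower asymptotic densities of a set $C\subseteq\omega$. Comparing the two outer bounds, dividing by $-\log(p(\beta))$, which is $>0$ for non-trivial $p$, and taking the infimum over all non-trivial $p$, it follows that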

$d^{+}(A_{\beta})-d^{-}(A_{\beta})\leq \inf_{p~\text{non-trivial}}\dfrac{-\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))}{-\log(p(\beta))}$

To show that $A_{\beta}$ is a set of density, i.e. that $d(A_{\beta})=\lim_{n}\frac{|A_{\beta}\cap[0,n)|}{n}$ exists, it suffices to show that $d^{+}(A_{\beta})-d^{-}(A_{\beta})\leq 0$. By the above inequality, it therefore suffices to show that the (non-negative) ratio $\frac{\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))}{\log(p(\beta))}$ can be made arbitrarily small by a suitable choice of non-trivial $p$. This can be done as follows:

Let $N:=|\Sigma|-1\geq 1$. For any $q\in(0,1)$, setting $p(\beta)=q$ and $p(\alpha)=\frac{1-q}{N}$ for $\alpha\in\Sigma\setminus\{\beta\}$ yields a non-trivial distribution with $\frac{\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))}{\log(p(\beta))}=\frac{N\log(\frac{1-q}{N})}{\log(q)}$.
Letting $q\longrightarrow 0^{+}$, the numerator tends to the finite value $N\log(\frac{1}{N})$ while $\log(q)\longrightarrow-\infty$, so $\frac{N\log(\frac{1-q}{N})}{\log(q)}\longrightarrow 0$.
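Numerically, the ratio from this construction does shrink towards $0$ as $q\longrightarrow 0^{+}$, though slowly. A small sketch, where `log_ratio` is a hypothetical helper and `alphabet_size` stands for $|\Sigma|=N+1$:

```python
import math


def log_ratio(q, alphabet_size):
    """N*log((1-q)/N) / log(q) for p(beta) = q and p(alpha) = (1-q)/N otherwise, N = |Sigma| - 1."""
    n_other = alphabet_size - 1
    return n_other * math.log((1 - q) / n_other) / math.log(q)


if __name__ == "__main__":
    for size in (2, 3, 10):
        # This ratio bounds d^+(A_beta) - d^-(A_beta); it decreases as q -> 0+.
        print(size, [log_ratio(q, size) for q in (1e-1, 1e-3, 1e-6, 1e-12, 1e-100)])
```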

Thus $\inf_{p~\text{non-trivial}}\frac{\sum_{\alpha\in\Sigma\setminus\{\beta\}}\log(p(\alpha))}{\log(p(\beta))}=0$, and hence $d^{+}(A_{\beta})=d^{-}(A_{\beta})$. This proves the implication: if $h(x;~p)$ exists for all (non-trivial) distributions $p$, then all $A_{\alpha}$ must be sets of density. Together with the sufficiency of $(\star)$ shown above, this completes Task 2.

Towards Task 3, one may rely on Task 2. Since our alphabet has only two letters and $A_{0}=\omega\setminus A_{1}$, it suffices to check whether $A_{1}=\{i\in\omega\mid x(i)=1\}$ is a set of density. For 1 and 2 this is clear: every block of length $3$ contains exactly one $1$, so $A_{1}$ has density $1/3$. For 4, the strong law of large numbers (or, more generally, ergodic theory) gives that almost every sequence drawn from the product probability space $\prod_{n\in\mathbb{N}}(\Sigma,\mu)$ contains each letter $\alpha$ with asymptotic frequency $\mu(\alpha)$; in particular $A_{1}$ has density $\mu(1)$. For 3, by contrast, it is easy to find two subsequences along which $\frac{|A_{1}\cap[0,n)|}{n}\longrightarrow 0$ and $\frac{|A_{1}\cap[0,n)|}{n}\longrightarrow 1$ respectively. ${}^{\color{#D61F06}{\dagger}}$ Hence 3 is the only case in which $h(x;~p)$ fails to exist for some $p$.
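The frequencies $\frac{|A_{1}\cap[0,n)|}{n}$ for options 1, 2 and 4 can be simulated directly. In this sketch, option 2 draws each block uniformly and independently and option 4 uses $\mu(1)=0.3$; both are illustrative assumptions, since the problem does not fix these choices. The function names are hypothetical, and the seeded `random.Random` generator only makes runs reproducible.

```python
import random
from itertools import cycle, islice


def frequency_of_ones(x_prefix):
    """Return |A_1 ∩ [0, n)| / n for the prefix x|n."""
    return x_prefix.count("1") / len(x_prefix)


def option_1(n):
    """Prefix of length n of x = 001001001..."""
    return "".join(islice(cycle("001"), n))


def option_2(n, rng):
    """Prefix of length n of a random concatenation of the blocks 001, 010, 100.

    Choosing each block uniformly and independently is an illustrative assumption;
    the density argument only uses that every block contains exactly one 1.
    """
    blocks = [rng.choice(("001", "010", "100")) for _ in range(n // 3 + 1)]
    return "".join(blocks)[:n]


def option_4(n, mu_one, rng):
    """Prefix of length n with letters drawn independently, mu(1) = mu_one, mu(0) = 1 - mu_one."""
    return "".join("1" if rng.random() < mu_one else "0" for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(2018)
    for n in (30, 3000, 300000):
        print(n, frequency_of_ones(option_1(n)), frequency_of_ones(option_2(n, rng)),
              frequency_of_ones(option_4(n, 0.3, rng)))
```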

${}^{\color{#D61F06}{\dagger}}$ Subsequences converging to $0$ resp. $1$ : Let

$\begin{array}{rclcrcl} N_{0}(i) &:=& \sum_{k\leq 2i}n_{k}, &\qquad& J_{0}&:=&\{N_{0}(i)\mid i\in\mathbb{N}\},~\text{and}\\ N_{1}(i) &:=& \sum_{k\leq 2i+1}n_{k}, &\qquad& J_{1}&:=&\{N_{1}(i)\mid i\in\mathbb{N}\}. \end{array}$

Now consider the subsequences $(\frac{|A_{1}\cap[0,n)|}{n})_{n\in J_{0}}$ and $(\frac{|A_{1}\cap[0,n)|}{n})_{n\in J_{1}}$. By the condition in the problem, it holds that

$\dfrac{n_{i}}{\sum_{k\leq i}n_{k}} =\dfrac{1}{1+\frac{1}{n_{i}}\sum_{k<i}n_{k}} \longrightarrow \frac{1}{1+0}=1$ as $i\longrightarrow\infty$.

Since the blocks with even index consist of $0$s and those with odd index consist of $1$s, appealing to this one has

$\begin{array}{rcccccl} \frac{|A_{1}\cap[0,N_{0}(i))|}{N_{0}(i)} &=& \frac{\sum_{k<i}n_{2k+1}}{\sum_{k\leq 2i}n_{k}} &\leq& 1-\frac{n_{2i}}{\sum_{k\leq 2i}n_{k}} &\longrightarrow& 1-1=0,\\ \frac{|A_{1}\cap[0,N_{1}(i))|}{N_{1}(i)} &=& \frac{\sum_{k\leq i}n_{2k+1}}{\sum_{k\leq 2i+1}n_{k}} &\geq& \frac{n_{2i+1}}{\sum_{k\leq 2i+1}n_{k}} &\longrightarrow& 1. \end{array}$

It follows that $\frac{|A_{1}\cap[0,N_{0}(i))|}{N_{0}(i)}\longrightarrow 0$ and $\frac{|A_{1}\cap[0,N_{1}(i))|}{N_{1}(i)}\longrightarrow 1$ .
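The oscillation in option 3 can also be observed numerically. The sketch below takes the hypothetical block lengths $n_k=(k+1)!$, which satisfy the condition of the problem because $\sum_{j<i}(j+1)!\leq 2\,i!$, so that $\frac{\sum_{j<i}n_j}{n_i}\leq\frac{2}{i+1}\longrightarrow 0$. It only tracks counts, so the (astronomically long) sequence is never built; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def block_lengths(count):
    """Hypothetical block lengths n_k = (k+1)!, which satisfy sum_{j<i} n_j / n_i -> 0."""
    return [factorial(k + 1) for k in range(count)]


def ones_frequency_at_block_ends(lengths):
    """|A_1 ∩ [0, N)| / N at every block end N = n_0 + ... + n_k of x = 0^{n_0} 1^{n_1} 0^{n_2} ...

    Blocks with even index consist of 0s and blocks with odd index consist of 1s.
    """
    total = ones = 0
    freqs = []
    for k, length in enumerate(lengths):
        total += length
        if k % 2 == 1:
            ones += length
        freqs.append(Fraction(ones, total))
    return freqs


if __name__ == "__main__":
    freqs = ones_frequency_at_block_ends(block_lengths(20))
    for i in range(10):
        # Block 2i ends at N_0(i) (a run of 0s); block 2i+1 ends at N_1(i) (a run of 1s).
        print(i, float(freqs[2 * i]), float(freqs[2 * i + 1]))
```

At the points $N_0(i)$ the frequency drifts down towards $0$ and at the points $N_1(i)$ it climbs towards $1$, in line with the computation above.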
