Longest Repeating Substring

We define the score of a string x x as the length of longest substring of x x which appears more than one times in x x . (The substrings may overlap)

Let S S be the sum of the scores of all the strings in this file . Find S S

Click here to open the file.


The answer is 11251.

This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try refreshing the page, (b) enabling javascript if it is disabled on your browser and, finally, (c) loading the non-javascript version of this page . We're sorry about the hassle.

3 solutions

Thaddeus Abiy
Jul 7, 2015

The solution below runs O ( n 2 ) O(n^2) on the average case. I wonder if it is possible to prove the existence of an algorithm that runs at atleast O ( n l o g ( n ) ) O(n log(n)) .

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def score( string ):
    visited = set()
    Max = 0
    for i in range(len( string ) + 1):
        for j in range(i + Max , len( string ) + 1):
            subs = string[ i : j ]            
            if subs in visited:
                Max = max( j - i , Max )
            visited.add(subs)
    return Max

Pranjal Jain
Jul 6, 2015

I'm just defining the functions here (Python 3.4)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
from collections import defaultdict
def getsubs(loc, s):
    substr = s[loc:]
    i = -1
    while(substr):
        yield substr
        substr = s[loc:i]
        i -= 1
def longestRepetitiveSubstring(r, minocc=2): # minocc is minimum occurence
    occ = defaultdict(int)
    # tally all occurrences of all substrings
    for i in range(len(r)):
        for sub in getsubs(i,r):
            occ[sub] += 1
    # filter out all substrings with fewer than minocc occurrences
    occ_minocc = [k for k,v in occ.items() if v >= minocc]
    if occ_minocc:
        maxkey =  max(occ_minocc, key=len)
        return len(maxkey)
    else:
        return 0

Roel Baars
Jul 6, 2015

Mathematica can be very useful here to work relatively quickly with lists. This is highly unoptimized ;-)

1
2
3
4
5
6
7
8
9
data = Import["bp.txt", "List"];
score[txt_] :=
 (For[n = StringLength[txt]/2, n >= 1, n--,
   allsubsets = StringPartition[txt, n, 1];
   count = Tally[allsubsets];
   morethanone = Select[count, #[[2]] > 1 &];
   If[Length[morethanone] > 0, Return[n]]
   ]; Return[0];)
Sum[score[data[[i]]], {i, 1, Length[data]}]

0 pending reports

×

Problem Loading...

Note Loading...

Set Loading...