2

I would like to construct a length $N$ string over a $k$-letter alphabet, $S$, such that any substring of $P$ sequential characters in $S$ is unique for as small a value of $P$ as possible. To clarify, "unique" means that the fixed-length substring or its transpose occurs at most once in $S$.

For fixed $N$ and $k$, what is the smallest possible value of $P$, and how do I construct $S$ in such a manner as to minimize $P$?

H.H.
  • 33
  • I can't parse this question. I don't know what it means for a substring to be unique - I suppose you mean you want to minimize the value of $P$ such that no substring of length $P$ appears twice? I don't know what "between any substring" means. Usually, "between" takes two objects - between such-and-such and somehting else. Also, you're trying to minimize one thing while at the same time maximizing something else, which seems like asking the impossible. Maybe an example or three would clarify things. – Gerry Myerson Sep 06 '11 at 03:38
  • @Gerry, "Also, you're trying to minimize one thing while at the same time maximizing something else, which seems like asking the impossible." - fair criticism, I've removed the optimization component of the problem. As for "between any substring" I mean any pair of fixed-length arbitrarily selected substrings in 'S'. Hopefully I clarified in the above problem what I meant by "unique". – H.H. Sep 06 '11 at 16:22
  • This might be relevant: http://en.wikipedia.org/wiki/De_Bruijn_sequence – leonbloy Sep 06 '11 at 17:28
  • What is the "transpose" of a string? – Gerry Myerson Sep 07 '11 at 01:59
  • Perhaps http://math.stackexchange.com/questions/15510/what-is-the-shortest-string-that-contains-all-permutations-of-an-alphabet is of interest. – Gerry Myerson Sep 07 '11 at 02:01

1 Answers1

1

If P is odd, there are $\frac{k^P}{2}+\frac{k^\left({\frac{P+1}{2}}\right)}{2}$ distinct substrings (avoiding transpositions). If P is even, there are $\frac{k^P}{2}+\frac{k^\left({\frac{P}{2}}\right)}{2}$. $N$ must be at most this plus $P-1$. Constructing such a string shouldn't be hard, but I don't have an algorithm. Maybe this naive algorithm will work-append a character that doesn't produce a duplicate substring and within those, the one used least so far.

Ross Millikan
  • 374,822
  • thanks for your answer! My main interest here, and the difficulty I'm having, is in finding an algorithm that is guaranteed to construct optimal 'S'... – H.H. Sep 07 '11 at 05:46