5

For what alphabet size does it take longer to construct a suffix tree: a really small alphabet (because the construction has to go deep into the tree) or a large alphabet? Or does it depend on the algorithm you use? If it does, how does the alphabet size affect Ukkonen's algorithm?

Raphael
John Smith

1 Answer

6

A larger alphabet is usually a drawback. However, there are algorithms that can deal with this as long as the alphabet size is $n^{O(1)}$, where $n$ is the length of the text.

Ukkonen's algorithm runs in $O(n)$ time only if the alphabet size is constant; without this assumption it takes $O(n \log n)$ time, because finding the right outgoing edge at a branching node is no longer a constant-time step. However, there are alternatives. You can compute the suffix array of a text in linear time with the DC3 algorithm. This is a super-cool fancy algorithm that can be implemented in 50 lines of readable C++ code, and it is one of my all-time favorites. If you can compare two characters in constant time and the alphabet size is $n^{O(1)}$, then the DC3 algorithm runs in $O(n)$ time.
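To make the suffix array concrete, here is a minimal Python sketch. It is explicitly not DC3: the suffix array is built by plain sorting of the suffixes (far from linear time), which is only meant to show what the output looks like. The second function computes the LCP array (the lengths of the longest common prefixes of adjacent suffixes in sorted order) with Kasai's linear-time algorithm, swapped in here because it is short. The function names are my own, and the text is assumed to end with a unique terminator such as `$`.

```python
def suffix_array_naive(text):
    """Sort suffix start positions by comparing the suffixes directly.

    Illustration only: this is NOT the DC3 algorithm and is not linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_kasai(text, sa):
    """Return lcp, where lcp[k] is the length of the longest common prefix
    of the suffixes starting at sa[k-1] and sa[k], and lcp[0] = 0.

    Kasai's algorithm: O(n) overall, because the running match length h
    drops by at most one per step.
    """
    n = len(text)
    rank = [0] * n                      # rank[i] = position of suffix i in sa
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):                  # visit suffixes in text order
        if rank[i] == 0:
            h = 0                       # smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]             # predecessor of suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1                      # suffix i+1 shares at least h-1 characters
    return lcp


sa = suffix_array_naive("banana$")
print(sa)                        # [6, 5, 3, 1, 0, 4, 2]
print(lcp_kasai("banana$", sa))  # [0, 0, 1, 3, 0, 0, 2]
```

Note that nothing here depends on the alphabet beyond being able to compare two characters, which is the property the DC3 bound relies on as well.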

Notice that you can get the suffix tree from the suffix array in $O(n)$ time once you have the LCP array. Basically, you compute the Cartesian tree of the LCP array and use the suffix array to label the nodes. The LCP array can also be computed with the DC3 algorithm.
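The construction can be sketched roughly as follows in Python. This is a sketch under assumptions rather than a reference implementation: it uses the usual stack-based scan along the rightmost path, in the spirit of building the Cartesian tree of the LCP array incrementally, and it assumes the text ends with a unique terminator such as `$`. The names `Node`, `suffix_tree_from_arrays`, and `edge_labels` are made up for illustration, and the arrays for `"banana$"` are hard-coded.

```python
class Node:
    def __init__(self, depth, suffix=None):
        self.depth = depth      # string depth: length of the path label from the root
        self.suffix = suffix    # start position for a leaf, None for an internal node
        self.children = []      # children in lexicographic order


def suffix_tree_from_arrays(text, sa, lcp):
    """Build the suffix tree from the suffix array and LCP array in O(n).

    Assumes lcp[k] is the LCP of the suffixes at sa[k-1] and sa[k],
    and that text ends with a unique terminator.
    """
    n = len(text)
    root = Node(0)
    stack = [root]                       # rightmost path, root at the bottom
    for k in range(n):
        h = lcp[k] if k > 0 else 0
        last = None
        while stack[-1].depth > h:       # climb until string depth <= h
            last = stack.pop()
        if stack[-1].depth < h:          # no node at depth h yet: split the edge
            inner = Node(h)
            stack[-1].children.pop()     # detach `last`, the rightmost child
            stack[-1].children.append(inner)
            inner.children.append(last)
            stack.append(inner)
        leaf = Node(n - sa[k], sa[k])    # the suffix array labels the leaves
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def edge_labels(text, node):
    """List the edge labels of the subtree below node in depth-first order."""
    labels = []
    for child in node.children:
        probe = child
        while probe.suffix is None:      # any leaf below `child` spells its label
            probe = probe.children[0]
        s = probe.suffix
        labels.append(text[s + node.depth:s + child.depth])
        labels.extend(edge_labels(text, child))
    return labels


text = "banana$"
root = suffix_tree_from_arrays(text, [6, 5, 3, 1, 0, 4, 2], [0, 0, 1, 3, 0, 0, 2])
print(edge_labels(text, root))
# ['$', 'a', '$', 'na', '$', 'na$', 'banana$', 'na', '$', 'na$']
```

Each node is pushed and popped at most once, so the scan is linear regardless of the alphabet size. Children are stored in plain lists here; how you index them by first character afterwards is a separate choice.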

A.Schulz
  • +1 for answering the question and also mentioning suffix arrays, and the super-cool fancy algorithm! – Paresh Nov 23 '12 at 09:35