How to check homeomorphic embedding relation programmatically?

Question

This is a follow up to this question and Deedlit's answer.

I'm looking for a precise definition of the "hem?" (tree A homeomorphically embeddable in tree B?) relation, preferably in terms of a runnable program or function in some programming language, that accepts two trees (see below) as arguments and returns either true or false.

I've found many definitions on the web but no actual code so far. Deedlit gives the following definition:

Given two trees, S and T, we use the following comparison algorithm. First check, inductively, if S is less than or equal to any of the immediate subtrees of T; if so, then S < T. Similarly, check if T is less than or equal to any of the immediate subtrees of S; if so, then T < S. If neither of those checks apply, then compare the number of children of the root of S to the number of children of the root of T; the tree with the larger number is greater. Finally, if the roots of S and T have the same number of children, compare the immediate subtrees of S and T one by one, starting from the smallest pair, then going to the second smallest pair, etc. The first time you find two different immediate subtrees, the greater of the two will belong to the greater original tree.

Not sure if I understand that correctly. An actual implementation would be nice.

For comparison, one might define the well-known "substring?" relation, which is a well-quasi-order on strings, as follows (using Clojure, though I don't really care about the choice of programming language; they are mostly equivalent):

(defn substring? [x y]
  (cond
    (empty? x) true
    (empty? y) false
    :else (recur
           (if (= (first x) (first y))
             (rest x) x)
           (rest y))))

This definition works if x and y are strings or sequences of characters.
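A few sample calls may make the behaviour concrete. The sketch below repeats the definition so that it runs on its own. Note that the matched characters do not have to be adjacent in y, so the relation is strictly a subsequence check; that is the order Higman's lemma shows to be a well-quasi-order on strings over a finite alphabet.

```clojure
;; Same definition as above, repeated so the snippet is self-contained.
(defn substring? [x y]
  (cond
    (empty? x) true
    (empty? y) false
    :else (recur
           (if (= (first x) (first y))
             (rest x) x)
           (rest y))))

(substring? "ace" "abcde") ;=> true  (a, c, e occur in this order, with gaps)
(substring? "aec" "abcde") ;=> false (no c is left after the e)
(substring? ""    "abc")   ;=> true  (the empty string embeds everywhere)
```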

Let's say we limit ourselves to trees with 2 labels a and b, represented by parens () and brackets [] just like in Deedlit's answer. First, we would probably parse the strings into trees of nested vectors. Using instaparse, the following grammar seems to work:

(def two-label-tree-grammar
  "S = (a | b)
   a = <'('> (a | b)* <')'>
   b = <'['> (a | b)* <']'>")

(def two-label-tree-parser
  (insta/parser two-label-tree-grammar))

Then (two-label-tree-parser "([()([][])])") evaluates to [:S [:a [:b [:a] [:a [:b] [:b]]]]] (the generic root node :S can be ignored).
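For anyone who wants to experiment without the instaparse dependency, here is a minimal hand-written sketch that should produce the same nested vectors, minus the :S wrapper. The names parse-tree and parse-node are made up for this example and are not part of instaparse or of the grammar above.

```clojure
;; Hypothetical helper tables: which label an opening character stands for,
;; and which closing character it needs.
(def ^:private open->label {\( :a, \[ :b})
(def ^:private open->close {\( \), \[ \]})

(defn- parse-node
  "Parses one tree from the head of cs (a seq of characters).
  Returns [tree remaining-chars]; throws on malformed input."
  [cs]
  (let [open  (first cs)
        label (open->label open)]
    (when-not label
      (throw (ex-info "expected ( or [" {:at (apply str cs)})))
    (loop [node [label]
           more (rest cs)]
      (cond
        (empty? more)
        (throw (ex-info "unclosed bracket" {:open open}))

        ;; matching close: this node is complete
        (= (first more) (open->close open))
        [node (rest more)]

        ;; otherwise a child starts here
        :else
        (let [[child after] (parse-node more)]
          (recur (conj node child) after))))))

(defn parse-tree
  "Parses a whole string into a single tree [label & children]."
  [s]
  (let [[tree leftover] (parse-node (seq s))]
    (when (seq leftover)
      (throw (ex-info "trailing input" {:rest (apply str leftover)})))
    tree))

(parse-tree "([()([][])])")
;=> [:a [:b [:a] [:a [:b] [:b]]]]
```

Each tree is a vector whose first element is the label and whose remaining elements are the child trees, which is the shape assumed in the rest of the question.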

How would one programmatically check the "hem?" relation on such nested vectors? There's a wikia page that mentions a "reducibility" relation which might be easier to handle. There seems to be a claim that (hem? A B) and (reducible? B A) are equivalent, but I could not find a proof of the equivalence, or a programmatic definition of the reducible? relation.

The problem is that a good answer to your question would be a solid master's thesis for any school in the US. I'll just comment that the wikia page is wrong for multiple reasons, one of which is that a given tree has multiple different representations as a rooted tree, so two trees can be isomorphic yet have their encoded strings not be the same. — Matt Samuel, Jan 07 '15 at 02:25

the gods from engineering · Accepted Answer · 2015-02-26T22:40:30.983

Note: the first part of this answer assumes unlabeled trees, because the word "label" doesn't appear in the question until the example, and even then it wasn't clear whether it is essential. However, after looking at the TREE(3) problem, I see that it assumes labeled trees.

The subgraph homeomorphism problem for unlabeled trees

Although I can't say I've read all the directly and indirectly referenced material (which was the proverbial TL;DR for me), you seem to be asking about subgraph (or more precisely subtree) homeomorphism algorithms.

Subgraph homeomorphism is NP-complete for general graphs, and you can obviously implement a brute-force method to check it. For trees, however, there are polynomial-time algorithms, e.g. "O(n^2.5) time algorithms for the subgraph homeomorphism problem on trees" by Moon Jung Chung; the paper has algorithms for both the rooted and unrooted tree flavors.

Such algorithms are not simple or short enough to paste here in any self-contained way. Even when calling high-level procedures such as a bipartite matching solver, there's about a page of pseudocode for the simpler, rooted case; and without the accompanying proof (which goes on for another page or so), it's not obvious why it even works.
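The brute-force method, by contrast, does fit in a few lines. What follows is only a sketch under stated assumptions, not the algorithm from the paper: trees are rooted and unordered, written as the question's nested vectors [label & children], and "embeddable" is read in the usual inductive way, i.e. S embeds in T if S embeds in some immediate subtree of T, or the roots are compatible and the children of S can be sent to distinct children of T with each child embedding in its target. The helper match-children? and the label-ok? parameter are names invented for this example; only hem? comes from the question. The search is exponential in the worst case.

```clojure
(declare hem?)

(defn- match-children?
  "Brute-force search: can every tree in ss be embedded into its own,
  distinct tree of ts?"
  [label-ok? ss ts]
  (if (empty? ss)
    true
    (let [s  (first ss)
          ts (vec ts)]
      (boolean
       (some (fn [i]
               (and (hem? label-ok? s (nth ts i))
                    ;; commit s to ts[i]; place the rest among the others
                    (match-children? label-ok?
                                     (rest ss)
                                     (concat (subvec ts 0 i)
                                             (subvec ts (inc i))))))
             (range (count ts)))))))

(defn hem?
  "Is tree s embeddable in tree t? Trees are vectors [label & children].
  label-ok? decides whether two root labels are compatible (default: =)."
  ([s t] (hem? = s t))
  ([label-ok? s t]
   (let [[s-label & s-kids] s
         [t-label & t-kids] t]
     (boolean
      (or
       ;; case 1: s already fits inside one immediate subtree of t
       (some #(hem? label-ok? s %) t-kids)
       ;; case 2: root goes to root, children go to distinct children
       (and (label-ok? s-label t-label)
            (match-children? label-ok? s-kids t-kids)))))))

(hem? [:a [:b] [:b]] [:a [:b [:a] [:a [:b] [:b]]]]) ;=> true
(hem? [:a [:b] [:b]] [:a [:b [:b]]])                ;=> false (a path has no fork)
(hem? [:b] [:a [:a]])                               ;=> false (no b in the target)
(hem? (constantly true) [:b] [:a [:a]])             ;=> true  (labels ignored)
```

Passing (constantly true) as label-ok? gives the unlabeled check; the default = requires labels to match exactly, which is the labeled reading. Replacing the search in match-children? with a bipartite matching is, as far as I understand, the step that makes the polynomial-time algorithms possible.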

The subgraph homeomorphism problem for labeled trees

In CS, the problem of finding a label-preserving homeomorphic subtree is called finding the maximum agreement subtree (abbreviated MAST); for further details see http://arxiv.org/pdf/cs/0101010.pdf. (Note that the term maximum agreement subtree sometimes refers only to the problem as applied to so-called evolutionary trees, in which only the leaves can have distinct labels; the paper I've linked to, however, treats the problem in full generality, with no restriction on the labeling of any nodes.)

There are some bioinformatics applications for labeled subtree homeomorphism, e.g. to phylogenetic trees, so you may have some luck finding working code in some program for those, but it's not an area I know much about (and, generally, asking for code is probably not what M.SE is for).
