Could anyone recommend a good similarity measure for objects which have multiple classes, where each class is part of a hierarchy?
For example, let's say the classes look like:
1 Produce
1.1 Eggs
1.1.1 Duck eggs
1.1.2 Chicken eggs
1.2 Milk
1.2.1 Cow milk
1.2.2 Goat milk
2 Baked goods
2.1 Cakes
2.1.1 Cheesecake
2.1.2 Chocolate
An object might be tagged with items from the above at any level, e.g.:
Omelette: eggs, milk (1.1, 1.2)
Duck egg omelette: duck eggs, milk (1.1.1, 1.2)
Goat milk chocolate cheesecake: goat milk, cheesecake, chocolate (1.2.2, 2.1.1, 2.1.2)
Beef: produce (1)
If the classes weren't part of a hierarchy, I'd probably look at cosine similarity (or equivalent) between the classes assigned to each object. However, I'd like to use the fact that different classes with the same parent also have some similarity (e.g. in the example above, beef has some small similarity to omelette, since they both have items from the class '1 Produce').
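To make the idea concrete, here is a rough sketch of the naive version I have in mind: expand every tag into itself plus all of its ancestors, then take cosine similarity over the resulting binary sets. The function names and the dotted-string codes are purely illustrative, and I'm not claiming this is a good measure. In particular, every level counts equally here, so a shared top-level class probably contributes more than it should.

```python
import math

def expand(code):
    """Return a class code plus all of its ancestors,
    e.g. '1.1.1' -> {'1', '1.1', '1.1.1'}."""
    parts = code.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}

def to_features(codes):
    """Set of every class an object touches, directly or via an ancestor."""
    features = set()
    for code in codes:
        features |= expand(code)
    return features

def cosine(a, b):
    """Cosine similarity between two binary feature sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# The example objects from above.
omelette = to_features(["1.1", "1.2"])
duck_omelette = to_features(["1.1.1", "1.2"])
cheesecake = to_features(["1.2.2", "2.1.1", "2.1.2"])
beef = to_features(["1"])

print(cosine(omelette, duck_omelette))  # shares 1, 1.1, 1.2
print(cosine(omelette, beef))           # shares only 1
print(cosine(omelette, cheesecake))     # shares 1, 1.2
```

With this, beef and omelette do get a non-zero score through the shared '1' node, but the score is not especially small, which is part of why I suspect a depth-aware weighting (or a different measure altogether) is needed.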
If it helps, the hierarchy has ~200k classes, with a maximum depth of 5.