I have a dataset with a large number of features (it comes from a document-oriented database). 80% of the features are empty in 80% of the records and are filled in only under specific conditions. Let me give an example with an animals dataset:
- Number_of_paws is filled for every animal where living_place = ground,
- Coat_color is filled for every animal where aspect = coat,
- Depth_in_water is filled only for fish, ...
How can I determine, for a new unknown feature, which subset of the data it is related to? For example, imagine a feature %something_unknown that is empty 98% of the time; I want to discover that this feature is only filled in when Animal_color = Red and Animal_type = fish.
I would say that this is related to subset analysis. How should one proceed to solve this problem?
number_of_paws has co-feature living_place=ground, and so on. Then once you have a map of features to co-features you can do the inverse process – Nikos M. Feb 20 '21 at 17:33
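One way to read the co-feature idea is sketched below, in plain Python. It assumes the records are dicts (as they would come from a document store), that a missing key or `None` means "empty", and that all values are hashable. The function names `necessary_conditions` and `precision` and the sample records are made up for illustration. The sketch collects the (feature, value) pairs shared by every record where the target is filled, then checks how often the target is filled when all of those pairs hold.

```python
def necessary_conditions(records, target):
    """Return the {feature: value} pairs present in every record where `target` is filled."""
    filled = [r for r in records if r.get(target) is not None]
    if not filled:
        return {}

    def pairs(record):
        # All non-empty (feature, value) pairs except the target itself.
        return {(k, v) for k, v in record.items() if k != target and v is not None}

    common = pairs(filled[0])
    for record in filled[1:]:
        common &= pairs(record)  # keep only pairs shared by all filled records
    return dict(common)


def precision(records, target, conditions):
    """Fraction of records matching all `conditions` in which `target` is filled."""
    matching = [r for r in records
                if all(r.get(k) == v for k, v in conditions.items())]
    if not matching:
        return 0.0
    return sum(r.get(target) is not None for r in matching) / len(matching)


# Hypothetical sample data, loosely following the animals example.
records = [
    {"Animal_type": "fish", "Animal_color": "Red", "something_unknown": 3.2},
    {"Animal_type": "fish", "Animal_color": "Red", "something_unknown": 1.1},
    {"Animal_type": "fish", "Animal_color": "Blue"},
    {"Animal_type": "bird", "Animal_color": "Red"},
    {"Animal_type": "dog", "Animal_color": "Brown", "Number_of_paws": 4},
]

conditions = necessary_conditions(records, "something_unknown")
print(conditions)
print(precision(records, "something_unknown", conditions))
```

A precision of 1.0 means the conjunction of the shared pairs fully explains when the feature is filled; a lower value suggests that some condition is missing or that the data are noisy, in which case an exact intersection like this one is too strict.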