I want to detect attributes of objects in an image, such as the color of a patch on a person's shirt, how many patches there are, the types of objects present, and the exact dimensions of the objects.
I've heard of BLIP-2, but I'm not sure whether it can do what I need. Can someone tell me whether BLIP-2 is the right model, or whether another model is better suited to this kind of metadata detection?
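
For reference, here is a rough sketch of how I imagine querying BLIP-2 for these attributes as visual question answering. It assumes the Hugging Face transformers classes Blip2Processor and Blip2ForConditionalGeneration and the Salesforce/blip2-opt-2.7b checkpoint; the question wording and the helper names (ATTRIBUTE_QUESTIONS, build_prompt, load_blip2_answerer, extract_attributes) are my own placeholders. I have not verified that the answers are reliable, and exact dimensions are left out because I don't know whether a question-answering model can return measurements at all.

```python
from typing import Callable, Dict

# Hypothetical attribute questions; the wording is illustrative only.
ATTRIBUTE_QUESTIONS: Dict[str, str] = {
    "patch_color": "What color is the patch on the person's shirt?",
    "patch_count": "How many patches are on the person's shirt?",
    "object_types": "What objects are in the image?",
}


def build_prompt(question: str) -> str:
    """Wrap a question in the 'Question: ... Answer:' style prompt
    that is commonly used with the BLIP-2 OPT checkpoints."""
    return f"Question: {question.strip()} Answer:"


def load_blip2_answerer(
    checkpoint: str = "Salesforce/blip2-opt-2.7b",
) -> Callable[[object, str], str]:
    """Load BLIP-2 and return a function (image, prompt) -> answer text.

    Assumes torch and transformers are installed; the checkpoint is large,
    so this downloads several GB on first use.
    """
    # Imported here so the prompt helpers work without these packages.
    import torch
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(checkpoint)
    model = Blip2ForConditionalGeneration.from_pretrained(checkpoint)
    model.eval()

    def answer(image, prompt: str) -> str:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
        with torch.no_grad():
            generated_ids = model.generate(**inputs, max_new_tokens=20)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return text.strip()

    return answer


def extract_attributes(
    image, answer_fn: Callable[[object, str], str]
) -> Dict[str, str]:
    """Ask every attribute question about one image and collect the answers."""
    return {
        name: answer_fn(image, build_prompt(question))
        for name, question in ATTRIBUTE_QUESTIONS.items()
    }


# Intended usage (not run here; "photo.jpg" is a placeholder path):
#   from PIL import Image
#   image = Image.open("photo.jpg").convert("RGB")
#   answer_fn = load_blip2_answerer()
#   print(extract_attributes(image, answer_fn))
```

The answer function is passed in as a parameter so the same questions could be tried against a different model without changing the rest of the code.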