From this, by counting pixels I think I was able to "calculate" all the horizontal dimensions.
Since you have a projectively deformed image, I'd suggest not relying on mere pixel counts, but using cross ratios instead to measure distances in a projectively accurate way. See e.g. this post of mine for an example.
But for the vertical dimensions, I'm not sure how I'd do it since they're different distances from the observer.
This is indeed tricky, as I argued in a similar situation. At least if you start out with the assumption that “the camera may cause any projective transformation”, then having measurements in a single plane, or even in a family of parallel planes, tells you nothing about the directions outside that family of parallel planes.
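Here is a small numerical illustration of that claim (my own construction, with arbitrary made-up numbers): two hypothetical 3×4 camera matrices that differ only in the column multiplying the vertical coordinate z. Every point of the ground plane z = 0 lands on exactly the same pixel in both images, yet a point one unit above the ground lands somewhere quite different, so measuring within that plane alone cannot tell the two cameras apart.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 camera matrix P; returns pixel (u, v)."""
    h = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return h[:2] / h[2]


# Hypothetical camera; the numbers are arbitrary.
P1 = np.array([[800.0,  50.0, 300.0, 400.0],
               [ 20.0, 700.0, 250.0, 300.0],
               [  0.1,   0.2,   0.3,   1.0]])

# Second camera: identical except for column 2, the one multiplied by z.
P2 = P1.copy()
P2[:, 2] = [100.0, -400.0, 0.05]

ground_points = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0), (-1.0, 3.0, 0.0)]
raised_point = (0.0, 0.0, 1.0)  # one unit straight up from the origin

# On z = 0 column 2 is multiplied by zero, so both cameras agree exactly there.
ground_same = all(np.allclose(project(P1, X), project(P2, X)) for X in ground_points)
raised_same = np.allclose(project(P1, raised_point), project(P2, raised_point))
print(ground_same, raised_same)  # True False
```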
In a real-world scenario, you may be able to add further assumptions, e.g. that the center of the picture lies on the optical axis of the camera, and that the camera's sensor was perpendicular to that axis. Neither of these is obviously true for professional photos, though. Personally I'd say you need at least one non-horizontal measurement.
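If you are willing to make such assumptions, one standard way to exploit them is vanishing-point calibration. The following is a sketch of that common technique, not something I'm claiming works for your particular photo. It assumes the principal point sits at the image center, zero skew and square pixels, and that you can locate the vanishing points v1, v2 of two horizontal directions known to be at right angles (say, two walls meeting at a corner). Orthogonality then gives the focal length via f² = −(v1 − c)·(v2 − c), where c is the principal point, and from that the vanishing point of the vertical direction. The function names and the synthetic camera in the demo are hypothetical.

```python
import numpy as np


def focal_from_orthogonal_vps(v1, v2, principal_point):
    """Focal length in pixels from vanishing points of two orthogonal directions.
    Assumes zero skew, square pixels and a known principal point."""
    c = np.asarray(principal_point, dtype=float)
    d = np.dot(np.asarray(v1, dtype=float) - c, np.asarray(v2, dtype=float) - c)
    if d >= 0:
        raise ValueError("vanishing points are inconsistent with these assumptions")
    return float(np.sqrt(-d))


def vertical_vanishing_point(v1, v2, principal_point):
    """Vanishing point of the direction orthogonal to both v1 and v2 (None if at infinity)."""
    f = focal_from_orthogonal_vps(v1, v2, principal_point)
    cx, cy = principal_point
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    K_inv = np.linalg.inv(K)
    r1 = K_inv @ np.array([v1[0], v1[1], 1.0])  # camera-frame direction behind v1
    r2 = K_inv @ np.array([v2[0], v2[1], 1.0])  # camera-frame direction behind v2
    h = K @ np.cross(r1, r2)                    # direction orthogonal to both
    if abs(h[2]) < 1e-12:
        return None  # vertical lines stay parallel in the image
    return h[:2] / h[2]


def _rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def _rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def _rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Synthetic check with a made-up camera: f = 1000 px, principal point at the
# center of a 1280x720 image, arbitrary orientation.
K_true = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R = _rot_x(0.4) @ _rot_y(0.7) @ _rot_z(0.2)
vps = []
for axis in range(3):  # world x, y (horizontal) and z (vertical)
    h = K_true @ R[:, axis]
    vps.append(h[:2] / h[2])

f_est = focal_from_orthogonal_vps(vps[0], vps[1], (640.0, 360.0))
v3_est = vertical_vanishing_point(vps[0], vps[1], (640.0, 360.0))
print(f_est, v3_est, vps[2])
```

Everything here hinges on those assumptions: if the principal point is not actually at the image center (e.g. because the photo was cropped off-center), the recovered focal length and vertical direction will be off as well.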