Short answer: visibility determination occurs both before and after the projection to screen space. Shading is generally done afterward.
Longer answer:
Visibility determination in a typical rendering engine takes place at several levels of granularity: object, triangle, and pixel. These occur at different stages of the pipeline.
First, on the CPU, the engine will typically use a scene graph to cull entire objects that it determines are not visible from the current camera position. This involves frustum culling (getting rid of offscreen objects) and occlusion culling (getting rid of objects entirely hidden behind other opaque objects). Both of these act to reduce the number of draw calls sent to the GPU. Each mesh will be either drawn in its entirety if any part of it is visible, or not drawn at all. (By "drawn" here I mean the engine queues up a draw call for it and sends it to the GPU.)
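To make the object-level test concrete, here's a minimal sketch of frustum culling against a bounding sphere. The `Plane`/`Sphere` types and `isVisible` are illustrative names, not any particular engine's API, and the six plane equations would be extracted from the camera's view-projection matrix:

```cpp
#include <array>

struct Plane  { float a, b, c, d; };       // plane equation ax + by + cz + d = 0, normal pointing inward
struct Sphere { float x, y, z, radius; };  // object's bounding sphere

bool isVisible(const Sphere& s, const std::array<Plane, 6>& frustum)
{
    for (const Plane& p : frustum) {
        // Signed distance from the sphere's center to the plane.
        float dist = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (dist < -s.radius)
            return false;  // entirely outside one plane: cull the whole object
    }
    return true;  // inside or straddling the frustum: queue a draw call for it
}
```

Occlusion culling is harder to sketch briefly since it needs some knowledge of what's in front of what (e.g. a hierarchical Z-buffer or hardware occlusion queries), but the outcome is the same: skip the draw call entirely.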
When the GPU processes a mesh, it starts by running the vertex shader on each of its vertices. This transforms the vertex into screen space. (Strictly speaking, the vertex shader outputs clip-space coordinates, and the GPU performs the perspective divide and viewport transform afterward; the net result is a screen-space position.) Note that screen space is still 3D. The XY coordinates of a vertex in screen space determine the pixel coordinates on screen, but the Z depth is also kept so that we can do per-pixel visibility later. If we threw away the Z at this point we wouldn't be able to tell which triangles should appear in front of which other triangles.
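For illustration, here's roughly what that transform chain does to a single vertex position once the vertex shader has produced a clip-space coordinate. The `Vec3`/`Vec4` types and `toScreenSpace` are made-up names, and the exact Y-flip and Z-range conventions vary by API:

```cpp
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

Vec3 toScreenSpace(const Vec4& clipPos, float screenW, float screenH)
{
    // Perspective divide: clip space -> normalized device coordinates (NDC).
    float ndcX = clipPos.x / clipPos.w;
    float ndcY = clipPos.y / clipPos.w;
    float ndcZ = clipPos.z / clipPos.w;

    // Viewport transform: NDC in [-1, 1] -> pixel coordinates.
    Vec3 screen;
    screen.x = (ndcX * 0.5f + 0.5f) * screenW;
    screen.y = (1.0f - (ndcY * 0.5f + 0.5f)) * screenH;  // flip Y so it grows downward
    screen.z = ndcZ * 0.5f + 0.5f;  // Z is kept (here remapped to [0, 1]) for the depth test
    return screen;
}
```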
Once vertices start coming out of the vertex shader, the GPU assembles them into triangles. Any triangle that falls entirely offscreen is thrown out, and triangles that are partially offscreen are clipped; back-face culling (discarding triangles that face away from the camera) also happens at this stage. So this is the triangle-level visibility. The remaining triangles are then passed to the rasterizer, which determines which pixels each triangle covers and runs the pixel shader on each of those pixels.
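Here's a sketch of the "entirely offscreen" rejection test at the triangle level, reusing `Vec4` from the previous sketch. In clip space a point is inside the frustum when -w <= x, y, z <= w (OpenGL convention; Direct3D uses 0 <= z <= w), so a triangle whose three vertices all lie outside the same plane can be discarded outright:

```cpp
bool triangleOffscreen(const Vec4& a, const Vec4& b, const Vec4& c)
{
    // True if all three vertices fail the same clip-space test.
    auto allOutside = [&](auto test) { return test(a) && test(b) && test(c); };

    if (allOutside([](const Vec4& v) { return v.x < -v.w; })) return true;  // left
    if (allOutside([](const Vec4& v) { return v.x >  v.w; })) return true;  // right
    if (allOutside([](const Vec4& v) { return v.y < -v.w; })) return true;  // bottom
    if (allOutside([](const Vec4& v) { return v.y >  v.w; })) return true;  // top
    if (allOutside([](const Vec4& v) { return v.z < -v.w; })) return true;  // near
    if (allOutside([](const Vec4& v) { return v.z >  v.w; })) return true;  // far
    return false;  // at least partially onscreen: clip if needed, then rasterize
}
```

A triangle that merely straddles a plane fails all six all-outside tests and gets passed on to clipping instead.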
The pixel shader typically does most of the work of calculating lighting and shading. After the pixel shader is done*, the result is stored to the framebuffer. However, it is only stored if the Z depth of the pixel (interpolated from the Z depths of the triangle's vertices) is closer to the camera than the Z depth already stored in the depth buffer at that pixel. So this is per-pixel visibility and is the finest-grained level of visibility determination.
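The depth-test-and-write logic amounts to something like the following sketch. The `Framebuffer` struct and `writePixel` are illustrative (on real hardware this is fixed-function), and it assumes a convention where smaller Z means closer to the camera:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Framebuffer {
    int width = 0, height = 0;
    std::vector<std::uint32_t> color;  // packed RGBA, one entry per pixel
    std::vector<float>         depth;  // the depth buffer, one entry per pixel
};

void writePixel(Framebuffer& fb, int x, int y, float z, std::uint32_t shadedColor)
{
    std::size_t i = static_cast<std::size_t>(y) * fb.width + x;
    if (z < fb.depth[i]) {          // closer than what's already there?
        fb.depth[i] = z;            // update the depth buffer...
        fb.color[i] = shadedColor;  // ...and store the pixel shader's result
    }
    // Otherwise this fragment is hidden and its shaded result is thrown away.
}
```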
*Footnote: in practice, the depth test can often be done before the pixel shader runs (an optimization known as early-Z), which saves a lot of time by not shading pixels that would be hidden anyway.
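For comparison with the sketch above, early-Z just reorders things so hidden fragments never get shaded at all; `shadePixel` here is a hypothetical stand-in for invoking the pixel shader:

```cpp
void writePixelEarlyZ(Framebuffer& fb, int x, int y, float z,
                      std::uint32_t (*shadePixel)())
{
    std::size_t i = static_cast<std::size_t>(y) * fb.width + x;
    if (z >= fb.depth[i])
        return;                  // hidden: skip the (expensive) shader entirely
    fb.depth[i] = z;
    fb.color[i] = shadePixel();  // shade only fragments that survive the depth test
}
```

(The GPU can only do this when the pixel shader doesn't itself write depth or discard pixels, since either would change the test's outcome.)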