I'm fairly certain your problem lies in not handling so-called degenerate Bezier patches correctly, in particular (as Joojaa noted) when computing the surface normal. I say "so-called degenerate" because, geometrically, the surfaces are often perfectly well behaved. It's just that some assumptions people frequently make regarding the parametric equations may not hold.
Books such as Gerald Farin's, snappily titled, "Curves and Surfaces for CAGD. A Practical Guide" will give more details, but I'll try to summarise two simple cases. Assuming your Bezier patch is defined as $\bar{B}(u,v)$, the usual two causes of problems are:
Zero derivatives:
To compute the normal at $(a,b)$ one normally (pardon the pun) computes the two derivatives,
$\frac{\partial }{\partial v}\bar{B}(a,b)$ and $\frac{\partial }{\partial u}\bar{B}(a,b)$
or scaled versions thereof, to obtain two tangents and then take the cross product.
(Implementation note: Since we can use a scaled version of the tangent in the calculation, we really don't have to calculate the actual derivative. For example, especially at the corners, differences of control points can yield a scaled tangent. However, for brevity in this discussion we will assume the actual derivatives.)
A common occurrence, at least at the corners, is that one of the first partial derivatives at the location is zero, so the cross product yields a zero vector rather than a usable normal.
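To make the implementation note concrete, here is a minimal sketch of a corner normal built purely from control-point differences. The `P[i][j]` layout (4x4 grid, `i` along $u$, `j` along $v$) and the helper names are my own assumptions, not anything from the question's code.

```python
def sub(a, b):
    """Component-wise a - b for 3D points stored as tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """3D cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def corner_normal(P):
    """Unnormalised normal at (u, v) = (0, 0) of a bicubic patch.

    P[i][j] is an assumed 4x4 control grid, i along u and j along v.
    """
    tu = sub(P[1][0], P[0][0])  # one third of dB/du at (0, 0)
    tv = sub(P[0][1], P[0][0])  # one third of dB/dv at (0, 0)
    return cross(tu, tv)

# A flat patch in the z = 0 plane: the corner normal points along +z.
flat = [[(float(i), float(j), 0.0) for j in range(4)] for i in range(4)]
print(corner_normal(flat))  # (0.0, 0.0, 1.0)
```

The result is only a scaled normal, so normalise it before shading. If either difference is a zero vector, the result is the zero vector.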
In the case of the tops and bottoms of the teapot, one whole edge of a number of the (bicubic) Bezier patches has been collapsed to a point, i.e. all 4 control points are the same, and thus, say
$\bar{B}(u,0)=\bar{B}(0,0)$ for all $u$, and so $\frac{\partial }{\partial u}\bar{B}(0,0)=\bar{0}$.
The surface, however, is completely well behaved, so you can simply use another derivative that starts at that collapsed point: in this case, say, choose
$\frac{\partial }{\partial v}\bar{B}(1,0)$
for the second tangent.
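As a rough sketch of that substitution, assuming the collapsed edge is the $v=0$ edge and the control points are stored as `P[i][j]` (`i` along $u$, `j` along $v$; both are assumptions of mine), the pole normal can be built from the two $v$-direction tangents at either end of the collapsed edge. The example patch is a made-up cone-like cap, not teapot data.

```python
def sub(a, b):
    """Component-wise a - b for 3D points stored as tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """3D cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def naive_normal(P):
    """du x dv at (0, 0); zero when the v = 0 edge is collapsed."""
    return cross(sub(P[1][0], P[0][0]), sub(P[0][1], P[0][0]))

def pole_normal(P):
    """Unnormalised normal at the collapsed point (v = 0 edge).

    Uses the v-tangent at (1, 0) in place of the vanished u-tangent.
    """
    tv0 = sub(P[0][1], P[0][0])  # scaled dB/dv at (0, 0)
    tv1 = sub(P[3][1], P[3][0])  # scaled dB/dv at (1, 0)
    return cross(tv1, tv0)

# Hypothetical cap: every P[i][0] sits on the apex (0, 0, 1).
profile = [(1.0, 0.0), (1.0, 0.55), (0.55, 1.0), (0.0, 1.0)]
cap = [[(j * cx, j * cy, 1.0 - 0.1 * j) for j in range(4)]
       for (cx, cy) in profile]

print(naive_normal(cap))  # all components are zero
print(pole_normal(cap))   # a usable, non-zero vector
```

The argument order of the cross product decides which way the normal faces, so swap the two tangents if the result comes out flipped relative to the rest of your patch.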
Having said this, you still have to check that your first derivatives are not zero for another reason (e.g. 2 or 3 coincident control points), in which case you can fall back (thanks to L'Hôpital's rule) on the second (or, if that's zero, even the third!) derivative(s) to obtain valid tangents.
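For a single row or column of control points this fallback has a simple form: when the leading control points coincide, the first non-vanishing derivative at the start is a positive multiple of the difference between the first distinct control point and the start point. A minimal sketch follows; the function name and the tolerance `eps` are my own choices.

```python
import math

def end_tangent(ctrl, eps=1e-9):
    """Scaled tangent direction at t = 0 of a Bezier curve.

    ctrl is the list of control points (tuples). If ctrl[1] coincides
    with ctrl[0], the first derivative vanishes and the second is
    proportional to ctrl[2] - ctrl[0]; if that vanishes too, the third
    is proportional to ctrl[3] - ctrl[0], and so on.
    Returns None if every control point coincides with the first.
    """
    p0 = ctrl[0]
    for p in ctrl[1:]:
        if math.dist(p, p0) > eps:
            return tuple(a - b for a, b in zip(p, p0))
    return None

# First two control points coincide, so the tangent comes from the third.
row = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(end_tangent(row))  # (2.0, 0.0, 0.0)
```

Only the direction is meaningful here, which is all the cross product needs once you normalise the result.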
Parallel tangents:
Another similar problem can arise if your two tangents are parallel. Farin has a good example in his book.
In this case, I think you may need to look at using something like $\frac{\partial^2 }{\partial u \partial v}\bar{B}$ or, possibly, just fall back to using a small UV offset to approximate a vector.
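One way to read the "small UV offset" fallback is to evaluate the normal at a parameter nudged slightly into the patch whenever the cross product collapses. The sketch below does that for a bicubic patch. The `P[i][j]` layout, the step `h`, and the tolerance `eps` are all assumptions of mine, and the test patch is invented for illustration.

```python
import math

def bernstein3(t):
    """Cubic Bernstein basis values at t."""
    s = 1.0 - t
    return (s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t)

def dbernstein3(t):
    """Derivatives of the cubic Bernstein basis at t."""
    s = 1.0 - t
    return (-3 * s * s, 3 * s * s - 6 * s * t, 6 * s * t - 3 * t * t, 3 * t * t)

def weighted_sum(P, wu, wv):
    """Sum of wu[i] * wv[j] * P[i][j] over the 4x4 control grid."""
    return tuple(sum(wu[i] * wv[j] * P[i][j][k]
                     for i in range(4) for j in range(4)) for k in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def raw_normal(P, u, v):
    """Return (du x dv, |du| * |dv|) at (u, v)."""
    du = weighted_sum(P, dbernstein3(u), bernstein3(v))
    dv = weighted_sum(P, bernstein3(u), dbernstein3(v))
    return cross(du, dv), math.hypot(*du) * math.hypot(*dv)

def robust_normal(P, u, v, h=1e-3, eps=1e-9):
    """Unit normal at (u, v), nudging towards the patch centre if degenerate."""
    n, scale = raw_normal(P, u, v)
    if math.hypot(*n) <= eps * scale:
        u += h if u < 0.5 else -h
        v += h if v < 0.5 else -h
        n, _ = raw_normal(P, u, v)
    length = math.hypot(*n)
    return tuple(c / length for c in n) if length > 0.0 else None

# Planar patch whose two tangents at (0, 0) both point along +x.
P = [[(float(i), float(j), 0.0) for j in range(4)] for i in range(4)]
P[0][1] = (2.0, 0.0, 0.0)
print(robust_normal(P, 0.0, 0.0))
```

The result is only an approximation of the normal at the original point, and a fixed `h` may be too large or too small for a given patch, so treat the numbers as placeholders.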