I think that fundamentally, the correct definition of a tangent should say something about how well the function can be approximated by a linear function in a neighborhood of the point.
More precisely, we may say that the line defined by $y = Ax + B$ is a tangent to the function $f$ at a point $x_0$ if the following two conditions hold:
(1) $Ax_0 + B = f(x_0)$
(2) $f(x_0 + h) - f(x_0) - Ah = o(h)$ as $h \to 0$
Condition (1) says that the tangent line should pass through the point $(x_0, f(x_0))$ on the graph of $f$.
Condition (2) says that the error between $f$ and the linear approximation given by $Ax + B$ becomes arbitrarily small relative to the distance $|h|$ from $x_0$ (in both positive and negative directions). The $o(h)$ term is little-o notation: it means that the left-hand side divided by $h$ tends to $0$ as $h \to 0$.
Note that $A$ is uniquely defined by condition (2) (if any such $A$ exists), and $B$ is then uniquely defined by condition (1). So there can only ever be one tangent line to $f$ at any given point. It is also possible that no tangent line exists, such as for functions like $f(x) = |x|$ at $x_0 = 0$, or other non-differentiable functions.
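To see why no tangent exists for $f(x) = |x|$ at $x_0 = 0$, one can divide the left-hand side of condition (2) by $h$ and look at each side of $0$ separately:
$$\frac{|h| - Ah}{h} = \begin{cases} 1 - A, & h > 0 \\ -1 - A, & h < 0 \end{cases}$$
For this to tend to $0$ as $h \to 0$, we would need both $A = 1$ and $A = -1$, which is impossible, so no choice of $A$ satisfies condition (2).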
We see that for the parabola $y = x^2$, the line $y = 0$ satisfies both conditions at $x_0 = 0$, but the line $x = 0$ does not satisfy condition (2). Intuitively, at any point other than $x_0$, the line $x = 0$ will be a horrible approximation for the parabola, since the line does not even pass through any points with nonzero abscissa. Actually, it does not really fit condition (1) either: the way I've formulated things, the line $x = 0$ is not of the form $y = Ax + B$ (you would need $A = \infty$). The definition can be generalized a bit by considering lines of the form $ax + by = c$ instead, which allows vertical lines to be represented as well.
Note that condition (2) can be reformulated as saying that $$\lim\limits_{h\to0}\frac{f(x_0 + h) - f(x_0)}{h} = A$$ which is the usual definition of differentiability for $f$ at $x_0$ (with derivative $f'(x_0) = A$).
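As a rough numerical illustration (a sketch, not a proof), the ratio in that limit can be tabulated for the parabola $f(x) = x^2$ at $x_0 = 0$. The helper name `relative_error` and the step sizes are my own choices for this example. With the slope $A = 0$ the ratio shrinks with $h$, while with a wrong slope such as $A = 1$ it settles near $-1$ instead of $0$.

```python
def relative_error(f, x0, A, h):
    """Return (f(x0 + h) - f(x0) - A*h) / h, the quantity that
    condition (2) requires to tend to 0 as h -> 0."""
    return (f(x0 + h) - f(x0) - A * h) / h


def parabola(x):
    return x * x


# Step sizes 1e-1, 1e-2, ..., 1e-6 (an arbitrary choice for illustration).
steps = [10.0 ** -k for k in range(1, 7)]

# Candidate slope A = 0 (the line y = 0): the ratio is roughly h itself.
good_ratios = [relative_error(parabola, 0.0, 0.0, h) for h in steps]

# Candidate slope A = 1 (the line y = x): the ratio is roughly h - 1.
bad_ratios = [relative_error(parabola, 0.0, 1.0, h) for h in steps]

for h, good, bad in zip(steps, good_ratios, bad_ratios):
    print(f"h = {h:.0e}   A = 0: {good: .3e}   A = 1: {bad: .3e}")
```

Only the first column of ratios tends to $0$, which matches the claim that $A$ is pinned down by condition (2).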
The way that I formulated (2) above allows an easier (and, in my opinion, more intuitive) generalization to tangent planes for multivariable functions. In that more general setting, $f$ is a function from $\mathbb{R}^m$ to $\mathbb{R}^n$, $\mathbf{x}_0$ and $\mathbf{h}$ are now vectors in $\mathbb{R}^m$, and $A$ is a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$ (an $n \times m$ matrix). The error is measured relative to $|\mathbf{h}|$.
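Here is a small sketch of the multivariable version, under assumptions of my own: the map $f(x, y) = (x^2 + y,\ xy)$, the base point $(1, 2)$, and the direction $\mathbf{h} = (t, -t)$ are all made up for illustration, and the matrix $A$ is the matrix of partial derivatives worked out by hand. It computes $|f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) - A\mathbf{h}| / |\mathbf{h}|$ for shrinking $t$.

```python
import math


def f(x, y):
    """Hypothetical example map from R^2 to R^2."""
    return (x * x + y, x * y)


def jacobian(x, y):
    """Matrix A of partial derivatives of f at (x, y), computed by hand."""
    return ((2.0 * x, 1.0), (y, x))


def relative_error(x0, y0, hx, hy):
    """Return |f(x0 + h) - f(x0) - A h| / |h| for h = (hx, hy)."""
    A = jacobian(x0, y0)
    f0 = f(x0, y0)
    f1 = f(x0 + hx, y0 + hy)
    # Linear part A h, written out row by row.
    Ah = (A[0][0] * hx + A[0][1] * hy, A[1][0] * hx + A[1][1] * hy)
    err = (f1[0] - f0[0] - Ah[0], f1[1] - f0[1] - Ah[1])
    return math.hypot(err[0], err[1]) / math.hypot(hx, hy)


# Shrink h = (t, -t) towards the zero vector.
ratios = [relative_error(1.0, 2.0, 10.0 ** -k, -(10.0 ** -k)) for k in range(1, 6)]

for k, ratio in enumerate(ratios, start=1):
    print(f"t = 1e-{k}   ratio = {ratio:.3e}")
```

For this particular $f$ the error works out to $(t^2, -t^2)$, so the ratio is about $t$ and goes to $0$. A single direction is of course only a sanity check, since the definition requires the ratio to vanish for every way $\mathbf{h}$ can approach $\mathbf{0}$.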