I have a different take. I think there is quite a satisfactory definition of variable that works at least in basic algebra, calculus, and analysis (and probably other settings too). It is the same as the definition of "random variable" -- you just have to forget the "random" part. Let me explain.
First, mathematically, a variable is a function. What kind of functions deserve to be called "variables"? Often, we talk about "real" or "complex" variables. This refers to the range (the set of possible values) of the function. Variables are often valued in "numbers" (for "numeric" variables), usually meaning something like a field (thus "real variables" and "complex variables"), but sometimes in the set $\{0, 1\}$ for binary variables, or even in more complex mathematical objects, like a vector space for "vector variables", etc.
The domain of the function is a slightly more subtle issue -- actually several related issues. They are related to 1) the ability of functions to restrict and 2) the ability of functions to "pull back" (compose). The first gives us the ability -- which, as Matt E points out in his answer, is critical -- to specialize the variables, meaning to restrict the functions to subdomains. The second gives us the ability to transfer variables from one space (domain) to another one that maps into it. Thus when we talk about a variable, we are at all times talking about a function, but we often change the domain on which we consider this function in the middle of an argument, without being explicit about it. (I think this is what makes the notion of variable useful, but also confusing.)
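To make the two operations concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the helper names `restrict` and `pull_back`, the finite integer domain, and the map `n - 1` are all made up for the example.

```python
# A "variable" is modeled as a plain Python function on some domain.

def restrict(variable, subdomain):
    """Restrict a variable to a subdomain, returned as a table of values."""
    return {p: variable(p) for p in subdomain}

def pull_back(variable, phi):
    """Transfer a variable along a map phi: the composite p -> variable(phi(p))."""
    return lambda p: variable(phi(p))

# A numeric variable on a small integer domain (hypothetical example).
square = lambda t: t * t
domain = range(-3, 4)

# Specializing: the same variable, considered only on the non-negative points.
nonneg = restrict(square, [t for t in domain if t >= 0])

# Pulling back along phi(n) = n - 1, a map from another domain into this one.
shifted = pull_back(square, lambda n: n - 1)
```

In both cases the rule `t * t` never changes; only the domain on which we consider it does, which is the silent switch described above.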
Stepping aside from formal-ish mathematics, here is how we use "variables" in mathematical modeling of systems in other subjects (physics, chemistry, economics, what have you). We are presented with a "system" (a robot, a gas, an economy, etc.) which has a set of states (modeled as some abstract set $S$). These states have numerical characteristics (the angle of joint number 2 in radians, the pressure at location P in pascals, the average price of gas in Paris in euros per liter, etc.), which are -- you guessed it -- variables. Thus a variable $v$ is a function from the space of possible states $S$ to "a number". Of course there are "vector variables" (like "velocity of the robot's center of mass") and "categorical" or "binary" ones ("what party does the president of the US belong to") here as well. [This is, as I promised, the same as a "random variable", except we don't require either the set of states or the set of values to have any extra structure (no sigma algebras here) and correspondingly don't require $v$ to be "measurable".]
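As a toy illustration (not a real robotics model), one can write this down in Python. Everything here is an assumption made for the example: the `ArmState` class, the two-joint planar arm with unit-length links, and the three sample states.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ArmState:
    """One state of a hypothetical two-joint planar arm with unit-length links."""
    theta1: float  # angle of joint 1, in radians
    theta2: float  # angle of joint 2, in radians

# Variables are just functions on the set of states.

def joint2_angle(s):
    """Numeric variable: angle of joint number 2 in radians."""
    return s.theta2

def elbow_bent(s):
    """Binary variable: is joint 2 away from the straight position?"""
    return s.theta2 != 0.0

def tip_position(s):
    """Vector variable: position of the tip of the arm in the plane."""
    x = math.cos(s.theta1) + math.cos(s.theta1 + s.theta2)
    y = math.sin(s.theta1) + math.sin(s.theta1 + s.theta2)
    return (x, y)

# A small sample of the state space S.
S = [ArmState(0.0, 0.0), ArmState(0.0, math.pi / 2), ArmState(math.pi / 2, 0.0)]
```

Note that nothing about `S` is used beyond its being a set: there is no measure and no sigma algebra, just states and functions on them.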
Often some particular collection of variables $v_1, v_2, \ldots$ is sufficient for our purposes -- it describes the state of our system well enough, so that for the purposes of the analysis that we intend to do we don't need to know anything more than values of those variables (this is akin to a "sufficient statistic" or a "faithful representation"). Then we may think of the system in question as being completely captured by these variables, and consider instead of the system its "model" aka the image $V(S)$ of the joint map $V=(v_1, v_2, \ldots)$. There are two things that happen now:
1) Any variable "on the system" $v:S\to X$ that is of interest is assumed to be a "pull back" of $w:V(S)\to X$ via $V$, that is, be of the form $v(p)=w(V(p))$ for some $w$ (this is what it means for the system to be completely captured by $V$).
and
2) Some of the variables on $V(S)$ (and hence on $S$) can be obtained by restricting to $V(S)$ a "global" variable. Here a global variable is a variable defined on the "ambient space" $R$ -- the product of the ranges of the $v_i$s (if each $v_i$ is valued in, say, $\mathbb{R}$, and there are finitely many of them, say $n$, then this $R$ is $\mathbb{R}^n$). Certainly each $v_i$ is of this form -- it is a restriction of the projection that takes $(x_1, x_2, \ldots)$ to $x_i$. These projections are now called "coordinate functions" or just "coordinates".
When we talk about variables, we sometimes refer to functions with domain $S$, sometimes to functions with domain $V(S)$, and sometimes to the global variables with domain $R$. (Confusion between these plagues a number of discussions of the calculus of variations, for example; I believe physicists sometimes refer to calculations with "global" variables as being "off shell" and ones with the restricted variables as being "on shell", but do not trust me on this point too much.)
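The three domains $S$, $V(S)$ and $R$ can be told apart in a small finite sketch. The four abstract states, the two value tables, and the helper `factors_through_V` are all hypothetical, chosen only to keep the example tiny.

```python
from itertools import product

# Hypothetical finite system: four abstract states, two measured variables.
S = ["a", "b", "c", "d"]
v1 = {"a": 0, "b": 0, "c": 1, "d": 1}.get
v2 = {"a": 0, "b": 1, "c": 1, "d": 1}.get

def V(p):
    """The joint map V = (v1, v2) from S to the ambient space R."""
    return (v1(p), v2(p))

# The model V(S), and the ambient space R (product of the ranges).
model = {V(p) for p in S}
R = set(product({v1(p) for p in S}, {v2(p) for p in S}))

# Global variables on R: the coordinate functions (projections).
x1 = lambda point: point[0]
x2 = lambda point: point[1]

def factors_through_V(v):
    """True when v(p) depends only on V(p), i.e. v = w(V(p)) for some w on V(S)."""
    table = {}
    for p in S:
        # Record the value seen at V(p); a clash means no such w exists.
        if table.setdefault(V(p), v(p)) != v(p):
            return False
    return True

total = lambda p: v1(p) + v2(p)   # captured by the model
is_d = lambda p: p == "d"         # distinguishes c from d, which V cannot
```

Here the point $(1, 0)$ lies in $R$ but not in $V(S)$, each `v_i` is the coordinate `x_i` composed with `V`, and `is_d` is a variable on $S$ that is not a pull back from $V(S)$. That last case is exactly what the assumption in 1) rules out.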
Let's consider some ramifications of this view in algebra/calculus/analytic geometry.
1) Equations: We have variables $x$, $y$ and $z$ that are global, but also restrict to variables on the system in question. When we write something like $y=x^2$ (or $x^2=1$) we are specifying a subdomain: the set of points $p$ (in $R$ or in $V(S)$, as the case may be) where $y(p)=x(p)^2$ (or $x(p)^2=1$, respectively), that is, the (largest) subdomain on which $y$ and $x^2$ restrict to the same function (respectively, on which $x^2$ and $1$ restrict to the same function). "Solving" $x^2=1$ amounts to specifying this subdomain in a different format, usually by giving a simple criterion for membership, like listing all elements ($x\in\{1, -1\}$, where we implicitly use the coordinate functions, and really mean $\{p \mid x(p)=1 \text{ or } x(p)=-1\}$).
Aside: After a discussion of differentials (which are also functions, but again are often considered simultaneously on tangent spaces of $R$ and, by restriction, on tangent spaces of submanifolds (curves)), one can view ordinary differential equations like $dy=y\,dx$ in a similar way: as asking for submanifolds (curves) on whose tangent spaces the restrictions of the two differentials are equal as functions.
2) Dependent and independent variables: Some $V(S)$ have the property that they are (at least locally) in bijection with (some open subset of) their image under a sub-collection $(v_{i_1}, v_{i_2},\ldots)$ of (usually coordinate) variables. Then we can consider these $v_{i_j}$s as "independent variables" and all other variables as dependent variables, meaning functions (locally) obtained by pulling back functions of the $v_{i_j}$s. Of course many $V(S)$ have multiple such descriptions: a circle $x^2+y^2=25$ near $(3,4)$ is locally in bijection with its projection to the $x$ coordinate AND with its projection to the $y$ coordinate -- we can view either $x$ OR $y$ as the "independent variable" on the circle near $(3,4)$.
3) Relatedly, "a change of variables" or "change of coordinates" is then a change of which variables $v_i$ are used to represent $S$, often by composing $V$ with some "coordinate change map" from $R$ to some other range.
Etc.
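The three points above can be checked on a finite stand-in for the plane. This is only a sketch under my own assumptions: the integer grid replaces $\mathbb{R}^2$ (so "open subset" is not modeled), the helpers `solution_set` and `is_independent` are invented names, and the coordinate change $(x, y) \mapsto (x+y, x-y)$ is just one convenient choice.

```python
from itertools import product

# Ambient space R: a finite integer grid standing in for the plane.
R = list(product(range(-5, 6), repeat=2))
x = lambda p: p[0]   # coordinate functions on R
y = lambda p: p[1]

def solution_set(lhs, rhs, domain):
    """Largest subdomain on which lhs and rhs restrict to the same function."""
    return {p for p in domain if lhs(p) == rhs(p)}

# 1) "Solving" x^2 = 1 on a line of integers, where x is the identity coordinate.
line = range(-3, 4)
roots = solution_set(lambda t: t * t, lambda t: 1, line)

# The circle x^2 + y^2 = 25 as a subdomain of R.
circle = solution_set(lambda p: x(p) ** 2 + y(p) ** 2, lambda p: 25, R)

def is_independent(variable, region):
    """True when the variable is injective on the region, so that every
    other variable on the region is a function of it."""
    values = [variable(p) for p in region]
    return len(values) == len(set(values))

# 2) The part of the circle near (3, 4): the grid points in the open first quadrant.
near_3_4 = {p for p in circle if x(p) > 0 and y(p) > 0}

# 3) Change of variables: compose with the map (x, y) -> (x + y, x - y).
change = lambda p: (x(p) + y(p), x(p) - y(p))
u = lambda p: change(p)[0]
v = lambda p: change(p)[1]
```

On `near_3_4` both `x` and `y` pass `is_independent`, while on the whole `circle` neither does (the points $(3,4)$ and $(3,-4)$ share an $x$ value). In the new coordinates the same subdomain is cut out by $u^2+v^2=50$, since $(x+y)^2+(x-y)^2 = 2(x^2+y^2)$.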