The problem comes down to formulas with free variables. Suppose we have a formula $\phi(x,y) \equiv x = y$, and we have a structure $M$. What would it mean for $M$ to satisfy $\phi$? Well, it doesn't really make sense, because the variable "$x$" is not given a meaning by $M$ - $x$ doesn't denote anything, and neither does $y$. So, relative to $M$ alone, $x = y$ has no truth value at all.
One solution is to strengthen the notion of a "model" so that it includes both a structure $M$ and a function that assigns an element of $M$ to each variable. This is essentially the approach taken by Mendelson (and also by Enderton's classic textbook on logic). Now we have a rigorous definition of what it means for a structure $M$ and variable assignment $s$ to satisfy $\phi(x,y)$; we might write $(M,s) \models x = y$.
The other option is to keep the notion of a model the same, but change the definition of $\models$. Now, $M \models x = y$ is defined to mean that, for every variable assignment $s$, $(M,s) \models x = y$ in the sense of the previous paragraph. But of course this is not likely to hold; in this sense we will have $M \not \models x = y$ as long as $M$ has at least two elements in its domain.
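To see the two conventions side by side, here is a minimal sketch in Python for finite structures. Everything in it is an assumption made for illustration: a structure is reduced to its domain (a list), a formula is modeled as a Python function from an assignment (a dict) to a truth value, and the names `satisfies` and `satisfies_all` are hypothetical, not taken from either textbook.

```python
from itertools import product

def satisfies(formula, assignment):
    """First convention: truth is relative to one fixed variable assignment s."""
    return formula(assignment)

def satisfies_all(domain, variables, formula):
    """Second convention: the formula must hold under every variable assignment."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product(domain, repeat=len(variables))
    )

# The formula x = y, modeled as a function of the assignment (an assumption
# of this sketch, not a general encoding of first-order formulas).
x_equals_y = lambda s: s["x"] == s["y"]

two_elements = [0, 1]
one_element = [0]

print(satisfies(x_equals_y, {"x": 0, "y": 0}))                 # True
print(satisfies(x_equals_y, {"x": 0, "y": 1}))                 # False
print(satisfies_all(two_elements, ["x", "y"], x_equals_y))     # False
print(satisfies_all(one_element, ["x", "y"], x_equals_y))      # True
```

Under the first convention the answer depends on which assignment is supplied; under the second, $x = y$ fails in any domain with at least two elements and holds in a one-element domain.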
These approaches give the same truth values to all sentences - they only differ for formulas that have free variables. In many areas of mathematical logic, we are mostly interested in truth values of sentences. The main area where the second convention is useful is universal algebra, where one often studies "equational theories". In that context, it is common to use formulas with free variables as axioms, rather than sentences without free variables. The models of interest are those in which every variable assignment satisfies the formulas, not just one particular variable assignment. So universal algebraists often use the second convention.
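The comparison can be written out as follows. The subscripts are introduced here only for illustration: $\models_1$ is satisfaction relative to an assignment (the first convention) and $\models_2$ is satisfaction under every assignment (the second). The domain of $M$ is assumed to be nonempty, as usual. For a sentence $\sigma$, satisfaction does not depend on the assignment, so

$$(M,s) \models_1 \sigma \text{ for some } s \iff (M,s) \models_1 \sigma \text{ for every } s \iff M \models_2 \sigma.$$

For a formula $\phi(x,y)$ with free variables, the second convention amounts to reading the formula as its universal closure:

$$M \models_2 \phi(x,y) \iff M \models_2 (\forall x)(\forall y)\,\phi(x,y).$$

This is why an equational axiom such as $x \cdot y = y \cdot x$ can be written without quantifiers in universal algebra and still be understood as a statement about all elements.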
I do recommend Enderton's book for a clear presentation of the first convention, the one where $A(x)$ does not necessarily imply $(\forall x)A(x)$.