When we think about theories like ZFC or PA, we often view them foundationally: in particular, we often suppose that they are true. Truth is very strong. Although it's difficult to say exactly what it means for ZFC to be "true" (on the face of it we have to commit to the actual existence of a universe of sets!), some consequences of being true are easy to figure out: true things are consistent, and - since their consistency is true - don't prove that they are inconsistent.
However, this makes things like PA + $\neg$Con(PA) seem mysterious. So how are we to understand these?
The key is to remember that - assuming we work in some appropriate meta-theory - a theory is to be thought of as its class of models. A theory is consistent iff it has a model. So when we say PA + $\neg$Con(PA) is consistent, what we mean is that there are ordered semirings (= models of PA without induction) with some very strong properties.
One of these strong properties is the induction scheme, which can be rephrased model-theoretically as saying that these ordered semirings have no definable proper cuts.
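For reference, here is one way to spell that rephrasing out (the names $M$, $I$, $\varphi$ are just my notation). A cut of $M$ is a nonempty initial segment closed under successor, and the induction axiom for a formula $\varphi$, possibly with parameters from $M$, is

$$\big[\varphi(0)\wedge\forall x\,(\varphi(x)\rightarrow\varphi(x+1))\big]\rightarrow\forall x\,\varphi(x).$$

If this fails in $M$ for some $\varphi$, then

$$I=\{a\in M : M\models\forall y\le a\;\varphi(y)\}$$

is a definable proper cut: it contains $0$, is closed downward and under successor, and misses any $b$ with $\neg\varphi(b)$. Conversely, if $I$ is a definable proper cut, then induction fails for the formula defining $I$.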
It's very useful down the road to get a good feel for nonstandard models of PA as structures in their own right as opposed to "incorrect" interpretations of the theory; Kaye's book is a very good source here.
The other is that they satisfy $\neg$Con(PA). This one seems mysterious since we think of $\neg$Con(PA) as asserting a fact on the meta-level. However, remember that the whole point of Goedel's incompleteness theorem in this context is that we can write down a sentence in the language of arithmetic which we can prove, externally, is true in $\mathbb{N}$ iff PA is inconsistent. Post-Goedel, the MRDP theorem showed that we may take this sentence to be of the form "$\mathcal{E}$ has a solution" where $\mathcal{E}$ is a specific Diophantine equation. So, in any given model, $\neg$Con(PA) just means that a certain algebraic behavior occurs.
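Schematically, and only schematically (the polynomials $p$, $q$ and the number of variables $n$ are placeholders of mine; I'm not going to write the real equation down), since the language of ordered semirings has no subtraction we can put a polynomial with coefficients in $\mathbb{N}$ on each side:

$$\neg\mathrm{Con}(\mathrm{PA})\quad\text{is the sentence}\quad\exists x_1\cdots\exists x_n\,\big(p(x_1,\dots,x_n)=q(x_1,\dots,x_n)\big).$$

Evaluated in an ordered semiring $M$, this says only that some tuple from $M$ makes the two sides agree. Assuming PA is consistent, no tuple of standard numbers does this, so any witness in a model of PA + $\neg$Con(PA) has to involve a nonstandard element.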
So models of PA+$\neg$Con(PA) are just ordered semirings with some interesting properties - they have no proper definable cuts, and they have solutions to some Diophantine equations which don't have solutions in $\mathbb{N}$. This demystifies them a lot!
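To make "has a solution" feel concrete, here is a toy sketch in Python. The equation $x^2 = 2y^2 + 1$ is a made-up stand-in chosen because it is small; it is not the actual $\mathcal{E}$, and the function names are hypothetical. Both sides use only $+$ and $\cdot$, as in the language of semirings.

```python
from itertools import product

def lhs(x, y):
    # Left side of the toy equation x^2 = 2y^2 + 1 (a stand-in, not the real E).
    return x * x

def rhs(x, y):
    # Right side of the toy equation; only addition and multiplication appear.
    return 2 * y * y + 1

def solutions_up_to(bound):
    """Return every pair (x, y) with 0 <= x, y <= bound and lhs(x, y) == rhs(x, y)."""
    return [
        (x, y)
        for x, y in product(range(bound + 1), repeat=2)
        if lhs(x, y) == rhs(x, y)
    ]

if __name__ == "__main__":
    print(solutions_up_to(20))
```

Finding a pair settles "has a solution" with a finite check, but an empty result for a given bound says nothing about larger values, which is why "has no solutions" is not something a search can confirm.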
So now let's return to the meaning of the arithmetic sentence we call "$\neg$Con(PA)." In the metatheory, we have some object we call "$\mathbb{N}$" and we prove:
If $T$ is a recursively axiomatizable theory, then $T$ is consistent iff $\mathbb{N}\models$ "$\mathcal{E}_T$ has no solutions."
(Here $\mathcal{E}_T$ is the analogue of $\mathcal{E}$ for $T$; remember that by the MRDP theorem, we're expressing "Con(T)" as "$\mathcal{E}_T$ has no solutions" for simplicity.) Note that this claim is specific to $\mathbb{N}$: other ordered semirings, even nice ones, need not work in place of $\mathbb{N}$. In particular, there will be lots of ordered semirings which our metatheory proves satisfy PA, but for which the claim analogous to the one above fails.
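To put the asymmetry in symbols, write $\mathcal{E}_T$ schematically as $p_T(\bar{x})=q_T(\bar{x})$ (the polynomial names are placeholders of mine). The claim about $\mathbb{N}$ is

$$T\text{ is consistent}\iff\mathbb{N}\models\forall\bar{x}\,\big(p_T(\bar{x})\neq q_T(\bar{x})\big),$$

while for an arbitrary $M\models\mathrm{PA}$ only one direction survives:

$$T\text{ is inconsistent}\implies M\models\exists\bar{x}\,\big(p_T(\bar{x})=q_T(\bar{x})\big).$$

That direction holds because $M$ contains a copy of $\mathbb{N}$, and a solution in $\mathbb{N}$ is still a solution in $M$. The converse can fail: $M$ may have a solution built from nonstandard elements even though $T$ is consistent.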
It's worth thinking of an analogous situation in non-foundationally-flavored mathematics. Take a topological space $T$, and let $\pi_1(T)$ and $H_1(T)$ be the fundamental group and the first homology group (with coefficients in $\mathbb{Z}$, say) respectively. Don't worry too much about what these are; the point is just that they're both groups coding the behavior of $T$ which are closely related in many ways. I'm thinking of $\pi_1(T)$ as the analogue of $\mathbb{N}$ and $H_1(T)$ as the analogue of a nonstandard model satisfying $\neg$Con(PA).
Now, the statement "$\pi_1(T)$ is abelian" (here, my analogue of $\neg$Con(PA)) tells us a lot about $T$ (take my word for it). But the statement "$H_1(T)$ is abelian" does not tell us the same things (actually it tells us nothing: $H_1(T)$ is always abelian :P).
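A concrete pair of spaces, in case it helps (these are standard computations; the choice of examples is mine): take the torus $S^1\times S^1$ and the figure-eight $S^1\vee S^1$, and write $F_2$ for the free group on two generators. Then

$$\pi_1(S^1\times S^1)\cong\mathbb{Z}^2,\qquad H_1(S^1\times S^1)\cong\mathbb{Z}^2,$$

$$\pi_1(S^1\vee S^1)\cong F_2,\qquad H_1(S^1\vee S^1)\cong\mathbb{Z}^2.$$

So "$\pi_1$ is abelian" separates the torus from the figure-eight, while "$H_1$ is abelian" holds for both and separates nothing.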
We have a group $G$, and some other group $H$ similar to $G$ in lots of ways, and a property $p$; and if $G$ has $p$, we learn something, but if $H$ has $p$ we don't learn that thing. This is exactly what's going on here. It's not the property by itself that carries any meaning, it's the statement that the property holds of a specific object that carries meaning useful to us. We often conflate these two, since there's a clear notion of "truth" for arithmetic sentences, but thinking about it in these terms should demystify theories like PA+$\neg$Con(PA) a bit.