Let's say we are talking about runtimes of algorithms, not functions. Then we can use the definition of $O(f(n))$ the way it is given here. But where would we encounter something like $O(f(n)) - O(g(n))$?
Here's a practical situation: I have a complex algorithm. One step in that algorithm has a runtime of $O(f(n))$. I could modify that step so that it takes $O(g(n))$ instead. How much faster will my algorithm be?
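As a sketch of that situation (everything here is hypothetical, just to fix ideas), picture an algorithm with one replaceable step:

```python
# Hypothetical setup: an algorithm whose total cost includes one
# replaceable step. Swapping the step changes only that term.

def slow_step(xs):
    """Original step: counts inversions by brute force, Theta(n^2)."""
    return sum(1 for i in range(len(xs))
                 for j in range(i + 1, len(xs))
                 if xs[i] > xs[j])

def algorithm(xs, step=slow_step):
    xs = list(xs)            # O(n) copy
    inversions = step(xs)    # the step we might replace: O(f(n))
    xs.sort()                # O(n log n)
    return inversions, xs

# Replacing slow_step with, say, an O(n log n) inversion counter changes
# only that step's term; how much the whole algorithm speeds up is
# exactly the question above.
```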
We can say naively that the improvement is $O(f(n)) - O(g(n))$. If $f(n), g(n) = O(n^2)$, then we can say equally naively that the improvement is $O(n^2) - O(n^2)$. But what is the actual improvement?
The fact is that, with the information given, we have no idea: $f(n)$ could always be 100 times larger than $g(n)$, always 100 times smaller, or sometimes larger and sometimes smaller; all of that is allowed by the definition of $O(\cdot)$. All we can say is that the runtime of the algorithm will not improve by more than $O(n^2)$ and will not get worse by more than $O(n^2)$. We cannot say whether it improves or gets worse at all, and the answer may even differ depending on $n$.
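To see both directions with concrete (made-up) step costs: take $f(n) = 100n^2$ and $g(n) = n^2$, and the modification saves $f(n) - g(n) = 99n^2$ time; take $f(n) = n^2$ and $g(n) = 100n^2$, and it instead costs an extra $99n^2$. Both pairs satisfy $f(n), g(n) = O(n^2)$, and in both cases $|f(n) - g(n)| = O(n^2)$, but the sign of the difference is simply not determined by the bounds.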
But if we are talking about functions rather than algorithm runtimes, then we need a definition of $O(f(n))$ that accounts for negative values and works meaningfully for them.
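One standard convention that achieves this (a common textbook choice, not something fixed by the discussion above) is to put absolute values into the definition:

$$h(n) = O(f(n)) \iff \exists\, C > 0,\ n_0 \text{ such that } |h(n)| \le C\,|f(n)| \text{ for all } n \ge n_0.$$

With this definition, a statement like $f(n) - g(n) = O(n^2)$ stays meaningful even when the difference is negative.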