The soundness definition you are familiar with is the standard soundness definition for proof systems. The definition in this paper is soundness for "proofs of knowledge": the prover's goal is not only to convince the verifier that the statement is true, but also to convince it that the prover knows a witness. This can be a much stronger requirement.
There are two main aspects in which this differs from the notion you're familiar with. The first is related to this question: how can we formalize that the prover "knows" a witness? The short, informal answer is: if there's a witness in the prover, then there must be some way to get it out of the prover. This extraction is the job of the knowledge extractor $M$ in the definition you found. $M$ is given black-box access to $P$ in order to extract the witness. I could expand on this concept, but because it is often a source of confusion for newcomers, there are a lot of very good resources at different levels of rigor that elaborate specifically on this point [1,2,3,4].
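To make "getting the witness out of the prover" concrete, here is a minimal sketch of an extractor for the Schnorr identification protocol. Schnorr is my choice of example, not the protocol from the paper you found, and the toy group parameters, the `Prover` class, and the function names are all hypothetical. The extractor only talks to the prover through its interface, but it is allowed to ask the same prover state two different challenges, which is what "rewinding" amounts to.

```python
import random

# Toy Schnorr group (far too small for real use): g has prime order q modulo p.
p, q, g = 23, 11, 2


class Prover:
    """Honest Schnorr prover holding a witness w for the statement h = g^w mod p."""

    def __init__(self, w, seed):
        self.w = w
        self.rng = random.Random(seed)
        self.r = None

    def commit(self):
        # First message: a = g^r for a fresh random r.
        self.r = self.rng.randrange(q)
        return pow(g, self.r, p)

    def respond(self, c):
        # Third message: z = r + c*w mod q. Does not change the prover's state.
        return (self.r + c * self.w) % q


def verify(h, a, c, z):
    # Verifier's check: g^z == a * h^c (mod p).
    return pow(g, z, p) == (a * pow(h, c, p)) % p


def extract(prover, h):
    """Black-box extractor: same commitment, two distinct challenges."""
    a = prover.commit()
    c1, c2 = 1, 2                      # any two distinct challenges in Z_q
    z1 = prover.respond(c1)
    z2 = prover.respond(c2)            # "rewind": query the same state again
    if not (verify(h, a, c1, z1) and verify(h, a, c2, z2)):
        return None                    # prover failed to convince; nothing to extract
    # z1 - z2 = (c1 - c2) * w mod q, so divide by (c1 - c2).
    inv = pow((c1 - c2) % q, -1, q)    # modular inverse, Python 3.8+
    return ((z1 - z2) * inv) % q


w = 7
h = pow(g, w, p)
print(extract(Prover(w, seed=1), h) == w)
```

Note that the extractor never reads `prover.w`; it recovers the witness purely from two accepting transcripts that share the first message. A real verifier cannot do this, because it cannot rewind the prover.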
The second difference is that this definition relates the accepting probability of $P$ to the probability that $M$ extracts. In other words, whenever $P$ convinces the verifier, $M$ should be able to extract a witness (up to a negligible error). An alternative (but flawed) definition, closer to the one you're aware of, would require only that if $P$ convinces with non-negligible probability, then $M$ can extract successfully.* Variants of this definition were indeed used in early work, but they're too weak for what we typically need, as Bellare and Goldreich [4] explain in an entire paper that I very much recommend for a deeper understanding. In Appendix A, they explain why the definition you found is better than the flawed definition but still not optimal.
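As a rough sketch of the contrast, using notation that is my own and not taken from the paper ($R$ for the witness relation, $\langle P, V\rangle(x)$ for the interaction on input $x$, $\mathsf{negl}$ for a negligible function), one way to read the definition you found is

$$\Pr\big[(x, M^{P}(x)) \in R\big] \;\ge\; \Pr\big[\langle P, V\rangle(x) = \text{accept}\big] - \mathsf{negl}(|x|),$$

whereas the flawed alternative has the shape

$$\Pr\big[\langle P, V\rangle(x) = \text{accept}\big] \text{ is non-negligible} \;\implies\; M^{P}(x) \text{ outputs some } w \text{ with } (x, w) \in R.$$

The first form constrains $M$ for every prover, in proportion to how often that prover convinces $V$. The second imposes a condition only on provers whose success probability clears a threshold. The exact quantifiers and error terms in the paper may differ from this paraphrase, so treat it as an illustration of the structure rather than a restatement.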
A final remark: the paper you've found is already pretty old. It dates from the early days of zero-knowledge, when the community was still in the process of finding good definitions for zero-knowledge proof systems, as can be seen nicely from the paper by Bellare and Goldreich [4]. The definitions in newer works may be clearer.
*Note that the condition "if the statement is in the language" would obviously be too weak: just because the statement is in the language does not mean that some $P$ (which could be just a dummy machine outputting only zeros) knows a witness.