Initial answer
The entanglement fidelity isn't intended to measure a change in entanglement. Rather, it measures how coherently a channel preserves a given state, meaning how well it preserves the state of a joint (possibly entangled) system when the channel acts on just part of the joint system. This is important in various settings such as compression, where we want to cause as little disturbance as possible by compressing and then decompressing — including the situation in which there are correlations (including entanglement) between whatever gets compressed and other systems.
If I can reinterpret the final question slightly, the quantity $F_e$ is presented in textbooks because it's both useful and fundamental, and I don't think it's correct to characterize $M_e$ in this way. Moreover, to my eye, it's misleading to describe the modified entanglement fidelity as "better" or something that improves on the entanglement fidelity in any measurable way: they're different things with different interpretations. Notice also that we can always define
$$
M_e(\rho,\Phi) = \max_U F_e(\rho, U\circ \Phi),\tag{1}
$$
in which $U\circ \Phi$ means the channel we get by first applying the channel $\Phi$ and then applying the channel corresponding to $U$ (i.e., conjugating by $U$). So, we can see the modified entanglement fidelity as being something derived from the entanglement fidelity, and to talk or think about it we really don't need to open the hood on the entanglement fidelity at all.
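To make the relationship in (1) concrete, here is a minimal sketch in Python with NumPy. It assumes the channel is given by Kraus operators $A_k$ and uses the standard expression $F_e(\rho,\Phi) = \sum_k |\operatorname{Tr}(\rho A_k)|^2$; the function names are hypothetical and chosen only for illustration. It doesn't attempt the maximization over $U$ in general. It only looks at a unitary channel, where undoing the unitary attains the largest possible value of 1.

```python
import numpy as np

def entanglement_fidelity(rho, kraus):
    """F_e(rho, Phi) = sum_k |Tr(rho A_k)|^2 for Kraus operators A_k of Phi."""
    return float(sum(abs(np.trace(rho @ A)) ** 2 for A in kraus))

def compose_with_unitary(U, kraus):
    """Kraus operators of the channel 'apply Phi first, then conjugate by U'."""
    return [U @ A for A in kraus]

# Illustrative choice (an assumption, not from the question):
# Phi is the bit-flip unitary channel and rho is the maximally mixed qubit state.
X = np.array([[0, 1], [1, 0]], dtype=complex)
rho = np.eye(2, dtype=complex) / 2
phi = [X]

fe_plain = entanglement_fidelity(rho, phi)                               # Tr(rho X) = 0
fe_corrected = entanglement_fidelity(rho, compose_with_unitary(X, phi))  # Tr(rho X X) = 1

print(fe_plain, fe_corrected)
```

In this example $F_e$ is 0 while the right-hand side of (1) is 1 (since no value can exceed 1), which illustrates that the two quantities answer different questions rather than one refining the other.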
I feel that a part of the issue may be that "entanglement fidelity" is really not such a great name for this quantity. In my book on quantum information I called it the "channel fidelity" because I didn't like the original name. For example, if $\rho$ is pure there isn't any entanglement at all, but the quantity is still well-defined and meaningful. I'm aware that some people find this sort of name-changing to be annoying — but these names aren't sacred, and why shouldn't we do this when it makes sense?
Further details on entanglement fidelity
The entanglement fidelity works as follows for a given channel $\Phi$ on a system $Q$ as well as a state $\rho$ of $Q$. It's the smallest fidelity that we can possibly have between an input state $\sigma$ of a joint system $(Q,R)$ and the corresponding output state $(\Phi\otimes\operatorname{Id}_{R})(\sigma)$ of $(Q,R)$, under the assumption that the reduced state of $\sigma$ on $Q$ is $\rho$. In symbols,
$$
\operatorname{F}_e(\rho,\Phi) = \inf \{ \operatorname{F}(\sigma,(\Phi\otimes\operatorname{Id}_{R})(\sigma))\,:\,\operatorname{Tr}_R(\sigma) = \rho\}.\tag{2}
$$
The infimum should be understood to range over the choice of the system $R$ as well as $\sigma$. And by the way, $\operatorname{F}$ here should be the squared fidelity if we want it to match Schumacher's formula.
So, the entanglement fidelity represents the worst case: the lowest that the fidelity can possibly be between the input and output when $\Phi$ is applied to a system whose reduced state is $\rho$. It's therefore not just about how well $\Phi$ preserves $\rho$, but how well $\Phi$ coherently preserves $\rho$. As it turns out, the worst case always happens when $\sigma$ is a purification of $\rho$, and every purification works equally well. So it's not actually all that complicated in mathematical terms, and we end up with the simple formula at the top of the question.
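As a sanity check on the purification picture, the following sketch builds one particular purification of $\rho$, applies $\Phi\otimes\operatorname{Id}_R$ to it, and compares the resulting squared fidelity with the Kraus-operator expression $\sum_k |\operatorname{Tr}(\rho A_k)|^2$. The dephasing channel, the value $p = 0.25$, and the state $\rho$ are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def purification(rho):
    """Return |psi> on (Q,R), with R a copy of Q, such that Tr_R |psi><psi| = rho."""
    vals, vecs = np.linalg.eigh(rho)
    d = rho.shape[0]
    psi = np.zeros(d * d, dtype=complex)
    for lam, v in zip(vals, vecs.T):
        # sqrt(lambda_i) |v_i> (x) |conj(v_i)>; Q is the first tensor factor
        psi += np.sqrt(max(lam, 0.0)) * np.kron(v, v.conj())
    return psi

def fe_via_purification(rho, kraus):
    """Squared fidelity <psi| (Phi (x) Id)(|psi><psi|) |psi> for a purification psi."""
    d = rho.shape[0]
    psi = purification(rho)
    sigma = np.outer(psi, psi.conj())
    lifted = [np.kron(A, np.eye(d)) for A in kraus]  # A_k acting on Q only
    out = sum(K @ sigma @ K.conj().T for K in lifted)
    return float(np.real(psi.conj() @ out @ psi))

def fe_via_kraus(rho, kraus):
    """Kraus-operator expression sum_k |Tr(rho A_k)|^2."""
    return float(sum(abs(np.trace(rho @ A)) ** 2 for A in kraus))

# Illustrative inputs: dephasing with probability p on a non-uniform qubit state.
p = 0.25
I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
dephasing = [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]
rho = np.diag([0.7, 0.3]).astype(complex)

fe_purified = fe_via_purification(rho, dephasing)
fe_kraus = fe_via_kraus(rho, dephasing)
print(fe_purified, fe_kraus)  # both 0.79 up to rounding: 0.75 * 1 + 0.25 * 0.4**2
```

The two numbers agree because $\langle\psi|(A_k\otimes I)|\psi\rangle = \operatorname{Tr}(\rho A_k)$ for any purification $|\psi\rangle$ of $\rho$, which is also why the choice of purification doesn't matter.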
Measuring a change in entanglement
If you're interested in changes in entanglement, then to my eye it makes sense to consider a measure of entanglement rather than the fidelity. Let's stick with the same names as above, so we have two systems $Q$ and $R$ and a channel $\Phi$ on $Q$. We can write $\operatorname{E}(Q:R)$ to denote an arbitrary measure of entanglement between $Q$ and $R$ — pick your favorite or let it be arbitrary — and we can write $\operatorname{E}(Q:R)_{\sigma}$ if we want to be specific about the state of $(Q,R)$ being $\sigma$. My reading of the question suggests that we're interested in the difference
$$
\operatorname{E}(Q:R)_{\sigma} - \operatorname{E}(Q:R)_{(\Phi\otimes\operatorname{Id}_{R})(\sigma)},\tag{3}
$$
which represents how much entanglement has been lost by applying $\Phi$ to $Q$ for the state $\sigma$. One could consider maximizing this quantity over all $R$ and all $\sigma$, so that it represents a worst case, and we could also constrain $\sigma$ so that $\operatorname{Tr}_R(\sigma) = \rho$ for a given choice of $\rho$ if that reflects an assumption that's relevant to whatever setting we're thinking about.
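For a feel of what the difference in (3) looks like numerically, here is a small sketch that picks the negativity as the entanglement measure. That choice is an assumption made purely for illustration (any other measure could be substituted), as are the Bell state, the dephasing channel with $p = 0.25$, and the function names.

```python
import numpy as np

def negativity(sigma, dq, dr):
    """Sum of the absolute values of the negative eigenvalues of the partial transpose on R."""
    # Index order after reshape is (q, r, q', r'); swapping r and r' transposes R only.
    pt = sigma.reshape(dq, dr, dq, dr).transpose(0, 3, 2, 1).reshape(dq * dr, dq * dr)
    eig = np.linalg.eigvalsh(pt)
    return float(-eig[eig < 0].sum())

def apply_on_q(sigma, kraus, dr):
    """Apply (Phi (x) Id_R) to sigma, where Phi has the given Kraus operators on Q."""
    lifted = [np.kron(A, np.eye(dr)) for A in kraus]
    return sum(K @ sigma @ K.conj().T for K in lifted)

# Illustrative state: the Bell state (|00> + |11>)/sqrt(2) on (Q,R).
bell = np.zeros(4, dtype=complex)
bell[0] = bell[3] = 1 / np.sqrt(2)
sigma = np.outer(bell, bell.conj())

# Illustrative channel: dephasing on Q with probability p.
p = 0.25
I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
dephasing = [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

e_before = negativity(sigma, 2, 2)                           # 0.5 for a Bell state
e_after = negativity(apply_on_q(sigma, dephasing, 2), 2, 2)  # |1 - 2p| / 2 = 0.25
loss = e_before - e_after
print(e_before, e_after, loss)
```

Maximizing this loss over all $R$ and $\sigma$, with or without the constraint $\operatorname{Tr}_R(\sigma) = \rho$, is a separate optimization problem that this sketch doesn't address.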
There are different measures of entanglement and some are incomparable, meaning that one may be both larger and smaller than another for different states, so my guess is that you're looking at a zoo rather than one measure to rule truthfully over all others. I don't know off the top of my head if this sort of thing has been studied, but I would guess that something along these lines must have been considered before. It seems pretty interesting and it might be fun to dive in and start working things out, but if I were doing this I would start with a sincere effort to learn what's already been done.