I don't have the historical perspective on Taylor series, so I can't say "who concocted this argument", but I can provide an intuitive justification for the technique used in the proof of Taylor's Theorem.
First of all, note that the main goal of Taylor's Theorem is to express the function $f(x)$ as a power series in powers of $(x - a)$, where $a$ is a suitable point at which $f$ behaves well. By "behaves well" we mean here that $f$ has derivatives up to a certain order in a neighborhood of $a$. Thus if we assume $$f(x) = a_{0} + a_{1}(x - a) + a_{2}(x - a)^{2} + \cdots$$ and assume that the series above can be differentiated term by term repeatedly, then differentiating $n$ times and setting $x = a$ gives the coefficients $$a_{n} = \frac{f^{(n)}(a)}{n!}$$ Thus if $f$ is differentiable $n$ times in a neighborhood of $a$, we can consider the Taylor polynomial $$T_{n}(x, a) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n - 1)}(a)}{(n - 1)!}(x - a)^{n - 1}$$ and our hope is that this Taylor polynomial is a good approximation of $f(x)$ in a neighborhood of $a$. The error in this approximation is $$R_{n}(x, a) = f(x) - T_{n}(x, a)$$ Finding an explicit expression for $R_{n}(x, a)$ in terms of $f, n, a, x$ is the real crux of Taylor's Theorem.
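A minimal numerical sketch of $T_{n}(x, a)$ and $R_{n}(x, a)$, assuming $f = \exp$ and $a = 0$ purely for illustration (every derivative of $\exp$ is $\exp$ again, so each $f^{(k)}(a)$ is just $e^{a}$). The helper names `taylor_poly` and `remainder` are hypothetical, not taken from any library:

```python
import math

def taylor_poly(derivs_at_a, a, x, n):
    """T_n(x, a) = sum over k = 0..n-1 of f^(k)(a) / k! * (x - a)^k."""
    return sum(derivs_at_a[k] / math.factorial(k) * (x - a) ** k for k in range(n))

def remainder(f, derivs_at_a, a, x, n):
    """R_n(x, a) = f(x) - T_n(x, a)."""
    return f(x) - taylor_poly(derivs_at_a, a, x, n)

# Illustrative choice: f = exp expanded at a = 0, so f^(k)(a) = exp(a) for every k.
a, x = 0.0, 0.5
derivs = [math.exp(a)] * 10

for n in range(1, 8):
    # Print n, the Taylor polynomial value, and the remainder R_n(x, a).
    print(n, taylor_poly(derivs, a, x, n), remainder(math.exp, derivs, a, x, n))
```

For this choice the remainder shrinks steadily as $n$ grows; the code only illustrates the approximation numerically and proves nothing about general $f$.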
And yes, this part is tricky. Suppose we used the Taylor polynomial $T_{n + 1}(x, a)$ instead of $T_{n}(x, a)$. The difference between $T_{n}$ and $T_{n + 1}$ is the extra term $a_{n}(x - a)^{n}$. Since we actually use $T_{n}$ rather than $T_{n + 1}$, the expectation is that the remainder should behave roughly like this extra term $a_{n}(x - a)^{n}$, and hence it makes sense to analyze the expression $$\frac{R_{n}(x, a)}{(x - a)^{n}}$$ The tricky part now is to fix the variable $x$ in the above expression and consider it as a function of $a$. We then change the notation slightly to $$\frac{R_{n}(x, u)}{(x - u)^{n}}$$ where $u$ lies between $a$ and $x$, and compare this quantity with its value at $u = a$, namely $$\frac{R_{n}(x, a)}{(x - a)^{n}}$$ Thus our final function is $$g(u) = \frac{R_{n}(x, u)}{(x - u)^{n}} - \frac{R_{n}(x, a)}{(x - a)^{n}}$$ Since we want to get rid of $(x - u)^{n}$ in the denominator (to simplify the calculation), we instead use the function $$F(u) = (x - u)^{n}g(u) = R_{n}(x, u) - \left(\frac{x - u}{x - a}\right)^{n}R_{n}(x, a)$$ Doing this achieves the important goal $F(a) = F(x) = 0$ (note that $R_{n}(x, x) = f(x) - T_{n}(x, x) = 0$), so that we can apply Rolle's Theorem to get $F'(c) = 0$ for some $c$ strictly between $a$ and $x$. The expression for $F'(u)$ contains $R_{n}(x, a)$, and the equation $F'(c) = 0$ allows us to express $R_{n}(x, a)$ in terms of $f, x, n, a$ (although this is somewhat indeterminate because the exact value of $c$ is not known).
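For completeness, here is a sketch of the computation of $F'(u)$ that the last step relies on, with $x$ held fixed and assuming $f^{(n)}$ exists between $a$ and $x$. Write $R_{n}(x, u) = f(x) - \sum_{k=0}^{n-1}\frac{f^{(k)}(u)}{k!}(x - u)^{k}$. For $k \geq 1$,
$$\frac{d}{du}\left[\frac{f^{(k)}(u)}{k!}(x - u)^{k}\right] = \frac{f^{(k + 1)}(u)}{k!}(x - u)^{k} - \frac{f^{(k)}(u)}{(k - 1)!}(x - u)^{k - 1}$$
while the $k = 0$ term just contributes $f'(u)$, so the sum telescopes and
$$\frac{d}{du}R_{n}(x, u) = -\frac{f^{(n)}(u)}{(n - 1)!}(x - u)^{n - 1}, \qquad F'(u) = -\frac{f^{(n)}(u)}{(n - 1)!}(x - u)^{n - 1} + \frac{n(x - u)^{n - 1}}{(x - a)^{n}}R_{n}(x, a)$$
Setting $F'(c) = 0$ and cancelling $(x - c)^{n - 1} \neq 0$ gives
$$R_{n}(x, a) = \frac{f^{(n)}(c)}{n!}(x - a)^{n}$$
which is the same expression that the integration approach produces.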
The above technique is based on differential calculus. There is another, simpler way to look at it if we use integration. The expression $R_{n}(x, a)$, seen as a function of $x$, is such that it and all its derivatives up to order $(n - 1)$ vanish at $x = a$, and $R_{n}^{(n)}(x, a) = f^{(n)}(x)$ (because $T_{n}(x, a)$ is a polynomial of degree at most $n - 1$ in $x$). Thus $R_{n}(x, a)$ is the $n^{\text{th}}$ antiderivative of $f^{(n)}(x)$ which, together with its first $(n - 1)$ derivatives, vanishes at $x = a$. Under these circumstances it is easy to prove via integration by parts (and induction on $n$) that $$R_{n}(x, a) = \frac{1}{(n - 1)!}\int_{a}^{x}(x - t)^{n - 1}f^{(n)}(t)\,dt$$ Putting $t = x + (a - x)u$ we get $$R_{n}(x, a) = \frac{(x - a)^{n}}{(n - 1)!}\int_{0}^{1}u^{n - 1}f^{(n)}(x + (a - x)u)\,du$$ Since $u^{n - 1} \geq 0$ on $[0, 1]$ and $f^{(n)}$ is continuous, the (weighted) mean value theorem for integrals gives $$R_{n}(x, a) = \frac{(x - a)^{n}}{(n - 1)!}f^{(n)}(c)\int_{0}^{1}u^{n - 1}\,du$$ i.e. $$R_{n}(x, a) = \frac{(x - a)^{n}}{n!}f^{(n)}(c)$$ where $c$ is some point between $a$ and $x$.
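A rough numerical check of this chain of formulas, assuming $f = \exp$, $a = 0$, $x = 1$, $n = 4$ purely for illustration. For $\exp$ every $f^{(n)}$ is again $\exp$, so the unknown point $c$ in the last formula can be solved for explicitly. The integrals are approximated with a hand-written composite Simpson's rule; `simpson` is a hypothetical helper, not a library function:

```python
import math

def simpson(g, lo, hi, steps=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

# Illustrative assumption: f = exp, whose n-th derivative is exp as well.
f = math.exp
fn = math.exp
a, x, n = 0.0, 1.0, 4

# R_n(x, a) = f(x) - T_n(x, a), computed directly.
direct = f(x) - sum(fn(a) * (x - a) ** k / math.factorial(k) for k in range(n))

# Integral form over [a, x].
form_t = simpson(lambda t: (x - t) ** (n - 1) * fn(t), a, x) / math.factorial(n - 1)

# After substituting t = x + (a - x) u, the integral runs over [0, 1].
form_u = (x - a) ** n / math.factorial(n - 1) * simpson(
    lambda u: u ** (n - 1) * fn(x + (a - x) * u), 0.0, 1.0)

# Mean value form R_n = f^(n)(c) (x - a)^n / n!; for exp, solve e^c = n! R_n / (x - a)^n.
c = math.log(direct * math.factorial(n) / (x - a) ** n)

print(direct, form_t, form_u, c)
```

For these values the direct remainder and both integral forms should agree to many decimal places, and the recovered $c$ should land strictly between $a$ and $x$, as the mean value argument predicts.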
Both approaches can be modified in a simple manner to give Cauchy's form of the remainder. Also note that the approach based on differential calculus only assumes that $f^{(n)}(x)$ exists in some neighborhood of $a$, whereas the approach based on integration requires that $f^{(n)}(x)$ be continuous on some neighborhood of $a$. There is another form of the remainder for Taylor series, which goes by the name of Peano's form of the remainder, and its proof requires a totally different approach.
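As a sketch of one way to carry out that modification: in the integral approach, apply the ordinary mean value theorem for integrals to the whole continuous integrand $(x - t)^{n - 1}f^{(n)}(t)$ instead of pulling out only $f^{(n)}$; in the differential approach, replace the exponent $n$ in $F(u)$ by $1$, which still gives $F(a) = F(x) = 0$:
$$\int_{a}^{x}(x - t)^{n - 1}f^{(n)}(t)\,dt = (x - c)^{n - 1}f^{(n)}(c)\,(x - a), \qquad F(u) = R_{n}(x, u) - \frac{x - u}{x - a}R_{n}(x, a), \quad F'(c) = -\frac{f^{(n)}(c)}{(n - 1)!}(x - c)^{n - 1} + \frac{R_{n}(x, a)}{x - a} = 0$$
Either route yields, for some $c$ between $a$ and $x$,
$$R_{n}(x, a) = \frac{f^{(n)}(c)}{(n - 1)!}(x - c)^{n - 1}(x - a)$$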