(added 3/9/2021) There is no difference between the approaches via Taylor expansion or Butcher trees, the latter is just a systematic way to do the former. The Taylor coefficients will always be polynomial expressions in the method parameters/coefficients. With the manipulation of B-series one gets these expressions and the resulting order relations in a fully algebraic way.
At the end of the 19th century the idea was developed to use several close-by evaluations of the ODE function to get a more correct approximation of the next point. The idea of Kutta over the work of Runge and Heun*1 was to disregard any structure and use all of the previously computed slopes in the computation of the next one, using a general linear combination with undetermined coefficients, until a sufficient number for the integration step was obtained. In formulas, $\newcommand{\D}{\mathit{\Delta}}$
\begin{align}
\D_1y&=f(x,y)\D x\\
\D_2y&=f(x+k_2\D x,\,y+q_{21}\D_1y)\D x\\
\D_3y&=f(x+k_3\D x,\,y+q_{31}\D_1y+q_{32}\D_2y)\D x\\
&\vdots\\
\D_sy&=f(x+k_s\D x,\,y+q_{s1}\D_1y+...+q_{s,s-1}\D_{s-1}y)\D x\\
y_{\rm next}&=y+a_1\D_1y+...+a_s\D_sy
\end{align}
Then the order conditions were derived via Taylor expansion and solved for $s=2,3,4,5,6$ and examples given.
*1 What's the motivation for Runge-Kutta methods?
(original) Before the first step observe that the resulting system of order equations is under-determined, so you have some freedom in constructing a method.
The first step in the approach by Kutta and also usually followed today is to single out the order equations that do not contain the matrix coefficients, that is, that only contain the coefficients relevant to an integration of $y'(x)=f(x)$, a simple quadrature. Then it is well-known how to use interpolation to get quadrature rules for any selection of sample points. So select one of the named rules or make up your own. Kutta selected the Simpson rule and the 3/8 rule to construct examples, demanding further symmetry to further reduce degrees of freedom. Thus the sample points are selected as $(k_1,k_2,k_3,k_4)=(0,1/2,1/2,1)$ or $=(0,1/3,2/3,1)$ with coefficients $(a_1,a_2,a_3,a_4)=(1/6,1/3,1/3,1/6)$ or $=(1/8, 3/8,3/8,1/8)$. Kutta also explored other than the symmetric generalization of the Simpson rule.*2
*2 Why use Classic fourth-order Runge-Kutta over the 3/8-rule?
This then drastically reduces the variability in the remaining order conditions.
The classical RK4 method has the additional design decision that the matrix only is non-zero on the sub-diagonal, thus the non-zero entries are $q_{21}=k_2$, $q_{32}=k_3$, $q_{43}=k_4$. The quadrature conditions are
\begin{align}
a_1+a_2+a_3+a_4&=1\\
a_2k_2+a_3k_3+a_4k_4&=\frac12\\
a_2k_2^2+a_3k_3^2+a_4k_4^2&=\frac13\\
a_2k_2^3+a_3k_3^3+a_4k_4^3&=\frac14\\
\end{align}
and the order conditions for linear ODE reduce to
\begin{align}
a_3k_3k_2+a_4k_4k_3&=\frac16\\
a_4k_4k_3k_2&=\frac1{24}
\end{align}
and for nonlinear ODE
\begin{align}
a_3k_3k_2^2+a_4k_4k_3^2&=\frac1{12}\\
a_3k_3^2k_2+a_4k_4^2k_3&=\frac18\\
\end{align}
Leaving out the first equation which determines $a_1$, the other 7 equations connect the 6 quantities $k_2,k_3,k_4$ and $a_2k_2,a_3k_3,a_4k_4$. The last occur only linearly, so we have a linear system of 7 equations for 3 quantities. That the rank of the extended system matrix is less than 4 then gives 3 determinant equations for the 3 variables $k_2,k_3,k_4$, which completely determines them to one of finitely many solutions.