I was struggling with the motivation for connections and found it more comfortable to start from parallel transport. On Wikipedia, parallel transport along a curve $\gamma$ is a family of maps between tangent spaces, $\Gamma(\gamma)_s^t:T_{\gamma(s)}M \rightarrow T_{\gamma(t)}M$, where $\gamma$ is an integral curve of the vector field $X$. These maps satisfy the necessary smoothness properties and also (*) $\Gamma(\gamma)_u^t \circ \Gamma(\gamma)_s^u = \Gamma(\gamma)_s^t$. This captures the intuition of preserving parallelism along the curve $\gamma$. Finally, the connection is defined by \begin{align*} \nabla_X Y = \lim_{s \rightarrow 0} \frac{\Gamma(\gamma)^0_s Y_{\gamma(s)}-Y_{\gamma(0)}}{s}. \end{align*} All of this makes sense to me.
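As a sanity check on the definition, here is the flat case. This is only a sketch under my own assumptions, which are not part of the Wikipedia statement: $M=\mathbb{R}^n$, every tangent space is identified with $\mathbb{R}^n$, and $\Gamma(\gamma)_s^t$ is the identity under that identification, so (*) holds trivially. With $\gamma(0)=p$ and $\gamma'(0)=X_p$, the chain rule gives \begin{align*} (\nabla_X Y)_p = \lim_{s \rightarrow 0} \frac{Y_{\gamma(s)}-Y_{\gamma(0)}}{s} = \frac{d}{ds}\Big|_{s=0} Y_{\gamma(s)} = DY_p(X_p), \end{align*} which is the usual directional derivative. In this flat case the result depends on $X$ only through $X_p$ and is linear in it, since $DY_p$ is a linear map.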
However, when I tried to prove $\nabla_{fX+gZ} = f\nabla_X +g\nabla_Z$, I could not see how to do it without assuming that $\Gamma(\gamma)^0_s$ is linear. My feeling is that the linearity of the map can be proven from condition (*), but I have not managed to do so. Any help or hints would be greatly appreciated.