I want to understand how the "physicists' definition" of a tensor and the "mathematicians' definition" are equivalent. I'm going to stick to the finite dimensional case, and note that I am not concerned with the difference between tensor fields and tensors.
Let $V$ be a finite dimensional vector space, $V^{*}$ its dual space, and $F$ the underlying field. A type $(n,m)$ tensor $T$ is a multilinear map $$T: V \times ... \times V \times V^{*} \times ... \times V^{*} \to F $$ where there are $n$ copies of $V$ and $m$ copies of $V^{*}$.
At this point, after pondering this definition for a while, you can see that any such tensor is completely determined by its values on all combinations of basis vectors of the $n$ copies of $V$ and $m$ copies of $V^{*}.$ In other words, writing $d$ for the dimension of $V$, we should be able to represent a tensor as a (multidimensional) array with $n+m$ indices, each running from $1$ to $d$ (so $d^{n+m}$ entries in total). However, the values of this array will depend on the choice of basis for $V$ (and for $V^{*}$ - but let's assume for now that we will always choose the ``natural'' dual basis for $V^{*}$ given our choice of basis for $V$).
The fact that tensors can be represented as multidimensional arrays whose values depend on the chosen basis might lead you to consider a ``physicists' definition of a type $(n, m)$ tensor'':
Let $V$ be a finite dimensional vector space with dimension $d$ and $F$ the underlying field. A type $(n,m)$ tensor $T$ associated to the vector space $V$ is a multidimensional array with $n+m$ indices, each running from $1$ to $d$, which obeys a certain transformation law.
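To make the array picture concrete, here is a minimal NumPy sketch (with $d = 2$ and an arbitrarily chosen bilinear map, both my own choices for illustration) that builds the array of a type $(2,0)$ tensor by evaluating it on all pairs of basis vectors:

```python
import numpy as np

d = 2

def T(v, w):
    """An arbitrary bilinear map T: V x V -> R (a type (2,0) tensor above)."""
    return v[0]*w[0] + 3*v[0]*w[1] - 2*v[1]*w[1]

# Evaluate T on all pairs of standard basis vectors to get its d x d array.
e = np.eye(d)
L = np.array([[T(e[i], e[j]) for j in range(d)] for i in range(d)])

# T(v, w) is then recovered as v^T L w for any v, w.
v, w = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(T(v, w), v @ L @ w)
```

The same recipe with $n+m$ nested loops produces the $d^{n+m}$-entry array of a general type $(n,m)$ tensor.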
At this point, we don't know what the transformation law is. Instead, we want to pretend we are discovering it. So we start by considering the simplest cases first and then work our way up in complexity. From here on we'll take $d=2$, $F=\mathbb{R}$ for simplicity, and we'll represent $v,w \in V$ as column vectors and $\phi, \psi \in V^{*}$ as row vectors.
Example 1:
A type $(1,0)$ tensor $$T: V \to \mathbb{R}$$
We know that any linear map from $V$ to a scalar can be represented as a row vector. So
$$T(v) = \begin{bmatrix} L_1 & L_2 \end{bmatrix} \begin{bmatrix}
v_1\\
v_2\\
\end{bmatrix}$$
for some $L_1, L_2 \in \mathbb{R}.$ Now what would the transformation law be? Well we can easily derive it:
$$T(v) = Lv = L(R^{-1}R)v = (LR^{T})(Rv) = \hat{L} \hat{v}$$
where there was some change of basis $$v \mapsto \hat{v}, \quad \hat{v} = Rv, \quad RR^{T} = I,$$ by an orthogonal matrix $R$, so that $R^{-1} = R^{T}$.
Conclusion: The transformation law is given by: $$v \mapsto Rv \implies L \mapsto LR^T.$$ (This case also tells us how the covectors transform in general.)
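As a sanity check, here is a small NumPy sketch (my own choices: $d = 2$, random $L$ and $v$, and a rotation matrix as the orthogonal $R$) verifying that the law $L \mapsto LR^{T}$ leaves the scalar $T(v)$ unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(1, 2))   # row vector representing T
v = rng.normal(size=(2, 1))   # column vector in V

# An orthogonal change of basis: a rotation by an arbitrary angle, R R^T = I.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v_hat = R @ v        # v -> R v
L_hat = L @ R.T      # L -> L R^T

# The scalar T(v) must be unchanged by the change of basis.
assert np.isclose(L @ v, L_hat @ v_hat)
```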
Example 2:
A type $(0,1)$ tensor $$T: V^{*} \to \mathbb{R}$$
This case is similar. Any linear map from $V^{*}$ to a scalar can be represented by a column vector. So
$$T(\phi) = \begin{bmatrix} \phi_1 & \phi_2 \end{bmatrix} \begin{bmatrix}
L_1\\
L_2\\
\end{bmatrix}$$
for some $L_1, L_2 \in \mathbb{R}.$
$$T(\phi) = \phi L = \phi R^T R L = (\phi R^{T})(RL) = \hat{\phi} \hat{L} $$
Transformation law: $$v \mapsto Rv, \phi \mapsto \phi R^{T} \implies L \mapsto RL.$$
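The same kind of numerical check works here (again with my own choices of $d = 2$, random data, and a rotation for $R$): the law $L \mapsto RL$ leaves $T(\phi)$ unchanged when $\phi \mapsto \phi R^{T}$.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.normal(size=(1, 2))   # row vector in V*
L = rng.normal(size=(2, 1))     # column vector representing T

# An orthogonal change of basis, R R^T = I.
theta = 1.2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

phi_hat = phi @ R.T   # phi -> phi R^T
L_hat = R @ L         # L -> R L

# The scalar T(phi) must be unchanged by the change of basis.
assert np.isclose(phi @ L, phi_hat @ L_hat)
```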
Now this is where I start to get confused. I want to consider all tensors that are represented by two-dimensional arrays, i.e. matrices. This includes the type $(2,0), (1,1), (0,2)$ tensors. I want to represent each of these using the normal rules for column/row vector and matrix multiplication.
Ostensibly it looks like we can only represent a type $(1,1)$ tensor: $$T(\phi, v) = \begin{bmatrix} \phi_1 & \phi_2 \end{bmatrix} \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$ But we can actually easily represent a multilinear map from two copies of $V$ with the following: $$T(w, v) = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}^{T} \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}.$$ This is a bilinear form (a ``quadratic form'' when $w = v$). And if we want to represent a multilinear map from two copies of $V^{*}$ then we can just write: $$T(\phi, \psi) = \begin{bmatrix} \phi_1 & \phi_2 \end{bmatrix} \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} \psi_1 & \psi_2 \end{bmatrix}^{T} $$ We could also represent a type $(1,1)$ tensor like: $$T(w, \psi) =\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}^{T} \begin{bmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} \psi_1 & \psi_2 \end{bmatrix}^{T}.$$
Now we derive our transformation laws. We insist that $v \in V$ transforms like $\hat{v} = Rv$ and $\phi \in V^{*}$ transforms like $\hat{\phi} = \phi R^T.$
$$\phi L v = \phi R^{T} R L R^{T} R v = (\phi R^{T}) (R L R^{T}) (R v) = \hat{\phi} \hat{L} \hat{v}$$ $$w^T L v = w^T R^{T} R L R^{T} R v = (Rw)^{T} (R L R^{T}) (R v) = \hat{w}^T \hat{L} \hat{v}$$ $$\phi L \psi^{T} = \phi R^{T} R L R^{T} R \psi^{T} = (\phi R^{T}) (R L R^{T}) (\psi R^{T})^{T} = \hat{\phi} \hat{L} \hat{\psi}^{T}$$ $$w^T L \psi^{T} = w^T R^{T} R L R^{T} R \psi^{T} = (Rw)^{T} (R L R^{T}) (\psi R^{T})^{T} = \hat{w}^T \hat{L} \hat{\psi}^{T}$$
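All four cases can be checked numerically at once. The sketch below (again with my own choices of $d = 2$, random data, and a rotation for $R$) confirms the observation driving my confusion: the single law $L \mapsto RLR^{T}$ preserves all four scalars.

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(2, 2))
v, w = rng.normal(size=(2, 1)), rng.normal(size=(2, 1))
phi, psi = rng.normal(size=(1, 2)), rng.normal(size=(1, 2))

# An orthogonal change of basis, R R^T = I.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_hat = R @ L @ R.T                   # the (apparently shared) law
v_hat, w_hat = R @ v, R @ w           # vectors: x -> R x
phi_hat, psi_hat = phi @ R.T, psi @ R.T  # covectors: x -> x R^T

# All four scalars are unchanged under the identical law L -> R L R^T.
assert np.isclose(phi @ L @ v, phi_hat @ L_hat @ v_hat)
assert np.isclose(w.T @ L @ v, w_hat.T @ L_hat @ v_hat)
assert np.isclose(phi @ L @ psi.T, phi_hat @ L_hat @ psi_hat.T)
assert np.isclose(w.T @ L @ psi.T, w_hat.T @ L_hat @ psi_hat.T)
```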
So I am getting the same transformation law for the type $(2,0), (1,1), $ and $(0,2)$ tensors! This is clearly not right, so what is going on here?