Parts of your question have been answered here.
$\mathbb R^n$ is the vector space of all $n$-tuples of real numbers. Usually one writes $x \in \mathbb R^n$ as $x = (x_1,\ldots,x_n)$ in row form, but you can also use the column form
$$x = \left[ \begin{array}{c}
x_1 \\
\vdots \\
x_n
\end{array}\right]$$
This is just a notational issue. If you like, you can regard column vectors as $n \times 1$ matrices. Tu also prefers upper indices for coordinates instead of lower ones (for example $x^i$ instead of $x_i$).
Tu introduces the tangent space $T_p(\mathbb R^n)$ at $p$ in $\mathbb R^n$ as the vector space of all arrows emanating from $p$. He illustrates this for $n = 3$:

This of course generalizes to arbitrary $n$. Thus, formally we should regard such an arrow as a pair $a = (p,v)$ with $v \in \mathbb R^n$, that is, we have $T_p(\mathbb R^n) = \{p\} \times \mathbb R^n$. The vector space structure on $\{p\} \times \mathbb R^n$ is given by $(p,v) + (p,w) = (p,v+w)$ and $r(p,v)= (p,rv)$. This ensures that the canonical bijection
$$\beta : \{p\} \times \mathbb R^n \to \mathbb R^n, \quad \beta((p,v)) = v $$
becomes a linear isomorphism of vector spaces. Note that an arrow emanating from $p$ can be viewed geometrically as the directed line segment from $p$ to $p + v$, in other words as the pair of points $(p,p+v)$.
Tu's convention is to write the starting point $p$ of an arrow $a = (p,v)$ in row form and the "vectorial part" $v$ of $a$ in column form. In Tu's own words, he does so to distinguish between points and vectors. Moreover, by an abuse of notation he drops $p$ from $a = (p,v)$, i.e. he identifies $a$ with the column vector $v$. Thus in Tu's sense there is a difference between the column vector with components $v^i$ (which is an element of $T_p(\mathbb R^n)$) and the row vector $v = (v^1,\ldots,v^n) \in \mathbb R^n$. Personally I do not like this notational way of distinguishing $T_p(\mathbb R^n)$ from $\mathbb R^n$; it does not elucidate anything and has no advantage over properly writing $a = (p,v)$ for the elements of $T_p(\mathbb R^n)$. In fact, Tu introduces an oversubtle and unnecessary distinction between row and column vectors, but perhaps this is a matter of taste.
Anyway, this should answer your questions 1. and 2. In 2. you also ask "So will this vector space include all the possible vectors that have $p$ as starting point, even the normal vectors? Or only the vectors that pass tangentially through $p$?" Here you are misled by Fig. 2.2 ("A tangent vector $v$ to a surface at $p$"). In fact, $T_p(\mathbb R^3)$ is not the set of tangent vectors to some surface $S \subset \mathbb R^3$ at the point $p \in S$; it is the set of all column vectors in the above sense, so it does include the normal vectors. The set of tangent vectors to $S$ at $p$ is a two-dimensional linear subspace of $T_p(\mathbb R^3)$, as you will see later when studying the book. You can write $T_p(S)$ for it.
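To make the difference concrete, here is a small illustration of my own (it is not taken from Tu's text at this point). Take for $S$ the unit sphere $S^2 \subset \mathbb R^3$ and $p = (0,0,1)$. Using the standard fact that the tangent vectors to the sphere at $p$ are exactly the vectors orthogonal to $p$, we get
$$T_p(S^2) = \{ (p,v) \mid v^3 = 0 \} \subsetneq T_p(\mathbb R^3) = \{p\} \times \mathbb R^3 .$$
So $(p,e_1)$ and $(p,e_2)$ span $T_p(S^2)$, whereas the normal vector $(p,e_3)$ belongs to $T_p(\mathbb R^3)$ but not to $T_p(S^2)$.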
Concerning your question 3.: Yes, it is an abuse of notation. Tu uses the same symbols for basis elements of $\mathbb R^n$ and of $T_p(\mathbb R^n)$. If $e_1, \ldots,e_n$ denote the standard basis vectors of $\mathbb R^n$, he should properly write $(p, e_i)$ for the standard basis vectors of $T_p(\mathbb R^n)$. But due to Tu's notational conventions we may omit $p$ and regard $e_i$ as a column vector. I think the danger of confusion is very small.
Of course $e_i$ does not live in $\mathcal D_p(\mathbb R^n)$ and thus is different from the derivation $D_{e_i} = \dfrac{\partial}{\partial x^i} \mid_p$. But for each $v \in \mathbb R^n$ we have the usual directional derivative $D_v = D_v \mid_p$ at $p$. This is the map which associates to any smooth $f : U \to \mathbb R$ defined on an open neighborhood $U$ of $p$ the number
$$D_v \mid_p(f) = \lim_{t \to 0}\frac{f(p +tv)-f(p)}{t} .$$
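As a quick worked instance (the function, point and vector are my own illustrative choices, not Tu's), take $n = 2$, $f(x,y) = x^2 y$, $p = (1,2)$ and $v = (3,-1)$. Then
$$f(p+tv) = (1+3t)^2(2-t) = 2 + 11t + 12t^2 - 9t^3, \qquad f(p) = 2,$$
and therefore
$$D_v \mid_p(f) = \lim_{t \to 0}\frac{11t + 12t^2 - 9t^3}{t} = 11 .$$
This agrees with $3 \cdot \dfrac{\partial f}{\partial x}(p) + (-1) \cdot \dfrac{\partial f}{\partial y}(p) = 3 \cdot 4 - 1 \cdot 1 = 11$.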
Tu formalizes this and gets a map
$$D_v = D_v \mid_p : C^\infty_p \to \mathbb R$$
defined on the set $C^\infty_p$ of germs of smooth functions at $p$. This set is an $\mathbb R$-algebra (i.e. a vector space with a compatible multiplication of its elements). Tu shows that each $D_v$ is a derivation and thus gets his linear map
$$\phi : T_p(\mathbb R^n) \to \mathcal D_p(\mathbb R^n), \phi(v) = D_v.$$
In this formula $v \in T_p(\mathbb R^n)$ is a column vector in the sense of Tu's convention. It would be more precise to understand it as $\phi((p,v)) = D_v \mid_p$.
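For completeness, here is a sketch of why each $D_v \mid_p$ satisfies the Leibniz rule, which together with linearity is what being a derivation at $p$ means. This is the standard product-rule argument, written for representatives $f, g$ of germs at $p$; it is not quoted from Tu. For small $t \ne 0$ we have
$$\frac{(fg)(p+tv) - (fg)(p)}{t} = \frac{f(p+tv)-f(p)}{t}\, g(p+tv) + f(p)\,\frac{g(p+tv)-g(p)}{t} .$$
Letting $t \to 0$ and using the continuity of $g$ at $p$ gives
$$D_v \mid_p(fg) = D_v \mid_p(f)\, g(p) + f(p)\, D_v \mid_p(g) .$$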
The formula $D_v = \sum v^i \dfrac{\partial}{\partial x^i} \mid_p$ follows from the linearity of $\phi$ and the fact that $D_{e_i} = \dfrac{\partial}{\partial x^i} \mid_p$: We have $v = \sum v^i e_i$ and therefore $D_v = \phi(v) = \phi(\sum v^i e_i) = \sum v^i \phi(e_i) = \sum v^i \dfrac{\partial}{\partial x^i} \mid_p$.
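To see the formula at work, consider an example of my own choosing (not from the book): $n = 3$, $f(x^1,x^2,x^3) = x^1 x^2 + \sin(x^3)$ and $p = (1,2,0)$. The partial derivatives at $p$ are
$$\frac{\partial f}{\partial x^1}(p) = x^2(p) = 2, \qquad \frac{\partial f}{\partial x^2}(p) = x^1(p) = 1, \qquad \frac{\partial f}{\partial x^3}(p) = \cos(0) = 1,$$
so for an arbitrary $v$ with components $v^1, v^2, v^3$ we get
$$D_v(f) = \sum v^i \frac{\partial f}{\partial x^i}(p) = 2v^1 + v^2 + v^3 .$$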
It is by no means trivial that $\phi$ is an isomorphism of vector spaces. This is proved in Theorem 2.2. It answers your question 4.: As Tu writes:
This theorem shows that one may identify the tangent vectors at $p$ with the derivations at $p$. Under the vector space isomorphism $T_p(\mathbb R^n) \simeq \mathcal D_p(\mathbb R^n)$, the standard basis $e_1,\ldots,e_n$ for $T_p(\mathbb R^n)$ corresponds to the set $\dfrac{\partial}{\partial x^1} \mid_p, \ldots, \dfrac{\partial}{\partial x^n} \mid_p$ of partial derivatives.