Two possible ways of defining vector-valued distributions on $\mathbb{R}^3$ are:
$$ X := [\mathcal{D}(\mathbb{R}^3; (\mathbb{R}^3)^{*})]^{*} = \{ T: \mathcal{D}(\mathbb{R}^3; (\mathbb{R}^3)^{*}) \simeq \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3) \rightarrow \mathbb{R} \}, $$ and
$$ Y := \mathcal{D}^{*}(\mathbb{R}^3;\mathbb{R}^3) = \{ T : \mathcal{D}(\mathbb{R}^3) \rightarrow \mathbb{R}^3 \} \simeq \mathcal{D}'(\mathbb{R}^3;\mathbb{R})^3 . $$
Note that $\mathbf{u} \in L^1_{loc}(\mathbb{R}^3)^3$ induces a distribution in both senses: in the first case the test function $\phi$ is vector valued and $\langle T_{\mathbf{u}}, \phi \rangle = \int_{\mathbb{R}^3} \mathbf{u} \cdot \phi \in \mathbb{R}$, while in the second case $\phi$ is scalar valued and $\langle T_{\mathbf{u}}, \phi \rangle = \int_{\mathbb{R}^3} \mathbf{u}\, \phi \in \mathbb{R}^3$, computed componentwise.
As explained here, these two spaces are "equal" (i.e., linearly homeomorphic) in this framework. In general, that is, if the ambient space is some infinite-dimensional Banach space $U$, the $Y$ definition is to be preferred.
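To make the identification concrete, here is a minimal sketch of how I understand it (the basis vectors $\mathbf{e}_i$ and the component notation $T_i$ are my own notation, introduced only for illustration). Given $T \in X$, one can define scalar distributions
$$ \langle T_i, \psi \rangle := \langle T, \psi\, \mathbf{e}_i \rangle, \qquad \psi \in \mathcal{D}(\mathbb{R}^3), \quad i = 1,2,3, $$
so that $(T_1, T_2, T_3) \in Y$. Conversely, a triplet $(T_1, T_2, T_3) \in Y$ should define an element of $X$ via
$$ \langle T, \phi \rangle := \sum_{i=1}^{3} \langle T_i, \phi_i \rangle, \qquad \phi = (\phi_1, \phi_2, \phi_3) \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3). $$
For $T = T_{\mathbf{u}}$ with $\mathbf{u} \in L^1_{loc}(\mathbb{R}^3)^3$, this gives $\langle T_i, \psi \rangle = \int_{\mathbb{R}^3} u_i\, \psi$, as expected.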
Now if $T$ is a vector valued distribution, one usually defines its curl via duality:
$$ \langle \operatorname{curl} T, \phi \rangle := \langle T, \operatorname{curl} \phi \rangle, \qquad \phi \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3); $$
since $\operatorname{curl} \phi \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3)$ and $T$ acts on it, it follows a posteriori that $T$ should be in $X$, and that $\operatorname{curl} T \in X$ too (assuming continuity holds). If instead $T = (T_1, T_2, T_3) \in Y$, I would write
$$ \langle \underbrace{\operatorname{curl} T}_{\in Y}, \phi \rangle := (\langle \partial_2 T_3 - \partial_3 T_2, \phi \rangle, \langle \dots, \phi \rangle, \langle \dots, \phi \rangle) \qquad \phi \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}). $$
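As a sanity check, the two definitions of the curl seem to agree once $T \in X$ is written through its components $T_i$, i.e. assuming $\langle T, \phi \rangle = \sum_i \langle T_i, \phi_i \rangle$ (the notation $T_i$, $\phi_i$ is mine). Expanding $\operatorname{curl} \phi$ and moving each derivative onto $T_i$ with the usual sign change $\langle T_i, \partial_j \psi \rangle = -\langle \partial_j T_i, \psi \rangle$ gives
$$ \langle T, \operatorname{curl} \phi \rangle = \langle T_1, \partial_2 \phi_3 - \partial_3 \phi_2 \rangle + \langle T_2, \partial_3 \phi_1 - \partial_1 \phi_3 \rangle + \langle T_3, \partial_1 \phi_2 - \partial_2 \phi_1 \rangle $$
$$ = \langle \partial_2 T_3 - \partial_3 T_2, \phi_1 \rangle + \langle \partial_3 T_1 - \partial_1 T_3, \phi_2 \rangle + \langle \partial_1 T_2 - \partial_2 T_1, \phi_3 \rangle. $$
So the components of the $X$-curl are exactly the entries of the componentwise curl, with first component $\partial_2 T_3 - \partial_3 T_2$.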
As for the gradient of a distribution $F \in \mathcal{D}'(\mathbb{R}^3; \mathbb{R})$, we have the same issue: we can either define it by duality with the divergence, resulting in $\nabla F \in X$, or as the triplet of distributions $(\partial_{x_1} F, \partial_{x_2} F, \partial_{x_3} F) \in Y$.
All things considered, is there a more standard or desirable choice among these definitions? For some reason, I would prefer the $X$ definition (with the corresponding operators), perhaps because of the "perfect dualities", but this doesn't sound like a robust motivation.
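For the record, here is how I would write the duality definition, assuming the usual sign convention from integration by parts (the minus sign and the component notation $\phi_i$ are my assumptions about what "by duality" should mean here):
$$ \langle \nabla F, \phi \rangle := -\langle F, \operatorname{div} \phi \rangle = -\sum_{i=1}^{3} \langle F, \partial_{x_i} \phi_i \rangle = \sum_{i=1}^{3} \langle \partial_{x_i} F, \phi_i \rangle, \qquad \phi \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3). $$
With this convention, pairing componentwise against $\phi = (\phi_1, \phi_2, \phi_3)$ recovers exactly the triplet $(\partial_{x_1} F, \partial_{x_2} F, \partial_{x_3} F)$.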
EDIT:
There is a good chance that part of the answer lies in differential geometry, which I am not very good at. What properties would we like these differential operators to have? Almost certainly, the validity of a Poincaré lemma, namely that if $T \in \mathcal{D}'$ and $\operatorname{curl} T = 0$, then $T = \nabla F$ for some distribution $F$ (and similarly for div and curl). I know that there is such an abstract result for currents, but I am not able to put all the pieces together with the correct definitions and identifications (for instance, in the standard $\mathbb{R}^3$ setting, even if in principle grad, curl and div are differential operators acting on $0$-, $1$- and $2$-forms respectively, we usually identify the divergence with the scalar function that represents it, etc.).
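At least the easy direction appears to hold with the duality definitions, assuming $\langle \nabla F, \phi \rangle := -\langle F, \operatorname{div} \phi \rangle$ and $\langle \operatorname{curl} T, \phi \rangle := \langle T, \operatorname{curl} \phi \rangle$ (the sign convention for the gradient is my assumption). Since $\operatorname{div} \operatorname{curl} \phi = 0$ for every smooth $\phi$,
$$ \langle \operatorname{curl} \nabla F, \phi \rangle = \langle \nabla F, \operatorname{curl} \phi \rangle = -\langle F, \operatorname{div} \operatorname{curl} \phi \rangle = 0, \qquad \phi \in \mathcal{D}(\mathbb{R}^3; \mathbb{R}^3), $$
so $\operatorname{curl} \nabla F = 0$ in $X$. The converse is the part I cannot derive.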