Indeed, Fourier inversion can be viewed as a "spectral synthesis" result, where functions (perhaps "generalized") are expressible as superpositions (allowing possibly both sums and integrals...) of eigenfunctions for some natural self-adjoint operator, for example the Laplacian.
(This admits a representation-theoretic analogue...)
The necessity of allowing integrals, and of allowing eigenfunctions outside the initial Hilbert space (e.g., $L^2(\mathbb R)$), arises whenever the operator has more than discrete spectrum (=eigenvalues=point-spectrum). Gelfand et al. call eigenfunctions in a larger space "generalized eigenfunctions" (a usage slightly in conflict with Jordan-decomposition uses).
In general, I think there is no useful explicit description of appropriate "generalized eigenfunctions", but in many interesting cases there is: Fourier inversion on Euclidean spaces is the most immediate, but, also, spaces of automorphic forms on reductive groups admit such fairly-explicit decompositions, even including the continuous spectrum (Selberg, Langlands, Harish-Chandra, Godement, et al).
Meanwhile, one should take some care, because Fourier inversion does not directly express a Schwartz function as a superposition of Schwartz functions (the exponentials $e^{i\xi x}$ are not themselves Schwartz functions)... so such slogans would be dangerously misleading. Indeed, in all the "explicit" examples I know, the arguments required to demonstrate (and understand) the finer gradations of "spectral synthesis" involve much more than a basic Hilbert-space argument, or even a basic rigged-Hilbert-space (=scales-of-Hilbert-spaces) argument. For example, on the real line, looking also at the Schrödinger/Hamiltonian operator $-\Delta+x^2$ (a.k.a., "with confining potential", a.k.a., "quantum harmonic oscillator") does give a discrete decomposition (related to Hermite polynomials/functions), from which one can prove things about the Schwartz space... and eventually return to Fourier inversion as a corollary of the discrete-spectrum set-up. I think N. Wiener took that route to some extent.
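A sketch of how that discrete-spectrum route can go, under assumptions not fixed above: the unitary normalization $\widehat f(\xi)=\frac{1}{\sqrt{2\pi}}\int_{\mathbb R} f(x)\,e^{-i\xi x}\,dx$, and the (unnormalized) Hermite functions $h_n=(x-\frac{d}{dx})^n h_0$ with $h_0(x)=e^{-x^2/2}$. The notation $h_n$ is introduced here only for illustration. One has

$$
\Big(-\frac{d^2}{dx^2}+x^2\Big)h_n=(2n+1)\,h_n,
\qquad
\widehat{\Big(x-\frac{d}{dx}\Big)f}=-i\Big(\xi-\frac{d}{d\xi}\Big)\widehat f,
\qquad
\widehat{h_0}=h_0,
$$

so that, by induction on $n$,

$$
\widehat{h_n}=(-i)^n\,h_n,
\qquad\text{hence}\qquad
\widehat{\widehat{h_n}}=(-1)^n\,h_n=h_n(-x),
$$

the last equality because $h_n$ has parity $(-1)^n$. Since the $h_n$ span a dense subspace of $L^2(\mathbb R)$, this gives $\widehat{\widehat f}(x)=f(-x)$ there, which is Fourier inversion in this normalization. Making the passage from finite linear combinations to all of $L^2$ or to the Schwartz space rigorous is exactly where the extra work mentioned above comes in.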
An attempt to make sense of "orthogonality" of distinct exponentials on the real line, for example, can be made more purposeful by asking about existence or non-existence of intertwinings between the representations generated by these functions (giving some heft to the barer fact that they have distinct eigenvalues for the Laplacian). That is, the classical integrals involved in spectral decompositions, when convergent, very often are viewable as giving intertwining operators... which often can be proven to be unique even without assuming convergence of the natural integral. Conversely, re-interpretation of integrals as (unique up to scalars) intertwining operators can give a sense to non-convergent integrals in many useful cases. E.g., this can be thought of as a proof that $\int_{\mathbb R} e^{i\xi x}\;dx=0$ unless $\xi=0$.
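A heuristic version of that last computation, as a sketch only: suppose $\lambda$ is some translation-invariant functional extending $f\mapsto\int_{\mathbb R} f(x)\,dx$ to the exponentials $e_\xi(x)=e^{i\xi x}$. The symbols $\lambda$, $T_t$, and $e_\xi$ are notation introduced here, and the existence of such an extension is an assumption. With $(T_t f)(x)=f(x+t)$,

$$
T_t e_\xi = e^{i\xi t}\,e_\xi
\quad\Longrightarrow\quad
\lambda(e_\xi)=\lambda(T_t e_\xi)=e^{i\xi t}\,\lambda(e_\xi)
\quad\text{for all } t\in\mathbb R,
$$

so

$$
\big(1-e^{i\xi t}\big)\,\lambda(e_\xi)=0 \ \text{ for all } t
\quad\Longrightarrow\quad
\lambda(e_\xi)=0 \ \text{ whenever } \xi\neq 0 .
$$

This is consistent with the distributional statement $\int_{\mathbb R} e^{i\xi x}\,dx=2\pi\,\delta(\xi)$, whose right-hand side is supported at $\xi=0$.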
The Fourier image of an $L^2$ function always lies in $L^2$, by the Plancherel theorem, contrary to what you state.
Yes, the space of tempered distributions forms a vector space. But you cannot talk (at least not without introducing further structure) about orthonormal bases of this space, as no scalar product is given. Note that $\langle f,g\rangle$ or $f(g)$ do not make sense in general if $f,g$ are both tempered distributions, only if $f$ is a tempered distribution and $g$ is a Schwartz function.