What would be wrong with defining the Dirac delta function (together with its derivatives and compositions; here $\delta^{(n)}$ denotes the $n$-th derivative) by $$ \int_{-\infty}^{\infty} \delta^{(n)}(f(x))\, g(x) \,dx := \lim_{h\rightarrow 0}\int_{-\infty}^{\infty} \delta_h^{(n)}(f(x))\, g(x) \,dx $$ for a suitable nascent delta function $\delta_h(x)$? For example, the rectangular pulse, the hat function, and the normal distribution nascent delta functions are all suitable in the case $n = 0$. For $n \leq 1$, the hat function and the normal distribution $\delta_h$'s are suitable. For $n \geq 2$, the normal distribution $\delta_h$ is suitable.
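As a quick numerical sanity check of this definition in the simplest case $f(x) = x$, here is a minimal Python sketch (the Gaussian width, the integration limits, and the test function $g(x) = e^x$ are illustrative choices of mine, not canonical ones). It shows the sifting property ($n = 0$) and the derivative rule ($n = 1$) emerging as $h \to 0$:

```python
import numpy as np
from scipy.integrate import quad

def delta_h(x, h):
    """Gaussian nascent delta: normal density with standard deviation h."""
    return np.exp(-x**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))

def delta_h_prime(x, h):
    """First derivative of the Gaussian nascent delta (the n = 1 case)."""
    return -x / h**2 * delta_h(x, h)

g = np.exp  # smooth test function with g(0) = 1 and g'(0) = 1

for h in [1.0, 0.1, 0.01]:
    a, b = -50 * h, 50 * h  # the Gaussian is negligible outside +/- 50 h
    # Sifting property (n = 0): should tend to g(0) = 1 as h -> 0.
    sift, _ = quad(lambda x: delta_h(x, h) * g(x), a, b)
    # Derivative rule (n = 1): should tend to -g'(0) = -1 as h -> 0.
    deriv, _ = quad(lambda x: delta_h_prime(x, h) * g(x), a, b)
    print(f"h = {h:5}: sifting = {sift:.6f}, derivative = {deriv:.6f}")
```

For this particular $g$ the integrals can be done in closed form ($e^{h^2/2}$ and $-e^{h^2/2}$, respectively), and the printed values converge to $g(0) = 1$ and $-g'(0) = -1$ as the proposed definition predicts.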
If the Dirac delta function is defined this way, it satisfies all of its familiar properties: the sifting property $\int_{-\infty}^{\infty} \delta(x)\, g(x)\,dx = g(0)$, the composition rule $\int_{-\infty}^{\infty} \delta(f(x))\, g(x)\,dx = \sum_i g(x_i)/|f'(x_i)|$ (summing over the simple roots $x_i$ of $f$), and the derivative rule $\int_{-\infty}^{\infty} \delta^{(n)}(x)\, g(x)\,dx = (-1)^n g^{(n)}(0)$. So why is distribution theory necessary? Is the definition given above sufficient for most users of the Dirac delta function?
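The composition rule can be checked numerically in the same way. Here is a sketch, again with the Gaussian $\delta_h$ and with the illustrative choices $f(x) = x^2 - 1$ (simple roots at $\pm 1$, where $|f'| = 2$) and $g(x) = e^x$:

```python
import numpy as np
from scipy.integrate import quad

def delta_h(x, h):
    """Gaussian nascent delta: normal density with standard deviation h."""
    return np.exp(-x**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))

def f(x):
    """Composed function: simple roots at x = -1, +1, with |f'| = 2 there."""
    return x**2 - 1

g = np.exp

# Composition rule: the limit should be sum_i g(x_i) / |f'(x_i)|
# = (g(-1) + g(1)) / 2 = cosh(1) = 1.543081...
for h in [1.0, 0.1, 0.01]:
    # points=[-1, 1] tells the adaptive quadrature where the spikes sit.
    val, _ = quad(lambda x: delta_h(f(x), h) * g(x), -5, 5, points=[-1, 1])
    print(f"h = {h:5}: {val:.6f}")
```

The printed values approach $\cosh(1) \approx 1.543081$, matching the composition rule, which illustrates why the limit definition above reproduces the familiar calculus of $\delta$.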