Preamble: I thought I would write out what's going in Lee's answer in detail. The paper he cites gives a derivation that is shorter, but somewhat more mysterious to me than the approach below.
I'll give that derivation that at the very bottom -- so if you just want a quick proof go there.
For simplicity of notation*, we will work in an Abelian group $G$, and later on we'll plug in $\mathbb{Z}_n^m$. I'll do this in the case of a finite group so that we can avoid issues of integrability.
(*Some may disagree, but for me a proliferation of Greek letters is worth the clarity about objects.)
High level intuition: The formula we are deriving follows from the following series of observations:
First, that composition can be represented by a composition operator $R_T$, mapping $f \to f \circ T$, and that this is a linear operator. The conjugation of this operator by the Fourier transform $F$ gives an operator $\hat{R_T}$ that tells us how to calculate $\hat{ f \circ T}$ from $\hat{f}$: we define $\hat{R_T}$ to have $F ({ f \circ T}) = \hat{R_T}(\hat{f})$. Since $\hat{R_T}$ is linear, we can check its behavior on a simple basis in order to write down a matrix describing it, and we obtain the formula from Lee's answer by choosing the basis of indicator functions. We get the formula for the matrix entries by an application of Parseval's theorem.
Formal Derivation:
Notation: Let $\hat{G}$ denote the group of characters of $G$, $\hat{G} = \{ \text{group homomorphisms } \chi : G \to \mathbb{C}^* \}$. Additionally, let $F : L^2(G) \to L^2(\hat{G})$ denote the Fourier transform: $F(f)(\chi) = \sum_{g \in G} f(g) \overline{\chi}(g)$ so $f = \frac{1}{|G|} \sum_{\chi \in \hat{G}} F(f)(\chi) \chi$. By basic Fourier analysis, $(F^{-1}(h))(g) = \frac{1}{|G|} \sum_{ \chi \in \hat{G} } h( \chi) \chi(g)$. We'll also write $F(f) = \hat{f}$.
Suppose we have a function $T : G \to G$ -- we assume no structure about $T$, it is just some transformation. Let $R_T : L^2(G) \to L^2(G)$ be given by $R_T(f) = f \circ T$.
The crucial observation is that $R_T$ is a linear function. We let $\hat{R_T} = F \circ R_T \circ F^{-1} : L^2(\hat{G}) \to L^2(\hat{G})$.
$\hat{R_T}$ has the property that $F ( f \circ g) = \hat{R_T} ( \hat{f})$ : the right hand side is $(F \circ R_T \circ F^{-1} \circ F)(f) = F \circ R_T) f = F( f \circ T)$. In particular, it gives the formula for a for the Fourier transform of a composition.
Since $\hat{R_T}$ is a composition of linear functions, it is linear. Thus, to understand $\hat{R_T}$ it suffices to examine its behavior on a favorite basis.
Our favorite basis will be $B = \{ \delta_{\chi} : \chi \in \hat{G} \}$, where $\delta_{\chi}$ is $1$ on $\chi$ and zero elsewhere.
One can compute from the formula that: $F^{-1} (\delta_{\chi}) = \frac{\chi}{|G|}$. Thus, $\hat{R_T} ( \delta_{\chi} ) = F \circ R_T \circ F^{-1} ( \delta_{\chi} ) = F ( \frac{ \chi}{|G|} \circ T )$.
We note Parseval's formula that that $(h, F(f))_{L^2(\hat{G})} = |G| (F^{-1}(h), f)_{L^2(G)}$ , where $(s,t)_{L^2(G)} = \sum_{g \in G} s(g) \overline{t} (g)$ and $(p,q)_{L^2(\hat{G})} = \sum_{\chi \in \hat{G}} p(\chi) \overline{q}(\chi))$. (I believe this is how Parseval's looks in the conventions I chose -- I expanded and verified.)
Additionally, we note that if $v = \sum_{\chi} a_{\chi} \delta_{\chi}$, then $\overline{a_{\chi}} = (\delta_{\chi}, v)$.
Finally, if we want the $\delta_{\Psi}$ coefficient of $\hat{R_T} ( \delta_{\chi} )$, we calculate with Parseval:
$( \delta_{\Psi} , \hat{R_T} ( \delta_{\chi} ) )_{L^2(\hat{G})} =
( \delta_{\Psi} , F ( \frac{ \chi}{|G|} \circ T ) )_{L^2(\hat{G})} = |G| ( F^{-1} (\delta_{\Psi}) , \frac{ \chi}{|G|} \circ T )_{L^2(G)} = |G| ( \frac{\Psi}{|G|}, \frac{ \chi}{|G|} \circ T)_{L^2(G)} = \frac{1}{|G|} ) ( \Psi, \chi \circ T) = \frac{1}{|G|} \sum_{g \in G} \Psi(g) \overline{\chi}(T(g)).$
This gave us the complex conjugate of the $ \delta_{\Psi}$th coefficient, so if we express the linear transformation $\hat{R_T}$ in the basis $B$ , the $(\Psi, \chi)$th entry is $\frac{1}{|G|} \sum_{g \in G} \overline{\Psi(g)} {\chi}(T(g))$. We denote this by $P( \Psi, \chi)$.
In particular, $\hat{R_T}( \hat{f})(\Psi) = \hat{R_T}( \sum_{ \chi \in \hat{G} } \hat{f}(\chi) \delta_{\chi})(\Psi) = \sum_{\chi \in \hat{G} } \hat{f}(\chi) \hat{R_T}( \delta_{\chi} )(\Psi)= \sum_{\chi \in \hat{G} } \hat{f}(\chi) P( \Psi, \chi)$.
Summarizing, we have proved:
If we define $P( \Psi, \chi) := \frac{1}{|G|} \sum_{g \in G} \overline{\Psi}(g) {\chi}(T(g))$, then
$F( f \circ T)(\Psi) = \sum_{\chi \in \hat{G} } \hat{f}(\chi) P( \Psi, \chi).$
Specialization to the one dimensional case:
Now we specialize to the case $G = \mathbb{Z}_n$ to obtain the formula. In this case, we identify $\hat{G}$ with $\{\chi_k : k \in \{0,1,2,\ldots, n-1\} \}$, where $\chi_k(x) = \exp(2 \pi i kx)$.
To coordinate with Lee's answer, we will denote the $(k,l)$th entry of $\hat{R}_T$ by $P(k,l)$. Then, $P(k,l) = \frac{1}{n} \sum_{x \in \mathbb{Z}_n} \overline{\chi_k(x)} {\chi_l}(T(x)) = \frac{1}{n} \sum_{x \in \mathbb{Z}_n} \exp ( 2 \pi i (lT(x) - kx)$.
Moreover, we can write $F ( f \circ T) (k) = \hat{R_T}( \hat{f})(k) = \sum_{l \in \{0, \ldots, n-1\}} \hat{f} (l) P(k,l)$.
This is now recognizable as the formula from Lee's answer.
Sanity check 1):
We use the special case $G = \mathbb{Z}_n$ and $T(x) = ax \mod n$ to sanity check the formula. For simplicity let's assume that $a \in \mathbb{Z}_n^*$.
$P(k,l) = \frac{1}{n} \sum_{x \in \mathbb{Z}_n} \exp ( 2 \pi i (lax - kx) = \delta_{la - k}$.
So, $F ( f \circ T) (k) = \sum_{l \in \{0, \ldots, n-1\}} \hat{f} (l) \delta_{la- k} = \hat{f}(ka^{-1})$.
We compute $F ( f \circ T)$ directly: $F( f \circ T)(k) = \sum_{g \in G} f(ag) \overline{ \chi_k(g)} = \sum_{h \in G} f(h) \overline{ \chi_k(a^{-1}g) } = \sum_{h \in G} f(h) \overline{ \chi_{a^{-1} k}(g)} = F(f)(a^{-1} k).$
So that seems good.
Sanity check 2):
$G = \mathbb{Z}_n$ and $T(x) = x + a \mod n$.
Then, $P(k,l) = \frac{1}{n} \sum_{x \in \mathbb{Z}_n} \exp ( 2 \pi i (lT(x) - kx) = \frac{1}{n} \sum_{x \in \mathbb{Z}_n} \exp ( 2 \pi i (l(x + a) - kx) = \delta_{k - l} \exp( 2 \pi i la).$
So, $F ( f \circ T) (k) =\sum_{l \in \{0, \ldots, n-1\}} \hat{f} (l) P(k,l) = \sum_{l \in \{0, \ldots, n-1\}} \hat{f} (l) \delta_{k - l} \exp( 2 \pi i la) = \hat{f}(k) \exp( 2 \pi i ka).$
This is a familiar formula, but to be careful let's calculate it directly:
$F( f \circ T)(k) = \sum_{g \in G} f(g + a) \overline{ \chi_k(g)} = \sum_h f(h) \overline{ \chi_k(h - a)} = ( \sum_h f(h) \overline{ \chi_k(h)} ) \overline{ \chi_k( - a)} = F(f)(k) exp( 2 \pi i k a).$
Seems good. :-)
Specialization to the multi dimensional case:
Let's do the case $G = \mathbb{Z}_n^m$.
We identify $\hat{G} = \{\chi_{k,l} : k \in \{0,1,2,\ldots, n-1\}^m \}$, where $\chi_{k}(x) = \exp(2 \pi i k \cdot x)$.
Then:
$P(k,l) = \frac{1}{n^m} \sum_{x \in \mathbb{Z}_n} \overline{\chi_k(x)} {\chi_l}(T(x)) = \frac{1}{n^m} \sum_{x \in \mathbb{Z}_n} \exp ( 2 \pi i ( l \cdot T(x) - k \cdot x)$
$F ( f \circ T) (k) = \sum_{l \in \{0, \ldots, n-1\}^m} \hat{f}(l) P(k,l)$
Comments on complexity:
Computing the matrix $P(k,l)$ is expensive -- naively it takes time $|G|^3$, although perhaps there's some FFT like dynamic programming trick that improves the speed. I think that in situations where you have an efficient implementation of $R_T$, it would be better to compute the operator $\hat{R_T}$ as the composition $F R_T F^{-1}$, where $F$ is implemented using the FFT. You could also compute the matrix $\hat{R_T} = F R_T F^{-1}$ using fast matrix multiplication, if you could write down the matrix of $R_T$ efficiently, I think this would be $O(|G|^{\omega})$, where $\omega$ is matrix multiplication exponent.
Alternative Derivation:
This is along the lines of the derivation in Bergner et al. 2006 "A Spectral Analysis of Function Composition and Its Implications for Sampling in Direct Volume Visualization."
Notation (same as above): We work in an Abelian group $G$. Let $\hat{G}$ denote the group of characters of $G$, $\hat{G} = \{ \text{group homomorphisms } \chi : G \to \mathbb{C}^* \}$. Additionally, let $F : L^2(G) \to L^2(\hat{G})$ denote the Fourier transform: $F(f)(\chi) = \sum_{g \in G} f(g) \overline{\chi}(g)$ so $f = \frac{1}{|G|} \sum_{\chi \in \hat{G}} F(f)(\chi) \chi$. By basic Fourier analysis, $(F^{-1}(h))(g) = \frac{1}{|G|} \sum_{ \chi \in \hat{G} } h( \chi) \chi(g)$. We'll also write $F(f) = \hat{f}$.
The trick is to write $(f \circ T)(x) = (F^{-1} (F (f)))( T(x)) = \frac{1}{|G|} \sum_{\chi \in \hat{G}} \hat{f}(\chi) \chi(T(x))$.
We then use this formula in the calculation of $F( f \circ T)$:
$F( f \circ T) ( \psi) = \sum_{g \in G} ( f \circ T) (g) \overline{\psi}(g) = \sum_{g \in G} \frac{1}{|G|} \sum_{\chi \in \hat{G}} \hat{f}(\chi) \chi( T(g)) \overline{\psi}(g) = \sum_{\chi \in \hat{G} } \hat{f}(\chi) \frac{1}{|G|} \sum_{g \in G} \chi(T(g)) \overline{\psi}(g).$
Letting $P( \psi, \chi) = \frac{1}{|G|} \sum_{g \in G} \chi(T(g)) \overline{\psi}(g)$, the above becomes $F( f \circ T) ( \psi) = \sum_{\chi \in \hat{G} } \hat{f}(\chi) P( \psi, \chi)$.
This proof has the advantage of being shorter and also working for $\mathbb{R}$, since we no longer have to chase the behavior on basis vectors. This also gives us a third sanity check.
On the other hand, it strikes me as very tricky, unlike the more conceptual "follow your nose" type of calculation above.