Interchange Summation Indexes - Möbius Inversion Formula

Question

I'm reading a book that proves the first part of the Möbius Inversion Formula exactly like this:

($\Rightarrow$) Suppose that

$$f(n)=\sum_{d|n}g(d).$$

Then,

$$\begin{align}\sum_{d|n}\mu(d)f\left(\frac{n}{d}\right)&=\sum_{dd'=n}\mu(d)f\left(d'\right)\\ &=\sum_{dd'=n}\mu(d)\sum_{m|d'}g(m)\\ &=\sum_{dmh=n}\mu(d)g(m)\tag{1.1}\\ &=\sum_{mh'=n}g(m)\sum_{d|h'}\mu(d).\tag{1.2}\end{align}$$

But $\sum_{d|h'}\mu(d)=0$ for $h'>1$. Hence $$\sum_{d|n}\mu(d)f\left(\frac{n}{d}\right)=g(n).$$
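To test the statement numerically, one could use a Python sketch like the following. The helpers `mobius`, `divisors`, `mu_divisor_sum` and `check_inversion` are illustrative names, not from the book. $\mu$ is computed by trial division, and $g$ is an arbitrary integer-valued sample.

```python
import random


def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n, so mu(n) = 0
            result = -result  # one more distinct prime factor
        p += 1
    if n > 1:
        result = -result  # the leftover factor is a prime
    return result


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def mu_divisor_sum(k: int) -> int:
    """sum_{d | k} mu(d): the inner sum in (1.2)."""
    return sum(mobius(d) for d in divisors(k))


def check_inversion(g, n_max: int = 60) -> bool:
    """Check sum_{d|n} mu(d) f(n/d) == g(n) for n <= n_max, where f(n) = sum_{d|n} g(d)."""
    f = {n: sum(g(d) for d in divisors(n)) for n in range(1, n_max + 1)}
    return all(
        sum(mobius(d) * f[n // d] for d in divisors(n)) == g(n)
        for n in range(1, n_max + 1)
    )


random.seed(0)
values = {k: random.randint(-10, 10) for k in range(1, 61)}  # arbitrary sample g
print(mu_divisor_sum(1), [mu_divisor_sum(k) for k in range(2, 8)])  # 1 [0, 0, 0, 0, 0, 0]
print(check_inversion(values.__getitem__))  # True
```

A passing check for small $n$ is of course evidence, not a proof; it is mainly useful for catching a mis-stated index set.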

I understood everything except why we can go from 1.1 to 1.2. I found similar questions on the internet, but I couldn't understand the answers either.

I'm realizing that this is not the first time that I have not understood how to interchange summation indexes. It might be interesting for me to do a more careful study of summations, so in addition to the proof explanation, I'm looking for books and videos that better explain how summation works and, in particular, how index interchange works.

Feel free to point out mistakes in my writing as well (I'm not used to writing in English).

Thanks in advance.

score 1 · Answer 1 · answered Sep 13 '21 at 15:17

I just want to begin by remarking that your English appears immaculate! I would not have guessed that you're not used to writing in English if you had not said it...

The general idea behind interchanging summation indices is just careful thought about precisely which terms a sum runs over, and why they are the same after the indices are reordered.

If a term is summed in (1.1), then it is $\mu(d)g(m)$ for some triple $(d,m,h)\in\mathbb{N}^3$ such that $dmh = n$. If we group $dmh = m(dh) = mh'$ with $h' = dh$, then $d|h'$. Therefore we get a triple $(m,h',d)\in\mathbb{N}^3$ such that $mh' = n$ and $d|h'$, and thus this term appears in the sum (1.2).

Similarly, suppose a term is summed in (1.2); then it is $g(m)\mu(d)$ for some triple $(m,h',d)\in\mathbb{N}^3$ such that $mh' = n$ and $d|h'$. From this triple we can form the new triple $(d,m,h'/d)\in\mathbb{N}^3$, which satisfies $d\cdot m\cdot (h'/d) = n$. Thus this term appears in the sum (1.1).
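The two correspondences can be checked by brute force for small $n$. In the Python sketch below (all names are hypothetical, and `h2` stands for $h'$), `terms_11` and `terms_12` enumerate the index triples of (1.1) and (1.2), and `to_12`, `to_11` are the two maps described above. Both maps leave $d$ and $m$ unchanged, so the summand $\mu(d)g(m)$ is the same on both sides.

```python
def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def terms_11(n: int) -> set:
    """Index set of (1.1): triples (d, m, h) with d*m*h == n."""
    return {(d, m, n // (d * m)) for d in divisors(n) for m in divisors(n // d)}


def terms_12(n: int) -> set:
    """Index set of (1.2): triples (m, h2, d) with m*h2 == n and d | h2."""
    return {(m, n // m, d) for m in divisors(n) for d in divisors(n // m)}


def to_12(t):
    """(d, m, h) -> (m, h' = d*h, d)."""
    d, m, h = t
    return (m, d * h, d)


def to_11(t):
    """(m, h', d) -> (d, m, h'/d)."""
    m, h2, d = t
    return (d, m, h2 // d)


for n in range(1, 101):
    s11, s12 = terms_11(n), terms_12(n)
    assert {to_12(t) for t in s11} == s12          # every term of (1.1) appears in (1.2)
    assert {to_11(t) for t in s12} == s11          # every term of (1.2) appears in (1.1)
    assert all(to_11(to_12(t)) == t for t in s11)  # the maps are mutually inverse
```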

That's all there really is to interchanging summation indices: both sums have finitely many terms, and both run over the same set of index triples, which is just represented differently. We only need to check that this set is the same in both sums. Above, I did that by showing that every triple summed in (1.1) corresponds to one summed in (1.2) and vice versa; since the two correspondences are inverse to each other and leave $d$ and $m$ unchanged, each term $\mu(d)g(m)$ is counted exactly once on each side. After a while you'll get used to this and won't need to write out an argument like the one above; you will just "see" it. For now, though, perhaps try to write out the argument whenever you meet a tricky index change.

Thank you very much! I think I understood. I usually need some time to fully understand something. In this case, I may still need to test it numerically to convince myself that it works. — Tokuchi Toua, Sep 13 '21 at 16:21

Markus Scheuer · Answer 2 · 2021-09-18T12:13:20.617

We look at this formula in somewhat more detail, with a focus on symmetry. First we consider the Dirichlet convolution \begin{align*} \color{blue}{\left(\mu\ast f\right)(n)}&:=\sum_{d|n}\mu(d)f\left(\frac{n}{d}\right) =\sum_{dd^{\prime}=n}\mu(d)f\left(d^{\prime}\right)\tag{1}\\ &=\sum_{dd^{\prime}=n}\mu\left(d^{\prime}\right)f(d)=\sum_{d|n}f(d)\mu\left(\frac{n}{d}\right)\\ &\,\,\color{blue}{=\left(f\ast\mu\right)(n)} \end{align*}

Note that (1) is also the start of OP's derivation. Since the condition $dd'=n$ is symmetric in $d$ and $d'$, we see that the Dirichlet convolution is commutative.
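A small Python sketch of definition (1) (the name `dirichlet` and the sample `f` are hypothetical) computes the convolution directly and checks commutativity for $n < 200$. The underlying reason is that $d \mapsto n/d$ permutes the divisors of $n$.

```python
def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if n > 1:
        result = -result  # leftover prime factor
    return result


def dirichlet(a, b, n: int):
    """(a * b)(n) = sum over d | n of a(d) * b(n/d), as in (1)."""
    return sum(a(d) * b(n // d) for d in range(1, n + 1) if n % d == 0)


def f(n: int) -> int:
    return n * n + 3 * n  # arbitrary sample function (hypothetical)


# Swapping the roles of d and d' = n/d only reorders the divisors of n,
# so both orders give the same sum.
assert all(dirichlet(mobius, f, n) == dirichlet(f, mobius, n) for n in range(1, 200))
```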

We introduce the constant function $u$ with $u(n)=1$ for $n\in\mathbb{N}$, which lets us put somewhat more focus on symmetries. In this way we can write OP's relation between $f$ and $g$ as \begin{align*} \color{blue}{f(n)}&=\sum_{d|n}g(d)=\sum_{d|n}g(d)\cdot 1\\ &=\sum_{d|n}g(d)u\left(\frac{n}{d}\right)=\sum_{dd^{\prime}=n}g(d)u\left(d^\prime\right)\tag{2}\\ &\,\,\color{blue}{=\left(g\ast u\right)(n)} \end{align*}

Note that (1) and (2) have the same form.

We obtain from (1) and (2) \begin{align*} \color{blue}{\left(\mu\ast f\right)(n)}&:=\sum_{d|n}\mu(d)f\left(\frac{n}{d}\right) =\sum_{dd^{\prime}=n}\mu(d)f\left(d^{\prime}\right)\tag{3.1}\\ &=\sum_{dd^{\prime}=n}\mu(d)\sum_{m|d^{\prime}}g(m)u\left(\frac{d^{\prime}}{m}\right)\tag{3.2}\\ &=\sum_{dd^{\prime}=n}\mu(d)\sum_{mm^{\prime}=d^{\prime}}g(m)u\left(m^{\prime}\right)\tag{3.3}\\ &\color{blue}{=\sum_{dmm^{\prime}=n}\mu(d)g(m)u\left(m^{\prime}\right)}\tag{3.4}\\ &=\sum_{md^{\prime}=n}g(m)\sum_{dm^{\prime}=d^{\prime}}\mu(d)u\left(m^{\prime}\right)\tag{3.5}\\ &=\sum_{md^{\prime}=n}g(m)\sum_{d|d^{\prime}}\mu(d)u\left(\frac{d^{\prime}}{d}\right)\tag{3.6}\\ &=\sum_{md^{\prime}=n}g(m)\sum_{d|d^{\prime}}\mu(d)\tag{3.7}\\ &\,\,\color{blue}{=g(n)} \end{align*}

Comments:

In (3.1) we use the representation from (1).
In (3.2) we use the representation from (2).
Note the symmetry between (3.3) and (3.5), and the symmetry between (3.2) and (3.6), which nicely shows the relationship between these transformations.
In (3.7) we use that $u$ is the constant function $u= 1$, and obtain the representation in OP's derivation.
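Both (3.3) and (3.5) are groupings of the single sum (3.4) over triples $(d, m, m')$ with $dmm' = n$: the first groups by $d$, the second by $m$. A Python sketch comparing them might look like this; the helper names and the sample $g$ are hypothetical, and $d'$ appears only implicitly as `n // d` or `n // m`.

```python
def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if n > 1:
        result = -result  # leftover prime factor
    return result


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def u(n: int) -> int:
    """The constant function u(n) = 1."""
    return 1


def g(n: int) -> int:
    return 2 * n * n - 7  # arbitrary sample function (hypothetical)


def line_3_3(n: int) -> int:
    """(3.3): outer sum over d d' = n, inner sum over m m' = d'."""
    return sum(
        mobius(d) * sum(g(m) * u((n // d) // m) for m in divisors(n // d))
        for d in divisors(n)
    )


def line_3_4(n: int) -> int:
    """(3.4): one flat sum over all triples (d, m, m') with d m m' = n."""
    return sum(
        mobius(d) * g(m) * u(n // (d * m))
        for d in range(1, n + 1)
        for m in range(1, n + 1)
        if n % (d * m) == 0
    )


def line_3_5(n: int) -> int:
    """(3.5): outer sum over m d' = n, inner sum over d m' = d'."""
    return sum(
        g(m) * sum(mobius(d) * u((n // m) // d) for d in divisors(n // m))
        for m in divisors(n)
    )


for n in range(1, 121):
    assert line_3_3(n) == line_3_4(n) == line_3_5(n) == g(n)
```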

Notes:

The arithmetic unit function $I(n):=\begin{cases}1&\quad n=1\\0&\quad \text{else}\end{cases}\qquad$ fulfills \begin{align*} f\ast I=f=I\ast f \end{align*}
We also have the relationship $\mu\ast u=I=u\ast \mu$.
OP's derivation above can be written, using the commutativity and associativity of the Dirichlet convolution, as \begin{align*} \color{blue}{\mu\ast f}&=\mu\ast\left(g\ast u\right) =\mu\ast\left(u\ast g\right) =\left(\mu\ast u\right)\ast g =I\ast g \color{blue}{=g} \end{align*}
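These identities are easy to spot-check. In the Python sketch below, `dirichlet` returns the convolution as a new function, `unit` plays the role of $I$ (renamed only to avoid a single-letter name), and `g` is an arbitrary sample; all of these names are hypothetical. It checks $g\ast I=g=I\ast g$, $\mu\ast u=I$, one instance of associativity, and $\mu\ast f=g$ for $n<100$.

```python
def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if n > 1:
        result = -result  # leftover prime factor
    return result


def dirichlet(a, b):
    """Return the arithmetic function a * b (Dirichlet convolution)."""
    return lambda n: sum(a(d) * b(n // d) for d in range(1, n + 1) if n % d == 0)


def unit(n: int) -> int:
    """The unit function I: 1 at n = 1, else 0."""
    return 1 if n == 1 else 0


def u(n: int) -> int:
    return 1


def g(n: int) -> int:
    return n ** 3 - 5 * n  # arbitrary sample function (hypothetical)


N = range(1, 100)
f = dirichlet(g, u)  # f(n) = sum_{d | n} g(d)
assert all(dirichlet(g, unit)(n) == g(n) == dirichlet(unit, g)(n) for n in N)
assert all(dirichlet(mobius, u)(n) == unit(n) for n in N)  # mu * u = I
assert all(  # mu * (u * g) == (mu * u) * g
    dirichlet(mobius, dirichlet(u, g))(n) == dirichlet(dirichlet(mobius, u), g)(n)
    for n in N
)
assert all(dirichlet(mobius, f)(n) == g(n) for n in N)  # mu * f = g
```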

This derivation and many more relations of arithmetic functions are nicely and thoroughly explained in the classic Introduction to Analytic Number Theory by T. M. Apostol.

Calum Gilhooley · Answer 3 · 2021-09-19T16:15:57.937

The step from (1.1) to (1.2) can be split into three smaller steps, each similar to one of three steps into which the step before (1.1) can be split:

\begin{align*} & \phantom{{}={}} \sum_{dd'=n}\mu(d)\sum_{m|d'}g(m) \\ & = \sum_{dd'=n}\mu(d)\sum_{mh=d'}g(m) \\ & = \sum_{dd'=n}\;\sum_{mh=d'}\mu(d)g(m) \\ & = \sum_{dmh=n}\mu(d)g(m) \tag{1.1} \\ & = \sum_{mh'=n}\;\sum_{dh=h'}\mu(d)g(m) \\ & = \sum_{mh'=n}g(m)\sum_{dh=h'}\mu(d) \\ & = \sum_{mh'=n}g(m)\sum_{d|h'}\mu(d) \tag{1.2} \end{align*}
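To see numerically that each small step preserves the value, here is a Python sketch that evaluates every line of the chain above, top to bottom, for an integer-valued $g$. The names `pairs`, `lines` and the sample `g` are hypothetical, and `dp`, `hp` stand for $d'$, $h'$.

```python
def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if n > 1:
        result = -result  # leftover prime factor
    return result


def pairs(k: int) -> list:
    """All ordered pairs (a, b) of positive integers with a * b == k."""
    return [(a, k // a) for a in range(1, k + 1) if k % a == 0]


def divisors(k: int) -> list:
    return [a for a, _ in pairs(k)]


def lines(g, n: int) -> list:
    """Value of each of the seven lines of the derivation, top to bottom."""
    return [
        sum(mobius(d) * sum(g(m) for m in divisors(dp)) for d, dp in pairs(n)),
        sum(mobius(d) * sum(g(m) for m, _ in pairs(dp)) for d, dp in pairs(n)),
        sum(sum(mobius(d) * g(m) for m, _ in pairs(dp)) for d, dp in pairs(n)),
        # (1.1): one flat sum over all triples (d, m, h) with d*m*h == n
        sum(mobius(d) * g(m)
            for d in range(1, n + 1) for m in range(1, n + 1) if n % (d * m) == 0),
        sum(sum(mobius(d) * g(m) for d, _ in pairs(hp)) for m, hp in pairs(n)),
        sum(g(m) * sum(mobius(d) for d, _ in pairs(hp)) for m, hp in pairs(n)),
        # (1.2)
        sum(g(m) * sum(mobius(d) for d in divisors(hp)) for m, hp in pairs(n)),
    ]


def g(m: int) -> int:
    return m * m - 3 * m + 5  # arbitrary sample function (hypothetical)


for n in range(1, 81):
    values = lines(g, n)
    assert len(set(values)) == 1 and values[0] == g(n)
```

Each line is computed with its own loop structure, so agreement for all $n \le 80$ is a useful sanity check, though not a proof.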

If the step before (1.1) was already clear, then the step from (1.1) to (1.2) should now also be clear.

That does not mean that even the smaller steps are trivial, however! You may still have questions about the validity of such arguments in general. If so, you are right to have them, because it is not at all easy to find references on the subject, and proofs of general results of this type ought to be seen by everybody at least once in their lives (even if only to be quickly forgotten afterwards).

Each of the smaller steps can be justified by a general rule for calculating with finite sums. (If I haven't slipped up, that is. My concentration is not great at the moment, so please do point out any steps that remain obscure, and I'll try to fix them.) Most of these general rules are spelled out by Terence Tao in Analysis I (2006), section 7.1.

There is sometimes a need for a more general version of Tao's rule 7.1.11(e). The only references I have for such a general rule (one is by Bourbaki, the other was written by me as an answer in Maths.SE last year) are both pretty hard to read. Fortunately, all that is needed in many cases is Tao's slightly less general Lemma 7.1.13, or Corollary 7.1.14: "Fubini's theorem for finite series". Less fortunately, it is the more general rule that is needed twice in the above derivation, to justify the steps immediately before and after (1.1).

I won't state such a rule in full generality, because I went over the top doing so in an answer to my question on the subject last year (and even that could be generalised a bit more). A slimmed-down version of the rule will do. Let $K$ be a finite set, $(A_k)_{k \in K}$ a pairwise disjoint family of finite sets, $B = \bigcup_{k \in K}A_k,$ and $u \colon B \to \mathbb{C}$ a function. Then $$ \sum_{b \in B}u(b) = \sum_{k \in K}\sum_{a \in A_k}u(a). $$

Let $n$ be a positive integer, and define: \begin{align*} K & = \{m \in \mathbb{N} : m \mid n\}, \\ A_m & = \{(m, d, h) : (d, h) \in \mathbb{N}^2,\ mdh = n\} && (m \in K), \\ B & = \{(m, d, h) \in \mathbb{N}^3 : mdh = n\}, \\ u(m, d, h) & = \mu(d)g(m) && ((m, d, h) \in B). \end{align*} Then the rule just given (in conjunction with some simpler rules that I haven't stated, but that are stated in Tao's book) justifies the step immediately after (1.1).
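The partition used here can be made concrete for one $n$. In this Python sketch, $n = 360$ and the sample $g$ are arbitrary choices and the names are hypothetical. `A[m]` is the set $A_m$ of triples in $B$ with first entry $m$; the assertions check that the $A_m$ are pairwise disjoint and cover $B$, and that the two sides of the rule agree.

```python
def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides n
            result = -result
        p += 1
    if n > 1:
        result = -result  # leftover prime factor
    return result


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def g(m: int) -> int:
    return 3 * m - 10  # arbitrary sample function (hypothetical)


def u(t) -> int:
    m, d, _ = t
    return mobius(d) * g(m)


n = 360
K = divisors(n)
# A_m: the triples (m, d, h) with m fixed and m*d*h == n
A = {m: {(m, d, n // (m * d)) for d in divisors(n // m)} for m in K}
# B, enumerated independently of the A_m
B = {(m, d, n // (m * d))
     for m in range(1, n + 1) for d in range(1, n + 1) if n % (m * d) == 0}

assert set().union(*A.values()) == B              # the A_m cover B
assert sum(len(A[m]) for m in K) == len(B)        # and are pairwise disjoint
assert sum(u(b) for b in B) == sum(sum(u(a) for a in A[m]) for m in K)
assert sum(u(b) for b in B) == g(n)               # this sum is (1.1), which equals g(n)
```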

If I can find time (it's not likely to happen today), I'll try to expand this answer by quoting all of the other rules that have implicitly been appealed to in the above derivation, and showing how they can be applied explicitly. (That is, unless someone begs me to stop!) :)

Three rules have been applied. They are:

($\alpha$) Let $A$ and $B$ be finite sets, $\theta \colon A \to B$ a bijection, and $u \colon B \to \mathbb{C}$ a function. Then $$ \sum_{a \in A}u(\theta(a)) = \sum_{b \in B}u(b). $$

($\beta$) Let $A$ be a finite set, $u \colon A \to \mathbb{C}$ a function, and $c$ a complex number. Then $$ \sum_{a \in A}cu(a) = c\sum_{a \in A}u(a). $$

($\gamma$) Let $K$ be a finite set, $(A_k)_{k \in K}$ a pairwise disjoint family of finite sets, $B = \bigcup_{k \in K}A_k,$ and $u \colon B \to \mathbb{C}$ a function. Then $$ \sum_{b \in B}u(b) = \sum_{k \in K}\sum_{a \in A_k}u(a). $$
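The three rules can be phrased as small checks on finite data. In the following Python sketch, `check_alpha`, `check_beta` and `check_gamma` are hypothetical helpers, not from Tao's book. They verify the hypotheses (bijectivity, disjointness) and then compare both sides of each rule. The example applies rule ($\alpha$) to the data of the first step with $d' = 12$ and a sample $g$.

```python
def check_alpha(A, B, theta, u) -> bool:
    """Rule (alpha): if theta maps A bijectively onto B, the two sums agree."""
    images = [theta(a) for a in A]
    if len(set(images)) != len(images) or set(images) != set(B):
        raise ValueError("theta is not a bijection from A onto B")
    return sum(u(theta(a)) for a in A) == sum(u(b) for b in B)


def check_beta(A, u, c) -> bool:
    """Rule (beta): a constant factor c can be pulled out of a finite sum."""
    return sum(c * u(a) for a in A) == c * sum(u(a) for a in A)


def check_gamma(family: dict, u) -> bool:
    """Rule (gamma): a sum over a pairwise disjoint union of sets is the iterated sum."""
    pieces = list(family.values())
    B = set().union(*pieces)
    if sum(len(p) for p in pieces) != len(B):
        raise ValueError("the family is not pairwise disjoint")
    return sum(u(b) for b in B) == sum(sum(u(a) for a in p) for p in pieces)


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def g(m: int) -> int:
    return m * m - 7  # arbitrary sample function (hypothetical)


dp = 12  # plays the role of d'
A = [(m, dp // m) for m in divisors(dp)]  # {(m, h) : m h = d'}
print(check_alpha(A, divisors(dp), lambda t: t[0], g))  # True
```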

The derivation has six steps.

The first step uses rule ($\alpha$), with \begin{align*} A & = \{(m, h) \in \mathbb{N}^2 : mh = d'\}, \\ B & = \{m \in \mathbb{N} : m \mid d'\}, \\ \theta(m, h) & = m && ((m, h) \in A), \\ u(m) & = g(m) && (m \in B). \end{align*}

The second step uses rule ($\beta$), with \begin{align*} A & = \{(m, h) \in \mathbb{N}^2 : mh = d'\}, \\ u(m, h) & = g(m) && ((m, h) \in A), \\ c & = \mu(d). \end{align*}

The third step uses rule ($\gamma$), with \begin{align*} K & = \{d \in \mathbb{N} : d \mid n\}, \\ A_d & = \{(m, d, h) : (m, h) \in \mathbb{N}^2,\ mdh = n\} && (d \in K), \\ B & = \{(m, d, h) \in \mathbb{N}^3 : mdh = n\}, \\ u(m, d, h) & = \mu(d)g(m) && ((m, d, h) \in B). \end{align*}

At the same time, the third step uses rule ($\alpha$), with \begin{align*} A & = \{(m, h) \in \mathbb{N}^2 : mh \mid n\}, \\ B & = \{(m, d, h) \in \mathbb{N}^3 : mdh = n\}, \\ \theta(m, h) & = (m, n/(mh), h) && ((m, h) \in A), \\ u(m, d, h) & = \mu(d)g(m) && ((m, d, h) \in B). \end{align*}

If this is confusing, it is because the notation in use, although it is compact, convenient, and perfectly standard, does not permit the application of rule ($\alpha$) to be represented explicitly in this instance. (But I'll think some more about this, and edit the answer again, if I or someone else can come up with a clearer way to represent this step of the derivation. The important thing is to represent it in terms of general rules in some way; but this may not be the best way.)

The fourth step also uses rule ($\gamma$), as explained earlier in this answer (with $K$, $A_m$, $B$ and $u$ as defined there).

At the same time, it uses rule ($\alpha$), with \begin{align*} A & = \{(d, h) \in \mathbb{N}^2 : dh \mid n\}, \\ B & = \{(m, d, h) \in \mathbb{N}^3 : mdh = n\}, \\ \theta(d, h) & = (n/(dh), d, h) && ((d, h) \in A), \\ u(m, d, h) & = \mu(d)g(m) && ((m, d, h) \in B). \end{align*}
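The two maps $\theta$ used in the third and fourth steps can be checked to be bijections for a particular $n$. In this Python sketch, $n = 360$ is an arbitrary choice and `is_bijection`, `theta_step3`, `theta_step4` are hypothetical names. Both steps use the same domain $\{(x, y)\in\mathbb{N}^2 : xy \mid n\}$, only with different names for the coordinates.

```python
def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def is_bijection(A, B, theta) -> bool:
    """True if theta maps the finite collection A one-to-one onto the set B."""
    images = [theta(a) for a in A]
    return len(set(images)) == len(images) and set(images) == set(B)


n = 360
# B: triples (m, d, h) with m d h = n
B = {(m, d, n // (m * d)) for m in divisors(n) for d in divisors(n // m)}
# {(x, y) in N^2 : x y | n}, the domain A in both steps
pairs_dividing_n = [(x, y) for x in range(1, n + 1) for y in range(1, n + 1)
                    if n % (x * y) == 0]


def theta_step3(t):
    m, h = t
    return (m, n // (m * h), h)  # theta(m, h) = (m, n/(mh), h)


def theta_step4(t):
    d, h = t
    return (n // (d * h), d, h)  # theta(d, h) = (n/(dh), d, h)


print(is_bijection(pairs_dividing_n, B, theta_step3),
      is_bijection(pairs_dividing_n, B, theta_step4))  # True True
```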

The fifth step uses rule ($\beta$), with \begin{align*} A & = \{(d, h) \in \mathbb{N}^2 : dh = h'\}, \\ u(d, h) & = \mu(d) && ((d, h) \in A), \\ c & = g(m). \end{align*}

The sixth and final step uses rule ($\alpha$), with \begin{align*} A & = \{(d, h) \in \mathbb{N}^2 : dh = h'\}, \\ B & = \{d \in \mathbb{N} : d \mid h'\}, \\ \theta(d, h) & = d && ((d, h) \in A), \\ u(d) & = \mu(d) && (d \in B). \end{align*}

Of course, one can work through such derivations intuitively and confidently without splitting them up into tiny steps like this! But it helps to demystify the logic of such arguments if one knows they can be reduced to applications of a few simple and general rules, whose truth is entirely obvious. (Their proofs need not be remembered, although they should be seen at least once.) And where difficulties arise, I think it can be of practical help to make some of the applications of the rules explicit, even if not at this painful level of detail.
