52

What is an intuitive proof of the multivariable changing of variables formula (Jacobian) without using mapping and/or measure theory?

I think that textbooks overcomplicate the proof.

If possible, use linear algebra and calculus in the proof, since those would be the simplest for me to understand.

RobPratt
  • 45,619
Victor
  • 8,372
  • 2
    If there was a simpler proof, don't you think the books would use it? – Potato Dec 29 '12 at 20:47
  • 1
    @Potato - Couldn't the author also give the intuitions? – Victor Dec 29 '12 at 20:49
  • What exactly do you want? A different proof, or an intuitive explanation of the standard proof (say, the one that is in Folland). – Potato Dec 29 '12 at 20:50
  • @Victor Are you asking for a proof that doesn't use measure theory or for a simple proof? I don't think you can have both – Nameless Dec 29 '12 at 20:50
  • @Nameless - Any simpler proof that doesn't use measure theory and/or linear mappings – Victor Dec 29 '12 at 20:57
  • Sorry for not being helpful. I am interested in which books give proofs using measure theory. – Tim Dec 29 '12 at 20:59
  • 3
    A lengthy proof of the change of variables formula for Riemann integrals in $\mathbb R^n$ (that does not use measure theory) is given in Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach by Hubbard and Hubbard. A discussion of the intuition behind it is given on page 493. – Potato Dec 29 '12 at 21:04
  • 2
    @Tim A proof for Lebesgue integrals can be found in any standard book on measure theory and integration, including Folland's book. – Potato Dec 29 '12 at 21:14
  • @Potato - Which page on Folland's book? – Victor Dec 29 '12 at 22:23
  • @Victor Page 74, theorem 2.47 in my edition of Real Analysis: Modern Techniques and Their Applications. – Potato Dec 29 '12 at 22:31

5 Answers

58

Doing it for a particular number of variables is very easy to follow. Consider what you do when you integrate a function of $x$ and $y$ over some region. Basically, you chop up the region into boxes of area ${\rm d}x~{\rm d}y$, evaluate the function at a point in each box, multiply it by the area of the box, and sum. This can be notated a bit sloppily as:

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

What you do when changing variables is to chop the region into boxes that are not rectangular, but instead chop it along curves on which some function, call it $u(x,y)$, is constant. So if, say, $u=x+y^2$, these would be all the parabolas $x+y^2=c$. You then do the same thing for another function, $v$, say $v=y+3$. Now in order to evaluate the expression above, you need to find the "area of box" for the new boxes - it's not ${\rm d}x~{\rm d}y$ anymore.

As the boxes are infinitesimal, the edges cannot be curved, so they must be parallelograms (adjacent lines of constant $u$ or constant $v$ are parallel.) The parallelograms are defined by two vectors - the vector resulting from a small change in $u$, and the one resulting from a small change in $v$. In component form, these vectors are ${\rm d}u\left\langle\frac{\partial x}{\partial u}, ~\frac{\partial y}{\partial u}\right\rangle $ and ${\rm d}v\left\langle\frac{\partial x}{\partial v}, ~\frac{\partial y}{\partial v}\right\rangle $. To see this, imagine moving a small distance ${\rm d}u$ along a line of constant $v$. What's the change in $x$ when you change $u$ but hold $v$ constant? The partial of $x$ with respect to $u$, times ${\rm d}u$. Same with the change in $y$. (Notice that this involves writing $x$ and $y$ as functions of $u$, $v$, rather than the other way round. The main condition of a change in variables is that both ways round are possible.)

The area of a parallelogram bounded by $\langle x_0,~ y_0\rangle $ and $\langle x_1,~ y_1\rangle $ is $\vert y_0x_1-y_1x_0 \vert$ (or the absolute value of the determinant of the $2 \times 2$ matrix formed by writing the two column vectors next to each other).* So the area of each box is

$$\left\vert\frac{\partial x}{\partial u}{\rm d}u\frac{\partial y}{\partial v}{\rm d}v - \frac{\partial y}{\partial u}{\rm d}u\frac{\partial x}{\partial v}{\rm d}v\right\vert$$

or

$$\left\vert \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v}\right\vert~{\rm d}u~{\rm d}v$$

which you will recognise as being $\vert\mathbf J\vert~{\rm d}u~{\rm d}v$, where $\mathbf J$ is the Jacobian determinant.

So, to go back to our original expression

$$\sum_{b \in \text{Boxes}} f(x,y) \cdot \text{Area}(b)$$

becomes

$$\sum_{b \in \text{Boxes}} f(u, v) \cdot \vert\mathbf J\vert \cdot {\rm d}u\,{\rm d}v$$

where $f(u, v)$ stands for $f$ evaluated at the corresponding point $(x(u,v),~y(u,v))$, which is the same value as $f(x, y)$ because $u$ and $v$ can be written in terms of $x$ and $y$, and vice versa. As the number of boxes goes to infinity, this becomes an integral in the $uv$-plane.
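To see this box-sum picture in action, here is a small numerical sketch (my own illustrative check, not part of the original argument; the integrand, grid size, and choice of polar coordinates are arbitrary). Both Riemann sums converge to the same value:

```python
import numpy as np

# Sanity check of the box-sum picture: integrate f(x, y) = x^2 + y^2 over
# the unit disk directly with dx dy boxes, and again with polar boxes
# (x, y) = (r cos t, r sin t), whose Jacobian determinant is r.
f = lambda x, y: x**2 + y**2

# Direct sum over Cartesian boxes, keeping only boxes inside the disk.
n = 2000
xs = np.linspace(-1, 1, n)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 1
cartesian = np.sum(f(X, Y)[inside]) * dx * dx

# Sum over dr dt boxes, each weighted by the Jacobian factor |J| = r.
rs = np.linspace(0, 1, n)
ts = np.linspace(0, 2 * np.pi, n)
dr, dt = rs[1] - rs[0], ts[1] - ts[0]
R, T = np.meshgrid(rs, ts)
polar = np.sum(f(R * np.cos(T), R * np.sin(T)) * R) * dr * dt

print(cartesian, polar, np.pi / 2)  # both approach the exact value pi/2
```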

To generalize to $n$ variables, all you need is that the area/volume/equivalent of the $n$-dimensional box that you integrate over equals the absolute value of the determinant of the $n \times n$ matrix of partial derivatives. This is hard to prove, but easy to intuit.


*To prove this, take two vectors of magnitudes $A$ and $B$, with angle $\theta$ between them. Then write them in a basis such that one of them points along a specific direction, e.g.:

$$A\left\langle \frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\rangle \text{ and } B\left\langle \frac{1}{\sqrt 2}(\cos(\theta)+\sin(\theta)),~ \frac{1}{\sqrt 2} (\cos(\theta)-\sin(\theta))\right\rangle $$

Now perform the operation described above and you get $$\begin{align} & AB\cdot \frac12 \cdot (\cos(\theta) + \sin(\theta)) - AB \cdot \frac12 \cdot (\cos(\theta) - \sin(\theta)) \\ = & \frac 12 AB(\cos(\theta)+\sin(\theta)-\cos(\theta)+\sin(\theta)) \\ = & AB\sin(\theta) \end{align}$$

This, $AB\sin(\theta)$, is exactly how you find the area of a parallelogram: the product of the lengths of the sides times the sine of the angle between them. (Taking the two vectors in the opposite order gives $-AB\sin(\theta)$, which is why the formula takes the absolute value.)
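If you want to convince yourself numerically, here is a quick check (my own; the magnitudes and angle are arbitrary choices) that the determinant expression really equals $AB\sin(\theta)$:

```python
import numpy as np

# Verify y0*x1 - y1*x0 = A*B*sin(theta) for the two vectors written above.
A, B, theta = 2.0, 3.0, 0.7
v0 = A * np.array([1.0, 1.0]) / np.sqrt(2)
v1 = B * np.array([np.cos(theta) + np.sin(theta),
                   np.cos(theta) - np.sin(theta)]) / np.sqrt(2)
det = v0[1] * v1[0] - v1[1] * v0[0]
print(det, A * B * np.sin(theta))  # equal up to floating-point error
```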

DanielV
  • 23,556
Dan
  • 581
  • 4
  • 3
  • 1
    Welcome to math stackexchange. I liked your answer so I marked it up in latex, but please learn latex for future posts. You can see the latex by right clicking on a formula and selecting "show math as", "tex commands". – DanielV Jan 18 '16 at 23:02
  • Best intuitive explanation on the subject I met – John Jul 16 '17 at 11:44
  • There is this proof with pictures of two regions (Domains) in the textbook: "Larson R.,Edwards B.H.Calculus. Early transcendentals. 5Ed." on p.1047 but it is not proof that both areas (or integrals) are the same (remember continuum mechanics where they can differ because of deformation). – Igor Fomenko Oct 30 '18 at 07:43
49

The multivariable change of variables formula is nicely intuitive, and it's not too hard to imagine how somebody might have derived the formula from scratch. However, it seems that proving the theorem rigorously is not as easy as one might hope.

Here's my attempt at explaining the intuition -- how you would derive or discover the formula.

The first thing to understand is that if $A$ is an $N \times N$ matrix with real entries and $S \subset \mathbb R^N$, then $$ \tag{1} m(AS) = |\det A| \, m(S). $$ Here $m(S)$ is the area of $S$ (if $N=2$) or the volume of $S$ (if $N=3$) or more generally the Lebesgue measure of $S$. Technically I should assume that $S$ is measurable. The above equation (1) is intuitively clear from the SVD of $A$: \begin{equation} A = U \Sigma V^T \end{equation} where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative diagonal entries. Multiplying by $V^T$ doesn't change the measure of $S$. Multiplying by $\Sigma$ scales along each axis, so the measure gets multiplied by $\det \Sigma = | \det A|$. Multiplying by $U$ doesn't change the measure.
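Equation (1) is easy to check concretely when $S$ is the unit square, whose image under $A$ is a parallelogram. A minimal sketch (my own illustration; the matrix is an arbitrary choice):

```python
import numpy as np

# Check m(A S) = |det A| * m(S) for S the unit square: the image of the
# square under the linear map A is a parallelogram whose area we compute
# with the shoelace formula and compare against |det A|.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])  # unit square, m(S) = 1
img = corners @ A.T  # images of the corners under x -> A x

def shoelace(p):
    """Polygon area from vertices listed in order."""
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

print(shoelace(img), abs(np.linalg.det(A)))  # both equal 5.5
```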

Next suppose $\Omega$ and $\Theta$ are open subsets of $\mathbb R^N$ and suppose $g:\Omega \to \Theta$ is $1-1$ and onto. We should probably assume $g$ and $g^{-1}$ are $C^1$ just to be safe. (Since we're just seeking an intuitive derivation of the change of variables formula, we aren't obligated to worry too much about what assumptions we make on $g$.) Suppose also that $f:\Theta \to \mathbb R$ is, say, continuous (or whatever conditions we need for the theorem to actually be true).

Partition $\Theta$ into tiny subsets $\Theta_i$. For each $i$, let $u_i$ be a point in $\Theta_i$. Then \begin{equation} \int_{\Theta} f(u) \, du \approx \sum_i f(u_i) m(\Theta_i). \end{equation}

Now let $\Omega_i = g^{-1}(\Theta_i)$ and $x_i = g^{-1}(u_i)$ for each $i$. The sets $\Omega_i$ are tiny and they partition $\Omega$. Then \begin{align} \sum_i f(u_i) m(\Theta_i) &= \sum_i f(g(x_i)) m(g(\Omega_i)) \\ &\approx \sum_i f(g(x_i)) m(g(x_i) + Jg(x_i) (\Omega_i - x_i)) \\ &= \sum_i f(g(x_i)) m(Jg(x_i) \Omega_i) \\ &\approx \sum_i f(g(x_i)) |\det Jg(x_i)| m(\Omega_i) \\ &\approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{align}

We have discovered that \begin{equation} \int_{g(\Omega)} f(u) \, du \approx \int_{\Omega} f(g(x)) |\det Jg(x)| \, dx. \end{equation} By using even tinier subsets $\Theta_i$, the approximation would be even better -- so we see by a limiting argument that we actually have equality.
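Here is a short numerical confirmation of the discovered formula (my own check; the map $g(x,y) = (x^2, y)$ on $\Omega = (0,1)^2$ is an arbitrary choice, for which $\det Jg = 2x$ and $g(\Omega) = (0,1)^2$):

```python
from scipy.integrate import dblquad

# Check: int_{g(Omega)} f(u, v) du dv == int_Omega f(g(x, y)) |det Jg| dx dy
# with Omega = (0,1)^2, g(x, y) = (x^2, y), det Jg = 2x, g(Omega) = (0,1)^2.
f = lambda u, v: u

# dblquad integrates func(inner, outer); here both regions are (0,1)^2.
lhs, _ = dblquad(lambda v, u: f(u, v), 0, 1, lambda u: 0, lambda u: 1)
rhs, _ = dblquad(lambda y, x: f(x**2, y) * 2 * x, 0, 1, lambda x: 0, lambda x: 1)
print(lhs, rhs)  # both 0.5
```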

At a key step in the above argument, we used the approximation \begin{equation} g(x) \approx g(x_i) + Jg(x_i)(x - x_i), \end{equation} which is a good approximation when $x$ is close to $x_i$.

littleO
  • 51,938
  • 3
    Can you comment on what makes the rigorous proof more difficult? – Miheer Jul 25 '17 at 20:32
  • 3
“If you can't explain it to a six-year-old, you don't understand it yourself” – ALBERT EINSTEIN – Igor Fomenko Oct 30 '18 at 06:59
  • This is lovely. Thank you. – ashman Nov 05 '20 at 03:18
  • This becomes cleaner if one takes $f=1_E$, i.e. the characteristic function of a measurable set. Note that if you have the C.of.V. formula for characteristic functions then you get it for all nonnegative measurable (or integrable) functions $f$, via approximation by step functions. – Behnam Esmayli Dec 31 '20 at 14:40
  • Why are measures invariant to orthogonal transformations...? – user3180 Jul 05 '21 at 03:49
  • 1
    @user3180 If you think of the mapping $x \mapsto V^Tx$ as changing basis to a different orthonormal basis (the basis consisting of the columns of $V$), then I think it seems intuitive or plausible that applying this mapping should not change of measure of a region. For example, when computing the area of a region in the plane, it should not matter which orthonormal coordinate system you use. – littleO Jul 05 '21 at 05:19
  • Can you justify what you said rigorously with the measure axioms? – user3180 Jul 05 '21 at 05:22
  • Yea, it makes sense when the measure is area or volume, because the orthogonal matrix is a rotation or flip, which should not affect those quantities, but for a general measure, I'm not sure – user3180 Jul 05 '21 at 05:23
  • Is the formula even true for measures other than Lebesgue measure? The determinant formula is specific to Lebesgue measure, right? I was definitely thinking of Lebesgue measure specifically when I wrote this answer. – littleO Jul 05 '21 at 05:35
  • @littleO So the lebesgue measure of a vector is just the L2 norm? So your statement of m(AS) = m(S) is just saying the L2 norm is unaffected by rotation/flip of a vector? – user3180 Jul 05 '21 at 05:47
5

A lengthy proof of the change of variables formula for Riemann integrals in $\mathbb R^n$ (that does not use measure theory) is given in Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach by Hubbard and Hubbard. A discussion of the intuition behind it is given on page 493.

Potato
  • 40,171
4

The answers here are good, but I am tempted to add a point which I think is quite important and which the others haven't addressed from what I see: namely, why we are allowed to use, in the linearized limit, parallelograms (and hence Jacobian determinants) to approximate areas in the first place.

In fact, whenever you have a general coordinate transformation $(u,v) \to (x(u,v),y(u,v))$ of the plane, you find that in general you are forced to sum over quadrilaterals instead of parallelograms. One can see this by partitioning the $uv$-plane into discrete values (i.e., $(u_i,v_j)$, where $i,j$ run from, say, $1$ to $n$) and looking at the corresponding images $(x(u_i,v_j),y(u_i,v_j))$: connecting these images by straight lines forces you to sum over quadrilaterals instead of parallelograms (you can try this yourself for polar coordinates explicitly!). However, one can show that the area of such a quadrilateral differs from that of a parallelogram only at second order, so the difference does not matter in the limit as the partition goes to zero; a numerical illustration of this is sketched below. This leads us directly to the Jacobian determinant and the exterior algebra of differential forms.
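For polar coordinates the second-order claim can be made completely concrete. In the sketch below (my own illustration), the exact area of a small polar cell differs from the parallelogram estimate $r\,h^2$ by exactly $h^3/2$, one order smaller than the cell area itself:

```python
import numpy as np

# For the polar map (r, t) -> (r cos t, r sin t), compare the true area of
# the image of a small [r0, r0+h] x [t0, t0+h] cell (an annular sector)
# with the parallelogram estimate |det J| * h^2 = r0 * h^2.
r0, t0 = 1.0, 0.3
for h in [0.1, 0.01, 0.001]:
    true = 0.5 * ((r0 + h)**2 - r0**2) * h  # exact annular-sector area
    approx = r0 * h**2                      # |det J| at the corner times cell area
    print(h, true - approx)                 # error is exactly h**3 / 2
```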

Leonid
  • 1,614
3

Let there be some vector function $f(x) = x'$, which can be interpreted as remapping points or changing coordinates. For example, $f(x) = \sqrt{x \cdot x} \, e_1 + \arctan \frac{x^2}{x^1} \, e_2$ remaps the Cartesian coordinates $x^1, x^2$ to polar coordinates on the basis vectors $e_1, e_2$.

Now, let $c(\tau)$ be a path parameterized by the scalar parameter $\tau$. Let $f(c) = c'(\tau)$ be the image of this path under the transformation. The chain rule tells us that

$$\frac{dc'}{d\tau} = \Big(\frac{dc}{d\tau} \cdot \nabla \Big) f$$

Define $a \cdot \nabla f \equiv \underline f(a)$ as the Jacobian operator acting on a vector $a$, and the equation can be rewritten as

$$\frac{dc}{d\tau} = \underline f^{-1} \Big(\frac{dc'}{d\tau} \Big)$$

(Note that the primes have switched, so we use the inverse Jacobian.)

This is all we need to show that a line integral in the original coordinates is related to a line integral in the new coordinates by using the Jacobian. For some scalar field $\phi$, if $\phi(x) = \phi'(x')$, then

$$\int_c \phi \, d\ell = \int_{c'} \phi' \, \underline f^{-1}(d\ell')$$

because $d\ell'$ can be converted to $\frac{d\ell'}{d\tau} \, d\tau$.
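Here is a quick numerical check of the chain-rule identity above (my own sketch, using the Cartesian-to-polar $f$ from the start of this answer and an arbitrary circular path that avoids the origin):

```python
import numpy as np

# Check dc'/dtau = Jf(c) dc/dtau for f(x, y) = (sqrt(x^2 + y^2), atan2(y, x))
# along the path c(tau) = (cos tau + 2, sin tau), which stays away from 0.
def f(p):
    x, y = p
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def Jf(p):  # Jacobian matrix of f
    x, y = p
    r2 = x**2 + y**2
    return np.array([[x / np.sqrt(r2), y / np.sqrt(r2)],
                     [-y / r2, x / r2]])

c = lambda t: np.array([np.cos(t) + 2, np.sin(t)])
dc = lambda t: np.array([-np.sin(t), np.cos(t)])

tau, eps = 0.8, 1e-6
lhs = (f(c(tau + eps)) - f(c(tau - eps))) / (2 * eps)  # dc'/dtau, numerically
rhs = Jf(c(tau)) @ dc(tau)                             # Jacobian applied to dc/dtau
print(lhs, rhs)  # the two agree up to numerical error
```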

Edit: didn't see the word intuitive. As far as intuitive explanations go, you can think of a coordinate transformation like so. Imagine the lines of a polar coordinate system being warped and stretched so that they become rectangular instead. This makes working with them easier, but because the shapes of coordinate lines, paths, and areas have changed (and because you don't want them to change the result, since changing coordinates should not change the result), the naive errors introduced must be corrected for with a factor of the Jacobian operator.

Muphrid
  • 19,902
  • "Let there be some vector function f(x)=x′, interpreted as remapping points or changing coordinates" can you elaborate on this. Clearly x and x ' are vectors, but when you say changing coordinates are you fixing the point and changing the coordinate axes, (or fixing the axes and moving the points). And what do you mean by " remaps" and the basis vectors e1,e2. A picture might be illustrative here. – john Jan 08 '20 at 10:49