
I've always been baffled as to where transposes come from. I found this question, but the answer isn't satisfying to me - the idea seems to be "dual spaces are important, and you can define transposes using those". This leaves two questions:

  1. Why are dual spaces important?
  2. Whatever it is that we want to do with dual spaces, how does the transpose help us accomplish that?

For point (1) my linear algebra teacher told me something else that I didn't find quite satisfying, which is that if you're interested in linear transformations, then dual spaces are the "simplest kind" of linear transformation. This is quite vague though... what actual problems might we want to solve in which the concept of the dual space would arise naturally? And how would the concept of a transpose arise naturally from those?

Jack M

5 Answers


One way to motivate dual spaces and transposes is to consider differentiation of scalar-valued functions of several variables. The basic point is that functionals are the easiest functions to deal with short of constant functions, so that differentiation is essentially approximation by a unique functional such that the error in the approximation is sufficiently well behaved. Moreover, transposes arise naturally when differentiating, say, the composition of a scalar-valued function with a change of coordinates.

Let $f : (a,b) \to \mathbb{R}$. Conventionally, one defines $f$ to be differentiable at $x \in (a,b)$ if the limit $$ \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} $$ exists, in which case the value of that limit is defined to be the derivative $f^\prime(x)$ of $f$ at $x$. Observe, however, that this definition means that for $h$ small enough, $$ f(x+h)-f(x) = f^\prime(x)h + R_x(h), $$ where $h \to f^\prime(x)h$ defines a linear transformation $df_x :\mathbb{R} \to \mathbb{R}$ approximating $f$ near $x$, and where the error term $R_x(h)$ satisfies $$ \lim_{h \to 0} \frac{R_x(h)}{h} = 0. $$ In fact, $f$ is differentiable at $x$ if and only if there exists a linear transformation $T : \mathbb{R} \to \mathbb{R}$ such that $$ \lim_{h \to 0} \frac{\lvert f(x+h) - f(x) - T(h) \rvert}{\lvert h \rvert} = 0, $$ in which case $df_x := T$ is unique, and given by multiplication by the scalar $f^\prime(x) = T(1)$.

Now, let $f : U \to \mathbb{R}^m$, where $U$ is an open subset of $\mathbb{R}^n$. Then, we can still perfectly well define $f$ to be differentiable at $x \in U$ if and only if there exists a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ such that $$ \lim_{h \to 0} \frac{\| f(x+h) - f(x) - T(h) \|}{\|h\|} = 0, $$ in which case $df_x := T$ is unique; in particular, for $\|h\|$ small enough, $$ f(x+h) - f(x) = df_x(h) + R_x(h), $$ where $df_x$ gives a linear approximation of $f$ near $x$, such that the error term $R_x(h)$ satisfies $$ \lim_{h \to 0} \frac{R_x(h)}{\|h\|} = 0. $$
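
As a concrete sanity check, here is a small NumPy sketch (the particular map $f$ and the Jacobian written down by hand are my own illustrative choices, not part of the answer) verifying that the error $R_x(h) = f(x+h)-f(x)-df_x(h)$ shrinks faster than $\|h\|$:

```python
import numpy as np

# An example map f : R^2 -> R^2 (an illustrative choice, not from the answer above)
def f(p):
    x, y = p
    return np.array([x**2 + y, np.sin(x * y)])

# Its derivative at p, written as a matrix (the Jacobian), computed by hand
def df(p):
    x, y = p
    return np.array([[2 * x, 1.0],
                     [y * np.cos(x * y), x * np.cos(x * y)]])

x0 = np.array([1.0, 0.5])
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * np.array([0.3, -0.7])           # shrink h toward 0
    R = f(x0 + h) - f(x0) - df(x0) @ h      # error of the linear approximation
    print(t, np.linalg.norm(R) / np.linalg.norm(h))   # this ratio should tend to 0
```

The printed ratios decrease roughly linearly in $\|h\|$ for a smooth $f$ like this one, consistent with $R_x(h)/\|h\| \to 0$.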

At last, let's specialise to the case where $f : U \to \mathbb{R}$, i.e., where $m=1$. If $f$ is differentiable at $x$, then $df_x : \mathbb{R}^n \to \mathbb{R}$ is linear, and hence $df_x \in (\mathbb{R}^n)^\ast$ by definition. In particular, for any $v \in \mathbb{R}^n$, the directional derivative $$ \nabla_v f(x) := \lim_{\epsilon \to 0} \frac{f(x+\epsilon v) - f(x)}{\epsilon} $$ exists and is given by $$ \nabla_v f(x) = (d_x f)(v). $$ Moreover, the gradient of $f$ at $x$ is exactly the unique vector $\nabla f(x) \in \mathbb{R}^n$ such that $$ \forall v \in \mathbb{R}^n, \quad (d_x f)(v) = \langle \nabla f(x), v \rangle. $$ In any event, the derivative of a scalar-valued function of $n$ variables at a point is most naturally understood as a functional on $\mathbb{R}^n$, i.e., as an element of $(\mathbb{R}^n)^\ast$.
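
To see this pairing numerically, here is a short NumPy sketch (the scalar function and its hand-computed gradient are my own illustrative choices) checking that a finite-difference directional derivative agrees with $\langle \nabla f(x), v \rangle$:

```python
import numpy as np

# A scalar function f : R^3 -> R (illustrative choice) and its gradient, by hand
def f(x):
    return x[0]**2 * x[1] + np.exp(x[2])

def grad_f(x):
    return np.array([2 * x[0] * x[1], x[0]**2, np.exp(x[2])])

x = np.array([1.0, 2.0, 0.5])
v = np.array([0.3, -1.0, 2.0])

eps = 1e-6
directional = (f(x + eps * v) - f(x)) / eps   # finite-difference directional derivative
pairing = grad_f(x) @ v                       # <grad f(x), v> = df_x(v)
print(directional, pairing)                   # the two numbers should nearly agree
```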

Now, suppose, for simplicity, that $f : \mathbb{R}^n \to \mathbb{R}$ is everywhere-differentiable, and let $S : \mathbb{R}^p \to \mathbb{R}^n$ be a linear transformation, e.g., a coordinate change $\mathbb{R}^n \to \mathbb{R}^n$. Then $f \circ S$ is indeed everywhere differentiable with derivative $$d_y(f \circ S) = (d_{Sy} f) \circ S = S^t d_{Sy} f$$ at $y \in \mathbb{R}^p$. On the one hand, if $S = 0$, then $f \circ S \equiv f(0)$ is constant, so that $d_y(f \circ S) = 0 = S^t d_{Sy} f$, as required. On the other hand, if $S \neq 0$, so that $$ \|S\| := \sup_{k \neq 0} \frac{\|Sk\|}{\|k\|} > 0, $$ it follows that $$ \frac{\lvert (f \circ S)(y+k) - (f \circ S)(y) - (d_{Sy} f \circ S)(k) \rvert}{\|k\|} = \frac{\lvert f(Sy + Sk) - f(Sy) - d_{Sy}f(Sk) \rvert}{\|k\|} \leq \|S\|\frac{\lvert f(Sy + Sk) - f(Sy) - d_{Sy}f(Sk) \rvert}{\|Sk\|} \to 0, \quad k \to 0, $$ by differentiability of $f$ at $Sy$, since $Sk \to 0$ as $k \to 0$ (when $Sk = 0$ the left-hand side is already $0$, so the bound is only needed for $Sk \neq 0$). More concretely, once you know that $f \circ S$ is differentiable everywhere, then for each $k \in \mathbb{R}^p$, by linearity of $S$, $$ (f \circ S)(y + \epsilon k) = f(Sy + \epsilon Sk), $$ so that, indeed, $$ \left(d_y(f \circ S)\right)(k) = \nabla_k(f \circ S)(y) = \nabla_{Sk}f(Sy) = (d_{Sy}f)(Sk) = (S^t d_{Sy}f)(k). $$ In general, if $S : \mathbb{R}^p \to \mathbb{R}^n$ is any everywhere-differentiable map (again, for simplicity), not necessarily linear, then $$ d_y (f \circ S) = (d_{Sy}f) \circ d_y S = (d_y S)^t d_{Sy}f, $$ which is none other than the relevant case of the chain rule.
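
Here is a quick NumPy illustration of the identity $d_y(f \circ S) = S^t\, d_{Sy} f$ in the linear case, identifying each functional with its gradient vector; the particular $f$ and $S$ below are my own example choices:

```python
import numpy as np

def f(x):                       # scalar function on R^3 (illustrative)
    return x[0] * x[1] + x[2]**2

def grad_f(x):                  # its gradient, computed by hand
    return np.array([x[1], x[0], 2 * x[2]])

S = np.array([[1.0, 2.0],       # a linear map S : R^2 -> R^3
              [0.0, -1.0],
              [3.0, 1.0]])
y = np.array([0.7, -0.2])

# Gradient of f∘S at y via the transpose: S^t (grad f)(Sy)
via_transpose = S.T @ grad_f(S @ y)

# Finite-difference gradient of f∘S at y, for comparison
eps = 1e-6
fd = np.array([(f(S @ (y + eps * np.eye(2)[i])) - f(S @ y)) / eps for i in range(2)])
print(via_transpose, fd)        # should agree to several decimal places
```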


Dual spaces are important because they occur all over the place. The map that assigns the value $p(1)$ to the polynomial $p(t)$ is an element of the dual space of the space of polynomials. The trace map on $n\times n$ matrices is an element of the dual space of the space of matrices. The map sending a vector in $\mathbb{R}^n$ to the sum of its entries is an element of the dual space of $\mathbb{R}^n$.

So we are using dual spaces all the time, and it can be useful to recognize the fact.
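
For instance, here is a small NumPy sketch (my own illustration) checking that the three maps above are linear in their arguments, which is exactly what makes them elements of the corresponding dual spaces:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
u, v = rng.normal(size=4), rng.normal(size=4)
c = 2.5

# Evaluation at t = 1 as a functional on polynomials (coefficients stored as vectors)
p, q = np.array([1.0, -2.0, 3.0]), np.array([0.5, 4.0, -1.0])
print(np.isclose(np.polyval(c * p + q, 1.0),
                 c * np.polyval(p, 1.0) + np.polyval(q, 1.0)))

# The trace as a functional on 3x3 matrices
print(np.isclose(np.trace(c * A + B), c * np.trace(A) + np.trace(B)))

# "Sum of the entries" as a functional on R^4
print(np.isclose(np.sum(c * u + v), c * np.sum(u) + np.sum(v)))
```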

The transpose arises because each linear map $L$ from $U$ to $V$ determines a linear map from the dual space $V^*$ to the dual space $U^*$: send a functional $\varphi$ on $V$ to $\varphi \circ L$, a functional on $U$. In finite dimensions we can identify a vector space and its dual (for instance, by choosing a basis), and if we do this right, the dual of a linear map is the transpose.
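
In coordinates this dual map really is given by the transposed matrix. A small NumPy sketch of that fact (my own illustration): if $L$ has matrix $A$ and a functional $\varphi$ is written as a row vector, then $\varphi \circ L$ corresponds to the row vector $\varphi A$, i.e. to $A^T$ applied to the column vector $\varphi^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 2))    # matrix of a linear map L : R^2 -> R^3
phi = rng.normal(size=3)       # a functional on R^3, written as a (row) vector
v = rng.normal(size=2)         # a test vector in R^2

# (phi ∘ L)(v), computed by pairing phi with L(v) and by pairing A^T phi with v
print(np.isclose(phi @ (A @ v), (A.T @ phi) @ v))
```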

Chris Godsil

Do you understand the concept of an inner-product? It is the abstraction of the notion of the standard dot-product you are used to in vector calculus. If you are working in a Hilbert space $\mathscr{H}$ -- that is, a vector space that comes equipped with an inner-product $\langle \cdot, \cdot \rangle$ and is complete (completeness is a topological condition: every Cauchy sequence converges in the norm induced by the inner-product) -- then a famous theorem of Riesz states that there is a natural duality between $\mathscr{H}$ and its (continuous) dual space $\mathscr{H}^*$ through the inner-product $\langle \cdot, \cdot \rangle$; this duality is given by: $y \in \mathscr{H}$ is dual to $l_y \in \mathscr{H}^*$, where $l_y(x) = \langle y, x\rangle$. For a vector $y \in \Bbb{R}^n$, the operator $l_y$ is just the transpose $y^T$ of $y$ (when we consider the inner-product on $\Bbb{R}^n$ to be just the standard dot-product).

Perhaps the abstraction of what I wrote above obscures how this is a physically important construct. For one, in quantum mechanics, if you have ever used Dirac's "bra-ket" notation, you often write $\langle \varphi |$ and $|\varphi \rangle$ as vectors, where $|\varphi\rangle$ is a square-integrable wave-function representing some physical probability density (all such functions form a Hilbert space with a natural inner-product $\langle \cdot, \cdot \rangle$) and $\langle \varphi |$ is the dual vector of $|\varphi \rangle$ through the Riesz duality described above (that is, $\langle \varphi |$ is like an abstract version of the transpose of $|\varphi\rangle$). In particular, most observables can be represented by operators $H$ on this Hilbert space, and you calculate meaningful physical values through expressions such as $\langle \psi | H | \varphi \rangle$, which is suggestively the same as $\langle \psi, H \varphi \rangle$ in inner-product notation; such expressions are (perhaps subtly) using dual spaces and abstract transposes.
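
In finite dimensions you can see this concretely with a NumPy toy (my own illustration, with complex vectors standing in for wave-functions): the bra $\langle\psi|$ is the conjugate transpose of the ket $|\psi\rangle$, and $\langle\psi|H|\varphi\rangle$ is the same number as the inner product $\langle\psi, H\varphi\rangle$.

```python
import numpy as np

rng = np.random.default_rng(2)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)   # ket |psi> as a column vector
phi = rng.normal(size=3) + 1j * rng.normal(size=3)   # ket |phi>
H = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))   # a toy operator

bra_psi = psi.conj()           # <psi| : the dual (conjugate-transposed) vector
lhs = bra_psi @ (H @ phi)      # <psi| H |phi>
rhs = np.vdot(psi, H @ phi)    # <psi, H phi>, with np.vdot conjugating its first argument
print(np.isclose(lhs, rhs))
```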

Another example comes from geometry. If we are looking at mechanics on some curved space (a Riemannian manifold), then at each point $x$ there is a vector space $V$ representing all the velocity vectors of possible paths of particles traveling through the point $x$ on the curved space. In this setting (of a Riemannian manifold), there is an inner-product on $V$ which again gives us the duality between $V$ and its dual $V^*$ (again, an abstract notion of transposition). As mentioned, $V$ represents velocity vectors of a particle moving through the point $x$, but the dual $V^*$ represents the momentum of that particle, and certainly momentum is a meaningful quantity.

Tom
  • Assuming a fixed inner product is probably the worst possible start to explain the importance of the dual space. – Marc van Leeuwen Nov 02 '14 at 12:10
  • @MarcvanLeeuwen If you feel like my response is unusual, I will gladly remove it. I do think that most often in $\Bbb{R}^n$ the use of a transpose $v^T$ of a vector $v$ is secretly identifying $v^T$ as the corresponding dual element to $v$, which of course is making the identification with the fixed Euclidean inner-product. So I don't believe that it is so unreasonable... – Tom Nov 02 '14 at 17:00

Let's say you have a vector space with an inner product--that is, for two vectors $a, b$ there is a scalar $a \cdot b$.

Let $w$ be some vector; then we can define a linear map $\omega(a) = a \cdot w$ for any $a$. The map $\omega$ is an element of the dual space.

One application of such maps is projection onto a coordinate basis. To write a vector $a$ as a linear combination of basis vectors $e_1, e_2, \ldots$, one computes the reciprocal vectors $e^1, e^2, \ldots$ such that $e^i \cdot e_j = \delta^i_j$; in particular, $e^1$ is orthogonal to $e_2, e_3, \ldots$ and satisfies $e^1 \cdot e_1 = 1$, and so on. Then the map $a \mapsto a \cdot e^1$ gives us the correct coefficient: $a = (a \cdot e^1) e_1 + (a \cdot e^2) e_2 + \cdots$. Again, that map is an element of the dual space.
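
Here is a short NumPy sketch of that recipe (my own illustration): the reciprocal vectors $e^1, e^2, \ldots$ are the rows of the inverse of the matrix whose columns are $e_1, e_2, \ldots$, and pairing with them does read off the coefficients.

```python
import numpy as np

# A non-orthogonal basis of R^3: the columns of E (an illustrative choice)
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

E_recip = np.linalg.inv(E)       # its rows are the reciprocal vectors e^1, e^2, e^3
print(np.allclose(E_recip @ E, np.eye(3)))    # e^i . e_j = delta^i_j

a = np.array([2.0, -1.0, 3.0])
coeffs = E_recip @ a             # the coefficients a . e^i
print(np.allclose(E @ coeffs, a))             # a = sum_i (a . e^i) e_i
```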

Now, given our map $\omega$ above, what happens if we feed in $T(a)$, where $T$ is some linear operator that maps vectors to vectors?

We would get something like

$$(\omega \circ T)(a) = w \cdot T(a)$$

Is there another element $\chi$ of the dual space--remember, that's a map--that has the simpler form

$$\chi(a) = x \cdot a = w \cdot T(a)$$

for some vector $x$? Yes, there is: a common definition of the transpose in a metric space is merely that $w \cdot T(a) = T^\dagger(w) \cdot a$. Then we would get

$$\chi(a) = T^\dagger(w) \cdot a$$

And we could say that $\chi$ itself is just the result of some linear transformation on $\omega$. That is, let $T^*$ denote some analogous linear operator on the dual space, and we get

$$\chi = T^*(\omega)$$

I've used two different notations for transpose here--$T^*$ for the transpose whose domain is elements of the dual space, and $T^\dagger$ for the transpose whose domain is elements of the base vector space. Just based on what their inputs are, you should be convinced that these must be different concepts, but they are very closely related, and in a metric space, it would be easy and common to work with only the one that's more convenient. It's actually rather redundant to keep the distinction between dual vectors and ordinary vectors.
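
Here is a tiny NumPy sketch (my own illustration, using the standard dot product on $\mathbb{R}^4$, so that $T^\dagger$ is just the matrix transpose) that keeps the two roles separate: $T^*$ eats dual elements (maps), while $T^\dagger$ eats ordinary vectors, and $\chi = T^*(\omega)$ is indeed represented by $T^\dagger(w)$.

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.normal(size=(4, 4))    # a linear operator on R^4
w = rng.normal(size=4)
a = rng.normal(size=4)

omega = lambda x: w @ x                          # the dual element omega(a) = w . a
T_star = lambda func: (lambda x: func(T @ x))    # T* acts on dual elements: omega -> omega ∘ T
T_dagger = lambda x: T.T @ x                     # T† acts on ordinary vectors

chi = T_star(omega)                              # chi = T*(omega)
print(np.isclose(chi(a), T_dagger(w) @ a))       # chi(a) = T†(w) . a, as claimed
```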

That's no longer true when you get to more general, nonmetric spaces, in which the dual vectors no longer correspond canonically to any particular ordinary vectors. Thus, you can't write the maps that are elements of dual space in terms of dot products without a metric, but the maps themselves can still exist.

Muphrid

The dual space contains the linear functionals used to get a numeric description of vectors; most importantly, the one-forms $\beta^i$ reading off the coordinates of a vector in a chosen basis $\{a_i\}$: $\beta^i(v)=v^i$.

The set $\{\beta^i\}$ forms the dual basis, and any linear functional $\omega$ can be expressed as a sum $\omega_i\beta^i$. Thus, $\omega(v)=\omega_i v^i=\omega v$, where we reuse the symbols to also denote their coordinate representations: the row-vector $\omega$ and the column-vector $v$, their matrix product then being just $\omega v$.

Now, let $v=f(u)$, or in matrix form $v=Au$. We define the transpose $f^*$ of the map $f$ as the "pull-back" of any $\omega$ by means of map composition, $f^*\omega=\omega\circ f$, or in matrix notation:

$$\omega v=\omega(Au)=(\omega A)u.$$

Where is the expected $A^T$? A quote from Wikipedia: "as $f$ is represented by $A$ acting on the left on column vectors, $f^*$ is represented by the same matrix acting on the right on row vectors." However, if there is an inner product which identifies the space of column vectors with the dual space of row vectors, then we would be writing (abusing notation by using the same symbol $\omega$ to now denote the double-dual column-vector):

$$\omega^T v=\omega^T(Au)=(\omega^T A)u=(A^T\omega)^Tu$$ -- a familiar definition of the transpose of a map through the inner product. I hope @Marc van Leeuwen approves that I didn't start from the inner product but finished with it instead.
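
A final NumPy check of those two ways of writing the same number (my own illustration): the pulled-back functional is the row vector $\omega A$ acting on $u$, which equals the column vector $A^T\omega$ paired with $u$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 2))    # matrix of the map u -> v = A u, from R^2 to R^3
omega = rng.normal(size=3)     # a functional on R^3 (row vector / double-dual column vector)
u = rng.normal(size=2)

row_form = (omega @ A) @ u     # (omega A) u : the pulled-back functional acting on u
col_form = (A.T @ omega) @ u   # (A^T omega)^T u : the same number via the transposed matrix
print(np.isclose(row_form, col_form))
```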

rych