
Here is a link to a long discussion regarding the generalization of the $\operatorname{sign} z$ function to dual numbers.

There are basically two proposed versions:

  1. $\operatorname{sign}(a+\varepsilon b) = \operatorname{sign}(a) + 2 b \delta(a) \varepsilon$ - proposed by user M.G. in their answer.

  2. $\operatorname{sign}(a+\varepsilon b) = \operatorname{sign}(a) + 2 \operatorname{sign}(b) \delta(a) \varepsilon$ - proposed by me.

The test results with matrices in Mathematica are inconclusive as they give different results depending on whether $b$ is variable or a numerical constant (link to the question on Mathematica.SE).

Version (1) obviously does not preserve a very important property of the sign function: $\operatorname{sign} (u v)=\operatorname{sign} u\cdot \operatorname{sign} v$.

My other point is that for $b>0$ we have $\operatorname{sign} x=\operatorname{sign} (bx)$, so the right-hand side should not depend on $b$ except through its sign.

I do not understand the arguments raised against my version (that it breaks under scaling and change of basis). Can anyone outline, in simple language, the possible arguments against my generalization, or explain the existing ones better? Why would it not work?
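The difference between the two proposed formulas can be made concrete with a small Python sketch (my own illustration, not from the linked discussion). The $\delta(a)$ factor is treated as a formal unit: the $\varepsilon$-coefficient is reported in multiples of $\delta(a)$, which vanishes for $a \neq 0$.

```python
def sign(x):
    return (x > 0) - (x < 0)

# Both generalized signs are returned as (real_part, eps_coeff), where
# eps_coeff is the coefficient of epsilon measured in formal units of
# delta(a); for a != 0 we have delta(a) = 0, so the coefficient vanishes.

def sign_v1(a, b):
    """Version (1): sign(a + eps*b) = sign(a) + 2*b*delta(a)*eps."""
    return sign(a), (2 * b if a == 0 else 0)

def sign_v2(a, b):
    """Version (2): sign(a + eps*b) = sign(a) + 2*sign(b)*delta(a)*eps."""
    return sign(a), (2 * sign(b) if a == 0 else 0)

# Away from a = 0 the delta term dies and the two versions agree.
assert sign_v1(3, 7) == sign_v2(3, 7) == (1, 0)

# At a = 0 they respond differently to rescaling b -> 2b:
# version (1) doubles its epsilon part, version (2) is unchanged.
assert sign_v1(0, 5)[1] == 10 and sign_v1(0, 10)[1] == 20
assert sign_v2(0, 5)[1] == 2 and sign_v2(0, 10)[1] == 2
```

This is exactly the scaling behaviour the comments below argue about: version (1) transforms covariantly under $b \mapsto 2b$, while version (2) does not.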

Anixx
  • even if $\mathrm{sign}(bx) = b\,\mathrm{sign}(x)$, for $b>0$, in the version of M.G., you should rather see the $\epsilon$ as something infinitely small and so if you do a Taylor expansion it is coherent that you will get a multiplication by $b$ because $$ f(x+\epsilon b) = f(x) + b\,\epsilon\, f'(x) + O(\epsilon^2) $$ so it makes sense at least very formally – LL 3.14 Nov 09 '21 at 04:03
  • @LL3.14 what's the point of doing Taylor expansion of a non-analytic function? – Anixx Nov 09 '21 at 04:05
  • I don't know ... but it just a natural way to see appear the multiplication by $b$ ... indicating that the dimension kind of makes sense? – LL 3.14 Nov 09 '21 at 04:06
  • @LL3.14 sign function is invariant against horizontal stretching, so it does not depend on the coefficient of the argument... – Anixx Nov 09 '21 at 04:08
  • @LL3.14 in dual numbers there is a formula $f(a+\varepsilon)=f(a)+f'(a)\varepsilon$. If the function is invariant against stretching, so should its derivative, no? – Anixx Nov 09 '21 at 04:13
  • @LL3.14 I fixed it. The formula in Wikipedia is obtained via Taylor expansion... – Anixx Nov 09 '21 at 04:16
  • Yes, your formula is equivalent. Applying your formula to $g(x) = f(bx)$ gives $$ f(a+\varepsilon b) = g(a/b+\varepsilon) = g(a/b) + \varepsilon g'(a/b) = f(a) + \varepsilon b f'(a) $$ And no, if a function $u$ is invariant under stretching, then $u'(ax) = (u(ax))'/a = (u(x))'/a = u'(x)/a$, so $u'$ is not invariant under stretching. – LL 3.14 Nov 09 '21 at 04:25
  • @LL3.14 in your last step you used chain rule, which does not work with sign function because $\operatorname{sign} x=\operatorname{sign} bx$, and so, $(\operatorname{sign} x)'=(\operatorname{sign} bx)'$ (for $b>0$). – Anixx Nov 09 '21 at 04:28
  • @LL3.14 and $f=g=\text{sign}$, of course. – Anixx Nov 09 '21 at 04:37
  • @LL3.14 in your example $g(x)=f(x/b)$. If $f(x)=\operatorname{sign}(x)$, then $f(x)=g(x)$. – Anixx Nov 09 '21 at 06:34
  • @LL3.14 This is your formula: $f(a+\varepsilon b) = g(a/b+\varepsilon) = g(a/b) + \varepsilon g'(a/b)$. If $f(x)=\operatorname{sign}(x)$ and $b>0$, then $f(x)=g(x)$. So, changing $g$ to $f$ in your formula, we have $f(a+\varepsilon b)=f(a/b)+\varepsilon f'(a/b)=f(a)+\varepsilon f'(a)$. – Anixx Nov 09 '21 at 06:44
  • Yes, but that does not tell you what happens if you have $\epsilon b$ instead of $\epsilon$. Notice that $\mathrm{sign}(a+\epsilon b) ≠ \mathrm{sign}(a+\epsilon)$, right? – LL 3.14 Nov 09 '21 at 06:45
  • @LL3.14 if $b>0$, $f(a+\varepsilon b)=f(a/b+\varepsilon)=f(a/b)+\varepsilon f'(a/b)=f(a)+\varepsilon f'(a)$ – Anixx Nov 09 '21 at 06:49
  • "Notice that sign(a+ϵb)≠sign(a+ϵ), right?" - only if b is not positive, otherwise it is equal. – Anixx Nov 09 '21 at 06:50
  • 1
    Again, no, $f'(a/b) ≠ f'(a)$ since $f'(a/b) = \delta_0(a/b)$ and $f'(a) = \delta_0(a)$ is you prefer this kind of proof – LL 3.14 Nov 09 '21 at 06:51
  • @LL3.14: my objection was simply about the fact that $\mathbb{R}[X]/(X^2) = \mathbb{R}[rX]/((rX)^2)$ are literally the same $\mathbb{R}$-algebras for any $r \in \mathbb{R} \setminus 0$, but the 2nd definition produces quantitatively different results b/w both cases. Algebraically, there is no intrinsic way to pinpoint $\varepsilon$ beyond being any of the infinitely many solutions to $X^2 = 0$. It's the same with $X^2 = -1$, it's not possible to distinguish b/w $\pm i$ algebraically, one can only fix a choice. – M.G. Nov 19 '21 at 13:09
  • @M.G. thanks. Can you make somewhat more extended answer on this? I really want to understand this and whether workarounds are possible. What if we pinpoint $\varepsilon$ analytically or add properties/axioms beyond $\varepsilon^2=0$? – Anixx Nov 20 '21 at 15:24
  • @M.G. as an example, I was thinking about defining $\varepsilon$ as $e^{-\frac12\int_0^1\frac1x dx}$. If defined so, via divergent integral, it is distinguished from $2\varepsilon$. https://mathoverflow.net/questions/406927/did-anyone-ever-propose-an-analytic-definition-of-zero-divisors-including-nilpo – Anixx Nov 20 '21 at 15:31
  • @Anixx: I am not sure how else to elaborate on this. We say that the dual numbers are $\mathbb{R}[\varepsilon]$ with $\varepsilon^2 = 0$, but it could also be $\mathbb{R}[\epsilon]$ with $\epsilon = r\varepsilon$. But which epsilon is the correct one? $\varepsilon = 1/r\, \epsilon$, so $\varepsilon$ is just as much a scaled version of $\epsilon$ as $\epsilon$ is of $\varepsilon$. But maybe $\varepsilon = s \varepsilon_2$, if you get my meaning. There is no canonical choice. The only canonical operation is writing a general element into an eigenvalue and a nilpotent part. (cont.) – M.G. Nov 20 '21 at 16:06
  • Because the eigenvalue does not depend on the choice of basis and hence the decomposition into eigenvalue and nilpotent part itself does not depend on the choice of basis. For example, take the dual numbers in the form of the algebra of upper-triangular 2x2 matrices with same-entry-diagonal. Then take your favourite general invertible matrix and conjugate the algebra by this invertible matrix to get something completely different, but isomorphic. How do you decide then what is $\varepsilon$? – M.G. Nov 20 '21 at 16:09
  • @M.G. well, if I understand it correctly, it follows from what you said that if my generalization $\operatorname{sign}(a+\varepsilon b) = \operatorname{sign}(a) + 2 \operatorname{sign}(b) \delta(a) \varepsilon$ is good, then another generalization $\operatorname{sign}(a+\varepsilon b) = \operatorname{sign}(a) + 2 s \operatorname{sign}(b) \delta(a) \varepsilon$ ($s>0$) would be as good. But why we cannot just postulate the first formula? – Anixx Nov 20 '21 at 16:10
  • Anyway, the only way to somehow pinpoint $\varepsilon$ I can think of is by means of equipping the algebra with a some norm (maybe even a submultiplicative norm), but I don't think this would work beyond the dual numbers, that is, I don't think it would work if your algebra has elements of higher nilpotency order. – M.G. Nov 20 '21 at 16:13
  • @M.G. by the way, the decidability of $i$ versus $-i$ depends on the way you introduce the complex numbers. If via $i^2=-1$, then yes, it is undecidable. If via pairs of real numbers $(a,b)$, then $i$ versus $-i$ is perfectly decidable. The same way, if we introduce the duals not as $\varepsilon^2=0$ but as a matrix $\varepsilon=\left( \begin{array}{cc} 0 & 1 \ 0 & 0 \ \end{array} \right)$, the question of $\varepsilon$ versus $s\varepsilon$ is perfectly decidable. – Anixx Nov 20 '21 at 16:18
  • @Anixx: Sure, but this is a choice of basis. While it is sometimes convenient to work with this kind of basis of the complex numbers, it's actually not required at all. Complex analysis and complex geometry work just fine in a coordinate-free manner (as they should!). All one needs to know is the existence of an element $i$ with $i^2=-1$. Sure, you can postulate the formula that way, but only after you've fixed a choice of basis, specifically a choice for $\varepsilon$. With a different choice of $\varepsilon$, it will produce a different result, even though the input is the same. (Cont) – M.G. Nov 20 '21 at 16:27
  • Analysis (and geometry) should remain valid even without coordinates, i.e. without choice of basis, which is not the case with your proposed formula. – M.G. Nov 20 '21 at 16:29
  • @M.G. "With a different choice of ε, it will produce a different result, even though the input is the same." - yes, definitely. But why should this be avoided? Yes, the formula applied to $\varepsilon=\left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)$ gives a different result from the formula applied to $\epsilon=\left( \begin{array}{cc} 0 & 2 \\ 0 & 0 \end{array} \right)$. Why is this bad? – Anixx Nov 20 '21 at 16:38
  • @M.G. "Analysis (and geometry) should remain valid even without coordinates, i.e. without choice of basis, which is not the case with your proposed formula." - I do not understand why you insist that this principle is important. For instance, it definitely does not hold for divergent integrals, where $\int_0^\infty dx\ne \int_0^\infty d(2x)$. So, yes, $x$ should be thought of here not as an arbitrary variable, but as a special element, the identity function. – Anixx Nov 20 '21 at 16:44
  • @M.G. For me, the integral $\int_1^\infty (x-1)\, dx$ is definitely smaller than $\int_0^\infty x\,dx$, because the former fits entirely inside the latter and the integral of their difference is infinite. Even though, without coordinates, the figures under them are exactly the same and differ only by a shift. – Anixx Nov 20 '21 at 16:49
  • @M.G. So, for divergent series, for instance, $\sum_{k=-\infty}^\infty 1\ne \sum_{k=-\infty}^\infty ((-1)^{k}/2+1/2)$. The first one is the numerocity of the integers, the second one is the numerocity of the even numbers, which is half as large. So, if we postulate invariance under change of basis, we would get Cantor's cardinality instead of numerocity, which is less precise and is the same for the integers and the even numbers. – Anixx Nov 20 '21 at 16:58
  • @Anixx: well, certainly based vector spaces, i.e. objects consisting of a vector space together with a fixed choice of basis, are a thing. But for starters, you have to somehow answer why your choice of $\varepsilon$ is the preferred choice among uncountably infinitely many choices. And once you go down the rabbit hole of whatever analytic or geometric theory you are building, it's likely that you will keep facing the problem of making further such choices because the original definition was not set up intrinsically. – M.G. Nov 20 '21 at 16:59
  • (cont.) With so much choosing, it starts begging the question if such a theory is a mathematical theory or something arbitrary, that is so just because you say it is so :-) On the other hand, maybe you get lucky and there is only one choice to be made, only at the beginning :-) Finding distinguished elements (and elaborating why they are such) is quite important in mathematics. In the end of the day, who says that your coordinates are better than mine? :-) A good choice of coordinates is usually argued by how much they simplify computations, but not by producing a totally different theory :-) – M.G. Nov 20 '21 at 17:00
  • @M.G. well, for complex numbers the pair-number representation $i=(0,1)$ already can be considered canonical. Even though this is a definition of particular basis and algebraically $i=(0,-1)$ would work just as well. – Anixx Nov 20 '21 at 17:04
  • @M.G. Similarly, $\varepsilon=\left( \begin{array}{cc} 0 & -1 \\ 0 & 0 \end{array} \right)$, $\varepsilon=\left( \begin{array}{cc} 0 & 0 \\ 1 & 0 \end{array} \right)$ and $\varepsilon=\left( \begin{array}{cc} 0 & 0 \\ -1 & 0 \end{array} \right)$ would work just as well as $\varepsilon=\left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)$. But once we choose the basis, we can start to define functions that are not basis-invariant. – Anixx Nov 20 '21 at 17:08
  • @Anixx: sure, but complex numbers and complex analysis work just fine even without it. Why should the generalization of the sign function depend on the choice of basis? In the most basic examples over $\mathbb{R}$ and $\mathbb{C}$, it does not. It's weird (though also curious) that a function on some domain should depend on your preferred choice of basis. It actually becomes a function of the variable and of the basis. – M.G. Nov 20 '21 at 17:09
  • @M.G. for instance, we can declare that if the function $f(x)$ has different derivatives from the left and right $f'_l(x)$ and $f'_r(x)$, then $f(a+\varepsilon)=f(a)+f'_r(a)\varepsilon$ and $f(a-\varepsilon)=f(a)-f'_l(a)\varepsilon$. This would analytically distinguish $\varepsilon$ from $-\varepsilon$. – Anixx Nov 20 '21 at 17:14
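The matrix representation of $\varepsilon$ argued about in the comments is easy to check numerically. A quick numpy sketch (my own, not from the thread) confirms the two facts under dispute: $E = \left( \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right)$ squares to zero and $aI + bE$ multiplies exactly like a dual number, yet any nonzero multiple $sE$ is an equally valid nilpotent, which is the crux of M.G.'s basis-choice objection.

```python
import numpy as np

# A concrete epsilon: the nilpotent matrix proposed in the comments.
E = np.array([[0, 1],
              [0, 0]])
I = np.eye(2, dtype=int)

def dual(a, b):
    """Represent the dual number a + eps*b as the matrix a*I + b*E."""
    return a * I + b * E

# epsilon^2 = 0 holds exactly in this representation.
assert np.array_equal(E @ E, np.zeros((2, 2)))

# Matrix multiplication reproduces dual multiplication:
# (a + eps*b)(c + eps*d) = ac + eps*(ad + bc).
a, b, c, d = 2, 3, 5, 7
assert np.array_equal(dual(a, b) @ dual(c, d), dual(a * c, a * d + b * c))

# Any rescaling s*E (s != 0) is an equally valid square root of zero,
# so nothing purely algebraic singles out E among its multiples.
for s in (2, -1, 0.5):
    assert np.allclose((s * E) @ (s * E), np.zeros((2, 2)))
```

So fixing $E$ itself is indeed a choice of basis: the algebra generated by $sE$ is the same, and only the labelling of its nilpotent generator changes.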

1 Answer


So, first remark: the sign function is not differentiable in the usual sense, so one has to use derivatives in the sense of distributions.

Second remark: Notice that the derivative of the function $x↦ u(λ x)$ evaluated at the point $x$ (which I write $(u(\lambda x))'$) is different from $u'(λ x)$ (the derivative of $u$ evaluated at the point $λ x$).

Let $u$ be a $0$-homogeneous distribution (what you call invariance by stretching, i.e. $u(λx)=u(x)$). Then its derivative (in the sense of distributions, or in the classical sense if this derivative exists in the classical sense) is $-1$-homogeneous since

  • on the one side, $(u(λx))' = (u(x))'=u'(x)$, since the function is $0$-homogeneous
  • on the other side $(u(λ x))' = λ u'(λ x)$

The last identity is valid for distributions since for any smooth test function $\varphi$: $$ \langle (u(λ x))',\varphi\rangle = -\langle u(λ x),\varphi'\rangle = -\frac{1}{|\lambda|}\langle u,\varphi'(x/\lambda)\rangle \\ = -\langle u,\frac{1}{|\lambda|}\varphi'(x/\lambda)\rangle = -\frac{\lambda}{|\lambda|}\langle u,(\varphi(x/\lambda))'\rangle \\ = \frac{\lambda}{|\lambda|}\langle u',\varphi(x/\lambda)\rangle = \lambda\,\langle u'(\lambda x),\varphi\rangle $$ where I used the definition of the derivative of distributions to get the 1st and 5th identities, the chain rule for smooth functions to get the 4th identity, and the definition of composition of distributions to get identities 2 and 6.

Hence, in the sense of distributions and in the case of the function $\mathrm{sign}$, $\mathrm{sign}'(\lambda x) = \mathrm{sign}'(x)/\lambda$. More precisely, since the derivative of $\mathrm{sign}$ is $\delta_0$: $$ \delta_0(bx) = \mathrm{sign}'(bx) = (\mathrm{sign}(bx))'/b = (\mathrm{sign}(x))'/b = \delta_0(x)/b $$ which is why $\mathrm{sign}'= \delta_0$ is not "invariant by stretching", and this is why formula 1 makes more sense than formula 2 in your question (I think it was the only point you were missing?)
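The scaling identity $\delta_0(bx) = \delta_0(x)/b$ used above can also be checked numerically by replacing $\delta_0$ with a narrow Gaussian and pairing it with a test function. A sketch under that smoothing assumption (the width $\sigma$ and the test function are arbitrary choices of mine):

```python
import numpy as np

sigma = 1e-2                      # width of the Gaussian approximating delta_0
x = np.linspace(-1.0, 1.0, 200_001)
dx = x[1] - x[0]

def delta_approx(t):
    """Narrow Gaussian: tends to delta_0 (weakly) as sigma -> 0."""
    return np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def phi(t):
    """An arbitrary smooth test function."""
    return np.cos(t) + 2.0

b = 3.0
# <delta_0(bx), phi> should approach phi(0)/b for b > 0.
lhs = np.sum(delta_approx(b * x) * phi(x)) * dx
rhs = np.sum(delta_approx(x) * phi(x)) * dx / b
assert abs(lhs - phi(0) / b) < 1e-4
assert abs(lhs - rhs) < 1e-4
```

Shrinking $\sigma$ drives both sides to $\varphi(0)/b$, matching the distributional computation above; in particular $\delta_0(bx)$ is not equal to $\delta_0(x)$, which is the "not invariant by stretching" point.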

LL 3.14
  • So, do you claim that $(\operatorname{sign}(a x))'\ne(\operatorname{sign}(x))'$ $(a>0)$, even though $\operatorname{sign}(a x)=\operatorname{sign}(x)$? This makes no sense! – Anixx Nov 09 '21 at 06:58
  • 1
    No, I claim $(\mathrm{sign}(ax))′=(\mathrm{sign}(x))'$ and $\mathrm{sign}'(ax)≠\mathrm{sign}(x)$. I added the precision of the notations in the second line of my post "Second remark". Tell me if you understand the differences – LL 3.14 Nov 09 '21 at 07:05
  • So, you use the chain rule on sign function yes? – Anixx Nov 09 '21 at 07:06
  • You wrote $b\text{ sign}'(bx) = (\mathrm{sign}(bx))'$, which means chain rule. But $(\mathrm{sign}(bx))'=(\mathrm{sign}(x))'=\mathrm{sign}'(x)$, agree? – Anixx Nov 09 '21 at 07:09
  • Yes, this is exactly what I wrote: "on the other side $(u(\lambda x))' = λu'(\lambda x)$" and I proved just on the next lines. But you can see other proofs for example here https://math.stackexchange.com/questions/2161202/prove-deltaax-frac1a-deltax – LL 3.14 Nov 09 '21 at 07:09
  • Oh, fixed. Meant $(\mathrm{sign}(bx))'=(\mathrm{sign}(x))'=\mathrm{sign}'(x)$, agree? – Anixx Nov 09 '21 at 07:11
  • Yes, that's true :) – LL 3.14 Nov 09 '21 at 07:12
  • Okay. Let it be so. Delta function is non-stretch-invariant. But is this argument the only one against the second variant of the definition of the sign function? Given that it keeps the important equality while the first one does not? – Anixx Nov 09 '21 at 07:19
  • Or are they in fact equivalent under different assumptions about the delta function's homogeneity? – Anixx Nov 09 '21 at 07:24