9

A problem I came across defines a particular differentiation operator $D$ over the set of polynomials $\{P\}$ over a field $F$ with "the normal formula; that is $D(\sum_{i=0}^n a_nx^i) = \sum_{i=1}^n na_nx^{i-1}$." However, there seem to be some problems with this definition. First, we cannot assume that $F$ contains the natural numbers as a subset. For example, what is $D(x^2)$ if $F = \mathbb{Z_2}$? Furthermore, the derivative may not be well-defined, depending on the definition of equality for polynomials. For example, if $F = \mathbb{Z_3}$, then $2x^2 + x = 0$ for $x \in \{0, 1, 2\}$, but $D(2x^2 + x) = 2(2)x+1 = x + 1 \neq 0 = D(0)$.

The problem then defines the derivative of rational functions $P/Q$ with the standard quotient rule formula $D(P/Q) = \frac{P'Q - PQ'}{Q^2}$ and then asks us to prove that this definition is well-defined, but I can't do that without knowing that the derivative of polynomials exists and is well-defined. Can anyone shed some light on how to interpret the definition given, or on what restrictions need to be made to make it work?

Winther
  • 24,478
CJ Dowd
  • 1,664
  • 4
    The integers embed naturally in any ring with unity : $n = \underbrace{1+ \ldots+1}_n$ and $-n = \underbrace{(-1)+ \ldots+(-1)}_n$ – reuns Apr 02 '17 at 02:24
  • 1
    @user1952009: Even to ${0,1}$? That's very interesting from a set theoretic point! :) – Asaf Karagila Apr 02 '17 at 06:21
  • 4
    @user1952009: Not "embedding" (as indirectly implied by Asaf) but just "homomorphism". – user21820 Apr 02 '17 at 09:08
  • @CJDowd This is why many algebra courses use $R[X]$ to denote the polynomial ring, rather than $R[x]$: it's all too easy to think of $p(x)$ as a function rather than just a formal expression. Since there's such a common convention that variables are in lower-case, $P(X)$ is less obviously a function and more obviously a formal expression. – Patrick Stevens Apr 02 '17 at 09:54
  • @user21820 English is not my native tongue. So you say 'embed' means injectively ? – reuns Apr 02 '17 at 14:44
  • @user1952009: Yes; see here. – user21820 Apr 02 '17 at 14:46
  • @user21820 What about saying that the formal expression $n = \underbrace{1+ \ldots+1}_n$ embeds naturally in any ring with unity ? – reuns Apr 02 '17 at 14:50
  • 1
    @user1952009: "Embed" is not the right word here. What I would say is that $n$ makes sense in any ring with unity. – Martin Argerami Apr 02 '17 at 17:30
  • @user1952009: Embedding means as an isomorphic copy. You can't get an isomorphic copy of any infinite structure in a finite one. As in my first comment, what you want is "homomorphism", which just requires the map to commute with the structure operations; $f(a+b) = f(a)+f(b)$ and $f(ab) = f(a) f(b)$, where $f$ is the homomorphism and $a,b$ are any integers. $f$ can be defined by induction for any target ring $R$, whether or not $R$ has nonzero characteristic. – user21820 Apr 03 '17 at 05:13

3 Answers

29
  • First issue

If $F$ is a field, we always identify the natural number $n$ with the element $$\underbrace{1+1+\cdots+1}_{n\text{ times}} \in F.$$ This definition can be extended easily to $\mathbb Z$.

So, to answer the question $$D(x^2)=(1+1)x=x+x$$
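For concreteness, here is a minimal Python sketch of this computation (an illustration only, assuming $\mathbb Z_2$ is modeled as the integers mod $2$ and $n\cdot a$ is read as $a$ added to itself $n$ times):

```python
# A minimal sketch, assuming Z_2 is modeled as integers mod 2.
P = 2  # characteristic of the field

def n_times(n, a, p=P):
    """Interpret the natural number n in the field: a + a + ... + a (n summands) mod p."""
    total = 0
    for _ in range(n):
        total = (total + a) % p
    return total

# The only coefficient of D(x^2) is 1 + 1:
print(n_times(2, 1))   # 0, so D(x^2) = (1+1)x = 0 over Z_2
```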

  • Second issue

First, $2x^2+x \neq 0$ when $x=2$.

Also, you are making a very common mistake here. Even if the corresponding functions were the same, the polynomials $2x^2+x$ and $0$ would still be different polynomials over $\mathbb Z_3$, and hence their derivatives would not need to be equal.

A polynomial is just a formal algebraic expression. Over infinite fields, distinct polynomials define distinct functions, but over finite fields you cannot identify polynomials with their corresponding functions; you have to think of them just as polynomials.
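Here is a small Python sketch making the distinction concrete (illustrative only, with $\mathbb Z_3$ modeled as the integers mod $3$): the polynomial $x^3 - x$ induces the zero function on $\mathbb Z_3$ even though it is not the zero polynomial, and its formal derivative is the nonzero constant $-1 = 2$:

```python
# Sketch: polynomials over Z_3 as coefficient lists [a_0, a_1, a_2, ...].
P = 3

def evaluate(coeffs, x, p=P):
    """Evaluate a coefficient list at x in Z_p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def formal_derivative(coeffs, p=P):
    """Apply D(sum a_i x^i) = sum i*a_i x^(i-1), reducing i mod p."""
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

f = [0, 2, 0, 1]                            # x^3 + 2x = x^3 - x over Z_3
print([evaluate(f, x) for x in range(P)])   # [0, 0, 0]: the zero *function*
print(f != [0])                             # True: not the zero *polynomial*
print(formal_derivative(f))                 # [2, 0, 0], i.e. the constant 2 = -1
```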

N. S.
  • 132,525
  • or as functions over the algebraic closure $\overline{F}$, this is the topic in algebraic geometry – reuns Apr 02 '17 at 02:28
  • 18
    The difference between polynomials as formal expressions and the functions they represent is a very important and subtle one, that often doesn’t get clearly articulated in the algebra courses where it’s first encountered. – Peter LeFanu Lumsdaine Apr 02 '17 at 04:11
  • Is it true that you can "reconstruct" every polynomial in $\mathbb{F}[x]$ from its evaluation function $\mathbb{F}\to\mathbb{F}$, if and only if $\mathbb{F}$ has zero characteristic? And the same with polynomials in $\mathbb{F}[x_1,x_2,\ldots,x_n]$ and their corresponding maps $\mathbb{F}^n\to\mathbb{F}$. – Jeppe Stig Nielsen Apr 02 '17 at 06:46
  • Wikipedia currently states: "The polynomial function defined by a polynomial $P$ is the function from $K$ into $K$ that is defined by $x\mapsto P(x)$. If $K$ is an infinite field, two different polynomials define different polynomial functions, but this property is false for finite fields. For example, if $K$ is a field with $q$ elements, then the polynomials $0$ and $X^q-X$ both define the zero function." This must be imprecise because of infinite fields of characteristic $q>0$? – Jeppe Stig Nielsen Apr 02 '17 at 08:36
  • 2
    @JeppeStigNielsen A polynomial $f\in\Bbb F[x]$ is determined by the map on $\Bbb F$ it induces if and only if $\Bbb F$ is infinite (note: $\Bbb F$ can still have characteristic greater than $0$, e.g. $\Bbb F_p(t)$). Proof: Suppose $\Bbb F$ is finite. Then the polynomial $g(x) = \prod_{\alpha\in\Bbb F}(x - \alpha)$ vanishes everywhere on $\Bbb F$. Moreover, for any $f\in\Bbb F[x]$, $f + g\neq f$, but the two induce identical maps $\Bbb F\to\Bbb F$. Conversely, suppose $\Bbb F$ is infinite, and let $f\in\Bbb F[x]$ with $\deg f = n$. – Stahl Apr 02 '17 at 08:41
  • (ctd) The construction here shows there is a unique polynomial of degree at most $n$ passing through $n+1$ distinct points. In particular, one will know the polynomial if one knows its value at every element of $\Bbb F$, since $\Bbb F$ is infinite (thus $\Bbb F$ has at least $n+1$ elements, and one can evaluate $f$ at $n+1$ of them to obtain $n+1$ points which $f$ passes through). Because there are polynomials of arbitrarily high degree over any field, every polynomial is determined by the function it induces if and only if $\Bbb F$ is infinite. – Stahl Apr 02 '17 at 08:48
  • @Stahl But what if $\mathbb{F}$ is an infinite field of characteristic $7$. Is it not true that the polynomials $X^7-X$ and $0$ induce the same map $\mathbb{F}\to\mathbb{F}$? – Jeppe Stig Nielsen Apr 02 '17 at 08:51
  • 4
    @JeppeStigNielsen No, that's the point. $X^7 - X$ is the zero function on $\Bbb F_7$, but nonzero on any other characteristic $7$ field $K$. (Proof: $X^7 - X = \prod_{\alpha\in\Bbb F_7}(X - \alpha)$. If $\beta\in K\setminus\Bbb F_7$, then $\beta - \alpha\neq 0$ for any $\alpha\in\Bbb F_7$, so $\beta^7 - \beta$ is a product of nonzero elements of a field, hence nonzero.) – Stahl Apr 02 '17 at 08:53
  • @JeppeStigNielsen: Stahl is saying that characteristic has nothing to do with it. If the field is finite then there is a polynomial that vanishes on the entire field. If not then there is no such polynomial except the zero polynomial, and hence no two polynomials can agree everywhere. We can in fact also extend the reconstruction to rings. A polynomial over a ring $R$ can be reconstructed from its values on $R$ if an infinite field embeds into $R$. Thus multivariate polynomials over a field $F$ have unique evaluation maps iff $F$ is infinite. – user21820 Apr 02 '17 at 09:06
  • @Stahl Ah, I get it. And the two polynomials $X+X+X+X+X+X+X$ and $0$ are already equal as polynomials when we work in characteristic $7$, so it cannot make a counterexample. If one uses your $\prod_{\alpha\in\Bbb F}(x - \alpha)$ trick on $\mathbb{F}_{7^2}$, one gets an example of minimal degree 49. Would that polynomial be $X^{49}-X$? – Jeppe Stig Nielsen Apr 02 '17 at 09:19
  • @JeppeStigNielsen: Yes, because that's how the field of size $7^2$ is constructed to begin with. Basically you work in the algebraic closure of $F_7$ and take all the roots of $X \mapsto X^{49} - X$. You can check that they form a field and also contains the base field by Fermat's little theorem, and it has no repeated root because its formal derivative is $-1$. And I made a mistake in my above comment; I need $R$ to be an integral domain. – user21820 Apr 02 '17 at 09:20
  • @Stahl: I prefer the following approach. Can you check it? Take any infinite integral domain $R$ and any vanishing nonzero polynomial $f$ on $R$. Let $r$ be a sequence of distinct elements from $R$. Then define a sequence of polynomials by $f_0 = f$ and $f_k(X) = (X-r_k) f_{k+1}(X) + c_k$ where $c_k \in R$ and $f_{k+1}$ is a polynomial over $R$, by the division algorithm, such that $f_{k+1}$ has lower degree than $f_k$. By induction $f_k(r_m) = 0$ for every $m \ge k$, and so $c_k = 0$ and $f_{k+1}(r_m) = 0$ for every $m > k$ by choice of $r$. This is impossible since $r$ is infinite. – user21820 Apr 02 '17 at 09:46
  • @CJDowd This is why many algebra courses use $R[X]$ to denote the polynomial ring, rather than $R[x]$: it's all too easy to think of $p(x)$ as a function rather than just a formal expression. Since there's such a common convention that variables are in lower-case, $P(X)$ is less obviously a function and more obviously a formal expression. – Patrick Stevens Apr 02 '17 at 09:51
  • @JeppeStigNielsen: Ah I found that the two answers to this Math SE post state what I just did as well as the fact that commutativity is important, and that if the ring is commutative then a polynomial over it has finitely many roots iff the ring is an integral domain. – user21820 Apr 02 '17 at 09:51
  • 1
    @PatrickStevens: I believe the @ method of pinging only works if the user has previously participated in the thread. If so, you need to post a comment under the question to ping CJDowd. – user21820 Apr 02 '17 at 09:53
13

First of all, the formula you actually want is $$ D\left(\sum_{i = 0}^n a_i x^i\right) := \sum_{i = 1}^n i a_i x^{i - 1} $$ (you had $n$ in the index of the coefficients in the sum and $n$ instead of $i$ in the derivative). Now, to answer your questions:

First, we cannot assume that $F$ contains the natural numbers as a subset. For example, what is $D(x^2)$ if $F = \Bbb Z/(2)$?

Any commutative unital ring admits a unique unital ring homomorphism from $\Bbb Z$, determined by sending $1\mapsto 1$. So, if your field $F$ has characteristic $0$, $\Bbb Z$ can literally be thought of as a subset of $F$, via $n\mapsto\underbrace{1 + 1 + \dots + 1}_{n\textrm{ times}}$. If your field has characteristic $p$, then you may think of $\Bbb Z$ as mapping to the field, but the elements of $\Bbb Z$ become identified with their reductions modulo $p$. So if you wanted to be annoyingly precise, you could say: let $\iota : \Bbb Z\to F$ be the unique ring homomorphism sending $1\mapsto 1$. Then $$ D\left(\sum_{i = 0}^n a_i x^i\right) := \sum_{i = 1}^n \iota(i) a_i x^{i - 1}. $$ So, $D(x^2) = \iota(2) x = 0$, since $2 \equiv 0\pmod{2}$, and the map $\Bbb Z\to\Bbb Z/(2)$ is reduction modulo $2$.
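As a quick sanity check, here is a short Python sketch of this formula (illustrative only, modeling $F = \Bbb Z/(p)$ as the integers mod $p$, so that $\iota(i)$ is realized as $i \bmod p$):

```python
# Sketch: formal derivative over Z/(p), with iota(i) realized as i mod p.
def D(coeffs, p):
    """coeffs[i] = a_i; return the coefficients of sum_{i>=1} iota(i) * a_i * x^(i-1)."""
    return [((i % p) * c) % p for i, c in enumerate(coeffs)][1:]

print(D([0, 0, 1], p=2))   # x^2 over Z/(2):  [0, 0], i.e. D(x^2) = 0
print(D([0, 1, 2], p=3))   # 2x^2 + x over Z/(3):  [1, 1], i.e. x + 1
```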

Furthermore, the derivative may not be well-defined, depending on the definition of equality for polynomials. For example, if $F=\Bbb Z/(3)$, then $2x^2+x=0$ for $x\in\{0,1,2\}$, but $D(2x^2+x)=2(2)x+1=x+1\neq 0=D(0)$.

The issue here is that you're identifying a polynomial in $F[x]$ with the function $F\to F$ it defines (via $\alpha\mapsto f(\alpha)$). A polynomial is simply a (finite) formal sum with coefficients in $F$, and two polynomials $\sum a_i x^i$ and $\sum b_i x^i$ are equal if and only if $a_i = b_i$ for all $i$. As you noted, two different formal sums may define the same function $F\to F$. However, any polynomial $f\in F[x]$ also defines a function $K\to K$ for any extension field $K$ of $F$, and once you pass to a suitable $K$, two polynomials will be equal if and only if they define the same function (taking $K$ to be the algebraic closure of $F$ will always do the trick, because a polynomial $f$ over a field has exactly $\deg f$ roots counted with multiplicity over the algebraic closure, and thus is determined by its value at $\deg f + 1$ elements of the algebraic closure). So a priori, a polynomial $f\in F[x]$ has more data than the function $F\to F$ it defines!
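To see the extension-field point in action, here is a hedged Python sketch (my own construction, modeling $\Bbb F_9$ as $\Bbb F_3[t]/(t^2+1)$, one standard choice of irreducible quadratic): the polynomial $x^3 - x$ vanishes on all of $\Bbb F_3$, but not at $t\in\Bbb F_9$, so $x^3-x$ and $0$ are distinguished by the functions they define on the larger field:

```python
# Sketch: F_9 modeled as pairs (a, b) meaning a + b*t, with t^2 = -1 = 2 over F_3.
P = 3

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    # (a + b t)(c + d t) = (ac + 2bd) + (ad + bc) t,  since t^2 = 2
    a, b = u
    c, d = v
    return ((a * c + 2 * b * d) % P, (a * d + b * c) % P)

def f(x):
    """Evaluate x^3 - x, i.e. x^3 + 2x, in F_9."""
    x3 = mul(mul(x, x), x)
    return add(x3, mul((2, 0), x))

# On the prime subfield {0, 1, 2} (embedded as (a, 0)) the function vanishes...
print([f((a, 0)) for a in range(3)])   # [(0, 0), (0, 0), (0, 0)]
# ...but at t = (0, 1) it does not:
print(f((0, 1)))                       # (0, 1), i.e. t, which is nonzero
```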

Stahl
  • 23,212
  • And can we define a derivative operator $D f(a) = \lim_{h \to 0} \frac{f(a+h)-f(a)}{h}$ in the algebraic closure of finite fields? I'd say no; we can do that only in fields of characteristic $0$, i.e. those containing $\mathbb{Q}$, where $h \to 0$ and $n \to \infty$ make sense. – reuns Apr 02 '17 at 03:19
  • 2
    @user1952009 To be able to define the derivative using limits, you need to be able to define limits. Even for fields containing $\Bbb Q$, this might not always work: how would you define $h\to 0$ in $\Bbb Q((t))$? To define limits, you'll need at least a topology on your field, and that might not always suffice. For example, you could give any field the (in)discrete topology, but such a topology is rather unsuitable for analysis. – Stahl Apr 02 '17 at 03:36
6

Several of the other answers have pointed out the danger of conflating a polynomial $p$ with a polynomial function. When working over the real numbers this distinction isn't particularly important, as there is a one-to-one correspondence between the two types of objects, but over arbitrary rings (even other fields) the distinction is very important. The definition of derivative given in the OP applies to polynomials, and one way of paraphrasing the question is to ask under what conditions it can be applied consistently to polynomial functions.

With that as framing, here are a couple of observations that (I think) have not yet been made by the other answers.

Let $R$ be any ring, and let $p \in R[x]$ be any polynomial with coefficients in $R$. Note that we are thinking of $p$ here as a formal expression of the form $a_n x^n + \cdots + a_0$, where each $a_k \in R$, but we are not thinking of $p$ as a function on $R$. However, $p$ does naturally induce a function $R \to R$, and it is useful to have a notation for that function, so let's call it $\hat{p}$. We now have two different objects, $$p \in R[x]$$ and $$\hat{p} \in R^R$$ where $R^R$ denotes the set of all functions that map $R \to R$. Both $R[x]$ and $R^R$ are rings: in $R[x]$ the addition and multiplication operations are "formal" (i.e. you just use the distributive and associative properties to expand and simplify a combination of polynomials) whereas in $R^R$ the addition and multiplication operations are "pointwise" (e.g., if $f,g \in R^R$ then $f+g$ is defined to be the function that maps $r \in R$ to $f(r) + g(r)$). One can now check that the association $p \mapsto \hat{p}$ is a ring homomorphism.
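Here is a small Python sketch (illustrative only, with $R = \Bbb Z_5$ modeled as the integers mod $5$) that spot-checks the homomorphism property on a pair of polynomials, comparing $\widehat{p+q}$ with $\hat p + \hat q$ and $\widehat{pq}$ with $\hat p\,\hat q$ pointwise:

```python
# Sketch: R = Z_5 as integers mod 5; polynomials as coefficient lists (low degree first).
P = 5

def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def phi(f):
    """Functional interpretation: the tuple of values of f on all of R."""
    def ev(x):
        acc = 0
        for c in reversed(f):
            acc = (acc * x + c) % P
        return acc
    return tuple(ev(x) for x in range(P))

f = [1, 2, 0, 3]   # 1 + 2x + 3x^3
g = [4, 0, 1]      # 4 + x^2

# phi respects addition and multiplication (pointwise on the right-hand side):
print(phi(poly_add(f, g)) == tuple((a + b) % P for a, b in zip(phi(f), phi(g))))  # True
print(phi(poly_mul(f, g)) == tuple((a * b) % P for a, b in zip(phi(f), phi(g))))  # True
```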

This homomorphism (which we can call the "functional interpretation map" and denote by $\Phi$) is not, in general, one-to-one. For instance, if $R=\mathbb{Z}_p$ for some prime $p$, then the kernel of $\Phi$ is generated by $x^p - x$. This means that when $x^p - x$ is interpreted as a function, it "acts like" the (constant) zero function. Put another way, we can say that $x^p$ and $x$ are equivalent as functions even though they are distinct as polynomials. More precisely, using the notation above we can write $\widehat{x^p} = \hat{x}$ even though $x^p \ne x$.

This example also shows that the formal derivative rule in the OP does not "work" at the level of functions, because the formal derivative of $x^p$ is $p\cdot x^{p-1} = 0$, whereas the formal derivative of $x$ is $1$, so polynomials that are equivalent when interpreted as functions do not necessarily have equivalent derivatives.
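Both claims are easy to spot-check in Python for $p = 5$ (a sketch only, with $\Bbb Z_5$ modeled as the integers mod $5$): $x^5 - x$ induces the zero function, while its formal derivative is the nonzero constant $-1$:

```python
# Sketch over Z_5: x^5 - x is the zero *function*, but its formal derivative is -1.
P = 5

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def formal_derivative(coeffs):
    return [(i * c) % P for i, c in enumerate(coeffs)][1:]

f = [0, P - 1, 0, 0, 0, 1]                   # x^5 - x = x^5 + 4x over Z_5
print([evaluate(f, x) for x in range(P)])    # [0, 0, 0, 0, 0]
print(formal_derivative(f))                  # [4, 0, 0, 0, 0], i.e. the constant -1
```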

(For the general question "When is $\Phi$ one-to-one?", see https://mathoverflow.net/questions/160986/rings-for-which-no-polynomial-induces-the-zero-function.)

So the other answers to this question, which point out that the formal derivative definition applies in the context of "formal polynomials" but not necessarily in the context of "polynomial functions", are spot on. That said, it turns out that there is an alternative way to define the derivative of a polynomial with coefficients in an arbitrary ring that (a) directly engages with (rather than avoids) the interpretation of a polynomial as a function, and (b) generalizes the connection between derivatives and difference quotients, but (c) avoids the need to use limits. In the familiar context where $R=\mathbb{R}$, it reproduces the standard theory of differentiation; for general $R$, it reproduces the formal definition given in the OP.

Here's how it works:

Let $p \in R[x]$ be an arbitrary polynomial. Choose any element $a \in R$. Then we can formally divide $p$ by $x-a$ and obtain a quotient $q$ and a remainder $r$. (We can determine $q$ and $r$ using either the long division or synthetic division algorithm -- they both work just fine over arbitrary rings.) Note that $q,r \in R[x]$; we are not yet interpreting these as functions. Furthermore, since $r$ must have lower degree than the divisor $x-a$, it must be a constant, i.e. an element of $R$ itself. (Here I am using the natural embedding of $R$ as a subring of $R[x]$.) We now have the relationship $$p = (x-a)q + r$$ where $p, q \in R[x]$ and $r \in R$.

Now let's apply the functional interpretation homomorphism $\Phi$ to this equation. We find that for any $b \in R$, $$\hat{p}(b) = (b-a)\hat{q}(b) + r$$ and in particular $$\hat{p}(a) = r$$ This is the analogue of what is called (in high school algebra) "the Remainder Theorem". It tells us that we can rewrite the relationship between $p$ and $q$ as $$p = (x-a)q + \hat{p}(a)$$ or as $$p - \hat{p}(a) = (x-a)q$$
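Here is a hedged Python sketch of this division step (illustrative only, over $R = \Bbb Z_5$, using synthetic division), confirming both that $p = (x-a)q + r$ and that the remainder equals $\hat{p}(a)$:

```python
# Sketch over R = Z_5 (integers mod 5): divide p by (x - a) via synthetic division.
P = 5

def divide_by_x_minus_a(coeffs, a):
    """Return (q, r) with p = (x - a)*q + r; coefficients are listed low degree first."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for i in reversed(range(1, len(coeffs))):
        carry = (coeffs[i] + carry * a) % P
        q[i - 1] = carry
    r = (coeffs[0] + carry * a) % P
    return q, r

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

p = [2, 0, 3, 1]                  # p = x^3 + 3x^2 + 2 over Z_5
a = 4
q, r = divide_by_x_minus_a(p, a)
print(q, r)                       # [3, 2, 1] 4

# The remainder is p-hat(a):
print(r == evaluate(p, a))        # True

# And (x - a)*q + r really is p:
x_minus_a = [(-a) % P, 1]
prod = [0] * (len(x_minus_a) + len(q) - 1)
for i, u in enumerate(x_minus_a):
    for j, v in enumerate(q):
        prod[i + j] = (prod[i + j] + u * v) % P
prod[0] = (prod[0] + r) % P
print(prod == p)                  # True
```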

Interpreting the relationship $p - \hat{p}(a) = (x-a)q$ as functions and acting on an arbitrary $b\in R$, we get

$$\hat{p}(b) - \hat{p}(a) = (b-a)\hat{q}(b)$$

Now it is very tempting to rewrite the equation above in the form $\hat{q}(b)=\frac{\hat{p}(b) - \hat{p}(a)}{b-a}$. We can't really do that, since division is not defined in $R$. If $b-a$ happens to be an invertible element then we can do it, and more generally if $R$ is an integral domain then we could embed $R$ as a subring of its field of fractions, but for arbitrary $R$ we need to be more careful. However, the equation $\hat{p}(b) - \hat{p}(a) = (b-a)\hat{q}(b)$ does suggest that we can interpret $\hat{q}(b)$ as the "slope" of the "line" joining $\left( a, \hat{p}(a) \right)$ and $\left( b, \hat{p}(b) \right)$.

If we accept this interpretation as a plausible one -- and note that in the case where $R$ is a field it reduces to the standard idea that "slope is rise over run" -- then it is natural to identify $\hat{q}(a)$ as the slope of the "tangent line" to the graph of $\hat{p}$ at $a$.

If all of this seems overly abstract and formal, here are a few observations to set your mind at ease (each of these is easily verified):

  1. If $p=x^n$, so that $\hat{p}(a) = a^n$, then $\hat{q}(a) = n \cdot a^{n-1}$. In other words, the "normal" power rule is recovered under this definition (see the sketch after this list).
  2. For any fixed $a$, the mapping $p \in R[x] \mapsto \hat{q}(a)$ is linear. In other words, the "normal" linearity of the derivative operation is preserved by this generalization.
  3. The product rule works, too -- although it is notationally hard to express it, and not particularly worthwhile.
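As a check of observation 1, here is a short Python sketch (illustrative only, over $R = \Bbb Z_7$): for $p = x^n$, divide by $x - a$ and confirm that $\hat{q}(a) = n \cdot a^{n-1}$ in $R$:

```python
# Sketch: for p = x^n over Z_7, the quotient q on division by (x - a)
# satisfies q-hat(a) = n * a^(n-1) mod 7.
P = 7

def divide_by_x_minus_a(coeffs, a):
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for i in reversed(range(1, len(coeffs))):
        carry = (coeffs[i] + carry * a) % P
        q[i - 1] = carry
    r = (coeffs[0] + carry * a) % P
    return q, r

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

for n in range(1, 6):
    for a in range(P):
        p = [0] * n + [1]                  # the polynomial x^n
        q, _ = divide_by_x_minus_a(p, a)
        assert evaluate(q, a) == (n * pow(a, n - 1, P)) % P
print("power rule recovered for all tested n and a")
```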
mweiss
  • 23,647