165

Why is x**4.0 faster than x**4? I am using CPython 3.5.2.

$ python -m timeit "for x in range(100):" " x**4.0"
  10000 loops, best of 3: 24.2 usec per loop

$ python -m timeit "for x in range(100):" " x**4"
  10000 loops, best of 3: 30.6 usec per loop

I tried changing the exponent to see how the timing behaves. For example, if I raise x to the power of 10 or 16, the time jumps from 30 to 35 usec, but if I raise by 10.0 as a float, it just hovers around 24.1~24.4 usec.

I guess it has something to do with float conversion and powers of 2 maybe, but I don't really know.

I noticed that in both cases powers of 2 are faster, I guess since those calculations are more native/easy for the interpreter/computer. But still, with floats it's almost not moving: 2.0 => 24.1~24.4 usec and 128.0 => 24.1~24.4 usec, while 2 => 29 usec and 128 => 62 usec.
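
(For completeness, the same experiment can be run from inside Python with the timeit module; this is just a sketch and the absolute numbers will differ per machine:)

import timeit

# Same comparison as the shell commands above; expect the float
# exponent to come out faster on CPython 3.5 (numbers vary per machine).
print(timeit.timeit('for x in range(100): x**4', number=10000))
print(timeit.timeit('for x in range(100): x**4.0', number=10000))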


TigerhawkT3 pointed out that it doesn't happen outside of the loop. I checked, and the situation only occurs (from what I've seen) when the base is a variable that's getting raised. Any idea about that?
arieljannai
  • For what it's worth: Python 2.7.13 for me is a factor 2~3 faster, *and* shows the inverse behaviour: an integer exponent is faster than a floating-point exponent. – Evert Feb 20 '17 at 22:20
  • @Evert yup, I got 14 usec for `x**4.0` and 3.9 for `x**4`. – dabadaba Feb 20 '17 at 22:21

3 Answers

163

Why is x**4.0 faster than x**4 in Python 3*?

Python 3 int objects are full-fledged objects designed to support arbitrary size; because of that, they are handled as such on the C level (see how all variables are declared as PyLongObject * type in long_pow). This also makes their exponentiation a lot trickier and more tedious, since you need to play around with the ob_digit array the object uses to represent its value. (Source for the brave. -- See: Understanding memory allocation for large integers in Python for more on PyLongObjects.)
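
You can glimpse that arbitrary-size representation from the Python side. A small sketch (the exact sizes assume a typical 64-bit CPython build that stores 30-bit digits; they may differ on your platform):

import sys

# The int object grows with its value: each additional 30-bit "digit"
# in the ob_digit array adds a few bytes to the object's size.
for bits in (1, 30, 60, 90):
    print(bits, sys.getsizeof(1 << bits))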

Python float objects, on the contrary, can be transformed to a C double (by using PyFloat_AsDouble), and operations can be performed using that native type. This is great because, after checking for relevant edge cases, it allows Python to use the platform's pow (C's pow, that is) to handle the actual exponentiation:

/* Now iv and iw are finite, iw is nonzero, and iv is
 * positive and not equal to 1.0.  We finally allow
 * the platform pow to step in and do the rest.
 */
errno = 0;
PyFPE_START_PROTECT("pow", return NULL)
ix = pow(iv, iw); 

where iv and iw are our original PyFloatObjects as C doubles.
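
You can poke at this from Python, too: for finite float operands, ** and math.pow should both bottom out in the platform's C pow, so their results ought to agree. A sketch (assuming a typical platform libm):

import math

# Both expressions reduce to the platform's pow(iv, iw) for these
# operands, so the results should compare equal.
x, y = 1.2345, 4.0
print(x ** y == math.pow(x, y))  # expected: True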

For what it's worth: Python 2.7.13 for me is a factor 2~3 faster, and shows the inverse behaviour.

The previous fact also explains the discrepancy between Python 2 and 3, so I thought I'd address this comment too because it is interesting.

In Python 2, you're using the old int object, which differs from the int object in Python 3 (all int objects in 3.x are of PyLongObject type). In Python 2, there's a distinction that depends on the value of the object (or on whether you use the L/l suffix):

# Python 2
type(30)  # <type 'int'>
type(30L) # <type 'long'>

The <type 'int'> you see here does the same thing floats do: it gets safely converted into a C long when exponentiation is performed on it. (int_pow also hints to the compiler to put the values in registers if it can, and that could make a difference):

static PyObject *
int_pow(PyIntObject *v, PyIntObject *w, PyIntObject *z)
{
    register long iv, iw, iz=0, ix, temp, prev;
/* Snipped for brevity */    

this allows for a good speed gain.
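
As a sketch of the value-dependent distinction mentioned above (a hypothetical Python 2 session; the promotion threshold is sys.maxint, so it depends on the build):

# Python 2
type(2 ** 30)   # <type 'int'>  (the result fits in a C long)
type(2 ** 100)  # <type 'long'> (auto-promoted past sys.maxint)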

To see how sluggish <type 'long'>s are in comparison to <type 'int'>s, wrap the x name in a long call in Python 2 (essentially forcing it to use long_pow, as in Python 3), and the speed gain disappears:

# <type 'int'>
(python2) ➜ python -m timeit "for x in range(1000):" " x**2"       
10000 loops, best of 3: 116 usec per loop
# <type 'long'> 
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"
100 loops, best of 3: 2.12 msec per loop

Take note that, though one snippet transforms the int to long while the other does not (as pointed out by @pydsigner), this cast is not the contributing force behind the slowdown; the implementation of long_pow is. (Time the statements solely with long(x) to see.)
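
A sketch of that control experiment (timings omitted since they vary per machine): if the cast itself were the culprit, the first command alone would have to account for most of the 2.12 msec, and per the note above it does not.

# the <type 'int'> -> <type 'long'> cast alone
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)"
# the cast plus the long_pow exponentiation
(python2) ➜ python -m timeit "for x in range(1000):" " long(x)**2"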

[...] it doesn't happen outside of the loop. [...] Any idea about that?

This is CPython's peephole optimizer folding the constants for you. You get the exact same timings in either case because there's no actual computation to find the result of the exponentiation, only loading of values:

dis.dis(compile('4 ** 4', '', 'exec'))
  1           0 LOAD_CONST               2 (256)
              3 POP_TOP
              4 LOAD_CONST               1 (None)
              7 RETURN_VALUE

Identical byte-code is generated for '4 ** 4.' with the only difference being that the LOAD_CONST loads the float 256.0 instead of the int 256:

dis.dis(compile('4 ** 4.', '', 'exec'))
  1           0 LOAD_CONST               3 (256.0)
              2 POP_TOP
              4 LOAD_CONST               2 (None)
              6 RETURN_VALUE

So the times are identical.
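
To see that it really is the folding that equalizes the constant case, disassemble an expression with a name in it; a sketch (exact offsets depend on your CPython version):

import dis

# With a variable base there is nothing to fold: the output shows
# LOAD_NAME x and LOAD_CONST 4 followed by a real BINARY_POWER opcode.
dis.dis(compile('x ** 4', '', 'exec'))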


*All of the above applies solely to CPython, the reference implementation of Python. Other implementations might perform differently.

Dimitris Fasarakis Hilliard
  • Whatever it is, it's related to the loop over a `range`, as timing only the `**` operation itself yields no difference between integers and floats. – TigerhawkT3 Feb 20 '17 at 22:47
  • The difference only appears when looking up a variable (`4**4` is just as fast as `4**4.0`), and this answer doesn't touch on that at all. – TigerhawkT3 Feb 21 '17 at 00:22
  • But, constants will get folded @TigerhawkT3 (`dis(compile('4 ** 4', '', 'exec'))`) so the time should be *exactly* the same. – Dimitris Fasarakis Hilliard Feb 21 '17 at 00:25
  • Your last timings seem not to show what you say. `long(x)**2.` is still faster than `long(x)**2` by a factor of 4-5. (Not one of the downvoters, though) – Graipher Feb 21 '17 at 10:50
  • @TigerhawkT3 @JimFasarakis-Hilliard The question began as a simple one, without noticing that the effect only happens in a loop (and only when the loop variable is involved in the pow operation - `i**2` or `2**i`). But now, although there's a difference in the underlying C code behavior, it doesn't seem to have such an effect on pow operations that don't involve a running index. So how should we continue from here? The question changed, and the answers are good, but for the starting question. I'd like to hear your advice on how to continue with the question, regarding the SO community standards. – arieljannai Feb 21 '17 at 11:47
  • @Graipher but that was my point. Objects of `<type 'long'>` in Python 2 display the same speed discrepancy as they do in Python 3 because they are implemented the same way. On the other hand, objects of `<type 'int'>` in Python 2 (which *don't exist in Python 3*) are way speedier, as people who've timed it have claimed. – Dimitris Fasarakis Hilliard Feb 21 '17 at 13:13
  • @arieljannai outside a loop the values get folded by the interpreter so the operations are exactly the same, there's no additional computation required and no speed difference displayed. What do you mean by running on an index? If you're talking about operations other than `pow` then, of course, those *would* display other behavior since they're implemented differently. Could you elaborate on what still troubles you with this answer? – Dimitris Fasarakis Hilliard Feb 21 '17 at 13:42
  • @JimFasarakis-Hilliard Oh! The part I was missing is that the interpreter handles it. What I meant by the index is that inside a loop, if I just run `2**17` or `2**17.0` there won't be such a difference, but if I'm using the loop variable (`for i in ...`), `i**17` will be much slower than `i**17.0`. But I believe it's still the same reason, since when it's a constant calculation the interpreter acts on it as if it were outside the loop and called many times. – arieljannai Feb 21 '17 at 14:02
  • So when it's a constant calculation it gets folded by the interpreter (whether in a loop or not), but when a running index is involved it gets calculated in the loop and translated to what you've shown. Thanks! – arieljannai Feb 21 '17 at 14:04
  • So why did Python 3 make such a change if it has negative speed implications and can no longer use native types for integer operations? – mbomb007 Feb 21 '17 at 16:34
  • @mbomb007 the elimination of the `<type 'long'>` type in Python 3 is probably explained by the efforts made to simplify the language. If you can have one type to represent integers, it is more manageable than two (and you avoid worrying about converting from one to the other when necessary, users getting confused, etc.). The speed gain is secondary to that. The rationale section of [PEP 237](https://www.python.org/dev/peps/pep-0237/) also offers some more insight. – Dimitris Fasarakis Hilliard Feb 21 '17 at 17:19
  • I'd like to throw out there that testing the Python 2 speed of `long(x) ** n` versus `x ** n` is a bit of a red herring, as you're explicitly casting the int `x` to a long. I'd be curious to see the speed comparison of `4L ** n` to `4 ** n`. – pydsigner Feb 21 '17 at 22:05
  • @pydsigner Good catch! You are indeed correct; I hadn't thought of that. I don't want to deviate from the tests the OP used in his question, so I'll just go on to state how fast `long(int_object)` is instead. – Dimitris Fasarakis Hilliard Feb 21 '17 at 22:15
25

If we look at the bytecode, we can see that the expressions are purely identical. The only difference is the type of the constant that will be an argument of BINARY_POWER. So it's most certainly due to an int being converted to a floating-point number down the line.

>>> def func(n):
...    return n**4
... 
>>> def func1(n):
...    return n**4.0
... 
>>> from dis import dis
>>> dis(func)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4)
              6 BINARY_POWER
              7 RETURN_VALUE
>>> dis(func1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (4.0)
              6 BINARY_POWER
              7 RETURN_VALUE
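
For what it's worth, a sketch of tying this bytecode back to wall-clock time, reusing func and func1 from above (run in the same session; exact figures will vary, but on CPython 3.5 the float version should win):

import timeit

# BINARY_POWER dispatches on the operand types at runtime, which is
# why two otherwise-identical functions can time differently.
print(timeit.timeit('func(5)', setup='from __main__ import func'))
print(timeit.timeit('func1(5)', setup='from __main__ import func1'))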

Update: let's take a look at Objects/abstract.c in the CPython source code:

PyObject *
PyNumber_Power(PyObject *v, PyObject *w, PyObject *z)
{
    return ternary_op(v, w, z, NB_SLOT(nb_power), "** or pow()");
}

PyNumber_Power calls ternary_op, which is too long to paste here, so here's the link.

It calls the nb_power slot of v, passing w as an argument.

Finally, in float_pow() at line 686 of Objects/floatobject.c we see that arguments are converted to a C double right before the actual operation:

static PyObject *
float_pow(PyObject *v, PyObject *w, PyObject *z)
{
    double iv, iw, ix;
    int negate_result = 0;

    if ((PyObject *)z != Py_None) {
        PyErr_SetString(PyExc_TypeError, "pow() 3rd argument not "
            "allowed unless all arguments are integers");
        return NULL;
    }

    CONVERT_TO_DOUBLE(v, iv);
    CONVERT_TO_DOUBLE(w, iw);
    ...
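
That slot dance is observable from pure Python as well. A sketch: int's power slot refuses a float exponent, which is what lets float's slot (and CONVERT_TO_DOUBLE) take over:

# int's nb_power bails out when the exponent is a float...
print((2).__pow__(4.0))   # NotImplemented
# ...so float's slot handles it, converting both operands to C doubles.
print((4.0).__rpow__(2))  # 16.0
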
leovp
  • Why the downvote? Conversion/variable type checking seems to be the issue here. There's no speed difference with literals, between `12.0**40.0` and `12**40` for instance. – Jean-François Fabre Feb 20 '17 at 22:32
  • @Jean-FrançoisFabre I believe that's due to constant folding. – Dimitris Fasarakis Hilliard Feb 20 '17 at 22:35
  • I think the implication that there is a conversion and they aren't handled differently down the line "most certainly" is a bit of a stretch without a source. – miradulo Feb 20 '17 at 22:36
  • I also thought that way but couldn't really find any source for it – arieljannai Feb 20 '17 at 22:36
  • The documentation on *BINARY_POWER* isn't very indicative on that subject either; it's just taking from the stack and raising. – arieljannai Feb 20 '17 at 22:39
  • @Mitch - Particularly since, in this particular code, there's no difference in the execution time for those two operations. The difference only arises with the OP's loop. This answer is jumping to conclusions. – TigerhawkT3 Feb 20 '17 at 22:41
  • Why are you only looking at `float_pow` when that doesn't even run for the slow case? – user2357112 Feb 20 '17 at 23:43
  • The difference only appears when looking up a variable (`4**4` is just as fast as `4**4.0`), and this answer doesn't touch on that at all. – TigerhawkT3 Feb 21 '17 at 00:23
  • @TigerhawkT3: `4**4` and `4**4.0` get constant-folded. That's an entirely separate effect. – user2357112 Feb 21 '17 at 01:01
  • Would you change the line "it's most certainly due to an int being converted to a floating point number down the line"? Though a valid initial guess, this isn't the root cause (generally reformat your answer to include the update more gracefully.) – Dimitris Fasarakis Hilliard Feb 22 '17 at 16:08
2

Because one is exact, the other is an approximation.

>>> 334453647687345435634784453567231654765 ** 4.0
1.2512490121794596e+154
>>> 334453647687345435634784453567231654765 ** 4
125124901217945966595797084130108863452053981325370920366144
719991392270482919860036990488994139314813986665699000071678
41534843695972182197917378267300625
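
A sketch of how much the approximation discards (assuming IEEE-754 doubles, which carry roughly 17 significant decimal digits):

n = 334453647687345435634784453567231654765
exact = n ** 4     # arbitrary-precision int: every digit is exact
approx = n ** 4.0  # C double: only ~17 significant digits survive
print(len(str(exact)))  # 155 digits
print(approx == exact)  # False: the double is a rounded value
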
Veky
  • I don't know why that downvoter downvoted but I did because this answer doesn't answer the question. Just because something is correct does not in any way imply it is faster or slower. One is slower than the other because one can work with C types while the other has to work with Python Objects. – Dimitris Fasarakis Hilliard Jan 01 '18 at 16:49
  • Thanks for the explanation. Well, I really thought it was obvious that it's faster to calculate just an approximation of a number to 12 or so digits than to calculate all of them exactly. After all, the only reason we use approximations is that they are faster to calculate, right? – Veky Jan 01 '18 at 18:02