I have a WebGL circuit simulator. One of its problems is that, because it uses quite a lot of intermediate float textures as it simulates, it doesn't work on various mobile devices, which only support byte textures.
My intended solution to this problem is to encode the high-precision (i.e. 32-bit) floats as bytes. Every output float is packed into a nearly-IEEE format (I put the sign bit at the other end to avoid a few shifts, I don't do denormalized values, and I don't do infinities/NaNs). Similarly, every input is unpacked before being used.
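To pin down the byte layout I'm describing, here is a rough CPU-side reference encoder (the function packFloatReference is just for illustration, not part of the simulator): it pulls the IEEE bits apart directly, putting the biased exponent in the first byte and the sign in the lowest bit of the last byte.

// CPU-side sketch of the intended layout (illustrative only):
// byte 0 = biased exponent, bytes 1-2 = high/mid mantissa bits,
// byte 3 = low 7 mantissa bits shifted left once, with the sign in bit 0.
function packFloatReference(value) {
    // Zero (including -0) packs to all zero bytes, matching what I want the shader to do.
    if (value === 0) {
        return [0, 0, 0, 0];
    }
    var bits = new Uint32Array(new Float32Array([value]).buffer)[0];
    var sign = bits >>> 31;
    var exponent = (bits >>> 23) & 0xFF;  // already biased by 127
    var mantissa = bits & 0x7FFFFF;       // the 23 explicit mantissa bits
    return [
        exponent,
        (mantissa >>> 15) & 0xFF,         // mantissa bits 22..15
        (mantissa >>> 7) & 0xFF,          // mantissa bits 14..7
        ((mantissa & 0x7F) << 1) | sign   // mantissa bits 6..0, sign in bit 0
    ];
}

For example, packFloatReference(-0.20717763900756836) gives [124, 168, 76, 193], which is what I expect the shader to produce.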
I have found various blog posts and answers related to this task out on the internet (example 1, example 2, example 3), but I haven't found any that work properly on all finite non-denormalized floats.
The problem I'm running into is precision. I want to round-trip the floats without introducing any error, but I can't seem to make a shader that preserves all 23 bits of the mantissa. There always seems to be some rounding on some machine that loses the last bit; tweaking the code only perturbs which cases get rounded, and the rounding shows up in different places on the various machines I've tested.
Here is my packing method:
vec4 packFloatIntoBytes(float val) {
    if (val == 0.0) {
        return vec4(0.0, 0.0, 0.0, 0.0);
    }
    float mag = abs(val);
    float exponent = floor(log2(mag));
    // Correct log2 approximation errors.
    exponent += float(exp2(exponent) <= mag / 2.0);
    exponent -= float(exp2(exponent) > mag);
    float mantissa;
    if (exponent > 100.0) {
        // Not sure why this needs to be done in two steps for the largest float to work.
        // Best guess is the optimizer rewriting '/ exp2(e)' into '* exp2(-e)',
        // but exp2(-128.0) is too small to represent.
        mantissa = mag / 1024.0 / exp2(exponent - 10.0) - 1.0;
    } else {
        mantissa = mag / float(exp2(exponent)) - 1.0;
    }
    // First byte: the biased exponent.
    float a = exponent + 127.0;
    // Second and third bytes: the top 16 mantissa bits, peeled off 8 at a time.
    mantissa *= 256.0;
    float b = floor(mantissa);
    mantissa -= b;
    mantissa *= 256.0;
    float c = floor(mantissa);
    mantissa -= c;
    // Fourth byte: the low 7 mantissa bits, with the sign in the lowest bit.
    mantissa *= 128.0;
    float d = floor(mantissa) * 2.0 + float(val < 0.0);
    // Scale to [0, 1] for output to an unsigned byte texture.
    return vec4(a, b, c, d) / 255.0;
}
And here's my unpacking method:
float unpackBytesIntoFloat(vec4 v) {
    // Recover the byte values; the +0.5 / floor rounds away error from the 255 scaling.
    float a = floor(v.r * 255.0 + 0.5);
    float b = floor(v.g * 255.0 + 0.5);
    float c = floor(v.b * 255.0 + 0.5);
    float d = floor(v.a * 255.0 + 0.5);

    float exponent = a - 127.0;
    float sign = 1.0 - mod(d, 2.0) * 2.0;
    // The implicit leading 1 is only present when the exponent byte is non-zero,
    // so an all-zero encoding decodes back to 0.
    float mantissa = float(a > 0.0)
                   + b / 256.0
                   + c / 65536.0
                   + floor(d / 2.0) / 8388608.0;
    return sign * mantissa * exp2(exponent);
}
This method is close. It works on my laptop, as far as I can tell. But it doesn't work on my Nexus tablet. For example, the float -0.20717763900756836 should be encoded as [124, 168, 76, 193]. When I unpack that and then repack it on the Nexus tablet, the output is one ulp lower: [124, 168, 76, 195] (which encodes -0.2071776[5390872955]). Close, but I want perfect.
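Decoding both byte vectors with the CPU sketch above confirms they differ only in the lowest mantissa bit:

unpackBytesReference([124, 168, 76, 193]);  // -0.20717763900756836
unpackBytesReference([124, 168, 76, 195]);  // -0.20717765390872955, one ulp lower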
Mostly I'm at a loss trying to figure out where the precision is being destroyed in this method. Changes almost seem to have random effects: replacing x * exp2(n) with x / exp2(-n) might fix an error in one place but introduce one in another.
Is there any exact way to pack floats into bytes, without losing precision?
Some example values that such a method should work on:
var testValues = new Float32Array([
    0,
    0.5,
    1,
    2,
    -1,
    1.1,
    42,
    16777215,
    16777216,
    16777218,
    0.9999999403953552, // An ulp below 1.
    1.0000001192092896, // An ulp above 1.
    Math.pow(2.0, -126), // Smallest non-denormalized 32-bit float.
    0.9999999403953552 * Math.pow(2.0, 128), // Largest finite 32-bit float.
    Math.PI,
    Math.E
]);
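For reference, this is roughly how I sanity-check a candidate encoding on the CPU, using the sketch functions above; the requirement is that the round trip reproduces each float32 value exactly.

testValues.forEach(function (v) {
    // v comes out of the Float32Array already rounded to 32-bit precision.
    var roundTripped = unpackBytesReference(packFloatReference(v));
    if (roundTripped !== v) {
        console.log('Round trip failed for ' + v + ': got ' + roundTripped);
    }
});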