Set passes Dieharder randomness tests with int values, fails when ints are split into bytes

Question

I've generated 100M random numbers in the range 0..255. These numbers fail the Dieharder tests (numbit = 8). However, the same data passes when I combine every four bytes into one int. The randomness of the set didn't change, so something must be wrong with the test settings. What could cause this?

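    from random import randint

    def prep_input(file_name, number_of_iterations, numbit=32):
        max_rand = 2 ** numbit - 1  # largest numbit-bit value, e.g. 255 for numbit=8
        print(max_rand)
        width = len(str(max_rand))  # pad every value to the same width
        # Append mode: anything already in the file (e.g. a header) is kept
        with open(file_name, "a") as f:
            for _ in range(number_of_iterations):
                f.write(str(randint(0, max_rand)).rjust(width) + "\n")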
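
For the int version, the question combines four bytes into one int. A minimal sketch of that packing step, assuming big-endian byte order (the post does not say which order was used); pack_bytes_to_uint32 is a hypothetical helper name, not something from the post:

```python
def pack_bytes_to_uint32(values, byteorder="big"):
    """Combine consecutive groups of four 8-bit values into 32-bit unsigned ints."""
    data = bytes(values)  # raises ValueError if any value is outside 0..255
    if len(data) % 4:
        raise ValueError("number of bytes must be a multiple of 4")
    # Each 4-byte slice becomes one integer in the range 0 .. 2**32 - 1
    return [int.from_bytes(data[i:i + 4], byteorder) for i in range(0, len(data), 4)]
```

The packed values span the full 32-bit range, so a file holding them would carry numbit: 32 in its header rather than numbit: 8.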

With the header, the file looks like this:

    #==================================================================
    # Dieharder
    #==================================================================
    type: d
    count: 16180337
    numbit: 8
    255
      0
     93
     ....
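
Since prep_input only appends values, the header has to be written first. A minimal sketch of that step, assuming the header layout is copied exactly from the file shown; dieharder_header and write_dieharder_header are hypothetical helpers, not part of Dieharder:

```python
# Separator line copied verbatim from the file shown above
RULE = "#=================================================================="


def dieharder_header(count: int, numbit: int) -> str:
    """Build the header block in the same layout as the file above."""
    return "\n".join([
        RULE,
        "# Dieharder",
        RULE,
        "type: d",
        f"count: {count}",
        f"numbit: {numbit}",
    ]) + "\n"


def write_dieharder_header(file_name: str, count: int, numbit: int) -> None:
    # "w" truncates the file, so call this before prep_input(), which appends
    with open(file_name, "w") as f:
        f.write(dieharder_header(count, numbit))
```

For example, write_dieharder_header("input.txt", 16180337, 8) followed by prep_input("input.txt", 16180337, numbit=8) would produce a file in the layout above.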

The test is run with:

    dieharder -a -g 202 -f input.txt >>result.txt

Report on bytes: Set fails when bytes are used
Report on ints: Set passes tests when ints are used

Update:

I've tested up to 4 GB of data. This is close to $2^{32}$ bytes, although the ASCII format needs about 4 characters (4 bytes) per byte of actual data.
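
The arithmetic behind that remark, as a small sketch: with rjust(3) padding, each 8-bit value takes three digits plus a newline. Reading "4 GB" as 4 GiB is an assumption.

```python
# Each 8-bit value occupies rjust(3) digits plus a newline in the ASCII file
bytes_per_value = len(str(255).rjust(3) + "\n")  # 4 characters per value

ascii_file_size = 4 * 2**30  # assumption: "4 GB" taken as 4 GiB
random_bytes = ascii_file_size // bytes_per_value

print(random_bytes)           # 1073741824, i.e. 2**30
print(2**32 // random_bytes)  # 4: only a quarter of 2**32 bytes of actual data
```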
When this file is converted to a raw byte stream, it passes several tests and then reaches the end of the data. When the converted data is concatenated, it passes the tests better than the ASCII int version.
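
One way to do that conversion, as a minimal sketch: skip the header and write each value as a single raw byte. parse_values and ascii_to_raw are hypothetical helpers; the header-skipping rule (blank lines, lines starting with "#", and "key: value" lines) is an assumption based on the file layout shown above.

```python
def parse_values(lines):
    """Yield the integer values from a Dieharder ASCII file, skipping the header."""
    for line in lines:
        s = line.strip()
        # Skip blank lines, "#" comment lines and "key: value" header lines
        if not s or s.startswith("#") or ":" in s:
            continue
        yield int(s)


def ascii_to_raw(src_name, dst_name, chunk_size=1 << 20):
    """Write each 8-bit value as one raw byte; return the number of bytes written."""
    written = 0
    buf = bytearray()
    with open(src_name) as src, open(dst_name, "wb") as dst:
        for value in parse_values(src):
            buf.append(value)  # raises ValueError if value is outside 0..255
            if len(buf) >= chunk_size:  # write in chunks to bound memory use
                dst.write(buf)
                written += len(buf)
                buf.clear()
        dst.write(buf)
        written += len(buf)
    return written
```

The resulting file could then be tested as raw input, assuming -g 201 selects Dieharder's raw-file generator (file_input_raw): dieharder -a -g 201 -f input.bin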
Python's random output can also be fed to Dieharder as binary through a pipe, using this script (to_terminal.py):

    import sys
    from random import randint

    j = 0
    div = 2

    while True:
        # Write one raw byte (0..255). On Python 3, writing chr(...) to the
        # text-mode sys.stdout would UTF-8-encode values >= 128 as two bytes.
        sys.stdout.buffer.write(bytes([randint(0, 255)]))
        sys.stdout.buffer.flush()
        j += 1
        # Log the byte count at powers of two: 2, 4, 8, ...
        if j % div == 0:
            with open("progress.log", "a") as dh_file:
                dh_file.write(str(j) + "\n")
            div *= 2

Running

    python to_terminal.py | dieharder -a -g 200 >>test_result.dhres

the last record in progress.log is 16G (17179869184 = $2^{34}$ bytes), and out of 75 tests only two are weak; the rest pass. So Python's generator passes Dieharder when supplied as a binary stream. Clearly, the problem follows the 8-bit ASCII representation.

Randomness is fundamentally impossible to measure. All statistical tests simply estimate randomness in different ways. I don't think it's surprising that different statistical tests produce different results. (Concretely, the output does look like something may be wrong, though.) — yyyyyyy, Jan 20 '19 at 23:29
Standard PRNG, 90% increase in pass rate thanks to data aggregation? There must be something wrong with the DH run parameters. — Stepan, Jan 21 '19 at 01:18
Run the basic ENT test across your bytified data. If it passes, you've done the Dieharder test wrong. If it fails ENT badly, you've converted the data wrong. Then take it from there... — Paul Uszak, Jan 21 '19 at 04:19
dieharder -a requires much more data (on the order of $2^{32}$ bytes); see the citations from the manual there. Also, testing a would-be CS(P)RNG using Diehard is somewhat like testing a bathyscaphe in a pool: not detecting a failure allows no operational conclusion to be drawn. — fgrieu, Jan 21 '19 at 13:45
@fgrieu The "bathyscaphe in a pool" (sanity test) is informative in this case. It turns out that the bathyscaphe sinks if the hatch is open (bytes written as ASCII) but remains operational with the hatch closed (int representation). Did it convey any information? Sure: when we test the bathyscaphe in high seas, we must make sure that the hatch is closed. Flipping from 90% fail to 90% pass means DH clearly misunderstands the format of the data, and this has nothing to do with the Python generator itself. — Stepan, Jan 21 '19 at 19:23
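
Regarding the suggestion to run ENT on the bytified data: the sketch below is not ENT, only a minimal Python version of two of the statistics ENT reports (a chi-square test on byte frequencies and the arithmetic mean), usable as a quick check that the converted bytes are not grossly skewed. byte_stats is a hypothetical helper. For uniformly random bytes the mean should be close to 127.5, and the chi-square statistic has 255 degrees of freedom, so its expected value is 255.

```python
from collections import Counter


def byte_stats(data: bytes):
    """Return (chi_square, mean) for the byte values in data.

    chi_square compares the 256 byte frequencies with a uniform distribution
    (255 degrees of freedom); mean is the arithmetic mean of the byte values.
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    expected = n / 256
    chi_square = sum((counts[b] - expected) ** 2 / expected for b in range(256))
    mean = sum(data) / n
    return chi_square, mean


# Usage (hypothetical file name):
# with open("input.bin", "rb") as f:
#     print(byte_stats(f.read()))
```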
