If we could get a file's binary form, would this method of file compression work?

Question

I'm not quite sure how easy it is to fetch the binary composition of a file, but suppose we have some file with this representation:

010011

We could make 2 arrays.

One which stores the position of the 0s: ['x', '', 'x', 'x', '', '']

And another which stores the position of the 1s (which is simply the opposite of the previous array): ['', 'x', '', '', 'x', 'x']

Then we can persist these two arrays into a file, and voila?

I don't know, is there something I'm not realizing?

I'm not an expert in compression; I was just wondering if this would work.

How much space would it take up to store the position of each of the 0's and 1's? — Robert Harvey, Jul 21 '14 at 04:51
Yeah, it depends on the language I guess; for instance, a boolean in an array takes one byte in some languages. I suppose the idea could work if, instead of marking each location, we entered the starting and ending locations for each run of repeating 0s and 1s. I don't know, I guess it was a bad idea, hehe — zeroRooter, Jul 21 '14 at 04:57
Your two arrays take more space than the original. Thus, it is not "compression". — Gort the Robot, Jul 21 '14 at 05:14
See also BWT, MTF and then RLE. Also, why would it be difficult to "get a file's binary form"? — Elliott Frisch, Jul 21 '14 at 06:35
Also read Pigeonhole principle, which is a tool that can help find loopholes in one's arguments about a compression scheme. — rwong, Jul 21 '14 at 11:27
If anything, this is more of a parity/error-detecting code (though not a very good one): http://en.wikipedia.org/wiki/Repetition_code — jk., Jul 21 '14 at 12:53
A binary file is a file that stores the position of the 1's and of the 0's ............sequentially. — Pieter B, Jul 21 '14 at 12:58
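
The variation suggested in zeroRooter's comment, recording runs of repeated bits instead of every position, is essentially run-length encoding (the RLE mentioned in Elliott Frisch's comment). A minimal Python sketch follows, assuming the bits are held as a string of '0' and '1' characters; the names `rle_encode` and `rle_decode` are illustrative and do not come from the thread.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical bits into a (bit, run_length) pair."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Rebuild the original bit string from (bit, run_length) pairs."""
    return "".join(bit * count for bit, count in runs)


short = "010011"
long_runs = "0000000011111111"

print(rle_encode(short))      # [('0', 1), ('1', 1), ('0', 2), ('1', 2)]
print(rle_encode(long_runs))  # [('0', 8), ('1', 8)]
```

For the example input 010011 the runs are so short that the list of pairs is larger than the six original bits, while the input with two long runs collapses to two pairs. This only pays off when the data contains long runs, and by the pigeonhole principle no lossless scheme can shrink every input.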

score 7 · Answer 1 · edited Jul 21 '14 at 12:59

The most compact possible representation of your arrays would be 1 bit per entry. You have two arrays, each of length 6. I.e. your compressed file is 6+6 bits long, while your original file is 6 bits long. This is an increase of 100%.

Also, as @jk pointed out in his comment: your second array is identical to your input data. The first array is identical to the inverse of your input data.

       010011  # original
101100 010011  # 'compressed'

So, not only is your compressed data twice as long, it also contains an exact copy of the original data.
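
The comparison above can be checked with a short Python sketch. It assumes the most compact encoding described in the answer (one bit per array entry, with 1 standing for 'x'); the function name `position_arrays` is illustrative only.

```python
def position_arrays(bits: str) -> tuple[str, str]:
    """Return (zeros_mask, ones_mask), each using one bit per entry.

    A '1' in zeros_mask marks a position where the input holds a 0;
    a '1' in ones_mask marks a position where the input holds a 1.
    """
    zeros_mask = "".join("1" if b == "0" else "0" for b in bits)
    ones_mask = "".join("1" if b == "1" else "0" for b in bits)
    return zeros_mask, ones_mask


original = "010011"
zeros_mask, ones_mask = position_arrays(original)

print(zeros_mask, ones_mask)                 # 101100 010011
print(ones_mask == original)                 # True
print(len(zeros_mask) + len(ones_mask))      # 12
```

The ones array reproduces the input bit for bit, and the zeros array is its complement, so the second array carries no information that the first does not.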

answered Jul 21 '14 at 10:54 by Jörg W Mittag · edited Jul 21 '14 at 12:59 by jk.

Additionally, one of the arrays is the original file, while the other is the complement of the original file – jk. Jul 21 '14 at 11:41
