Questions tagged [data-compression]
269 questions
37
votes
5 answers
Is there a known maximum for how much a string of 0's and 1's can be compressed?
A long time ago I read a newspaper article where a professor of some sort said that in the future we will be able to compress data to just two bits (or something like that).
This is of course not correct (and it could be that my memory of what he…

x457812
- 481
- 4
- 5
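
A quick counting argument relevant to the question above (my note, not part of the question): there are $2^n$ distinct bit strings of length $n$, but only $2^0 + 2^1 + \dots + 2^{n-1} = 2^n - 1$ strings shorter than $n$ bits, so no lossless scheme can map every $n$-bit input to a strictly shorter output. In particular, a universal "compress anything down to two bits" claim is ruled out by the pigeonhole principle.
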
22
votes
7 answers
Why are these (lossless) compression methods of many similar png images ineffective?
I just came across the following thing: I put multiple identical copies of a png image into a folder and then tried to compress that folder with the following methods:
tar czf folder.tar.gz folder/
tar cf folder.tar folder/ && xz --stdout…

a_guest
- 323
- 2
- 6
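
For the PNG question above, a small experiment (my illustration, not the asker's setup) showing the two effects usually at work: the image data is already DEFLATE-compressed, so it looks essentially random to a second compressor, and gzip can only match against the previous 32 KiB of input, so it never notices that one copy repeats an earlier one; xz, with a much larger dictionary, can.

    import lzma
    import os
    import zlib

    blob = os.urandom(1_000_000)   # stand-in for one already-compressed image
    doubled = blob + blob          # two identical "files" back to back

    print(len(zlib.compress(doubled, 9)))  # ~2 MB: the 32 KiB window cannot reuse the first copy
    print(len(lzma.compress(doubled)))     # ~1 MB: xz's dictionary spans both copies
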
18
votes
7 answers
Can random suitless $52$ playing card data be compressed to approach, match, or even beat entropy encoding storage? If so, how?
I have real data I am using for a simulated card game. I am only interested in the ranks of the cards, not the suits. However, it is a standard $52$-card deck, so there are only $4$ of each rank possible in the deck. The deck is shuffled well for…

David James
- 124
- 2
- 16
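
For the playing-card question above, the entropy-coding target is easy to pin down: ignoring suits, a shuffled deck is one of $52!/(4!)^{13}$ distinguishable rank sequences, so no lossless scheme can average fewer than about $166$ bits per deck. A quick check of that figure (my sketch, not the asker's code):

    from math import factorial, log2

    # Distinguishable orderings when the 4 copies of each of the 13 ranks
    # are interchangeable, and the corresponding information content in bits.
    arrangements = factorial(52) // factorial(4) ** 13
    print(log2(arrangements))  # ~165.98 bits per shuffled deck
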
15
votes
1 answer
Why is the compression ratio using bzip2 for a sequence of "a"s so jumpy?
library(ggplot2)
compress <- function(str) {
  length(memCompress(paste(rep("a", str), collapse=""), type="bzip2")) /
    nchar(paste(rep("a", str), collapse=""))
}
cr <- data.frame(i = 1:10000, r = sapply(1:10000, compress))
ggplot(cr[cr$i>=5000 &…

Raffael
- 337
- 2
- 7
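
For readers without R, a rough Python parallel of the experiment in the question above (assuming Python's bz2 defaults behave comparably to R's memCompress(type = "bzip2")):

    import bz2

    # Compressed size of "a" * n divided by n, the same ratio the R snippet plots.
    ratios = [(n, len(bz2.compress(b"a" * n)) / n) for n in range(5000, 10001, 50)]
    for n, r in ratios[:5]:
        print(n, round(r, 4))
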
4
votes
0 answers
Approximate estimation of stream compressibility?
I wish to estimate the compression ratio of a stream of bytes. Specifically, I'm interested in DEFLATE compression. I'm looking for some algorithm/heuristic that can roughly estimate the compressibility (more quickly than just deflating the stream).…

leonbloy
- 256
- 1
- 10
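
One cheap heuristic for the question above (my sketch, and only a partial answer): a zero-order byte-entropy estimate over a sample of the stream. It approximates what DEFLATE's Huffman stage can achieve but ignores LZ77 match gains, so it tends to be pessimistic for data with long repeats.

    from collections import Counter
    from math import log2

    def estimated_ratio(sample: bytes) -> float:
        """Zero-order entropy of the byte distribution, in bits per byte,
        divided by 8: a rough stand-in for the achievable DEFLATE ratio."""
        counts = Counter(sample)
        total = len(sample)
        entropy = -sum(c / total * log2(c / total) for c in counts.values())
        return entropy / 8

    # Feed it, say, the first few hundred kilobytes of the stream.
    print(estimated_ratio(b"abababab" * 1000))  # low byte entropy -> small estimated ratio
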
3
votes
0 answers
Bayesian Coding
Suppose you have a sequence generated by an i.i.d. process (such as repeatedly rolling a die and recording the values in order) parameterized by some $K$-dimensional vector $\vec{\gamma}$ (the probabilities associated with each side of the die), which…

user6605
- 31
- 1
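
In case it helps frame the question above, one standard construction for this setting (a sketch of the usual mixture-code argument, not necessarily what the asker means by "Bayesian coding") codes the sequence with its marginal likelihood, $L(x_{1:n}) = -\log_2 \int p(x_{1:n} \mid \vec{\gamma})\, w(\vec{\gamma})\, d\vec{\gamma}$; with a Dirichlet$(1/2, \dots, 1/2)$ prior this yields the sequential Krichevsky-Trofimov predictions $p(x_{t+1} = j \mid x_{1:t}) = \frac{n_j + 1/2}{t + K/2}$, where $n_j$ is the number of times symbol $j$ has appeared so far.
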
3
votes
1 answer
Is there a limit to data compression without mentioning Kolmogorov complexity?
I want to expand on a previous answer to Is there a known maximum for how much a string of 0's and 1's can be compressed?.
I've looked at the Hutter Prize and various compression benchmarks. The compression records are graphed below over a 12 year…

Paul Uszak
- 1,602
- 1
- 13
- 21
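
One classical limit for the question above that avoids Kolmogorov complexity entirely: Shannon's source coding theorem. For symbols drawn i.i.d. from a random variable $X$, every uniquely decodable code $C$ satisfies $\mathbb{E}[\ell(C(X))] \ge H(X)$, and for stationary ergodic sources the entropy rate plays the same role; so to the extent the benchmark data is modeled as coming from such a source, record curves can approach that bound but not cross it.
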
3
votes
1 answer
LZ-type compression vs. entropy-encoding of BWT data
Why does there seem to be a preference for using Huffman (or arithmetic) coding instead of a Lempel-Ziv-type compression algorithm for Burrows-Wheeler transformed data?
I noticed that data compressors such as Bzip2 or ZZip mainly use a combination…

Log
- 31
- 1
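
For context on the pipeline the question above refers to: bzip2-style compressors typically insert a move-to-front (MTF) pass between the BWT and the entropy coder. A minimal MTF sketch (my illustration, not the asker's code) shows why the result suits a zero-order Huffman or arithmetic stage: the long runs of identical symbols that BWT tends to produce become streams dominated by zeros and other small values.

    def move_to_front(data: bytes) -> list[int]:
        """Move-to-front transform: recently seen byte values get small indices."""
        alphabet = list(range(256))
        out = []
        for b in data:
            i = alphabet.index(b)
            out.append(i)
            alphabet.pop(i)
            alphabet.insert(0, b)
        return out

    print(move_to_front(b"aaaabbbbaaaa"))  # [97, 0, 0, 0, 98, 0, 0, 0, 1, 0, 0, 0]
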
3
votes
1 answer
Shannon-Fano puzzle
I was playing around with Shannon-Fano (SF) entropy encoding when I ran into this issue.
I am aware that the compression that can be achieved with SF is sometimes inferior to that of Huffman encoding, but I just thought it meant that code lengths…

500 - Internal Server Error
- 235
- 4
- 9
3
votes
2 answers
Difference between LZW and Huffman coding techniques
What is the difference between LZW and Huffman coding for text compression?
I've read this and
this, but I'm not able to distinguish between them.

devGeek
- 133
- 1
- 1
- 5
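
To make the contrast in the question above concrete (a sketch, not a full answer): Huffman coding assigns shorter bit patterns to more frequent individual symbols based on their statistics, while LZW emits codes for progressively longer strings taken from a dictionary it grows as it reads the input. A minimal LZW encoder illustrating the dictionary growth:

    def lzw_encode(data: bytes) -> list[int]:
        """Greedy LZW: emit the code of the longest known prefix, then extend the dictionary."""
        dictionary = {bytes([i]): i for i in range(256)}
        w = b""
        codes = []
        for b in data:
            wb = w + bytes([b])
            if wb in dictionary:
                w = wb
            else:
                codes.append(dictionary[w])
                dictionary[wb] = len(dictionary)
                w = bytes([b])
        if w:
            codes.append(dictionary[w])
        return codes

    print(lzw_encode(b"abababab"))  # [97, 98, 256, 258, 98]: later codes stand for longer strings
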
2
votes
1 answer
Compression for set of integers preserving their sequence
For a given positive number n, the numbers from 0 up to n are streamed one by one in some order, without repetition.
What would be a recommended strategy to store the order of these n numbers in the most space-conservative way?
Highest priority is to…

letsBeePolite
- 709
- 6
- 14
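
A baseline for the question above (a general observation, not tied to the asker's exact constraints): reproducing an arbitrary order of n distinct values takes at least $\lceil \log_2 n! \rceil \approx n \log_2 n - n \log_2 e$ bits in the worst case, so any scheme that beats roughly $\log_2 n$ bits per element has to exploit structure in the particular order being stored.
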
2
votes
1 answer
Clustering data for compression with PCA
If I have datapoints in a high-dimensional space and want to find a (linear) subspace onto which a dataset projects well, I can use PCA and then discard the less important dimensions of the new basis to get compressed datapoints. However, often the…

matthias_buehlmann
- 169
- 3
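
A minimal sketch of the "cluster first, then fit a local PCA per cluster" idea the title above suggests (my illustration using scikit-learn; the cluster count and component count are arbitrary placeholders):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def cluster_pca_compress(X, n_clusters=4, n_components=2):
        """Assign each point to a cluster, then fit a separate low-dimensional
        PCA inside each cluster; returns labels, per-cluster models, and codes."""
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
        models, codes = {}, {}
        for c in range(n_clusters):
            pca = PCA(n_components=n_components).fit(X[labels == c])
            models[c] = pca
            codes[c] = pca.transform(X[labels == c])  # compressed coordinates
        return labels, models, codes

    labels, models, codes = cluster_pca_compress(np.random.randn(500, 20))
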
1
vote
1 answer
Compression with random dictionary
Is the following scenario theoretically possible, or provably impossible?
Alice generates a 1 GiB file with random bits and sends it to Bob. This file is a shared dictionary they call Q. Now, Alice and Bob start sending each other new files with…

Jostein Trondal
- 111
- 2
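
A general information-theoretic note on the question above (it may or may not settle the asker's exact scheme): if the files exchanged later are generated independently of the shared dictionary Q, then $H(X \mid Q) = H(X)$, so on average knowing Q cannot reduce the number of bits Alice and Bob need to send; a random shared dictionary only helps for sources that are actually correlated with it.
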
1
vote
0 answers
Using less Random Access Memory
Suppose you were to compress data before it was sent to Random Access Memory. Wouldn't you technically be using less RAM?

user83052
1
vote
1 answer
How can I compress data that represents a number's change over time?
I have a value that increases by unknown increments at unknown times. Given a time, I need to check how much it has increased between that time and now.
I could simply store every time/increase pair whenever the value is increased, but this would…

Evorlor
- 111
- 4
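
One common approach to the question above (a sketch, not necessarily what the asker needs): store deltas of both the timestamp and the value, packed as variable-length integers, so small and frequent increments cost only a byte or two each.

    def encode_varint(n: int) -> bytes:
        """Unsigned LEB128-style varint: 7 bits per byte, high bit marks continuation."""
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    def encode_samples(samples):
        """samples: list of (timestamp, value) pairs, both non-decreasing integers.
        Only the deltas are stored."""
        out = bytearray()
        prev_t, prev_v = 0, 0
        for t, v in samples:
            out += encode_varint(t - prev_t)
            out += encode_varint(v - prev_v)
            prev_t, prev_v = t, v
        return bytes(out)

    print(encode_samples([(1000, 5), (1060, 5), (1200, 9)]).hex())  # 8 bytes for three samples
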