
TL;DR: Is there any chance of decoding a custom picture format if I have a sufficient amount of data? Please see the examples at the end.


I have binary files from an old DOS diskmag (late '90s). After a couple of weeks of playing with 010 Editor, Kaitai Struct and Python, I managed to get reasonably structured data out of these files. In the end it was quite easy: the files are indexed internally and contain all the pieces, something like a Doom WAD, which is nothing unexpected for those years.

So now I can extract texts (stored as plaintext), sounds (stored mostly as PCM), music (MOD, XM), fonts (bitmaps) and pictures. Some pictures are stored as simple bitmaps (not BMP with a header and so on, just raw pixel data), and some are compressed with a PCX-like RLE algorithm, which was not a problem for me.
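For reference, the "PCX-like RLE" mentioned above can be sketched as follows. This is a minimal sketch of the standard PCX scheme (top two bits set means "run"), not necessarily the exact variant the diskmag uses; the function name and the `expected_size` parameter are illustrative assumptions.

```python
def pcx_rle_decode(data: bytes, expected_size: int) -> bytes:
    """Decode standard PCX-style RLE (assumed variant, not confirmed for the diskmag)."""
    out = bytearray()
    i = 0
    while i < len(data) and len(out) < expected_size:
        byte = data[i]
        i += 1
        if byte & 0xC0 == 0xC0:
            # Top two bits set: the low six bits are a run length,
            # and the next byte is the value to repeat.
            count = byte & 0x3F
            if i >= len(data):
                break  # truncated input, stop quietly
            out.extend(data[i:i + 1] * count)
            i += 1
        else:
            # Any other byte is a literal pixel value.
            out.append(byte)
    return bytes(out[:expected_size])
```

For a raw 8-bit picture, `expected_size` would be width times height; for High Color data it would be twice that.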

But there is also another compression scheme that I do not recognize. For every unknown picture I have:

  • dimensions in pixels (mostly 640x480)
  • color depth (always High Color)
  • first binary blob (small, usually 1-2 kB, exact size differs)
  • second binary blob (large, 50-250 kB, exact size differs)
  • screen dump of picture from diskmag running in DOSbox (yay!)
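To compare any decoded output against the DOSBox screen dumps, the High Color pixels have to be expanded to 8 bits per channel. The sketch below assumes little-endian 16-bit words and handles both 15-bit (RGB555) and 16-bit (RGB565) layouts, since which one "High Color" means here is not certain; `unpack_high_color` is a hypothetical helper name.

```python
import struct

def unpack_high_color(raw: bytes, bits: int = 15) -> list:
    """Expand 16-bit pixels to (r, g, b) tuples; the bit layout is an assumption."""
    if bits not in (15, 16):
        raise ValueError("bits must be 15 (RGB555) or 16 (RGB565)")
    pixels = []
    # len(raw) must be a multiple of 2, otherwise struct.error is raised.
    for (value,) in struct.iter_unpack("<H", raw):
        b = value & 0x1F
        if bits == 15:
            r = (value >> 10) & 0x1F
            g = (value >> 5) & 0x1F
            g8 = (g << 3) | (g >> 2)
        else:
            r = (value >> 11) & 0x1F
            g = (value >> 5) & 0x3F
            g8 = (g << 2) | (g >> 4)
        # Replicate the high bits into the low bits so 31 maps to 255.
        pixels.append(((r << 3) | (r >> 2), g8, (b << 3) | (b >> 2)))
    return pixels
```

Trying both layouts on a raw (uncompressed) High Color picture and comparing the colors with a screen dump should show which layout the engine uses.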

It seems to me that the sizes of the blobs depend on the picture's complexity and the number of unique colors used. I also guess that the first blob is a dictionary of some kind (or maybe a Huffman tree?), whereas the second blob contains the compressed picture content itself.
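One cheap way to test the guess about the two blobs is to measure their byte entropy. This is only a heuristic sketch: values close to 8 bits per byte usually point to entropy-coded (for example Huffman) or otherwise dense data, while clearly lower values suggest a table, raw pixels, or a byte-oriented scheme such as RLE or LZSS without an entropy stage.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Running it on the first and second blob of several pictures, and on sliding windows within the second blob, may also reveal where a header ends and the compressed stream begins.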

On top of that, I suspect there is some weird home-made lossy compression involved, because of easily recognisable artifacts around edges and even 8x8 quantization blocks in gradient-filled areas. See below for examples; I have thousands of them.

So my question: is there any chance of decoding these pictures? Do I have enough data, or am I missing something? I think all I need is to identify the compression algorithm. Can anybody help me with that? I believe I can do the rest of the work in Python as usual.

(Sorry for not linking the binary blobs directly, I don't have enough reputation here...)

Edit 1: Added example 06-04002-0004 - the smallest picture I've found (212x160 px).

Edit 2: I've realized that the diskmag EXE has debug symbols inside! Some interesting function names are:

  • RLEDecomp (this one is for RLE packed images, I've already solved it)
  • LZSS_Decompress
  • InitHuffDecomp_
  • HuffDecompress_
  • BuildHuffTree_
  • BuildHuffTreeImg_

I cannot say whether these are more likely used for decoding videos (yes, the diskmag engine has its own video format too, I haven't got that far yet), but maybe it will help?
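Given the `LZSS_Decompress` symbol, one experiment is to run the blobs through a classic LZSS decoder. The sketch below follows the parameters of Okumura's widely copied 1989 reference implementation (4096-byte ring buffer pre-filled with spaces, maximum match of 18, flag bits consumed LSB first). All of these are assumptions; the diskmag's routine may use a different window size, bit order, or token layout.

```python
def lzss_decompress(data: bytes, window_size: int = 4096, max_match: int = 18,
                    threshold: int = 2, fill: int = 0x20) -> bytes:
    """Okumura-style LZSS decoder (assumed parameters, not confirmed for the diskmag)."""
    window = bytearray([fill]) * window_size
    pos = window_size - max_match  # initial write position in the ring buffer
    out = bytearray()
    flags = 0
    i = 0
    while i < len(data):
        flags >>= 1
        if not flags & 0x100:
            # All eight flag bits used up: fetch the next flag byte.
            flags = data[i] | 0xFF00
            i += 1
            if i >= len(data):
                break
        if flags & 1:
            # Flag bit 1: a literal byte follows.
            byte = data[i]
            i += 1
            out.append(byte)
            window[pos] = byte
            pos = (pos + 1) % window_size
        else:
            # Flag bit 0: a two-byte (offset, length) reference follows.
            if i + 1 >= len(data):
                break
            lo, hi = data[i], data[i + 1]
            i += 2
            offset = lo | ((hi & 0xF0) << 4)
            length = (hi & 0x0F) + threshold + 1
            for k in range(length):
                byte = window[(offset + k) % window_size]
                out.append(byte)
                window[pos] = byte
                pos = (pos + 1) % window_size
    return bytes(out)
```

If the output length for a picture matches width times height times two, or the output has visibly lower entropy than the input, that would be a hint that the parameters are close; otherwise the decompression routine in the EXE is the authoritative source.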

Edit 3: Diskmag EXE for download: https://filebin.ca/3kL3kgNAqlJI


Example 06-04002-0004

  • dimensions: 212x160
  • High Color
  • first binary blob: https://filebin.ca/3kJWIHFLZJ9X
  • second binary blob: https://filebin.ca/3kJWL5SqHlAK
  • screen dump:

06-04002-0004


Example 06-05000-0004

  • dimensions: 640x480
  • High Color
  • first binary blob: https://filebin.ca/3kF4G2r0DnXT
  • second binary blob: https://filebin.ca/3kF4wopld9QG
  • screen dump:

06-05000-0004


Example 06-05050-0004

  • dimensions: 640x480
  • High Color
  • first binary blob: https://filebin.ca/3kF7hbOexD7D
  • second binary blob: https://filebin.ca/3kF7lLgeaAQ7
  • screen dump:

06-05050-0004


Example 06-05483-0004

  • dimensions: 312x480
  • High Color
  • first binary blob: https://filebin.ca/3kFAGWNEnNdM
  • second binary blob: https://filebin.ca/3kFAJdwtohzB
  • screen dump:

06-05483-0004


Example 06-07011-0004

  • dimensions: 640x480
  • High Color
  • first binary blob: https://filebin.ca/3kFGUQGcMZqD
  • second binary blob: https://filebin.ca/3kFGRc37e2pm
  • screen dump:

06-07011-0004

deefha
  • If you have the program binary, it may be easier just to reverse that and figure out how it is using the data in the files. – cimarron Dec 11 '17 at 06:52
  • Yes, I have the program binary. According to the output of the Linux "file" command, it's an MS-DOS executable, LE for MS-DOS, DOS4GW DOS extender (embedded). According to this answer: https://reverseengineering.stackexchange.com/questions/3074/decompiling-a-1990-dos-application reversing will not be easy... Maybe the debug build of DOSBox can help? – deefha Dec 11 '17 at 08:57
  • 1. How do you know the pixel sizes of the images? Are they supposed to be stored in the unknown data files? 2. How do you know they are "high color" (which is what, 15 or 16 bit RGB)? Could it also be your local display settings? 3. Can you add links to the data files for the smallest (in pixels! not KB) file you can find? – Jongware Dec 11 '17 at 09:34
  • @usr2564301 ad 1) Info about the picture dimensions is stored separately in the picture header, not in the first or second blob. ad 2) Info about the picture color depth is also stored separately; it can be 256 colors indexed, RGB565 or something called "High Color" (I assume it's 15-bit RGB for some reason). – deefha Dec 11 '17 at 09:41
  • @usr2564301 ad 3) Yes, I can find the smallest picture. Stay tuned, it will take a while. – deefha Dec 11 '17 at 09:43
  • @usr2564301 Added example 06-04002-0004 - the smallest picture I've found (212x160 px). It's the first example now. – deefha Dec 11 '17 at 11:17
  • Added info about debug symbols I've found in EXE. Sounds interesting! – deefha Dec 11 '17 at 11:49
  • Can you post the exe? – cimarron Dec 11 '17 at 16:21
  • @cimarron Of course, here it is: https://filebin.ca/3kL3kgNAqlJI Also added as Edit 3 to the original question. – deefha Dec 11 '17 at 16:30