Any idea how to decode this binary data?

Question

I have binary data representing a table.

Here's the data when I print it with Python's repr(): \xff\xff\x05\x04test\x02A\x05test1@\x04\x03@@\x04\x05@0\x00\x00@\x05\x05test2\x03\x05\x05test1\x06@0\x00\x01@\x00
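
One caveat when reading the repr() output: escapes and printable ASCII are mixed, so `\x02A` is the byte 0x02 followed by the letter `A` (0x41), not a single byte 0x2A. Below is a minimal sketch that prints the bytes as space-separated hex and compares them with the hex string posted in the comments further down. It is written for Python 3, whereas the script later in this post is Python 2. The names `data`, `hex_dump`, `hex_from_comment` and `same` are only for illustration.

```python
# The same bytes as in the repr() output above, as a Python 3 bytes literal.
data = (b"\xff\xff\x05\x04test\x02A\x05test1@\x04\x03@@\x04\x05@0\x00\x00@"
        b"\x05\x05test2\x03\x05\x05test1\x06@0\x00\x01@\x00")

# One two-digit hex value per byte, so "\x02A" shows up as "02 41".
hex_dump = " ".join("{:02x}".format(b) for b in data)
print(hex_dump)

# The hex string from the comment thread (without the 0x prefix),
# split in two only to keep the lines short.
hex_from_comment = ("FFFF0504746573740241057465737431400403404004054030"
                    "00004005057465737432030505746573743106403000014000")
same = bytes.fromhex(hex_from_comment) == data
print("matches the hex dump:", same)
```

Seeing the dump byte by byte makes it easier to tell marker bytes from text when guessing at the structure.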

Here's what the table looks like in the proprietary software.

        test1
        test1test1
test          test1
test1
test1test2


test1
test1
test1
        test1
        test1


test1
test1

I was able to guess some of it:

The data seems to be stored column by column, then cell by cell, starting at the top-left cell.
The \x04 in \x04test seems to be the length (in bytes, I guess) of the following word.
@ seems to mean "repeat the last value".
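
The length-prefix guess can be tested in isolation. Here is a minimal Python 3 sketch, assuming one unsigned length byte followed by that many ASCII bytes. `read_prefixed` is a hypothetical helper written for this post, not part of any known format or library.

```python
data = (b"\xff\xff\x05\x04test\x02A\x05test1@\x04\x03@@\x04\x05@0\x00\x00@"
        b"\x05\x05test2\x03\x05\x05test1\x06@0\x00\x01@\x00")


def read_prefixed(buf, pos):
    """Read a 1-byte length at pos, then that many bytes.

    Returns (text, next_pos). Assumes the text is plain ASCII.
    """
    size = buf[pos]
    start = pos + 1
    end = start + size
    if end > len(buf):
        raise ValueError("length byte at %d runs past the end of the data" % pos)
    return buf[start:end].decode("ascii"), end


word, nxt = read_prefixed(data, 3)      # offset 3 holds \x04
print(word, nxt)                        # test 8
word2, nxt2 = read_prefixed(data, 10)   # offset 10 holds \x05
print(word2, nxt2)                      # test1 16
```

Both reads land exactly on the end of a word, which supports the guess, although two samples cannot rule out a wider length field.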

Does anyone know whether the data follows a standard, or have any tips on how to decode it?

Thanks!

Here's an example in Python:

from struct import unpack


def DecodeData(position):
    print "position", position
    firstChar = data[position:][:1]
    size_in_bytes = unpack('B', firstChar)[0]
    print "firstChar: {0}. size_in_bytes: {1}".format(repr(firstChar), size_in_bytes)
    return size_in_bytes


def ReadWord(position, size_in_bytes):
    word = unpack('%ds' % size_in_bytes, data[position:][:size_in_bytes])[0]
    print "word:", word

data = "\xff\xff\x05\x04test\x02A\x05test1@\x04\x03@@\x04\x05@0\x00\x00@\x05\x05test2\x03\x05\x05test1\x06@0\x00\x01@\x00"

position = 0

print ""
position += 1
DecodeData(position)
print "\\xff - ?"

print ""
position += 1
DecodeData(position)
print "\\x05 - ?"

print ""
position += 1
size_in_bytes = DecodeData(position)
position += 1
ReadWord(position, size_in_bytes)


print ""
position += size_in_bytes
DecodeData(position)
position += 1
DecodeData(position)
print """\\x02 'A' : could be to say that "test" has 2 empty cells before it"""

print ""
position += 1
size_in_bytes = DecodeData(position)
position += 1
word = unpack('%ds' % size_in_bytes, data[position:][:size_in_bytes])[0]
print "word:", word

position += size_in_bytes

DecodeData(position)
print """@: means that there's another "test1" cell"""

print ""
position += 1
DecodeData(position)
position += 1
DecodeData(position)
print "\\x04\\x03 - Could be that the next value is 3 cells down"

print ""
position += 1
DecodeData(position)
print ""
position += 1
print "@@ - Seems to mean 3 repetitions"

print ""
position += 1
DecodeData(position)
position += 1
DecodeData(position)
print "\\x04\\x05 - Could be that the next value is 5 cells down"

print ""
position += 1
DecodeData(position)
print "@ - repetition"

print ""
position += 1
DecodeData(position)

print ""
position += 1
DecodeData(position)
position += 1
DecodeData(position)
print "\\x00\\x00 - That could mean to move to the first cell on the next column"

print ""
position += 1
DecodeData(position)
print "@ - repetition"

print ""
position += 1
DecodeData(position)
print "\\x05 - ?"

print ""
position += 1
size_in_bytes = DecodeData(position)
position += 1
word = unpack('%ds' % size_in_bytes, data[position:][:size_in_bytes])[0]
print "word:", word
position += size_in_bytes

print ""
DecodeData(position)
print "\\x03 - Could be to tell that the previous word 'test2' is 3 cells down"

print ""
position += 1
DecodeData(position)
print "\\x05 - ?"

print ""
position += 1
size_in_bytes = DecodeData(position)
position += 1
word = unpack('%ds' % size_in_bytes, data[position:][:size_in_bytes])[0]
print "word:", word
position += size_in_bytes

print ""
DecodeData(position)
print "\\x06 - Could be to tell that the previous word 'test1' is 6 cells down"

print ""
position += 1
DecodeData(position)
print "@ - repetition"

print ""
position += 1
DecodeData(position)
print "0 (\\x30) - ?"

print ""
position += 1
DecodeData(position)
position += 1
DecodeData(position)
print "\\x00\\x01 - Seems to mean, next column second cell"

print ""
position += 1
DecodeData(position)
print "@ - repetition"

print ""
position += 1
DecodeData(position)
print "\\x00 - end of data or column"

Do you have the module itself? It would be almost trivial to disassemble the repr function, assuming this is registered the normal way it is in C extension modules. — 0xC0000022L, Apr 30 '13 at 20:51
I'm not sure I understand what you mean by "module". But here's the data in HEX 0xFFFF050474657374024105746573743140040340400405403000004005057465737432030505746573743106403000014000. I'm using repr() only to get rid of the 'Decode error - output not utf-8' message in python so you can ignore that. — bbigras, Apr 30 '13 at 21:05
The data belongs to some kind of object and that usually belongs to a module, such as the ones you import in Python ;) — 0xC0000022L, Apr 30 '13 at 21:07
The data is from a varbinary field in a MSSQL database which is used by a proprietary and uncooperative software. — bbigras, Apr 30 '13 at 21:10
It would make sense to have the software that processes the data. See here. Basically too little info to help you. — 0xC0000022L, Apr 30 '13 at 21:12
I don't have the code of the proprietary software but I added some python code that I use to try to figure out the structure. — bbigras, May 01 '13 at 18:02
It would be extremely helpful to this newbie (me) if a few different examples of the data could be provided. — Sam Axe, May 02 '13 at 18:41

score 7 · Accepted Answer · edited May 04 '13 at 02:34

Here's an explanation for what I think the individual symbols mean. I'm basing this around the presumption that a little selector is going through the cells, one by one.

\xFF = Null cell
\x05 = A string is following, with \xNumber coming after the string to define how far to displace the string from the selector's current position, if at all.
\xNumber string = A string of length number
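\x02 A (shown as \x02A in the repr; the hex dump in the comments confirms the two bytes 02 41, not a single byte \x2A) = Could be a marker that says not to displace the current string, and also to assume that the next piece of data is defining a string to be placed in the next cell. Questionable meaning.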
\x04 \xNumber = Move selector ahead \xNumber cells and place previous string into there.
0 \x00 \x0Number = New column, move selector into row \xNumber, and place previous string into there.
@ = Place previously used string in the cell following the current one.

So here's my interpretation of the data you're giving us:

\xFF\xFF = two null cells
\x05 = A cell, singular, with a string, placed following the null cells, because of the \x02 A (shown as \x02A in the repr; the hex dump in the comments confirms the bytes 02 41) following the string.
\x04 test = The string.
\x02 A \x05 test1 = Another string placed into the cell following. No number needed, since the \x02 A implies that it's being placed right after "test".
@ = Place "test1" again, into the cell after the one where it was first placed.
\x04 \x03 = Move selector ahead three cells and place test1 where it lands.
@@ = Also place it into the two following cells.
\x04 \x05 @ = Move selector ahead five cells (skipping four), place test1 there and again in the following cell.
0 = New column.
\x00 \x00 @ = Using the string last defined (test1), place it into the first two cells of the column.
\x05 \x05 test2 \x03 = Place test2 into a cell three cells afterwards.
\x05\x05test1\x06 = Place test1 into a cell 6 after test2
@ = Place test1 again, too.
0 = move to next column
\x00\x01 = Place previous string at location 01
@ = And also at location 02
\x00 = Done
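
The walk-through above can be checked mechanically with a small tokenizer. This is only a sketch under stated assumptions, not a confirmed description of the format. It follows the symbol list in this answer with one deviation: `\x02A` is read as two bytes, so `\x02` is taken as the number after "test" and `A` (0x41) is assumed to mean "a string for the next cell follows". The token names (`NULL`, `STRING`, `NEXT_STRING`, `SKIP`, and so on) are invented for illustration. Written for Python 3.

```python
data = (b"\xff\xff\x05\x04test\x02A\x05test1@\x04\x03@@\x04\x05@0\x00\x00@"
        b"\x05\x05test2\x03\x05\x05test1\x06@0\x00\x01@\x00")


def tokenize(buf):
    """Split buf into (offset, name, value) tuples under the guessed rules."""
    tokens = []
    pos = 0
    while pos < len(buf):
        start = pos
        op = buf[pos]
        pos += 1
        if op == 0xFF:
            tokens.append((start, "NULL", None))
        elif op == 0x05:
            # Length byte, string, then one trailing number (displacement?).
            size = buf[pos]
            text = buf[pos + 1:pos + 1 + size].decode("ascii")
            pos += 1 + size
            tokens.append((start, "STRING", (text, buf[pos])))
            pos += 1
        elif op == 0x41:
            # 'A': ASSUMPTION, a length-prefixed string with no trailing number.
            size = buf[pos]
            text = buf[pos + 1:pos + 1 + size].decode("ascii")
            pos += 1 + size
            tokens.append((start, "NEXT_STRING", text))
        elif op == 0x40:
            # '@': repeat the previous string.
            tokens.append((start, "REPEAT", None))
        elif op == 0x04:
            tokens.append((start, "SKIP", buf[pos]))
            pos += 1
        elif op == 0x30:
            # '0': assumed to be followed by \x00 and then a row number.
            tokens.append((start, "NEW_COLUMN", buf[pos + 1]))
            pos += 2
        elif op == 0x00:
            tokens.append((start, "END", None))
        else:
            raise ValueError("unknown byte 0x%02x at offset %d" % (op, start))
    return tokens


tokens = tokenize(data)
for token in tokens:
    print(token)
```

With these rules the whole sample is consumed without hitting an unknown byte. That shows the reading is self-consistent for this one sample, not that it is correct; more samples from the database would be needed to confirm it.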

Explanation: My method was to look for a pattern, check if the pattern withstood further scrutiny - the first pattern I checked seemed to - and clear up any minor issues I had with it. Seems to have worked.

Please format your answers properly. @0xC0000022L did it for you this time. — asheeshr, May 05 '13 at 03:46
