I'm trying to reverse engineer a binary format, and I'm running up against a variable-length sequence of numbers which varies between files. I can't work out the pattern here and how to determine how long this sequence is. I'm 99% sure that the sequence is self-contained and isn't reliant on any information from the preceding parts of the file.
An example (in hex) of the sequence is:
06 00 00 00 01 00 00 00 fd ff ff ff 08 00 00 00 f7 ff ff ff 0e 00 00 00
To me, this looks like a sequence of little-endian signed 32-bit integers (i.e. 6, 1, -3, 8, -9, 14).
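For reference, this is roughly how I'm decoding the bytes. It's only a minimal sketch that assumes every value is a 4-byte little-endian signed integer; the helper name `decode_int32_le` is my own and not anything from the format itself:

```python
import struct

def decode_int32_le(hex_string):
    """Decode whitespace-separated hex bytes as little-endian signed 32-bit ints.

    Assumption: the sequence is made up purely of 4-byte values.
    """
    data = bytes.fromhex(hex_string)
    if len(data) % 4 != 0:
        raise ValueError("byte length is not a multiple of 4")
    count = len(data) // 4
    # "<" = little-endian, "i" = signed 32-bit integer
    return list(struct.unpack("<%di" % count, data))

sample = "06 00 00 00 01 00 00 00 fd ff ff ff 08 00 00 00 f7 ff ff ff 0e 00 00 00"
print(decode_int32_le(sample))  # [6, 1, -3, 8, -9, 14]
```

If the values were really unsigned, or 16-bit, the same bytes would decode differently, so the signed 32-bit reading is a guess based on how clean the numbers look.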
Here are some sequences I've collected:
- 6, 1, -3, 8, -9, 14
- 5, -11, 10, 13, -14, 27
- 2, 0, 3
- 3, 6, -7, 9
- 3, 1, -3
- 348, 1, -348
- 4, 1, -4
- 18, 1, -18
- 28, 1, -28
- 244, 1, -244
- 70, 1, -70
- 449, 1, -446, 453, 455, -456
- 12, 0, -1, 5, 9, 11, -12, 14, 17, 19, 25, 31, 33
- 69, 7, 13, 31, 87, 136, 168, 267, 275, 277, 323, -324, 327, 329, 331, 334, -334, 337, 340, 344, 371, 379, 383, 386, 416, 422, -423, 426, 429, 432, 446, 686, -689, 692, -709, 725, -727, 731, 833, 837, 841, 845, 856, 860, 868, 877, -878, 883, 885, 887, 909
- 1, 2
- 1, 861
- 1, 22
- 1, 1
- 1, 78
- 1, 93
- 1, 5
- 1, 9
- 1, 4016
Any ideas?
Update: As requested, some extra context:
- This is part of a "LYR" document format, used by the ESRI ArcGIS mapping software. LYR files are structured using the Microsoft Compound Document Format. All the useful content is stored in a single file contained inside the document.
- This file stream uses (a variation of) the COM IPersistStream interface to encode the object's contents/properties
- LYR objects consist of a hierarchy of objects -- they basically directly represent the original object class structure, with a layer containing a renderer which itself contains multiple fill/line/marker symbols, a "labeling properties" object, a "field" set, etc.
- This sequence occurs mid-way through reading the bytes encoded by a "FeatureLayer" object. It's almost right in the middle of the content encoded by FeatureLayer objects, and doesn't fall into any headers or footers.
- For 99% of the files encountered, this sequence consists of a single 00 00 00 00 integer.
- There's absolutely no chance of missed digits in the sequences above -- the sequences are bordered on either side by bytes of known purpose (the persisted stream representing the "area of interest" Envelope object for the layer and an array of layer "Extension" objects -- both of which are direct members of the FeatureLayer class).
- This sequence does not seem to contain any actual useful content -- every member of the FeatureLayer object class is encoded elsewhere
- The files aren't corrupt, and will open fine in the original software (ArcMap). However, saving them without any changes results in the sequence being removed from the file. It's possibly something which was only written in earlier versions of the software, OR it represents some temporary/cached values which don't directly form part of the persisted object state...