The subject of this stream (and all it relates to) is pretty damn obvious in retrospect, of course. Maybe some of these lessons could've been learned by just reading C-based tutorials? Let's think about that...

![nope](https://elegua.za.net/404/images/Nope.png)

__Allons-y!__

----

I recently needed to do some stuff with a packed/binary protocol, and not being the greatest fan of C (or rather, of the effort involved and the userland around it), I decided to use Python and the relatively excellent [Construct library](https://pypi.python.org/pypi/construct) for this. `pip install construct` after initializing a venv, fire up some test data, and so far so good.

----

__Wrong assumption number one__

Validating the start of the data, and then reading forward for the fixed packet/data length.

__Why this is wrong__

Corruption comes in all flavours, including too-short messages in the datastream. Validate with a window of _n>=2_ message headers, so that reading your normal message length doesn't eat into the start of the next message.

__Example__

Assuming our first 4 bytes identify a message, here is some python-esque pseudocode (where a function or variable is used without being defined, consider it already implemented for the example case):
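
    def validate(data):
        # findMessages, messageLength and badMessage are assumed to exist already
        messageOffsets = findMessages(data)
        # a second header before messageLength means the first message is too short
        if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
            return badMessage
        else:
            """ validation here """
            # compare against bytes, not str, or this never matches on Python 3
            if data[0:4] == b'\x01\x02\x03\x04':
                return data[:messageLength]
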
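
For something you can actually run, here is a minimal sketch of the same windowing idea. The names `find_messages` and `split_stream`, the 16-byte `MESSAGE_LENGTH` and the layout (header followed by payload) are all assumptions made up for illustration, not part of any real protocol.

```python
HEADER = b"\x01\x02\x03\x04"   # the 4-byte marker from the example above
MESSAGE_LENGTH = 16             # hypothetical fixed length, header included


def find_messages(data, header=HEADER):
    """Return every offset at which the header bytes occur."""
    offsets = []
    pos = data.find(header)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(header, pos + 1)
    return offsets


def split_stream(data, length=MESSAGE_LENGTH):
    """Split data into (good, bad) chunks using a two-header window."""
    offsets = find_messages(data)
    good, bad = [], []
    for i, start in enumerate(offsets):
        # The next header (or the end of the data) bounds the current message.
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        chunk = data[start:end]
        if len(chunk) < length:
            # Too short: reading `length` bytes would eat into the next message.
            bad.append(chunk)
        else:
            good.append(chunk[:length])
    return good, bad
```

Feeding it a full message, a truncated one and another full one should give back two good messages and one bad chunk, instead of one good message followed by garbage. It will still misfire if the header bytes happen to occur inside a payload.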
----
If you forget lesson one, you are in for *a whole lot* of pain. None of your following data will make any sense whatsoever.
----
__Lesson two__
`hd(1)` is your friend.

    elegua% echo hello | hexdump -v -e '/1 "%02X "'
    68 65 6C 6C 6F 0A
    elegua% hd -n 32 example
    00000000  05 02 01 01 54 00 75 02  56 00 80 02 28 00 2d 00  |....T.u.V...(.-.|
    00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    elegua% hd -s 17 -n 16 example
    00000011  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
    elegua% hd -s 12 -n 10 example
    0000000c  28 00 2d 00 00 00 00 00  00 00                    |(.-.......|

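In this way you can read any file offset you need (`-s` skips to an offset, `-n` limits how many bytes are shown) to see what's wrong where, like in the case of data being too short.
_Note_: `hd` is an alternate binary name for [`hexdump`](http://linux.die.net/man/1/hexdump); invoked as `hd`, it prints the canonical hex+ASCII display (the same as `hexdump -C`).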
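
If you'd rather stay inside the Python session while debugging, a rough equivalent is easy to sketch. The `hexdump` function below is a hypothetical helper written for this post, not something shipped with Python or Construct, and it only approximates the column layout of `hd`.

```python
def hexdump(data, skip=0, length=None, width=16):
    """Format bytes roughly like `hd -s skip -n length`."""
    end = len(data) if length is None else min(len(data), skip + length)
    lines = []
    for offset in range(skip, end, width):
        row = data[offset:min(offset + width, end)]
        hex_part = " ".join(f"{b:02x}" for b in row)
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  |{text}|")
    return "\n".join(lines)
```

Calling `print(hexdump(data, skip=12, length=10))` on the bytes of the example file should show the same ten bytes as `hd -s 12 -n 10 example` above.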
----
__Lesson three__
Remember to implement the checksums (if there are some)! This will help in the following cases:
* your protocol doesn't have a clearly defined end-of-message marker
* your header bytes might also appear in the payload data, in which case our aforementioned `findMessages`/`badMessage` check would break down
In either of these cases, you can attempt a parse and let the checksum tell you whether you have a valid message or not. The validation then becomes roughly:
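
    def validate(data):
        # findMessages, parseMessage, badDecode and failure are assumed to exist already
        messageOffsets = findMessages(data)
        if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
            # a header showed up early: it may just be payload data,
            # so let the parse (and its checksum) decide
            try:
                parseMessage(data[:messageLength])
                return data[:messageLength]
            except badDecode:
                return failure
        else:
            """ validation here """
            # compare against bytes, not str, or this never matches on Python 3
            if data[0:4] == b'\x01\x02\x03\x04':
                return data[:messageLength]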
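
To make the checksum step concrete, here is a small runnable sketch using only the standard library. Everything about the frame layout is an assumption for the sake of the example: a 4-byte header, an 8-byte payload and a trailing big-endian CRC32 over header plus payload. Your protocol will have its own layout and checksum algorithm, and `build_message`, `parse_message` and `BadDecode` are hypothetical names.

```python
import struct
import zlib

HEADER = b"\x01\x02\x03\x04"
PAYLOAD_LENGTH = 8                                   # assumed fixed payload size
MESSAGE_LENGTH = len(HEADER) + PAYLOAD_LENGTH + 4    # + 4 bytes of CRC32


class BadDecode(Exception):
    """Raised when a frame fails the header, length or checksum test."""


def build_message(payload):
    """Frame a payload as header + payload + CRC32 (test helper)."""
    body = HEADER + payload
    return body + struct.pack(">I", zlib.crc32(body))


def parse_message(message):
    """Return the payload, or raise BadDecode if the frame is invalid."""
    if len(message) != MESSAGE_LENGTH or message[:4] != HEADER:
        raise BadDecode("wrong length or header")
    body = message[:-4]
    (crc,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(body) != crc:
        raise BadDecode("checksum mismatch")
    return body[4:]


def validate(data):
    """Return the first message if it parses cleanly, else None."""
    try:
        parse_message(data[:MESSAGE_LENGTH])
    except BadDecode:
        return None
    return data[:MESSAGE_LENGTH]
```

With this in place, a payload that happens to contain the header bytes still validates because its checksum matches, while a frame with a corrupted byte is rejected.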