The subject of this stream (and all it relates to) is pretty damn obvious in retrospect, of course. Maybe some of these lessons could have been learned just by reading C-based tutorials? Let's think about that...

![nope](https://elegua.za.net/404/images/Nope.png)

__Allons-y!__

----

I recently needed to work with a packed/binary protocol. Not being the greatest fan of C (or rather, of the effort involved with it and the userland around it), I decided to use Python and the relatively excellent [Construct library](https://pypi.python.org/pypi/construct). `pip install construct` after initializing a venv, fire up some test data, and so far so good.

----

__Wrong assumption number one__

Validating the start of the data, and then reading forward for the fixed packet/data length.

__Why this is wrong__

Corruption comes in all flavours, including too-short messages in the datastream. Validate over a window of _n>=2_ message starts, so that reading your normal message length doesn't eat into the start of the next message.

__Example__

Assuming our first 4 bytes identify a message, and writing some Python-esque pseudocode (where functions or variables are not defined, consider them already implemented for the example case):
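def validate(data):
    # findMessages, messageLength and badMessage are assumed to exist already
    messageOffsets = findMessages(data)  # offsets of every header found in data
    # A second header inside the first message's span means the message was cut short
    if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
        return badMessage
    # validation here: the first 4 bytes must be the message header (bytes, not str)
    if data[0:4] == b'\x01\x02\x03\x04':
        return data[:messageLength]
    return badMessage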
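
For anyone who wants to run the idea rather than read pseudocode, here is a minimal self-contained sketch. The 8-byte `MESSAGE_LENGTH` and the helper names `find_messages` and `validate` are assumptions made up for illustration; only the 4-byte header comes from the example above.

```python
HEADER = b"\x01\x02\x03\x04"  # 4-byte message marker from the example
MESSAGE_LENGTH = 8            # assumed fixed message length, for illustration only


def find_messages(data: bytes) -> list:
    """Return the offset of every occurrence of HEADER in data."""
    offsets = []
    pos = data.find(HEADER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(HEADER, pos + 1)
    return offsets


def validate(data: bytes):
    """Return the first message, or None if it is missing, short or truncated."""
    offsets = find_messages(data)
    if not offsets or offsets[0] != 0:
        return None  # the data does not start with a header
    if len(data) < MESSAGE_LENGTH:
        return None  # not enough bytes for a whole message
    if len(offsets) > 1 and offsets[1] < MESSAGE_LENGTH:
        return None  # the next header begins inside this message
    return data[:MESSAGE_LENGTH]


good = HEADER + b"\xaa\xbb\xcc\xdd"  # a complete 8-byte message
short = HEADER + b"\xaa\xbb"         # a message that lost two bytes
```

Here `validate(good + good)` hands back the first message, while `validate(short + good)` returns `None`, because the second header turns up at offset 6, inside the 8 bytes the first message should have occupied.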
----

If you forget lesson one, you are in for *a whole lot* of pain. None of the data that follows will make any sense whatsoever.

----

__Lesson two__

`hd(1)` is your friend.
elegua% echo hello | hexdump -v -e '/1 "%02X "'   
68 65 6C 6C 6F 0A

elegua% hd -n 32 example      
00000000  05 02 01 01 54 00 75 02  56 00 80 02 28 00 2d 00  |....T.u.V...(.-.|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

elegua% hd -s 17 -n 16 example
00000011  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

elegua% hd -s 12 -n 10 example
0000000c  28 00 2d 00 00 00 00 00  00 00                    |(.-.......|
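
The same kind of peek can be done from inside Python while debugging a parser. This is only a rough sketch: `hex_peek` is a made-up helper name, and its output merely imitates the offset-plus-hex layout of `hd` (no ASCII column, no extra gap after the eighth byte). The sample bytes are the first 32 bytes of `example` as shown in the dump above.

```python
def hex_peek(data: bytes, skip: int = 0, length: int = 16) -> str:
    """Format data[skip:skip + length] as an offset plus hex bytes, loosely like hd -s/-n."""
    chunk = data[skip:skip + length]
    return f"{skip:08x}  " + " ".join(f"{byte:02x}" for byte in chunk)


# First 32 bytes of the example file, copied from the hd output
sample = bytes.fromhex("05 02 01 01 54 00 75 02 56 00 80 02 28 00 2d 00") + bytes(16)

print(hex_peek(sample, skip=12, length=10))
# prints: 0000000c  28 00 2d 00 00 00 00 00 00 00
```

For a real file, the bytes would come from `open(path, "rb").read()` or from a `seek()` followed by a `read()`.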
With `-s` (skip to an offset) and `-n` (number of bytes to read) you can inspect any file offset you need, to see what's wrong where (as in the case of data being too short).

_Note_: `hd` is an alternate binary name for [`hexdump`](http://linux.die.net/man/1/hexdump).

----

__Lesson three__

Remember to implement the checksums (if the protocol has them)! This will help in the following cases:

* your protocol doesn't have a clearly defined end-of-message marker
* your message might also contain the header bytes in the payload data, in which case the aforementioned `findMessages` / `badMessage` structure would break down

In either of these cases, you can attempt a parse and know whether you have a valid message or not. The validation would then roughly become:
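def validate(data):
    # findMessages, parseMessage, messageLength, badDecode and failure are assumed to exist
    messageOffsets = findMessages(data)
    if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
        # A header-like sequence sits inside this message. It may just be payload,
        # so let the parser (and its checksum check) decide.
        try:
            parseMessage(data[:messageLength])
            return data[:messageLength]
        except badDecode:
            return failure
    # validation here: the first 4 bytes must be the message header (bytes, not str)
    if data[0:4] == b'\x01\x02\x03\x04':
        return data[:messageLength]
    return failure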
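
To make the checksum step concrete, here is a small sketch using only the standard library (`struct` and `zlib`) rather than Construct, so it runs without extra dependencies. The message layout is entirely hypothetical: the 4-byte header from the example, a 4-byte payload, and a big-endian CRC-32 over header plus payload. `build_message`, `parse_message` and `BadDecode` are illustrative names, not part of any real protocol or library.

```python
import struct
import zlib

HEADER = b"\x01\x02\x03\x04"
PAYLOAD_LENGTH = 4  # hypothetical fixed payload size
MESSAGE_LENGTH = len(HEADER) + PAYLOAD_LENGTH + 4  # header + payload + CRC-32


class BadDecode(Exception):
    """Raised when a candidate message fails validation."""


def build_message(payload: bytes) -> bytes:
    """Prefix the header and append a big-endian CRC-32 of header + payload."""
    body = HEADER + payload
    return body + struct.pack(">I", zlib.crc32(body))


def parse_message(message: bytes) -> bytes:
    """Return the payload, or raise BadDecode on a bad length, header or checksum."""
    if len(message) != MESSAGE_LENGTH or not message.startswith(HEADER):
        raise BadDecode("bad length or header")
    body = message[:-4]
    (expected,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(body) != expected:
        raise BadDecode("checksum mismatch")
    return body[len(HEADER):]


# A payload that happens to equal the header still parses, because the checksum agrees
tricky = build_message(HEADER)
payload = parse_message(tricky)
```

A single flipped byte anywhere in `tricky` makes `parse_message` raise `BadDecode`, which is exactly the signal the `try`/`except` in `validate` relies on.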