The subject of this stream (and all it relates to) is all pretty damn obvious in retrospect, of course.
Maybe some of these lessons might've been learned by just reading C-based tutorials? Let's think about that...
Allons-y!
I recently needed to work with a packed binary protocol. Not being the greatest fan of C (or rather, of the effort involved and the userland around it), I decided to use Python and the rather excellent Construct library.
pip install construct
Run that after initializing a venv, fire up some test data, and so far so good.
Wrong assumption number one
The assumption: it is enough to validate the start of the data and then read forward for the fixed packet/data length.
Why this is wrong
Corruption comes in all flavours, including too-short messages in the datastream. Look at a window of at least two consecutive message starts (n>=2) and check that reading your normal message length doesn't eat into the start of the next message.
Example
Assume the first 4 bytes identify a message. In Python-esque pseudocode (any function or variable not defined here should be considered already implemented for the example):
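def validate(data):
    # Offsets of every message start in the buffer; data is assumed to begin at one.
    messageOffsets = findMessages(data)
    # If the next message starts before a full message length has passed, this one is too short.
    if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
        return badMessage
    # validation here
    if data[0:4] == b'\x01\x02\x03\x04':
        return data[:messageLength]
    return badMessage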
If you forget lesson one, you are in for a whole lot of pain. None of your following data will make any sense whatsoever.
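To make the pseudocode above a bit more concrete, here is a minimal runnable sketch using only the standard library. The 4-byte marker comes from the example; everything else (MESSAGE_LENGTH = 16, find_messages, split_stream) is made up for illustration. It also assumes the marker never shows up inside a valid payload, which a real protocol may well not guarantee.

```python
MAGIC = b"\x01\x02\x03\x04"   # start marker from the example above
MESSAGE_LENGTH = 16           # assumed fixed message size, purely illustrative


def find_messages(data):
    """Return the offset of every occurrence of MAGIC in data."""
    offsets = []
    pos = data.find(MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return offsets


def split_stream(data):
    """Return (good, bad) lists, judging each message against the next marker."""
    offsets = find_messages(data)
    good, bad = [], []
    for i, start in enumerate(offsets):
        # The window: this message runs until the next marker, or to the end of the data.
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        chunk = data[start:end]
        if len(chunk) >= MESSAGE_LENGTH:
            good.append(chunk[:MESSAGE_LENGTH])
        else:
            # Too short: reading MESSAGE_LENGTH bytes from here would eat
            # into the start of the next message.
            bad.append(chunk)
    return good, bad


full = MAGIC + bytes(12)      # a well-formed 16-byte message
short = MAGIC + bytes(5)      # a truncated 9-byte message
good, bad = split_stream(full + short + full)
print(len(good), len(bad))
```

The truncated message in the middle is set aside, and the message after it is still picked up from its own marker instead of being misread from a shifted offset.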
Lesson two
hd(1) is your friend.
elegua% echo hello | hexdump -v -e '/1 "%02X "'
68 65 6C 6C 6F 0A
elegua% hd -n 32 example
00000000 05 02 01 01 54 00 75 02 56 00 80 02 28 00 2d 00 |....T.u.V...(.-.|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
elegua% hd -s 17 -n 16 example
00000011 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
elegua% hd -s 12 -n 10 example
0000000c 28 00 2d 00 00 00 00 00 00 00 |(.-.......|
In this way you can read any given file offset you need to, to see what's wrong where (like in the case of data being too short).
Note: hd is an alternate binary name for hexdump; invoked that way, it defaults to the canonical hex+ASCII display (hexdump -C).
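When you are already inside a Python session, the same offset-and-length view can be approximated without leaving it. This is a rough sketch, not a faithful clone: the function name hd and its skip/length parameters are chosen here to mirror the -s and -n flags, and the column layout only approximates the real tool (no extra gap after the eighth byte, no trailing offset line). The sample bytes are the first 32 bytes of the example file shown above.

```python
def hd(data, skip=0, length=None):
    """Return hd-style lines for data[skip:skip+length], 16 bytes per row."""
    end = len(data) if length is None else min(len(data), skip + length)
    lines = []
    for row in range(skip, end, 16):
        chunk = data[row:min(row + 16, end)]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{row:08x}  {hex_part:<47}  |{text}|")
    return lines


# First 32 bytes of the example file from the hd output above.
sample = bytes.fromhex("05020101540075025600800228002d00") + bytes(16)
for line in hd(sample, skip=12, length=10):
    print(line)
```

The call at the bottom corresponds to hd -s 12 -n 10 example.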
Lesson three
Remember to implement the checksums (if there are any)! This will help in the following cases:
findData && badMessage
structure would break down
For either of these cases, you could attempt a parse and know whether you have a valid message or not. The validation would then look roughly like this:
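def validate(data):
    messageOffsets = findMessages(data)
    if len(messageOffsets) > 1 and messageOffsets[1] < messageLength:
        # A message start shows up inside this message's span, so attempt
        # a parse (checksum included) to find out whether it is valid anyway.
        try:
            parseMessage(data[:messageLength])
            return data[:messageLength]
        except badDecode:
            return failure
    else:
        # validation here
        if data[0:4] == b'\x01\x02\x03\x04':
            return data[:messageLength]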
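The parse-and-check step can be sketched as follows. Which checksum a protocol uses is up to the protocol, so this is only an assumption for illustration: a CRC-32 (via zlib.crc32) over the marker and payload, appended as a 4-byte little-endian trailer. The names build_message, parse_message and BadDecode are hypothetical stand-ins for parseMessage and badDecode above.

```python
import struct
import zlib

MAGIC = b"\x01\x02\x03\x04"   # start marker from the earlier example


class BadDecode(Exception):
    """Raised when a candidate message fails validation."""


def build_message(payload):
    """Frame a payload as MAGIC + payload + CRC-32 of everything before the CRC."""
    body = MAGIC + payload
    return body + struct.pack("<I", zlib.crc32(body))


def parse_message(message):
    """Return the payload, or raise BadDecode if the marker or checksum is wrong."""
    if len(message) < len(MAGIC) + 4 or not message.startswith(MAGIC):
        raise BadDecode("missing start marker or message too short")
    body, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(body) != expected:
        raise BadDecode("checksum mismatch")
    return body[len(MAGIC):]


good = build_message(b"hello")
print(parse_message(good))

# Flip one payload byte to simulate corruption in the datastream.
corrupted = good[:5] + bytes([good[5] ^ 0xFF]) + good[6:]
try:
    parse_message(corrupted)
except BadDecode as err:
    print("rejected:", err)
```

With a check like this in place, a candidate slice either parses cleanly or is rejected outright, instead of quietly producing values that make no sense.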