Lesson three
Remember to implement the checksums (if there are any)! This will help in the following cases:
findData && badMessage
structure would break down
For either of these cases, you can attempt a parse and know whether you have a valid message or not. The validation then becomes roughly:
def validate(data):
    # offsets of the message markers found in data
    messageOffsets = findMessages(data)
    if messageOffsets[1] < messageLength:
        # a second marker shows up early: let the parse (and its checksum) decide
        try:
            parseMessage(data[:messageLength])
            return data[:messageLength]
        except badDecode:
            return failure
    else:
        # validation here
        if data[0:4] == b'\x01\x02\x03\x04':
            return data[:messageLength]
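The parse attempt above only helps if parsing actually fails on bad data, and that is the job of the checksum. Here is a minimal sketch of such a check. The layout is entirely an assumption for illustration (4-byte marker, 8-byte payload, little-endian CRC-32 trailer over marker plus payload), and build_message, parse_message and BadDecode are hypothetical names. A real protocol will define its own checksum and layout.

```python
import struct
import zlib

MAGIC = b"\x01\x02\x03\x04"   # assumed message marker (hypothetical)
PAYLOAD_LENGTH = 8            # assumed payload size (hypothetical)
MESSAGE_LENGTH = len(MAGIC) + PAYLOAD_LENGTH + 4  # marker + payload + CRC-32


class BadDecode(Exception):
    """Raised when a candidate message fails its structural or checksum test."""


def build_message(payload):
    """Wrap a payload in the assumed framing: marker, payload, CRC-32."""
    body = MAGIC + payload
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def parse_message(message):
    """Return the payload, or raise BadDecode if the message is not intact."""
    if len(message) != MESSAGE_LENGTH or not message.startswith(MAGIC):
        raise BadDecode("wrong length or marker")
    body, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(body) & 0xFFFFFFFF != expected:
        raise BadDecode("checksum mismatch")
    return body[len(MAGIC):]
```

With this in place, a payload that happens to contain the marker bytes still parses, because the checksum confirms the message is whole. A truncated or damaged message will in practice fail the checksum and raise BadDecode instead of being passed along as valid.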
Lesson two
hd(1) is your friend.
elegua% echo hello | hexdump -v -e '/1 "%02X "'
68 65 6C 6C 6F 0A
elegua% hd -n 32 example
00000000 05 02 01 01 54 00 75 02 56 00 80 02 28 00 2d 00 |....T.u.V...(.-.|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
elegua% hd -s 17 -n 16 example
00000011 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
elegua% hd -s 12 -n 10 example
0000000c 28 00 2d 00 00 00 00 00 00 00 |(.-.......|
This way you can read any file offset you need, to see what's wrong and where (for example, when the data is too short).
Note: hd is an alternate binary name for hexdump (it behaves like hexdump -C).
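The same kind of offset peeking is handy from inside Python while you are poking at a buffer. Below is a small sketch that roughly mimics hd -s SKIP -n LENGTH on a bytes object. The function name hexdump and its parameters are my own invention, and the output only resembles hd (there is no extra gap after the eighth byte).

```python
def hexdump(data, skip=0, length=None, width=16):
    """Return lines resembling `hd -s skip -n length` for a bytes object."""
    end = len(data) if length is None else min(len(data), skip + length)
    lines = []
    for start in range(skip, end, width):
        chunk = data[start:min(start + width, end)]
        # two hex digits per byte, separated by single spaces
        hex_part = " ".join("%02x" % b for b in chunk)
        # printable ASCII is shown as-is, everything else as a dot
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %-*s  |%s|" % (start, width * 3 - 1, hex_part, text))
    return lines
```

For example, print("\n".join(hexdump(open("example", "rb").read(), skip=12, length=10))) would show the same ten bytes as the hd -s 12 -n 10 call above.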
If you forget lesson one, you are in for a whole lot of pain. None of the data that follows will make any sense whatsoever.
Wrong assumption number one
Validating the start of data, and then reading forward for the fixed packet/data length
Why this is wrong
Corruption comes in all flavours, including too-short messages in the datastream. Validate with a window of n >= 2 messages: check that reading your normal message length doesn't eat into the start of the next message.
Example
Assuming our first 4 bytes identify a message, here is some Python-esque pseudocode (where functions or variables are not defined, consider them already implemented for the example case):
def validate(data):
    # offsets of the message markers found in data
    messageOffsets = findMessages(data)
    if messageOffsets[1] < messageLength:
        # the next message starts before this one could have ended
        return badMessage
    else:
        # validation here
        if data[0:4] == b'\x01\x02\x03\x04':
            return data[:messageLength]
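For anyone who wants to run it, here is one way the pseudocode above could be fleshed out. It is only a sketch: the marker bytes are the ones from the example, but the 16-byte message length is made up, and find_messages and BadMessage are hypothetical names rather than anything from a library.

```python
MAGIC = b"\x01\x02\x03\x04"   # message marker from the example above
MESSAGE_LENGTH = 16           # assumed fixed message length (hypothetical)


class BadMessage(Exception):
    """Raised when the front of the stream is not one whole message."""


def find_messages(data, limit=2):
    """Return the offsets of up to `limit` occurrences of MAGIC in data."""
    offsets = []
    pos = data.find(MAGIC)
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = data.find(MAGIC, pos + 1)
    return offsets


def validate(data):
    """Return the first message, or raise BadMessage if it is missing or short."""
    offsets = find_messages(data)
    if not offsets or offsets[0] != 0:
        raise BadMessage("data does not start with a message marker")
    # the window of two: a second marker inside our own length means we are short
    if len(offsets) > 1 and offsets[1] < MESSAGE_LENGTH:
        raise BadMessage("next message starts at offset %d" % offsets[1])
    if len(data) < MESSAGE_LENGTH:
        raise BadMessage("stream ends before the message is complete")
    return data[:MESSAGE_LENGTH]
```

One caveat of this approach: the marker bytes can also turn up inside a perfectly good payload, in which case this sketch rejects a valid message.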
I recently had a requirement to do some stuff with a packed/binary protocol. Not being the greatest fan of C (or rather, of the effort involved with it and the userland around it), I decided to use Python and the relatively excellent Construct library for this.
pip install construct after initializing a venv, fire up some test data, and so far so good.
The subject of this stream (and all it relates to) is all pretty damn obvious in retrospect, of course.
Maybe some of these lessons might've been learned by just reading C-based tutorials? Let's think about that...
Allons-y!