Check for messages[mtype]._instances is None#822
Conversation
| return | ||
| instance_value = getattr(msg, msg._instance_field) | ||
| if not mtype in messages: | ||
| if mtype not in messages or messages[mtype]._instances is None: |
There was a problem hiding this comment.
Why not check for msg.get_type() == 'BAD_DATA' at the top of this method, and not do silly things with invalid data?
There was a problem hiding this comment.
Thanks for the review! @peterbarker
That fixes the bug, and pytest succeeds. But has the side effect of never adding BAD_DATA messages to mavfile sysid_state.messages, which is a slight change to behavior. I looked for all usages in this repo, and I don't see any problems, and in fact I imagine it might fix an obscure class of bugs. If you are OK w/ this I'll make the change.
There was a problem hiding this comment.
I made this change and rebased. Tests pass locally. Thanks.
There was a problem hiding this comment.
Whoops, I added the if test before the docstring. Fixed.
a11ee02 to
d21387c
Compare
54a19ce to
2e91367
Compare
|
Whoops, I lost track of this. Thanks @peterbarker and @tridge for the review and fix. I rebased. |
| def add_message(messages, mtype, msg): | ||
| '''add a msg to array of messages, taking account of instance messages''' | ||
| if msg.get_type() == 'BAD_DATA': | ||
| # avoid dealing with instance field |
There was a problem hiding this comment.
Have you made sure that BAD_DATA still appears in MAVProxy's messages command after this change?
There was a problem hiding this comment.
I have confirmed that BAD_DATA messages are still delivered by recv_match() from a tlog file after this change.
Without fix, crashes on the 5th bad message:
(.venv) pymavlink ~/dive_logs/seaq_logs (main=)$ /home/clyde/projects/ardusub_log_tools/tlog_bad_data.py --verbose 2023_04_18_surftrak3/2023-04-18\ 11-03-34.tlog
Processing 1 files
-------------------
Results for 2023_04_18_surftrak3/2023-04-18 11-03-34.tlog
BadDataMsg mavlink2=True sysid=1 compid=194 msg_id=132 reason: invalid MAVLink CRC in msgID 132 0x2fb3 should be 0x6d00
BadDataMsg mavlink2=True sysid=1 compid=194 msg_id=132 reason: invalid MAVLink CRC in msgID 132 0x8261 should be 0x4c16
BadDataMsg mavlink2=True sysid=1 compid=194 msg_id=132 reason: invalid MAVLink CRC in msgID 132 0xe104 should be 0x27d7
BadDataMsg mavlink2=True sysid=1 compid=194 msg_id=132 reason: invalid MAVLink CRC in msgID 132 0x63c9 should be 0x1e79
CRASH WITH ERROR "'NoneType' object does not support item assignment" READING 2023_04_18_surftrak3/2023-04-18 11-03-34.tlog
Same file with fix (not verbose). Most (not all) bad messages were DISTANCE_SENSOR.
(.venv) pymavlink ~/dive_logs/seaq_logs (main=)$ /home/clyde/projects/ardusub_log_tools/tlog_bad_data.py 2023_04_18_surftrak3/2023-04-18\ 11-03-34.tlog
Processing 1 files
-------------------
Results for 2023_04_18_surftrak3/2023-04-18 11-03-34.tlog
msg_id 21 count 6
msg_id 76 count 4
msg_id 77 count 4
msg_id 132 count 1949
msg_id 259 count 1
msg_id 260 count 1
msg_id 262 count 1
msg_id 269 count 1
1967 BAD_DATA messages, 1967 of them were CRC errors
There was a problem hiding this comment.
With MAVExplorer.py, before fix:
MAV> dump DISTANCE_SENSOR
MAV> ERROR in command ['DISTANCE_SENSOR']: 'NoneType' object does not support item assignment
With fix:
MAV> dump DISTANCE_SENSOR
MAV> 2023-04-18 10:58:18.373 DISTANCE_SENSOR {time_boot_ms : 214433, min_distance : 20, max_distance : 5000, current_distance : 3948, type : 1, id : 1, orientation : 25, covariance : 255, horizontal_fov : 0.5199999809265137, vertical_fov : 0.5199999809265137, quaternion : [0.0, 0.0, 0.0, 0.0], signal_quality : 6}
2023-04-18 10:58:18.373 DISTANCE_SENSOR {time_boot_ms : 214517, min_distance : 20, max_distance : 5000, current_distance : 3942, type : 1, id : 1, orientation : 25, covariance : 255, horizontal_fov : 0.5199999809265137, vertical_fov : 0.5199999809265137, quaternion : [0.0, 0.0, 0.0, 0.0], signal_quality : 7}
...
There was a problem hiding this comment.
OK, I think I do need to understand what's going on here.
It feels like we might be adding a BAD_DATA message to messages['BAD_DATA'] - and that object does not behave enough like a "normal" message to have things work. We may need to work out in what way the BAD_DATA doesn't match expected behaviour and either fix the caller or fix the object.
Where is the 'NoneType' object does not support item assignment exception being raised? set debug 100 (I made the number up) might give you a full stack trace to work that out.
You didn't actually answer my question here, BTW. If I'm understanding this even a little then messages in MAVProxy when running on a poor link will now not contain the BAD_DATA message, which would be a regression. We should be able to treat the thing as a normal message.
There was a problem hiding this comment.
Hi @peterbarker -- I added the stack trace and some notes to the linked issue.
Yeah, I didn't test the "bad comms" case. I will need to rig something up to test this.
I need to spend more time thinking about the appropriate fix. Feel free to close this PR.
Happy New Year!
There was a problem hiding this comment.
Hi @peterbarker -- I added the stack trace and some notes to the linked issue.
Yeah, I didn't test the "bad comms" case. I will need to rig something up to test this.
I need to spend more time thinking about the appropriate fix. Feel free to close this PR.
Nono - this is obviously a problem, and I'd not want to lose the thinking that's been put in here.
Happy New Year!
Thanks!
There was a problem hiding this comment.
I got nerd-sniped by this, so decided to look into it with Clyde's latest debug info, and have written up investigation notes in the issue that I believe explain what's going on and how to properly fix it.
Tl;dr:
add_messageis being erroneously called with a name (instead of a type), so the handling code is using conflicting information- I don't think it should be being passed the type in the first place, since it can just get it from the message object using
msg.get_type().
2e91367 to
7571782
Compare
5c3e1d1 to
0213bd2
Compare
0213bd2 to
a2b7499
Compare
This is a small change to fix #807. I've tested this against quite a few tlog files over the past 2 weeks and it works great. Running pytest comes up clean.
See the longer discussion in https://discuss.bluerobotics.com/t/mavlogdump-py-crashes-while-reading-distance-sensor-msgs-in-some-tlog-files/14228
Thanks,
/Clyde