Conversation
414753b to
1715920
Compare
|
Can we move the ordering and semaphore fixes out to their own PR? They grew a little more complicated than initially imagined. I'm also not sure if they're responsible for CI failing. They are also more worth backporting in ardupilot. |
|
I found some other issues with the constructor ordering, fixed them, and broke it out into #76 . |
02a5b32 to
c539851
Compare
|
@tpwrules @bugobliterator can I get a review of the new changes? |
There was a problem hiding this comment.
I think things are a little mixed up in my head still but I've provided some comments.
Do you have more detailed design info or docs on what Lua (and indeed libcanard as it currently exists) should be able to do? The stuff this code actually does conflicts a bit with my impressions and the capabilities in the bindings you've proposed.
| Subscriber<msgtype>* next; | ||
| static Subscriber<msgtype> *branch_head[CANARD_NUM_HANDLERS]; |
There was a problem hiding this comment.
I like the idea of getting rid of these, it saves a lot of complexity and memory for something that I don't think was really ever used. However, I am concerned about the performance as handling each message will now have to search through the full list of handlers, instead of only the (on average) half until it gets to its desired handler.
Could we make link() do a sort of sorted insert and keep the same types together? If there are none existing, then just put it at the front to preserve existing behavior. Then the handler loop can early exit after it finds all handlers of the current type.
There was a problem hiding this comment.
i think we could make it an expanding array possibly with sort on insert and bisection search in the future, but I don't think it is needed for this PR.
having more predictable performance cost is in some ways a benefit. It was essentially random before
There was a problem hiding this comment.
I see your point about the randomness, but I think a 100% overhead is at least worth benchmarking. Note also that this means we pay the decode cost multiple times, which can be high.
There was a problem hiding this comment.
we don't pay the decode cost multiple times unless we actually have multiple subscribers to the message, which we don't unless a lua script subscribes to a message that ardupilot also subscribes to. That will not be common, and when it does happen we don't have much of a choice as we can't instantiate the templated type at runtime in lua based on variables for the data type in the script
| if (transfer.data_type_id == entry->msgid && | ||
| entry->transfer_type == transfer.transfer_type) { | ||
| entry->handle_message(transfer); | ||
| return; |
There was a problem hiding this comment.
It's unclear to me why we need to be able to call multiple handler list entries in the first place anyway (c.f. first commit in this PR), but I don't think I've chewed on it enough. Theoretically if you created a new Subscriber, then it would be in the branch list of any old Subscriber and get called.
Should the first commit just be dropped? I think the commit to simplify handling in effect reverts it?
There was a problem hiding this comment.
i've folded that commit into the later commit
c539851 to
240363c
Compare
|
@tpwrules the reason we need the get_semaphore() is mutex ordering. If we don't et that mutex before we get the sem in the scripting code then we deadlock if a message arrives during deletion of a Subscriber in DroneCAN_Handle::check_message(). It happens in SITL if you comment out the get_semaphore() code, DroneCAN thread is in handle_message(), scripting in check_message() |
240363c to
a2f0df9
Compare
tpwrules
left a comment
There was a problem hiding this comment.
Setting aside performance and get_semaphore(), if there is indeed intended to be no change in semantics, I think the return value of handle_message is confusing and changes things.
For example, a memory allocation failure should not affect which handlers get run; the Lua stuff violates this rule.
| entry->transfer_type == transfer.transfer_type) { | ||
| entry->handle_message(transfer); | ||
| return; | ||
| if (entry->handle_message(transfer) && |
There was a problem hiding this comment.
I think the semantics of this return value are not good and don't match the original behavior. I think handle_message should return true if the handler is the final consumer of the message and false if other handlers should get a chance. Then the loop here also does not need to check the transfer type, it is implied by the handler.
There was a problem hiding this comment.
i changed it so the return value means that the message was for the right recipient
| /// @brief handles incoming messages | ||
| /// @param transfer transfer object of the request | ||
| void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { | ||
| bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { |
There was a problem hiding this comment.
Should return true iff server_node_id and transfer_id match.
| /// @brief parse the message and call the callback | ||
| /// @param transfer transfer object | ||
| void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { | ||
| bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { |
There was a problem hiding this comment.
Should return true iff decode fails (avoid wasting time on future failing decodes).
There was a problem hiding this comment.
not quite, one of the advantages of lua DroneCAN is we can fix incorrect DSDL. We have had that from vendors, where DSDL that we don't control (eg. Hobbywing) changes. If we try to short circuit on decode failure then a lua receiver can't fix the issue
There was a problem hiding this comment.
Okay fair enough, then it should always return false.
| /// @param transfer transfer object of the request | ||
| void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { | ||
| /// @return true if message has been handled | ||
| bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC { |
There was a problem hiding this comment.
Should always return true.
| Subscriber<msgtype>* next; | ||
| static Subscriber<msgtype> *branch_head[CANARD_NUM_HANDLERS]; |
There was a problem hiding this comment.
I see your point about the randomness, but I think a 100% overhead is at least worth benchmarking. Note also that this means we pay the decode cost multiple times, which can be high.
use a single linked list for message reception, not a list of lists this allows for the creation of new HandlerList handlers at runtime
ba4650c to
52888aa
Compare
tpwrules
left a comment
There was a problem hiding this comment.
Would have liked to see this ever so slightly better documented and maybe the semantics could still use improvement but I like the simplification.
This makes some changes to support handling DroneCAN messages in lua scripting in ArduPilot
See PR ArduPilot/ardupilot#29310
This greatly simplifies the code, and also saves a lot of flash (saves 5.8k of flash on CubeOrange)