Skip to content

changes to support lua DroneCAN messaging#74

Merged
tridge merged 1 commit intomasterfrom
pr-canard-list-sem
May 5, 2025
Merged

changes to support lua DroneCAN messaging#74
tridge merged 1 commit intomasterfrom
pr-canard-list-sem

Conversation

@tridge
Copy link
Member

@tridge tridge commented Feb 17, 2025

This makes some changes to support handling DroneCAN messages in lua scripting in ArduPilot
See PR ArduPilot/ardupilot#29310

  • allow more than one HandlerList object per message type
  • added a get_semaphore() static method
  • removed the double-link-list system and use a single linked list
  • move the locking inside HandlerList

This greatly simplifies the code, and also saves a lot of flash (saves 5.8k of flash on CubeOrange)

@tpwrules
Copy link
Collaborator

tpwrules commented Feb 17, 2025

Can we move the ordering and semaphore fixes out to their own PR? They grew a little more complicated than initially imagined. I'm also not sure if they're responsible for CI failing. They are also more worth backporting in ardupilot.

@tpwrules
Copy link
Collaborator

I found some other issues with the constructor ordering, fixed them, and broke it out into #76 .

@tridge tridge force-pushed the pr-canard-list-sem branch 3 times, most recently from 02a5b32 to c539851 Compare April 19, 2025 23:40
@tridge
Copy link
Member Author

tridge commented Apr 21, 2025

@tpwrules @bugobliterator can I get a review of the new changes?

Copy link
Collaborator

@tpwrules tpwrules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think things are a little mixed up in my head still but I've provided some comments.

Do you have more detailed design info or docs on what Lua (and indeed libcanard as it currently exists) should be able to do? The stuff this code actually does conflicts a bit with my impressions and the capabilities in the bindings you've proposed.

Comment on lines -92 to -93
Subscriber<msgtype>* next;
static Subscriber<msgtype> *branch_head[CANARD_NUM_HANDLERS];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of getting rid of these, it saves a lot of complexity and memory for something that I don't think was really ever used. However, I am concerned about the performance as handling each message will now have to search through the full list of handlers, instead of only the (on average) half until it gets to its desired handler.

Could we make link() do a sort of sorted insert and keep the same types together? If there are none existing, then just put it at the front to preserve existing behavior. Then the handler loop can early exit after it finds all handlers of the current type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we could make it an expanding array possibly with sort on insert and bisection search in the future, but I don't think it is needed for this PR.
having more predictable performance cost is in some ways a benefit. It was essentially random before

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about the randomness, but I think a 100% overhead is at least worth benchmarking. Note also that this means we pay the decode cost multiple times, which can be high.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't pay the decode cost multiple times unless we actually have multiple subscribers to the message, which we don't unless a lua script subscribes to a message that ardupilot also subscribes to. That will not be common, and when it does happen we don't have much of a choice as we can't instantiate the templated type at runtime in lua based on variables for the data type in the script

if (transfer.data_type_id == entry->msgid &&
entry->transfer_type == transfer.transfer_type) {
entry->handle_message(transfer);
return;
Copy link
Collaborator

@tpwrules tpwrules Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me why we need to be able to call multiple handler list entries in the first place anyway (c.f. first commit in this PR), but I don't think I've chewed on it enough. Theoretically if you created a new Subscriber, then it would be in the branch list of any old Subscriber and get called.

Should the first commit just be dropped? I think the commit to simplify handling in effect reverts it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've folded that commit into the later commit

@tridge tridge force-pushed the pr-canard-list-sem branch from c539851 to 240363c Compare April 25, 2025 01:30
@tridge
Copy link
Member Author

tridge commented Apr 25, 2025

@tpwrules the reason we need the get_semaphore() is mutex ordering. If we don't et that mutex before we get the sem in the scripting code then we deadlock if a message arrives during deletion of a Subscriber in DroneCAN_Handle::check_message(). It happens in SITL if you comment out the get_semaphore() code, DroneCAN thread is in handle_message(), scripting in check_message()

@tridge tridge force-pushed the pr-canard-list-sem branch from 240363c to a2f0df9 Compare April 25, 2025 01:44
Copy link
Collaborator

@tpwrules tpwrules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting aside performance and get_semaphore(), if there is indeed intended to be no change in semantics, I think the return value of handle_message is confusing and changes things.

For example, a memory allocation failure should not affect which handlers get run; the Lua stuff violates this rule.

entry->transfer_type == transfer.transfer_type) {
entry->handle_message(transfer);
return;
if (entry->handle_message(transfer) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the semantics of this return value are not good and don't match the original behavior. I think handle_message should return true if the handler is the final consumer of the message and false if other handlers should get a chance. Then the loop here also does not need to check the transfer type, it is implied by the handler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed it so the return value means that the message was for the right recipient

/// @brief handles incoming messages
/// @param transfer transfer object of the request
void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return true iff server_node_id and transfer_id match.

/// @brief parse the message and call the callback
/// @param transfer transfer object
void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
Copy link
Collaborator

@tpwrules tpwrules Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return true iff decode fails (avoid wasting time on future failing decodes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite, one of the advantages of lua DroneCAN is we can fix incorrect DSDL. We have had that from vendors, where DSDL that we don't control (eg. Hobbywing) changes. If we try to short circuit on decode failure then a lua receiver can't fix the issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay fair enough, then it should always return false.

/// @param transfer transfer object of the request
void handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
/// @return true if message has been handled
bool handle_message(const CanardRxTransfer& transfer) override NOINLINE_FUNC {
Copy link
Collaborator

@tpwrules tpwrules Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should always return true.

Comment on lines -92 to -93
Subscriber<msgtype>* next;
static Subscriber<msgtype> *branch_head[CANARD_NUM_HANDLERS];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about the randomness, but I think a 100% overhead is at least worth benchmarking. Note also that this means we pay the decode cost multiple times, which can be high.

use a single linked list for message reception, not a list of lists

this allows for the creation of new HandlerList handlers at runtime
@tridge tridge force-pushed the pr-canard-list-sem branch 2 times, most recently from ba4650c to 52888aa Compare May 1, 2025 22:44
Copy link
Collaborator

@tpwrules tpwrules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would have liked to see this ever so slightly better documented and maybe the semantics could still use improvement but I like the simplification.

@tridge tridge merged commit 5d7b725 into master May 5, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants