Skip to content

Conversation

@christopherwun
Copy link
Collaborator

@christopherwun christopherwun commented Nov 17, 2025

I'm trying to stabilize the native bno implementation for circuitpython as part of the work to clean up the cionic-circuitpython repo. I was running into nondeterministic hard faults due to invalid memory accesses for a while so tried to dig in and identify the root cause. I found out that the cause for this is in this function:

mp_obj_t common_hal_bno080_BNO080_read(bno080_BNO080_obj_t *self, uint8_t report_id) {
    // mp_obj_t fquat[QUAT_DIMENSION];
    // int rc = 0;

    switch (report_id) {
        case BNO080_SRID_ARVR_GAME_ROTATION_VECTOR:
        case BNO080_SRID_ARVR_ROTATION_VECTOR:
        case BNO080_SRID_GEOMAGNETIC_ROTATION_VECTOR:
        case BNO080_SRID_GAME_ROTATION_VECTOR:
        case BNO080_SRID_ROTATION_VECTOR:
            mp_printf(&mp_plat_print, "[module] read: returning rotation vector\n");
            // print fquat ptrs
            mp_printf(&mp_plat_print, "self=%p, self->fquat=%p\n", self, self->fquat);
            for (int i = 0; i < QUAT_DIMENSION; i++) {
                mp_printf(&mp_plat_print, "fquat[%d] ptr=%p\n", i, self->fquat[i]);
            }
            // print fquat values
            for (int i = 0; i < QUAT_DIMENSION; i++) {
                mp_printf(&mp_plat_print, "fquat[%d] value=%f\n", i, (double)mp_obj_get_float(self->fquat[i]));
            }
            return mp_obj_new_list(QUAT_DIMENSION, self->fquat);
        case BNO080_SRID_ACCELEROMETER:
            mp_printf(&mp_plat_print, "[module] read: returning accelerometer data\n");
            return mp_obj_new_list(ACCEL_DIMENSION, self->accel);
        case BNO080_SRID_GYROSCOPE:
            mp_printf(&mp_plat_print, "[module] read: returning gyroscope data\n");
            return mp_obj_new_list(GYRO_DIMENSION, self->gyro);
        case BNO080_SRID_MAGNETIC_FIELD:
            mp_printf(&mp_plat_print, "[module] read: returning magnetic field data\n");
            return mp_obj_new_list(MAG_DIMENSION, self->mag);
        case BNO080_SRID_GRAVITY:
            mp_printf(&mp_plat_print, "[module] read: returning gravity data\n");
            return mp_obj_new_list(GRAV_DIMENSION, self->grav);
    }

    return NULL;
} 

and it seems to be that the fquat ptrs are sometimes uninitialized when the python layer calls bno080.read() for the first time. This happens when bno080_report_rotation hasn't been called yet (i.e. if the ISR has not picked up the channel report yet). I'm pretty confident that this is the problem but I wasn't as sure about the best fix. In my head the options were:

  • initialize the fquat ptrs in the constructor instead
  • add a check for null/ invalid ptrs in the read function
  • add a timeout to the set feature function (to wait for the report to get picked up after setting the feature) -- this could happen in the C or the Python layer

Here, I implemented bullet 1 and 2 to be extra defensive. I initialized data pointers in the constructor and also check for invalid ptrs in the read function.

Note that this implementation will lead to some garbage values (0,0,0,0) for data streams in the case that a crash would have happened instead. A future implementation could be to return NULL instead of garbage values and update the python-side CionicOrientation class implementation to handle cases where bno08x.read() returns NULL, but I went with this because there are many examples in cionic-circuitpython in non-main branches which would also be affected by this change.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes nondeterministic hard faults caused by uninitialized memory accesses in the BNO080 sensor module. The issue occurred when read() was called before the ISR had populated sensor data arrays, resulting in invalid pointer dereferences.

  • Initializes all sensor data arrays (quaternion, accelerometer, gyroscope, magnetometer, gravity) to valid float objects (0.0) in the constructor
  • Re-initializes these arrays in the reset function to handle device resets
  • Adds defensive NULL checks in the read function to catch any remaining edge cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@christopherwun christopherwun marked this pull request as ready for review November 18, 2025 15:52
@cionicwear cionicwear deleted a comment from Copilot AI Nov 18, 2025
@cionicwear cionicwear deleted a comment from Copilot AI Nov 18, 2025
@cionicwear cionicwear deleted a comment from Copilot AI Nov 18, 2025

STATIC void bno080_init_data_arrays(bno080_BNO080_obj_t *self) {
for (int i = 0; i < QUAT_DIMENSION; i++) {
self->fquat[i] = mp_obj_new_float(0.0f);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn’t this create a memory leak? I might be wrong, but from what I recall, mp_obj_new_float allocates a new object on the heap. If self->fquat[i] is not NULL, the existing pointer will be overwritten and effectively lost. The garbage collector is probably helping with this but at the performance cost.

Copy link
Collaborator Author

@christopherwun christopherwun Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are correct. I first noticed this pattern in the report function when debugging, thinking that this could be a reason for the hard fault (see fn pasted below). But, after looking into it further, since self->fquat points to an mp_obj_t array and Python (and therefore MicroPython) floats are immutable, the struct has to be updated with a new float object pointer on each update. Besides, I haven't seen any problems with memory allocation with the current implementation.

I guess an alternative solution would be to try using the native float instead of an mp_obj_t array (since we don't access self.fquat from the Python layer anyways). I think this would probably be a separate PR though - let me know your thoughts.

STATIC void bno080_report_rotation(bno080_BNO080_obj_t *self, elapsed_t timestamp, const uint8_t *pkt, int len) {
    /**
     ...
     */
    uint8_t qp = 14;  /// per section 6.5.19 Q Point = 14
    // https://en.wikipedia.org/wiki/Q_(number_format)
    float scale = pow(2.0, -qp);

    self->fquat[0] = mp_obj_new_float(READ_LE(int16_t, &pkt[4]) * scale);  // i
    self->fquat[1] = mp_obj_new_float(READ_LE(int16_t, &pkt[6]) * scale);  // j
    self->fquat[2] = mp_obj_new_float(READ_LE(int16_t, &pkt[8]) * scale);  // k
    self->fquat[3] = mp_obj_new_float(READ_LE(int16_t, &pkt[10]) * scale);  // real
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious whether making that change would improve performance so I tried it in this PR here as well. Haven't been able to test longer-term performance (which I assume is where we would see big changes), but the short term sampling rate is pretty much the same (and likely limited by the Python layer regardless). #11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants