Skip to content

fix(sensors): improve sensor initialization and handling to prevent hangs and handle growth#2327

Open
NickDunklee wants to merge 1 commit intomeshcore-dev:devfrom
NickDunklee:fix-environment-sensor-refactor
Open

fix(sensors): improve sensor initialization and handling to prevent hangs and handle growth#2327
NickDunklee wants to merge 1 commit intomeshcore-dev:devfrom
NickDunklee:fix-environment-sensor-refactor

Conversation

@NickDunklee
Copy link
Copy Markdown

This is a medium-ish refactor to attempt to clean up sensor handling logic both for board stability and future potential growth before the code becomes all spaghetti and meatballs.

I'd be curious to see if anyone running sensors out there that knows how to build and flash MeshCore code could give this a try and see how it behaves. It is working fine on my end on multiple nodes.

PR notes are gigantic because it is a fundamental behavior repair for sensors, so I wanted to over-explain. Also, if it hadn't been mentioned previously, push-back is always welcome. I'm just spending my time trying to clean up / fix / enhance this corner of the firmware, and want to contribute my improvements back to the project.

Problem:

Current MeshCore code makes no attempt to see what sensors are actually available on the I2C bus at startup and blindly tries to interact with sensors. This has some very bad side-effects, like if a sensor that is unsupported, or has a weird initialization process, the MeshCore node will just hang at boot and never successfully start up and ostensibly looks bricked, or the INA226 and SHT4X both sharing the same address and the code just silently fighting.

The current implementation also gloms sensor readouts from the MCU and environment sensors onto the same telemetry channel, with some arbitrary exceptions for incrementing channels based on certain behavioral situations. The MCU temperature and external temperature sensors would appear on channel 1, and it wouldn't be possible to tell which sensor the temperature value was coming from.

Per CayenneLPP: Data Channel: Uniquely identifies each sensor in the device across frames, eg. “indoor sensor” So this channel division implementation falls inline with what CayenneLPP intended. There are up to 256 channels available. So I tried to model this change in that behavioral style.

Proposed Improvement:

This implementation scans the I2C bus for what devices are present, sets each sensor to its own CayenneLPP channel, and keeps MCU telemetry on channel 1 only. So Channel 1 is always "self" and no confusion can result.

Details:

  • Channel 1 is always the MCU and things about it, so you always know that telemetry is from the board itself. Exception is GPS, GPS stays on channel 1 as well since it is "about the board" even though it's a bit gray-area as GPS can often be a secondary chip.
  • Each sensor board is allocated to a dedicated CayenneLPP channel, so if you are reading from that channel, you know the data is from that sensor only. (Sensors emitting more than one of the same type of measurement are exceptions.)
  • scanI2CBus() probes addresses 0x08–0x77 with raw beginTransmission/endTransmission. No sensor library is touched until after this completes. This will prevent sensor-based boot hangs, unknown or unresponsive devices never reach a library init call.
  • Created SENSOR_TABLE a compile-time array that is gated by the existing ENV_INCLUDE_* macros. A sentinel { 0, nullptr, nullptr, nullptr } at the end keeps the array non-empty regardless of which sensors are enabled, avoiding zero-length array warnings.
  • When begin() is called, scan first, then loop: skip if address not detected, skip if init() returns 0, otherwise register one ActiveSensor entry per sub-channel.
  • querySensors() I replaced the entire #ifdef chain with a 3-line loop.
  • T1000-E has its own T1000SensorManager, so it should be completely unaffected by this change.
  • SHT4X quirky initialization behavior is retained.
  • MLX90614 - git commits around this didn't have any notes as to why it is reporting ambient temperature on a separate channel as well as the object temperature, as the ambient temperature is used internally to compute the object temperature and not really needed for the sensor's purpose - just the same, kept the existing behavior of reporting the ambient temperature one channel above the channel assigned to the sensor
  • All bool *_initialized fields are gone, replaced with ActiveSensor _active_sensors[16] (query function pointer and sub-channel index) and _active_sensor_count. SensorDef lives entirely in the .cpp so the header has no dependency on it.
  • Details on the INA226 and SHT4X: both default to address 0x44, the old code had a bug and would have both begin() calls fire and they would just fight each other silently. In the new code, the respective sensor code is only called if the device is actually present, however, if both were present simultaneously, SHT4X comes first in the table and would win, and INA226 would return false and be skipped. The INA226 has 16 possible addresses that are configurable in the hardware itself, so in a potential scenario where both sensors would be present, the person implementing that design could take that into account.
  • BME680 gas resistance will now transmit on the same channel as the rest of BME680 telemetry which is inline with CayenneLPP standards. Coupling this PR with fix(bme680): fix gas resistance variable to produce usable data #2146 streamline the whole sensor telemetry, and with fix(bme680): fixed bme680 loss of precision int math error in pressure and altitude measurements #2149 will overall improve BME680 handling. The gas resistance sensor actually has a binary library to make it more useful, calibration, accounting for age of sensor, and other improvements, but since that adds more flash consumption, I have omitted that in PRs thus far.
  • RAK12035 and other current upstream dev branch changes integrated.

…angs and handle growth

This is a medium-ish refactor to attempt to clean up sensor handling logic both for board stability and future potential growth before the code becomes all spaghetti and meatballs.

I'd be curious to see if anyone running sensors out there that knows how to build and flash MeshCore code could give this a try and see how it behaves. It is working fine on my end on multiple nodes.

PR notes are gigantic because it is a fundamental behavior repair for sensors, so I wanted to over-explain. Also, if it hadn't been mentioned previously, push-back is always welcome. I'm just spending my time trying to clean up / fix / enhance this corner of the firmware, and want to contribute my improvements back to the project.

**Problem:**

Current MeshCore code makes no attempt to see what sensors are actually available on the I2C bus at startup and blindly tries to interact with sensors. This has some very bad side-effects, like if a sensor that is unsupported, or has a weird initialization process, the MeshCore node will just hang at boot and never successfully start up and ostensibly looks bricked, or the INA226 and SHT4X both sharing the same address and the code just silently fighting.

The current implementation also gloms sensor readouts from the MCU and environment sensors onto the same telemetry channel, with some arbitrary exceptions for incrementing channels based on certain behavioral situations. The MCU temperature and external temperature sensors would appear on channel 1, and it wouldn't be possible to tell which sensor the temperature value was coming from.

Per [CayenneLPP](https://github.com/myDevicesIoT/CayenneLPP): *Data Channel: Uniquely identifies each sensor in the device across frames, eg. “indoor sensor”* So this channel division implementation falls inline with what CayenneLPP intended. There are up to 256 channels available. So I tried to model this change in that behavioral style.

**Proposed Improvement:**

This implementation scans the I2C bus for what devices are present, sets each sensor to its own CayenneLPP channel, and keeps MCU telemetry on channel 1 only. So Channel 1 is always "self" and no confusion can result.

Details:
  - Channel 1 is always the MCU and things about it, so you always know that telemetry is from the board itself. Exception is GPS, GPS stays on channel 1 as well since it is "about the board" even though it's a bit gray-area as GPS can often be a secondary chip.
  - Each sensor board is allocated to a dedicated CayenneLPP channel, so if you are reading from that channel, you know the data is from that sensor only. (Sensors emitting more than one of the same type of measurement are exceptions.)
  - `scanI2CBus()` probes addresses 0x08–0x77 with raw `beginTransmission`/`endTransmission`. No sensor library is touched until after this completes. This will prevent sensor-based boot hangs, unknown or unresponsive devices never reach a library init call.
  - Created `SENSOR_TABLE` a compile-time array that is gated by the existing `ENV_INCLUDE_*` macros. A sentinel `{ 0, nullptr, nullptr, nullptr }` at the end keeps the array non-empty regardless of which sensors are enabled, avoiding zero-length array warnings.
  - When `begin()` is called, scan first, then loop: skip if address not detected, skip if `init()` returns 0, otherwise register one ActiveSensor entry per sub-channel.
  - `querySensors()` I replaced the entire #ifdef chain with a 3-line loop.
  - T1000-E has its own T1000SensorManager, so it should be completely unaffected by this change.
  - SHT4X quirky initialization behavior is retained.
  - MLX90614  - git commits around this didn't have any notes as to why it is reporting ambient temperature on a separate channel as well as the object temperature, as the ambient temperature is used internally to compute the object temperature and not really needed for the sensor's purpose - just the same, kept the existing behavior of reporting the ambient temperature one channel above the channel assigned to the sensor
  - All `bool *_initialized` fields are gone, replaced with `ActiveSensor _active_sensors[16]` (query function pointer and sub-channel index) and `_active_sensor_count. SensorDef` lives entirely in the `.cpp` so the header has no dependency on it.
  - Details on the INA226 and SHT4X: both default to address 0x44, the old code had a bug and would have both begin() calls fire and they would just fight each other silently. In the new code, the respective sensor code is only called if the device is actually present, however, if both were present simultaneously, SHT4X comes first in the table and would win, and INA226 would return false and be skipped. The INA226 has 16 possible addresses that are configurable in the hardware itself, so in a potential scenario where both sensors would be present, the person implementing that design could take that into account.
  - BME680 gas resistance will now transmit on the same channel as the rest of BME680 telemetry which is inline with CayenneLPP standards. Coupling this PR with meshcore-dev#2146 streamline the whole sensor telemetry, and with meshcore-dev#2149 will overall improve BME680 handling. The gas resistance sensor actually has a binary library to make it more useful, calibration, accounting for age of sensor, and other improvements, but since that adds more flash consumption, I have omitted that in PRs thus far.
 - RAK12035 and other current upstream dev branch changes integrated.
@NickDunklee
Copy link
Copy Markdown
Author

Examples from some nodes:
Node 1:
node1-2
node1-1

Node 2:
node2-2
node2-1

@KPrivitt
Copy link
Copy Markdown
Contributor

This is better, but with multiple sensors attached, it would be nice to display the sensor name so as you scroll down the possible 256 channels you can find the one you are looking for (what if it is 233, how do you know it's not 232 which looks identical) If this isn't possible this is not really making it better from a user perspective.

From my understanding CayenneLPP doesn't provide a simple text string output. I couldn't find one. I've been told CayenneLPP is not part of the MeshCore source code and is not modifiable here (done through BlueFruit??)

A secondary "issue" is sensor power management. On RAK boards most sensors are on a common power plane, by knowing what sensors are present and having each sensor have its own initialization the power plane could be controlled to save power and brought up/down as needed. This would be a significant improvement.

As you noted Sensor management is currently a mess.

Last note, not all sensors are on the I2C, some are on the serial interface and there is more than one I2C interface (wire and/or wire1, depending on the board used)

Good luck in getting this past the devs...

@NickDunklee
Copy link
Copy Markdown
Author

I had a thought to map sensor classes by channel banks to at least filter via inference, like assign channels 2-5 for environment, 6-9 to electrical, and such, so at least you would know if a sensor is in that range you could infer it is probably the one you want. It seems CayenneLPP doesn't have a "standard" for channel banks, though, so I figured going simple/generic for now would at least be a nudge in a more organized direction.

Not trying to step on toes either, just trying to kick code I'm working on back upstream in case it is useful to the mesh in general, and trying to adhere to a "simple, not too big" change format, although this one obviously became larger. I want to maintain optimum small code and ideally minimal power consumption too, while still trying to adhere to some design best practices. I have too much time to spend on this as I've been laid off for a bit, so I'm sure my active PRs will reduce to a trickle if I can fix that problem sooner than later, haha.

Since I already had issues with bringing a previous sensor online due to how sensor initialization happens, I put some effort into plumbing that out in the same PR, as it kills both birds, and makes future sensor development easier, as when deploying test code, you get a usable core instead of what seems like bricking the whole board while trying to bring new sensors online. Seems good for new users too, if a new sensor is attached and the system doesn't support it, it should detect it, and ideally not have any code to execute and go on its merry, instead the board again appearing bricked.

I have downstream software ingesting the telemetry, which is what inspired the reorganization when I realized there is no way to tell temperatures (or anything, if duplicates) apart. One can have an environment sensor removed from the mainboard, exposed to the elements reading things, and I know that data is not coming from the mainboard's sensors. Proximity to the boards can throw off readings, like a RAK is charging its battery, suddenly even the "external" temp sensor shows a jump in temperature, (VOC too) so I'm trying to separate out the telemetry so a sensor could be moved away from the board via ribbon cable and not be affected by what's going on in the board.

Thanks for the notes around sensor wire and power planes too, reading up the RAK spec, I noticed there's some sleep interval that can be employed, that would probably be an area for further investigation! Cool to see your soil moisture sensor add-on too! I was hoping to dig into some other env sensors next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants