
Delay auto-tuning v2 #2337

Open
stachuman wants to merge 16 commits into meshcore-dev:dev from stachuman:delay-tuning-v2


stachuman commented Apr 19, 2026

This PR improves ACK delivery and the number of ACKs received. In theory it ensures that if a DM is delivered, the sender receives at least one ACK (we are speaking of probability, not a guarantee!).

It is based on extensive simulations (over 100k runs on a dedicated simulator, which can hopefully be used for other purposes too). All details and the road to this PR can be traced in discussion #2053.

The proposal is based on the theoretical work of KPrivitt and on my simulations (all source data used for the simulations is available on my GitHub page).

--

There are some differences compared to the previous PR:

  1. the variable is renamed to auto.tune.delays (to make it more consistent)
    set auto.tune.delays on/off
  2. parameter recalculation is wired into onAdvertRecv, so there is no periodic recalculation; it is triggered automatically when an advert is received
  3. auto.tune.delays is off by default
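
As a rough illustration of item 2, an event-driven recalculation could look like the sketch below. It is written in Python for brevity, not in the firmware's own language, and it is not the code in this PR: the class `NeighborTable`, the methods `on_advert_recv` and `recalc_delays`, the linear delay formula and all constants are hypothetical placeholders. The 0 dB SNR cut-off for counting a neighbor is also an assumption.

```python
from dataclasses import dataclass, field

SNR_THRESHOLD_DB = 0.0  # assumption: adverts at or above this SNR count as neighbors


@dataclass
class NeighborTable:
    auto_tune: bool = False  # mirrors auto.tune.delays being off by default
    snr_by_id: dict = field(default_factory=dict)
    tx_delay: float = 0.5  # hypothetical starting value, not the firmware default
    recalc_count: int = 0

    def neighbor_count(self) -> int:
        """Count neighbors whose last advert met the SNR threshold."""
        return sum(1 for snr in self.snr_by_id.values() if snr >= SNR_THRESHOLD_DB)

    def recalc_delays(self) -> None:
        # Hypothetical formula: the delay grows linearly with the neighbor count.
        self.tx_delay = 0.5 + 0.25 * self.neighbor_count()
        self.recalc_count += 1

    def on_advert_recv(self, node_id: str, snr_db: float) -> None:
        # Record the advert, then recalculate only when auto-tuning is enabled.
        self.snr_by_id[node_id] = snr_db
        if self.auto_tune:
            self.recalc_delays()
```

With this shape no timer is needed: the delays change only when new neighbor information arrives, and nothing is recalculated while the setting is off.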

Measured performance at defaults (same 4 topologies, 6 rnd seeds each):

| Density | Msg Delivery | Channel Delivery | ACK Delivery | Avg ACK Copies |
|---|---|---|---|---|
| sparse | 67.0% | 57.5% | 36.0% | 0.60 |
| medium | 62.7% | 78.8% | 33.7% | 0.70 |
| dense | 65.0% | 64.5% | 24.3% | 0.50 |
| very_dense | 64.3% | 43.0% | 13.0% | 0.20 |
| Mean | 64.8% | 60.9% | 26.8% | 0.50 |

Proposed changes - autotune tx/direct tx delays:

| Density | Msg Delivery Δ | ACK Delivery Δ | ACK Copies Δ |
|---|---|---|---|
| sparse | −2.7pp | +11.0pp | +0.80 |
| medium | −6.3pp | +1.6pp | +1.00 |
| dense | −6.0pp | +13.7pp | +1.30 |
| very_dense | −9.3pp | +17.0pp | +1.40 |
| Mean | −6.1pp | +10.9pp | +1.13 |
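
To read the deltas as absolute figures, they can be added to the baseline rows. The short Python sketch below does only that arithmetic, assuming each delta row refers to the baseline row of the same density; the function name `apply_deltas` is made up for illustration.

```python
# (msg delivery %, ACK delivery %, avg ACK copies) at defaults, from the first table
baseline = {
    "sparse": (67.0, 36.0, 0.60),
    "medium": (62.7, 33.7, 0.70),
    "dense": (65.0, 24.3, 0.50),
    "very_dense": (64.3, 13.0, 0.20),
}

# Deltas with auto-tuned delays, from the second table
delta = {
    "sparse": (-2.7, 11.0, 0.80),
    "medium": (-6.3, 1.6, 1.00),
    "dense": (-6.0, 13.7, 1.30),
    "very_dense": (-9.3, 17.0, 1.40),
}


def apply_deltas(base, change):
    """Add each delta to its baseline value, rounded to two decimals."""
    return {
        density: tuple(round(b + d, 2) for b, d in zip(base[density], change[density]))
        for density in base
    }


tuned = apply_deltas(baseline, delta)
for density, (msg, ack, copies) in tuned.items():
    print(f"{density}: msg {msg}%, ACK {ack}%, ACK copies {copies}")
```

Under that assumption the average number of ACK copies is above 1 in every density class, while message delivery falls by a few percentage points.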

liamcottle and others added 15 commits March 24, 2026 15:38
…or changes

- Add multi byte FAQ
- Reword amped radio output setting numbers
- Clarify repeater ID collision including distance, supercede meshcore-dev#1478
- Reference awesome meshcore for community projects. Supercede meshcore-dev#1893
Removed "see note" from RAK 4631 entry in FAQ.
Fixed an extra TOC jump link inserted by VSCode Markdown All in One VS Code extension.
fixed typos and refined multibyte sections.
add multibyte FAQ, reference awesome-meshcore community projects, minor changes
Update RAK 4631 entry in FAQ on new bootloader - removed "see note"
# Conflicts:
#	docs/faq.md
… - improving ACK delivery and number of ACK received - theoretically ensuring that if DM is delivered - at least one ACK is received by sender.
@KPrivitt
Contributor

KPrivitt commented Apr 19, 2026

In the prior PR the neighbor count was recalculated every 5 min. This is far too frequent and consumes compute resources that can be used elsewhere.

The SNR of a received Advert can vary by several dB from message to message (pings can change on every one sent), and this can affect the neighbor count for repeaters close to the 0 dB SNR threshold. However, the number of surrounding repeaters actually changes very slowly. I believe the count should be done daily, twice a week or once a week.

@1nerdherder

1nerdherder commented Apr 20, 2026

I've been watching this work across the two pull request conversations. This was a heavy lift, deserving of strong consideration amongst the devs. At a minimum, the existing defaults are not optimal. The power of the autotune algorithm approach is that it makes all repeaters "good neighbors" who will adapt their settings in harmony as the mesh evolves.

@stachuman
Author

> In the prior PR the neighbor count was recalculated every 5 min. This is far too frequent and consumes compute resources that can be used elsewhere.
>
> The SNR of a received Advert can vary by several dB from message to message (pings can change on every one sent), and this can affect the neighbor count for repeaters close to the 0 dB SNR threshold. However, the number of surrounding repeaters actually changes very slowly. I believe the count should be done daily, twice a week or once a week.

Correct. Calculating every couple of minutes is not a big burden, but for the sake of clean code I have moved it into the advert reception code.

@terminalvelocity23
Contributor

terminalvelocity23 commented Apr 21, 2026

Hello, I've tested your PR in our mesh, which is very dense and there's a lot of in-band noise. The algorithm has set the delays so high that the repeater effectively stopped functioning.
Also, it didn't return the delays to their original values after disabling.

[image]

@stachuman
Author

> Hello, I've tested your PR in our mesh, which is very dense and there's a lot of in-band noise. The algorithm has set the delays so high that the repeater effectively stopped functioning. Also, it didn't return the delays to their original values after disabling.

Can you please elaborate on 'stopped functioning'? Was it one repeater with the changed firmware, or several forming a wider network? And what do you mean by 'effectively stopped functioning'? The delays are there to limit the number of collisions, but transmission still takes place.

On the last point: that is very valid, let me update that.
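
One possible way to handle the restore-on-disable case is to remember the manual values when auto-tuning is switched on and put them back when it is switched off. The Python sketch below only illustrates that idea; `DelayConfig`, `set_auto_tune` and `apply_tuned` are hypothetical names, and the actual fix in the firmware may look quite different.

```python
class DelayConfig:
    """Hypothetical holder for the manual and auto-tuned delay values."""

    def __init__(self, tx_delay: float, direct_tx_delay: float):
        self.tx_delay = tx_delay
        self.direct_tx_delay = direct_tx_delay
        self._saved = None  # manual values, kept while auto-tuning is on

    @property
    def auto_tune(self) -> bool:
        return self._saved is not None

    def set_auto_tune(self, enabled: bool) -> None:
        if enabled and self._saved is None:
            # Remember the manual settings before the algorithm overwrites them.
            self._saved = (self.tx_delay, self.direct_tx_delay)
        elif not enabled and self._saved is not None:
            # Restore the manual settings when auto-tuning is switched off.
            self.tx_delay, self.direct_tx_delay = self._saved
            self._saved = None

    def apply_tuned(self, tx_delay: float, direct_tx_delay: float) -> None:
        # Tuned values are accepted only while auto-tuning is enabled.
        if self.auto_tune:
            self.tx_delay = tx_delay
            self.direct_tx_delay = direct_tx_delay
```

Keeping the saved pair separate from the live values means repeated on/off toggles cannot overwrite the manual settings with tuned ones.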

@terminalvelocity23
Contributor

> Can you please elaborate on 'stopped functioning'? Was it one repeater with the changed firmware, or several forming a wider network? And what do you mean by 'effectively stopped functioning'? The delays are there to limit the number of collisions, but transmission still takes place.

It was one repeater to test this feature. The delays were set so high it effectively stopped relaying packets, everything but its admin interface was handled by other repeaters around. It stopped showing up in outbound and inbound paths.

@stachuman
Author

stachuman commented Apr 21, 2026

> > Can you please elaborate on 'stopped functioning'? Was it one repeater with the changed firmware, or several forming a wider network? And what do you mean by 'effectively stopped functioning'? The delays are there to limit the number of collisions, but transmission still takes place.
>
> It was one repeater to test this feature. The delays were set so high it effectively stopped relaying packets, everything but its admin interface was handled by other repeaters around. It stopped showing up in outbound and inbound paths.

In fact, what you observed is not a bad thing; it is the desired effect. The purpose of a mesh network is NOT to ensure that every repeater transmits, but to ensure the effectiveness of the overall network.
To be precise: in a dense network it is NOT recommended that all repeaters transmit within the same time window, as this only increases the probability of a collision, i.e. a failed transmission.

Without going into details: suppose all the other repeaters in the area carried the transmission while yours stayed silent. If all that other traffic fails due to collisions, your repeater retransmits after its delay, giving the message another chance to be delivered. Conversely, if the other traffic gets through, yours is not even required (it effectively reduces the density of the network, which is a recommended thing).

And this is the purpose of the PR: to increase the overall probability of delivery (ACK), NOT to increase the number of transmissions of a single repeater. Effectiveness here does not come from how quickly a retransmission is done, but from the probability of avoiding collisions with other repeaters.

I hope my explanation is clear.
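
The collision argument can be illustrated with a deliberately simplified slotted model. This is an assumption made for the example and not the simulator used for this PR: each of `n_repeaters` that heard a packet independently picks one of `window_slots` equally likely slots, and a retransmission counts as collision-free only if no other repeater picked the same slot.

```python
def p_collision_free(n_repeaters: int, window_slots: int) -> float:
    """Probability that one given repeater picks a slot nobody else picked."""
    if n_repeaters < 1 or window_slots < 1:
        raise ValueError("need at least one repeater and one slot")
    # Each of the other n - 1 repeaters misses our slot with probability 1 - 1/W.
    return (1.0 - 1.0 / window_slots) ** (n_repeaters - 1)


# Eight repeaters in range: a wider delay window leaves more collision-free slots.
for slots in (2, 8, 32):
    print(slots, round(p_collision_free(8, slots), 3))
```

In this toy model, widening the window (that is, allowing longer random delays) raises the chance that any single retransmission gets through, at the cost of latency. That is the trade-off discussed in this thread.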

@stachuman
Author

stachuman commented Apr 21, 2026

Here are theoretical results for two scenarios: 1. the busiest routers are upgraded first, in an organized way; 2. randomly chosen routers get the auto-delay function.
(0% means only default firmware is used; 30% means 30% of repeaters run in auto-delay optimization mode.)

Degree Strategy (upgrade busiest nodes first)

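| % Optimized | N nodes | Delivery | std | ACK | Channel | F_del | P_del | F_ack | P_ack | col/lost | ack/del |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 0 | 61.7% | 7.1 | 22% | 62% | 58% | 48% | 15% | 44% | 32.8 | 0.4 |
| 10% | 14 | 62.0% | 3.3 | 23% | 73% | 59% | 43% | 17% | 38% | 42.3 | 0.5 |
| 30% | 43 | 57.7% | 3.4 | 25% | 79% | 44% | 36% | 18% | 36% | 22.8 | 0.7 |
| 50% | 72 | 55.0% | 9.9 | 30% | 75% | 34% | 32% | 29% | 34% | 32.1 | 0.8 |
| 75% | 107 | 52.3% | 3.9 | 30% | 84% | 25% | 24% | 29% | 31% | 31.8 | 1.1 |
| 100% | 143 | 50.7% | 6.5 | 27% | 72% | 21% | 22% | 28% | 26% | 43.3 | 1.3 |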

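Random Strategy (uncoordinated rollout - random repeaters use auto-delay)

| % Optimized | N nodes | Delivery | std | ACK | Channel | F_del | P_del | F_ack | P_ack | col/lost | ack/del |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 0 | 61.7% | 7.1 | 22% | 62% | 58% | 48% | 15% | 44% | 32.8 | 0.4 |
| 10% | 14 | 58.3% | 8.0 | 27% | 74% | 51% | 46% | 21% | 38% | 34.3 | 0.6 |
| 30% | 43 | 56.0% | 4.2 | 29% | 80% | 43% | 33% | 25% | 35% | 38.4 | 0.8 |
| 50% | 72 | 51.3% | 7.8 | 22% | 79% | 42% | 23% | 23% | 21% | 26.7 | 0.8 |
| 75% | 107 | 46.3% | 5.7 | 26% | 79% | 30% | 22% | 30% | 26% | 33.9 | 1.2 |
| 100% | 143 | 50.7% | 6.5 | 27% | 72% | 21% | 22% | 28% | 26% | 43.3 | 1.3 |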

Radio Efficiency (collision-free RX ratio)

| % Optimized | Degree radio_eff | Degree ackpath_eff | Random radio_eff | Random ackpath_eff |
|---|---|---|---|---|
| 0% | 62.1% | 59.2% | 62.1% | 59.2% |
| 10% | 65.3% | 63.6% | 65.1% | 62.9% |
| 30% | 71.4% | 69.9% | 69.5% | 68.1% |
| 50% | 74.1% | 72.2% | 73.6% | 71.9% |
| 75% | 77.1% | 77.1% | 77.2% | 76.8% |
| 100% | 76.8% | 76.5% | 76.8% | 76.5% |

1. Mixed firmware outperforms full optimization on ACK delivery

The best ACK rate (30%) occurs at 50-75% under the degree strategy, not at 100% (27%). This result comes from scenario 1, i.e. upgrading the busiest repeaters first.
My interpretation:

  • Optimized nodes reduce collision pressure in dense clusters
  • Default nodes relay faster, creating alternative paths and timing diversity
  • The combination produces more successful ACK round-trips than either firmware alone

Note: at around 75% we reach about 1 ACK delivered (ack/del of 1.1 in the table), which is another reason to see that as a sweet-spot setting.

2. Channel (broadcast) delivery peaks at degree 75%

Channel delivery reaches 84% at degree 75%, a +22pp improvement over the 0% baseline (62%) and +12pp over full optimization (72%).

3. Degree strategy is consistently better than random

| Metric (at 75%) | Degree | Random | Delta |
|---|---|---|---|
| Delivery | 52.3% | 46.3% | +6.0pp |
| ACK | 30% | 26% | +4pp |
| Channel | 84% | 79% | +5pp |
| Std deviation | 3.9% | 5.7% | more stable |
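
The percentage-point figures quoted in findings 2 and 3 can be recomputed directly from the 75% rows of the strategy tables. The Python sketch below does that arithmetic; the helper name `pp_delta` is made up for illustration.

```python
# 75% rows of the degree and random strategy tables (values in percent)
degree_75 = {"Delivery": 52.3, "ACK": 30.0, "Channel": 84.0}
random_75 = {"Delivery": 46.3, "ACK": 26.0, "Channel": 79.0}

# Channel delivery at 0% (baseline) and at 100% (full optimization)
channel_baseline = 62.0
channel_full = 72.0


def pp_delta(a, b):
    """Percentage-point difference a - b for each metric, to one decimal."""
    return {metric: round(a[metric] - b[metric], 1) for metric in a}


deltas = pp_delta(degree_75, random_75)
channel_vs_baseline = round(degree_75["Channel"] - channel_baseline, 1)
channel_vs_full = round(degree_75["Channel"] - channel_full, 1)

print(deltas, channel_vs_baseline, channel_vs_full)
```

These differences match the +6.0pp, +4pp and +5pp deltas in the table, and the +22pp and +12pp channel figures in finding 2.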

@1nerdherder

Sorry for the late comment:
The earlier analysis showed the existing defaults to be flawed and actually making things worse, so should we not be selecting a new “default” rather than reuse the known defective one?

@terminalvelocity23
Contributor

terminalvelocity23 commented Apr 22, 2026

I'd say it's still too aggressive. I've switched auto-tuning on for two of my repeaters, pointed to the north and south of the high-rise I'm in, and dropped tx power on the companion so nobody but them will hear it.
After a few hours the delays have settled at 12.8 for flood and 38 and 40 for direct. I mean yeah, it makes for collision avoidance, but considering that the max message length is 150/2 minus whatever bytes your name requires if you use a non-English alphabet, conveying any complex thought requires a few messages in succession. And a delay of up to 40 seconds breaks the sequence.

@stachuman
Author

> I'd say it's still too aggressive. I've switched auto-tuning on for two of my repeaters, pointed to the north and south of the high-rise I'm in, and dropped tx power on the companion so nobody but them will hear it. After a few hours the delays have settled at 12.8 for flood and 38 and 40 for direct. I mean yeah, it makes for collision avoidance, but considering that the max message length is 150/2 minus whatever bytes your name requires if you use a non-English alphabet, conveying any complex thought requires a few messages in succession. And a delay of up to 40 seconds breaks the sequence.

Well... everything is based on probability, not on feelings. However, I admit that a scenario where sequences of messages are sent was not tested.
Would you like to propose a scenario?

@terminalvelocity23
Contributor

@stachuman Idk about a scenario, but capping the delays at 20s max might not be a bad idea.
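
A cap like the one suggested here is a one-line clamp on whatever the tuning step produces. The Python sketch below shows the idea using the settled values reported earlier in this thread, treating them as seconds as that comment does. The names `MAX_DELAY_S` and `cap_delay` and the dictionary keys are hypothetical, and the 20 s limit is only the figure floated in this comment, not a value from the PR.

```python
MAX_DELAY_S = 20.0  # the cap floated in this thread; not a value from the PR


def cap_delay(delay_s: float, max_delay_s: float = MAX_DELAY_S) -> float:
    """Clamp an auto-tuned delay to an upper bound."""
    return min(delay_s, max_delay_s)


# Settled values reported in the thread: 12.8 for flood, 38 and 40 for direct
settled = {"flood": 12.8, "direct_north": 38.0, "direct_south": 40.0}
capped = {name: cap_delay(value) for name, value in settled.items()}
print(capped)
```

Values already below the cap pass through unchanged, so only the two direct delays would be affected in this example.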

8 participants