fix(lifecycle): prevent hello rate-limit death spiral on secret-rotation failure #465

Open
linboxin wants to merge 1 commit into EvoMap:main from linboxin:fix/hello-rate-limit-death-spiral

Summary

Fixes the node_secret_invalid → rate-limit death spiral still present in v1.69.12+. Three client-side bugs in src/proxy/lifecycle/manager.js caused hello() to be called in an unbounded loop until the 60/hour rate limit was exhausted.

What changed

  • hello() now checks !res.ok before parsing the response body. A 429 sets _helloRateLimitUntil (honouring Retry-After, defaulting to 3600s) and returns { ok: false, error: 'hello_rate_limited' }. Subsequent calls within the window are suppressed before any network I/O.
  • reAuthenticate() now breaks (not continues) when the hub returns ok but no node_secret, because retrying a hub-side rotation failure only burns hello quota.
  • reAuthenticate() also breaks immediately on rate-limit errors from hello() (hello_rate_limited or hello_rate_limit_active).
  • On exhausting all attempts, _reauthBackoffUntil is set for 30 minutes, blocking re-entry from heartbeat ticks and proxy HTTP callers.
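
The hello() change described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/proxy/lifecycle/manager.js: the HelloClient class, the injected fetchImpl and now parameters, the parseRetryAfterSeconds helper, and the hello_http_<status> error string are all hypothetical. Only _helloRateLimitUntil, the 3600 s default, and the two rate-limit error names come from this PR, and the use of hello_rate_limit_active for suppressed calls is an assumption.

```javascript
// Hypothetical sketch; not the actual manager.js implementation.
const DEFAULT_HELLO_COOLDOWN_S = 3600; // fallback when Retry-After is missing or unparsable

// Retry-After may carry delta-seconds or an HTTP-date; handle both.
function parseRetryAfterSeconds(value, nowMs) {
  if (value == null || value === '') return DEFAULT_HELLO_COOLDOWN_S;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (!Number.isNaN(dateMs)) return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
  return DEFAULT_HELLO_COOLDOWN_S;
}

class HelloClient {
  // fetchImpl and now are injected (an assumption) so the sketch runs without a hub.
  constructor(fetchImpl, now = () => Date.now()) {
    this._fetch = fetchImpl;
    this._now = now;
    this._helloRateLimitUntil = 0;
  }

  async hello(url, payload) {
    // Suppress the call before any network I/O while the window is open.
    // The error name used here for suppressed calls is assumed.
    if (this._now() < this._helloRateLimitUntil) {
      return { ok: false, error: 'hello_rate_limit_active' };
    }
    const res = await this._fetch(url, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    });
    // Check the status before touching the response body.
    if (!res.ok) {
      if (res.status === 429) {
        const seconds = parseRetryAfterSeconds(res.headers.get('retry-after'), this._now());
        this._helloRateLimitUntil = this._now() + seconds * 1000;
        return { ok: false, error: 'hello_rate_limited' };
      }
      // Hypothetical error string for other non-2xx statuses.
      return { ok: false, error: `hello_http_${res.status}` };
    }
    const body = await res.json();
    return { ok: true, node_secret: body.node_secret };
  }
}
```

Injecting the clock keeps the suppression window testable without waiting an hour; whether the real manager exposes such a seam is not stated in this PR.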

How to test

  1. Run node index.js — confirm no errors on startup
  2. Simulate hub returning 403 on heartbeat + 200 OK with no secret on hello: hello() should fire once, then reAuthenticate() should back off for 1800 s (the 30-minute _reauthBackoffUntil window)
  3. Simulate hub returning 429 on hello: _helloRateLimitUntil should be set, subsequent calls suppressed
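
One way to set up steps 2 and 3 without a live hub is a scripted fetch stand-in that counts calls per endpoint. The helper below is a hypothetical test aid, not part of this PR: fakeResponse, makeFakeHub, the /heartbeat and /hello path suffixes, and the 403 body shape are all assumptions, and wiring it in depends on how manager.js obtains its fetch function, which this description does not show.

```javascript
// Hypothetical test helper: a scripted stand-in for the hub's HTTP endpoints.
// Returns a minimal object exposing the subset of the fetch Response API used here.
function fakeResponse(status, body = {}, headers = {}) {
  const lower = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), String(v)])
  );
  return {
    ok: status >= 200 && status < 300,
    status,
    headers: { get: (name) => lower[name.toLowerCase()] ?? null },
    json: async () => body,
  };
}

// routes maps a path suffix (assumed names) to a function producing a response.
function makeFakeHub(routes) {
  const calls = {};
  const fetchImpl = async (url) => {
    const path = Object.keys(routes).find((p) => String(url).endsWith(p));
    if (!path) return fakeResponse(404);
    calls[path] = (calls[path] || 0) + 1; // count hits so tests can assert on them
    return routes[path]();
  };
  return { fetchImpl, calls };
}

// Step 2 scenario: heartbeat is rejected, hello succeeds but carries no node_secret.
const rotationFailureHub = () => makeFakeHub({
  '/heartbeat': () => fakeResponse(403, { error: 'node_secret_invalid' }), // body shape assumed
  '/hello': () => fakeResponse(200, {}),
});

// Step 3 scenario: hello is rate limited with an explicit Retry-After.
const rateLimitedHub = () => makeFakeHub({
  '/hello': () => fakeResponse(429, {}, { 'Retry-After': '3600' }),
});
```

With a stub like this, the expectation in step 2 becomes an assertion that the hello counter stays at 1 after repeated heartbeat ticks, and step 3 becomes an assertion that it stops increasing once _helloRateLimitUntil is set.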

Risk

Low — only affects the re-auth failure path; normal heartbeat flow is unchanged.

Related

Closes #464

…ion failure

Fixes the regression described in EvoMap#349 still present in v1.69.x.

Three client-side bugs combined to exhaust the 60 hello/hour rate limit:

1. hello() did not check res.ok before processing the response body.
   A 429 reply fell through and returned { ok: true } with no secret,
   making reAuthenticate() believe the call succeeded.

2. reAuthenticate() used `continue` when hello returned ok but no
   node_secret.  Because of bug (1) this path was hit on every 429,
   so all MAX_REAUTH_ATTEMPTS were consumed with no back-off.

3. After exhausting attempts reAuthenticate() returned false with no
   cooldown, so the next heartbeat tick re-entered it immediately.

Fix:
- hello() now checks !res.ok first; 429 sets _helloRateLimitUntil
  (honouring Retry-After, defaulting to 3600 s) and returns
  { ok: false, error: 'hello_rate_limited' }.  Subsequent calls within
  the window are suppressed before any network I/O.
- reAuthenticate() breaks (not continues) when hub returns ok but no
  secret — retrying won't fix a hub-side rotation failure.
- reAuthenticate() also breaks immediately on hello_rate_limited /
  hello_rate_limit_active errors.
- On exhausting all attempts, _reauthBackoffUntil is set for 30 min,
  blocking re-entry from heartbeat ticks or proxy HTTP callers.
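
The reAuthenticate() control flow listed above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the patched code: the ReauthGate class, the injected helloFn and now parameters, and the value of MAX_REAUTH_ATTEMPTS are hypothetical, and applying the 30-minute backoff after a rate-limit break (not only after a missing-secret break or exhausted attempts) is also an assumption.

```javascript
// Hypothetical sketch; only _reauthBackoffUntil, MAX_REAUTH_ATTEMPTS (name, not value),
// the 30-minute cooldown, and the two error names come from the commit message.
const MAX_REAUTH_ATTEMPTS = 3;            // assumed value
const REAUTH_BACKOFF_MS = 30 * 60 * 1000; // 30 minutes

const RATE_LIMIT_ERRORS = new Set(['hello_rate_limited', 'hello_rate_limit_active']);

class ReauthGate {
  // helloFn resolves to { ok, node_secret?, error? }; now is an injectable clock.
  constructor(helloFn, now = () => Date.now()) {
    this._hello = helloFn;
    this._now = now;
    this._reauthBackoffUntil = 0;
    this.nodeSecret = null;
  }

  async reAuthenticate() {
    // Block re-entry from heartbeat ticks or proxy callers during the cooldown.
    if (this._now() < this._reauthBackoffUntil) return false;

    for (let attempt = 1; attempt <= MAX_REAUTH_ATTEMPTS; attempt++) {
      const res = await this._hello();
      if (res.ok && res.node_secret) {
        this.nodeSecret = res.node_secret;
        return true;
      }
      // ok without a secret is a hub-side rotation failure: retrying cannot help.
      if (res.ok && !res.node_secret) break;
      // Rate-limit errors: further attempts would only be suppressed or burn quota.
      if (RATE_LIMIT_ERRORS.has(res.error)) break;
      // Any other error falls through to the next attempt.
    }

    // Every failure exit arms the cooldown so the next tick does not re-enter.
    this._reauthBackoffUntil = this._now() + REAUTH_BACKOFF_MS;
    return false;
  }
}
```

The key difference from the buggy flow is that both early exits and the exhausted-attempts exit converge on a single place that arms the cooldown, so no failure path returns without one.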

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

node_secret_invalid → rate limit death spiral still present in v1.69.12