test_runner: infinite loop in FileTest#drainRawBuffer when child stdout contains FF 0F followed by large size bytes #62693

@lslv1243

Description

Version

v25.9.0 (also reproduced on v24.14.1 LTS, v22.21.0 LTS, v23.4.0)

Platform

Darwin 25.3.0 Darwin Kernel Version 25.3.0: Wed Jan 28 20:53:05 PST 2026; root:xnu-12377.81.4~5/RELEASE_ARM64_T6020 arm64

Bug is architectural (not OS-specific) — it's in the JS code of lib/internal/test_runner/runner.js.

Subsystem

test_runner

What steps will reproduce the bug?

Save as repro.mjs:

import { test } from 'node:test';

test('hang', () => {
  // v8Header is [0xFF, 0x0F]. The next 4 bytes are parsed as a
  // big-endian "full message size". 0x7FFFFFFF is much larger than
  // any rawBufferSize this child will ever reach.
  process.stdout.write(Buffer.from([0xff, 0x0f, 0x7f, 0xff, 0xff, 0xff]));
});

Run:

node --test --test-force-exit repro.mjs

The process hangs forever at 100% CPU. The hang reproduces with and without --test-force-exit. --test-timeout cannot recover the process because the main thread is spinning entirely in JS microtasks, so the event loop is starved; SIGTERM is ignored and only SIGKILL stops it.

How often does it reproduce? Is there a required condition?

100% deterministic with those exact 6 bytes on v22.21.0, v23.4.0, v24.14.1, and v25.9.0. Required condition: the child's stdout must contain the sequence 0xFF 0x0F followed by 4 bytes that, read as a big-endian uint32, exceed the parser's accumulated #rawBufferSize. Any child stdout containing FF 0F near the start, followed by arbitrary bytes, is vulnerable. I first hit this in the wild via a test that wrote random binary output and hung intermittently, roughly one run in five.
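The "one run in five" rate is plausible from first principles. This is an illustrative back-of-envelope estimate (the ~20 KB output size is an assumption, not from the report): the chance that the 2-byte magic FF 0F appears somewhere in n uniformly random bytes.

```javascript
// Back-of-envelope estimate (illustrative only): probability that the
// 2-byte magic FF 0F appears somewhere in n uniformly random bytes.
// Each of the (n - 1) overlapping byte pairs matches with probability
// 1/65536; treating the pairs as independent is a close approximation.
function collisionProbability(n) {
  return 1 - Math.pow(1 - 1 / 65536, n - 1);
}

// With roughly 20 KB of random binary test output, a collision is
// already likely on the order of one run in four or five.
console.log(collisionProbability(20_000).toFixed(2));
```

So intermittent hangs at that frequency are exactly what a short 2-byte magic predicts for binary-producing tests.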

What is the expected behavior? Why is that the expected behavior?

Either the parser treats unrecognized/oversized frames as normal stdout and keeps draining, or #drainRawBuffer detects no-progress iterations and breaks. Test runner output framing must be robust against arbitrary child stdout content; user tests cannot reasonably be required to avoid a particular byte sequence.

What do you see instead?

Infinite loop in the parent test-runner process while handling the worker's OnExit callback. sample/lldb show:

uv_run → uv__wait_children → ProcessWrap::OnExit → MakeCallback
 → MicrotaskQueue::PerformCheckpointInternal → RunMicrotasks
 → PromiseFulfillReactionJob → AsyncFunctionAwaitResolveClosure
 → [interpreted frames] → Builtins_CreateTypedArray
 → Runtime_AllocateInYoungGeneration → Heap::Scavenge

~66 % of main-thread samples in Heap::Scavenge, ~670 FastBuffer/TypedArray allocations per second, Buffer::IndexOfBuffer also hot. Memory is flat (~1 GB) — pure alloc/GC churn, not a leak.

Additional information

Root cause: lib/internal/test_runner/runner.js on main:

#drainRawBuffer() {
  while (this.#rawBuffer.length > 0) {          // L366 — no no-progress guard
    this.#processRawBuffer();
  }
}

#processRawBuffer() {
  let bufferHead = this.#rawBuffer[0];
  let headerIndex = bufferHead.indexOf(v8Header);
  let nonSerialized = new FastBuffer();

  while (bufferHead && headerIndex !== 0) {     // L376 — skipped when headerIndex === 0
    // … flushes bytes before the header as plain stdout (elided) …
  }

  while (bufferHead?.length >= kSerializedSizeHeader) {   // L399
    const fullMessageSize = (
      bufferHead[kV8HeaderLength]     << 24 |
      bufferHead[kV8HeaderLength + 1] << 16 |
      bufferHead[kV8HeaderLength + 2] << 8  |
      bufferHead[kV8HeaderLength + 3]
    ) + kSerializedSizeHeader;

    if (this.#rawBufferSize < fullMessageSize) break;     // L409 — breaks without mutating #rawBuffer
    // … deserializes and consumes the complete message (elided) …
  }
}

When the buffer starts with FF 0F, the first loop is skipped (headerIndex === 0). When the following 4 bytes form a size larger than #rawBufferSize, the second loop breaks on its first check. #processRawBuffer returns without shrinking #rawBuffer or #rawBufferSize, and #drainRawBuffer's while condition is still true — so it re-enters #processRawBuffer, which again allocates a FastBuffer, again calls bufferHead.indexOf(v8Header), again breaks on the same check, forever. #drainRawBuffer is reached via drain() → report() when the subtest file exits, so the hang manifests after the test body finishes, on the parent's ProcessWrap::OnExit path.
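The no-progress cycle can be modeled in isolation. This is a simplified sketch, not the actual runner.js code (names mirror the snippet above; the drain loop is capped here so the model terminates instead of spinning):

```javascript
// Simplified model of the stalled parse (not the actual runner.js code).
const kV8HeaderLength = 2;          // v8Header is FF 0F
const kSerializedSizeHeader = 6;    // 2-byte magic + 4-byte big-endian size

const rawBuffer = [Buffer.from([0xff, 0x0f, 0x7f, 0xff, 0xff, 0xff])];
let rawBufferSize = 6;

function processRawBuffer() {
  const head = rawBuffer[0];
  // First loop is skipped: the header sits at index 0.
  if (head.indexOf(Buffer.from([0xff, 0x0f])) !== 0) return;
  if (head.length < kSerializedSizeHeader) return;
  // 0x7FFFFFFF + 6 = 2147483653, far larger than rawBufferSize (6).
  const fullMessageSize = head.readUInt32BE(kV8HeaderLength) + kSerializedSizeHeader;
  if (rawBufferSize < fullMessageSize) return; // breaks without consuming bytes
  // (never reached with the repro bytes)
}

// drainRawBuffer's condition never becomes false; cap it here so the
// sketch terminates, unlike the real runner.
let iterations = 0;
while (rawBuffer.length > 0 && iterations < 1000) {
  processRawBuffer();
  iterations++;
}
console.log(iterations, rawBuffer.length); // hits the cap; nothing consumed
```

Every iteration re-allocates and re-scans the same bytes, which is exactly the FastBuffer/indexOf/Scavenge churn visible in the profile.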

The framing uses only a 2-byte magic (v8.Serializer header FF 0F), which is short enough to collide with arbitrary user stdout (random/binary bytes, compressed data, protobuf frames, etc.).

Suggested fixes (in order of increasing invasiveness):

  1. In #drainRawBuffer, track #rawBufferSize (and #rawBuffer.length) before/after each #processRawBuffer call and break if neither changed — guarantees termination regardless of content. Cheapest fix.
  2. At drain time (stream closed), if fullMessageSize > #rawBufferSize in #processRawBuffer's second loop, treat the leading bytes as corrupted framing and flush them as stdout instead of waiting for more data that will never arrive.
  3. (Long term, breaking protocol change) Use a longer framing magic or an explicit length prefix before v8Header so accidental collisions become astronomically unlikely.
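Fix 1 can be sketched as follows. This is an assumed shape, not a tested patch against runner.js (private `#` fields are replaced by a plain state object so the sketch is standalone):

```javascript
// Sketch of suggested fix 1 (assumed shape, not a tested runner.js patch):
// break out of the drain loop when a processRawBuffer call consumes
// nothing, guaranteeing termination on arbitrary stdout content.
function drainRawBuffer(state, processRawBuffer) {
  while (state.rawBuffer.length > 0) {
    const sizeBefore = state.rawBufferSize;
    const chunksBefore = state.rawBuffer.length;
    processRawBuffer(state);
    if (state.rawBufferSize === sizeBefore &&
        state.rawBuffer.length === chunksBefore) {
      break; // no progress: unparseable leading bytes, stop spinning
    }
  }
}
```

The guard is content-agnostic, so it also terminates on any future framing collision, at the cost of leaving the unparseable bytes in the buffer (fix 2 would additionally flush them as stdout).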
