Atomics.wait Visibility Bug

Cross-engine happens-before ordering failure with 3+ workers
Discovered during SpawnDev.ILGPU v4.6.0 development

The Bug

When 3 or more Web Workers synchronize using a generation-counting barrier with Atomics.wait / Atomics.notify, workers whose Atomics.wait returns "not-equal" (because the generation already changed) do not see prior stores from all other workers.

With 2 workers, wait/notify works correctly. With 3+ workers, ~66% of cross-worker reads are stale — exactly 2/3, because each worker reads 2 other workers' slots and sees neither. The synchronization edge flows from the last arriver (who calls notify) to woken waiters, but not transitively to workers that observe the generation change via the "not-equal" fast path.

This is not engine-specific. It was confirmed on V8 (Chrome 146, Chrome Canary 148, Node.js 22) and SpiderMonkey (Firefox 148), all on Windows 11 / AMD Ryzen 5 7500F. The two independent JS engines exhibit nearly identical failure rates of ~63-66%, which points to a spec-level issue in the ECMAScript memory model rather than an implementation bug in any single engine.

Replacing Atomics.wait with a pure Atomics.load spin loop eliminates the issue completely on all tested engines. Every Atomics.load is seq_cst, so when a worker observes the new generation, all prior stores from all threads are guaranteed visible.

Barrier Protocol & Race Condition Diagram

In the generation-counting barrier, each worker increments a shared arrival counter on arrival; the last arriver resets the counter, bumps the generation, and notifies the waiters. The bug is in the "not-equal" return path of Atomics.wait.

Timeline with 3 Workers:

Worker A                Worker B                Worker C
──────────              ──────────              ──────────
data[0] = 42            data[1] = 99            data[2] = 7             // non-atomic writes
    │                       │                       │
add(arrival, 1)         add(arrival, 1)         add(arrival, 1)         // arrive at barrier
arrived = 1             arrived = 2             arrived = 3 (LAST)
    │                       │                       │
wait(gen, 0)            wait(gen, 0)            store(arrival, 0)
...sleeping...              │ but gen is        add(gen, 1)             // gen = 1
    │                       │ already 1!        notify(gen)
    │                       │                       │
woken by notify         wait returns            (past barrier)
sees gen = 1            "not-equal"
    │                       │
read data[2] = 7 OK     read data[0] = ???
read data[1] = 99 OK    read data[2] = ???

Worker B's Atomics.wait returned "not-equal" because gen was already bumped. V8 does NOT enforce happens-before in this case, so stale reads result.
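
The control flow of that timeline can be replayed on a single thread to see which path Worker B takes. This is only a sketch under stated assumptions: the slot layout (index 0 for the arrival counter, index 1 for the generation) and all variable names are hypothetical, Worker A's sleeping wait is left out because nothing can actually sleep in a single-threaded replay, and the wait call uses a timeout of 0 so it can never block. It assumes an environment where the calling thread may call Atomics.wait (Node.js or a worker; a browser main thread throws a TypeError). It shows only the return path, not the stale reads, which need real concurrent workers.

```javascript
// Single-threaded replay of the timeline's interleaving.
// Hypothetical layout: view[0] = arrival counter, view[1] = generation.
const view = new Int32Array(new SharedArrayBuffer(8));
const ARRIVAL = 0;
const GEN = 1;
const WORKERS = 3;

// Workers A and B read the current generation, then arrive.
const genSeenByA = Atomics.load(view, GEN);
const arrivedA = Atomics.add(view, ARRIVAL, 1) + 1;
const genSeenByB = Atomics.load(view, GEN);
const arrivedB = Atomics.add(view, ARRIVAL, 1) + 1;

// Worker C arrives last: reset the counter, bump the generation, notify.
const arrivedC = Atomics.add(view, ARRIVAL, 1) + 1;
const isLast = arrivedC === WORKERS;
Atomics.store(view, ARRIVAL, 0);
Atomics.add(view, GEN, 1);
// Nobody is actually asleep in this replay, so no waiter is woken.
const woken = Atomics.notify(view, GEN);

// Worker B only reaches its wait now. The generation no longer matches
// the value it loaded, so the call returns without suspending.
const resultB = Atomics.wait(view, GEN, genSeenByB, 0);
console.log(resultB);
```

In this replay, Worker B never sleeps and is never the target of a notify, which is exactly the path the report identifies as problematic.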
Barrier Algorithm (JavaScript)
// Generation-counting barrier — the standard pattern
function barrier(view, arrivalIdx, genIdx, workerCount) {
    const myGen = Atomics.load(view, genIdx);
    const arrived = Atomics.add(view, arrivalIdx, 1) + 1;

    if (arrived === workerCount) {
        // Last arriver: reset counter, bump generation, notify
        Atomics.store(view, arrivalIdx, 0);
        Atomics.add(view, genIdx, 1);
        Atomics.notify(view, genIdx);
    } else {
        // Wait for generation to change
        while (Atomics.load(view, genIdx) === myGen) {
            Atomics.wait(view, genIdx, myGen);
            //         ^^^ BUG: "not-equal" return does NOT provide
            //             happens-before for third-party stores
        }
    }
}

// Workaround: pure spin barrier
function barrierSpin(view, arrivalIdx, genIdx, workerCount) {
    const myGen = Atomics.load(view, genIdx);
    const arrived = Atomics.add(view, arrivalIdx, 1) + 1;

    if (arrived === workerCount) {
        Atomics.store(view, arrivalIdx, 0);
        Atomics.add(view, genIdx, 1);
    } else {
        while (Atomics.load(view, genIdx) === myGen) {
            // Pure spin — every Atomics.load is seq_cst
            // When we see the new gen, ALL prior stores are visible
        }
    }
}

ECMAScript Spec Reference

The ECMAScript specification defines Atomics.wait in Section 25.4.12 (ES2024). The agent enters the WaiterList critical section, then compares the value. If it differs, the function returns "not-equal" without suspending.
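
The two outcomes of that comparison can be observed without a second thread. The following is a minimal sketch, assuming an environment where the calling thread is allowed to block (Node.js or a worker; a browser main thread throws a TypeError). The name `cell` and the 10 ms timeout are illustrative choices, not part of the original test.

```javascript
// One shared 32-bit cell holding the value 1.
const cell = new Int32Array(new SharedArrayBuffer(4));
Atomics.store(cell, 0, 1);

// Expected value 0 differs from the stored 1: the call returns
// without suspending, even though no timeout was given.
const mismatch = Atomics.wait(cell, 0, 0);

// Expected value 1 matches: the agent suspends. No other agent
// notifies, so the 10 ms timeout is what ends the wait.
const matched = Atomics.wait(cell, 0, 1, 10);

console.log(mismatch, matched);
```

The third possible return value, "ok", only occurs when another agent calls Atomics.notify while this one is suspended, so it cannot be shown in a single-threaded sketch.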

The Memory Model (Section 29) defines synchronization via Synchronize events. Atomics.notify synchronizes with agents it wakes. The question is: does the critical section entry in Atomics.wait (even on the "not-equal" fast path) create a synchronization edge to all prior writes from all agents?

The WebAssembly Threads proposal specifies memory.atomic.wait32 to perform an ARDSEQCST (atomic read with seq_cst ordering) as its first step. This should establish ordering regardless of the return path. If V8's "not-equal" fast path skips the full fence, that is a conformance issue.