Bun has an open PR adding shared-memory threads to JavaScriptCore

Source: github.com
143 points by gr4vityWall a day ago on hackernews | 302 comments

added 21 commits

June 5, 2026 08:09
Design specs for shared-heap Thread support in JSC: heap server and
per-thread allocators, shared VM state, TID/SW-tagged and segmented
butterflies, JIT tiers under N mutators, and the Thread/Lock/Condition/
ThreadLocal API. Includes TSAN, race-amplifier, and bench-gate docs plus
the design overview in THREAD.md.
…harness

Thread/Lock/Condition/ThreadLocal and Atomics-on-properties behind
--useThreads, serialized by the VM's JSLock as a semantic oracle for the
upcoming shared-heap implementation. Includes a 39-test corpus, a TSAN
no-JIT build target (zero data races at idle, empty suppressions), a
randomized-yield race amplifier, and a serial-perf bench gate with a
recorded baseline. Also fixes ICU static-archive link order in WTF and
two pre-existing no-JIT build breaks.
…rent object model, JIT support, and Thread API

Multi-mutator heap with per-thread allocators and N-thread safepoints,
process-global sharded atom table and StructureID allocation locking,
per-thread VM-lite execution state, TID/shared-write tagged butterflies
with segmented fallback and TTL watchpoint elision, per-tier TID/SW
checks with handler ICs in FTL and epoch-based CodeBlock reclamation,
and real mutator threads behind the Thread/Lock/Condition/ThreadLocal
API with Atomics on object properties. All behind --useJSThreads with
the GIL retained as a --useThreadGIL fallback layer.
…ation, waiter lists

Six rounds of gate-driven fixes against the threads corpus: per-thread
CLoop stacks replacing the shared-stack frame clobber, LocalAllocator
and Heap shared-mode races, retired JIT artifact accounting, waiter
list and condition wakeup fixes, LLInt call path initialization for
spawned threads, butterfly regime dispatch in slow paths, and a
watchpoint disarm before the flag-on JITData leak in ~CodeBlock.
Corpus: 81/85 passing; adds per-test timeouts to the runner so hangs
report as failures.
… on spawned threads

trySpreadFast reached the flat-only butterfly() accessor on arrays
whose butterfly had segmented under a racing same-shape add storm; the
spread path now dispatches on the regime and falls back to the generic
slow path. Baseline-compiled callees invoked from spawned threads read
per-thread JIT state that only LLInt entry initialized; the thread
entry sequence now materializes it for all tiers. Threads corpus green.
…tion handout

SPEC-ungil.md: N-mutator execution model — JSLock GIL-off entered-token mode,
per-thread microtask/task queues with keepalive lifetime, stop-the-world
conductor protocol (seq_cst stop-bit/access Dekker pair), thread teardown
state machine (TEARDOWN/COLLECTED/DETACHED under the lite registry lock),
~VM completion fence via registry condition wait, haveABadTime class-4 stops,
lazy-init owner-reentry contract, termination model (VM-wide only).
Includes executed inventory audits (K4/N7), full revision history with
binding annexes, and the flattened 18-task implementation handout.
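
The "seq_cst stop-bit/access Dekker pair" named above can be pictured with a minimal C++ sketch. It assumes one access flag per mutator and one stop bit on the conductor; `MutatorSlotSketch`, `ConductorSketch`, and all four functions are hypothetical names, not identifiers from SPEC-ungil.md. Each side stores its own flag and then loads the other side's flag, both seq_cst, so the two sides cannot both miss each other.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; the real protocol is specified in SPEC-ungil.md.
struct MutatorSlotSketch {
    std::atomic<bool> hasAccess{false};
};

struct ConductorSketch {
    std::atomic<bool> stopRequested{false};
};

// Mutator side: publish "I hold heap access", then look at the stop bit.
// Returns false (and retracts access) if a stop is already pending.
bool tryEnter(ConductorSketch& conductor, MutatorSlotSketch& slot)
{
    slot.hasAccess.store(true, std::memory_order_seq_cst);
    if (conductor.stopRequested.load(std::memory_order_seq_cst)) {
        slot.hasAccess.store(false, std::memory_order_seq_cst);
        return false;
    }
    return true;
}

// Mutator side: give up heap access (for example at a safepoint).
void leave(MutatorSlotSketch& slot)
{
    slot.hasAccess.store(false, std::memory_order_seq_cst);
}

// Conductor side: publish the stop bit, then look at the access flag.
// Returns true if the mutator is already parked (holds no access).
bool requestStop(ConductorSketch& conductor, const MutatorSlotSketch& slot)
{
    conductor.stopRequested.store(true, std::memory_order_seq_cst);
    return !slot.hasAccess.load(std::memory_order_seq_cst);
}

// Conductor side: clear the stop bit so mutators may enter again.
void resume(ConductorSketch& conductor)
{
    conductor.stopRequested.store(false, std::memory_order_seq_cst);
}
```

With weaker orderings, the store on one side could be reordered after its load, and both sides could proceed at once; the sketch uses seq_cst on every access for that reason.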

Workflow updates: ungil implementation runs DAG-scheduled parallel task waves
with disjoint file ownership and per-task adversarial review; verification
ladder covers GIL-on and flag-off regression arms; scanner/fuzz/CVE-audit
workflows harden id/path sanitization.

coderabbitai[bot]

claude[bot]

No Source/ change landed (refuter discipline held). Evidence pack
SHAREDHEAP-ALLOC-EVIDENCE.md is the round's contribution:

- 99.67% of 70.9M cells ALREADY hit interval-bump (Riptide's
  bump-in-fresh-block path). Refills are 0.33% of allocs / ~4.1% of wall.
  The 'per-thread fresh-block cache' candidate targets the wrong lever
  and carries RSS risk.
- The measurable per-cell tax is the 3-hop allocator LOOKUP
  (allocationClientForCurrentThread -> allocatorForSizeStep ->
  allocateForClient, ~250ms). Higher-leverage zero-RSS candidate: cache
  LocalAllocator* per (thread, size-class).
- Decomposition: of intcs W=1 +5889ms gap vs Java, only ~1912ms (33%)
  is sharedGCHeap+gilOff tax. ~3937ms (67%) is plain-JSC floor
  (WTF::equal Map-key compare, lockProtoFuncHold, rope-resolve, IC-miss,
  CellLock/DeferTermination/traps). <6000ms needed BOTH; allocator-only
  was never going to clear it.
- intcs W=16 RSS noise is mode-correlated (slow-mode rep = low-RSS rep).
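
The "cache LocalAllocator* per (thread, size-class)" candidate from the list above can be sketched as a thread-local table that is filled on first use. This is an illustration only: `LocalAllocatorSketch`, `slowLookup`, `allocatorFor`, the 16-byte size step, and the 32-entry table are assumptions, and the stand-in slow lookup does not model the real three-hop path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSizeStep = 16;       // assumed size-class granularity
constexpr std::size_t kNumSizeClasses = 32; // assumed table size

struct LocalAllocatorSketch {
    std::size_t cellSize = 0;
};

// Counts slow lookups on this thread so the effect of the cache is visible.
thread_local int slowLookupCount = 0;

// Stand-in for the multi-hop lookup; here it just indexes per-thread storage.
LocalAllocatorSketch* slowLookup(std::size_t sizeClass)
{
    thread_local std::array<LocalAllocatorSketch, kNumSizeClasses> allocators;
    ++slowLookupCount;
    allocators[sizeClass].cellSize = (sizeClass + 1) * kSizeStep;
    return &allocators[sizeClass];
}

// Fast path: one thread-local table load; the slow lookup runs only on the
// first request for a given size class on a given thread.
LocalAllocatorSketch* allocatorFor(std::size_t bytes)
{
    thread_local std::array<LocalAllocatorSketch*, kNumSizeClasses> cache{};
    if (!bytes || bytes > kSizeStep * kNumSizeClasses)
        return nullptr; // outside the size classes this sketch models
    std::size_t sizeClass = (bytes - 1) / kSizeStep;
    if (!cache[sizeClass])
        cache[sizeClass] = slowLookup(sizeClass);
    return cache[sizeClass];
}
```

Because the table holds only pointers and is sized once, it adds no per-cell memory, which matches the "zero-RSS" framing of the candidate.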

§41: clean-tree re-baseline, all within ±3% of §40, all gates green,
RSS within +10%.

claude[bot]

… 36.5% of tax)

H-VMLITE-TLCPTR: bake a process-constant TLC slot index at JIT-compile
time and load the per-thread LocalAllocator* lite-relative
(VMLite::{tlcTable,tlcTableBound}) instead of a null Allocator constant.
H-TLS-TABLE: collapse the C++ CompleteSubspace::allocate sharedGCHeap arm
to two IE-TLS loads + one indexed load. H-TLC-FIXEDTABLE-NOREALLOC:
pre-grow the TLC table so cached pointers never go stale. defer-hoist-
lazyslow: hoist DeferGCForAWhile out of the gilOff
operationCompileFTLLazySlowPath steady-state. GCClient::CompleteSubspaceView
infra (staging).

uprobe-verified: CompleteSubspace::allocateForClient 27.8M -> 0 (3-hop
fully eliminated). operationCompileFTLLazySlowPath 46.6M -> 36.4M (-22%
only: stringSpace is iso, not table-addressable;
tlcSlotForConcurrently<JSRopeString> returns nullopt so MakeRope still
bakes null Allocator; JSRopeString+JSString = 71% of cells).

§42: intcs W=1 7788 -> 7142 (-646ms, tax 1912 -> 1214); nomap W=1 -428;
default W=1 -1120; flat W=16 -22. RSS: intcs W=1 -2.3%, W=16 -10.4%.
Corpus 94+95/0, identity 40/0, all checksums stable. Residual ~75% one
mechanism: 36.4M MakeRope thunk traversals (iso-subspace TLC-slot
extension + thin-thunk are the named follow-ups).

coderabbitai[bot]

claude[bot]

…unk (76.3% of tax cumulative)

H-ISO-TLCSLOT (IsoSubspace.{h,cpp}, GCThreadLocalCache.cpp,
DFGSpeculativeJIT.cpp, FTLLowerDFGToB3.cpp, AssemblyHelpers.h): per-type
IsoSubspace TLC slot stamped at GCClient::Heap creation;
tlcSlotForConcurrentlyWithIso<T>() resolves via the stamped index. JSArray
EXCLUDED (returns nullopt): JIT inline allocateObject/emitAllocateJSObject
stores butterfly word UNTAGGED -> fresh inline JSArray reads as foreign at
§4.2 ensureLength -> segments on first growth (measured 182,339
convertToSegmentedButterfly + 19M operationArrayPush -> +3,472ms). Under
§42 JSArray cell allocator was always null GIL-off so path went to
operationNewArrayWithSize (TID-tags in C++); §43 iso arm would be FIRST
time JSArray inline path fires GIL-off. Gated on Task-8 (TID-tag every JIT
inline butterfly install). All other iso ClassTypes either no-butterfly
(JSRopeString/JSString) or null-butterfly inline path.

Thin-thunk (FTLThunks.cpp, FTLLazySlowPath.h): gilOff steady state does
the T8 acquire-load m_stubCodePtr IN JIT code, tail-jump if non-null; no
saveAllRegisters/restoreAllRegisters dump, no C call. Null falls through
to today's full thunk.
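
The thin-thunk fast path described above amounts to an acquire-load of a code pointer followed by a direct jump when it is non-null. A rough C++ analogue, with a function pointer standing in for JIT code, might look like the following; `LazySlowPathSketch`, `generatedStub`, `fullThunk`, and `enterSlowPath` are hypothetical names, and the real fast path is emitted as machine code rather than written in C++.

```cpp
#include <atomic>
#include <cassert>

using StubFnSketch = int (*)(int);

struct LazySlowPathSketch {
    std::atomic<StubFnSketch> stubCodePtr{nullptr}; // stand-in for m_stubCodePtr
    std::atomic<int> fullThunkEntries{0};           // instrumentation for the demo
};

// Stand-in for the generated slow-path stub.
int generatedStub(int x) { return x + 1; }

// Full thunk: the expensive path that "compiles" the stub and publishes it.
int fullThunk(LazySlowPathSketch& path, int arg)
{
    path.fullThunkEntries.fetch_add(1, std::memory_order_relaxed);
    path.stubCodePtr.store(&generatedStub, std::memory_order_release);
    return generatedStub(arg);
}

// Thin thunk: acquire-load the pointer; if it is already published, call it
// directly, otherwise fall through to the full thunk.
int enterSlowPath(LazySlowPathSketch& path, int arg)
{
    if (StubFnSketch stub = path.stubCodePtr.load(std::memory_order_acquire))
        return stub(arg);
    return fullThunk(path, arg);
}
```

In this sketch two calls to `enterSlowPath` enter the full thunk once; the second call takes the pointer-load path only.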

uprobe: operationCompileFTLLazySlowPath 36.4M -> 56 (-99.9998%).

§43: intcs W=1 7142 -> 6381 (-761); nomap W=1 -1018; default W=1 -976.
Cumulative §42+§43 = 1459ms = 76.3% of original 1912ms tax (now 453ms).
RSS: intcs W=1 -2.3%, W=16 -10.6%. Corpus 94+95/0, identity 40/0, 34/34
checksums stable. Residual: JSArray iso-TLC ~400-500ms gated on Task-8.
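
The cumulative figure above can be re-derived from the numbers in the §42 and §43 notes. The sketch below assumes the §42 share is the tax drop (1912 -> 1214) and the §43 share is the full intcs W=1 drop (7142 -> 6381), which is the reading under which the stated 1459ms total works out; the constant and function names are illustrative.

```cpp
#include <cassert>

constexpr int kOriginalTaxMs = 1912;   // sharedGCHeap+gilOff tax before §42
constexpr int kTaxAfter42Ms = 1214;    // tax after §42
constexpr int kIntcsBefore43Ms = 7142; // intcs W=1 before §43
constexpr int kIntcsAfter43Ms = 6381;  // intcs W=1 after §43

constexpr int kSavedBy42Ms = kOriginalTaxMs - kTaxAfter42Ms;
constexpr int kSavedBy43Ms = kIntcsBefore43Ms - kIntcsAfter43Ms;
constexpr int kCumulativeMs = kSavedBy42Ms + kSavedBy43Ms;
constexpr int kResidualTaxMs = kOriginalTaxMs - kCumulativeMs;

// Share of `whole` covered by `part`, in tenths of a percent (truncated).
constexpr int permille(int part, int whole) { return part * 1000 / whole; }

static_assert(kSavedBy42Ms == 698, "§42 tax reduction");
static_assert(kSavedBy43Ms == 761, "§43 reduction");
static_assert(kCumulativeMs == 1459, "cumulative §42+§43");
static_assert(kResidualTaxMs == 453, "remaining tax");
static_assert(permille(kSavedBy42Ms, kOriginalTaxMs) == 365, "36.5% of tax");
static_assert(permille(kCumulativeMs, kOriginalTaxMs) == 763, "76.3% of tax");
```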

claude[bot]

coderabbitai[bot]

…n race (corrects §40)

Mechanism: String.fromCharCode is a lazy static property on StringConstructor
(initial structure {prototype,length,name}, butterfly Flat TID=0). intcs/
noconcat have no main-thread termOf() before workers, so first access is a
16-thread race from phaseAI -> termOf. If a WORKER (TID!=0) wins: foreign-TID
structure transition on a Flat butterfly -> convertToSegmentedButterfly ->
StringConstructor butterfly Segmented for process lifetime -> DFG
compileGetButterfly emits segmented check as speculationCheck(BadIndexingType)
-> every String.fromCharCode get_by_id in termOf bc#22 / tokenize bc#266 /
genDocTextI bc#254 OSR-exits -> handleGetById doesn't consult
hasExitSite(BadIndexingType) -> recompile re-emits SAME body -> termOf 15x /
genDocTextI 8x / tokenize 9x recompile loop = 4600ms slow-mode.

§40's verdict ('Map<string> is the trigger') was WRONG: nomap is monomodal
because its nmShardOf[] precompute incidentally calls termOf() on main at
module init, reifying at TID=0. Removing Map was incidental.

Discriminating tests: force main reify -> 15/15 fast; force worker reify ->
12/12 slow; reportDFGCompileTimes fast termOf=1 vs slow termOf=15; verboseOSR
exit kinds = BadIndexingType at GetButterfly(String). Explains §34(C): logGC
adds main-thread dataLog latency -> main loses race -> 0/15 fast.

Fix (bench-level): String.fromCharCode(97); at module init. 30-rep phaseA
before max/min 3.14 (18/30 slow) -> after 1.11 (0/30 slow). intcs W=16
median total 3359 [3050,3754]. Corpus 94+95/0, identity 40/0, all checksums
stable.

ENGINE-SIDE BUG remains: any GIL-off program first-touching a lazy static
property (String.fromCharCode/fromCodePoint/raw, Array.of/from/isArray,
Object.assign/keys/...) from a worker segments that constructor's butterfly
and DFG GetButterfly OSR-exits forever. Candidate fixes: (a) handleGetById
checks hasExitSite(BadIndexingType) and falls back to getById IC; (b)
foreign-transition rule special-cases property-only NonArray butterflies.

claude[bot]

coderabbitai[bot]

…adIndexingType backstop

ConcurrentButterfly.cpp §4.2 trySegmentedTransition StayFlatShared gate:
foreign-TID/SW=1 property transition on a Flat butterfly with NO indexing
header AND NO outOfLineCapacity growth reuses the existing flat allocation
under cell lock — release-store value into live slot, nuke + DCAS
{newStructure, (installerTID,SW=1)}. R7 read protocol via same M2/M5
ordering as owner StayFlat reuse. I12 holds via step-0 F2 fire. Gated
!useThreadGIL.
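
The eligibility conditions of the StayFlatShared gate can be written down as a single predicate. The sketch below only restates the conditions listed above and models none of the locking, nuke, or DCAS steps; `TransitionSketch` and `stayFlatSharedApplies` are hypothetical names, and reading "foreign-TID/SW=1" as "foreign TID or SW already set" is an assumption.

```cpp
#include <cassert>

enum class ButterflyRegimeSketch { Flat, Segmented };

// Hypothetical summary of one property transition, for illustration only.
struct TransitionSketch {
    ButterflyRegimeSketch regime;
    bool foreignTID;             // installer thread is not the owner thread
    bool sharedWrite;            // SW bit is already set
    bool hasIndexingHeader;      // butterfly carries an indexing header
    bool growsOutOfLineCapacity; // transition needs a larger allocation
    bool useThreadGIL;           // GIL fallback layer enabled
};

// True when the existing flat allocation could be reused in place.
bool stayFlatSharedApplies(const TransitionSketch& t)
{
    return !t.useThreadGIL
        && t.regime == ButterflyRegimeSketch::Flat
        && (t.foreignTID || t.sharedWrite)
        && !t.hasIndexingHeader
        && !t.growsOutOfLineCapacity;
}
```

A transition that grows out-of-line capacity, or one on a butterfly with an indexing header, fails the predicate and is left to the other paths.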

DFGByteCodeParser handleGetById: under useJSThreads, before the simple
CheckStructure+GetButterfly+GetByOffset / MultiGetByOffset lowering,
consult m_exitProfile.hasExitSite(m_currentIndex, BadIndexingType) and
fall back to GetById IC node on hit. Mirrors BadCache idiom. Converges
in one recompile when gate cannot apply (capacity grows). useJSThreads-
gated; flag-off byte-identical.
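
The exit-profile backstop can be illustrated with a small model: once a BadIndexingType exit has been recorded at a bytecode index, the next compile picks the IC lowering for that index instead of re-emitting the same inline body. The types below are simplified stand-ins (`ExitProfileSketch`, `LoweringSketch`, `chooseGetByIdLowering` are hypothetical), not the DFG's own classes.

```cpp
#include <cassert>
#include <set>
#include <utility>

enum class ExitKindSketch { BadCache, BadIndexingType };
enum class LoweringSketch { InlineGetByOffset, GetByIdIC };

// Simplified exit profile: remembers (bytecode index, exit kind) pairs.
struct ExitProfileSketch {
    std::set<std::pair<unsigned, ExitKindSketch>> sites;

    void recordExit(unsigned bytecodeIndex, ExitKindSketch kind)
    {
        sites.insert({bytecodeIndex, kind});
    }

    bool hasExitSite(unsigned bytecodeIndex, ExitKindSketch kind) const
    {
        return sites.count({bytecodeIndex, kind}) != 0;
    }
};

// With threads enabled, a recorded BadIndexingType exit at this index selects
// the IC node; with the flag off the choice is unchanged.
LoweringSketch chooseGetByIdLowering(const ExitProfileSketch& profile,
                                     unsigned bytecodeIndex,
                                     bool useJSThreads)
{
    if (useJSThreads
        && profile.hasExitSite(bytecodeIndex, ExitKindSketch::BadIndexingType))
        return LoweringSketch::GetByIdIC;
    return LoweringSketch::InlineGetByOffset;
}
```

In this model a single recorded exit is enough to change the next compile, which corresponds to the "converges in one recompile" claim.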

bench.js: §44 prewarm removed (no longer needed).
JSTests/threads/jit/foreign-reify-getbyid-converges.js: worker-reifies
Array.from/Object.keys/String.raw, asserts numberOfDFGCompiles<=4.

§45: force-worker-reify 12/12 SLOW -> 12/12 FAST [1563,1650]. intcs W=16
30-rep WITHOUT prewarm: max/min 3.14->1.08, 18/30 slow->0/30. String/
Object stay Flat (StayFlatShared); Array segments (cap 8->16) but
backstop converges (compiles: String=1 Object=1 Array=2). Corpus 95+96/0,
identity 40/0, 63/63 checksums stable. Residual: capacity-growing foreign
reification could get StayFlatSharedGrow; duplicate-property-name
structures pre-exist (separate).
…gcAtEnd 10/10, Tier-B + scanner residuals

22 tasks across 47 source files. Per docs/threads/CVE-AUDIT-RESULTS.md
'Ship-readiness closure' section + SCALEBENCH.md §46.

Validation sweeps r2: 01 (validateDFGClobberize SIGTRAP family) 86/8 ->
95/0; 05 (validateExceptionChecks unchecked-scope sites at VMTraps:686/
ThreadAtomics:1425/ConditionObject:103/ThreadManager:1124/JSObject:4283/
CallData:76) 44/50 -> 95/0; 04+06 -> 95/0. gcAtEnd property-wait-
termination 10/10 (MachineStackMarker SEGV gone).

03 (validateButterflyTagDiscipline) lint WIRED — 80/15 surfaces the §43
Task-8 residual (43 I14 reports: GetByOffset/PutByOffset storage edge from
non-tag-masking producer) as a regression detector. Task-8 DEFERRED:
JSArray exclusion makes the latent concern unreachable; perf upside
~400-500ms recorded.

§45 duplicate-property: misdiagnosis (private @hasOwn rendered without @
in describe()); under-lock getDirectOffset re-probe at reifyStaticProperty
added defensively (useJSThreads-gated).

TSan build: TsanDeferredCtorMember<StructureTransitionTable> forwarder for
tryGetSingleSlotConcurrently.

--cve 60/3/0/2: 3 fails are exit-3 test-side issues (missing useDollarVM
in requireOptions; B1/B2/B10 fixes now spec-correctly throw where tests
expected old behavior). No memory-safety failures; all Tier-A repros pass.

Corpus 95+96/0, identity 40/0, bench all checksums stable ±2% of §45.

claude[bot]

…gate RED on transition-heavy-constructor +5.71%

Full-JIT GIL-off TSAN sweep (CLoop OFF per standing ruling). 27 fix waves
across 23 source files: relaxed-atomic conversions on the racy-probe-vs-
allocator-handout pattern (BitSet/Atomics/ConcurrentPtrHashSet/IsoSubspace/
MarkedBlock/Heap/CodeBlock/DFGJITCode/DFGDesiredWeakReferences/
RegExpCachedResult/RegExpGlobalData/CachedCall/InterpreterInlines/
ThreadAtomics/JSGlobalObject/JSGenericTypedArrayViewInlines).

40 active race: suppressions, all with justification (6 pre-existing
upstream parallel-GC; 23 wave-7 atomic-probe-vs-allocator reader-side; 1
recordParse rule-1; 9 JIT-one-sider allocator-side §0 accepted-tradeoff;
1 SPEC-congc T5-rootscan). 0 CLoop entries.

229/247 exit-0 (best run health of campaign). All non-zero exits carry 0
data-race reports (functional, owned by separate queues).

Debug corpus 96/0/3. Release bench-gate: 7/8 within 1%; transition-heavy-
constructor +5.71% (re-measured +5.38% at loadavg 1.76 — load-stable).
Attribution UNPROVEN; the closeout was +3.9%, r27 is +1.7pp worse. The
unconditional WTF header conversions are the candidate; combined-revert
experiment owed. megamorphic-access -13.3% (record, do not claim).

RED gate recorded honestly per the 'honest partials over fake green'
directive. TSAN-TRIAGE.md §24 has the full r27 record.

claude[bot]

…se_after_return=0, smokes

10-min --resume smoke: 1,175,199 edges, REPRL/Thread-API OK, coverage
6.70->10.15% during import; corpus now 7314 files (import doesn't finish
in smoke window after rebuild edge-renumber).

4-min fresh-storage smoke: 262 programs generated, 4 unique SIGABRT
crashes (likely pre-existing class-static/gc family from 06-07; appears
with useJSThreads=1 default-on in every profile execution, may not be
threads-specific). NOT triaged.

run-fuzzilli.sh: export detect_stack_use_after_return=0 (lane-pin
required it but script lacked it). triage-r1-batch.sh added.
… on current tree

All 292 prior-campaign crash files (fuzzilli-storage{,-B,-C}/crashes/*.js
from 06-07/06-10) re-run 3x against the post-§46+TSAN Fuzz binary: zero
reproduce. §46 correctness closure + TSAN r27 closed everything the fuzzer
had previously found.

triage-r1-batch.sh: validate TARGET ARGS tokens against
^--[A-Za-z0-9][A-Za-z0-9=._-]*$ and pass as array (header is
Fuzzilli-authored, not fuzzed-JS-controlled, but defense-in-depth).
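
The allowlist pattern from this commit message can be exercised outside the shell script. The sketch below applies the same pattern with `std::regex`; the function names are hypothetical, and the script itself is a shell script, so this shows only the matching rule and the "pass as array" idea.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Accepts tokens such as --useJSThreads=true; rejects anything containing
// spaces, quotes, or shell metacharacters.
bool isAllowedTargetArg(const std::string& token)
{
    static const std::regex allowlist(R"(^--[A-Za-z0-9][A-Za-z0-9=._-]*$)");
    return std::regex_match(token, allowlist);
}

// Copies tokens into `out` as separate elements (never re-joined into one
// string). Returns false and leaves `out` empty if any token is rejected.
bool collectTargetArgs(const std::vector<std::string>& tokens,
                       std::vector<std::string>& out)
{
    out.clear();
    for (const std::string& token : tokens) {
        if (!isAllowedTargetArg(token)) {
            out.clear();
            return false;
        }
        out.push_back(token);
    }
    return true;
}
```

Note that a bare `--` does not match, since the pattern requires at least one alphanumeric character after the two dashes.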

claude[bot]

…rrayStorage source)

4h campaign on post-§46+TSAN tree. 125/128 = ASSERT
!hasAnyArrayStorage(source->indexingType()) at ConcurrentButterfly.cpp:1064
trySegmentedTransition <- tryPutDirectTransitionConcurrent <-
putDirectInternal. Single-threaded --useJSThreads=true; Debug repros
deterministically. 1 = storeTaggedButterflyWordConcurrent ABRT (related).

Prior-campaign re-triage on same tree: 292/292 NOREPRO.

triage-r1-batch.sh: remove the '--' separator I added (jsc treats it as
script-args delimiter -> drops to REPL). Allowlist kept.
…KED V5b); r47 setButterfly audit escapes found

tryPutDirectTransitionConcurrent: tryArrayStoragePropertyTransition reroute
+ I35 CoW materialize-first (materializeCopyOnWriteButterflyConcurrent +
RESTART before locked protocols, mirrors classifyConcurrentLockedAdd's
§4.8-precedes-§4.x). Closes r3-001 (ConcurrentButterfly.cpp:1064
!hasAnyArrayStorage) AND the 12 CoW variants (cpp:1068 !isCopyOnWrite from
defineProperty(CoW-literal, name, accessor) when E4 ineligible). r3b
re-triage 134/136 NOREPRO. Regression tests
array-storage-/cow-named-property-transition.js. r3-001/002 20/20 Debug.

bench-gate transition-heavy-constructor +6.08%: closeout commit
2f5a5c4 reproduces +6.90%/+7.37% on this host with full Source/
reverted (15+21-run medians), C' samples 51.9-61.3ms (18% spread).
Per-header audit found none on the bench's transition path. Host-
inadmissible variance; transferred to PARKED V5b per AB17g item 4.

§45 discriminant holds (force-worker-reify 5/5 fast). Corpus 97+98/0.
Identity 40/0. Checksums stable.

NEW r47 (2h re-fuzz, 423K execs): 8/9 = ONE root family: setButterfly
foreign-TID owner-assert escapes at
(1) JSArrayBufferView::slowDownAndWasteMemory (6/8; also poison-deref SEGV:
wastage butterfly's IndexingHeader::arrayBuffer uninitialized between
setButterfly publish and cell-locked setArrayBuffer,
isArrayBufferViewOutOfBounds reads it unfenced);
(2) shiftButterflyAfterFlattening;
(3) flattenDictionaryStructureImpl.
Trap working as designed (deterministic abort, not silent steal).
DEFERRED to r47 fix round.
…helper); 2h re-fuzz 0 r47-family

slowDownAndWasteMemory (JSArrayBufferView.cpp): cell-locked re-check ->
build wastage butterfly LOCAL + fill IndexingHeader::arrayBuffer BEFORE
publication -> storeStoreFence -> tag-PRESERVING seq_cst CAS (§4.6 AS-COPY
shape, NonArray) -> storeStoreFence before m_mode flip. Closes r47-001
owner-TID + r47-002 poison arrayBuffer mid-publish.
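
The ordering in this fix (build locally, fill the header, fence, then publish with a tag-preserving CAS) is the usual initialize-before-publish pattern. A minimal sketch follows, assuming the tag lives in the low three bits of the word, which is a hypothetical layout chosen for the example; `WastageButterflySketch`, `kTagMaskSketch`, and `publishPreservingTag` are illustrative names, and the cell lock and m_mode flip are not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the freshly built wastage butterfly.
struct alignas(8) WastageButterflySketch {
    const void* arrayBuffer = nullptr;
};

// Hypothetical layout: the low three bits of the word carry the tag.
constexpr std::uintptr_t kTagMaskSketch = 0x7;

// Fills the header, then publishes the pointer while keeping the tag bits of
// the word that is currently installed. Returns false if the word changed
// between the load and the CAS.
bool publishPreservingTag(std::atomic<std::uintptr_t>& word,
                          WastageButterflySketch* fresh,
                          const void* buffer)
{
    fresh->arrayBuffer = buffer; // initialize BEFORE publication
    // Release fence as a stand-in for the storeStoreFence in the message.
    std::atomic_thread_fence(std::memory_order_release);

    std::uintptr_t expected = word.load(std::memory_order_seq_cst);
    std::uintptr_t desired =
        reinterpret_cast<std::uintptr_t>(fresh) | (expected & kTagMaskSketch);
    return word.compare_exchange_strong(expected, desired,
                                        std::memory_order_seq_cst);
}
```

A reader that loads the published word can then dereference `arrayBuffer` without observing the uninitialized state that r47-002 describes.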

shiftButterflyAfterFlattening (JSObject.cpp) +
flattenDictionaryStructureImpl null-case (Structure.cpp): world-stopped +
cell-locked tag-preserving seq_cst store/zero (§6/§4.6 T3/I17).

SURFACED reads: existingBufferInButterfly (JSArrayBufferView.h) + JIT
emitLoadTypedArrayArrayBuffer (AssemblyHelpers.cpp) — Wasteful TA CAN
carry SEGMENTED word (foreign-TID named-prop add growing OOL capacity ->
trySegmentedTransition; §44 StayFlatShared gate requires !hasIndexingHeader
which Wasteful HAS). Segment-aware dispatch: spine->indexedFragment(0)->
slots[0] (§4.1 I8 alias). The 'wasteful-mode butterflies are never
segmented' comment was FALSE.

All useJSThreads-gated, flag-off byte-identical. 3 regression tests.

§48: r47-001/002 20/20; r47+r3b retriage 10/11 NOREPRO 0 r47-family;
corpus 100+102/0 (+5); identity 40/0; checksums stable. 2h re-fuzz r48
(310K execs): 2 flaky/NOREPRO, 0 r47-family. Pre-existing
isPinnedPropertyTable flake noted (06-07 class-static/gc, not r47).

claude[bot]