added 21 commits
Design specs for shared-heap Thread support in JSC: heap server and per-thread allocators, shared VM state, TID/SW-tagged and segmented butterflies, JIT tiers under N mutators, and the Thread/Lock/Condition/ThreadLocal API. Includes TSAN, race-amplifier, and bench-gate docs plus the design overview in THREAD.md.
…harness Thread/Lock/Condition/ThreadLocal and Atomics-on-properties behind --useThreads, serialized by the VM's JSLock as a semantic oracle for the upcoming shared-heap implementation. Includes a 39-test corpus, a TSAN no-JIT build target (zero data races at idle, empty suppressions), a randomized-yield race amplifier, and a serial-perf bench gate with a recorded baseline. Also fixes ICU static-archive link order in WTF and two pre-existing no-JIT build breaks.
…rent object model, JIT support, and Thread API
Multi-mutator heap with per-thread allocators and N-thread safepoints, process-global sharded atom table and StructureID allocation locking, per-thread VM-lite execution state, TID/shared-write tagged butterflies with segmented fallback and TTL watchpoint elision, per-tier TID/SW checks with handler ICs in FTL and epoch-based CodeBlock reclamation, and real mutator threads behind the Thread/Lock/Condition/ThreadLocal API with Atomics on object properties. All behind --useJSThreads with the GIL retained as a --useThreadGIL fallback layer.
…ation, waiter lists
Six rounds of gate-driven fixes against the threads corpus: per-thread CLoop stacks replacing the shared-stack frame clobber, LocalAllocator and Heap shared-mode races, retired JIT artifact accounting, waiter list and condition wakeup fixes, LLInt call path initialization for spawned threads, butterfly regime dispatch in slow paths, and a watchpoint disarm before the flag-on JITData leak in ~CodeBlock. Corpus: 81/85 passing; adds per-test timeouts to the runner so hangs report as failures.
… on spawned threads
trySpreadFast reached the flat-only butterfly() accessor on arrays whose butterfly had segmented under a racing same-shape add storm; the spread path now dispatches on the regime and falls back to the generic slow path. Baseline-compiled callees invoked from spawned threads read per-thread JIT state that only LLInt entry initialized; the thread entry sequence now materializes it for all tiers. Threads corpus green.
…tion handout
SPEC-ungil.md specifies the N-mutator execution model: JSLock GIL-off entered-token mode, per-thread microtask/task queues with keepalive lifetime, stop-the-world conductor protocol (seq_cst stop-bit/access Dekker pair), thread teardown state machine (TEARDOWN/COLLECTED/DETACHED under the lite registry lock), ~VM completion fence via registry condition wait, haveABadTime class-4 stops, lazy-init owner-reentry contract, termination model (VM-wide only). Includes executed inventory audits (K4/N7), full revision history with binding annexes, and the flattened 18-task implementation handout.
Workflow updates: ungil implementation runs DAG-scheduled parallel task waves with disjoint file ownership and per-task adversarial review; verification ladder covers GIL-on and flag-off regression arms; scanner/fuzz/CVE-audit workflows harden id/path sanitization.
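The "seq_cst stop-bit/access Dekker pair" named above can be pictured with a toy model. This is only a sketch of the general Dekker-style handshake, not the protocol from SPEC-ungil.md: `ToyWorld`, `ToyMutator`, and every function name below are hypothetical, and the real conductor also has to wait for mutators to park, which is omitted here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; none of these names come from JSC.
struct ToyWorld { std::atomic<bool> stopRequested{false}; };
struct ToyMutator { std::atomic<bool> hasAccess{false}; };

// Mutator side: publish "I hold heap access", THEN check the stop bit.
inline bool tryAcquireAccess(ToyWorld& world, ToyMutator& self) {
    self.hasAccess.store(true, std::memory_order_seq_cst);
    if (world.stopRequested.load(std::memory_order_seq_cst)) {
        // A stop is pending: back off so the conductor can proceed.
        self.hasAccess.store(false, std::memory_order_seq_cst);
        return false;
    }
    return true;
}

inline void releaseAccess(ToyMutator& self) {
    self.hasAccess.store(false, std::memory_order_seq_cst);
}

// Conductor side: publish the stop bit, THEN check the access flag.
// Returns true when the mutator is already out of the heap.
inline bool requestStopAndCheck(ToyWorld& world, const ToyMutator& mutator) {
    world.stopRequested.store(true, std::memory_order_seq_cst);
    return !mutator.hasAccess.load(std::memory_order_seq_cst);
}
```

Because both sides store first and load second with sequentially consistent ordering, at least one of them observes the other's store, so a mutator cannot slip into the heap unnoticed while a stop is being requested.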
No Source/ change landed (refuter discipline held). Evidence pack SHAREDHEAP-ALLOC-EVIDENCE.md is the round's contribution:
- 99.67% of 70.9M cells ALREADY hit interval-bump (Riptide's bump-in-fresh-block path). Refills are 0.33% of allocs / ~4.1% of wall. The 'per-thread fresh-block cache' candidate targets the wrong lever and carries RSS risk.
- The measurable per-cell tax is the 3-hop allocator LOOKUP (allocationClientForCurrentThread -> allocatorForSizeStep -> allocateForClient, ~250ms). Higher-leverage zero-RSS candidate: cache LocalAllocator* per (thread, size-class).
- Decomposition: of intcs W=1 +5889ms gap vs Java, only ~1912ms (33%) is sharedGCHeap+gilOff tax. ~3937ms (67%) is plain-JSC floor (WTF::equal Map-key compare, lockProtoFuncHold, rope-resolve, IC-miss, CellLock/DeferTermination/traps). <6000ms needed BOTH; allocator-only was never going to clear it.
- intcs W=16 RSS noise is mode-correlated (slow-mode rep = low-RSS rep).
§41: clean-tree re-baseline, all within ±3% of §40, all gates green, RSS within +10%.
… 36.5% of tax)
H-VMLITE-TLCPTR: bake a process-constant TLC slot index at JIT-compile
time and load the per-thread LocalAllocator* lite-relative
(VMLite::{tlcTable,tlcTableBound}) instead of a null Allocator constant.
H-TLS-TABLE: collapse the C++ CompleteSubspace::allocate sharedGCHeap arm
to two IE-TLS loads + one indexed load. H-TLC-FIXEDTABLE-NOREALLOC:
pre-grow the TLC table so cached pointers never go stale.
defer-hoist-lazyslow: hoist DeferGCForAWhile out of the gilOff
operationCompileFTLLazySlowPath steady-state. GCClient::CompleteSubspaceView
infra (staging).
uprobe-verified: CompleteSubspace::allocateForClient 27.8M -> 0 (3-hop
fully eliminated). operationCompileFTLLazySlowPath 46.6M -> 36.4M (-22%
only: stringSpace is iso, not table-addressable;
tlcSlotForConcurrently<JSRopeString> returns nullopt so MakeRope still
bakes null Allocator; JSRopeString+JSString = 71% of cells).
§42: intcs W=1 7788 -> 7142 (-646ms, tax 1912 -> 1214); nomap W=1 -428;
default W=1 -1120; flat W=16 -22. RSS: intcs W=1 -2.3%, W=16 -10.4%.
Corpus 94+95/0, identity 40/0, all checksums stable. Residual ~75% one
mechanism: 36.4M MakeRope thunk traversals (iso-subspace TLC-slot
extension + thin-thunk are the named follow-ups).
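The idea behind H-VMLITE-TLCPTR and H-TLC-FIXEDTABLE-NOREALLOC (a process-constant slot index, a per-thread table, and no reallocation so cached pointers stay valid) can be sketched as follows. This is a toy model under stated assumptions: `ToyAllocator`, `ToyThreadCache`, `kMaxSlots`, and the size-class-to-slot mapping are all hypothetical and are not JSC's LocalAllocator or VMLite layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a per-thread allocator.
struct ToyAllocator {
    std::size_t cellSize = 0;
    std::size_t allocated = 0;
    // Returns a fake byte offset instead of real memory.
    std::size_t allocate() { return allocated++ * cellSize; }
};

// Assumed fixed bound: the table is pre-grown once and never reallocated.
constexpr std::size_t kMaxSlots = 64;

// One table per thread. Since it never moves, a cached ToyAllocator* stays valid.
struct ToyThreadCache {
    std::array<ToyAllocator, kMaxSlots> table{};
    std::size_t bound = kMaxSlots;

    ToyAllocator* allocatorForSlot(std::size_t slot) {
        // Out-of-bound slots return null, meaning "take the slow path".
        return slot < bound ? &table[slot] : nullptr;
    }
};

inline ToyThreadCache& currentThreadCache() {
    thread_local ToyThreadCache cache;
    return cache;
}

// The slot index is the same in every thread, so a JIT could bake it in
// as an immediate. The /16 mapping is an illustrative assumption.
constexpr std::size_t slotForSizeClass(std::size_t sizeClassBytes) {
    return sizeClassBytes / 16;
}
```

With this shape the fast path is one thread-local load plus one indexed load, which is the kind of reduction the 3-hop lookup elimination above is after.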
…unk (76.3% of tax cumulative)
H-ISO-TLCSLOT (IsoSubspace.{h,cpp}, GCThreadLocalCache.cpp,
DFGSpeculativeJIT.cpp, FTLLowerDFGToB3.cpp, AssemblyHelpers.h): per-type
IsoSubspace TLC slot stamped at GCClient::Heap creation;
tlcSlotForConcurrentlyWithIso<T>() resolves via the stamped index. JSArray
EXCLUDED (returns nullopt): JIT inline allocateObject/emitAllocateJSObject
stores butterfly word UNTAGGED -> fresh inline JSArray reads as foreign at
§4.2 ensureLength -> segments on first growth (measured 182,339
convertToSegmentedButterfly + 19M operationArrayPush -> +3,472ms). Under
§42 JSArray cell allocator was always null GIL-off so path went to
operationNewArrayWithSize (TID-tags in C++); §43 iso arm would be FIRST
time JSArray inline path fires GIL-off. Gated on Task-8 (TID-tag every JIT
inline butterfly install). All other iso ClassTypes either no-butterfly
(JSRopeString/JSString) or null-butterfly inline path.
Thin-thunk (FTLThunks.cpp, FTLLazySlowPath.h): gilOff steady state does
the T8 acquire-load m_stubCodePtr IN JIT code, tail-jump if non-null; no
saveAllRegisters/restoreAllRegisters dump, no C call. Null falls through
to today's full thunk.
uprobe: operationCompileFTLLazySlowPath 36.4M -> 56 (-99.9998%).
§43: intcs W=1 7142 -> 6381 (-761); nomap W=1 -1018; default W=1 -976.
Cumulative §42+§43 = 1459ms = 76.3% of original 1912ms tax (now 453ms).
RSS: intcs W=1 -2.3%, W=16 -10.6%. Corpus 94+95/0, identity 40/0, 34/34
checksums stable. Residual: JSArray iso-TLC ~400-500ms gated on Task-8.
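The thin-thunk fast path described above (acquire-load the stub pointer, jump to it if non-null, otherwise fall through to the full thunk) has roughly this shape. The sketch models it with an ordinary function pointer; `ToyLazySlowPath`, `generatedStub`, `fullThunk`, and `enter` are hypothetical names, and the real change lives in JIT-emitted code rather than C++.

```cpp
#include <atomic>
#include <cassert>

using StubFn = int (*)(int);

struct ToyLazySlowPath {
    std::atomic<StubFn> stubCodePtr{nullptr}; // hypothetical analogue of m_stubCodePtr
    std::atomic<int> fullThunkCalls{0};       // counts trips through the expensive path
};

inline int generatedStub(int x) { return x + 1; }

// Full thunk: the expensive path that produces the stub and publishes it.
inline int fullThunk(ToyLazySlowPath& path, int arg) {
    path.fullThunkCalls.fetch_add(1, std::memory_order_relaxed);
    path.stubCodePtr.store(&generatedStub, std::memory_order_release);
    return generatedStub(arg);
}

// Thin entry: acquire-load the stub pointer and call it directly when present.
inline int enter(ToyLazySlowPath& path, int arg) {
    if (StubFn stub = path.stubCodePtr.load(std::memory_order_acquire))
        return stub(arg);
    return fullThunk(path, arg);
}
```

After the first call publishes the stub, every later call skips the full thunk, which matches the direction of the uprobe count dropping from 36.4M to 56.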
…n race (corrects §40)
Mechanism: String.fromCharCode is a lazy static property on StringConstructor
(initial structure {prototype,length,name}, butterfly Flat TID=0). intcs/
noconcat have no main-thread termOf() before workers, so first access is a
16-thread race from phaseAI -> termOf. If a WORKER (TID!=0) wins: foreign-TID
structure transition on a Flat butterfly -> convertToSegmentedButterfly ->
StringConstructor butterfly Segmented for process lifetime -> DFG
compileGetButterfly emits segmented check as speculationCheck(BadIndexingType)
-> every String.fromCharCode get_by_id in termOf bc#22 / tokenize bc#266 /
genDocTextI bc#254 OSR-exits -> handleGetById doesn't consult
hasExitSite(BadIndexingType) -> recompile re-emits SAME body -> termOf 15x /
genDocTextI 8x / tokenize 9x recompile loop = 4600ms slow-mode.
§40's verdict ('Map<string> is the trigger') was WRONG: nomap is monomodal
because its nmShardOf[] precompute incidentally calls termOf() on main at
module init, reifying at TID=0. Removing Map was incidental.
Discriminating tests: force main reify -> 15/15 fast; force worker reify ->
12/12 slow; reportDFGCompileTimes fast termOf=1 vs slow termOf=15; verboseOSR
exit kinds = BadIndexingType at GetButterfly(String). Explains §34(C): logGC
adds main-thread dataLog latency -> main loses race -> 0/15 fast.
Fix (bench-level): String.fromCharCode(97); at module init. 30-rep phaseA
before max/min 3.14 (18/30 slow) -> after 1.11 (0/30 slow). intcs W=16
median total 3359 [3050,3754]. Corpus 94+95/0, identity 40/0, all checksums
stable.
ENGINE-SIDE BUG remains: any GIL-off program first-touching a lazy static
property (String.fromCharCode/fromCodePoint/raw, Array.of/from/isArray,
Object.assign/keys/...) from a worker segments that constructor's butterfly
and DFG GetButterfly OSR-exits forever. Candidate fixes: (a) handleGetById
checks hasExitSite(BadIndexingType) and falls back to getById IC; (b)
foreign-transition rule special-cases property-only NonArray butterflies.
…adIndexingType backstop
ConcurrentButterfly.cpp §4.2 trySegmentedTransition StayFlatShared gate:
foreign-TID/SW=1 property transition on a Flat butterfly with NO indexing
header AND NO outOfLineCapacity growth reuses the existing flat allocation
under cell lock — release-store value into live slot, nuke + DCAS
{newStructure, (installerTID,SW=1)}. R7 read protocol via same M2/M5
ordering as owner StayFlat reuse. I12 holds via step-0 F2 fire. Gated
!useThreadGIL.
DFGByteCodeParser handleGetById: under useJSThreads, before the simple
CheckStructure+GetButterfly+GetByOffset / MultiGetByOffset lowering,
consult m_exitProfile.hasExitSite(m_currentIndex, BadIndexingType) and
fall back to GetById IC node on hit. Mirrors BadCache idiom. Converges
in one recompile when gate cannot apply (capacity grows). useJSThreads-
gated; flag-off byte-identical.
bench.js: §44 prewarm removed (no longer needed).
JSTests/threads/jit/foreign-reify-getbyid-converges.js: worker-reifies
Array.from/Object.keys/String.raw, asserts numberOfDFGCompiles<=4.
§45: force-worker-reify 12/12 SLOW -> 12/12 FAST [1563,1650]. intcs W=16
30-rep WITHOUT prewarm: max/min 3.14->1.08, 18/30 slow->0/30. String/
Object stay Flat (StayFlatShared); Array segments (cap 8->16) but
backstop converges (compiles: String=1 Object=1 Array=2). Corpus 95+96/0,
identity 40/0, 63/63 checksums stable. Residual: capacity-growing foreign
reification could get StayFlatSharedGrow; duplicate-property-name
structures pre-exist (separate).
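Why the exit-profile backstop converges in one recompile, while the old behavior looped, can be shown with a small model. It is a sketch only: `ToyExitProfile`, `chooseLowering`, and `compilesUntilStable` are hypothetical, the bytecode index 22 is borrowed from the termOf example purely as a label, and real DFG recompilation involves far more state.

```cpp
#include <cassert>
#include <set>

enum class Lowering { InlineGetByOffset, GetByIdIC };

// Hypothetical stand-in for an exit profile keyed by bytecode index.
struct ToyExitProfile {
    std::set<unsigned> badIndexingTypeSites;
    bool hasExitSite(unsigned bytecodeIndex) const {
        return badIndexingTypeSites.count(bytecodeIndex) != 0;
    }
};

// With consultProfile == false this mirrors the old behavior: the inline
// lowering is chosen regardless of recorded exits.
inline Lowering chooseLowering(const ToyExitProfile& profile, unsigned bytecodeIndex, bool consultProfile) {
    if (consultProfile && profile.hasExitSite(bytecodeIndex))
        return Lowering::GetByIdIC;
    return Lowering::InlineGetByOffset;
}

// Returns how many compiles happen before the code stops exiting, capped at maxCompiles.
inline unsigned compilesUntilStable(bool butterflyIsSegmented, bool consultProfile, unsigned maxCompiles) {
    ToyExitProfile profile;
    const unsigned site = 22; // illustrative bytecode index
    unsigned compiles = 0;
    while (compiles < maxCompiles) {
        Lowering lowering = chooseLowering(profile, site, consultProfile);
        ++compiles;
        // The inline lowering exits whenever it meets a segmented butterfly.
        bool exits = lowering == Lowering::InlineGetByOffset && butterflyIsSegmented;
        if (!exits)
            return compiles;
        profile.badIndexingTypeSites.insert(site);
    }
    return compiles;
}
```

In this model a segmented butterfly without the profile check burns through every allowed compile, while with the check the second compile picks the IC and stays put, consistent with the Array=2 compile count reported above.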
…gcAtEnd 10/10, Tier-B + scanner residuals
22 tasks across 47 source files. Per docs/threads/CVE-AUDIT-RESULTS.md 'Ship-readiness closure' section + SCALEBENCH.md §46.
Validation sweeps r2: 01 (validateDFGClobberize SIGTRAP family) 86/8 -> 95/0; 05 (validateExceptionChecks unchecked-scope sites at VMTraps:686/ThreadAtomics:1425/ConditionObject:103/ThreadManager:1124/JSObject:4283/CallData:76) 44/50 -> 95/0; 04+06 -> 95/0. gcAtEnd property-wait-termination 10/10 (MachineStackMarker SEGV gone). 03 (validateButterflyTagDiscipline) lint WIRED: 80/15 surfaces the §43 Task-8 residual (43 I14 reports: GetByOffset/PutByOffset storage edge from non-tag-masking producer) as a regression detector.
Task-8 DEFERRED: JSArray exclusion makes the latent concern unreachable; perf upside ~400-500ms recorded.
§45 duplicate-property: misdiagnosis (private @hasOwn rendered without @ in describe()); under-lock getDirectOffset re-probe at reifyStaticProperty added defensively (useJSThreads-gated).
TSan build: TsanDeferredCtorMember<StructureTransitionTable> forwarder for tryGetSingleSlotConcurrently.
--cve 60/3/0/2: 3 fails are exit-3 test-side issues (missing useDollarVM in requireOptions; B1/B2/B10 fixes now spec-correctly throw where tests expected old behavior). No memory-safety failures; all Tier-A repros pass.
Corpus 95+96/0, identity 40/0, bench all checksums stable ±2% of §45.
…gate RED on transition-heavy-constructor +5.71%
Full-JIT GIL-off TSAN sweep (CLoop OFF per standing ruling). 27 fix waves across 23 source files: relaxed-atomic conversions on the racy-probe-vs-allocator-handout pattern (BitSet/Atomics/ConcurrentPtrHashSet/IsoSubspace/MarkedBlock/Heap/CodeBlock/DFGJITCode/DFGDesiredWeakReferences/RegExpCachedResult/RegExpGlobalData/CachedCall/InterpreterInlines/ThreadAtomics/JSGlobalObject/JSGenericTypedArrayViewInlines).
40 active race: suppressions, all with justification (6 pre-existing upstream parallel-GC; 23 wave-7 atomic-probe-vs-allocator reader-side; 1 recordParse rule-1; 9 JIT-one-sider allocator-side §0 accepted-tradeoff; 1 SPEC-congc T5-rootscan). 0 CLoop entries.
229/247 exit-0 (best run health of campaign). All non-zero exits carry 0 data-race reports (functional, owned by separate queues). Debug corpus 96/0/3.
Release bench-gate: 7/8 within 1%; transition-heavy-constructor +5.71% (re-measured +5.38% at loadavg 1.76, load-stable). Attribution UNPROVEN; the closeout was +3.9%, r27 is +1.7pp worse. The unconditional WTF header conversions are the candidate; combined-revert experiment owed. megamorphic-access -13.3% (record, do not claim).
RED gate recorded honestly per the 'honest partials over fake green' directive. TSAN-TRIAGE.md §24 has the full r27 record.
…se_after_return=0, smokes
10-min --resume smoke: 1,175,199 edges, REPRL/Thread-API OK, coverage 6.70->10.15% during import; corpus now 7314 files (import doesn't finish in smoke window after rebuild edge-renumber).
4-min fresh-storage smoke: 262 programs generated, 4 unique SIGABRT crashes (likely pre-existing class-static/gc family from 06-07; appears with useJSThreads=1 default-on in every profile execution, may not be threads-specific). NOT triaged.
run-fuzzilli.sh: export detect_stack_use_after_return=0 (lane-pin required it but script lacked it). triage-r1-batch.sh added.
… on current tree
All 292 prior-campaign crash files (fuzzilli-storage{,-B,-C}/crashes/*.js
from 06-07/06-10) re-run 3x against the post-§46+TSAN Fuzz binary: zero
reproduce. §46 correctness closure + TSAN r27 closed everything the fuzzer
had previously found.
triage-r1-batch.sh: validate TARGET ARGS tokens against ^--[A-Za-z0-9]
[A-Za-z0-9=._-]*$ and pass as array (header is Fuzzilli-authored, not
fuzzed-JS-controlled, but defense-in-depth).
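The token allowlist above is a plain regular expression, so its effect is easy to check in isolation. The sketch below applies the same pattern with `std::regex`; the helper names `isAllowedFlag` and `allTargetArgsAllowed` are hypothetical, and rejecting the whole argument list on the first bad token is an assumption about policy, not a description of what triage-r1-batch.sh does.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Same pattern as the script's allowlist: "--", one alphanumeric character,
// then any run of alphanumerics, '=', '.', '_' or '-'.
inline bool isAllowedFlag(const std::string& token) {
    static const std::regex pattern("^--[A-Za-z0-9][A-Za-z0-9=._-]*$");
    return std::regex_match(token, pattern);
}

// Assumed policy for illustration: accept the list only if every token passes.
inline bool allTargetArgsAllowed(const std::vector<std::string>& tokens) {
    for (const std::string& token : tokens) {
        if (!isAllowedFlag(token))
            return false;
    }
    return true;
}
```

Note that a bare `--` does not match, because the pattern requires at least one alphanumeric character after the two dashes.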
…rrayStorage source)
4h campaign on post-§46+TSAN tree. 125/128 = ASSERT !hasAnyArrayStorage(source->indexingType()) at ConcurrentButterfly.cpp:1064 trySegmentedTransition <- tryPutDirectTransitionConcurrent <- putDirectInternal. Single-threaded --useJSThreads=true; Debug repros deterministically. 1 = storeTaggedButterflyWordConcurrent ABRT (related). Prior-campaign re-triage on same tree: 292/292 NOREPRO.
triage-r1-batch.sh: remove the '--' separator I added (jsc treats it as script-args delimiter -> drops to REPL). Allowlist kept.
…KED V5b); r47 setButterfly audit escapes found
tryPutDirectTransitionConcurrent: tryArrayStoragePropertyTransition reroute + I35 CoW materialize-first (materializeCopyOnWriteButterflyConcurrent + RESTART before locked protocols, mirrors classifyConcurrentLockedAdd's §4.8-precedes-§4.x). Closes r3-001 (ConcurrentButterfly.cpp:1064 !hasAnyArrayStorage) AND the 12 CoW variants (cpp:1068 !isCopyOnWrite from defineProperty(CoW-literal, name, accessor) when E4 ineligible). r3b re-triage 134/136 NOREPRO. Regression tests array-storage-/cow-named-property-transition.js. r3-001/002 20/20 Debug.
bench-gate transition-heavy-constructor +6.08%: closeout commit 2f5a5c4 reproduces +6.90%/+7.37% on this host with full Source/ reverted (15+21-run medians), C' samples 51.9-61.3ms (18% spread). Per-header audit found none on the bench's transition path. Host-inadmissible variance; transferred to PARKED V5b per AB17g item 4.
§45 discriminant holds (force-worker-reify 5/5 fast). Corpus 97+98/0. Identity 40/0. Checksums stable.
NEW r47 (2h re-fuzz, 423K execs): 8/9 = ONE root family, setButterfly foreign-TID owner-assert escapes at (1) JSArrayBufferView::slowDownAndWasteMemory (6/8; also poison-deref SEGV: wastage butterfly's IndexingHeader::arrayBuffer uninitialized between setButterfly publish and cell-locked setArrayBuffer, isArrayBufferViewOutOfBounds reads it unfenced); (2) shiftButterflyAfterFlattening; (3) flattenDictionaryStructureImpl. Trap working as designed (deterministic abort, not silent steal). DEFERRED to r47 fix round.
…helper); 2h re-fuzz 0 r47-family
slowDownAndWasteMemory (JSArrayBufferView.cpp): cell-locked re-check -> build wastage butterfly LOCAL + fill IndexingHeader::arrayBuffer BEFORE publication -> storeStoreFence -> tag-PRESERVING seq_cst CAS (§4.6 AS-COPY shape, NonArray) -> storeStoreFence before m_mode flip. Closes r47-001 owner-TID + r47-002 poison arrayBuffer mid-publish.
shiftButterflyAfterFlattening (JSObject.cpp) + flattenDictionaryStructureImpl null-case (Structure.cpp): world-stopped + cell-locked tag-preserving seq_cst store/zero (§6/§4.6 T3/I17).
SURFACED reads: existingBufferInButterfly (JSArrayBufferView.h) + JIT emitLoadTypedArrayArrayBuffer (AssemblyHelpers.cpp): a Wasteful TA CAN carry a SEGMENTED word (foreign-TID named-prop add growing OOL capacity -> trySegmentedTransition; §44 StayFlatShared gate requires !hasIndexingHeader which Wasteful HAS). Segment-aware dispatch: spine->indexedFragment(0)->slots[0] (§4.1 I8 alias). The 'wasteful-mode butterflies are never segmented' comment was FALSE.
All useJSThreads-gated, flag-off byte-identical. 3 regression tests.
§48: r47-001/002 20/20; r47+r3b retriage 10/11 NOREPRO 0 r47-family; corpus 100+102/0 (+5); identity 40/0; checksums stable. 2h re-fuzz r48 (310K execs): 2 flaky/NOREPRO, 0 r47-family. Pre-existing isPinnedPropertyTable flake noted (06-07 class-static/gc, not r47).
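The ordering used for slowDownAndWasteMemory (fill the header on a private object first, then publish with a CAS that keeps the existing tag bits) can be sketched in isolation. Everything here is a toy under loud assumptions: `ToyButterfly`, `ToyCell`, and the choice of the low three pointer bits as the tag are hypothetical, and the cell lock, fences, and m_mode flip from the real change are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption for illustration only: tag bits live in the low 3 bits of the word.
constexpr std::uintptr_t kTagMask = 0x7;

struct alignas(8) ToyButterfly {
    void* arrayBuffer = nullptr; // stands in for IndexingHeader::arrayBuffer
};

struct ToyCell {
    std::atomic<std::uintptr_t> butterflyWord{0};
};

// Fill the header while the object is still private, then publish it with a
// seq_cst CAS that carries the old tag bits over to the new word.
inline bool publishPreservingTag(ToyCell& cell, ToyButterfly* fresh, void* buffer) {
    fresh->arrayBuffer = buffer; // must happen BEFORE publication
    std::uintptr_t expected = cell.butterflyWord.load(std::memory_order_seq_cst);
    std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(fresh) | (expected & kTagMask);
    return cell.butterflyWord.compare_exchange_strong(expected, desired, std::memory_order_seq_cst);
}

inline std::uintptr_t tagOf(const ToyCell& cell) {
    return cell.butterflyWord.load(std::memory_order_seq_cst) & kTagMask;
}

inline ToyButterfly* payloadOf(const ToyCell& cell) {
    return reinterpret_cast<ToyButterfly*>(cell.butterflyWord.load(std::memory_order_seq_cst) & ~kTagMask);
}
```

A reader that sees the new word is then guaranteed to see the filled header as well, which is the property the r47-002 poison-deref was missing.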