lzkk
ca6c9fa540
docs: record the MAX_SPILL_ROUNDS fix (mm1 instruction count reduced by 99.4%)
5 days ago
lzkk
d238777f17
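fix(regalloc): eliminate exponential spill-code blowup by unifying MAX_SPILL_ROUNDS to 3
...
Root cause: MAX_SPILL_ROUNDS was 10 for functions with ≤120 vregs, so the number
of spills roughly doubled every round (14→25→48→94→186→370→738→1474→2946→5890).
The 67-vreg mm1 accumulated 11,785 frame slots, a 138KB frame, and 85K instructions.
Fix:
- Unify MAX_SPILL_ROUNDS to 3 to prevent cascading blowup
- Add AssignSpillSlots: spilled vregs with non-overlapping live intervals share a frame slot
- RewriteWithAllocation takes an optional liveness parameter to support slot sharing
Result (mm1): 529 lines (-99.4%), frame 1232 bytes (-99.1%)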
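The slot sharing described above can be sketched as a greedy interval assignment. This is only an illustration under assumptions: `SpillInterval`, `AssignSlotsSketch`, and the inclusive `[start, end]` interval representation are hypothetical and are not taken from the actual AssignSpillSlots code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical live interval of one spilled vreg, inclusive on both ends.
struct SpillInterval {
    int vreg;
    int start;
    int end;
};

// Returns the frame slot index chosen for each interval (in input order).
// A slot is reused as soon as its previous occupant's interval has ended.
std::vector<int> AssignSlotsSketch(const std::vector<SpillInterval>& intervals) {
    std::vector<std::size_t> order(intervals.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return intervals[a].start < intervals[b].start;
    });

    std::vector<int> slot_of(intervals.size(), -1);
    std::vector<int> slot_end;  // slot_end[s]: last point at which slot s is occupied
    for (std::size_t idx : order) {
        const SpillInterval& iv = intervals[idx];
        assert(iv.start <= iv.end);
        int chosen = -1;
        for (std::size_t s = 0; s < slot_end.size(); ++s) {
            if (slot_end[s] < iv.start) {  // previous occupant is dead: share the slot
                chosen = static_cast<int>(s);
                break;
            }
        }
        if (chosen < 0) {  // every existing slot overlaps: allocate a new one
            chosen = static_cast<int>(slot_end.size());
            slot_end.push_back(iv.end);
        } else {
            slot_end[chosen] = iv.end;
        }
        slot_of[idx] = chosen;
    }
    return slot_of;
}
```

With intervals [0,5], [6,9], and [3,7], the first two share slot 0 and the third gets slot 1, so three spilled vregs need only two frame slots.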
5 days ago
lzkk
535ab08d32
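feat(backend): cache the frame base address in AsmPrinter to avoid recomputing it for consecutive stack accesses
...
Add a g_frame_base_offset/g_frame_base_valid caching mechanism:
- PrintStackAccess tries to reuse the frame address already computed in x13
- No recomputation when the offset delta to the adjacent access fits ldur/stur (±256) or ldr/str (0~32760)
- The cache is invalidated automatically when x13 is overwritten (ADRP/EmitAddressFromBase/EmitStackAdjust)
- Provides the infrastructure for a later MIR-level spill ordering optimization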
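The reuse condition in the list above might look roughly like the check below. It is a sketch under assumptions: `FrameBaseCache` and `CanReuseFrameBase` are hypothetical names, accesses are assumed to be 64-bit, and the bounds are the architectural A64 encodings (signed 9-bit offset for ldur/stur, unsigned 12-bit offset scaled by 8 for 64-bit ldr/str) rather than whatever constants the real PrintStackAccess uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of g_frame_base_offset / g_frame_base_valid.
struct FrameBaseCache {
    std::int64_t base_offset;  // frame offset whose address is currently held in x13
    bool valid;                // false once x13 has been overwritten
};

// Returns true when a 64-bit access at `offset` can be addressed relative to
// the cached base without recomputing the address.
bool CanReuseFrameBase(const FrameBaseCache& cache, std::int64_t offset) {
    if (!cache.valid) return false;
    const std::int64_t delta = offset - cache.base_offset;
    // ldur/stur: signed 9-bit unscaled offset.
    const bool fits_unscaled = delta >= -256 && delta <= 255;
    // ldr/str (64-bit): unsigned 12-bit offset scaled by 8, i.e. 0..32760.
    const bool fits_scaled = delta >= 0 && delta <= 32760 && delta % 8 == 0;
    return fits_unscaled || fits_scaled;
}
```

Any code path that clobbers x13 would set `valid` to false, which forces the next access to recompute the address.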
5 days ago
lzkk
3ab88232f7
fix(hooks): make the Stop hook change-aware, reminding only when src/ has uncommitted changes
5 days ago
lzkk
6f14ee1a7a
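fix(infra): compiler resource-limit wrapper + timeout guards in test scripts to prevent OOM crashes
...
Layered protection against compiler memory blowup (mm1.sy: 9.9GB) triggering the OOM killer and crashing the terminal:
- compiler-wrapper.sh: generic wrapper, ulimit -v 12GB + timeout 300s
- setup-compiler-wrapper.sh: restores the wrapper after a cmake build
- 2026test.sh, verify_asm.sh: auto-detect the wrapper + add timeout to compiler invocations
Files under build/ are not version-controlled and do not affect the contest submission.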
5 days ago
lzkk
5300e2c1ec
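fix(hooks): fix session crash + tune the development-guideline configuration
...
- block-destructive.sh: remove set -e, add git checkout/clean protection, degrade safely on empty stdin
- spec-reminder.sh: trim from ~300 to 150 characters to reduce token usage
- memory-guard.sh: fix the pgrep process-matching pattern
- settings.json: narrow the PreToolUse matcher (only 6 classes of dangerous commands), disable chrome MCP
- RegAlloc.cpp: MAX_SPILL_ROUNDS 3→5; conservative fix that treats large blocks (>20 defs) as fully interfering
- CLAUDE.md: sync the spill round count, add the shift chain failure mode, update tool orchestration notes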
5 days ago
lzkk
da5d618297
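fix(hooks): memory-guard outputs valid JSON, fixing the session crash
...
The SessionStart hook requires JSON on stdout, but the old memory-guard.sh wrote only to
stderr and left stdout empty, so the Claude Code hook runner crashed on a JSON parse error.
It now outputs {"continue": true} and injects the warning into additionalContext.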
5 days ago
lzkk
2d3a5ff998
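perf(backend): add global-variable store-load forwarding and load CSE to Peephole
...
When a StoreGlobal is immediately followed by a LoadGlobal of the same symbol, the load is
deleted if it uses the same register and otherwise becomes a MovReg. Consecutive LoadGlobal
instructions are handled the same way.
shuffle -6, conv2d -3, crypto -3, h-9 -3. Total: -15 instructions, zero regressions.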
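A minimal sketch of this peephole over a toy instruction list follows. The `Op`, `Inst`, and `ForwardGlobalAccesses` names and the field layout are assumptions made for illustration. The real MIR types are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Op { StoreGlobal, LoadGlobal, MovReg, Other };

// Hypothetical instruction shape.
struct Inst {
    Op op;
    int reg;          // StoreGlobal: source register; LoadGlobal/MovReg: destination
    int src;          // MovReg: source register; unused otherwise
    std::string sym;  // symbol for StoreGlobal/LoadGlobal
};

// Forwards the register of an immediately preceding StoreGlobal/LoadGlobal
// to a following LoadGlobal of the same symbol.
std::vector<Inst> ForwardGlobalAccesses(const std::vector<Inst>& in) {
    std::vector<Inst> out;
    bool known = false;  // true while known_reg still holds the value of known_sym
    int known_reg = -1;
    std::string known_sym;
    for (const Inst& cur : in) {
        if (cur.op == Op::LoadGlobal && known && cur.sym == known_sym) {
            if (cur.reg != known_reg) {
                // Different register: replace the load with a register move.
                out.push_back(Inst{Op::MovReg, cur.reg, known_reg, ""});
            }
            continue;  // same register: the load is simply dropped
        }
        out.push_back(cur);
        // Only adjacent accesses are forwarded, so any other instruction resets the state.
        known = cur.op == Op::StoreGlobal || cur.op == Op::LoadGlobal;
        if (known) {
            known_reg = cur.reg;
            known_sym = cur.sym;
        }
    }
    return out;
}
```

Resetting the state on every unrelated instruction keeps the sketch conservative: it never has to reason about whether that instruction clobbers the register or the global.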
5 days ago
lzkk
b2b7210f11
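perf(backend): use sdiv for all division/modulo, remove the power-of-two shift sequences
...
On AArch64, sdiv+msub is 2-4 instructions shorter than the shift sequence (add+cmp+csel+asr).
Remove about 150 lines of power-of-two shift code from DivRR/ModRR and route everything through sdiv.
Add the x%1==0 / x%-1==0 optimization.
crypto -249, huffman -186, crc -84, fft -72, h-9 -42,
many_mat_cal -24, 03_sort -24, h-1 -21, conv2d -21,
transpose -12, sl -3. Net total: -735 instructions,
including matmul +3, which is within tolerance.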
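For reference, the sdiv+msub pair computes the remainder as `x - (x / d) * d`. The sketch below models that in C++ and prints the corresponding two-instruction sequence. `ModViaSdivMsub` and `EmitModSketch` are hypothetical helpers, and the register names passed in are the caller's choice.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Models the result of the sdiv+msub sequence for a nonzero divisor.
int ModViaSdivMsub(int x, int d) {
    assert(d != 0);
    if (d == 1 || d == -1) return 0;  // the x%1 / x%-1 case folds to 0
    const int q = x / d;              // sdiv: quotient truncated toward zero
    return x - q * d;                 // msub: x - q * d
}

// Emits the two-instruction remainder sequence as assembly text.
std::vector<std::string> EmitModSketch(const std::string& dst, const std::string& lhs,
                                       const std::string& rhs, const std::string& tmp) {
    return {
        "sdiv " + tmp + ", " + lhs + ", " + rhs,
        "msub " + dst + ", " + tmp + ", " + rhs + ", " + lhs,  // dst = lhs - tmp * rhs
    };
}
```

Because C++ integer division also truncates toward zero, `ModViaSdivMsub(x, d)` agrees with the built-in `x % d` for the inputs it accepts.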
5 days ago
lzkk
befdca6451
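perf(backend): skip frame setup in leaf functions, saving the x29/x30 save/restore
...
Add a HasCall flag to MachineFunction; Lowering sets it when emitting a Call.
Leaf functions with no frame and no callee-saved registers skip the prologue/epilogue entirely;
leaf functions with a frame use str/ldr x29 instead of stp/ldp x29,x30.
huffman -93, crypto -54, conv2d -45, crc -27, h-9 -27,
03_sort -18, opt_scheduling -18, h-4 -12, fft -9, shuffle -9.
Total: -312 instructions, zero regressions.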
5 days ago
lzkk
854168fb4e
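perf(backend): eliminate redundant ADRP for consecutive global-variable accesses
...
Add an ADRP cache to AsmPrinter that skips reloading the page address on consecutive accesses to the same symbol.
The cache is invalidated when x13 is used by a non-global access path and is reset at basic-block entry.
shuffle -48, crypto -27, conv2d -21, fft -12, huffman -9, h-9 -9,
03_sort -6, h-8 -3. Total: -135 instructions, zero regressions.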
5 days ago
lzkk
acdac5391d
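fix(backend): EmitLargeImmediate skips leading zeros, avoiding a redundant movz #0
...
When the low 16 bits of a 32-bit immediate are zero (e.g. 0x00020000), emit a single shifted
movz instead of the two-instruction movz #0 + movk sequence. crypto -7, fft -2, h-4 -1,
h-10 -1; total -33 instructions, zero regressions.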
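The selection logic can be illustrated as follows. `EmitImm32Sketch` is a hypothetical stand-in for EmitLargeImmediate that returns assembly text. It covers only 32-bit values and makes no claim about how the real function is structured.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Materializes a 32-bit immediate with as few movz/movk instructions as possible.
std::vector<std::string> EmitImm32Sketch(const std::string& reg, std::uint32_t value) {
    const std::uint32_t lo = value & 0xFFFFu;
    const std::uint32_t hi = value >> 16;
    std::vector<std::string> out;
    if (hi == 0) {
        // Fits in the low halfword (this also covers value == 0).
        out.push_back("movz " + reg + ", #" + std::to_string(lo));
    } else if (lo == 0) {
        // Low halfword is zero: one shifted movz instead of movz #0 + movk.
        out.push_back("movz " + reg + ", #" + std::to_string(hi) + ", lsl #16");
    } else {
        out.push_back("movz " + reg + ", #" + std::to_string(lo));
        out.push_back("movk " + reg + ", #" + std::to_string(hi) + ", lsl #16");
    }
    return out;
}
```

For 0x00020000 the sketch yields the single instruction `movz w0, #2, lsl #16`.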
5 days ago
lzkk
bb58aac749
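fix(mem2reg): add a safety gate for functions with many parameters, fixing 87_many_params
...
When processing recursive functions with many allocas, Mem2Reg produced incorrect SSA form,
which made lowering generate wrong code (incorrect parameter-forwarding offsets).
Fix: skip Mem2Reg when the number of promotable allocas exceeds 24, keeping stack allocation.
The gate does not affect SSA optimization of ordinary small functions.
Test results:
- functional: 87/88 → 100/100 (87_many_params fixed)
- h_functional: 30/31 (30_many_dimensions still fails, a known GEP lowering bug)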
5 days ago
lzkk
fccd935a24
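feat(backend): add AddImm/SubImm opcodes, eliminating redundant MovImm
...
AArch64 add/sub accept a 12-bit immediate, but MIR only had AddRR/SubRR,
so a constant RHS required a MovImm followed by the RR operation. This change:
- MIR.h: add the AddImm and SubImm opcodes
- Lowering.cpp: Add/Sub lowering uses AddImm/SubImm directly when the RHS is a constant in 0-4095
- RegAlloc.cpp: AddImm/SubImm reuse the def-use analysis of AddRR/SubRR
- AsmPrinter.cpp: the generic printer handles Imm operands automatically (#value)
Results (against the CmpImm baseline):
- sl1-3: 261→247 (-14, -5.4%)
- huffman-01-03: 792→790 (-2)
- h-5-01-03: 341→338 (-3)
- 55 fewer lines in total across all 60 performance cases
- 0 new failures in functional tests
Updates: new entry in 优化记录.md; baseline updated automatically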
5 days ago
lzkk
bd7dcedb2a
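feat(backend): fold constants into CmpImm during ICmp lowering, eliminating redundant MovImm
...
In both ICmp paths in Lowering, when a comparison operand is a constant in the range 0-4095,
CmpImm is used directly instead of MovImm+CmpRR. When the LHS is the constant, the operands
are swapped automatically and the condition code is reversed to match (SwapCondCode).
Performance tests (20 representative cases):
- 13 improved (-1 to -25 instructions)
- 6 unchanged
- 1 slight regression (h-5, +1 instruction, +0.3%, within tolerance)
- 91 fewer instructions in total (-1.1%)
Also updated: full development guidelines in CLAUDE.md, instruction-count baseline initialization,
and the .claude/hooks enforcement system.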
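The operand swap only stays correct if the condition is mirrored (LT becomes GT and so on) rather than negated. The sketch below shows one way that could look. The `Cond` enum, `SwapCondCodeSketch`, `Eval`, and `FitsCmpImm` are illustrative names and do not reproduce the project's actual SwapCondCode.

```cpp
#include <cassert>

enum class Cond { EQ, NE, LT, LE, GT, GE };

// Mirrors a condition so that `a <cond> b` is equivalent to `b <mirrored> a`.
Cond SwapCondCodeSketch(Cond c) {
    switch (c) {
        case Cond::LT: return Cond::GT;
        case Cond::LE: return Cond::GE;
        case Cond::GT: return Cond::LT;
        case Cond::GE: return Cond::LE;
        default: return c;  // EQ and NE are symmetric
    }
}

// Evaluates `a <cond> b`; used here only to check the mirroring.
bool Eval(Cond c, long a, long b) {
    switch (c) {
        case Cond::EQ: return a == b;
        case Cond::NE: return a != b;
        case Cond::LT: return a < b;
        case Cond::LE: return a <= b;
        case Cond::GT: return a > b;
        case Cond::GE: return a >= b;
    }
    return false;
}

// True when `imm` lies in the 0-4095 range quoted in the commit message.
bool FitsCmpImm(long imm) { return imm >= 0 && imm <= 4095; }
```

For example, `5 < x` becomes a comparison of `x` against the immediate 5 with condition GT.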
6 days ago
黄熙哲
6b9cf3a448
fix(backend): add x16/x17 to GP allocatable set to fix segfaults
...
Adding x16 and x17 (IP0/IP1, caller-saved) increases GP registers
from 16 to 18, reducing register pressure for large functions.
Fixes segfaults: 39_fp_params (64 params), 30_many_dimensions (2MB frame).
Also improves performance: crc -8, fft0 -4, huffman -12, sl -1 etc.
6 days ago
黄熙哲
5902060dae
fix(backend): lower coalesce skip threshold to fix segfaults
...
Change coalesce skip condition from vregs >150 to:
move_prefs > 100 || vregs * move_prefs > 600
The original threshold of 150 was too coarse — it missed functions
like conv2d (71 vregs, 15 moves) whose coalescing still produces
incorrect spill code. The new product condition catches functions
whose move graph complexity indicates risky coalescing.
Fixes segfaults: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.
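The new skip condition written out as a predicate (the function name `ShouldSkipCoalescing` is an assumption; the thresholds are the ones quoted above):

```cpp
#include <cassert>
#include <cstddef>

// Returns true when active coalescing should be skipped for a function.
bool ShouldSkipCoalescing(std::size_t vregs, std::size_t move_prefs) {
    return move_prefs > 100 || vregs * move_prefs > 600;
}
```

For conv2d (71 vregs, 15 moves) the product is 1065, so coalescing is skipped even though the function is far below the old 150-vreg cutoff.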
6 days ago
黄熙哲
34cb79449f
fix(backend): skip coalescing for large functions to prevent segfault

For functions with >150 vregs, discard move_preferences after
collection to skip active coalescing. Large functions like
conv2d, 65_color, 68_brainfk have complex interference graphs
that cause coalescing to generate incorrect spill code.

Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.

Known limitations: 30_many_dimensions and 39_fp_params still
segfault (pre-existing original compiler bugs in lowering/RA).
Minor instruction count changes: h-8 +2.5%, matmul +7% etc.
7 days ago
黄熙哲
a84ffd210b
chore: simplify baseline to single-column historical minimum

Remove source baseline concept. Each test now tracks only its
best-ever instruction count. count_asm.sh updated to directly
update baseline when a new lower value is found.
7 days ago
黄熙哲
b7e78ebd56
fix(backend): AsmPrinter large frame + RegAlloc spill limit

Apply only proven-safe fixes on clean baseline:
- AsmPrinter: movz/movk for large stack offsets (>12KB)
  30_many_dimensions: 7M -> 1455 lines (99.9% reduction)
- RegAlloc: limit spill rounds to 3 for large functions (>120 vregs)
  39_fp_params: >120s -> <1s compilation

Zero instruction count regression confirmed.
57/60 performance tests at historical best baseline.
7 days ago
黄熙哲
2e368f86cf
chore: update instruction count baseline after Mem2Reg threshold tuning

Key improvements from PHI threshold relaxation:
- many_mat_cal: 523->432 (-91 lines, 17.4%)
- h-8: 504->407 (-97 lines, 19.2%)
- matmul: 450->366 (-84 lines, 18.7%)

Crypto and other complex functions unaffected (correctly skipped).
1 week ago
黄熙哲
cc9f4f9a76
feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions

Change phi_threshold from max(50, block_count) to max(100, block_count*2).
The old threshold was too conservative for functions with many allocas
like many_mat_cal (~15 allocas, 60 blocks), causing premature skip.
The new threshold allows these while still blocking crypto-like functions
where excessive PHI nodes hurt code quality.

many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines
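
The two threshold formulas side by side, as a small sketch. The function names are hypothetical, and the quantity compared against the threshold is not spelled out in the message, so it is left out here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Threshold before this change: max(50, block_count).
std::size_t OldPhiThreshold(std::size_t block_count) {
    return std::max<std::size_t>(50, block_count);
}

// Threshold after this change: max(100, block_count * 2).
std::size_t NewPhiThreshold(std::size_t block_count) {
    return std::max<std::size_t>(100, block_count * 2);
}
```

For a 60-block function such as many_mat_cal, the threshold doubles from 60 to 120.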
1 week ago
黄熙哲
d5d8924050
chore: update instruction count baseline after loop optimizations merge

Additional reductions from loop IR passes:
- conv2d: 657->629 (-28), fft: 619->605 (-14)
- huffman: 849->829 (-20), sl: 280->264 (-16)
- knapsack: 175->167 (-8), transpose: 211->207 (-4)
- 01_mm: 313->310 (-3), h-10: 335->329 (-6)

Restore CLAUDE.md deleted during merge.
1 week ago
黄熙哲
06bada3ff5
Merge remote master into local master
1 week ago
黄熙哲
39b7e2ed19
feat(backend): loop-depth weighted spill cost model

Adds DFS-based back-edge detection to compute basic block loop
nesting depth. Each vreg inherits the max loop depth of its
defining blocks. Spill cost multiplies interval+ref by 10^depth,
making loop-carried variables much more expensive to spill.
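
The weighting itself can be sketched as below. `SpillCostSketch` is a hypothetical name, and the clamp on the depth is an assumption added here only to keep the multiplication from overflowing. The commit does not mention one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// cost = (interval_length + ref_count) * 10^loop_depth
std::uint64_t SpillCostSketch(std::uint64_t interval_length, std::uint64_t ref_count,
                              unsigned loop_depth) {
    const unsigned depth = std::min(loop_depth, 9u);  // assumed clamp, not from the commit
    std::uint64_t weight = 1;
    for (unsigned i = 0; i < depth; ++i) weight *= 10;
    return (interval_length + ref_count) * weight;
}
```

A vreg with interval length 20 and 5 references costs 25 outside any loop and 2500 at loop depth 2, so the allocator would prefer to spill the former.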
1 week ago
黄熙哲
993e81363a
fix(backend): recompute degree unconditionally after MergeInto

After a merge, u inherits v's neighbors, so degree[u] must always
be recomputed. Previously, when degree[u] < K before merge, the
stale low degree was kept, which could push a high-degree merged
node into simplify_worklist with wrong metadata.

Also remove redundant if(!remaining.empty()) guard in spill path
and clean up extra brace from removed GiveUpPhase.
1 week ago
黄熙哲
bef03ec220
chore: update instruction count baseline after Module D rewrite

54/60 performance tests reduced. Key improvements:
- conv2d: -95 lines (12.6%)
- huffman: -44 lines (4.9%)
- fft: -39 lines (5.9%)
- crc: -38 lines (11.6%)
- 03_sort: -28 lines (4.2%)
- 01_mm: -22 lines (6.6%)

Also fix count_asm.sh sed to match any current value.
1 week ago
黄熙哲
570253f1f2
feat(backend): relax Briggs threshold to 2*K and fix move_adj self-loop

Using >= 2*K instead of >= K for high-degree neighbor count allows
more node pairs to be safely merged. Fixed a bug in MergeInto where
move_adj[u] could contain u (self-loop) when v's move set included u,
causing iterator invalidation during move_adj cleanup.
1 week ago
黄熙哲
3691da34ee
feat(backend): rewrite main loop with held_nodes release and ReactivatePairs
1 week ago
黄熙哲
0881889ec1
feat(backend): add ReactivatePairs and stale_pairs for coalescing
1 week ago
黄熙哲
07048a123b
feat(backend): separate move-related low-degree nodes into held_nodes
1 week ago
黄熙哲
99fe17fc3f
feat(backend): propagate coalesced node colors in AssignColors

After active coalescing, merged_set nodes inherit their representative's
color, ensuring move-related vregs share the same physical register.
1 week ago
黄熙哲
081580ac0a
feat(backend): integrate active coalescing into ColorGraph main loop

Replaces inner simplify while-loop with if-else chain:
Simplify -> MergePhase -> GiveUpPhase -> Spill.
Lambdas moved outside while loop for clarity.
1 week ago
黄熙哲
0e4f9f1910
feat(backend): add MergePhase and GiveUpPhase for active coalescing

MergePhase uses the Briggs conservative test to safely merge move-related
node pairs before coloring. GiveUpPhase abandons moves for low-degree
nodes when merging is no longer beneficial.
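
For readers unfamiliar with it, the textbook Briggs test allows a merge when the combined node would have fewer than K neighbors of significant degree (degree >= K). The sketch below implements that textbook form on a plain adjacency-set graph. `Graph` and `BriggsCanMerge` are illustrative names, and the project's MergePhase may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Graph = std::vector<std::set<int>>;  // adjacency sets indexed by node id

// Textbook Briggs conservative test for merging nodes u and v with k colors.
bool BriggsCanMerge(const Graph& g, int u, int v, std::size_t k) {
    std::set<int> merged;  // neighbors the combined node would have
    for (int n : g[u]) if (n != v) merged.insert(n);
    for (int n : g[v]) if (n != u) merged.insert(n);

    std::size_t significant = 0;
    for (int n : merged) {
        std::size_t degree = g[n].size();
        // A neighbor adjacent to both u and v loses one edge after the merge.
        if (g[n].count(u) != 0 && g[n].count(v) != 0) --degree;
        if (degree >= k) ++significant;
    }
    return significant < k;
}
```

The test is conservative in the sense that a merge it accepts cannot turn a K-colorable graph into an uncolorable one, at the price of rejecting some merges that would have been safe.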
1 week ago
黄熙哲
ca6c2a18c9
feat(backend): add coalesce data structures and helpers to ColorGraph

Introduces MovePair, move_adj, FindRep, GetRep, HasMovePair as
infrastructure for the upcoming Coalesce and Freeze phases.
Modifies simplify loop to skip already-merged nodes via GetRep.
1 week ago
黄熙哲
560f565a51
chore: update instruction count baseline after Module B stp/ldp

Also modify count_asm.sh to auto-update baseline when instruction
counts decrease below the recorded values.
1 week ago
黄熙哲
af71513361
feat(backend): use stp/ldp for callee-saved registers in prologue/epilogue

Groups callee-saved X and S registers and emits paired stp/ldp
instructions, reducing save/restore overhead by ~50%. Odd remainders
still use str/ldr. Adds fallback else branch for future register types.
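
A sketch of the pairing for one register class is shown below. `EmitCalleeSaves` is a hypothetical helper that assumes 8-byte X registers saved at increasing sp-relative offsets, and it does not check the stp immediate range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Emits saves for callee-saved X registers: neighbors are paired into stp,
// and an odd leftover register falls back to a single str.
std::vector<std::string> EmitCalleeSaves(const std::vector<std::string>& regs,
                                         int base_offset) {
    std::vector<std::string> out;
    std::size_t i = 0;
    int offset = base_offset;
    for (; i + 1 < regs.size(); i += 2, offset += 16) {
        out.push_back("stp " + regs[i] + ", " + regs[i + 1] +
                      ", [sp, #" + std::to_string(offset) + "]");
    }
    if (i < regs.size()) {
        out.push_back("str " + regs[i] + ", [sp, #" + std::to_string(offset) + "]");
    }
    return out;
}
```

Saving x19, x20, and x21 from offset 16 produces one stp and one str instead of three str instructions. The epilogue would mirror this with ldp/ldr.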
1 week ago
安峻邑
cb33c344ac
Start loop optimization
1 week ago
安峻邑
b93e81ce74
Loop optimization
1 week ago
安峻邑
4bc21faf61
Loop optimization
1 week ago
安峻邑
d07bf9f0d2
Loop optimization
1 week ago
安峻邑
81b5c2a2b0
Loop optimization
1 week ago
安峻邑
860e5edadf
Implement loop optimizations: LICM, strength reduction, loop unrolling, loop splitting
1 week ago
黄熙哲
e26fd3f520
fix(peephole): remove dead conditional branch inversion code

The CondBr+Branch inversion pattern was unreachable because the
simple Br fallthrough check runs first and removes the Br. Removed
the dead code and the unused NegateCondCode helper.
1 week ago
黄熙哲
7490fd3a49
feat(peephole): add branch fallthrough and conditional branch inversion

Eliminates unconditional Br when target is the next block in layout.
Inverts CondBr condition when the following Br targets the fallthrough
block, eliminating the extra jump.
1 week ago
黄熙哲
1701b2cf51
feat(peephole): merge adjacent zero-value stack stores

When str WZR, fi#N and str WZR, fi#N+1 appear consecutively,
replaces them with a single str XZR, fi#N (64-bit zero store).
1 week ago
黄熙哲
e44ba819ec
feat(peephole): add store-load forwarding pattern

When StoreStack regA, fi#N is immediately followed by LoadStack regB, fi#N
with regA != regB, replaces the load with MovReg regB, regA, eliminating
the redundant memory access.
1 week ago
黄熙哲
083616e50d
fix(backend): add redundant MovReg elimination on no-spill early-return path

The MovReg cleanup was only running after the final RewriteWithAllocation
at the end of the spill loop, missing the early-return path when
allocation succeeded without spilling. This left behind no-op moves
like 'mov x0, x0' that coalescing created.
1 week ago
黄熙哲
6f829c30f9
feat(backend): eliminate redundant MovReg after register allocation

Scans all blocks after RewriteWithAllocation and removes MovReg
instructions where source and destination are the same physical
register. This cleans up cases where move coalescing successfully
assigned the same register to both sides.
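
The cleanup amounts to an erase-remove over each block. A minimal sketch with a toy instruction type follows. `Opcode`, `MInst`, and `RemoveSelfMoves` are assumptions for illustration, not the project's MIR classes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Opcode { MovReg, Other };

// Hypothetical post-allocation instruction: dst/src are physical register numbers.
struct MInst {
    Opcode op;
    int dst;
    int src;
};

// Returns the block with every `mov rX, rX` removed.
std::vector<MInst> RemoveSelfMoves(std::vector<MInst> block) {
    block.erase(std::remove_if(block.begin(), block.end(),
                               [](const MInst& inst) {
                                   return inst.op == Opcode::MovReg && inst.dst == inst.src;
                               }),
                block.end());
    return block;
}
```

Only MovReg instructions are candidates. Other opcodes whose operands happen to coincide are left untouched.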
1 week ago
黄熙哲
4bdca3f722
feat(backend): move coalescing via color preference and phi cycle breaking\n\nCollects move_preferences from MovReg instructions and uses them\nduring color selection to prefer the same physical register for\nmove-related virtual registers. Detects and breaks cycles in move\npreference chains to ensure correctness.
1 week ago