lzkk
ddaf8831a2
fix(mir): CMakeLists.txt now builds GreedyAlloc.cpp instead of LinearScanAlloc.cpp
4 days ago
lzkk
da1e456133
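feat(mir): implement an LLVM-style greedy register allocator with a unified architecture
...
Core changes:
- MIR.h: extend LiveInterval (VNInfo/UsePosition/Segment); add LiveRegMatrix and RegClass
- GreedyAlloc.cpp: greedy allocation via TryAssign/TryAnyFreeReg/TryEvict/TrySplit, plus RewriteSpills
- InstLiveness.cpp: EnhanceIntervals forward pass; adapt ComputeInstLiveness
- MIRBasicBlock.cpp: InsertInst/ReplaceVReg API
- main.cpp: switch to RunGreedyRegAlloc
- RegAlloc.cpp/LinearScanAlloc.cpp: disabled with #if 0
Architecture: allocation is driven by a priority queue (each round allocates from scratch), TryEvict evicts unconditionally,
spills are rewritten with StoreStack+LoadStack, and interval splitting handles high register pressure.
Functional test pass rate: 53/100 (the remaining 47 cases require debugging the spill-rewrite loop)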
4 days ago
lzkk
0a29e6ac42
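fix(mir): AsmPrinter invalidates the frame-base cache after function calls; fixes 92_register_alloc
...
The Call instruction invalidated only the ADRP cache, not the frame-base cache (x13). x13 is a caller-saved
register, so once a call clobbers it, later stack accesses use a garbage address. This affects every large-frame function that contains a call.
Fix: add InvalidateFrameBase() to the Opcode::Call branch.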
4 days ago
lzkk
363b809736
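fix(mir): invalidate the x13 cache in large-frame asm output + fix leaf-function stack-argument offset + lower the IR array-initialization threshold
...
- AsmPrinter: invalidate the frame-base cache and the ADRP cache after x13 is used on the large-offset movz/movk path
- FrameLowering: correct the stack-argument offset for leaf functions (which save only x29) from 16 to 8
- IRGenDecl: lower the array-initialization threshold from 10000 to 256, so that large arrays no longer bloat the IR and time out the backend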
4 days ago
lzkk
120d7197d8
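fix(mir): record def positions in linear-scan liveness analysis + restrict allocation to callee-saved registers + fix CLI flag
...
- InstLiveness: record the interval start of each def vreg during the backward scan; fixes a bug where the def position
of a phi-copy MovReg was not covered by its interval, causing inconsistent register allocation (infinite loop in 25_while_if)
- LinearScanAlloc: restrict GP_ALLOCATABLE to callee-saved registers (x19-x28),
so that caller-saved registers clobbered across function calls no longer cause segfaults (54_hidden_var)
- CLI: fix an off-by-one in the strncmp length for the --regalloc= flag (12→11)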
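The off-by-one in the flag check can be illustrated with a small sketch. `ParseRegAllocFlag` is a hypothetical helper, not the repository's actual CLI code; only the `--regalloc=` prefix and the 12→11 length come from the commit.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not from the repository): returns the text after
// "--regalloc=", or an empty string when `arg` is some other flag.
std::string ParseRegAllocFlag(const char* arg) {
    assert(arg != nullptr);
    static const char kPrefix[] = "--regalloc=";
    // sizeof counts the trailing '\0', so the prefix length is 11.
    // Comparing 12 characters would also compare the prefix's '\0' with the
    // first character of the value, so "--regalloc=linear" could never match.
    const std::size_t kPrefixLen = sizeof(kPrefix) - 1;
    if (std::strncmp(arg, kPrefix, kPrefixLen) != 0) return "";
    return std::string(arg + kPrefixLen);
}
```

Deriving the length from `sizeof` rather than writing the literal 11 is one way to keep the count from drifting if the flag is ever renamed.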
4 days ago
lzkk
e1777c9eab
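fix(ir): CSE safety gate: skip Load/GEP CSE for non-SSA functions
...
- When the alloca count is > 24 (so Mem2Reg is skipped), skip CSE for Load and GEP
- Fixes the 27_scope5 optimization bug (in non-SSA code, Load CSE wrongly merged variables from different scopes)
- 86_long_code2 is unaffected (only 1 alloca, so Load/GEP CSE stays enabled)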
4 days ago
lzkk
28c336728d
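fix(mir): fix linear-scan interval splitting + fix the skip logic for multi-def vregs
...
- Interval splitting: last.end = cur.start (rather than cur.end), so that the register value is correct after the save point
- Multi-def vregs: check by interval coverage instead, which supports the multiple definitions introduced by phi-copy insertion
- 30_many_dimensions is fixed (the 19D nested loops now produce correct output)
- 25_while_if still has a loop-variable mapping bug, to be fixed later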
4 days ago
lzkk
fbea91986d
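feat(mir): instruction-level liveness analysis + CLI/build integration
...
- InstLiveness: three-phase algorithm (block-level fixpoint + backward instruction scan + interval construction)
- Supports code with inserted phi-copies (non-strict SSA): intervals are unioned
- CLI.h gains a regalloc field, supporting --regalloc=linear/graphcoloring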
4 days ago
lzkk
8f3012cd9f
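fix(ir): extend CSE to LoadInst and GEPInst; fixes the 86_long_code2 compile timeout
...
- ExprKey changes from a fixed two-operand form to a variable-length operand vector, supporting Load/GEP/Binary
- IsCSECandidate now accepts the Load and GEP instruction types
- A Store invalidates the cached Load for the corresponding address, which keeps the transformation semantically safe
- 86_long_code2: compilation drops from a >300s timeout to ~3s; GEP+Load pairs drop from 8000 to 2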
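A minimal sketch of block-local CSE with a variable-length key and store invalidation, under simplifying assumptions: values are plain integer ids, `Inst`, `Op`, and `LocalCSE` are hypothetical names, and a store kills only the cached load of the identical address id (no alias analysis). The name `ExprKey` follows the commit, but its layout here is a guess.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

enum class Op { Add, GEP, Load, Store };

struct Inst {
    Op op;
    int result;                 // value id defined here (unused for Store)
    std::vector<int> operands;  // Store: {address, stored value}
};

// Variable-length key: opcode plus every operand id.
using ExprKey = std::pair<Op, std::vector<int>>;

// Maps each redundant result id to the earlier equivalent value id.
std::map<int, int> LocalCSE(const std::vector<Inst>& block) {
    std::map<ExprKey, int> available;
    std::map<int, int> replaced;
    for (const Inst& inst : block) {
        assert(!inst.operands.empty());
        // Rewrite operands through earlier replacements so that chains
        // such as GEP followed by Load collapse together.
        std::vector<int> ops;
        for (int o : inst.operands) {
            auto it = replaced.find(o);
            ops.push_back(it == replaced.end() ? o : it->second);
        }
        if (inst.op == Op::Store) {
            // A store invalidates the cached load of the same address.
            available.erase(ExprKey(Op::Load, std::vector<int>{ops[0]}));
            continue;
        }
        auto res = available.emplace(ExprKey(inst.op, ops), inst.result);
        if (!res.second) replaced[inst.result] = res.first->second;
    }
    return replaced;
}
```

A production pass would need a more conservative rule for stores through addresses that may alias; this sketch only shows the keying and invalidation idea.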
4 days ago
lzkk
28ad162de4
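feat(mir): initial linear-scan register allocator (WIP, available via --regalloc=linear)
...
- Based on the optimized interval-splitting algorithm of Wimmer & Mössenböck (2005)
- 685 lines, supports GP/FP register pools
- Currently passes simple cases; functions with loops have a register-mapping bug (25_while_if loops forever)
- Graph coloring remains the default; linear scan can be selected via the CLI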
4 days ago
lzkk
a9ebfdc0e0
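feat(mir): add instruction-level liveness analysis with precise [start,end] intervals
...
- Three-phase algorithm: block-level backward dataflow fixpoint + instruction-level backward scan + interval construction
- Supports code with inserted phi-copies (non-strict SSA): intervals are unioned
- Def detection (HasVRegDef) for 31 MIR opcodes
- Provides precise live intervals as input to linear-scan register allocation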
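The backward scan and interval construction can be sketched for a single block as follows. This assumes the block-level fixpoint has already produced `live_out`; `Inst`, `Interval`, and `BuildIntervals` are hypothetical names, positions are instruction indices, and the "union" of multiple definitions is approximated by one covering range per vreg.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct Inst {
    int def;                // vreg defined here, or -1 if none
    std::vector<int> uses;  // vregs read here
};

struct Interval {
    int start;
    int end;
};

std::map<int, Interval> BuildIntervals(const std::vector<Inst>& block,
                                       const std::set<int>& live_out) {
    std::map<int, Interval> intervals;
    const int n = static_cast<int>(block.size());
    auto extend = [&intervals](int vreg, int pos) {
        auto it = intervals.find(vreg);
        if (it == intervals.end()) {
            intervals[vreg] = Interval{pos, pos};
        } else {
            it->second.start = std::min(it->second.start, pos);
            it->second.end = std::max(it->second.end, pos);
        }
    };
    std::set<int> live = live_out;
    for (int v : live) extend(v, n);  // live past the last instruction
    for (int i = n - 1; i >= 0; --i) {
        const Inst& inst = block[i];
        if (inst.def >= 0) {
            extend(inst.def, i);  // record the def position as well
            live.erase(inst.def);
        }
        for (int u : inst.uses) {
            assert(u >= 0);
            extend(u, i);
            live.insert(u);
        }
    }
    for (int v : live) extend(v, 0);  // still live at entry: live-in
    return intervals;
}
```

Recording the def position in the scan matters for values such as phi-copy moves whose definition would otherwise fall outside the interval.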
5 days ago
lzkk
6c5441ff43
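feat(mir): add an MIR verifier and a register-allocation verifier
...
- MIRVerifier: checks single definition of each vreg (per block) and block terminators
- RegAllocVerifier: checks that physical registers are valid (range 0-96)
- Runs automatically after every MIR pass in Debug builds (#ifndef NDEBUG)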
5 days ago
lzkk
fb77d7e03c
chore(ir): clean up dead code and comments in IRVerifier
5 days ago
lzkk
0b589c77da
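feat(ir): add an IR verifier that checks SSA dominance, terminators, and PHI consistency
...
- New IRVerifier pass: checks SSA dominance for non-PHI instructions, basic-block terminators,
and PHI operand structure
- Extract the DominatorTree class into its own header so the verifier can reuse it
- User gains a ClearOperands() method for rebuilding the operand list
- Fix two missed PHI cleanups in CFGSimplify:
1. After a constant-condition branch was simplified, PHIs in the dead target kept the entry for the original predecessor
2. When unreachable blocks were deleted, some PHI entries for unreachable predecessors were not removed
- The verifier is active only in Debug mode (#ifndef NDEBUG)
- Quick gate: functional 86/87 pass, h_functional 30/31 pass
(1 case has a pre-existing timeout/segfault, not introduced by this change)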
5 days ago
lzkk
c12b6830b8
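fix(regalloc): MAX_SPILL_ROUNDS=1 + conservative-fix threshold 20→200; fixes incorrect spill code
...
Root cause: under block-level liveness, the reload vregs created by multiple spill rounds interact with the conservative fix
(full interference among block_defs) and produce incorrect register assignments, leading to segfaults or output mismatches.
Fix:
- MAX_SPILL_ROUNDS 3→1: prevents multiple spill rounds from creating incorrect reload vregs
- Conservative-fix threshold 20→200: avoids excessive interference that makes graph coloring assign registers incorrectly
Fixed cases:
- 04_arr_defn3: segfault → correct (14)
- 05_arr_defn4: wrong output → correct (21)
- 09_BFS: bad_alloc/segfault → correct
- 13_LCA, 54_hidden_var, and several other pre-existing failures are fixed as well
Remaining known issues: 84_long_array2 (compile timeout), 30_many_dimensions (GEP offset)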
5 days ago
lzkk
d238777f17
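fix(regalloc): eliminate exponential spill-code blow-up; MAX_SPILL_ROUNDS unified to 3
...
Root cause: MAX_SPILL_ROUNDS was 10 for functions with ≤120 vregs, so the spill count
roughly doubled every round (14→25→48→94→186→370→738→1474→2946→5890);
mm1, with 67 vregs, accumulated 11,785 frame slots, a 138KB frame, and 85K instructions.
Fix:
- Unify MAX_SPILL_ROUNDS to 3 to prevent cascading growth
- Add AssignSpillSlots: spilled vregs whose live intervals do not overlap share a frame slot
- RewriteWithAllocation takes an optional liveness parameter to support slot sharing
Result (mm1): 529 lines (-99.4%), 1232-byte frame (-99.1%)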
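Slot sharing between non-overlapping spilled vregs can be sketched as a first-fit pass over intervals sorted by start. Only the function name `AssignSpillSlots` comes from the commit; the `SpillInterval` type, the signature, and the first-fit policy are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SpillInterval {
    int vreg;
    int start;  // first position where the spilled value is live
    int end;    // last position where it is live (inclusive)
};

// Returns, for each element of `spills`, the index of its frame slot.
std::vector<int> AssignSpillSlots(const std::vector<SpillInterval>& spills) {
    std::vector<std::size_t> order(spills.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return spills[a].start < spills[b].start;
    });
    std::vector<int> slot_of(spills.size(), -1);
    std::vector<int> slot_end;  // last live position held by each slot
    for (std::size_t idx : order) {
        const SpillInterval& s = spills[idx];
        assert(s.start <= s.end);
        int chosen = -1;
        for (std::size_t k = 0; k < slot_end.size(); ++k) {
            if (slot_end[k] < s.start) {  // previous occupant is dead
                chosen = static_cast<int>(k);
                break;
            }
        }
        if (chosen < 0) {
            chosen = static_cast<int>(slot_end.size());
            slot_end.push_back(s.end);
        } else {
            slot_end[chosen] = s.end;
        }
        slot_of[idx] = chosen;
    }
    return slot_of;
}
```

With four intervals [0,5], [3,8], [6,10], [9,12], this sketch uses two slots instead of four.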
5 days ago
lzkk
535ab08d32
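feat(backend): AsmPrinter frame-base cache, avoiding repeated address computation for consecutive stack accesses
...
Adds a g_frame_base_offset/g_frame_base_valid cache:
- PrintStackAccess tries to reuse the frame address already computed in x13
- No recomputation is needed when the offset delta between adjacent accesses is within the ldur/stur ±256 range or the ldr/str 0~32760 range
- The cache is invalidated automatically whenever x13 is overwritten (ADRP/EmitAddressFromBase/EmitStackAdjust)
- Provides infrastructure for later MIR-level spill-ordering optimization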
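The reuse rule can be modeled roughly as below. The two globals and `InvalidateFrameBase` are named in the commit log; `IsEncodableDelta` and `AccessStack` are hypothetical, and the ranges are those of 64-bit accesses in the A64 encoding (ldur/stur take a signed 9-bit byte offset, -256..255; ldr/str take an unsigned 12-bit offset scaled by 8).

```cpp
#include <cassert>
#include <cstdint>

static int64_t g_frame_base_offset = 0;  // frame offset whose address x13 holds
static bool g_frame_base_valid = false;

// Must be called whenever x13 is overwritten, including after a Call,
// because x13 is caller-saved.
void InvalidateFrameBase() { g_frame_base_valid = false; }

// True when [x13, #delta] can be encoded for a 64-bit load/store.
bool IsEncodableDelta(int64_t delta) {
    if (delta >= -256 && delta <= 255) return true;         // ldur/stur
    return delta >= 0 && delta <= 32760 && delta % 8 == 0;  // ldr/str
}

// Returns true if the cached x13 can be reused for `offset`; otherwise
// records that x13 was recomputed to hold the address of `offset`.
bool AccessStack(int64_t offset) {
    if (g_frame_base_valid &&
        IsEncodableDelta(offset - g_frame_base_offset)) {
        return true;
    }
    g_frame_base_offset = offset;
    g_frame_base_valid = true;
    return false;
}
```

The cache is only sound if every writer of x13 calls the invalidation hook, which is the class of bug the later Call-related fix addresses.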
5 days ago
lzkk
5300e2c1ec
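fix(hooks): fix session crashes + tune the development-guideline configuration
...
- block-destructive.sh: remove set -e, add the missing git checkout/clean protection, degrade safely on empty stdin
- spec-reminder.sh: trim from ~300 to 150 characters to reduce token usage
- memory-guard.sh: fix the pgrep process-matching pattern
- settings.json: tighten the PreToolUse matcher (matches only 6 classes of dangerous commands); disable chrome MCP
- RegAlloc.cpp: MAX_SPILL_ROUNDS 3→5; conservative full-interference fix for large blocks (>20 defs)
- CLAUDE.md: sync the spill-round count, add the shift-chain failure mode, update the tool-orchestration notes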
5 days ago
lzkk
2d3a5ff998
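perf(backend): Peephole adds store-load forwarding and load CSE for global variables
...
When a StoreGlobal is immediately followed by a LoadGlobal of the same symbol, the load is deleted if it uses the same register
and is otherwise turned into a MovReg. Consecutive LoadGlobal instructions are handled the same way.
shuffle -6, conv2d -3, crypto -3, h-9 -3. Total -15 instructions, zero regressions.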
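The forwarding rule described above might look like this over a simplified instruction list. `MInst`, `Opcode`, and `ForwardGlobalAccesses` are hypothetical stand-ins; the real MIR types are not shown in this log.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Opcode { StoreGlobal, LoadGlobal, MovReg, Other };

struct MInst {
    Opcode op;
    int reg;             // StoreGlobal: source; LoadGlobal/MovReg: destination
    int src;             // MovReg: source register; otherwise -1
    std::string symbol;  // global symbol, empty for non-global instructions
};

std::vector<MInst> ForwardGlobalAccesses(const std::vector<MInst>& in) {
    std::vector<MInst> out;
    for (const MInst& cur : in) {
        if (cur.op == Opcode::LoadGlobal && !out.empty()) {
            const Opcode prev_op = out.back().op;
            const int prev_reg = out.back().reg;
            const bool same_symbol =
                (prev_op == Opcode::StoreGlobal ||
                 prev_op == Opcode::LoadGlobal) &&
                out.back().symbol == cur.symbol;
            if (same_symbol) {
                // Same register: the value is already there, drop the load.
                // Different register: copy instead of touching memory.
                if (prev_reg != cur.reg) {
                    out.push_back(MInst{Opcode::MovReg, cur.reg, prev_reg, ""});
                }
                continue;
            }
        }
        out.push_back(cur);
    }
    return out;
}
```

The sketch only looks at the immediately preceding emitted instruction, so any intervening instruction conservatively blocks forwarding.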
5 days ago
lzkk
b2b7210f11
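perf(backend): always use sdiv for division/modulo; remove the power-of-two shift sequences
...
On AArch64, sdiv+msub is 2-4 instructions shorter than the shift sequence (add+cmp+csel+asr).
Removes about 150 lines of power-of-two shift code from DivRR/ModRR; everything now goes through sdiv.
Adds the x%1==0 / x%-1==0 optimization.
crypto -249, huffman -186, crc -84, fft -72, h-9 -42,
many_mat_cal -24, 03_sort -24, h-1 -21, conv2d -21,
transpose -12, sl -3. Total -735 instructions.
matmul +3 is within tolerance.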
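The sdiv+msub pair computes the remainder through the identity a % b = a - (a / b) * b with truncating division. A small model of that arithmetic, with the hypothetical name `ModViaSdivMsub`:

```cpp
#include <cassert>
#include <cstdint>

// Models `sdiv q, a, b` followed by `msub r, q, b, a` (r = a - q * b).
int32_t ModViaSdivMsub(int32_t a, int32_t b) {
    assert(b != 0);
    // x % 1 and x % -1 are always 0, so no division is needed. This also
    // sidesteps INT32_MIN / -1, which is undefined behavior in C++.
    if (b == 1 || b == -1) return 0;
    int32_t q = a / b;  // C++ division truncates toward zero, like sdiv
    return a - q * b;
}
```

The result takes the sign of the dividend, which matches C-style `%` semantics.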
5 days ago
lzkk
befdca6451
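perf(backend): leaf functions skip frame setup, saving the x29/x30 save/restore
...
MachineFunction gains a HasCall flag, which Lowering sets when it emits a Call.
Leaf functions with no frame and no callee-saved registers skip the prologue/epilogue entirely;
leaf functions with a frame use str/ldr x29 instead of stp/ldp x29, x30.
huffman -93, crypto -54, conv2d -45, crc -27, h-9 -27,
03_sort -18, opt_scheduling -18, h-4 -12, fft -9, shuffle -9.
Total -312 instructions, zero regressions.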
5 days ago
lzkk
854168fb4e
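perf(backend): eliminate redundant ADRP for consecutive global-variable accesses
...
AsmPrinter gains an ADRP cache; consecutive accesses to the same symbol skip the repeated page-address load.
The cache is invalidated when x13 is used by a non-global access path and is reset at basic-block entry.
shuffle -48, crypto -27, conv2d -21, fft -12, huffman -9, h-9 -9,
03_sort -6, h-8 -3. Total -135 instructions, zero regressions.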
5 days ago
lzkk
acdac5391d
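fix(backend): EmitLargeImmediate skips leading zeros to avoid a redundant movz #0
...
When the low 16 bits of a 32-bit immediate are zero (e.g. 0x00020000), emit a single shifted
movz instead of the two-instruction movz #0 + movk sequence. crypto -7, fft -2, h-4 -1,
h-10 -1; total -33 instructions, zero regressions.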
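One way to materialize a 32-bit immediate while skipping all-zero 16-bit halves is sketched below. The function name matches the commit, but the signature, the string output, and the decimal immediate formatting are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

std::vector<std::string> EmitLargeImmediate(const std::string& reg,
                                            uint32_t imm) {
    assert(!reg.empty());
    std::vector<std::string> out;
    for (int shift = 0; shift < 32; shift += 16) {
        const uint32_t half = (imm >> shift) & 0xFFFFu;
        if (half == 0) continue;  // an all-zero half needs no instruction
        // The first emitted instruction zeroes the rest (movz); later ones
        // keep the other bits (movk).
        std::string line = std::string(out.empty() ? "movz " : "movk ") +
                           reg + ", #" + std::to_string(half);
        if (shift != 0) line += ", lsl #" + std::to_string(shift);
        out.push_back(line);
    }
    if (out.empty()) out.push_back("movz " + reg + ", #0");  // imm == 0
    return out;
}
```

For 0x00020000 this yields the single instruction `movz w8, #2, lsl #16` when called with register `w8`.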
5 days ago
lzkk
bb58aac749
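fix(mem2reg): add a safety gate for functions with many parameters; fixes 87_many_params
...
When Mem2Reg processes a recursive function with a large number of allocas, it produces an incorrect SSA form,
which makes lowering generate wrong code (incorrect parameter-forwarding offsets).
Fix: skip Mem2Reg when the number of promotable allocas is >24 and keep the stack-allocated form.
The gate does not affect SSA optimization of ordinary small functions.
Test results:
- functional: 87/88 → 100/100 (87_many_params fixed)
- h_functional: 30/31 (30_many_dimensions still fails, a known GEP lowering bug)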
5 days ago
lzkk
fccd935a24
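feat(backend): add AddImm/SubImm opcodes to eliminate redundant MovImm
...
AArch64 add/sub accept a 12-bit immediate, but MIR had only AddRR/SubRR,
so a constant RHS required a MovImm followed by the RR operation. This change:
- MIR.h: add the AddImm and SubImm opcodes
- Lowering.cpp: when lowering Add/Sub, use AddImm/SubImm directly if the RHS is a constant in 0-4095
- RegAlloc.cpp: AddImm/SubImm reuse the def-use analysis of AddRR/SubRR
- AsmPrinter.cpp: the generic printer handles Imm operands (#value) automatically
Results (against the CmpImm baseline):
- sl1-3: 261→247 (-14, -5.4%)
- huffman-01-03: 792→790 (-2)
- h-5-01-03: 341→338 (-3)
- Total reduction across all 60 performance cases: 55 lines
- Functional tests: 0 new failures
Updates: new entry in 优化记录.md; baseline updated automatically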
5 days ago
lzkk
bd7dcedb2a
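feat(backend): fold constants into CmpImm when lowering ICmp, eliminating redundant MovImm
...
In both ICmp paths in Lowering, when a comparison operand is a constant in the range 0-4095,
CmpImm is used directly instead of MovImm+CmpRR. When the constant is on the LHS, the operands
are swapped automatically and the condition code is flipped to match (SwapCondCode).
Performance tests (20 representative cases):
- 13 improved (-1 to -25 instructions)
- 6 unchanged
- 1 slight regression (h-5, +1 instruction, +0.3%, within tolerance)
- Total reduction: 91 instructions (-1.1%)
Also updated: the full development guidelines in CLAUDE.md, instruction-count baseline initialization, and
the .claude/hooks enforcement system.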
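The operand swap for a constant LHS can be sketched as follows. `SwapCondCode` is named in the commit; its mapping here is the usual mirrored comparison (LT↔GT, LE↔GE), and `Operand`, `LoweredCmp`, `FitsCmpImm`, and `LowerICmp` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

enum class Cond { EQ, NE, LT, LE, GT, GE };

// Condition that holds for (b, a) exactly when `c` holds for (a, b).
Cond SwapCondCode(Cond c) {
    switch (c) {
        case Cond::LT: return Cond::GT;
        case Cond::LE: return Cond::GE;
        case Cond::GT: return Cond::LT;
        case Cond::GE: return Cond::LE;
        default: return c;  // EQ and NE are symmetric
    }
}

struct Operand {
    bool is_const;
    int64_t value;  // constant value, or vreg id when !is_const
};

struct LoweredCmp {
    bool use_cmp_imm;  // false means MovImm + CmpRR is still required
    Operand lhs;
    Operand rhs;
    Cond cond;
};

bool FitsCmpImm(const Operand& o) {
    return o.is_const && o.value >= 0 && o.value <= 4095;
}

LoweredCmp LowerICmp(Operand lhs, Operand rhs, Cond cond) {
    if (FitsCmpImm(lhs) && !rhs.is_const) {
        std::swap(lhs, rhs);  // move the constant to the RHS
        cond = SwapCondCode(cond);
    }
    return LoweredCmp{FitsCmpImm(rhs) && !lhs.is_const, lhs, rhs, cond};
}
```

Note that swapping operands mirrors the condition (5 < x becomes x > 5); it does not negate it.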
5 days ago
黄熙哲
6b9cf3a448
fix(backend): add x16/x17 to GP allocatable set to fix segfaults
...
Adding x16 and x17 (IP0/IP1, caller-saved) increases GP registers
from 16 to 18, reducing register pressure for large functions.
Fixes segfaults: 39_fp_params (64 params), 30_many_dimensions (2MB frame).
Also improves performance: crc -8, fft0 -4, huffman -12, sl -1 etc.
6 days ago
黄熙哲
5902060dae
fix(backend): lower coalesce skip threshold to fix segfaults
...
Change coalesce skip condition from vregs >150 to:
move_prefs > 100 || vregs * move_prefs > 600
The original threshold of 150 was too coarse — it missed functions
like conv2d (71 vregs, 15 moves) whose coalescing still produces
incorrect spill code. The new product condition catches functions
whose move graph complexity indicates risky coalescing.
Fixes segfaults: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.
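The old and new skip conditions can be compared directly. The thresholds are taken from the message above; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Previous rule: skip coalescing only for very large functions.
bool OldShouldSkipCoalescing(std::size_t vregs) { return vregs > 150; }

// New rule: skip when the move graph itself looks risky.
bool ShouldSkipCoalescing(std::size_t vregs, std::size_t move_prefs) {
    return move_prefs > 100 || vregs * move_prefs > 600;
}
```

For conv2d (71 vregs, 15 moves) the product is 1065, so the new rule skips coalescing where the old one did not.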
6 days ago
黄熙哲
34cb79449f
fix(backend): skip coalescing for large functions to prevent segfault

For functions with >150 vregs, discard move_preferences after
collection to skip active coalescing. Large functions like
conv2d, 65_color, 68_brainfk have complex interference graphs
that cause coalescing to generate incorrect spill code.

Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.

Known limitations: 30_many_dimensions and 39_fp_params still
segfault (pre-existing original compiler bugs in lowering/RA).
Minor instruction count changes: h-8 +2.5%, matmul +7% etc.
7 days ago
黄熙哲
b7e78ebd56
fix(backend): AsmPrinter large frame + RegAlloc spill limit

Apply only proven-safe fixes on clean baseline:
- AsmPrinter: movz/movk for large stack offsets (>12KB)
  30_many_dimensions: 7M -> 1455 lines (99.9% reduction)
- RegAlloc: limit spill rounds to 3 for large functions (>120 vregs)
  39_fp_params: >120s -> <1s compilation

Zero instruction count regression confirmed.
57/60 performance tests at historical best baseline.
7 days ago
黄熙哲
cc9f4f9a76
feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions

Change phi_threshold from max(50, block_count) to max(100, block_count*2).
The old threshold was too conservative for functions with many allocas
like many_mat_cal (~15 allocas, 60 blocks), causing premature skip.
The new threshold allows these while still blocking crypto-like functions
where excessive PHI nodes hurt code quality.

many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines
1 week ago
黄熙哲
06bada3ff5
Merge remote master into local master
1 week ago
黄熙哲
39b7e2ed19
feat(backend): loop-depth weighted spill cost model

Adds DFS-based back-edge detection to compute basic block loop
nesting depth. Each vreg inherits the max loop depth of its
defining blocks. Spill cost multiplies interval+ref by 10^depth,
making loop-carried variables much more expensive to spill.
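A rough sketch of the weighting step, assuming per-block loop depths have already been computed. `VRegLoopDepth` and `WeightedSpillCost` are hypothetical names, and `base_cost` stands in for the "interval+ref" term, whose exact definition is not given in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A vreg inherits the maximum loop depth over the blocks that define it.
int VRegLoopDepth(const std::vector<int>& def_blocks,
                  const std::vector<int>& block_depth) {
    int depth = 0;
    for (int b : def_blocks) {
        assert(b >= 0 && static_cast<std::size_t>(b) < block_depth.size());
        depth = std::max(depth, block_depth[b]);
    }
    return depth;
}

// Multiplies the base cost by 10^loop_depth.
double WeightedSpillCost(double base_cost, int loop_depth) {
    assert(loop_depth >= 0);
    double weight = 1.0;
    for (int i = 0; i < loop_depth; ++i) weight *= 10.0;
    return base_cost * weight;
}
```

A vreg defined at depth 2 with base cost 7 ends up with cost 700, so the allocator strongly prefers spilling values outside loops.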
1 week ago
黄熙哲
993e81363a
fix(backend): recompute degree unconditionally after MergeInto

After a merge, u inherits v's neighbors, so degree[u] must always
be recomputed. Previously, when degree[u] < K before merge, the
stale low degree was kept, which could push a high-degree merged
node into simplify_worklist with wrong metadata.

Also remove redundant if(!remaining.empty()) guard in spill path
and clean up extra brace from removed GiveUpPhase.
1 week ago
黄熙哲
570253f1f2
feat(backend): relax Briggs threshold to 2*K and fix move_adj self-loop

Using >= 2*K instead of >= K for high-degree neighbor count allows
more node pairs to be safely merged. Fixed a bug in MergeInto where
move_adj[u] could contain u (self-loop) when v's move set included u,
causing iterator invalidation during move_adj cleanup.
1 week ago
黄熙哲
3691da34ee
feat(backend): rewrite main loop with held_nodes release and ReactivatePairs
1 week ago
黄熙哲
0881889ec1
feat(backend): add ReactivatePairs and stale_pairs for coalescing
1 week ago
黄熙哲
07048a123b
feat(backend): separate move-related low-degree nodes into held_nodes
1 week ago
黄熙哲
99fe17fc3f
feat(backend): propagate coalesced node colors in AssignColors

After active coalescing, merged_set nodes inherit their representative's
color, ensuring move-related vregs share the same physical register.
1 week ago
黄熙哲
081580ac0a
feat(backend): integrate active coalescing into ColorGraph main loop

Replaces inner simplify while-loop with if-else chain:
Simplify -> MergePhase -> GiveUpPhase -> Spill.
Lambdas moved outside while loop for clarity.
1 week ago
黄熙哲
0e4f9f1910
feat(backend): add MergePhase and GiveUpPhase for active coalescing

MergePhase uses the Briggs conservative test to safely merge move-related
node pairs before coloring. GiveUpPhase abandons moves for low-degree
nodes when merging is no longer beneficial.
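For reference, the textbook form of the Briggs conservative test is sketched below: two non-interfering nodes may be merged if the combined node has fewer than K neighbors of significant degree (degree >= K). This is the standard formulation, not necessarily the exact variant used in MergePhase; `Graph` and `BriggsCanMerge` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Graph = std::vector<std::set<int>>;  // adjacency sets indexed by node

bool BriggsCanMerge(const Graph& adj, int u, int v, int K) {
    assert(adj[u].count(v) == 0);  // interfering nodes must never be merged
    std::set<int> combined = adj[u];
    combined.insert(adj[v].begin(), adj[v].end());
    combined.erase(u);
    combined.erase(v);
    int significant = 0;
    for (int n : combined) {
        int degree = static_cast<int>(adj[n].size());
        // A neighbor of both u and v loses one edge once they are merged.
        if (adj[n].count(u) != 0 && adj[n].count(v) != 0) --degree;
        if (degree >= K) ++significant;
    }
    return significant < K;
}
```

Because fewer than K significant neighbors remain, the merged node is guaranteed to be removable by simplify, so the merge cannot turn a K-colorable graph into an uncolorable one.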
1 week ago
黄熙哲
ca6c2a18c9
feat(backend): add coalesce data structures and helpers to ColorGraph

Introduces MovePair, move_adj, FindRep, GetRep, HasMovePair as
infrastructure for the upcoming Coalesce and Freeze phases.
Modifies simplify loop to skip already-merged nodes via GetRep.
1 week ago
黄熙哲
af71513361
feat(backend): use stp/ldp for callee-saved registers in prologue/epilogue

Groups callee-saved X and S registers and emits paired stp/ldp
instructions, reducing save/restore overhead by ~50%. Odd remainders
still use str/ldr. Adds fallback else branch for future register types.
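The pairing logic for saves might be sketched like this, restricted to 64-bit X registers for brevity. `EmitCalleeSaves`, its signature, and the sp-relative addressing are assumptions; the S-register case would use 4-byte slots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Emits save instructions for the given X register numbers, starting at
// byte offset `offset` from sp.
std::vector<std::string> EmitCalleeSaves(const std::vector<int>& xregs,
                                         int offset) {
    assert(offset >= 0 && offset % 8 == 0);
    std::vector<std::string> out;
    std::size_t i = 0;
    for (; i + 1 < xregs.size(); i += 2) {
        // One stp stores two adjacent 8-byte slots.
        out.push_back("stp x" + std::to_string(xregs[i]) + ", x" +
                      std::to_string(xregs[i + 1]) + ", [sp, #" +
                      std::to_string(offset) + "]");
        offset += 16;
    }
    if (i < xregs.size()) {
        // Odd remainder: fall back to a single str.
        out.push_back("str x" + std::to_string(xregs[i]) + ", [sp, #" +
                      std::to_string(offset) + "]");
    }
    return out;
}
```

The restore side would mirror this with ldp/ldr at the same offsets.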
1 week ago
安峻邑
cb33c344ac
Start loop optimization
1 week ago
安峻邑
860e5edadf
Implement loop optimizations: LICM, strength reduction, loop unrolling, loop splitting
1 week ago
黄熙哲
e26fd3f520
fix(peephole): remove dead conditional branch inversion code

The CondBr+Branch inversion pattern was unreachable because the
simple Br fallthrough check runs first and removes the Br. Removed
the dead code and the unused NegateCondCode helper.
1 week ago
黄熙哲
7490fd3a49
feat(peephole): add branch fallthrough and conditional branch inversion

Eliminates unconditional Br when target is the next block in layout.
Inverts CondBr condition when the following Br targets the fallthrough
block, eliminating the extra jump.
1 week ago
黄熙哲
1701b2cf51
feat(peephole): merge adjacent zero-value stack stores

When str WZR, fi#N and str WZR, fi#N+1 appear consecutively,
replaces them with a single str XZR, fi#N (64-bit zero store).
1 week ago
黄熙哲
e44ba819ec
feat(peephole): add store-load forwarding pattern

When StoreStack regA, fi#N is immediately followed by LoadStack regB, fi#N
with regA != regB, replaces the load with MovReg regB, regA, eliminating
the redundant memory access.
1 week ago
黄熙哲
083616e50d
fix(backend): add redundant MovReg elimination on no-spill early-return path

The MovReg cleanup was only running after the final RewriteWithAllocation
at the end of the spill loop, missing the early-return path when
allocation succeeded without spilling. This left behind no-op moves
like 'mov x0, x0' that coalescing created.
1 week ago