nudt-compiler-cpp

Commit Graph

Author	SHA1	Message	Date
lzkk	ca6c9fa540	docs: record the MAX_SPILL_ROUNDS fix; mm1 instruction count reduced by 99.4%	5 days ago
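lzkk	d238777f17	fix(regalloc): eliminate exponential spill-code growth by setting MAX_SPILL_ROUNDS to 3 everywhere. Root cause: MAX_SPILL_ROUNDS was 10 for functions with ≤120 vregs, so the number of spills roughly doubled every round (14→25→48→94→186→370→738→1474→2946→5890); mm1, with 67 vregs, accumulated 11,785 frame slots, a 138 KB frame, and 85K instructions. Fix: - MAX_SPILL_ROUNDS is now 3 for all functions, preventing cascading growth - new AssignSpillSlots: spilled vregs whose live intervals do not overlap share a frame slot - RewriteWithAllocation accepts an optional liveness parameter to support slot sharing. Result (mm1): 529 lines (-99.4%), frame 1232 bytes (-99.1%)	5 days ago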
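lzkk	535ab08d32	feat(backend): cache the frame base address in AsmPrinter so consecutive stack accesses do not recompute it. Adds a g_frame_base_offset/g_frame_base_valid cache: - PrintStackAccess tries to reuse the frame address already computed in x13 - no recomputation is needed when the offset difference between adjacent accesses is within the ldur/stur ±256 range or the ldr/str 0~32760 range - the cache is invalidated automatically whenever x13 is overwritten (ADRP/EmitAddressFromBase/EmitStackAdjust) - provides the infrastructure for later spill-ordering optimizations at the MIR level	5 days ago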
lzkk	3ab88232f7	fix(hooks): make the Stop hook smarter; it now reminds only when src/ has uncommitted changes	5 days ago
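lzkk	6f14ee1a7a	fix(infra): add a compiler resource-limit wrapper and test-script timeouts to prevent OOM crashes. Layered safeguards keep a compiler memory explosion (mm1.sy reached 9.9 GB) from triggering the OOM killer and crashing the terminal: - compiler-wrapper.sh: generic wrapper, ulimit -v 12GB + timeout 300s - setup-compiler-wrapper.sh: restores the wrapper after a cmake build - 2026test.sh, verify_asm.sh: auto-detect the wrapper and add a timeout to compiler invocations. Files under build/ are not under version control and do not affect the competition submission.	5 days ago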
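lzkk	5300e2c1ec	fix(hooks): fix session crashes and tune the development-guideline configuration - block-destructive.sh: remove set -e, complete the git checkout/clean protection, degrade safely on empty stdin - spec-reminder.sh: trim from ~300 to 150 characters to reduce token usage - memory-guard.sh: fix the pgrep process-matching pattern - settings.json: narrow the PreToolUse matcher (matches only 6 classes of dangerous commands), disable the chrome MCP - RegAlloc.cpp: MAX_SPILL_ROUNDS 3→5; conservative fix that treats large blocks (>20 defs) as fully interfering - CLAUDE.md: sync the spill round count, add the shift-chain failure mode, update the tool-orchestration notes	5 days ago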
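lzkk	da5d618297	fix(hooks): make memory-guard emit valid JSON to fix session crashes. The SessionStart hook expects JSON on stdout, but the old memory-guard.sh wrote only to stderr and left stdout empty, so the Claude Code hook runner crashed after a JSON parse error. It now outputs {"continue": true} and injects warnings into additionalContext.	5 days ago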
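lzkk	2d3a5ff998	perf(backend): add global-variable store-load forwarding and load CSE to the peephole pass. When a StoreGlobal is immediately followed by a LoadGlobal of the same symbol, the load is deleted if it uses the same register; otherwise it becomes a MovReg. Consecutive LoadGlobals are handled the same way. shuffle -6, conv2d -3, crypto -3, h-9 -3. Total -15 instructions, zero regressions.	5 days ago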
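lzkk	b2b7210f11	perf(backend): use sdiv for all division/modulo and remove the power-of-two shift sequences. On AArch64, sdiv+msub is 2-4 instructions shorter than the shift sequence (add+cmp+csel+asr). Removes about 150 lines of power-of-two shift code from DivRR/ModRR so everything goes through sdiv. Adds an x%1==0 / x%-1==0 optimization. crypto -249, huffman -186, crc -84, fft -72, h-9 -42, many_mat_cal -24, 03_sort -24, h-1 -21, conv2d -21, transpose -12, sl -3. Total -735 instructions; matmul +3 is within tolerance.	5 days ago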
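lzkk	befdca6451	perf(backend): skip frame setup in leaf functions, saving the x29/x30 save/restore. MachineFunction gets a HasCall flag, which Lowering sets when it emits a Call. Leaf functions with no frame and no callee-saved registers skip the prologue/epilogue entirely; leaf functions that do have a frame use str/ldr x29 instead of stp/ldp x29, x30. huffman -93, crypto -54, conv2d -45, crc -27, h-9 -27, 03_sort -18, opt_scheduling -18, h-4 -12, fft -9, shuffle -9. Total -312 instructions, zero regressions.	5 days ago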
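lzkk	854168fb4e	perf(backend): eliminate redundant ADRP for consecutive global-variable accesses. AsmPrinter adds an ADRP cache that skips reloading the page address on consecutive accesses to the same symbol. The cache is invalidated when x13 is used by a non-global access path and reset at basic-block entry. shuffle -48, crypto -27, conv2d -21, fft -12, huffman -9, h-9 -9, 03_sort -6, h-8 -3. Total -135 instructions, zero regressions.	5 days ago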
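lzkk	acdac5391d	fix(backend): EmitLargeImmediate skips a zero low half to avoid a redundant movz #0. When the low 16 bits of a 32-bit immediate are zero (e.g. 0x00020000), it emits a single shifted movz instead of the two-instruction movz #0 + movk sequence. crypto -7, fft -2, h-4 -1, h-10 -1; total -33 instructions, zero regressions.	5 days ago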
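lzkk	bb58aac749	fix(mem2reg): add a safety gate for functions with many parameters, fixing 87_many_params. On recursive functions with many allocas, Mem2Reg produced incorrect SSA, which made the lowering stage emit wrong code (incorrect argument-forwarding offsets). Fix: skip Mem2Reg when there are more than 24 promotable allocas and keep stack allocation. The gate does not affect SSA optimization of ordinary small functions. Test results: - functional: 87/88 → 100/100 (87_many_params fixed) - h_functional: 30/31 (30_many_dimensions still fails; known GEP lowering bug)	5 days ago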
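lzkk	fccd935a24	feat(backend): add AddImm/SubImm opcodes to eliminate redundant MovImm. AArch64 add/sub accept a 12-bit immediate, but MIR only had AddRR/SubRR, so a constant RHS required a MovImm followed by the RR operation. Changes: - MIR.h: add the AddImm and SubImm opcodes - Lowering.cpp: when lowering Add/Sub with a constant RHS in 0-4095, use AddImm/SubImm directly - RegAlloc.cpp: AddImm/SubImm reuse the AddRR/SubRR def-use analysis - AsmPrinter.cpp: the generic printer handles Imm operands (#value) automatically. Results (vs. the CmpImm baseline): - sl1-3: 261→247 (-14, -5.4%) - huffman-01-03: 792→790 (-2) - h-5-01-03: 341→338 (-3) - 55 fewer lines in total across all 60 performance tests - 0 new functional-test failures. Also: added an entry to 优化记录.md (optimization log); baseline updated automatically.	5 days ago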
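lzkk	bd7dcedb2a	feat(backend): fold constants into CmpImm when lowering ICmp, eliminating redundant MovImm. In both ICmp paths in Lowering, when a compared operand is a constant in 0-4095, CmpImm is used instead of MovImm+CmpRR. When the constant is the LHS, the operands are swapped and the condition code is mirrored accordingly (SwapCondCode). Performance tests (20 representative cases): - 13 improved (-1 to -25 instructions) - 6 unchanged - 1 minor regression (h-5, +1 instruction, +0.3%, within tolerance) - 91 fewer instructions in total (-1.1%). Also updated: complete development guidelines in CLAUDE.md, an initialized instruction-count baseline, and the .claude/hooks enforcement system.	6 days ago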
黄熙哲	6b9cf3a448	fix(backend): add x16/x17 to GP allocatable set to fix segfaults. Adding x16 and x17 (IP0/IP1, caller-saved) increases the allocatable GP registers from 16 to 18, reducing register pressure for large functions. Fixes segfaults: 39_fp_params (64 params), 30_many_dimensions (2MB frame). Also improves performance: crc -8, fft0 -4, huffman -12, sl -1, etc.	6 days ago
黄熙哲	5902060dae	fix(backend): lower coalesce skip threshold to fix segfaults. Change the coalesce skip condition from vregs > 150 to: move_prefs > 100 || vregs * move_prefs > 600. The original threshold of 150 was too coarse: it missed functions like conv2d (71 vregs, 15 moves) whose coalescing still produces incorrect spill code. The new product condition catches functions whose move-graph complexity indicates risky coalescing. Fixes segfaults: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.	6 days ago
黄熙哲	34cb79449f	fix(backend): skip coalescing for large functions to prevent segfault. For functions with >150 vregs, discard move_preferences after collection to skip active coalescing. Large functions like conv2d, 65_color, 68_brainfk have complex interference graphs that cause coalescing to generate incorrect spill code. Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct. Known limitations: 30_many_dimensions and 39_fp_params still segfault (pre-existing original compiler bugs in lowering/RA). Minor instruction count changes: h-8 +2.5%, matmul +7% etc.	7 days ago
黄熙哲	a84ffd210b	chore: simplify baseline to single-column historical minimum. Remove the source baseline concept. Each test now tracks only its best-ever instruction count. count_asm.sh updated to directly update the baseline when a new lower value is found.	7 days ago
黄熙哲	b7e78ebd56	fix(backend): AsmPrinter large frame + RegAlloc spill limit. Apply only proven-safe fixes on a clean baseline: - AsmPrinter: movz/movk for large stack offsets (>12KB); 30_many_dimensions: 7M -> 1455 lines (99.9% reduction) - RegAlloc: limit spill rounds to 3 for large functions (>120 vregs); 39_fp_params: >120s -> <1s compilation. Zero instruction count regression confirmed. 57/60 performance tests at historical best baseline.	7 days ago
黄熙哲	2e368f86cf	chore: update instruction count baseline after Mem2Reg threshold tuning. Key improvements from PHI threshold relaxation: - many_mat_cal: 523->432 (-91 lines, 17.4%) - h-8: 504->407 (-97 lines, 19.2%) - matmul: 450->366 (-84 lines, 18.7%). Crypto and other complex functions unaffected (correctly skipped).	1 week ago
黄熙哲	cc9f4f9a76	feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions. Change phi_threshold from max(50, block_count) to max(100, block_count*2). The old threshold was too conservative for functions with many allocas like many_mat_cal (~15 allocas, 60 blocks), causing premature skip. The new threshold allows these while still blocking crypto-like functions where excessive PHI nodes hurt code quality. many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines	1 week ago
黄熙哲	d5d8924050	chore: update instruction count baseline after loop optimizations merge. Additional reductions from loop IR passes: - conv2d: 657->629 (-28), fft: 619->605 (-14) - huffman: 849->829 (-20), sl: 280->264 (-16) - knapsack: 175->167 (-8), transpose: 211->207 (-4) - 01_mm: 313->310 (-3), h-10: 335->329 (-6). Restore CLAUDE.md deleted during merge.	1 week ago
黄熙哲	06bada3ff5	Merge remote master into local master	1 week ago
黄熙哲	39b7e2ed19	feat(backend): loop-depth weighted spill cost model. Adds DFS-based back-edge detection to compute basic block loop nesting depth. Each vreg inherits the max loop depth of its defining blocks. Spill cost multiplies interval+ref by 10^depth, making loop-carried variables much more expensive to spill.	1 week ago
黄熙哲	993e81363a	fix(backend): recompute degree unconditionally after MergeInto. After a merge, u inherits v's neighbors, so degree[u] must always be recomputed. Previously, when degree[u] < K before merge, the stale low degree was kept, which could push a high-degree merged node into simplify_worklist with wrong metadata. Also remove redundant if(!remaining.empty()) guard in spill path and clean up extra brace from removed GiveUpPhase.	1 week ago
黄熙哲	bef03ec220	chore: update instruction count baseline after Module D rewrite. 54/60 performance tests reduced. Key improvements: - conv2d: -95 lines (12.6%) - huffman: -44 lines (4.9%) - fft: -39 lines (5.9%) - crc: -38 lines (11.6%) - 03_sort: -28 lines (4.2%) - 01_mm: -22 lines (6.6%). Also fix count_asm.sh sed to match any current value.	1 week ago
黄熙哲	570253f1f2	feat(backend): relax Briggs threshold to 2K and fix move_adj self-loop. Using >= 2K instead of >= K for high-degree neighbor count allows more node pairs to be safely merged. Fixed a bug in MergeInto where move_adj[u] could contain u (self-loop) when v's move set included u, causing iterator invalidation during move_adj cleanup.	1 week ago
黄熙哲	3691da34ee	feat(backend): rewrite main loop with held_nodes release and ReactivatePairs	1 week ago
黄熙哲	0881889ec1	feat(backend): add ReactivatePairs and stale_pairs for coalescing	1 week ago
黄熙哲	07048a123b	feat(backend): separate move-related low-degree nodes into held_nodes	1 week ago
黄熙哲	99fe17fc3f	feat(backend): propagate coalesced node colors in AssignColors. After active coalescing, merged_set nodes inherit their representative's color, ensuring move-related vregs share the same physical register.	1 week ago
黄熙哲	081580ac0a	feat(backend): integrate active coalescing into ColorGraph main loop. Replaces inner simplify while-loop with if-else chain: Simplify -> MergePhase -> GiveUpPhase -> Spill. Lambdas moved outside while loop for clarity.	1 week ago
黄熙哲	0e4f9f1910	feat(backend): add MergePhase and GiveUpPhase for active coalescing. MergePhase uses the Briggs conservative test to safely merge move-related node pairs before coloring. GiveUpPhase abandons moves for low-degree nodes when merging is no longer beneficial.	1 week ago
黄熙哲	ca6c2a18c9	feat(backend): add coalesce data structures and helpers to ColorGraph. Introduces MovePair, move_adj, FindRep, GetRep, HasMovePair as infrastructure for the upcoming Coalesce and Freeze phases. Modifies simplify loop to skip already-merged nodes via GetRep.	1 week ago
黄熙哲	560f565a51	chore: update instruction count baseline after Module B stp/ldp. Also modify count_asm.sh to auto-update baseline when instruction counts decrease below the recorded values.	1 week ago
黄熙哲	af71513361	feat(backend): use stp/ldp for callee-saved registers in prologue/epilogue. Groups callee-saved X and S registers and emits paired stp/ldp instructions, reducing save/restore overhead by ~50%. Odd remainders still use str/ldr. Adds fallback else branch for future register types.	1 week ago
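安峻邑	cb33c344ac	Start loop optimization	1 week ago
安峻邑	b93e81ce74	Loop optimization	1 week ago
安峻邑	4bc21faf61	Loop optimization	1 week ago
安峻邑	d07bf9f0d2	Loop optimization	1 week ago
安峻邑	81b5c2a2b0	Loop optimization	1 week ago
安峻邑	860e5edadf	Implement loop optimizations: LICM, strength reduction, loop unrolling, loop splitting	1 week ago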
黄熙哲	e26fd3f520	fix(peephole): remove dead conditional branch inversion code. The CondBr+Branch inversion pattern was unreachable because the simple Br fallthrough check runs first and removes the Br. Removed the dead code and the unused NegateCondCode helper.	1 week ago
黄熙哲	7490fd3a49	feat(peephole): add branch fallthrough and conditional branch inversion. Eliminates unconditional Br when target is the next block in layout. Inverts CondBr condition when the following Br targets the fallthrough block, eliminating the extra jump.	1 week ago
黄熙哲	1701b2cf51	feat(peephole): merge adjacent zero-value stack stores. When str WZR, fi#N and str WZR, fi#N+1 appear consecutively, replaces them with a single str XZR, fi#N (64-bit zero store).	1 week ago
黄熙哲	e44ba819ec	feat(peephole): add store-load forwarding pattern. When StoreStack regA, fi#N is immediately followed by LoadStack regB, fi#N with regA != regB, replaces the load with MovReg regB, regA, eliminating the redundant memory access.	1 week ago
黄熙哲	083616e50d	fix(backend): add redundant MovReg elimination on no-spill early-return path. The MovReg cleanup was only running after the final RewriteWithAllocation at the end of the spill loop, missing the early-return path when allocation succeeded without spilling. This left behind no-op moves like 'mov x0, x0' that coalescing created.	1 week ago
黄熙哲	6f829c30f9	feat(backend): eliminate redundant MovReg after register allocation. Scans all blocks after RewriteWithAllocation and removes MovReg instructions where source and destination are the same physical register. This cleans up cases where move coalescing successfully assigned the same register to both sides.	1 week ago
黄熙哲	4bdca3f722	feat(backend): move coalescing via color preference and phi cycle breaking. Collects move_preferences from MovReg instructions and uses them during color selection to prefer the same physical register for move-related virtual registers. Detects and breaks cycles in move preference chains to ensure correctness.	1 week ago
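Commit d238777f17 adds AssignSpillSlots so that spilled vregs with non-overlapping live intervals share a frame slot. Below is a minimal sketch of that idea as greedy interval coloring, not the repository's implementation. SpillInterval, SlotAssignment and AssignSpillSlotsSketch are hypothetical names. The sketch assumes every slot has the same size and that live intervals are half-open ranges over linearized instruction indices.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical live interval of a spilled vreg: half-open range [start, end)
// over linearized instruction indices.
struct SpillInterval {
  int vreg;
  int start;
  int end;
};

struct SlotAssignment {
  std::unordered_map<int, int> slot_of;  // vreg -> frame slot index
  int num_slots = 0;
};

// Greedy interval coloring: visit intervals by start point and give each one
// the lowest-numbered slot whose previous occupant has already ended. For
// intervals this needs exactly as many slots as the maximum number of spilled
// vregs that are live at the same time.
SlotAssignment AssignSpillSlotsSketch(std::vector<SpillInterval> intervals) {
  std::sort(intervals.begin(), intervals.end(),
            [](const SpillInterval& a, const SpillInterval& b) {
              return a.start < b.start;
            });
  using EndAndSlot = std::pair<int, int>;
  std::priority_queue<EndAndSlot, std::vector<EndAndSlot>,
                      std::greater<EndAndSlot>> active;  // earliest end first
  std::priority_queue<int, std::vector<int>, std::greater<int>> free_slots;
  SlotAssignment result;
  for (const SpillInterval& iv : intervals) {
    assert(iv.start <= iv.end);
    // Intervals ending at or before this start no longer need their slot.
    while (!active.empty() && active.top().first <= iv.start) {
      free_slots.push(active.top().second);
      active.pop();
    }
    int slot;
    if (!free_slots.empty()) {
      slot = free_slots.top();
      free_slots.pop();
    } else {
      slot = result.num_slots++;
    }
    result.slot_of[iv.vreg] = slot;
    active.push({iv.end, slot});
  }
  return result;
}
```

For example, the intervals [0,5), [2,7) and [5,10) need only two slots. The third interval starts exactly when the first one ends, so it reuses the first one's slot.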
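Commit b2b7210f11 routes all division and modulo through sdiv and recovers the remainder with msub. The sketch below spells out the identity that this instruction pair relies on, written in plain C++ rather than as emitted assembly. SRemSketch is a hypothetical name. The ±1 fold mirrors the x%1==0 / x%-1==0 rule from the same commit.

```cpp
#include <cassert>
#include <cstdint>

// What "sdiv q, a, b; msub r, q, b, a" computes: AArch64 sdiv truncates toward
// zero like C++ '/', and msub yields a - q * b, which equals C++ 'a % b'.
// Divisors +1 and -1 are folded to 0 first; in C++ this also sidesteps
// INT32_MIN / -1, which is undefined behavior.
int32_t SRemSketch(int32_t a, int32_t b) {
  assert(b != 0);
  if (b == 1 || b == -1) return 0;
  int32_t q = a / b;  // sdiv
  return a - q * b;   // msub; |q * b| <= |a|, so nothing overflows here
}
```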
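Commit acdac5391d changes EmitLargeImmediate so that a zero low half no longer costs a movz #0. Below is a sketch of the chunking rule for 32-bit values. It assumes one movz for the first non-zero 16-bit half and movk for the rest. EmitImm32Sketch is hypothetical, and it ignores movn and 64-bit values, which the real emitter may handle differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Materialize a 32-bit immediate: movz writes the first non-zero 16-bit half
// and clears the other bits, movk inserts any later non-zero half. All-zero
// halves are skipped, so 0x00020000 needs only "movz w0, #2, lsl #16".
std::vector<std::string> EmitImm32Sketch(const std::string& reg, uint32_t value) {
  assert(!reg.empty());
  std::vector<std::string> out;
  for (int shift = 0; shift < 32; shift += 16) {
    uint32_t chunk = (value >> shift) & 0xFFFFu;
    if (chunk == 0) continue;  // nothing to insert for an all-zero half
    const char* op = out.empty() ? "movz" : "movk";
    std::string insn = std::string(op) + " " + reg + ", #" + std::to_string(chunk);
    if (shift != 0) insn += ", lsl #" + std::to_string(shift);
    out.push_back(insn);
  }
  if (out.empty()) out.push_back("movz " + reg + ", #0");  // value == 0
  return out;
}
```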
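Commit bd7dcedb2a lowers a comparison whose LHS is a constant to a CmpImm. It does this by swapping the operands and adjusting the condition with SwapCondCode. The adjustment must mirror the condition (LT becomes GT), not negate it. The sketch below makes that property checkable. Cond, SwapCondSketch, EvalCond and FitsCmpImmSketch are hypothetical names, and the 0-4095 range is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

enum class Cond { EQ, NE, LT, LE, GT, GE };

// Mirror a condition for swapped operands: (a OP b) == (b Swap(OP) a).
// This is not negation: NOT(a < b) is a >= b, while the swapped form is b > a.
Cond SwapCondSketch(Cond c) {
  switch (c) {
    case Cond::LT: return Cond::GT;
    case Cond::GT: return Cond::LT;
    case Cond::LE: return Cond::GE;
    case Cond::GE: return Cond::LE;
    default: return c;  // EQ and NE are symmetric
  }
}

// Reference semantics used to check the mirroring.
bool EvalCond(Cond c, int64_t a, int64_t b) {
  switch (c) {
    case Cond::EQ: return a == b;
    case Cond::NE: return a != b;
    case Cond::LT: return a < b;
    case Cond::LE: return a <= b;
    case Cond::GT: return a > b;
    case Cond::GE: return a >= b;
  }
  return false;
}

// Range used for CmpImm operands (AArch64 cmp takes an unsigned 12-bit
// immediate; the optional lsl #12 form is not considered here).
bool FitsCmpImmSketch(int64_t v) { return v >= 0 && v <= 4095; }
```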
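Commit 39b7e2ed19 weights spill cost by loop depth, which it finds through DFS back-edge detection. Below is a sketch of one way to compute that depth, using the natural loop of each back edge over a CFG given as successor lists with block 0 as the entry. LoopDepthSketch and SpillCostSketch are hypothetical. The repository may compute depth differently, for example by merging loops that share a header.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using CFG = std::vector<std::vector<int>>;  // successor lists; block 0 is entry

// Loop nesting depth per block. A DFS from the entry records tail -> header
// as a back edge when header is still on the DFS stack. The natural loop of a
// back edge is the header plus every block that reaches the tail without
// passing through the header; a block's depth is the number of such loops
// containing it. Assumes every block is reachable from the entry and counts
// back edges that share a header as separate loops.
std::vector<int> LoopDepthSketch(const CFG& succ) {
  const int n = static_cast<int>(succ.size());
  CFG pred(n);
  for (int b = 0; b < n; ++b)
    for (int s : succ[b]) pred[s].push_back(b);

  std::vector<int> state(n, 0);  // 0 = unvisited, 1 = on DFS stack, 2 = done
  std::vector<std::pair<int, int>> back_edges;  // (tail, header)
  std::function<void(int)> dfs = [&](int b) {
    state[b] = 1;
    for (int s : succ[b]) {
      if (state[s] == 1) {
        back_edges.push_back({b, s});
      } else if (state[s] == 0) {
        dfs(s);
      }
    }
    state[b] = 2;
  };
  if (n > 0) dfs(0);

  std::vector<int> depth(n, 0);
  for (const auto& [tail, header] : back_edges) {
    std::set<int> body = {header};
    std::vector<int> work = {tail};
    while (!work.empty()) {
      int b = work.back();
      work.pop_back();
      if (!body.insert(b).second) continue;  // already in the loop or header
      for (int p : pred[b]) work.push_back(p);
    }
    for (int b : body) ++depth[b];
  }
  return depth;
}

// Spill cost as described in the commit: (interval length + reference count)
// multiplied by 10^depth, where depth is the vreg's max defining-block depth.
double SpillCostSketch(int interval_len, int ref_count, int loop_depth) {
  assert(loop_depth >= 0);
  double scale = 1.0;
  for (int i = 0; i < loop_depth; ++i) scale *= 10.0;
  return (interval_len + ref_count) * scale;
}
```

With the 10^depth factor, a vreg defined inside a doubly nested loop costs 100 times as much to spill as one with the same interval and reference count outside any loop.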
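Commits 0e4f9f1910 and 570253f1f2 describe a Briggs conservative test for merging move-related nodes. The test was later relaxed from counting neighbors of degree >= K to counting neighbors of degree >= 2K. Below is a sketch of the test with that cutoff as a parameter. Graph and BriggsCanMergeSketch are hypothetical, and degrees are read from the current adjacency sets rather than adjusted for the merge. With significant_degree = K this is the classic Briggs criterion. Larger cutoffs accept more merges but lose its guarantee that merging cannot make the graph harder to color.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Graph = std::vector<std::set<int>>;  // interference adjacency sets

// Merging u and v is accepted when the combined node would have fewer than k
// neighbors whose degree is at least `significant_degree`.
bool BriggsCanMergeSketch(const Graph& g, int u, int v, std::size_t k,
                          std::size_t significant_degree) {
  assert(u != v);
  std::set<int> neighbors(g[u].begin(), g[u].end());
  neighbors.insert(g[v].begin(), g[v].end());
  neighbors.erase(u);
  neighbors.erase(v);
  std::size_t high = 0;
  for (int t : neighbors) {
    if (g[t].size() >= significant_degree) ++high;
  }
  return high < k;
}
```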
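Commit af71513361 pairs callee-saved registers into stp/ldp instructions and keeps str/ldr for an odd remainder. Below is a sketch of the save side for X registers only. SaveCalleeSavedSketch and the sp-relative layout are assumptions, and the restore side would mirror the sketch with ldp/ldr.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Save 64-bit callee-saved registers two at a time with stp and fall back to
// a single str for an odd leftover. Each register takes 8 bytes starting at
// [sp, #base_offset]; base_offset must be a multiple of 8 because stp/str
// scale their immediates. The stp offset range is not checked here.
std::vector<std::string> SaveCalleeSavedSketch(
    const std::vector<std::string>& x_regs, int base_offset) {
  assert(base_offset % 8 == 0);
  std::vector<std::string> out;
  int offset = base_offset;
  std::size_t i = 0;
  for (; i + 1 < x_regs.size(); i += 2) {
    out.push_back("stp " + x_regs[i] + ", " + x_regs[i + 1] + ", [sp, #" +
                  std::to_string(offset) + "]");
    offset += 16;
  }
  if (i < x_regs.size()) {  // odd remainder
    out.push_back("str " + x_regs[i] + ", [sp, #" + std::to_string(offset) + "]");
  }
  return out;
}
```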
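Commit e44ba819ec forwards a stack store to an immediately following reload of the same frame index. Commit 2d3a5ff998 applies the same idea to globals and deletes the load outright when both sides use the same register. Below is a sketch that covers both cases for stack slots. It uses a simplified, hypothetical instruction form: Op, Inst and ForwardStoreToLoadSketch are not the repository's MIR types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { StoreStack, LoadStack, MovReg, Other };

// Hypothetical, simplified MIR instruction. StoreStack: `reg` is stored to
// frame index `slot`. LoadStack: `reg` is loaded from `slot`. MovReg: copies
// `src` into `reg`.
struct Inst {
  Op op;
  int reg = -1;
  int src = -1;
  int slot = -1;
};

// "StoreStack rA, fi#N" immediately followed by "LoadStack rB, fi#N" reloads
// the value just stored: the load becomes "MovReg rB, rA", or disappears when
// rA == rB because the register already holds the value. The store is kept.
std::vector<Inst> ForwardStoreToLoadSketch(const std::vector<Inst>& in) {
  std::vector<Inst> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.push_back(in[i]);
    const bool reload = in[i].op == Op::StoreStack && i + 1 < in.size() &&
                        in[i + 1].op == Op::LoadStack &&
                        in[i + 1].slot == in[i].slot;
    if (!reload) continue;
    assert(in[i].slot >= 0);  // stack accesses carry a frame index
    const Inst& load = in[i + 1];
    if (load.reg != in[i].reg) {
      out.push_back(Inst{Op::MovReg, load.reg, in[i].reg, -1});
    }
    ++i;  // the original load has been handled
  }
  return out;
}
```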

248 Commits (ca6c9fa5407edbc1bb9d91a4697dbeff954e91d2)
