nudt-compiler-cpp

Commit Graph

Author	SHA1	Message	Date
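lzkk	d6f42a2a2e	fix(mir): roll back to the stable version: PhysReg mapping + spill reload + unconditional alias conflicts, stable at a 94% pass rate. Includes three key fixes: 1. PhysReg mapping correction: Ptr +31, Float +62, FP_ALLOCATABLE excludes S0-S1 2. Spill reload creates a new vreg for each use 3. GPR32/GPR64 aliases always conflict. The Mem2Reg interaction failures (65_color, 94_nested_loops) have been traced precisely: high register pressure → unallocated vregs → shared fallback register X16 → wrong address computation → SIGSEGV. A complete fix requires live-range region splitting at the level of LLVM's SplitKit.	3 days ago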
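lzkk	0f1b545568	fix(mir): allocation order switched to FirstUsePos + unconditional alias conflicts + per-round reserve. Three improvements: 1. New vregs are allocated in ascending FirstUsePos order (instruction order) so that overlapping vregs are visible during interference checks; this replaces the previous Length() ordering (an approximation of the optimal chordal-graph order) 2. GPR32/GPR64 aliases always conflict (the segments.empty() condition is removed) 3. After each round of EnhanceIntervals, reserve *16 to prevent pointer invalidation from push_back. The Mem2Reg problem has been traced precisely: under high register pressure, overlapping segments of multiple Ptr vregs are not caught by the interference check, so the base and index of an address computation share one physical register. Current pass rate: 94% (94/100); the remaining failures are 2 Mem2Reg interaction failures + 4 pre-existing defects.	3 days ago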
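lzkk	83228a8123	fix(mir): remove the segments.empty() condition from the GPR32/GPR64 alias check. Wn and Xn are the same hardware register (e.g. W12 and X12), so GPR32 and GPR64 vregs on the same phys_reg always conflict, whether or not segments is empty. The old code required !segments.empty(), which let some alias conflicts go undetected. Reference: AArch64 register aliasing, where Wn is the low 32 bits of Xn	3 days ago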
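lzkk	5fb106bde8	fix(mir): LLVM-style two-pass allocation + gap splitting + Assign safety net. Improvements: 1. Two-pass allocation (Pass 1 allocates short live ranges first → Pass 2 evicts/splits the deferred vregs); this approximates the perfect elimination order of an SSA chordal graph: short ranges are allocated first, so long ranges see a clean interference graph 2. TrySplit now uses gap splitting: it splits at the largest gap between consecutive uses instead of simply cutting at the midpoint (see LLVM tryLocalSplit()) 3. Assign returns bool + ForceAssign (pre-filled assignments skip the check); a safety net that keeps illegal assignments from slipping through. Diagnosis: with Mem2Reg enabled, CheckInterference misses some overlaps, so conflicting vregs are assigned to the same physical register. The exact trigger is still to be pinned down. Current pass rate: 94% (94/100)	4 days ago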
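lzkk	508f9d8ddc	fix(mir): TrySplit reference-invalidation fix + LLVM-style defer mechanism. Three fixes: 1. TrySplit: the parameter changes from LiveInterval& to a vreg index so that push_back cannot invalidate the reference; the old code kept using the li reference after intervals.push_back() (UB), which caused segfaults under high register pressure 2. LLVM-style defer mechanism: a vreg that cannot be allocated on the first attempt is deferred to the back of the heap (RS_New→RS_Deferred) so that smaller ranges are allocated first and yield a cleaner interference graph; reference: llvm/lib/CodeGen/RegAllocGreedy.cpp selectOrSplit() 3. LiveInterval gains a deferred_count field that tracks the deferral state. Diagnosis: 65_color/94_nested_loops fail with Mem2Reg enabled and pass with it disabled. Mem2Reg eliminates stack allocation (alloca→SSA), which increases the number of simultaneously live vregs and triggers high-pressure spilling. The root cause was traced to the reference invalidation in TrySplit (the same class of bug as the earlier heap pointer invalidation). Current: 94% (94/100); the remaining 6 failures are detailed in project_greedy_alloc_progress.md	4 days ago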
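lzkk	80dc583143	fix(mir): PhysReg mapping correction + independent vregs for spill reloads, to avoid register conflicts. Three key fixes: 1. PhysReg mapping offset correction: Ptr +32→+31 (Wn→Xn), Float +64→+62 (Sn-index→Sn); the old offset mapped Ptr to X(n+1), e.g. W12→X13, which collided with the AsmPrinter scratch register X13 2. FP_ALLOCATABLE excludes S0-S1 (argument/return-value registers), matching the +62 mapping 3. Spill reload creates a new vreg for each use (LLVM InlineSpiller style); in the old scheme all spilled vregs shared the fallback register W16/X16 and overwrote each other when live at the same time. Reference: LLVM LiveRegMatrix foreachUnit + InlineSpiller reload. Pass rate: 90% → 94% (fixes 64_calculator/65_color/85_long_code/88_many_params2/96_matrix_add)	4 days ago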
lzkk	ddaf8831a2	fix(mir): CMakeLists.txt now builds GreedyAlloc.cpp instead of LinearScanAlloc.cpp	4 days ago
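lzkk	da1e456133	feat(mir): implement an LLVM-style greedy register allocator with a unified architecture. Core changes: - MIR.h: enhanced LiveInterval (VNInfo/UsePosition/Segment) + LiveRegMatrix + RegClass - GreedyAlloc.cpp: greedy allocation via TryAssign/TryAnyFreeReg/TryEvict/TrySplit + RewriteSpills - InstLiveness.cpp: EnhanceIntervals forward pass + ComputeInstLiveness adaptation - MIRBasicBlock.cpp: InsertInst/ReplaceVReg API - main.cpp: switched to RunGreedyRegAlloc - RegAlloc.cpp/LinearScanAlloc.cpp: isolated with #if 0. Architecture: priority-queue-driven allocation (a fresh allocation each round), unconditional eviction in TryEvict, spill rewriting with StoreStack+LoadStack, and interval splitting to handle high register pressure. Functional test pass rate: 53/100 (the remaining 47 cases require debugging the spill-rewrite loop)	4 days ago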
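lzkk	0a29e6ac42	fix(mir): AsmPrinter invalidates the frame-base cache after function calls, fixing 92_register_alloc. The Call instruction invalidated only the ADRP cache, not the frame-base cache (x13). x13 is a caller-saved register, so once a call clobbers it, later stack accesses use a garbage address. This affects every large-frame function that contains a call. Fix: add InvalidateFrameBase() to the Opcode::Call branch.	4 days ago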
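lzkk	363b809736	fix(mir): x13 cache invalidation in large-frame asm output + leaf-function stack-argument offset fix + lower IR array-initialization threshold - AsmPrinter: invalidate the frame-base cache and the ADRP cache after x13 is used on the large-offset movz/movk path - FrameLowering: stack-argument offset for leaf functions (which save only x29) corrected from 16 to 8 - IRGenDecl: array-initialization threshold lowered from 10000 to 256 so that IR bloat from large arrays no longer makes the backend time out	4 days ago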
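lzkk	120d7197d8	fix(mir): record def positions in linear-scan liveness analysis + restrict allocation to callee-saved registers + CLI flag fix - InstLiveness: the backward scan now records the interval start of each def vreg, fixing a bug where the MovReg def position of a phi copy was not covered by the interval and register allocation became inconsistent (25_while_if infinite loop) - LinearScanAlloc: GP_ALLOCATABLE is restricted to callee-saved registers (x19-x28) so that caller-saved registers clobbered across calls no longer cause segfaults (54_hidden_var) - CLI: fix an off-by-one in the strncmp length for the --regalloc= flag (12→11)	4 days ago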
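lzkk	e1777c9eab	fix(ir): CSE safety gate: skip Load/GEP CSE for non-SSA functions - When the alloca count is > 24 (Mem2Reg is skipped), skip CSE for Load and GEP - Fixes the 27_scope5 optimization bug (Load CSE in non-SSA code wrongly merged variables from different scopes) - 86_long_code2 is unaffected (only 1 alloca, so Load/GEP CSE stays enabled)	4 days ago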
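lzkk	28c336728d	fix(mir): linear-scan interval-splitting fix + skip-logic fix for multi-definition vregs - Interval splitting: last.end = cur.start (rather than cur.end), so the register value is correct after the save point - Multi-definition vregs: now checked by interval coverage, which supports the multiple definitions introduced by phi-copy insertion - 30_many_dimensions is fixed (the 19-D nested loop produces correct output) - 25_while_if still has a loop-variable mapping bug, to be fixed later	4 days ago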
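lzkk	fbea91986d	feat(mir): instruction-level liveness analysis + CLI/build integration - InstLiveness: three-phase algorithm (block-level fixpoint + backward instruction scan + interval construction) - Supports phi-copy insertion (non-strict SSA): intervals are unioned - CLI.h adds a regalloc field, supporting --regalloc=linear/graphcoloring	4 days ago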
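lzkk	8f3012cd9f	fix(ir): extend CSE to LoadInst and GEPInst, fixing the 86_long_code2 compile timeout - ExprKey changes from a fixed pair of operands to a variable-length operand vector, supporting Load/GEP/Binary - IsCSECandidate now accepts Load and GEP instructions - A Store invalidates the cached Loads for its address, preserving semantics - 86_long_code2: compilation drops from a >300s timeout to ~3s; GEP+Load pairs drop from 8000 to 2	4 days ago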
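lzkk	28ad162de4	feat(mir): initial linear-scan register allocator (WIP, available via --regalloc=linear) - Optimized interval-splitting algorithm from Wimmer & Mössenböck (2005) - 685 lines, supports GP/FP register pools - Currently passes simple cases; functions with loops hit a register-mapping bug (25_while_if loops forever) - Graph coloring remains the default; linear scan can be selected from the CLI	4 days ago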
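lzkk	a9ebfdc0e0	feat(mir): add instruction-level liveness analysis with precise [start,end] intervals - Three-phase algorithm: block-level backward dataflow fixpoint + instruction-level backward scan + interval construction - Supports phi-copy insertion (non-strict SSA): intervals are unioned - Def detection for 31 MIR opcodes (HasVRegDef) - Provides precise live intervals as input to linear-scan register allocation	4 days ago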
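lzkk	6c5441ff43	feat(mir): add a MIR verifier and a register-allocation verifier - MIRVerifier: single-definition check for vregs (per block) and block-terminator check - RegAllocVerifier: physical-register validity check (range 0-96) - Both run automatically after every MIR pass in Debug builds (#ifndef NDEBUG)	4 days ago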
lzkk	fb77d7e03c	chore(ir): clean up dead code and comments in IRVerifier	4 days ago
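lzkk	0b589c77da	feat(ir): add an IR verifier that checks SSA dominance, terminators, and PHI consistency - New IRVerifier pass: checks SSA dominance for non-PHI instructions, basic-block terminators, and PHI operand structure - Extract the DominatorTree class into its own header so the verifier can reuse it - User gains a ClearOperands() method for rebuilding the operand list - Fix two missed PHI cleanups in CFGSimplify: 1. after simplifying a constant-condition branch, PHIs in the dead target kept the entry for the old predecessor 2. when unreachable blocks were deleted, some entries for unreachable predecessors were left in PHIs - The verifier is active only in Debug mode (#ifndef NDEBUG) - Quick gate: functional 86/87 pass, h_functional 30/31 pass (1 pre-existing timeout/segfault, not introduced by this change)	4 days ago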
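lzkk	ef6eedee83	fix(infra): count_asm.sh uses a relative path instead of a hard-coded one. The hard-coded /home/vega/compile/compiler/nudt-compiler-cpp made the script print all zeros on other machines. It now auto-detects the directory that contains the script.	5 days ago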
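lzkk	c12b6830b8	fix(regalloc): MAX_SPILL_ROUNDS=1 + conservative-fix threshold 20→200, fixing incorrect spill code. Root cause: under block-level liveness, the reload vregs created by multiple spill rounds interact with the conservative fix (full interference among block_defs), producing wrong register assignments that lead to segfaults or output mismatches. Fix: - MAX_SPILL_ROUNDS 3→1: prevents multi-round spilling from creating bad reload vregs - Conservative-fix threshold 20→200: avoids excessive interference that makes graph coloring misallocate. Fixed cases: - 04_arr_defn3: segfault → correct (14) - 05_arr_defn4: wrong output → correct (21) - 09_BFS: bad_alloc/segfault → correct - Several pre-existing failures such as 13_LCA and 54_hidden_var are fixed as well. Remaining known issues: 84_long_array2 (compile timeout), 30_many_dimensions (GEP offset)	5 days ago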
lzkk	ca6c9fa540	docs: record the MAX_SPILL_ROUNDS fix: mm1 instruction count cut by 99.4%	5 days ago
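lzkk	d238777f17	fix(regalloc): eliminate exponential spill-code growth: MAX_SPILL_ROUNDS unified to 3. Root cause: MAX_SPILL_ROUNDS was 10 for functions with ≤120 vregs, so the number of spills roughly doubled each round (14→25→48→94→186→370→738→1474→2946→5890); mm1, with 67 vregs, accumulated 11,785 frame slots, a 138KB frame, and 85K instructions. Fix: - MAX_SPILL_ROUNDS unified to 3 to prevent cascading growth - New AssignSpillSlots: spilled vregs with non-overlapping live intervals share a frame slot - RewriteWithAllocation takes an optional liveness parameter to support slot sharing. Result (mm1): 529 lines (-99.4%), 1232-byte frame (-99.1%)	5 days ago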
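lzkk	535ab08d32	feat(backend): AsmPrinter frame-base cache, avoiding repeated address computation for consecutive stack accesses. Adds a g_frame_base_offset/g_frame_base_valid cache: - PrintStackAccess tries to reuse the frame address already computed in x13 - No recomputation when the offset delta between adjacent accesses fits ldur/stur ±256 or ldr/str 0~32760 - The cache is invalidated automatically whenever x13 is overwritten (ADRP/EmitAddressFromBase/EmitStackAdjust) - Lays the groundwork for later spill-ordering optimization at the MIR level	5 days ago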
lzkk	3ab88232f7	fix(hooks): Stop hook now checks intelligently: it reminds only when src/ has uncommitted changes	5 days ago
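lzkk	6f14ee1a7a	fix(infra): compiler resource-limit wrapper + timeout guards in test scripts, preventing OOM crashes. Layered protection keeps a compiler memory blow-up (mm1.sy, 9.9GB) from triggering the OOM killer and crashing the terminal: - compiler-wrapper.sh: generic wrapper, ulimit -v 12GB + timeout 300s - setup-compiler-wrapper.sh: restores the wrapper after a cmake build - 2026test.sh, verify_asm.sh: auto-detect the wrapper + add timeout to compiler invocations. Files under build/ are not version-controlled and do not affect the competition submission.	5 days ago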
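lzkk	5300e2c1ec	fix(hooks): fix session crashes + tune the development-guideline configuration - block-destructive.sh: remove set -e, add the missing git checkout/clean protection, degrade safely on empty stdin - spec-reminder.sh: trimmed from ~300 to 150 characters to reduce token usage - memory-guard.sh: fix the pgrep process-matching pattern - settings.json: tighten the PreToolUse matcher (matches only 6 classes of dangerous commands), disable chrome MCP - RegAlloc.cpp: MAX_SPILL_ROUNDS 3→5; conservative fix with full interference for large blocks (>20 defs) - CLAUDE.md: sync the spill-round count, add the shift-chain failure mode, update the tool-orchestration notes	5 days ago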
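lzkk	da5d618297	fix(hooks): memory-guard emits valid JSON, fixing session crashes. The SessionStart hook expects JSON on stdout, but the old memory-guard.sh wrote only to stderr and left stdout empty, so the Claude Code hook runner hit a JSON parse error and crashed. It now prints {"continue": true} and injects the warning into additionalContext.	5 days ago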
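lzkk	2d3a5ff998	perf(backend): Peephole adds store-load forwarding and load CSE for global variables. When a StoreGlobal is immediately followed by a LoadGlobal of the same symbol, the load is deleted if it targets the same register and otherwise becomes a MovReg. Consecutive LoadGlobals are handled the same way. shuffle -6, conv2d -3, crypto -3, h-9 -3. Total: -15 instructions, zero regressions.	5 days ago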
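lzkk	b2b7210f11	perf(backend): division/modulo always use sdiv; the power-of-two shift sequences are removed. On AArch64, sdiv+msub is 2-4 instructions shorter than the shift sequence (add+cmp+csel+asr). Removes about 150 lines of power-of-two shift code from DivRR/ModRR and routes everything through sdiv. Adds the x%1==0 / x%-1==0 optimization. crypto -249, huffman -186, crc -84, fft -72, h-9 -42, many_mat_cal -24, 03_sort -24, h-1 -21, conv2d -21, transpose -12, sl -3. Total: -735 instructions. matmul +3 is within tolerance.	5 days ago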
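lzkk	befdca6451	perf(backend): leaf functions skip frame setup, saving the x29/x30 save/restore. MachineFunction gains a HasCall flag, which Lowering sets when it emits a Call. Leaf functions with no frame and no callee-saved registers skip the prologue/epilogue entirely; leaf functions with a frame use str/ldr x29 instead of stp/ldp x29,x30. huffman -93, crypto -54, conv2d -45, crc -27, h-9 -27, 03_sort -18, opt_scheduling -18, h-4 -12, fft -9, shuffle -9. Total: -312 instructions, zero regressions.	5 days ago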
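lzkk	854168fb4e	perf(backend): eliminate redundant ADRP for consecutive global-variable accesses. AsmPrinter adds an ADRP cache that skips the repeated page-address load when the same symbol is accessed consecutively. The cache is invalidated when x13 is used by a non-global access path and reset at basic-block entry. shuffle -48, crypto -27, conv2d -21, fft -12, huffman -9, h-9 -9, 03_sort -6, h-8 -3. Total: -135 instructions, zero regressions.	5 days ago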
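lzkk	acdac5391d	fix(backend): EmitLargeImmediate skips leading zeros, avoiding a redundant movz #0. When the low 16 bits of a 32-bit immediate are zero (e.g. 0x00020000), it emits a single shifted movz instead of the two-instruction movz #0 + movk. crypto -7, fft -2, h-4 -1, h-10 -1; total -33 instructions, zero regressions.	5 days ago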
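lzkk	bb58aac749	fix(mem2reg): add a safety gate for functions with many parameters, fixing 87_many_params. When Mem2Reg processes a recursive function with a large number of allocas, it produces incorrect SSA form, and lowering then generates wrong code (incorrect argument-forwarding offsets). Fix: skip Mem2Reg when the number of promotable allocas is >24 and keep stack allocation. The gate does not affect SSA optimization of ordinary small functions. Test results: - functional: 87/88 → 100/100 (87_many_params fixed) - h_functional: 30/31 (30_many_dimensions still fails, a known GEP lowering bug)	5 days ago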
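lzkk	fccd935a24	feat(backend): add AddImm/SubImm opcodes, eliminating redundant MovImm. AArch64 add/sub accept a 12-bit immediate, but MIR had only AddRR/SubRR, so a constant RHS needed a MovImm followed by the RR operation. Changes: - MIR.h: new AddImm and SubImm opcodes - Lowering.cpp: when lowering Add/Sub, a constant RHS in 0-4095 uses AddImm/SubImm directly - RegAlloc.cpp: AddImm/SubImm reuse the def-use analysis of AddRR/SubRR - AsmPrinter.cpp: the generic printer handles Imm operands automatically (#value). Results (against the CmpImm baseline): - sl1-3: 261→247 (-14, -5.4%) - huffman-01-03: 792→790 (-2) - h-5-01-03: 341→338 (-3) - 55 fewer lines in total across all 60 performance cases - 0 new failures in functional tests. Updates: new entry in 优化记录.md; baseline updated automatically	5 days ago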
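lzkk	bd7dcedb2a	feat(backend): ICmp lowering folds constants into CmpImm, eliminating redundant MovImm. In both ICmp paths in Lowering, when a compared operand is a constant in the range 0-4095, CmpImm is used directly instead of MovImm+CmpRR. When the LHS is the constant, the operands are swapped automatically and the condition code is reversed (SwapCondCode). Performance tests (20 representative cases): - 13 improved (-1 to -25 instructions) - 6 unchanged - 1 slight regression (h-5, +1 instruction, +0.3%, within tolerance) - 91 fewer instructions in total (-1.1%). Also updated: the full development guidelines in CLAUDE.md, the initial instruction-count baseline, and the .claude/hooks enforcement system.	5 days ago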
黄熙哲	6b9cf3a448	fix(backend): add x16/x17 to GP allocatable set to fix segfaults Adding x16 and x17 (IP0/IP1, caller-saved) increases GP registers from 16 to 18, reducing register pressure for large functions. Fixes segfaults: 39_fp_params (64 params), 30_many_dimensions (2MB frame). Also improves performance: crc -8, fft0 -4, huffman -12, sl -1 etc.	6 days ago
黄熙哲	5902060dae	fix(backend): lower coalesce skip threshold to fix segfaults. Change coalesce skip condition from vregs >150 to: move_prefs > 100 || vregs * move_prefs > 600. The original threshold of 150 was too coarse: it missed functions like conv2d (71 vregs, 15 moves) whose coalescing still produces incorrect spill code. The new product condition catches functions whose move graph complexity indicates risky coalescing. Fixes segfaults: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.	6 days ago
黄熙哲	34cb79449f	fix(backend): skip coalescing for large functions to prevent segfault. For functions with >150 vregs, discard move_preferences after collection to skip active coalescing. Large functions like conv2d, 65_color, 68_brainfk have complex interference graphs that cause coalescing to generate incorrect spill code. Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct. Known limitations: 30_many_dimensions and 39_fp_params still segfault (pre-existing original compiler bugs in lowering/RA). Minor instruction count changes: h-8 +2.5%, matmul +7% etc.	6 days ago
黄熙哲	a84ffd210b	chore: simplify baseline to single-column historical minimum. Remove source baseline concept. Each test now tracks only its best-ever instruction count. count_asm.sh updated to directly update baseline when a new lower value is found.	7 days ago
黄熙哲	b7e78ebd56	fix(backend): AsmPrinter large frame + RegAlloc spill limit. Apply only proven-safe fixes on clean baseline: - AsmPrinter: movz/movk for large stack offsets (>12KB); 30_many_dimensions: 7M -> 1455 lines (99.9% reduction) - RegAlloc: limit spill rounds to 3 for large functions (>120 vregs); 39_fp_params: >120s -> <1s compilation. Zero instruction count regression confirmed. 57/60 performance tests at historical best baseline.	7 days ago
黄熙哲	2e368f86cf	chore: update instruction count baseline after Mem2Reg threshold tuning. Key improvements from PHI threshold relaxation: - many_mat_cal: 523->432 (-91 lines, 17.4%) - h-8: 504->407 (-97 lines, 19.2%) - matmul: 450->366 (-84 lines, 18.7%). Crypto and other complex functions unaffected (correctly skipped).	1 week ago
黄熙哲	cc9f4f9a76	feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions. Change phi_threshold from max(50, block_count) to max(100, block_count*2). The old threshold was too conservative for functions with many allocas like many_mat_cal (~15 allocas, 60 blocks), causing premature skip. The new threshold allows these while still blocking crypto-like functions where excessive PHI nodes hurt code quality. many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines	1 week ago
黄熙哲	d5d8924050	chore: update instruction count baseline after loop optimizations merge. Additional reductions from loop IR passes: - conv2d: 657->629 (-28), fft: 619->605 (-14) - huffman: 849->829 (-20), sl: 280->264 (-16) - knapsack: 175->167 (-8), transpose: 211->207 (-4) - 01_mm: 313->310 (-3), h-10: 335->329 (-6). Restore CLAUDE.md deleted during merge.	1 week ago
黄熙哲	06bada3ff5	Merge remote master into local master	1 week ago
黄熙哲	39b7e2ed19	feat(backend): loop-depth weighted spill cost model. Adds DFS-based back-edge detection to compute basic block loop nesting depth. Each vreg inherits the max loop depth of its defining blocks. Spill cost multiplies interval+ref by 10^depth, making loop-carried variables much more expensive to spill.	1 week ago
黄熙哲	993e81363a	fix(backend): recompute degree unconditionally after MergeInto. After a merge, u inherits v's neighbors, so degree[u] must always be recomputed. Previously, when degree[u] < K before merge, the stale low degree was kept, which could push a high-degree merged node into simplify_worklist with wrong metadata. Also remove redundant if(!remaining.empty()) guard in spill path and clean up extra brace from removed GiveUpPhase.	1 week ago
黄熙哲	bef03ec220	chore: update instruction count baseline after Module D rewrite. 54/60 performance tests reduced. Key improvements: - conv2d: -95 lines (12.6%) - huffman: -44 lines (4.9%) - fft: -39 lines (5.9%) - crc: -38 lines (11.6%) - 03_sort: -28 lines (4.2%) - 01_mm: -22 lines (6.6%). Also fix count_asm.sh sed to match any current value.	1 week ago
黄熙哲	570253f1f2	feat(backend): relax Briggs threshold to 2K and fix move_adj self-loop. Using >= 2K instead of >= K for high-degree neighbor count allows more node pairs to be safely merged. Fixed a bug in MergeInto where move_adj[u] could contain u (self-loop) when v's move set included u, causing iterator invalidation during move_adj cleanup.	1 week ago
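
The reference-invalidation bug described in commit 508f9d8ddc can be illustrated with a minimal sketch. The `LiveInterval` layout and the `SplitAt` helper below are hypothetical stand-ins, not the project's actual `TrySplit`. The sketch only shows the general idea: address the interval by index and look it up again after `push_back`, because `push_back` may reallocate the vector and leave an earlier reference dangling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified live interval: a single [start, end) range.
struct LiveInterval {
    int start;
    int end;
};

// Splits intervals[idx] at `pos` and appends the tail as a new interval.
// Taking an index (not a LiveInterval&) matters: push_back may reallocate
// the vector, which would leave a previously taken reference dangling.
// Returns the index of the newly created tail interval.
std::size_t SplitAt(std::vector<LiveInterval>& intervals, std::size_t idx, int pos) {
    assert(idx < intervals.size());
    assert(intervals[idx].start < pos && pos < intervals[idx].end);

    // Copy the needed field by value before the vector can grow.
    const int old_end = intervals[idx].end;
    intervals.push_back(LiveInterval{pos, old_end});

    // Re-index after push_back instead of reusing an old reference.
    intervals[idx].end = pos;
    return intervals.size() - 1;
}
```

Splitting a single interval `[0, 10)` at position 4 with this helper leaves `[0, 4)` at index 0 and appends `[4, 10)` at index 1.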
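
The two coalescing gates from commits 34cb79449f and 5902060dae can be compared side by side. The function names below are hypothetical and the real check lives inside the register allocator; only the two conditions and the conv2d figures (71 vregs, 15 moves) come from the commit messages.

```cpp
#include <cstddef>

// Old gate (commit 34cb79449f): skip coalescing only for very large functions.
constexpr bool ShouldSkipCoalescingOld(std::size_t vregs) {
    return vregs > 150;
}

// New gate (commit 5902060dae): also skip when the move graph looks complex,
// measured by the number of move preferences and by vregs * move_prefs.
constexpr bool ShouldSkipCoalescing(std::size_t vregs, std::size_t move_prefs) {
    return move_prefs > 100 || vregs * move_prefs > 600;
}

// conv2d has 71 vregs and 15 moves: the old gate lets it through,
// while the product rule catches it (71 * 15 = 1065 > 600).
static_assert(!ShouldSkipCoalescingOld(71), "old gate misses conv2d");
static_assert(ShouldSkipCoalescing(71, 15), "new gate catches conv2d");
```

The product term is what makes mid-sized functions with many moves fall under the gate even though neither count is large on its own.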
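
The loop-depth weighting from commit 39b7e2ed19 can be sketched as a small cost function. This is an illustration under assumptions: `SpillCost` is a hypothetical name, "interval+ref" is read here as interval length plus reference count, and the depth cap is added only to keep the arithmetic inside 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: cost = (interval_length + ref_count) * 10^loop_depth.
// A vreg defined inside deeper loops becomes much more expensive to spill.
std::uint64_t SpillCost(std::uint64_t interval_length,
                        std::uint64_t ref_count,
                        unsigned loop_depth) {
    // Assumed cap so that 10^loop_depth stays well within 64 bits.
    assert(loop_depth <= 9);

    std::uint64_t weight = 1;
    for (unsigned d = 0; d < loop_depth; ++d) {
        weight *= 10;
    }
    return (interval_length + ref_count) * weight;
}
```

With this weighting, a short vreg at loop depth 2 (length 8, 2 references, cost 1000) outranks a much longer vreg outside any loop (length 50, 20 references, cost 70), so the allocator would prefer to spill the latter.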

270 Commits (d6f42a2a2efc79ec631546293c5752d1cbd1cf92)
