pfn7so3we

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • 4070fb206b chore: sync all local changes and untracked files
  • 67ee77e8d9 Revert "fix(backend): skip graph coloring for functions with >250 vregs"
  • 4be2f32cbb fix(backend): skip graph coloring for functions with >250 vregs
  • 2632202833 fix(backend): unify coalesce skip condition at both sites
  • 2031d8f8f9 fix(backend): use immediate form for small stack adjustments
  • Compare 6 commits »

2 days ago

pfn7so3we pushed to master at pybqixnm9/nudt-compiler-cpp

  • a253ce37d9 refactor(backend): simplify scratch register selection with lambda helper

5 days ago

pfn7so3we pushed to master at pybqixnm9/nudt-compiler-cpp

  • 6b9cf3a448 fix(backend): add x16/x17 to GP allocatable set to fix segfaults
  • 5902060dae fix(backend): lower coalesce skip threshold to fix segfaults
  • 34cb79449f fix(backend): skip coalescing for large functions to prevent segfault

    For functions with >150 vregs, discard move_preferences after
    collection to skip active coalescing. Large functions like
    conv2d, 65_color, 68_brainfk have complex interference graphs
    that cause coalescing to generate incorrect spill code.

    Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.

    Known limitations: 30_many_dimensions and 39_fp_params still
    segfault (pre-existing original compiler bugs in lowering/RA).
    Minor instruction count changes: h-8 +2.5%, matmul +7% etc.
  • a84ffd210b chore: simplify baseline to single-column historical minimum

    Remove source baseline concept. Each test now tracks only its
    best-ever instruction count. count_asm.sh updated to directly
    update baseline when a new lower value is found.
  • b7e78ebd56 fix(backend): AsmPrinter large frame + RegAlloc spill limit

    Apply only proven-safe fixes on clean baseline:
    - AsmPrinter: movz/movk for large stack offsets (>12KB)
      30_many_dimensions: 7M -> 1455 lines (99.9% reduction)
    - RegAlloc: limit spill rounds to 3 for large functions (>120 vregs)
      39_fp_params: >120s -> <1s compilation

    Zero instruction count regression confirmed.
    57/60 performance tests at historical best baseline.
  • Compare 5 commits »
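
The AsmPrinter fix above mentions movz/movk for large stack offsets. A minimal sketch of that idea follows, assuming the offset is first loaded into a scratch register 16 bits at a time. The helper name `MaterializeImm` and the choice of `x16` in the usage note are illustrative assumptions, not taken from the repository.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the repository's code): emit AArch64 assembly that
// loads a 64-bit constant into `reg`, one 16-bit chunk at a time.
std::vector<std::string> MaterializeImm(const std::string& reg, std::uint64_t value) {
    assert(!reg.empty());
    std::vector<std::string> out;
    bool first = true;
    for (int shift = 0; shift < 64; shift += 16) {
        std::uint64_t chunk = (value >> shift) & 0xFFFFu;
        if (chunk == 0) continue;  // movz zeroes all other bits, so zero chunks need no instruction
        // The first non-zero chunk uses movz (clears the rest); later chunks use movk (keeps the rest).
        std::string insn = (first ? "movz " : "movk ") + reg + ", #" + std::to_string(chunk);
        if (shift != 0) insn += ", lsl #" + std::to_string(shift);
        out.push_back(insn);
        first = false;
    }
    if (out.empty()) out.push_back("movz " + reg + ", #0");  // value == 0
    return out;
}
```

For example, `MaterializeImm("x16", 0x12345)` yields `movz x16, #9029` followed by `movk x16, #1, lsl #16`; the register can then be used as the operand of the stack-pointer adjustment.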

6 days ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • 6b9cf3a448 fix(backend): add x16/x17 to GP allocatable set to fix segfaults
  • 5902060dae fix(backend): lower coalesce skip threshold to fix segfaults
  • 34cb79449f fix(backend): skip coalescing for large functions to prevent segfault

    For functions with >150 vregs, discard move_preferences after
    collection to skip active coalescing. Large functions like
    conv2d, 65_color, 68_brainfk have complex interference graphs
    that cause coalescing to generate incorrect spill code.

    Fixes segfaults in: conv2d-1/2/3, 65_color, 68_brainfk, 37_dct.

    Known limitations: 30_many_dimensions and 39_fp_params still
    segfault (pre-existing original compiler bugs in lowering/RA).
    Minor instruction count changes: h-8 +2.5%, matmul +7% etc.
  • a84ffd210b chore: simplify baseline to single-column historical minimum

    Remove source baseline concept. Each test now tracks only its
    best-ever instruction count. count_asm.sh updated to directly
    update baseline when a new lower value is found.
  • b7e78ebd56 fix(backend): AsmPrinter large frame + RegAlloc spill limit

    Apply only proven-safe fixes on clean baseline:
    - AsmPrinter: movz/movk for large stack offsets (>12KB)
      30_many_dimensions: 7M -> 1455 lines (99.9% reduction)
    - RegAlloc: limit spill rounds to 3 for large functions (>120 vregs)
      39_fp_params: >120s -> <1s compilation

    Zero instruction count regression confirmed.
    57/60 performance tests at historical best baseline.
  • Compare 5 commits »

6 days ago

pfn7so3we pushed to master at pybqixnm9/nudt-compiler-cpp

  • 5312c30aee docs(mem2reg): fix comment to match actual threshold value
  • 2e368f86cf chore: update instruction count baseline after Mem2Reg threshold tuning

    Key improvements from PHI threshold relaxation:
    - many_mat_cal: 523->432 (-91 lines, 17.4%)
    - h-8: 504->407 (-97 lines, 19.2%)
    - matmul: 450->366 (-84 lines, 18.7%)

    Crypto and other complex functions unaffected (correctly skipped).
  • cc9f4f9a76 feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions

    Change phi_threshold from max(50, block_count) to max(100, block_count*2).
    The old threshold was too conservative for functions with many allocas
    like many_mat_cal (~15 allocas, 60 blocks), causing premature skip.
    The new threshold allows these while still blocking crypto-like functions
    where excessive PHI nodes hurt code quality.

    many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines
  • d5d8924050 chore: update instruction count baseline after loop optimizations merge

    Additional reductions from loop IR passes:
    - conv2d: 657->629 (-28), fft: 619->605 (-14)
    - huffman: 849->829 (-20), sl: 280->264 (-16)
    - knapsack: 175->167 (-8), transpose: 211->207 (-4)
    - 01_mm: 313->310 (-3), h-10: 335->329 (-6)

    Restore CLAUDE.md deleted during merge.
  • Compare 4 commits »
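
The PHI threshold change in cc9f4f9a76 can be illustrated with a small sketch. The two formulas come from the commit message; the function names, the `estimated_phis` input, and the `<=` comparison are assumptions made for illustration, since the feed does not show how the pass actually applies the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Old rule from the commit message: max(50, block_count).
std::size_t OldPhiThreshold(std::size_t block_count) {
    return std::max<std::size_t>(50, block_count);
}

// New rule from the commit message: max(100, block_count * 2).
std::size_t NewPhiThreshold(std::size_t block_count) {
    return std::max<std::size_t>(100, block_count * 2);
}

// Hypothetical gate: run Mem2Reg only while the estimated number of PHI nodes
// stays at or below the threshold. The real pass may compare differently.
bool ShouldRunMem2Reg(std::size_t estimated_phis, std::size_t threshold) {
    return estimated_phis <= threshold;
}
```

For a function with 60 blocks, as cited for many_mat_cal, the old threshold is 60 and the new one is 120, so a function estimated at, say, 90 PHI nodes (an illustrative figure) would be skipped under the old rule and promoted under the new one.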

1 week ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • 5312c30aee docs(mem2reg): fix comment to match actual threshold value
  • 2e368f86cf chore: update instruction count baseline after Mem2Reg threshold tuning

    Key improvements from PHI threshold relaxation:
    - many_mat_cal: 523->432 (-91 lines, 17.4%)
    - h-8: 504->407 (-97 lines, 19.2%)
    - matmul: 450->366 (-84 lines, 18.7%)

    Crypto and other complex functions unaffected (correctly skipped).
  • cc9f4f9a76 feat(mem2reg): tune PHI threshold to allow Mem2Reg on moderate functions

    Change phi_threshold from max(50, block_count) to max(100, block_count*2).
    The old threshold was too conservative for functions with many allocas
    like many_mat_cal (~15 allocas, 60 blocks), causing premature skip.
    The new threshold allows these while still blocking crypto-like functions
    where excessive PHI nodes hurt code quality.

    many_mat_cal: -91 lines, matmul: -84 lines, h-8: -97 lines
  • d5d8924050 chore: update instruction count baseline after loop optimizations merge

    Additional reductions from loop IR passes:
    - conv2d: 657->629 (-28), fft: 619->605 (-14)
    - huffman: 849->829 (-20), sl: 280->264 (-16)
    - knapsack: 175->167 (-8), transpose: 211->207 (-4)
    - 01_mm: 313->310 (-3), h-10: 335->329 (-6)

    Restore CLAUDE.md deleted during merge.
  • 06bada3ff5 Merge remote master into local master
  • Compare 14 commits »

1 week ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • 39b7e2ed19 feat(backend): loop-depth weighted spill cost model

    Adds DFS-based back-edge detection to compute basic block loop
    nesting depth. Each vreg inherits the max loop depth of its
    defining blocks. Spill cost multiplies interval+ref by 10^depth,
    making loop-carried variables much more expensive to spill.
  • 993e81363a fix(backend): recompute degree unconditionally after MergeInto

    After a merge, u inherits v's neighbors, so degree[u] must always
    be recomputed. Previously, when degree[u] < K before merge, the
    stale low degree was kept, which could push a high-degree merged
    node into simplify_worklist with wrong metadata.

    Also remove redundant if(!remaining.empty()) guard in spill path
    and clean up extra brace from removed GiveUpPhase.
  • bef03ec220 chore: update instruction count baseline after Module D rewrite

    54/60 performance tests reduced. Key improvements:
    - conv2d: -95 lines (12.6%)
    - huffman: -44 lines (4.9%)
    - fft: -39 lines (5.9%)
    - crc: -38 lines (11.6%)
    - 03_sort: -28 lines (4.2%)
    - 01_mm: -22 lines (6.6%)

    Also fix count_asm.sh sed to match any current value.
  • 570253f1f2 feat(backend): relax Briggs threshold to 2*K and fix move_adj self-loop

    Using >= 2*K instead of >= K for high-degree neighbor count allows
    more node pairs to be safely merged. Fixed a bug in MergeInto where
    move_adj[u] could contain u (self-loop) when v's move set included u,
    causing iterator invalidation during move_adj cleanup.
  • 3691da34ee feat(backend): rewrite main loop with held_nodes release and ReactivatePairs
  • Compare 23 commits »
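
The loop-depth weighted spill cost described in 39b7e2ed19 might look roughly like the sketch below. It assumes a reducible CFG given as successor lists with one back edge per loop header; `FindBackEdges`, `LoopDepths`, and `SpillCost` are hypothetical names, and the natural-loop construction is one common way to turn back edges into nesting depths, not necessarily the one the commit uses.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Cfg = std::vector<std::vector<int>>;  // successors of each basic block

// DFS that records an edge b -> s as a back edge when s is still on the DFS stack.
void FindBackEdges(const Cfg& succ, int b, std::vector<int>& state,
                   std::vector<std::pair<int, int>>& back) {
    state[b] = 1;  // 1 = on stack
    for (int s : succ[b]) {
        if (state[s] == 1) back.emplace_back(b, s);
        else if (state[s] == 0) FindBackEdges(succ, s, state, back);
    }
    state[b] = 2;  // 2 = finished
}

// Loop nesting depth per block: the number of natural loops containing it.
std::vector<int> LoopDepths(const Cfg& succ, int entry) {
    const std::size_t n = succ.size();
    std::vector<int> state(n, 0);
    std::vector<std::pair<int, int>> back;
    FindBackEdges(succ, entry, state, back);

    std::vector<std::vector<int>> pred(n);
    for (std::size_t b = 0; b < n; ++b)
        for (int s : succ[b]) pred[s].push_back(static_cast<int>(b));

    std::vector<int> depth(n, 0);
    for (const auto& e : back) {
        const int tail = e.first, header = e.second;
        // Natural loop: the header plus every block that reaches the tail
        // without passing through the header.
        std::vector<bool> in_loop(n, false);
        in_loop[header] = true;
        std::vector<int> work;
        if (!in_loop[tail]) { in_loop[tail] = true; work.push_back(tail); }
        while (!work.empty()) {
            int x = work.back();
            work.pop_back();
            for (int p : pred[x])
                if (!in_loop[p]) { in_loop[p] = true; work.push_back(p); }
        }
        for (std::size_t b = 0; b < n; ++b)
            if (in_loop[b]) ++depth[b];
    }
    return depth;
}

// Spill cost as stated in the commit message: (interval + ref) * 10^depth.
double SpillCost(int interval_length, int ref_count, int loop_depth) {
    assert(loop_depth >= 0);
    double weight = 1.0;
    for (int i = 0; i < loop_depth; ++i) weight *= 10.0;
    return (interval_length + ref_count) * weight;
}
```

With this weighting, a vreg defined two loops deep costs 100 times as much to spill as an otherwise identical vreg outside any loop, which is the effect the commit message describes.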

1 week ago

pfn7so3we pushed to master at pybqixnm9/nudt-compiler-cpp

  • 06bada3ff5 Merge remote master into local master
  • 39b7e2ed19 feat(backend): loop-depth weighted spill cost model

    Adds DFS-based back-edge detection to compute basic block loop
    nesting depth. Each vreg inherits the max loop depth of its
    defining blocks. Spill cost multiplies interval+ref by 10^depth,
    making loop-carried variables much more expensive to spill.
  • 993e81363a fix(backend): recompute degree unconditionally after MergeInto

    After a merge, u inherits v's neighbors, so degree[u] must always
    be recomputed. Previously, when degree[u] < K before merge, the
    stale low degree was kept, which could push a high-degree merged
    node into simplify_worklist with wrong metadata.

    Also remove redundant if(!remaining.empty()) guard in spill path
    and clean up extra brace from removed GiveUpPhase.
  • bef03ec220 chore: update instruction count baseline after Module D rewrite

    54/60 performance tests reduced. Key improvements:
    - conv2d: -95 lines (12.6%)
    - huffman: -44 lines (4.9%)
    - fft: -39 lines (5.9%)
    - crc: -38 lines (11.6%)
    - 03_sort: -28 lines (4.2%)
    - 01_mm: -22 lines (6.6%)

    Also fix count_asm.sh sed to match any current value.
  • 570253f1f2 feat(backend): relax Briggs threshold to 2*K and fix move_adj self-loop

    Using >= 2*K instead of >= K for high-degree neighbor count allows
    more node pairs to be safely merged. Fixed a bug in MergeInto where
    move_adj[u] could contain u (self-loop) when v's move set included u,
    causing iterator invalidation during move_adj cleanup.
  • Compare 29 commits »
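
The degree fix in 993e81363a comes down to refreshing `degree[u]` every time `v` is merged into `u`. The sketch below shows that invariant on a toy interference graph. The `InterferenceGraph` struct and its members are hypothetical; the repository's `MergeInto` also maintains move-related state that is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy interference graph; adjacency sets are the source of truth for degrees.
struct InterferenceGraph {
    std::vector<std::set<int>> adj;
    std::vector<std::size_t> degree;

    explicit InterferenceGraph(std::size_t n) : adj(n), degree(n, 0) {}

    void AddEdge(int a, int b) {
        if (a == b) return;  // no self-loops
        adj[a].insert(b);
        adj[b].insert(a);
        degree[a] = adj[a].size();
        degree[b] = adj[b].size();
    }

    // Coalesce v into u: u inherits all of v's neighbors.
    void MergeInto(int u, int v) {
        assert(u != v);
        for (int w : adj[v]) {
            adj[w].erase(v);
            if (w != u) {
                adj[w].insert(u);
                adj[u].insert(w);
            }
            degree[w] = adj[w].size();
        }
        adj[v].clear();
        degree[v] = 0;
        // Recompute unconditionally: u may have gained neighbors even if its
        // degree was below K before the merge.
        degree[u] = adj[u].size();
    }
};
```

For instance, if node 0 has one neighbor and node 1 has three, merging 1 into 0 leaves node 0 with three neighbors, while a shared neighbor loses one edge; skipping the recomputation would leave node 0 recorded with degree 1.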

1 week ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • 774a2688a3 feat(remat): add rematerializable annotation for MovImm instructions
  • 4812329aa4 refactor(backend): remove redundant live-out pairwise interference edges
  • 6b39d2d397 fix: add missing FP threshold in second ColorGraph call site
  • 26d89b2fbd fix: parameterize caller-saved threshold for GP/FP in ColorGraph
  • 4d95f33dc2 refactor: make caller-saved color preference explicit in ColorGraph Select phase
  • Compare 5 commits »

1 week ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

  • f6047f7d85 feat(opt): port worktree optimization passes and fix critical bugs
  • e3e01256cd Merge master into zhm: apply all fixes for evaluation system
  • 8bbd8f96bb Fix starttime/stoptime function name and add line number parameter
  • aca995140a Fix merge conflict in README.md
  • 7ab465d25b Add missing PassManager.h and fix .gitignore to not ignore src/include/
  • Compare 64 commits »

2 weeks ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

1 month ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

2 months ago

pfn7so3we pushed to lzk at pybqixnm9/nudt-compiler-cpp

2 months ago

pfn7so3we created branch hxz in pybqixnm9/nudt-compiler-cpp

2 months ago

pfn7so3we pushed to hxz at pybqixnm9/nudt-compiler-cpp

2 months ago