# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

SysY compiler course labs: a progressive compiler (Lab1–Lab6) for the SysY language (a C subset) targeting ARM64/AArch64. Built with C++17, CMake, and ANTLR4.

## Build Commands

### Prerequisites

```bash
# Install dependencies (Ubuntu 22.04 / WSL)
sudo apt install -y build-essential cmake git openjdk-11-jre llvm clang gcc-aarch64-linux-gnu qemu-user
```

### Generate ANTLR Lexer/Parser (required before first build)

```bash
mkdir -p build/generated/antlr4
java -jar third_party/antlr-4.13.2-complete.jar \
  -Dlanguage=Cpp -visitor -no-listener -Xexact-output-dir \
  -o build/generated/antlr4 src/antlr4/SysY.g4
```

### Lab1 (frontend only: parse tree printing)

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
cmake --build build -j "$(nproc)"
./build/bin/compiler --emit-parse-tree test/test_case/functional/simple_add.sy
```

### Full build (all labs, including IR gen, optimization, and codegen)

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
cmake --build build -j "$(nproc)"
```

## Compiler Usage

```bash
# Competition format: compile to assembly
./build/bin/compiler -S -o output.s input.sy

# With optimization
./build/bin/compiler -S -o output.s input.sy -O1

# Emit IR
./build/bin/compiler --emit-ir input.sy

# Single test: compile, link, run with QEMU, and compare output
./scripts/verify_asm.sh test/test_case/functional/simple_add.sy --run

# Same for IR path (uses llc + clang to compile/run)
./scripts/verify_ir.sh test/test_case/functional/simple_add.sy --run
```

### CLI Options

| Flag | Effect |
|------|--------|
| `-S` | Emit assembly (default when no mode specified) |
| `-o` | Output file path |
| `-O`, `-O1`, `-O2`, `-O3` | Optimization level |
| `--emit-parse-tree` | Print ANTLR parse tree |
| `--emit-ir` | Print LLVM-style IR |
| `--emit-asm` | Print AArch64 assembly |
| `-h`, `--help` | Show help |

## Testing

### Main test harness

```bash
# Run all 2026test functional tests with optimization
./2026test.sh

# Functional tests only, max 10 cases, stop on first failure
./2026test.sh -c functional -n 10 -x

# Without optimization
./2026test.sh -O0
```

### Legacy test scripts

```bash
./test1.sh   # Lab1: syntax tree
./test2.sh   # Lab2: IR generation
./test3.sh   # Lab3: assembly generation
./test4.sh   # Lab4: scalar optimization
./test5.sh   # Lab5: register allocation
```

## Architecture

### Compiler Pipeline

```
SysY source (.sy)
  → ANTLR Lexer/Parser → AST (ANTLR parse tree)
  → Sema (name resolution, type checking)
  → IR (LLVM-style, load/store form)
  → IR Passes (Mem2Reg → LICM → ConstFold/Prop/DCE/CFG/CSE)
  → MIR (machine IR, AArch64)
  → RegAlloc → FrameLowering → Peephole
  → AArch64 Assembly output
```

### Source Layout

| Directory | Purpose |
|-----------|---------|
| `src/antlr4/SysY.g4` | ANTLR grammar for the SysY language |
| `src/frontend/` | ANTLR driver + syntax tree printer (Lab1) |
| `src/sem/` | Semantic analysis: name binding, scope, type checking (Lab2 prep) |
| `src/irgen/` | IR generation via ANTLR visitor: Decl, Exp, Stmt, Func (Lab2) |
| `src/ir/` | LLVM-style IR: Value/User/Use hierarchy, Module/Function/BasicBlock, IRBuilder (Lab2) |
| `src/ir/passes/` | Scalar optimizations (Lab4): Mem2Reg, ConstFold, ConstProp, DCE, CFGSimplify, CSE, LICM |
| `src/ir/analysis/` | DominatorTree, LoopInfo |
| `src/mir/` | Machine IR + AArch64 backend (Lab3, Lab5): Lowering (IR→MIR), RegAlloc, FrameLowering, AsmPrinter |
| `src/mir/passes/` | MIR peephole pass |
| `src/utils/` | CLI argument parsing, logging |
| `src/include/` | **Build-time headers.** CMake adds `src/include` to the include path at build time. |
| `include/` | **Platform-provided headers.** Gitignored; supplied externally by the grading platform. Mirrors `src/include/`. |
| `sylib/` | SysY runtime library (`sylib.c`), linked into final executables |
| `scripts/` | `verify_asm.sh`, `verify_ir.sh`: single-case test helpers |
| `third_party/` | ANTLR jar + antlr4-runtime sources |
| `test/test_case/` | Reference test cases with expected outputs |

### Key Design Patterns

- **IR**: Lightweight LLVM-style IR with a `Value → User → Instruction` class hierarchy and def-use chains via `Use` objects. `IRBuilder` appends instructions to a `BasicBlock`. The IR starts in load/store (alloca-based) form; `Mem2Reg` promotes allocas to SSA phi nodes.
- **MIR**: Lower-level, three-address machine IR using AArch64 opcodes and a union-like `Operand` (reg, vreg, imm, frame index, label, symbol). Purely a data container: no SSA, no def-use analysis.
- **ANTLR Visitor**: `IRGenImpl` extends the ANTLR-generated `SysYBaseVisitor` to walk the parse tree and emit IR. Each `visit*` method returns `std::any` (typically `ir::Value*`).
- **Pass infrastructure**: Each IR pass is a standalone function (`RunMem2Reg`, `RunDCE`, etc.) taking a `Module&`. `PassManager` and `PassManagerModule` orchestrate them with fixed-point iteration (serialize, compare, re-run until convergence).

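The fixed-point scheme described for the pass infrastructure (serialize, compare, re-run until convergence) can be sketched roughly as follows. This is a minimal illustration, not the repository's `PassManager`: the `Module` struct, `Serialize`, `RunToyDropOneNop`, and `RunToFixedPoint` are all hypothetical stand-ins invented here, and the real code may differ in how it detects convergence.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real module type: a flat list of instruction strings.
struct Module {
    std::vector<std::string> insts;
};

// Hypothetical serializer: one instruction per line.
std::string Serialize(const Module& m) {
    std::string out;
    for (const auto& inst : m.insts) {
        out += inst;
        out += '\n';
    }
    return out;
}

// A pass is a standalone function taking Module&, mirroring the RunXxx(Module&) style.
using Pass = std::function<void(Module&)>;

// Toy pass: removes only the first "nop" per invocation, so several rounds
// are needed before the module stops changing.
void RunToyDropOneNop(Module& m) {
    auto it = std::find(m.insts.begin(), m.insts.end(), "nop");
    if (it != m.insts.end()) m.insts.erase(it);
}

// Runs every pass in order, then compares the serialized module with the
// previous snapshot; repeats until nothing changes or max_rounds is reached.
// Returns the number of rounds executed, including the final no-change round.
int RunToFixedPoint(Module& m, const std::vector<Pass>& passes, int max_rounds = 32) {
    assert(max_rounds > 0);
    int rounds = 0;
    std::string before = Serialize(m);
    while (rounds < max_rounds) {
        for (const auto& pass : passes) pass(m);
        ++rounds;
        std::string after = Serialize(m);
        if (after == before) break;  // converged: this round changed nothing
        before = std::move(after);
    }
    return rounds;
}
```

Comparing serialized text is simple and independent of each pass's internals, at the cost of re-printing the module every round; the `max_rounds` cap is an added safeguard against passes that keep undoing each other.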
### Important Notes

- The `#if COMPILER_PARSE_ONLY` macro in `main.cpp` guards all code beyond Lab1. The CMake option `COMPILER_PARSE_ONLY` controls whether `sem`, `irgen`, and `mir` subdirectories are built.
- `include/` is in `.gitignore` and absent from the build include path. It is provided externally by the grading platform. When editing headers, work in `src/include/`.
- The grammar `SysY.g4` defines the SysY language subset: `int`/`float`/`void` types, arrays, `const`, `if`/`else`/`while`/`break`/`continue`/`return`, and C-like expressions including logical short-circuit `&&`/`||`.
- Commit message convention: `(): ` where type ∈ {feat, fix, refactor, docs, test, chore} and scope ∈ {frontend, irgen, backend, test, doc}.
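As an illustration of the commit convention, the check below validates a message against the listed type and scope sets. The exact pattern is not fully spelled out in the note above, so the layout used here (type, then the scope in parentheses, then a colon, a space, and a non-empty summary) is an assumption, and `IsValidCommitMessage` is a hypothetical helper rather than something the repository provides.

```cpp
#include <regex>
#include <string>

// Assumed layout: type(scope): summary
// Only the type and scope sets come from the notes; the layout itself is an assumption.
bool IsValidCommitMessage(const std::string& msg) {
    static const std::regex pattern(
        R"(^(feat|fix|refactor|docs|test|chore)\((frontend|irgen|backend|test|doc)\): \S.*$)");
    // regex_match requires the whole message (a single line) to match.
    return std::regex_match(msg, pattern);
}
```

Under that assumption, `fix(irgen): handle nested array initializers` would pass, while a message with an unlisted scope or a missing summary would be rejected.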