# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
SysY compiler course lab: a progressive compiler (Lab1–Lab6) for the SysY language (a C subset) targeting ARM64/AArch64. Built with C++17, CMake, and ANTLR4.
## Build Commands
### Prerequisites
```bash
# Install dependencies (Ubuntu 22.04 / WSL)
sudo apt install -y build-essential cmake git openjdk-11-jre llvm clang gcc-aarch64-linux-gnu qemu-user
```
### Generate ANTLR Lexer/Parser (required before first build)
```bash
mkdir -p build/generated/antlr4
java -jar third_party/antlr-4.13.2-complete.jar \
-Dlanguage=Cpp -visitor -no-listener -Xexact-output-dir \
-o build/generated/antlr4 src/antlr4/SysY.g4
```
### Lab1 (frontend only — parse tree printing)
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
cmake --build build -j "$(nproc)"
./build/bin/compiler --emit-parse-tree test/test_case/functional/simple_add.sy
```
### Full build (all labs, including IR gen, optimization, and codegen)
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
cmake --build build -j "$(nproc)"
```
## Compiler Usage
```bash
# Competition format: compile to assembly
./build/bin/compiler -S -o output.s input.sy
# With optimization
./build/bin/compiler -S -o output.s input.sy -O1
# Emit IR
./build/bin/compiler --emit-ir input.sy
# Single test: compile, link, run with QEMU, and compare output
./scripts/verify_asm.sh test/test_case/functional/simple_add.sy --run
# Same for IR path (uses llc + clang to compile/run)
./scripts/verify_ir.sh test/test_case/functional/simple_add.sy --run
```
### CLI Options
| Flag | Effect |
|------|--------|
| `-S` | Emit assembly (default when no mode specified) |
| `-o <file>` | Output file path |
| `-O`, `-O1`, `-O2`, `-O3` | Optimization level |
| `--emit-parse-tree` | Print ANTLR parse tree |
| `--emit-ir` | Print LLVM-style IR |
| `--emit-asm` | Print AArch64 assembly |
| `-h`, `--help` | Show help |
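
To make the flag semantics concrete, here is a minimal sketch of how the options in the table could be parsed. It is not the code in `src/utils/`. `Mode`, `Options`, and `ParseArgs` are hypothetical names. Treating `-S` and `--emit-asm` as the same mode, and bare `-O` as `-O1`, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the declarations used in src/utils/.
enum class Mode { kAsm, kParseTree, kIR };

struct Options {
  Mode mode = Mode::kAsm;  // assembly is the default when no mode flag is given
  int opt_level = 0;       // 0 means no -O flag was seen
  std::string output;      // value of -o, empty if absent
  std::string input;       // the .sy source file
  bool show_help = false;
};

// Parses the flags listed in the table. Throws on an unknown flag or a
// missing -o argument (the real tool's error handling may differ).
Options ParseArgs(const std::vector<std::string>& args) {
  Options o;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    if (a == "-S" || a == "--emit-asm") {
      o.mode = Mode::kAsm;  // assumption: both flags select assembly output
    } else if (a == "--emit-parse-tree") {
      o.mode = Mode::kParseTree;
    } else if (a == "--emit-ir") {
      o.mode = Mode::kIR;
    } else if (a == "-h" || a == "--help") {
      o.show_help = true;
    } else if (a == "-o") {
      if (i + 1 >= args.size()) throw std::invalid_argument("-o requires a file");
      o.output = args[++i];
    } else if (a == "-O") {
      o.opt_level = 1;  // assumption: bare -O behaves like -O1
    } else if (a == "-O1" || a == "-O2" || a == "-O3") {
      o.opt_level = a[2] - '0';
    } else if (!a.empty() && a[0] == '-') {
      throw std::invalid_argument("unknown flag: " + a);
    } else {
      o.input = a;  // a non-flag token is the input path (last one wins)
    }
  }
  assert(o.opt_level >= 0 && o.opt_level <= 3);
  return o;
}
```

In this sketch flags may appear in any order, so `-O1` is accepted after the input file, as in the usage example above.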
## Testing
### Main test harness
```bash
# Run all 2026test functional tests with optimization
./2026test.sh
# Functional tests only, max 10 cases, stop on first failure
./2026test.sh -c functional -n 10 -x
# Without optimization
./2026test.sh -O0
```
### Legacy test scripts
```bash
./test1.sh # Lab1: syntax tree
./test2.sh # Lab2: IR generation
./test3.sh # Lab3: assembly generation
./test4.sh # Lab4: scalar optimization
./test5.sh # Lab5: register allocation
```
## Architecture
### Compiler Pipeline
```
SysY source (.sy) → ANTLR Lexer/Parser → AST (ANTLR parse tree)
→ Sema (name resolution, type checking) → IR (LLVM-style, load/store form)
→ IR Passes (Mem2Reg → LICM → ConstFold/Prop/DCE/CFG/CSE) → MIR (machine IR, AArch64)
→ RegAlloc → FrameLowering → Peephole → AArch64 Assembly output
```
### Source Layout
| Directory | Purpose |
|-----------|---------|
| `src/antlr4/SysY.g4` | ANTLR grammar for SysY language |
| `src/frontend/` | ANTLR driver + syntax tree printer (Lab1) |
| `src/sem/` | Semantic analysis: name binding, scope, type checking (Lab2 prep) |
| `src/irgen/` | IR generation via ANTLR visitor: Decl, Exp, Stmt, Func (Lab2) |
| `src/ir/` | LLVM-style IR: Value/User/Use hierarchy, Module/Function/BasicBlock, IRBuilder (Lab2) |
| `src/ir/passes/` | Scalar optimizations (Lab4): Mem2Reg, ConstFold, ConstProp, DCE, CFGSimplify, CSE, LICM |
| `src/ir/analysis/` | DominatorTree, LoopInfo |
| `src/mir/` | Machine IR + AArch64 backend (Lab3, Lab5): Lowering (IR→MIR), RegAlloc, FrameLowering, AsmPrinter |
| `src/mir/passes/` | MIR peephole pass |
| `src/utils/` | CLI argument parsing, logging |
| `src/include/` | **Build-time headers.** CMake adds `src/include` to the include path at build time. |
| `include/` | **Platform-provided headers.** Gitignored — supplied externally by grading platform. Mirrors `src/include/`. |
| `sylib/` | SysY runtime library (sylib.c), linked into final executables |
| `scripts/` | verify_asm.sh, verify_ir.sh — single-case test helpers |
| `third_party/` | ANTLR jar + antlr4-runtime sources |
| `test/test_case/` | Reference test cases with expected outputs |
### Key Design Patterns
- **IR**: Lightweight LLVM-style IR with a `Value → User → Instruction` class hierarchy and def-use chains via `Use` objects. `IRBuilder` appends instructions to a `BasicBlock`. The IR starts in load/store (alloca-based) form; `Mem2Reg` promotes allocas to SSA phi nodes.
- **MIR**: Lower-level, three-address machine IR using AArch64 opcodes and a union-like `Operand` (reg, vreg, imm, frame index, label, symbol). It is purely a data container, with no SSA and no def-use analysis.
- **ANTLR Visitor**: `IRGenImpl` extends the ANTLR-generated `SysYBaseVisitor` to walk the parse tree and emit IR. Each `visit*` method returns `std::any` (typically `ir::Value*`).
- **Pass infrastructure**: Each IR pass is a standalone function (`RunMem2Reg`, `RunDCE`, etc.) taking a `Module&`. `PassManager` and `PassManagerModule` orchestrate them with fixed-point iteration (serialize, compare, re-run until convergence).
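
The def-use chains and the fixed-point pass loop described above can be sketched together in a toy form. This is a hedged illustration, not the repository's code: `Value`, `Instruction`, and `Module` here are stripped-down stand-ins that fold `User` and `Use` into plain pointer vectors. `RunDCE` is a simplified single sweep. `Serialize`, `RunToFixedPoint`, and `BuildDemoModule` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature IR; the repository's Value/User/Use classes are richer.
struct Instruction;

struct Value {
  std::string name;
  std::vector<Instruction*> users;  // def-use chain: instructions reading this value
  explicit Value(std::string n) : name(std::move(n)) {}
  virtual ~Value() = default;
};

struct Instruction : Value {
  std::string opcode;
  std::vector<Value*> operands;
  bool has_side_effect;
  Instruction(std::string n, std::string op, std::vector<Value*> ops, bool effect)
      : Value(std::move(n)), opcode(std::move(op)), operands(std::move(ops)),
        has_side_effect(effect) {
    for (Value* v : operands) v->users.push_back(this);  // register each use
  }
};

struct Module {
  std::vector<std::unique_ptr<Instruction>> insts;
  Instruction* Add(std::string n, std::string op, std::vector<Value*> ops = {},
                   bool effect = false) {
    insts.push_back(std::make_unique<Instruction>(std::move(n), std::move(op),
                                                  std::move(ops), effect));
    return insts.back().get();
  }
};

// Textual form used to detect whether a round of passes changed anything.
std::string Serialize(const Module& m) {
  std::string out;
  for (const auto& inst : m.insts) {
    out += inst->name + " = " + inst->opcode;
    for (std::size_t i = 0; i < inst->operands.size(); ++i)
      out += (i == 0 ? " " : ", ") + inst->operands[i]->name;
    out += "\n";
  }
  return out;
}

// One sweep of dead-code elimination: drop side-effect-free instructions that
// nothing uses, and unregister them from their operands' user lists.
void RunDCE(Module& m) {
  std::vector<std::unique_ptr<Instruction>> kept;
  for (auto& inst : m.insts) {
    assert(inst != nullptr);
    if (inst->has_side_effect || !inst->users.empty()) {
      kept.push_back(std::move(inst));
      continue;
    }
    for (Value* op : inst->operands) {
      auto& u = op->users;
      u.erase(std::remove(u.begin(), u.end(), inst.get()), u.end());
    }
  }
  m.insts = std::move(kept);  // dead instructions are destroyed here
}

using Pass = std::function<void(Module&)>;

// Re-runs every pass until the serialized module stops changing.
// Returns the number of rounds executed, including the final no-op round.
int RunToFixedPoint(Module& m, const std::vector<Pass>& passes) {
  std::string before = Serialize(m);
  for (int rounds = 1;; ++rounds) {
    for (const Pass& p : passes) p(m);
    std::string after = Serialize(m);
    if (after == before) return rounds;
    before = std::move(after);
  }
}

// a, b, c form a dead chain; d feeds a side-effecting ret and must survive.
Module BuildDemoModule() {
  Module m;
  Instruction* a = m.Add("a", "const");
  Instruction* b = m.Add("b", "add", {a, a});
  m.Add("c", "mul", {b, b});
  Instruction* d = m.Add("d", "const");
  m.Add("r", "ret", {d}, true);
  return m;
}
```

With only `RunDCE` in the pass list, each round removes one layer of the dead chain in `BuildDemoModule()`: first `c`, then `b`, then `a`. A single sweep is therefore not enough, which is the motivation for re-running passes until the serialized form stops changing.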
### Important Notes
- The `COMPILER_PARSE_ONLY` preprocessor macro, tested with `#if COMPILER_PARSE_ONLY` in `main.cpp`, gates all code beyond Lab1. The CMake option of the same name controls whether the `sem`, `irgen`, and `mir` subdirectories are built.
- `include/` is in `.gitignore` and absent from the build include path. It is provided externally by the grading platform. When editing headers, work in `src/include/`.
- The grammar `SysY.g4` defines the SysY language subset: `int`/`float`/`void` types, arrays, `const`, `if`/`else`/`while`/`break`/`continue`/`return`, and C-like expressions including logical short-circuit `&&`/`||`.
- Commit message convention: `<type>(<scope>): <subject>` where type ∈ {feat, fix, refactor, docs, test, chore} and scope ∈ {frontend, irgen, backend, test, doc}.
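
The `COMPILER_PARSE_ONLY` gating mentioned in the first note can be pictured with a small standalone sketch. It assumes the build passes the macro as a `0`/`1` compile definition; the exact CMake wiring and the layout of `main.cpp` are not shown in this file. `EnabledStages` is a hypothetical helper whose stage names mirror the `src/` subdirectories.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Fallback so the sketch compiles standalone; in a real build the value is
// assumed to come from the CMake option as a compile definition.
#ifndef COMPILER_PARSE_ONLY
#define COMPILER_PARSE_ONLY 0
#endif

// Returns the stages this build would run, in pipeline order.
std::vector<std::string> EnabledStages() {
  std::vector<std::string> stages = {"frontend"};
#if COMPILER_PARSE_ONLY
  // Lab1 build: everything after parsing is compiled out.
#else
  stages.push_back("sem");
  stages.push_back("irgen");
  stages.push_back("mir");
#endif
  assert(!stages.empty());
  return stages;
}
```

Because the later stages are removed by the preprocessor rather than skipped at runtime, a parse-only build does not need the `sem`, `irgen`, or `mir` sources to compile or link.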