forked from NUDT-compiler/nudt-compiler-cpp
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
SysY compiler course lab: a progressive compiler (Lab1–Lab6) for the SysY language (a C subset) targeting ARM64/AArch64. Built with C++17, CMake, and ANTLR4.
Build Commands
Prerequisites
```bash
# Install dependencies (Ubuntu 22.04 / WSL)
sudo apt install -y build-essential cmake git openjdk-11-jre llvm clang gcc-aarch64-linux-gnu qemu-user
```
Generate ANTLR Lexer/Parser (required before first build)
```bash
mkdir -p build/generated/antlr4
java -jar third_party/antlr-4.13.2-complete.jar \
  -Dlanguage=Cpp -visitor -no-listener -Xexact-output-dir \
  -o build/generated/antlr4 src/antlr4/SysY.g4
```
Lab1 (frontend only — parse tree printing)
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
cmake --build build -j "$(nproc)"
./build/bin/compiler --emit-parse-tree test/test_case/functional/simple_add.sy
```
Full build (all labs, including IR gen, optimization, and codegen)
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
cmake --build build -j "$(nproc)"
```
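The two builds above differ only in `COMPILER_PARSE_ONLY`. As a rough illustration of how such a compile-time switch can gate the later stages, here is a minimal sketch. It assumes the build system defines the macro as 0 or 1; `EnabledStages` is a hypothetical helper, not a function from this repository, and the exact guard form used in `main.cpp` may differ.

```cpp
#include <string>

// Assumption: CMake passes COMPILER_PARSE_ONLY as 0 or 1.
// Default to 0 here so the sketch compiles on its own.
#ifndef COMPILER_PARSE_ONLY
#define COMPILER_PARSE_ONLY 0
#endif

// Hypothetical helper: lists the stages a given build would run.
std::string EnabledStages() {
  std::string stages = "parse";
#if !COMPILER_PARSE_ONLY
  // Everything beyond Lab1 is compiled only in the full build.
  stages += " -> sema -> irgen -> passes -> mir -> asm";
#endif
  return stages;
}
```

Compiling the sketch with `-DCOMPILER_PARSE_ONLY=1` leaves only the `parse` stage in the returned string.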
Compiler Usage
```bash
# Competition format: compile to assembly
./build/bin/compiler -S -o output.s input.sy
# With optimization
./build/bin/compiler -S -o output.s input.sy -O1
# Emit IR
./build/bin/compiler --emit-ir input.sy
# Single test: compile, link, run with QEMU, and compare output
./scripts/verify_asm.sh test/test_case/functional/simple_add.sy --run
# Same for IR path (uses llc + clang to compile/run)
./scripts/verify_ir.sh test/test_case/functional/simple_add.sy --run
```
CLI Options
| Flag | Effect |
|---|---|
| `-S` | Emit assembly (default when no mode specified) |
| `-o <file>` | Output file path |
| `-O`, `-O1`, `-O2`, `-O3` | Optimization level |
| `--emit-parse-tree` | Print ANTLR parse tree |
| `--emit-ir` | Print LLVM-style IR |
| `--emit-asm` | Print AArch64 assembly |
| `-h`, `--help` | Show help |
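To make the flag table concrete, the following is a minimal sketch of a parser for these options. `CliOptions` and `ParseArgs` are hypothetical names, not the repository's actual code in `src/utils/`. Two behaviors are assumptions: a bare `-O` is treated as `-O1`, and `-S` and `--emit-asm` select the same mode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option record; field names are illustrative only.
struct CliOptions {
  enum class Mode { Asm, ParseTree, IR };
  Mode mode = Mode::Asm;  // assembly is the default mode
  int opt_level = 0;
  std::string output;
  std::string input;
  bool show_help = false;
};

// Parses the flags from the table. Returns false on an unknown flag
// or when -o has no argument.
bool ParseArgs(const std::vector<std::string>& args, CliOptions& out) {
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    if (a == "-S" || a == "--emit-asm") {
      out.mode = CliOptions::Mode::Asm;  // assumption: same mode for both
    } else if (a == "--emit-parse-tree") {
      out.mode = CliOptions::Mode::ParseTree;
    } else if (a == "--emit-ir") {
      out.mode = CliOptions::Mode::IR;
    } else if (a == "-o") {
      if (i + 1 >= args.size()) return false;
      out.output = args[++i];
    } else if (a == "-O") {
      out.opt_level = 1;  // assumption: bare -O means -O1
    } else if (a.size() == 3 && a[0] == '-' && a[1] == 'O' &&
               a[2] >= '1' && a[2] <= '3') {
      out.opt_level = a[2] - '0';
    } else if (a == "-h" || a == "--help") {
      out.show_help = true;
    } else if (!a.empty() && a[0] == '-') {
      return false;  // unknown flag
    } else {
      out.input = a;  // positional argument: the .sy source file
    }
  }
  return true;
}
```

Calling `ParseArgs({"-S", "-o", "output.s", "input.sy", "-O1"}, opts)` fills `opts` with assembly mode, `output.s`, `input.sy`, and optimization level 1.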
Testing
Main test harness
```bash
# Run all 2026test functional tests with optimization
./2026test.sh
# Functional tests only, max 10 cases, stop on first failure
./2026test.sh -c functional -n 10 -x
# Without optimization
./2026test.sh -O0
```
Legacy test scripts
```bash
./test1.sh   # Lab1: syntax tree
./test2.sh   # Lab2: IR generation
./test3.sh   # Lab3: assembly generation
./test4.sh   # Lab4: scalar optimization
./test5.sh   # Lab5: register allocation
```
Architecture
Compiler Pipeline
```
SysY source (.sy) → ANTLR Lexer/Parser → AST (ANTLR parse tree)
  → Sema (name resolution, type checking) → IR (LLVM-style, load/store form)
  → IR Passes (Mem2Reg → LICM → ConstFold/Prop/DCE/CFG/CSE) → MIR (machine IR, AArch64)
  → RegAlloc → FrameLowering → Peephole → AArch64 Assembly output
```
Source Layout
| Directory | Purpose |
|---|---|
| `src/antlr4/SysY.g4` | ANTLR grammar for SysY language |
| `src/frontend/` | ANTLR driver + syntax tree printer (Lab1) |
| `src/sem/` | Semantic analysis: name binding, scope, type checking (Lab2 prep) |
| `src/irgen/` | IR generation via ANTLR visitor: Decl, Exp, Stmt, Func (Lab2) |
| `src/ir/` | LLVM-style IR: Value/User/Use hierarchy, Module/Function/BasicBlock, IRBuilder (Lab2) |
| `src/ir/passes/` | Scalar optimizations (Lab4): Mem2Reg, ConstFold, ConstProp, DCE, CFGSimplify, CSE, LICM |
| `src/ir/analysis/` | DominatorTree, LoopInfo |
| `src/mir/` | Machine IR + AArch64 backend (Lab3, Lab5): Lowering (IR→MIR), RegAlloc, FrameLowering, AsmPrinter |
| `src/mir/passes/` | MIR peephole pass |
| `src/utils/` | CLI argument parsing, logging |
| `src/include/` | Build-time headers. At build time CMake adds `src/include` as include path. |
| `include/` | Platform-provided headers. Gitignored; supplied externally by grading platform. Mirrors `src/include/`. |
| `sylib/` | SysY runtime library (`sylib.c`), linked into final executables |
| `scripts/` | `verify_asm.sh`, `verify_ir.sh`: single-case test helpers |
| `third_party/` | ANTLR jar + antlr4-runtime sources |
| `test/test_case/` | Reference test cases with expected outputs |
Key Design Patterns
- IR: Lightweight LLVM-style IR with a `Value → User → Instruction` class hierarchy and def-use chains via `Use` objects. `IRBuilder` appends instructions to a `BasicBlock`. The IR starts in load/store (alloca-based) form; `Mem2Reg` promotes allocas to SSA values, inserting phi nodes where control flow merges.
- MIR: Lower-level, three-address machine IR using AArch64 opcodes and a union-like `Operand` (reg, vreg, imm, frame index, label, symbol). Purely a data container: no SSA, no def-use analysis.
- ANTLR Visitor: `IRGenImpl` extends the ANTLR-generated `SysYBaseVisitor` to walk the parse tree and emit IR. Each `visit*` method returns `std::any` (typically `ir::Value*`).
- Pass infrastructure: Each IR pass is a standalone function (`RunMem2Reg`, `RunDCE`, etc.) taking a `Module&`. `PassManager` and `PassManagerModule` orchestrate them with fixed-point iteration (serialize, compare, re-run until convergence).
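The fixed-point iteration described for the pass infrastructure (serialize, compare, re-run until convergence) can be sketched as follows. This is an illustration under stated assumptions, not the repository's `PassManager`: `Module` here is a toy stand-in holding instruction strings, and `Serialize`, `RunToFixedPoint`, `RemoveNops`, `FoldOneAddZero`, and the `max_rounds` cap are all hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the real Module: just a list of instruction strings.
struct Module {
  std::vector<std::string> insts;
};

using Pass = std::function<void(Module&)>;

// Produces a textual snapshot used to detect whether a round changed anything.
std::string Serialize(const Module& m) {
  std::string text;
  for (const auto& inst : m.insts) text += inst + "\n";
  return text;
}

// Runs all passes in order, then repeats until the serialized module stops
// changing or max_rounds is reached. Returns the number of rounds executed.
int RunToFixedPoint(Module& m, const std::vector<Pass>& passes,
                    int max_rounds = 16) {
  int rounds = 0;
  std::string before = Serialize(m);
  while (rounds < max_rounds) {
    for (const auto& pass : passes) pass(m);
    ++rounds;
    std::string after = Serialize(m);
    if (after == before) break;  // converged: this round changed nothing
    before = std::move(after);
  }
  return rounds;
}

// Toy pass: deletes every "nop".
void RemoveNops(Module& m) {
  m.insts.erase(
      std::remove(m.insts.begin(), m.insts.end(), std::string("nop")),
      m.insts.end());
}

// Toy pass: rewrites only the first "add x, 0" into "nop" per run, so
// several rounds are needed before nothing is left to change.
void FoldOneAddZero(Module& m) {
  auto it = std::find(m.insts.begin(), m.insts.end(), std::string("add x, 0"));
  if (it != m.insts.end()) *it = "nop";
}
```

Comparing serialized text is a simple way to detect change without requiring each pass to report whether it modified the module; the cost is one serialization per round.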
Important Notes
- The `#if COMPILER_PARSE_ONLY` macro in `main.cpp` guards all code beyond Lab1. The CMake option `COMPILER_PARSE_ONLY` controls whether `sem`, `irgen`, and `mir` subdirectories are built.
- `include/` is in `.gitignore` and absent from the build include path. It is provided externally by the grading platform. When editing headers, work in `src/include/`.
- The grammar `SysY.g4` defines the SysY language subset: `int`/`float`/`void` types, arrays, `const`, `if`/`else`/`while`/`break`/`continue`/`return`, and C-like expressions including logical short-circuit `&&`/`||`.
- Commit message convention: `<type>(<scope>): <subject>` where type ∈ {feat, fix, refactor, docs, test, chore} and scope ∈ {frontend, irgen, backend, test, doc}.
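The commit message convention can be checked mechanically. Below is a minimal sketch using `std::regex`; `IsValidCommitSubject` is a hypothetical helper, and the requirements of exactly one space after the colon and a non-empty subject are assumptions, since the convention as written does not spell them out.

```cpp
#include <regex>
#include <string>

// Returns true when `msg` has the form <type>(<scope>): <subject> with a
// type and scope drawn from the sets listed in the convention.
// Assumptions: one space after the colon, subject is non-empty.
bool IsValidCommitSubject(const std::string& msg) {
  static const std::regex pattern(
      R"(^(feat|fix|refactor|docs|test|chore)\((frontend|irgen|backend|test|doc)\): \S.*$)");
  return std::regex_match(msg, pattern);
}
```

For example, `fix(irgen): handle empty array initializer` passes, while `fix: missing scope` and `feature(frontend): wrong type` are rejected.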