CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

SysY compiler course lab: a progressive compiler (Lab1 through Lab6) for the SysY language (a C subset) targeting ARM64/AArch64. Built with C++17, CMake, and ANTLR4.

Build Commands

Prerequisites

# Install dependencies (Ubuntu 22.04 / WSL)
sudo apt install -y build-essential cmake git openjdk-11-jre llvm clang gcc-aarch64-linux-gnu qemu-user

Generate ANTLR Lexer/Parser (required before first build)

mkdir -p build/generated/antlr4
java -jar third_party/antlr-4.13.2-complete.jar \
  -Dlanguage=Cpp -visitor -no-listener -Xexact-output-dir \
  -o build/generated/antlr4 src/antlr4/SysY.g4

Lab1 (frontend only — parse tree printing)

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
cmake --build build -j "$(nproc)"
./build/bin/compiler --emit-parse-tree test/test_case/functional/simple_add.sy

Full build (all labs, including IR gen, optimization, and codegen)

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
cmake --build build -j "$(nproc)"

Compiler Usage

# Competition format: compile to assembly
./build/bin/compiler -S -o output.s input.sy

# With optimization
./build/bin/compiler -S -o output.s input.sy -O1

# Emit IR
./build/bin/compiler --emit-ir input.sy

# Single test: compile, link, run with QEMU, and compare output
./scripts/verify_asm.sh test/test_case/functional/simple_add.sy --run

# Same for IR path (uses llc + clang to compile/run)
./scripts/verify_ir.sh test/test_case/functional/simple_add.sy --run

CLI Options

Flag                 Effect
-S                   Emit assembly (default when no mode specified)
-o <file>            Output file path
-O, -O1, -O2, -O3    Optimization level
--emit-parse-tree    Print ANTLR parse tree
--emit-ir            Print LLVM-style IR
--emit-asm           Print AArch64 assembly
-h, --help           Show help
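
As an illustration of how the flags above could map onto parser state, here is a minimal C++17 sketch. The names Mode, Options, ParseArgs, and ExampleUsage are hypothetical and are not taken from src/utils/. The sketch also assumes that bare -O behaves like -O1, that -S and --emit-asm select the same mode, and that the last mode flag on the command line wins.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical names throughout; the real parser in src/utils/ may differ.
enum class Mode { Asm, ParseTree, IR };

struct Options {
  Mode mode = Mode::Asm;  // -S is the default when no mode is specified
  int opt_level = 0;      // assumption: no -O flag means level 0
  std::string output;     // assumption: empty means "no -o given"
  std::string input;      // positional .sy source file
  bool show_help = false;
};

// Returns std::nullopt on an unknown flag or when -o has no argument.
std::optional<Options> ParseArgs(const std::vector<std::string>& args) {
  Options opts;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& a = args[i];
    if (a == "-S" || a == "--emit-asm") {
      opts.mode = Mode::Asm;
    } else if (a == "--emit-parse-tree") {
      opts.mode = Mode::ParseTree;
    } else if (a == "--emit-ir") {
      opts.mode = Mode::IR;
    } else if (a == "-o") {
      if (i + 1 >= args.size()) return std::nullopt;
      opts.output = args[++i];
    } else if (a == "-O") {
      opts.opt_level = 1;  // assumption: bare -O behaves like -O1
    } else if (a == "-O1" || a == "-O2" || a == "-O3") {
      opts.opt_level = a[2] - '0';
    } else if (a == "-h" || a == "--help") {
      opts.show_help = true;
    } else if (!a.empty() && a[0] == '-') {
      return std::nullopt;  // unknown flag
    } else {
      opts.input = a;
    }
  }
  return opts;
}

// Usage example mirroring: compiler -S -o output.s input.sy -O1
void ExampleUsage() {
  const auto opts = ParseArgs({"-S", "-o", "output.s", "input.sy", "-O1"});
  assert(opts.has_value());
  assert(opts->mode == Mode::Asm && opts->opt_level == 1);
  assert(opts->output == "output.s" && opts->input == "input.sy");
}
```

Returning std::optional keeps error handling out of the parsing loop, so the caller decides whether a bad flag prints the help text or aborts.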

Testing

Main test harness

# Run all 2026test functional tests with optimization
./2026test.sh

# Functional tests only, max 10 cases, stop on first failure
./2026test.sh -c functional -n 10 -x

# Without optimization
./2026test.sh -O0

Legacy test scripts

./test1.sh   # Lab1: syntax tree
./test2.sh   # Lab2: IR generation
./test3.sh   # Lab3: assembly generation
./test4.sh   # Lab4: scalar optimization
./test5.sh   # Lab5: register allocation

Architecture

Compiler Pipeline

SysY source (.sy) → ANTLR Lexer/Parser → AST (ANTLR parse tree)
  → Sema (name resolution, type checking) → IR (LLVM-style, load/store form)
  → IR Passes (Mem2Reg → LICM → ConstFold/Prop/DCE/CFG/CSE) → MIR (machine IR, AArch64)
  → RegAlloc → FrameLowering → Peephole → AArch64 Assembly output

Source Layout

Directory           Purpose
src/antlr4/SysY.g4  ANTLR grammar for the SysY language
src/frontend/       ANTLR driver + syntax tree printer (Lab1)
src/sem/            Semantic analysis: name binding, scope, type checking (Lab2 prep)
src/irgen/          IR generation via ANTLR visitor: Decl, Exp, Stmt, Func (Lab2)
src/ir/             LLVM-style IR: Value/User/Use hierarchy, Module/Function/BasicBlock, IRBuilder (Lab2)
src/ir/passes/      Scalar optimizations (Lab4): Mem2Reg, ConstFold, ConstProp, DCE, CFGSimplify, CSE, LICM
src/ir/analysis/    DominatorTree, LoopInfo
src/mir/            Machine IR + AArch64 backend (Lab3, Lab5): Lowering (IR→MIR), RegAlloc, FrameLowering, AsmPrinter
src/mir/passes/     MIR peephole pass
src/utils/          CLI argument parsing, logging
src/include/        Build-time headers. CMake adds src/include as an include path.
include/            Platform-provided headers. Gitignored; supplied externally by the grading platform. Mirrors src/include/.
sylib/              SysY runtime library (sylib.c), linked into final executables
scripts/            verify_asm.sh, verify_ir.sh: single-case test helpers
third_party/        ANTLR jar + antlr4-runtime sources
test/test_case/     Reference test cases with expected outputs

Key Design Patterns

  • IR: Lightweight LLVM-style IR with a Value → User → Instruction class hierarchy and def-use chains tracked through Use objects. IRBuilder appends instructions to a BasicBlock. The IR starts in load/store (alloca-based) form; Mem2Reg then promotes allocas to SSA values, inserting phi nodes where needed.
  • MIR: Lower-level, three-address machine IR using AArch64 opcodes and a union-like Operand (reg, vreg, imm, frame index, label, symbol). It is purely a data container, with no SSA and no def-use analysis.
  • ANTLR Visitor: IRGenImpl extends the ANTLR-generated SysYBaseVisitor to walk the parse tree and emit IR. Each visit* method returns std::any (typically ir::Value*).
  • Pass infrastructure: Each IR pass is a standalone function (RunMem2Reg, RunDCE, etc.) taking a Module&. PassManager and PassManagerModule orchestrate them with fixed-point iteration (serialize, compare, re-run until convergence).
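
The serialize, compare, re-run loop described in the last bullet can be sketched as follows. This is a minimal illustration only: Module here is a toy stand-in for the real Module class, and Pass, RunToFixedPoint, RemoveOneDead, and the max_rounds cap are hypothetical names and choices, not the repository's PassManager API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for the real Module: just a list of instruction strings.
struct Module {
  std::vector<std::string> insts;

  // Textual snapshot used to detect whether a round changed anything.
  std::string Serialize() const {
    std::string out;
    for (const auto& s : insts) out += s + "\n";
    return out;
  }
};

// Hypothetical pass signature, matching the "function taking a Module&" shape.
using Pass = std::function<void(Module&)>;

// Runs all passes in order, then compares the serialized module against the
// snapshot taken before the round. Stops when a round changes nothing or when
// max_rounds is reached. Returns the number of rounds executed.
int RunToFixedPoint(Module& m, const std::vector<Pass>& passes,
                    int max_rounds = 16) {
  assert(max_rounds > 0);
  int rounds = 0;
  while (rounds < max_rounds) {
    const std::string before = m.Serialize();
    for (const auto& pass : passes) pass(m);
    ++rounds;
    if (m.Serialize() == before) break;  // converged: nothing changed
  }
  return rounds;
}

// Hypothetical toy pass: removes at most one "dead"-prefixed instruction per
// invocation, so several rounds are needed before the module stops changing.
void RemoveOneDead(Module& m) {
  for (auto it = m.insts.begin(); it != m.insts.end(); ++it) {
    if (it->rfind("dead", 0) == 0) {
      m.insts.erase(it);
      return;
    }
  }
}
```

With a module holding "dead a", "add", "dead b", two rounds each remove one dead instruction and a third round detects no change, so RunToFixedPoint returns 3. Comparing serialized text is simple but costs a full dump per round; the cap on rounds guards against passes that keep undoing each other.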

Important Notes

  • A #if COMPILER_PARSE_ONLY check in main.cpp guards all code beyond Lab1. The CMake option of the same name controls whether the sem, irgen, and mir subdirectories are built.
  • include/ is in .gitignore and absent from the build include path. It is provided externally by the grading platform. When editing headers, work in src/include/.
  • The grammar SysY.g4 defines the SysY language subset: int/float/void types, arrays, const, if/else/while/break/continue/return, and C-like expressions including logical short-circuit &&/||.
  • Commit message convention: <type>(<scope>): <subject> where type ∈ {feat, fix, refactor, docs, test, chore} and scope ∈ {frontend, irgen, backend, test, doc}.
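
To make the commit message convention concrete, the following C++17 sketch checks a subject line against it. The function names IsConventionalSubject and ExampleUsage are hypothetical, and the sketch assumes exactly one space after the colon and a non-empty subject, which the convention above does not state explicitly.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when `line` has the shape
// <type>(<scope>): <subject> with the allowed type and scope values.
// Assumptions: one space after the colon, and a non-empty subject.
bool IsConventionalSubject(const std::string& line) {
  static const std::regex pattern(
      R"((feat|fix|refactor|docs|test|chore)\((frontend|irgen|backend|test|doc)\): .+)");
  return std::regex_match(line, pattern);
}

// Usage example with one accepted and two rejected subject lines.
void ExampleUsage() {
  assert(IsConventionalSubject("fix(irgen): handle empty return statements"));
  assert(!IsConventionalSubject("feature(irgen): add arrays"));  // bad type
  assert(!IsConventionalSubject("fix(parser): tidy grammar"));   // bad scope
}
```

std::regex_match requires the whole line to match, so no explicit anchors are needed in the pattern.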