Merge branch 'dyz'

2 weeks ago · b3230ec6d5
parent d5469f93f6 304599b17b
commit b3230ec6d5
15 changed files with 1848 additions and 153 deletions
--- a/README.md
+++ b/README.md
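@@ -20,6 +20,10 @@

 For more compiler-related projects and strong implementations from previous years, see the Technical Support section of the compiler competition website: <https://compiler.educg.net/#/index?TYPE=26COM>. Its "备赛推荐" (recommended preparation materials) page collects compiler-related projects and open-source implementations of past award-winning entries, all of which are worth studying.

+In addition, the repository provides an overview document of the current implementation status and test entry points, to help the team stay in sync:
+
+- `doc/实验进度与测试方法.md`
+
 ## 3. Collaboration Workflow on the 头歌 Platform

 The 头歌 platform hosts code much like GitHub/Gitee. To start collaborating quickly on top of the current repository, you can follow the workflow below.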
--- a/doc/实验进度与测试方法.md
+++ b/doc/实验进度与测试方法.md
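@@ -0,0 +1,436 @@
+# Lab Progress and Testing Methods
+
+## 1. Current Lab Progress
+
+This document records the repository's implementation status for each lab, along with the corresponding testing and verification methods.  
+Note that the repository is still at the "course sample framework + incremental completion" stage; it is not a compiler that fully implements SysY semantics.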
+
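+### 1.1 Lab1 Status
+
+Lab1 covers front-end parsing and parse tree construction.
+
+Current status:
+
+- `SysY.g4`, the ANTLR driver, and parse tree printing are provided.
+- The parse tree can be printed with `--emit-parse-tree`.
+- The front end can be built on its own in `parse-only` mode, without depending on `sem` / `irgen` / `mir`.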
+
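+### 1.2 Lab2 Status
+
+Lab2 covers "parse tree -> semantic checking -> IR".
+
+The current status splits into two parts:
+
+1. `Sema`
+   - A first, basic implementation of semantic checking for the current SysY grammar is complete.
+   - Supports nested scopes, variable/constant redefinition checks, and declare-before-use.
+   - Supports function symbol collection, function call checks, and the `main` entry check.
+   - Checks where `break` / `continue` are used.
+   - Checks that `return` matches the function's return type.
+   - Supports `const` constant-expression evaluation, array dimension checks, and checks that global initializers are constant.
+   - Supports basic type checking of `int/float` scalar expressions, comparisons, and logical expressions.
+   - Declares common runtime library functions such as `getint`, `putch`, `getfloat`, `getarray`, and `putarray` as builtins.
+
+2. `IRGen`
+   - The repository's original `IRGen` is still the minimal sample version.
+   - It currently handles only a tiny subset: "local `int` variables + constants + simple expressions + `return`".
+   - Because the grammar has been extended and `IRGen` has not fully caught up, **only the first half of Lab2 is done: the basic Sema extension**.
+   - The IR generation part of Lab2 still needs to be completed.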
+
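+### 1.3 Lab3 Status
+
+Lab3 covers "IR -> MIR -> assembly".
+
+Current status:
+
+- The repository keeps a minimal back-end pipeline.
+- It can only consume the current minimal IR subset.
+- It cannot yet reliably generate assembly for complete SysY programs.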
+
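+### 1.4 Lab4-Lab6 Status
+
+The repository already reserves:
+
+- the directory structure for IR analyses and passes
+- file skeletons for `Mem2Reg`, `ConstFold`, `ConstProp`, `DCE`, `CSE`, `CFGSimplify`, and others
+- entry points for loop analysis, dominator trees, back-end optimizations, and related labs
+
+Whether these stages count as "done" depends on your own follow-up work; do not assume the repository already implements them fully.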
+
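+## 2. Recommended Testing Approach
+
+Split testing into three layers:
+
+1. `Single-stage verification`
+   - Verify that one stage works in isolation, e.g. parse only, sema only, or IR output only.
+
+2. `End-to-end verification`
+   - Go from source all the way to IR or assembly, run the program, and compare against the `.out` file.
+
+3. `Batch regression`
+   - Run multiple tests under `test/test_case` together, rather than judging completeness from `simple_add.sy` alone.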
+
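+## 3. Recommended Build Steps After Pulling the Current Implementation
+
+If you have just pulled this repository, prepare the environment and build in the following order.
+
+### 3.1 Generate the ANTLR Output First
+
+The repository's CMake collects the ANTLR-generated files from the build directory but does not invoke ANTLR itself, so run the following before the first build: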
+
+```bash
+mkdir -p build/generated/antlr4
+java -jar third_party/antlr-4.13.2-complete.jar \
+  -Dlanguage=Cpp \
+  -visitor -no-listener \
+  -Xexact-output-dir \
+  -o build/generated/antlr4 \
+  src/antlr4/SysY.g4
+```
+
+### 3.2 To Verify Lab1 Only
+
+Build only the parse-only front end:
+
+```bash
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
+cmake --build build -j "$(nproc)"
+```
+
+After building, run:
+
+```bash
+./scripts/test_lab1.sh test/test_case/functional
+```
+
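+### 3.3 To Verify the Current Sema Part of Lab2
+
+Because the repository's `IRGen` has not fully caught up with the new grammar, and this change mainly delivers `Sema`, we recommend a separate `build-sema/` directory for verifying semantic checking.
+
+Recommended commands: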
+
+```bash
+cmake -S . -B build-sema -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
+mkdir -p build-sema/generated
+cp -r build/generated/antlr4 build-sema/generated/
+cmake --build build-sema --target frontend utils sem -j "$(nproc)"
+```
+
+Then compile `sema_check`:
+
+```bash
+g++ -std=c++17 \
+  -Iinclude \
+  -Isrc \
+  -Ibuild-sema/generated/antlr4 \
+  -Ithird_party/antlr4-runtime-4.13.2/runtime/src \
+  tools/sema_check.cpp \
+  build-sema/src/sem/libsem.a \
+  build-sema/src/frontend/libfrontend.a \
+  build-sema/src/utils/libutils.a \
+  build-sema/libantlr4_runtime.a \
+  -pthread \
+  -o build-sema/sema_check
+```
+
+Once that is done, run:
+
+```bash
+./scripts/test_lab2_sema.sh positive
+./scripts/test_lab2_sema.sh negative
+```
+
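+Notes:
+
+- `build/` is mainly for the Lab1 parse-only build or a later full build
+- `build-sema/` is mainly for verifying `Sema` on its own at the current stage
+- `scripts/test_lab2_sema.sh` depends on `./build-sema/sema_check`
+
+### 3.4 For a Later Full Build
+
+Once `IRGen` is fully in sync with the grammar, you can do a full build directly: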
+
+```bash
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
+cmake --build build -j "$(nproc)"
+```
+
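+At the current stage, however, do not treat "the full build succeeds" as the sole criterion for verifying `Sema`: what Lab2 has completed so far is the semantic analysis half, not the whole IR generation pipeline.
+
+## 4. Testing Lab1
+
+### 4.1 Build Commands
+
+First generate the ANTLR output: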
+
+```bash
+mkdir -p build/generated/antlr4
+java -jar third_party/antlr-4.13.2-complete.jar \
+  -Dlanguage=Cpp \
+  -visitor -no-listener \
+  -Xexact-output-dir \
+  -o build/generated/antlr4 \
+  src/antlr4/SysY.g4
+```
+
+Then build in `parse-only` mode:
+
+```bash
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=ON
+cmake --build build -j "$(nproc)"
+```
+
+### 4.2 Testing a Single Case
+
+```bash
+./build/bin/compiler --emit-parse-tree test/test_case/functional/simple_add.sy
+```
+
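+### 4.3 Batch Testing
+
+The repository provides a batch parse test script. To keep the terminal from being flooded with parse trees, the script writes each case's parse tree to its own log file.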
+
+```bash
+./scripts/test_lab1.sh test/test_case/functional
+```
+
+To specify the log directory:
+
+```bash
+./scripts/test_lab1.sh test/test_case/functional test/test_result/lab1_parse_logs
+```
+
+The terminal shows output like:
+
+```text
+TEST test/test_case/functional/simple_add.sy -> test/test_result/lab1_parse_logs/simple_add.parse.log
+...
+ALL_PARSE_OK (...) logs: test/test_result/lab1_parse_logs
+```
+
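+This means every `.sy` file in the test directory parsed successfully; to inspect a parse tree, open the corresponding `.parse.log` file.
+
+## 5. Testing Lab2
+
+Test Lab2 in two parts: `Sema` and `IRGen`.
+
+### 5.1 Test Sema First for Now
+
+Because the repository's `IRGen` has not fully caught up with the new grammar, semantic checking is currently the better way to demonstrate that the first half of Lab2 is implemented.
+
+#### 5.1.1 Positive Cases That Currently Pass
+
+The command below runs the test cases that currently serve as positive examples for `Sema`: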
+
+```bash
+./scripts/test_lab2_sema.sh positive
+```
+
+To specify the log directory:
+
+```bash
+./scripts/test_lab2_sema.sh positive test/test_result/lab2_sema_positive_logs
+```
+
+Expected behavior:
+
+- The terminal prints `TEST ... -> ...` for each case
+- When all cases pass, it prints `ALL_SEMA_POSITIVE_OK (...)`
+- Detailed output is written to `*.sema.log`
+
+#### 5.1.2 Negative Cases Available for Demonstration
+
+The prepared negative cases are:
+
+- `test/test_case/sema_negative/undef.sy`
+- `test/test_case/sema_negative/break.sy`
+- `test/test_case/sema_negative/ret.sy`
+- `test/test_case/sema_negative/call.sy`
+
+Run:
+
+```bash
+./scripts/test_lab2_sema.sh negative
+```
+
+To specify the log directory:
+
+```bash
+./scripts/test_lab2_sema.sh negative test/test_result/lab2_sema_negative_logs
+```
+
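+Expected behavior:
+
+- The terminal prints `TEST ... -> ...` for each case
+- When every case fails as expected, it prints `ALL_SEMA_NEGATIVE_OK (...)`
+- Each negative case's detailed error message is written to its `.sema.log`
+
+The negative cases cover, for example:
+
+- use of an undeclared variable
+- `break` outside a loop
+- a `void` function returning a value
+- a mismatched number of function arguments
+
+#### 5.1.3 How to Read Semantic Error Locations
+
+The `@line:column` part of a semantic error message marks where the error occurred.
+
+For example: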
+
+```text
+[error] [sema] @1:19 - 使用了未声明的标识符: a
+```
+
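+Here:
+
+- `1` is the line number
+- `19` is the column number
+
+So the error is near line 1, column 19 of the source file, which makes it quick to locate. The value matches ANTLR's 0-based column of the identifier `a` in `test/test_case/sema_negative/undef.sy`.
+
+#### 5.1.4 Main Error Types Currently Covered by Sema
+
+Typical errors currently detected:
+
+- use of an undeclared identifier
+- redefinition in the same scope
+- function redefinition
+- missing a valid `main`
+- mismatched number or types of function arguments
+- `break/continue` outside a loop
+- `return` not matching the function's return type
+- assignment to a `const` object
+- invalid array dimensions
+- global initializers that are not compile-time constants
+
+### 5.2 Testing IR Later
+
+Once `IRGen` is aligned with the current grammar, print the IR with: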
+
+```bash
+./build/bin/compiler --emit-ir test/test_case/functional/simple_add.sy
+```
+
+To further verify the "IR -> executable" pipeline, use:
+
+```bash
+./scripts/verify_ir.sh test/test_case/functional/simple_add.sy test/test_result/ir --run
+```
+
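+Note, however:  
+In the repository's current state this command is only useful for testing after IRGen is complete; it cannot be used to show that the already finished `Sema` part works.
+
+## 6. Testing Lab3
+
+Lab3 covers assembly output and the back-end pipeline.
+
+### 6.1 Build
+
+A full build is required: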
+
+```bash
+cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCOMPILER_PARSE_ONLY=OFF
+cmake --build build -j "$(nproc)"
+```
+
+### 6.2 Emitting Assembly for a Single Case
+
+```bash
+./build/bin/compiler --emit-asm test/test_case/functional/simple_add.sy
+```
+
+### 6.3 Verifying the Assembly Pipeline
+
+```bash
+./scripts/verify_asm.sh test/test_case/functional/simple_add.sy test/test_result/asm --run
+```
+
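+In `--run` mode the script:
+
+1. generates assembly
+2. cross-compiles it into an AArch64 executable
+3. runs it with `qemu-aarch64`
+4. compares the output against the `.out` file of the same name
+
+## 7. Testing Lab4
+
+Lab4 is the optimization lab. Testing focuses not only on whether the program runs, but also on whether semantics are preserved across optimization.
+
+Verify in this order:
+
+1. Make sure the unoptimized version is functionally correct
+2. Enable the optimization and rerun `verify_ir.sh` or `verify_asm.sh`
+3. Compare the IR or assembly before and after optimization
+4. Run regressions on multiple tests, so an optimization does not merely look fine on `simple_add`
+
+Recommended commands: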
+
+```bash
+./scripts/verify_ir.sh test/test_case/functional/simple_add.sy test/test_result/ir --run
+./scripts/verify_asm.sh test/test_case/functional/simple_add.sy test/test_result/asm --run
+```
+
+If you implemented separate switches for the optimizations, also compare the output of:
+
+```bash
+./build/bin/compiler --emit-ir test/test_case/functional/simple_add.sy
+./build/bin/compiler --emit-asm test/test_case/functional/simple_add.sy
+```
+
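+## 8. Testing Lab5
+
+Lab5 testing focuses on:
+
+- code remaining correct after register allocation
+- spill/reload logic not breaking semantics
+- the assembly still running end to end
+
+We recommend going straight through the full back-end pipeline: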
+
+```bash
+./scripts/verify_asm.sh test/test_case/functional/simple_add.sy test/test_result/asm --run
+```
+
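+After register allocation is done, do not test just a single case; cover at least:
+
+- `functional/`
+- several larger cases from `performance/`
+
+## 9. Testing Lab6
+
+Lab6 focuses on loop and parallelism-related optimizations; split testing into functional correctness and optimization benefit.
+
+### 9.1 Functional Correctness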
+
+```bash
+./scripts/verify_ir.sh test/test_case/functional/simple_add.sy test/test_result/ir --run
+./scripts/verify_asm.sh test/test_case/functional/simple_add.sy test/test_result/asm --run
+```
+
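+### 9.2 Observing Optimization Effects
+
+Compare the following before and after optimization:
+
+- IR output
+- assembly output
+- execution time
+- code size
+
+For example: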
+
+```bash
+./build/bin/compiler --emit-ir test/test_case/functional/simple_add.sy
+./build/bin/compiler --emit-asm test/test_case/functional/simple_add.sy
+```
+
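+To really evaluate loop optimizations, use functional or performance tests with obvious loop structure rather than only `simple_add.sy`.
+
+## 10. Suggested Summary for the Current Stage
+
+If you need to report on the repository's current state, you can summarize it as:
+
+1. Lab1's parse tree construction pipeline can be tested independently.
+2. Lab2 has completed the basic `Sema` extension, which can be demonstrated directly with positive and negative cases.
+3. Lab2's `IRGen` still needs work, so Lab2 as a whole cannot be considered complete.
+4. Lab3 and later labs are mostly frameworks with minimal-sample capability; full coverage still requires further implementation.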
--- a/include/sem/Sema.h
+++ b/include/sem/Sema.h
@@ -1,30 +1,69 @@
 // Semantic checking and name binding over the parse tree.
 #pragma once

+#include <string>
 #include <unordered_map>
+#include <vector>

 #include "SysYParser.h"

+enum class SemanticType {
+  Void,
+  Int,
+  Float,
+};
+
+struct ScalarConstant {
+  SemanticType type = SemanticType::Int;
+  double number = 0.0;
+};
+
+struct ObjectBinding {
+  enum class DeclKind {
+    Var,
+    Const,
+    Param,
+  };
+
+  std::string name;
+  SemanticType type = SemanticType::Int;
+  DeclKind decl_kind = DeclKind::Var;
+  bool is_array_param = false;
+  std::vector<int> dimensions;
+  const SysYParser::VarDefContext* var_def = nullptr;
+  const SysYParser::ConstDefContext* const_def = nullptr;
+  const SysYParser::FuncFParamContext* func_param = nullptr;
+  bool has_const_value = false;
+  ScalarConstant const_value;
+};
+
+struct FunctionBinding {
+  std::string name;
+  SemanticType return_type = SemanticType::Int;
+  std::vector<ObjectBinding> params;
+  const SysYParser::FuncDefContext* func_def = nullptr;
+  bool is_builtin = false;
+};
+
 class SemanticContext {
 public:
-  void BindVarUse(SysYParser::VarContext* use,
-                  SysYParser::VarDefContext* decl) {
-    var_uses_[use] = decl;
-  }
+  void BindObjectUse(const SysYParser::LValContext* use, ObjectBinding binding);
+  const ObjectBinding* ResolveObjectUse(
+      const SysYParser::LValContext* use) const;
+
+  void BindFunctionCall(const SysYParser::UnaryExpContext* call,
+                        FunctionBinding binding);
+  const FunctionBinding* ResolveFunctionCall(
+      const SysYParser::UnaryExpContext* call) const;

-  SysYParser::VarDefContext* ResolveVarUse(
-      const SysYParser::VarContext* use) const {
-    auto it = var_uses_.find(use);
-    return it == var_uses_.end() ? nullptr : it->second;
-  }
+  void RegisterFunction(FunctionBinding binding);
+  const FunctionBinding* ResolveFunction(const std::string& name) const;

 private:
-  std::unordered_map<const SysYParser::VarContext*,
-                     SysYParser::VarDefContext*>
-      var_uses_;
+  std::unordered_map<const SysYParser::LValContext*, ObjectBinding> object_uses_;
+  std::unordered_map<const SysYParser::UnaryExpContext*, FunctionBinding>
+      function_calls_;
+  std::unordered_map<std::string, FunctionBinding> functions_;
 };

-// Currently checks only:
-// - variables are declared before use
-// - local variables are not redefined
 SemanticContext RunSema(SysYParser::CompUnitContext& comp_unit);
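Reviewer sketch: `ScalarConstant` keeps every compile-time value in a `double` tagged with a `SemanticType`. `src/sem/Sema.cpp` is not shown in this diff, so the following is only one way such values might be folded, not the commit's actual logic. It assumes C-style promotion (an `int` combined with a `float` yields `float`) and truncating integer division, and it ignores 32-bit overflow and single-precision rounding. `BinaryOp` and `FoldBinary` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the declarations added to include/sem/Sema.h.
enum class SemanticType { Void, Int, Float };

struct ScalarConstant {
  SemanticType type = SemanticType::Int;
  double number = 0.0;
};

// Hypothetical helper type, not part of the commit.
enum class BinaryOp { Add, Sub, Mul, Div, Mod };

// Folds `lhs op rhs` into `*out`. Returns false when the expression is not a
// valid constant: a void operand, division or modulo by zero, or `%` applied
// to a float operand.
bool FoldBinary(BinaryOp op, const ScalarConstant& lhs,
                const ScalarConstant& rhs, ScalarConstant* out) {
  if (lhs.type == SemanticType::Void || rhs.type == SemanticType::Void) {
    return false;
  }
  const bool is_float =
      lhs.type == SemanticType::Float || rhs.type == SemanticType::Float;
  const double a = lhs.number;
  const double b = rhs.number;
  double value = 0.0;
  switch (op) {
    case BinaryOp::Add: value = a + b; break;
    case BinaryOp::Sub: value = a - b; break;
    case BinaryOp::Mul: value = a * b; break;
    case BinaryOp::Div:
      if (b == 0.0) return false;
      // Integer division truncates toward zero, as in C.
      value = is_float ? a / b : std::trunc(a / b);
      break;
    case BinaryOp::Mod:
      if (is_float || b == 0.0) return false;
      // std::fmod keeps the sign of the dividend, like C's `%`.
      value = std::fmod(a, b);
      break;
  }
  out->type = is_float ? SemanticType::Float : SemanticType::Int;
  out->number = value;
  return true;
}
```

With this sketch, folding `7 / 2` gives the `Int` constant `3`, while `7 + 0.5` gives the `Float` constant `7.5`; `7 / 0` and `7 % 0.5` are rejected.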
--- a/include/sem/SymbolTable.h
+++ b/include/sem/SymbolTable.h
@@ -1,17 +1,25 @@
-// Minimal symbol table: records where local variables are defined.
+// Maintains nested scopes for object symbols.
 #pragma once

 #include <string>
+#include <string_view>
 #include <unordered_map>
+#include <vector>

-#include "SysYParser.h"
+#include "sem/Sema.h"

 class SymbolTable {
 public:
-  void Add(const std::string& name, SysYParser::VarDefContext* decl);
-  bool Contains(const std::string& name) const;
-  SysYParser::VarDefContext* Lookup(const std::string& name) const;
+  SymbolTable();
+
+  void EnterScope();
+  void ExitScope();
+
+  bool Add(const ObjectBinding& symbol);
+  bool ContainsInCurrentScope(std::string_view name) const;
+  const ObjectBinding* Lookup(std::string_view name) const;
+  size_t Depth() const;

 private:
-  std::unordered_map<std::string, SysYParser::VarDefContext*> table_;
+  std::vector<std::unordered_map<std::string, ObjectBinding>> scopes_;
 };
--- a/scripts/test_lab1.sh
+++ b/scripts/test_lab1.sh
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+case_dir="${1:-test/test_case}"
+log_dir="${2:-test/test_result/lab1_parse_logs}"
+
+if [[ ! -d "$case_dir" ]]; then
+  echo "Test directory does not exist: $case_dir" >&2
+  exit 1
+fi
+
+compiler="./build/bin/compiler"
+if [[ ! -x "$compiler" ]]; then
+  echo "Compiler not found: $compiler; build the parse-only version first." >&2
+  exit 1
+fi
+
+mkdir -p "$log_dir"
+
+mapfile -t cases < <(find "$case_dir" -name '*.sy' | sort)
+if [[ ${#cases[@]} -eq 0 ]]; then
+  echo "No .sy test files found in: $case_dir" >&2
+  exit 1
+fi
+
+for f in "${cases[@]}"; do
+  rel="${f#"$case_dir"/}"
+  safe_name="${rel//\//__}"
+  log_file="$log_dir/${safe_name%.sy}.parse.log"
+  echo "TEST $f -> $log_file"
+  if ! "$compiler" --emit-parse-tree "$f" >"$log_file" 2>&1; then
+    echo "FAIL $f (see $log_file)" >&2
+    exit 1
+  fi
+done
+
+echo "ALL_PARSE_OK (${#cases[@]} cases) logs: $log_dir"
--- a/scripts/test_lab2_sema.sh
+++ b/scripts/test_lab2_sema.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+mode="${1:-positive}"
+log_dir="${2:-test/test_result/lab2_sema_logs}"
+
+checker="./build-sema/sema_check"
+if [[ ! -x "$checker" ]]; then
+  echo "Semantic test driver not found: $checker" >&2
+  echo "Build build-sema/sema_check first." >&2
+  exit 1
+fi
+
+mkdir -p "$log_dir"
+
+case_files=()
+expected_prefix=""
+
+case "$mode" in
+  positive)
+    expected_prefix="OK"
+    case_files=(
+      "test/test_case/functional/simple_add.sy"
+      "test/test_case/functional/09_func_defn.sy"
+      "test/test_case/functional/25_scope3.sy"
+      "test/test_case/functional/29_break.sy"
+      "test/test_case/functional/05_arr_defn4.sy"
+      "test/test_case/functional/95_float.sy"
+    )
+    ;;
+  negative)
+    expected_prefix="ERR"
+    case_files=(
+      "test/test_case/sema_negative/undef.sy"
+      "test/test_case/sema_negative/break.sy"
+      "test/test_case/sema_negative/ret.sy"
+      "test/test_case/sema_negative/call.sy"
+    )
+    ;;
+  *)
+    echo "Usage: $0 [positive|negative] [log_dir]" >&2
+    exit 1
+    ;;
+esac
+
+if [[ ${#case_files[@]} -eq 0 ]]; then
+  echo "No test cases to run" >&2
+  exit 1
+fi
+
+for f in "${case_files[@]}"; do
+  if [[ ! -f "$f" ]]; then
+    echo "Test file does not exist: $f" >&2
+    exit 1
+  fi
+done
+
+all_ok=true
+for f in "${case_files[@]}"; do
+  base="$(basename "${f%.sy}")"
+  log_file="$log_dir/${base}.sema.log"
+  echo "TEST $f -> $log_file"
+  set +e
+  "$checker" "$f" >"$log_file" 2>&1
+  status=$?
+  set -e
+
+  if ! grep -qxF -- "${expected_prefix} $f" "$log_file"; then
+    echo "FAIL $f (see $log_file)" >&2
+    all_ok=false
+    continue
+  fi
+
+  if [[ "$mode" == "positive" && $status -ne 0 ]]; then
+    echo "FAIL $f (expected success, see $log_file)" >&2
+    all_ok=false
+    continue
+  fi
+
+  if [[ "$mode" == "negative" && $status -eq 0 ]]; then
+    echo "FAIL $f (expected semantic error, see $log_file)" >&2
+    all_ok=false
+    continue
+  fi
+done
+
+if [[ "$all_ok" != true ]]; then
+  exit 1
+fi
+
+echo "ALL_SEMA_${mode^^}_OK (${#case_files[@]} cases) logs: $log_dir"
--- a/src/antlr4/SysY.g4
+++ b/src/antlr4/SysY.g4
@@ -20,58 +20,152 @@ grammar SysY;
 /* Lexer rules                                     */
 /*===-------------------------------------------===*/

+CONST: 'const';
 INT: 'int';
+FLOAT: 'float';
+VOID: 'void';
+IF: 'if';
+ELSE: 'else';
+WHILE: 'while';
+BREAK: 'break';
+CONTINUE: 'continue';
 RETURN: 'return';

+LE: '<=';
+GE: '>=';
+EQ: '==';
+NE: '!=';
+AND: '&&';
+OR: '||';
+
 ASSIGN: '=';
+LT: '<';
+GT: '>';
 ADD: '+';
+SUB: '-';
+MUL: '*';
+DIV: '/';
+MOD: '%';
+NOT: '!';

 LPAREN: '(';
 RPAREN: ')';
+LBRACK: '[';
+RBRACK: ']';
 LBRACE: '{';
 RBRACE: '}';
+COMMA: ',';
 SEMICOLON: ';';

 ID: [a-zA-Z_][a-zA-Z_0-9]*;
-ILITERAL: [0-9]+;
+
+HEX_FLOAT_LITERAL
+    : ('0x' | '0X') HEX_DIGIT* '.' HEX_DIGIT+ BINARY_EXPONENT
+    | ('0x' | '0X') HEX_DIGIT+ '.' HEX_DIGIT* BINARY_EXPONENT
+    | ('0x' | '0X') HEX_DIGIT+ BINARY_EXPONENT
+    ;
+
+DEC_FLOAT_LITERAL
+    : DEC_DIGIT+ '.' DEC_DIGIT* DEC_EXPONENT?
+    | '.' DEC_DIGIT+ DEC_EXPONENT?
+    | DEC_DIGIT+ DEC_EXPONENT
+    ;
+
+HEX_INT_LITERAL
+    : ('0x' | '0X') HEX_DIGIT+
+    ;
+
+OCT_INT_LITERAL
+    : '0' OCT_DIGIT+
+    ;
+
+DEC_INT_LITERAL
+    : '0'
+    | [1-9] DEC_DIGIT*
+    ;

 WS: [ \t\r\n] -> skip;
 LINECOMMENT: '//' ~[\r\n]* -> skip;
 BLOCKCOMMENT: '/*' .*? '*/' -> skip;

+fragment DEC_DIGIT: [0-9];
+fragment OCT_DIGIT: [0-7];
+fragment HEX_DIGIT: [0-9a-fA-F];
+fragment DEC_EXPONENT: [eE] [+-]? DEC_DIGIT+;
+fragment BINARY_EXPONENT: [pP] [+-]? DEC_DIGIT+;
+
 /*===-------------------------------------------===*/
 /* Syntax rules                                    */
 /*===-------------------------------------------===*/

 compUnit
-    : funcDef EOF
+    : topLevelItem (topLevelItem)* EOF
+    ;
+
+topLevelItem
+    : decl
+    | funcDef
    ;

 decl
-    : btype varDef SEMICOLON
+    : constDecl
+    | varDecl
    ;

-btype
+constDecl
+    : CONST bType constDef (COMMA constDef)* SEMICOLON
+    ;
+
+varDecl
+    : bType varDef (COMMA varDef)* SEMICOLON
+    ;
+
+bType
    : INT
+    | FLOAT
+    ;
+
+constDef
+    : ID constIndex* ASSIGN constInitVal
    ;

 varDef
-    : lValue (ASSIGN initValue)?
+    : ID constIndex* (ASSIGN initVal)?
+    ;
+
+constIndex
+    : LBRACK constExp RBRACK
+    ;
+
+constInitVal
+    : constExp
+    | LBRACE (constInitVal (COMMA constInitVal)*)? RBRACE
    ;

-initValue
+initVal
    : exp
+    | LBRACE (initVal (COMMA initVal)*)? RBRACE
    ;

 funcDef
-    : funcType ID LPAREN RPAREN blockStmt
+    : funcType ID LPAREN funcFParams? RPAREN block
    ;

 funcType
-    : INT
+    : VOID
+    | INT
+    | FLOAT
+    ;
+
+funcFParams
+    : funcFParam (COMMA funcFParam)*
+    ;
+
+funcFParam
+    : bType ID (LBRACK RBRACK (LBRACK exp RBRACK)*)?
    ;

-blockStmt
+block
    : LBRACE blockItem* RBRACE
    ;

@@ -81,28 +175,107 @@ blockItem
    ;

 stmt
-    : returnStmt
+    : lVal ASSIGN exp SEMICOLON
+    | exp? SEMICOLON
+    | block
+    | IF LPAREN cond RPAREN stmt (ELSE stmt)?
+    | WHILE LPAREN cond RPAREN stmt
+    | BREAK SEMICOLON
+    | CONTINUE SEMICOLON
+    | RETURN exp? SEMICOLON
    ;

-returnStmt
-    : RETURN exp SEMICOLON
+exp
+    : addExp
    ;

-exp
-    : LPAREN exp RPAREN          # parenExp
-    | var                        # varExp
-    | number                     # numberExp
-    | exp ADD exp                # additiveExp
+cond
+    : lOrExp
    ;

-var
-    : ID
+lVal
+    : ID (LBRACK exp RBRACK)*
    ;

-lValue
-    : ID
+primaryExp
+    : LPAREN exp RPAREN
+    | lVal
+    | number
    ;

 number
-    : ILITERAL
+    : intConst
+    | floatConst
+    ;
+
+intConst
+    : DEC_INT_LITERAL
+    | OCT_INT_LITERAL
+    | HEX_INT_LITERAL
+    ;
+
+floatConst
+    : DEC_FLOAT_LITERAL
+    | HEX_FLOAT_LITERAL
+    ;
+
+unaryExp
+    : primaryExp
+    | ID LPAREN funcRParams? RPAREN
+    | addUnaryOp unaryExp
+    ;
+
+addUnaryOp
+    : ADD
+    | SUB
+    ;
+
+funcRParams
+    : exp (COMMA exp)*
+    ;
+
+mulExp
+    : unaryExp
+    | mulExp MUL unaryExp
+    | mulExp DIV unaryExp
+    | mulExp MOD unaryExp
+    ;
+
+addExp
+    : mulExp
+    | addExp ADD mulExp
+    | addExp SUB mulExp
+    ;
+
+relExp
+    : addExp
+    | relExp LT addExp
+    | relExp GT addExp
+    | relExp LE addExp
+    | relExp GE addExp
+    ;
+
+eqExp
+    : relExp
+    | eqExp EQ relExp
+    | eqExp NE relExp
+    ;
+
+lAndExp
+    : condUnaryExp
+    | lAndExp AND condUnaryExp
+    ;
+
+lOrExp
+    : lAndExp
+    | lOrExp OR lAndExp
+    ;
+
+condUnaryExp
+    : eqExp
+    | NOT condUnaryExp
+    ;
+
+constExp
+    : addExp
    ;
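Reviewer sketch: the lexer now distinguishes decimal, octal, and hexadecimal integer literals as well as decimal and hexadecimal float literals. How the commit converts token text to values is not visible in this diff; the helpers below are a minimal illustration that leans on the C library's own prefix handling. `ParseIntLiteral` and `ParseFloatLiteral` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helpers, not part of the commit.

// Converts the text of a DEC_INT_LITERAL, OCT_INT_LITERAL, or HEX_INT_LITERAL
// token. Base 0 lets strtoll pick the radix from the prefix: "0x"/"0X" means
// hexadecimal, a leading "0" means octal, anything else is decimal.
long long ParseIntLiteral(const std::string& text) {
  return std::strtoll(text.c_str(), nullptr, 0);
}

// Converts the text of a DEC_FLOAT_LITERAL or HEX_FLOAT_LITERAL token.
// strtod accepts decimal forms ("1e2", ".5") and C99 hexadecimal forms with
// a binary exponent ("0x1.8p1").
double ParseFloatLiteral(const std::string& text) {
  return std::strtod(text.c_str(), nullptr);
}
```

Returning `long long` leaves range checking against SysY's 32-bit `int` to the caller; a real implementation would also want to narrow float results to single precision.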
--- a/src/sem/Sema.cpp
+++ b/src/sem/Sema.cpp
--- a/src/sem/SymbolTable.cpp
+++ b/src/sem/SymbolTable.cpp
@@ -1,17 +1,39 @@
-// Registers and looks up local variable declarations.
+// Registers object symbols and looks them up by scope.

 #include "sem/SymbolTable.h"

-void SymbolTable::Add(const std::string& name,
-                      SysYParser::VarDefContext* decl) {
-  table_[name] = decl;
+#include <stdexcept>
+
+SymbolTable::SymbolTable() : scopes_(1) {}
+
+void SymbolTable::EnterScope() { scopes_.emplace_back(); }
+
+void SymbolTable::ExitScope() {
+  if (scopes_.size() <= 1) {
+    throw std::runtime_error("symbol table scope underflow");
+  }
+  scopes_.pop_back();
 }

-bool SymbolTable::Contains(const std::string& name) const {
-  return table_.find(name) != table_.end();
+bool SymbolTable::Add(const ObjectBinding& symbol) {
+  auto& scope = scopes_.back();
+  return scope.emplace(symbol.name, symbol).second;
 }

-SysYParser::VarDefContext* SymbolTable::Lookup(const std::string& name) const {
-  auto it = table_.find(name);
-  return it == table_.end() ? nullptr : it->second;
+bool SymbolTable::ContainsInCurrentScope(std::string_view name) const {
+  const auto& scope = scopes_.back();
+  return scope.find(std::string(name)) != scope.end();
 }
+
+const ObjectBinding* SymbolTable::Lookup(std::string_view name) const {
+  const std::string key(name);
+  for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
+    auto found = it->find(key);
+    if (found != it->end()) {
+      return &found->second;
+    }
+  }
+  return nullptr;
+}
+
+size_t SymbolTable::Depth() const { return scopes_.size(); }
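Reviewer sketch: to make the scoping behavior of the new `SymbolTable` concrete, here is a self-contained miniature of the same scope stack. It is not the commit's class: `MiniBinding` and `MiniSymbolTable` are hypothetical stand-ins, with `ObjectBinding` reduced to a name plus an illustrative `tag` field so the example compiles without ANTLR.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Reduced stand-in for ObjectBinding; `tag` is a hypothetical field used only
// to tell declarations apart in this sketch.
struct MiniBinding {
  std::string name;
  int tag = 0;
};

// Same scope-stack idea as SymbolTable in the commit, minus ANTLR types.
class MiniSymbolTable {
 public:
  MiniSymbolTable() : scopes_(1) {}  // scope 0 is the global scope

  void EnterScope() { scopes_.emplace_back(); }

  // The caller must not pop the global scope (the commit's version throws).
  void ExitScope() { scopes_.pop_back(); }

  // Returns false if the name already exists in the innermost scope.
  bool Add(const MiniBinding& symbol) {
    return scopes_.back().emplace(symbol.name, symbol).second;
  }

  // Searches from the innermost scope outward; nullptr if not found.
  const MiniBinding* Lookup(const std::string& name) const {
    for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return &found->second;
    }
    return nullptr;
  }

 private:
  std::vector<std::unordered_map<std::string, MiniBinding>> scopes_;
};
```

Adding `x` twice in the same scope fails the second time, which is how a redefinition would be reported. After `EnterScope()`, a new `x` is accepted and shadows the outer one until `ExitScope()`, after which `Lookup("x")` returns the outer binding again.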
--- a/sysy2022.pdf
+++ b/sysy2022.pdf
--- a/test/test_case/sema_negative/break.sy
+++ b/test/test_case/sema_negative/break.sy
@@ -0,0 +1 @@
+int main(){ break; return 0; }
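Reviewer sketch: this case expects Sema to reject a `break` outside any loop. The actual check lives in `src/sem/Sema.cpp`, which is not shown here. A common way to implement it is a loop-depth counter, sketched below; `LoopTracker` is a hypothetical name, not taken from the commit.

```cpp
#include <cassert>

// Hypothetical helper, not taken from src/sem/Sema.cpp.
class LoopTracker {
 public:
  // Call when the visitor enters the body of a `while` statement.
  void EnterLoop() { ++depth_; }

  // Call when the visitor leaves that body.
  void ExitLoop() { --depth_; }

  // `break` and `continue` are legal only inside at least one loop.
  bool CanBreakOrContinue() const { return depth_ > 0; }

 private:
  int depth_ = 0;
};
```

Because SysY has no nested function definitions, a single counter that returns to zero at the end of each function body would be enough under this approach.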
--- a/test/test_case/sema_negative/call.sy
+++ b/test/test_case/sema_negative/call.sy
@@ -0,0 +1,2 @@
+int f(int x){ return x; }
+int main(){ return f(); }
--- a/test/test_case/sema_negative/ret.sy
+++ b/test/test_case/sema_negative/ret.sy
@@ -0,0 +1,2 @@
+void f(){ return 1; }
+int main(){ return 0; }
--- a/test/test_case/sema_negative/undef.sy
+++ b/test/test_case/sema_negative/undef.sy
@@ -0,0 +1 @@
+int main(){ return a; }
--- a/tools/sema_check.cpp
+++ b/tools/sema_check.cpp
@@ -0,0 +1,35 @@
+#include <exception>
+#include <iostream>
+#include <stdexcept>
+#include <string>
+
+#include "frontend/AntlrDriver.h"
+#include "sem/Sema.h"
+#include "utils/Log.h"
+
+int main(int argc, char** argv) {
+  if (argc < 2) {
+    std::cerr << "usage: sema_check <input.sy> [more.sy...]\n";
+    return 2;
+  }
+
+  bool failed = false;
+  for (int i = 1; i < argc; ++i) {
+    const std::string path = argv[i];
+    try {
+      auto antlr = ParseFileWithAntlr(path);
+      auto* comp_unit = dynamic_cast<SysYParser::CompUnitContext*>(antlr.tree);
+      if (!comp_unit) {
+        throw std::runtime_error(FormatError("sema_check", "parse tree root is not a compUnit"));
+      }
+      (void)RunSema(*comp_unit);
+      std::cout << "OK " << path << "\n";
+    } catch (const std::exception& ex) {
+      failed = true;
+      std::cout << "ERR " << path << "\n";
+      PrintException(std::cout, ex);
+    }
+  }
+
+  return failed ? 1 : 0;
+}