# OCRmyPDF GUI

A graphical user interface for OCRmyPDF that makes it easy to run OCR on PDF files.

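## Features

- Clean, intuitive graphical interface
- Batch processing of PDF files
- Drag-and-drop support
- Multi-language OCR support
- Customizable OCR options
- Saved processing configurations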
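
To illustrate how batch processing and multi-language OCR fit together, here is a minimal sketch of how a front end might turn a list of PDFs into `ocrmypdf` command lines. The function names and the `_ocr` output suffix are hypothetical and not taken from this project's code; the only OCRmyPDF behavior relied on is the `-l` option, which accepts language codes joined with `+`.

```python
from pathlib import Path


def build_ocr_command(input_pdf, output_dir, languages=("eng",)):
    """Build one ocrmypdf command line as a list of arguments.

    The "<name>_ocr.pdf" output naming is an assumption made for this sketch.
    """
    input_pdf = Path(input_pdf)
    output_pdf = Path(output_dir) / f"{input_pdf.stem}_ocr.pdf"
    # ocrmypdf takes several languages as a single "+"-joined value.
    return ["ocrmypdf", "-l", "+".join(languages), str(input_pdf), str(output_pdf)]


def build_batch(input_pdfs, output_dir, languages=("eng",)):
    """Build one command per input file; nothing is executed here."""
    return [build_ocr_command(pdf, output_dir, languages) for pdf in input_pdfs]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to process the files one at a time.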

## Requirements

- Python 3.7+
- OCRmyPDF
- Tesseract OCR
- PySide6 (Qt for Python)
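
The requirements above can be checked before launching the GUI. The following is a rough sketch, not part of this project: `check_requirements` is a hypothetical helper, and it assumes that OCRmyPDF and Tesseract expose `ocrmypdf` and `tesseract` executables on `PATH`, which is the usual case but may not hold for every installation.

```python
import importlib.util
import shutil
import sys


def check_requirements(min_python=(3, 7)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ is required, found {found}"
        )
    # Assumption: both tools are installed as executables on PATH.
    for exe in ("ocrmypdf", "tesseract"):
        if shutil.which(exe) is None:
            problems.append(f"{exe} was not found on PATH")
    # find_spec looks the package up without importing it.
    if importlib.util.find_spec("PySide6") is None:
        problems.append("PySide6 is not importable")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All requirements found."]:
        print(line)
```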

## Installation

1. Install OCRmyPDF and its dependencies:

```bash
# macOS
brew install ocrmypdf

# Ubuntu/Debian
sudo apt install ocrmypdf

# Or with pip (Tesseract and Ghostscript must be installed separately)
pip install ocrmypdf
```

2. Install the GUI dependency:

```bash
pip install PySide6
```

3. Clone this repository:

```bash
git clone https://github.com/yourusername/OCRmyPDF-GUI.git
cd OCRmyPDF-GUI
```

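## Installing Tesseract Language Packs

By default, a typical OCRmyPDF installation includes only the English language pack for Tesseract. To run OCR in other languages, install additional language packs: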

### macOS

```bash
# Install all language packs
brew install tesseract-lang

# Or install a specific language pack manually
# 1. Download the language data file, for example Simplified Chinese:
# https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata
# 2. Copy it into Tesseract's tessdata directory:
# sudo cp chi_sim.traineddata /opt/homebrew/share/tessdata/
# or
# sudo cp chi_sim.traineddata /usr/local/share/tessdata/
```

### Ubuntu/Debian

```bash
# Install a specific language pack, for example Simplified Chinese:
sudo apt-get install tesseract-ocr-chi-sim

# List all available language packs:
apt-cache search tesseract-ocr
```

### Fedora

```bash
# Install a specific language pack, for example Simplified Chinese:
sudo dnf install tesseract-langpack-chi_sim

# List all available language packs:
dnf search tesseract
```

### Windows

1. Download the language data files you need from:
   https://github.com/tesseract-ocr/tessdata/

2. Place the downloaded `.traineddata` files in the `tessdata` folder inside the Tesseract installation directory, usually:
   `C:\Program Files\Tesseract-OCR\tessdata`
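
The manual steps for macOS and Windows both depend on knowing where the `tessdata` directory lives. As a rough sketch, the lookup could be automated as below. `find_tessdata_dir` is a hypothetical helper, the candidate paths are simply the ones listed in this README, and the code assumes the `TESSDATA_PREFIX` environment variable, when set, points directly at the `tessdata` directory (the convention in Tesseract 4 and later).

```python
import os
from pathlib import Path

# Candidate locations copied from this README; adjust for your installation.
DEFAULT_CANDIDATES = [
    "/opt/homebrew/share/tessdata",
    "/usr/local/share/tessdata",
    r"C:\Program Files\Tesseract-OCR\tessdata",
]


def find_tessdata_dir(candidates=DEFAULT_CANDIDATES, environ=None):
    """Return the first existing tessdata directory as a Path, or None."""
    environ = os.environ if environ is None else environ
    paths = list(candidates)
    # Assumption: TESSDATA_PREFIX, if set, names the tessdata directory itself.
    prefix = environ.get("TESSDATA_PREFIX")
    if prefix:
        paths.insert(0, prefix)
    for candidate in paths:
        if Path(candidate).is_dir():
            return Path(candidate)
    return None
```

If the function returns `None`, the directory has to be located by hand, for example from the Tesseract installer's chosen path.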

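### Common Language Codes

- `eng` - English
- `chi_sim` - Simplified Chinese
- `chi_tra` - Traditional Chinese
- `jpn` - Japanese
- `kor` - Korean
- `fra` - French
- `deu` - German
- `rus` - Russian
- `spa` - Spanish
- `ita` - Italian

For more information, see the [OCRmyPDF language pack documentation](https://ocrmypdf.readthedocs.io/en/latest/languages.html).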
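
To confirm that a language code is actually installed, you can ask Tesseract with `tesseract --list-langs`. The sketch below wraps that command; the helper names are hypothetical, and the parser assumes the usual output shape of one header line beginning with "List of" followed by one language code per line.

```python
import subprocess


def parse_lang_list(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the only non-language line is the "List of available languages" header.
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages():
    """Run Tesseract and return the installed language codes."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some older Tesseract builds print the list to stderr instead of stdout.
    return parse_lang_list(result.stdout or result.stderr)


def missing_languages(wanted, installed):
    """Return the requested codes that are not installed, in the order requested."""
    return [code for code in wanted if code not in installed]
```

For example, `missing_languages(["chi_sim", "eng"], installed_languages())` returns the codes that still need a language pack.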

## Usage

Run the launcher script:

```bash
python run.py
```

Alternatively, on Windows, double-click the `run.py` file.

## Roadmap

- [ ] Advanced OCR options
- [ ] Multi-language interface
- [ ] Dark mode
- [ ] Custom output filename templates
- [ ] Processing history

## Contributing

Pull requests and issues are welcome.

## License

This project is released under the same license as OCRmyPDF.

## Acknowledgements

- [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) - a powerful OCR tool
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - the OCR engine
- [Qt for Python (PySide6)](https://wiki.qt.io/Qt_for_Python) - the GUI framework