# OpenRank Reproduction Project

This project reproduces the OpenRank algorithm, based on the open-digger and openrank-neo4j-gds projects, for measuring developer contributions and analyzing open source communities.
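
## Overview

OpenRank is an algorithm developed by X-lab for evaluating the value of open source projects. It builds on PageRank and is designed to assess the contribution value of developers and projects in the open source ecosystem. This project provides a complete reproduction of the algorithm.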
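
## Features

- ✅ **Complete OpenRank implementation**: a faithful reproduction based on the original paper and open source code
- ✅ **Multiple calculation modes**: global OpenRank and project-level OpenRank
- ✅ **Flexible data source interface**: supports both stub (mock) data and real data sources
- ✅ **Rich metrics**: multi-dimensional analysis of repositories, users, and communities
- ✅ **High-performance graph computation**: optimized graph data structures and iterative algorithm
- ✅ **Full configuration system**: supports parameter tuning and environment-specific settings
- ✅ **TypeScript support**: complete type definitions and editor hints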

## Installation

```bash
# Clone the project
git clone <repository-url>
cd openrank

# Install dependencies
npm install

# Build the project
npm run build
```

## Quick Start

### Basic Usage

```typescript
import { OpenRank } from './src';

// Create an OpenRank instance
const openrank = new OpenRank('./data');

// Run the OpenRank calculation
const startDate = new Date('2024-01-01');
const endDate = new Date('2024-12-31');
const results = await openrank.calculate(startDate, endDate);

// Get the top 10 repositories by OpenRank
const topRepos = await openrank.getRepoOpenrank({
  startYear: 2024,
  startMonth: 1,
  endYear: 2024,
  endMonth: 12,
  limit: 10,
  order: 'DESC'
});

console.log('Top 10 repositories:', topRepos);
```

### Advanced Queries

```typescript
import { MetricsCalculator, MockDataSource } from './src';

const dataSource = new MockDataSource('./data');
const calculator = new MetricsCalculator(dataSource);

// Get distribution statistics
const distribution = await calculator.getOpenrankDistribution({
  startYear: 2024,
  startMonth: 1,
  endYear: 2024,
  endMonth: 12,
});

// Compare two periods
const comparison = await calculator.compareOpenrank(
  { startYear: 2024, startMonth: 1, endYear: 2024, endMonth: 6 },
  { startYear: 2024, startMonth: 7, endYear: 2024, endMonth: 12 },
  'repo'
);
```

## Configuration

### Configuration File

Configure the algorithm parameters in `config/openrank.yml`:

```yaml
global:
  developerRetentionFactor: 0.5   # Developer retention ratio
  repositoryRetentionFactor: 0.3  # Repository retention ratio
  attenuationFactor: 0.85         # OpenRank attenuation factor
  tolerance: 0.01                 # Convergence tolerance
  maxIterations: 100              # Maximum number of iterations

activityWeights:
  issueComment: 0.5252   # Weight of an Issue comment
  openIssue: 2.2235      # Weight of opening an Issue
  openPull: 4.0679       # Weight of opening a PR
  reviewComment: 0.7427  # Weight of a code review comment
  mergedPull: 2.0339     # Weight of a merged PR

projectActivityWeights:
  # Activity-type weights (used by project-level OpenRank)
  open: 2.0
  comment: 0.5
  review: 1.0
  close: 0.3
  commit: 1.5

# Anti-gaming and density suppression (recommended):
antiGaming:
  enabled: true
  commentTransform: sqrt  # Sublinear transform for high-frequency comments, reducing the impact of spamming
  commitTransform: sqrt
  linearThresholds:       # The first N items accumulate linearly; the excess is transformed (sqrt/log)
    comment: 3
    reviewComment: 3
    commit: 1
  perItemCap:             # Per-Issue/PR cap on each count, guarding against extremes
    comment: 50
    reviewComment: 40
    commit: 20

# PR contribution types and role modeling
contributionTypeMultipliers:
  open: 1.0
  comment: 0.9
  review: 1.1
  close: 1.0
  commit: 1.05
reviewerChangeRequestBonus: 1.03  # Small bonus for reviewers when change requests exist
roleBonus:                        # Small per-role bonuses (the product is limited by roleClamp)
  author: 1.05
  reviewer: 1.05
  committer: 1.03
  commenter: 1.0
roleClamp:                        # Clamp on the role multiplier so stacked bonuses stay bounded
  min: 1.0
  max: 1.2
clamp:                            # Clamp on the total contribution, relative to the raw sum of its parts
  min: 0.7
  max: 1.6

# Repository-level events (Star/Fork/Release), optional
repoEventWeights:
  enabled: false
  star: 0.5
  fork: 1.0
  release: 1.5
  activityRatio: 0.2
  reverseRatio: 0.1
```
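
The `antiGaming` settings above describe a count that grows linearly up to a threshold and sublinearly beyond it, with a hard cap per Issue/PR. The sketch below is one possible reading of those comments, not the project's actual code: `AntiGamingRule` and `effectiveCount` are hypothetical names, and applying the cap before the transform is an assumption.

```typescript
type Transform = 'sqrt' | 'log' | 'none';

interface AntiGamingRule {
  transform: Transform;     // sublinear transform applied beyond the threshold
  linearThreshold: number;  // the first N events count at full value
  perItemCap: number;       // hard cap on raw events per Issue/PR
}

// Effective count for one activity type on one Issue/PR.
// Assumption: the cap is applied to the raw count first, then the part
// above the linear threshold is damped by the configured transform.
function effectiveCount(rawCount: number, rule: AntiGamingRule): number {
  const capped = Math.min(Math.max(rawCount, 0), rule.perItemCap);
  if (capped <= rule.linearThreshold) return capped;
  const excess = capped - rule.linearThreshold;
  switch (rule.transform) {
    case 'sqrt':
      return rule.linearThreshold + Math.sqrt(excess);
    case 'log':
      return rule.linearThreshold + Math.log1p(excess);
    default:
      return capped;
  }
}

// Values taken from the `comment` entries of the configuration above.
const commentRule: AntiGamingRule = { transform: 'sqrt', linearThreshold: 3, perItemCap: 50 };

console.log(effectiveCount(2, commentRule));   // below the threshold: 2
console.log(effectiveCount(19, commentRule));  // 3 + sqrt(16) = 7
console.log(effectiveCount(500, commentRule)); // capped at 50 first: 3 + sqrt(47)
```

Under this reading, the 500th comment on a single thread adds nothing, and the 19th adds far less than the 2nd, which is the behaviour the configuration comments describe.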
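
Tips:

- `activityDetails.roles` labels each activity edge with the author, reviewer, committer, and commenter roles, which helps with later analysis and reporting.
- `getGraphSnapshot()` exports a read-only snapshot that includes `activityDetails`, for inspecting source and role details.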
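
The role settings (`roleBonus` and `roleClamp`) say that per-role bonuses are multiplied together and the product is then clamped. A minimal sketch of that rule follows. `roleMultiplier` is a hypothetical helper, and counting each distinct role only once is an assumption.

```typescript
type Role = 'author' | 'reviewer' | 'committer' | 'commenter';

interface Clamp {
  min: number;
  max: number;
}

// Values copied from the roleBonus and roleClamp sections of the configuration.
const roleBonus: Record<Role, number> = {
  author: 1.05,
  reviewer: 1.05,
  committer: 1.03,
  commenter: 1.0,
};
const roleClamp: Clamp = { min: 1.0, max: 1.2 };

// Hypothetical helper: multiplies the bonus of each distinct role a
// contributor holds on an activity edge, then clamps the product.
function roleMultiplier(
  roles: Role[],
  bonus: Record<Role, number> = roleBonus,
  clamp: Clamp = roleClamp,
): number {
  const distinct = Array.from(new Set(roles));
  const product = distinct.reduce((acc, role) => acc * bonus[role], 1);
  return Math.min(Math.max(product, clamp.min), clamp.max);
}

console.log(roleMultiplier(['commenter']));
console.log(roleMultiplier(['author', 'reviewer', 'committer']));
```

With the default values the product of all four bonuses stays below `roleClamp.max`, so the clamp only matters if the bonuses are tuned upward.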

### Environment Variables

```bash
# Set the global convergence tolerance
export OPENRANK_GLOBAL_TOLERANCE=0.01

# Set the maximum number of iterations
export OPENRANK_GLOBAL_MAX_ITERATIONS=100
```
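
As an illustration of how such variables could override file-based settings, here is a small sketch. `applyEnvOverrides` and `readNumber` are hypothetical helpers rather than the project's loader, and ignoring invalid values in favour of the base setting is an assumption.

```typescript
interface GlobalSettings {
  tolerance: number;
  maxIterations: number;
}

type Env = Record<string, string | undefined>;

// Parses a numeric environment value; returns undefined when unset or invalid.
function readNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined;
  const value = Number(raw);
  return Number.isFinite(value) ? value : undefined;
}

// Hypothetical helper: returns a copy of `base` with valid overrides applied.
function applyEnvOverrides(base: GlobalSettings, env: Env): GlobalSettings {
  const tolerance = readNumber(env['OPENRANK_GLOBAL_TOLERANCE']);
  const maxIterations = readNumber(env['OPENRANK_GLOBAL_MAX_ITERATIONS']);
  return {
    tolerance: tolerance !== undefined && tolerance > 0 ? tolerance : base.tolerance,
    maxIterations:
      maxIterations !== undefined && Number.isInteger(maxIterations) && maxIterations > 0
        ? maxIterations
        : base.maxIterations,
  };
}

// In Node.js, pass process.env as the second argument.
const settings = applyEnvOverrides(
  { tolerance: 0.01, maxIterations: 100 },
  { OPENRANK_GLOBAL_TOLERANCE: '0.001' },
);
console.log(settings);
```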

## API Reference

### Core Classes

#### OpenRank

The main OpenRank calculation interface.

```typescript
class OpenRank {
  constructor(dataPath?: string)

  // Calculate OpenRank
  async calculate(startDate: Date, endDate: Date): Promise<OpenRankResult[]>

  // Get repository OpenRank
  async getRepoOpenrank(config: QueryConfig): Promise<RepoOpenRankResult[]>

  // Get user OpenRank
  async getUserOpenrank(config: QueryConfig): Promise<UserOpenRankResult[]>

  // Get community OpenRank
  async getCommunityOpenrank(config: QueryConfig): Promise<CommunityOpenRankResult[]>
}
```

#### OpenRankCalculator

The core algorithm implementation.

```typescript
class OpenRankCalculator {
  constructor(config: OpenRankConfig)

  async calculate(
    activityData: ActivityData[],
    lastMonthOpenRank: Map<string, number>
  ): Promise<OpenRankResult[]>

  getCalculationStatus(): CalculationStatus
  getGraphStats(): GraphStats
}
```

#### MetricsCalculator

The metrics calculator.

```typescript
class MetricsCalculator {
  constructor(dataSource: DataSource)

  async getRepoOpenrank(config: QueryConfig): Promise<RepoOpenRankResult[]>
  async getUserOpenrank(config: QueryConfig): Promise<UserOpenRankResult[]>
  async getCommunityOpenrank(config: QueryConfig): Promise<CommunityOpenRankResult[]>
  async getOpenrankDistribution(config: QueryConfig): Promise<DistributionStats>
  async compareOpenrank(config1: QueryConfig, config2: QueryConfig): Promise<ComparisonResult>
}
```

### Query Configuration

```typescript
interface QueryConfig {
  startYear: number;
  startMonth: number;
  endYear: number;
  endMonth: number;
  order?: 'DESC' | 'ASC';
  limit: number;
  precision: number;
  options?: Record<string, any>;
}
```
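
A query covers every month from `startYear`/`startMonth` through `endYear`/`endMonth`. The sketch below shows one way to expand such a range into month keys. `MonthRange` and `listMonths` are hypothetical names, and both the 1-based months and the inclusive end are assumptions drawn from the usage examples.

```typescript
// The four range fields of QueryConfig, repeated so the sketch is self-contained.
interface MonthRange {
  startYear: number;
  startMonth: number; // assumed to be 1-12
  endYear: number;
  endMonth: number;   // assumed to be 1-12, inclusive
}

// Hypothetical helper: lists every month in the range as "YYYY-MM".
function listMonths(range: MonthRange): string[] {
  const months: string[] = [];
  let year = range.startYear;
  let month = range.startMonth;
  while (year < range.endYear || (year === range.endYear && month <= range.endMonth)) {
    months.push(`${year}-${month < 10 ? '0' : ''}${month}`);
    month += 1;
    if (month > 12) {
      month = 1;
      year += 1;
    }
  }
  return months;
}

console.log(listMonths({ startYear: 2024, startMonth: 11, endYear: 2025, endMonth: 2 }));
// [ '2024-11', '2024-12', '2025-01', '2025-02' ]
```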
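
## How the Algorithm Works

### Global OpenRank

Global OpenRank is computed over the global collaboration network and takes the following into account:

1. **Network construction**: developers and repositories are nodes; activity relations are edges
2. **Weight calculation**: the activity metric serves as the edge weight
3. **Historical inheritance**: each node inherits part of its OpenRank value from the previous month
4. **Iterative convergence**: values are computed with a modified PageRank algorithm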
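
The four steps above can be sketched as a PageRank-style iteration in which each node keeps a share of its previous value and receives the rest from its neighbours. This is a generic illustration, not the exact OpenRank update rule: `GraphNode`, `GraphEdge`, and `iterateRank` are hypothetical names, the update formula is an assumption, and the attenuation factor is left out.

```typescript
interface GraphNode {
  id: string;
  initial: number;   // last month's value (or a default for new nodes)
  retention: number; // share of `initial` the node keeps, in (0, 1]
}

interface GraphEdge {
  from: string;
  to: string;
  weight: number;    // activity-based edge weight
}

// Assumed update rule:
//   value(v) = retention(v) * initial(v)
//            + (1 - retention(v)) * sum over edges u->v of value(u) * w(u,v) / outWeight(u)
function iterateRank(
  nodes: GraphNode[],
  edges: GraphEdge[],
  tolerance = 0.01,
  maxIterations = 100,
): Map<string, number> {
  // Total outgoing weight per node, used to normalise edge weights.
  const outWeight = new Map<string, number>();
  for (const e of edges) outWeight.set(e.from, (outWeight.get(e.from) ?? 0) + e.weight);

  let rank = new Map<string, number>();
  for (const n of nodes) rank.set(n.id, n.initial);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Value flowing into each node along its incoming edges.
    const incoming = new Map<string, number>();
    for (const e of edges) {
      const total = outWeight.get(e.from) ?? 0;
      if (total <= 0) continue;
      const share = (rank.get(e.from) ?? 0) * (e.weight / total);
      incoming.set(e.to, (incoming.get(e.to) ?? 0) + share);
    }

    const next = new Map<string, number>();
    let maxDelta = 0;
    for (const n of nodes) {
      const value = n.retention * n.initial + (1 - n.retention) * (incoming.get(n.id) ?? 0);
      maxDelta = Math.max(maxDelta, Math.abs(value - (rank.get(n.id) ?? 0)));
      next.set(n.id, value);
    }
    rank = next;
    if (maxDelta < tolerance) break; // converged
  }
  return rank;
}

// One developer and one repository linked in both directions.
const demoRank = iterateRank(
  [
    { id: 'alice', initial: 2, retention: 0.5 },
    { id: 'repo', initial: 1, retention: 0.3 },
  ],
  [
    { from: 'alice', to: 'repo', weight: 1 },
    { from: 'repo', to: 'alice', weight: 1 },
  ],
);
console.log(demoRank.get('alice'), demoRank.get('repo'));
```

The retention values in the usage example mirror `developerRetentionFactor` and `repositoryRetentionFactor` from the configuration. Because every node retains a positive share of its initial value, the iteration is a contraction and settles to a fixed point.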
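
### Project-Level OpenRank

Project-level OpenRank is computed within a single project and includes more node types:

1. **Node types**: developers, repositories, Issues, and Pull Requests
2. **Richer network**: multiple relation types and weight configurations
3. **Fine-grained parameters**: a separate retention factor for each node type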
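
### Key Parameters

- **Retention factor**: controls how much historical value is retained
- **Attenuation factor**: controls how the value of inactive nodes decays
- **Activity weights**: the relative importance of each activity type
- **Convergence tolerance**: the precision required for the algorithm to converge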
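
To make the activity weights concrete, the sketch below combines per-type activity counts into a single score using the `activityWeights` values from the configuration. `activityScore` is a hypothetical helper, and the plain weighted sum is an assumption: the real implementation may apply further transforms.

```typescript
interface ActivityCounts {
  issueComment: number;
  openIssue: number;
  openPull: number;
  reviewComment: number;
  mergedPull: number;
}

// Weights copied from the activityWeights section of the configuration.
const activityWeights: ActivityCounts = {
  issueComment: 0.5252,
  openIssue: 2.2235,
  openPull: 4.0679,
  reviewComment: 0.7427,
  mergedPull: 2.0339,
};

// Hypothetical helper: weighted sum of activity counts; missing types count as 0.
function activityScore(
  counts: Partial<ActivityCounts>,
  weights: ActivityCounts = activityWeights,
): number {
  const keys = Object.keys(weights) as (keyof ActivityCounts)[];
  return keys.reduce((sum, key) => sum + (counts[key] ?? 0) * weights[key], 0);
}

// One opened PR plus two Issue comments: 4.0679 + 2 * 0.5252
console.log(activityScore({ openPull: 1, issueComment: 2 }));
```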

## Running the Examples

```bash
# Run the basic example
npm run dev

# Run the tests
npm test

# Check code quality
npm run lint
```

## A/B Evaluation: Impact of Repo Events

To evaluate how introducing repository-level events (Star, Fork, Release, and so on) affects project-level OpenRank, the project provides an A/B comparison script:

- Script: `scripts/ab_evaluate_repo_events.ts`
- Run with: `npm run ab:repo-events`
- Environment variables (optional):
  - `GITHUB_TOKEN`: GitHub access token, which avoids the unauthenticated rate limit
  - `OR_AB_OWNER`/`OR_AB_REPO`: target repository, default `FISCO-BCOS/FISCO-BCOS`
  - `OR_AB_MONTHS`: time window in months, default `3`

The script outputs:

- The Pearson correlation of user OpenRank between A (repo events disabled) and B (repo events enabled)
- A comparison of contribution share by event source (collaboration vs repo_event)

Tip: if you hit the GitHub API rate limit, set `GITHUB_TOKEN` or reduce `OR_AB_MONTHS` (for example, to 1).
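
For readers unfamiliar with the first output, the sketch below computes a Pearson correlation between two sets of user scores. It is a standalone illustration rather than the script's code: `pearson` and `correlateRuns` are hypothetical names, and comparing only users present in both runs is an assumption.

```typescript
// Pearson correlation coefficient of two equally sized samples.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('pearson needs two equally sized samples with at least 2 points');
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) return NaN; // undefined for a constant sample
  return cov / Math.sqrt(varX * varY);
}

// Correlates the scores of users that appear in both runs.
function correlateRuns(a: Map<string, number>, b: Map<string, number>): number {
  const xs: number[] = [];
  const ys: number[] = [];
  a.forEach((value, user) => {
    const other = b.get(user);
    if (other !== undefined) {
      xs.push(value);
      ys.push(other);
    }
  });
  return pearson(xs, ys);
}

// Made-up scores, for illustration only.
const runA = new Map<string, number>([['alice', 1.2], ['bob', 0.8], ['carol', 2.0]]);
const runB = new Map<string, number>([['alice', 1.3], ['bob', 0.7], ['carol', 2.1]]);
console.log(correlateRuns(runA, runB));
```

A value close to 1 means enabling repo events barely changes the relative ranking of users, while a lower value means the ranking shifts noticeably.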

## Project Structure

```
openrank/
├── src/
│   ├── types/          # TypeScript type definitions
│   ├── config/         # Configuration management
│   ├── utils/          # Utility functions
│   ├── data/           # Data layer (stubs)
│   ├── algorithm/      # Core algorithm
│   ├── metrics/        # Metric calculation
│   └── index.ts        # Main entry point
├── config/             # Configuration files
├── examples/           # Usage examples
├── data/               # Data storage directory
└── docs/               # Documentation
```

## Development Guide

### Adding a New Data Source

1. Implement the `DataSource` interface
2. Create the new data source class under `src/data/`
3. Update the export file

```typescript
// Assumed import path; adjust it to wherever the types are exported.
import type { ActivityData, DataSource } from '../types';

export class CustomDataSource implements DataSource {
  async loadActivityData(startDate: Date, endDate: Date): Promise<ActivityData[]> {
    // Implement the data-loading logic here
    throw new Error('loadActivityData is not implemented');
  }

  // Implement the other required methods...
}
```

### Customizing Algorithm Parameters

1. Edit the `config/openrank.yml` configuration file
2. Or override specific parameters with environment variables
3. Or set the configuration dynamically in code

```typescript
import { setConfig } from './src/config';

setConfig({
  global: {
    tolerance: 0.001,
    maxIterations: 200,
  }
});
```

## Testing

```bash
# Run the unit tests
npm test

# Run the tests with coverage
npm run test:coverage

# Run the integration tests
npm run test:integration
```

## Contributing

Issues and Pull Requests are welcome!

1. Fork this repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under Apache-2.0. See the [LICENSE](LICENSE) file for details.

## References

- [open-digger](https://github.com/X-lab2017/open-digger) - the original project and data platform
- [openrank-neo4j-gds](https://github.com/X-lab2017/openrank-neo4j-gds) - the Neo4j plugin implementation
- [OpenRank algorithm paper](https://blog.frankzhao.cn/openrank_in_project/) - the reasoning behind the algorithm design
- [X-lab](https://x-lab.info) - the project initiator

## Contact

For questions or suggestions, get in touch through any of the following:

- Open a GitHub Issue
- Email [contact@example.com]
- Join the discussion group [link]