# BliBili_danmu_crawl
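This program crawls Bilibili for danmaku (bullet comments). It searches for the keyword "2024巴黎奥运会" (2024 Paris Olympics), collects the danmaku of the top 300 videos under the comprehensive ranking, counts how often each distinct danmaku occurs, and outputs the 8 most frequent ones.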
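
The counting step described above can be sketched as follows. This is a minimal illustration rather than the repository's actual code: the function name `top_danmaku` and the sample data are hypothetical, and it assumes each danmaku is available as one string.

```python
from collections import Counter


def top_danmaku(lines, n=8):
    """Return the n most frequent danmaku as (text, count) pairs."""
    # Strip surrounding whitespace and skip empty entries so that
    # blank lines are not counted as danmaku.
    cleaned = (line.strip() for line in lines)
    counts = Counter(text for text in cleaned if text)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample data standing in for crawled danmaku.
    sample = ["加油", "中国队加油", "加油", "好球", "加油", "好球"]
    for text, count in top_danmaku(sample, n=2):
        print(text, count)
```

`Counter.most_common` keeps the ranking logic to a single standard-library call, which is usually enough for a few hundred .txt files.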
## release
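Contains the programs used in this run:
- **test_1.py**: crawls all danmaku of a single video specified by its BV ID
- **test_getfor.py**: crawls the danmaku of the top 300 videos (comprehensive ranking) one video at a time in a `for` loop
- **test_getthread.py**: optimized version that uses a thread pool to crawl the danmaku of the top 300 videos concurrently
- **数据分析.py**: merges all the .txt files and writes the 8 most frequent danmaku to an .xlsx file
- **mywordcloud.py**: generates word cloud images using two methods: the first produces a plain word cloud, the second a trophy-shaped one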
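
The difference between the sequential and thread-pool approaches listed above can be sketched with the standard library. This is only an assumption-laden illustration, not the code in `test_getthread.py`: `crawl_all` and `fake_fetch` are hypothetical names, and the real network request is replaced by a stub so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(bvids, fetch_danmaku, max_workers=8):
    """Fetch danmaku for every BV ID concurrently.

    Returns a dict mapping each BV ID to whatever fetch_danmaku returns.
    """
    bvids = list(bvids)  # allow any iterable, including generators
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with bvids.
        results = pool.map(fetch_danmaku, bvids)
        return dict(zip(bvids, results))


def fake_fetch(bvid):
    # Hypothetical stand-in for the real HTTP request to Bilibili.
    return [f"{bvid}-danmaku-1", f"{bvid}-danmaku-2"]


if __name__ == "__main__":
    print(crawl_all(["BV1", "BV2"], fake_fetch, max_workers=2))
```

Because crawling is dominated by waiting on the network, threads can overlap those waits; the sequential version would simply call `fetch_danmaku` for each BV ID inside a `for` loop.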
## output
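- **弹幕收集按序**: folder containing the danmaku .txt files of all videos, in order
- **wordcloud.jpg**: the plain word cloud
- **wordcloud_cup.jpg**: the trophy-shaped word cloud
- **奖杯.png**: trophy image used as the mask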
## requirements
The external libraries the code depends on, with their versions.
## 附加题:莎莎和陈梦 (Bonus task: Shasha and Chen Meng)
The files inside follow the same naming conventions as above.