# spider_bilibili
A distributed crawler built on scrapy-redis that scrapes a Bilibili creator's videos.
### Contents

* Quick Start
* Installation
* Creating a distributed spider
* Editing the configuration files
* Starting the spider
### Installation

1. Install Redis.
2. Install the Python dependencies: `pip install -r requirements.txt`
### Creating a distributed spider

1. `scrapy startproject XXX` (project name)
2. `cd XXX`
3. `scrapy genspider xxx www.baidu.com` (spider name and target domain; the domain here is only an example)
### Editing the configuration files

1. Configure `settings.py`.
2. Adjust `items.py`, `middlewares.py`, and `pipelines.py` to fit your crawling task.
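scrapy-redis documents a small set of settings that switch a standard Scrapy project over to its shared, Redis-backed scheduler. A minimal `settings.py` fragment might look like the sketch below; the Redis URL assumes a default local install, so adjust it to your deployment:

```python
# settings.py — minimal scrapy-redis configuration (sketch).

# Use the Redis-backed scheduler and duplicate filter so that several
# spider processes can share one request queue and one seen-URL set.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and dupefilter in Redis between runs (pause/resume).
SCHEDULER_PERSIST = True

# Where the Redis server lives — assumes a default localhost install.
REDIS_URL = "redis://localhost:6379"
```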
### Starting the spider

1. Start the Redis server: run `redis-server` on the command line.
2. Run `获取所有视频的url.py` to push the video URLs into Redis.
3. Run `bilibili.py`: change into its directory and run `scrapy runspider bilibili.py` on the command line (you can launch several such processes in parallel).
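Step 2 above seeds Redis with the start URLs that the spider processes then consume. A minimal sketch of that seeding step, assuming the third-party redis-py client (`pip install redis`) and scrapy-redis's conventional `<spider_name>:start_urls` list key (the helper names here are illustrative, not taken from the repo):

```python
# Hypothetical seeding helper for a scrapy-redis spider (sketch).

def start_urls_key(spider_name: str) -> str:
    """scrapy-redis reads start URLs from the Redis list '<spider_name>:start_urls'."""
    return f"{spider_name}:start_urls"

def seed(urls, spider_name: str = "bilibili",
         redis_url: str = "redis://localhost:6379") -> None:
    """Push start URLs onto the spider's Redis list."""
    import redis  # third-party redis-py client; imported lazily inside the function
    r = redis.Redis.from_url(redis_url)
    r.lpush(start_urls_key(spider_name), *urls)
```

Calling `seed(["https://www.bilibili.com/video/..."])` would then enqueue one URL for every running `bilibili.py` process to pick up.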