Ctrip-Crawler

Overview

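Ctrip-Crawler is a specialized crawler for Ctrip flight information, implemented mainly on top of the Selenium framework. The earlier approach of calling the Ctrip API directly via request is no longer viable because of IP restrictions and the difficulty of reverse-engineering the site's JavaScript (it now returns errors).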

Ctrip supports access over IPv6, so IP restrictions can be circumvented by generating a large number of IPv6 addresses.
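As a rough illustration of generating many IPv6 addresses, the sketch below draws distinct random addresses from a given prefix using only the Python standard library. The function name `random_ipv6_addresses` and the documentation prefix `2001:db8::/64` are assumptions made for this example; it is not the project's own code, and it does not assign the addresses to a network interface or configure a proxy.

```python
import ipaddress
import secrets


def random_ipv6_addresses(prefix: str, count: int) -> list[str]:
    """Return `count` distinct random addresses inside `prefix` (hypothetical helper)."""
    network = ipaddress.IPv6Network(prefix)
    host_bits = 128 - network.prefixlen
    if count > 2 ** host_bits:
        raise ValueError("prefix is too small for the requested number of addresses")

    base = int(network.network_address)
    chosen = set()
    while len(chosen) < count:
        # Randomize only the host part so every address stays inside the prefix.
        chosen.add(base + secrets.randbits(host_bits))
    return [str(ipaddress.IPv6Address(value)) for value in sorted(chosen)]


if __name__ == "__main__":
    # 2001:db8::/64 is the reserved documentation prefix; replace it with a
    # prefix that is actually routed to your host.
    for address in random_ipv6_addresses("2001:db8::/64", 5):
        print(address)
```

Each generated address would still have to be bound to an interface and used as the source address of outgoing traffic before it could help with IP restrictions.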

Key Features

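Selenium automation framework: unlike approaches that call the API directly, this project is built on Selenium, which provides highly customizable, interactive browser simulation.

Flexible error handling: each type of exception (timeouts, captcha prompts, unknown errors, and so on) gets its own handling strategy, including retries and manual intervention.

IP restriction workaround: by exploiting page behavior and simulating a real user, the crawler circumvents IP restrictions and improves crawl stability.

Data validation and parsing: the retrieved data undergoes strict quality and integrity checks, including gzip decompression and JSON parsing.

Version iteration and optimization: V2 solved the captcha problem; V3 improved the system's stability and usability; V3.5 added generation of, and proxying through, multiple IPv6 network interfaces on Linux.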
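The validation and retry ideas in the feature list can be sketched as follows. This is a minimal example under stated assumptions, not the project's implementation: `decode_payload` and `fetch_with_retry` are hypothetical names, the `fetch` callable stands in for whatever returns a raw response body, and the expectation of a top-level JSON object is a guess about the response shape. Captcha handling and manual intervention are not covered.

```python
import gzip
import json
import time
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_payload(body: bytes) -> dict:
    """Decompress (if gzip-encoded) and parse a response body as a JSON object."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        # Assumption: a valid response is a JSON object at the top level.
        raise ValueError("expected a JSON object at the top level")
    return data


def fetch_with_retry(fetch, attempts: int = 3, delay_seconds: float = 0.0) -> dict:
    """Call `fetch()` until its result decodes cleanly, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return decode_payload(fetch())
        except (OSError, EOFError, ValueError, zlib.error) as exc:
            # Covers corrupt gzip data, truncated streams, bad UTF-8, and bad JSON.
            last_error = exc
            time.sleep(delay_seconds)
    raise RuntimeError(f"no valid payload after {attempts} attempts") from last_error
```

Keeping decoding separate from the retry loop means the same checks can be applied to a body captured from the browser or to one read from disk.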

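Documentation and Tutorials

Detailed usage guides and development notes are available in the following blog posts:

Selenium-based Ctrip flight ticket crawler

Selenium-based Ctrip flight ticket crawler V2

request-based Ctrip flight ticket crawler

request-based historical flight fare crawler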

TO DO

V4.0: add multithreaded, sharded execution...

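Contributing and Feedback

If you have suggestions for improvement or find a bug, please get in touch through Issues or Pull Requests. Contributions of all kinds are very welcome!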