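Basic functionality is initially complete. Optimizations: results are no longer written to the database one record at a time, but in a single write once a task's results have been fully crawled; improved the distributed cluster; simplified system parameter configuration, which now only needs to be changed in settings.ini; improved the task distribution module so that, when more tasks are received than can run, the excess tasks are put in a waiting state and start only when a distributed node or the server's crawler becomes idle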

master
wufayuan 2 years ago
parent f4aedd9cfd
commit 3c186535e9
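
The "write once after crawling finishes" optimization described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the standard-library sqlite3 module, the `results` table, its columns, and the `save_results` helper are all hypothetical, since the diff does not show the project's actual database layer.

```python
import sqlite3


def save_results(conn, rows):
    """Write every buffered row in one transaction (hypothetical helper)."""
    # Used as a context manager, the connection commits once on success
    # and rolls back if any insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO results (url, title) VALUES (?, ?)", rows
        )


# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (url TEXT, title TEXT)")

# Collect results in memory while the task is crawling ...
buffer = []
for i in range(3):
    buffer.append((f"https://example.com/{i}", f"page {i}"))

# ... then write them all at once when the task has finished.
save_results(conn, buffer)
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

Batching the inserts this way replaces one commit per record with a single commit per task, which is usually where the time goes when writing row by row.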

@@ -185,6 +185,7 @@ class Spider_task(threading.Thread):
self.free_remote_nodes.remove(f_node)
break
else:
logger.warning(f'[TASK] generate failed, no free remote nodes! spider task {task.request_map} is at state waiting...')
logger.info(f'[TASK] generating local task {task.request_map}')
if global_var.spider.crawlers >= global_var.spider.max_count_of_crawlers:
logger.warning(f'[TASK] generate failed, crawlers exceed! spider task {task.request_map} is at state waiting...')
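
The waiting behaviour in this hunk (try a free remote node, fall back to a local crawler, otherwise leave the task waiting) can be sketched in isolation. This is a simplified model, not the code of `Spider_task`: the `Dispatcher` class, its field names, and the `submit`/`release` methods are hypothetical, and threading and logging are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    """Hypothetical stand-in for the task distribution module."""
    max_local_crawlers: int
    free_remote_nodes: list = field(default_factory=list)
    local_crawlers: int = 0
    waiting: deque = field(default_factory=deque)

    def submit(self, task):
        """Place a task; return a node name, 'local', or 'waiting'."""
        if self.free_remote_nodes:
            # A remote node is idle: hand the task to it.
            return self.free_remote_nodes.pop(0)
        if self.local_crawlers < self.max_local_crawlers:
            # No remote node, but the local crawler limit is not reached.
            self.local_crawlers += 1
            return "local"
        # Nothing is free: the task stays in the waiting state.
        self.waiting.append(task)
        return "waiting"

    def release(self, slot):
        """Free a slot, then start the oldest waiting task, if any."""
        if slot == "local":
            self.local_crawlers -= 1
        else:
            self.free_remote_nodes.append(slot)
        if self.waiting:
            task = self.waiting.popleft()
            return task, self.submit(task)
        return None
```

For example, with one remote node and a local limit of one crawler, a third submitted task waits, and it is started as soon as `release` reports a free slot. A FIFO queue is assumed here so that waiting tasks start in arrival order.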
