django+apscheduler+scrapy

Calling a Scrapy spider from an APScheduler job raises the following error:
Error:builtins.ValueError: signal only works in main thread
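The message comes from Python's standard `signal` module: `signal.signal()` may only be called from the main thread. APScheduler's default executor runs jobs in worker threads, and Scrapy's `CrawlerProcess` registers shutdown signal handlers (for Ctrl-C) when it is used, which is most likely what triggers the error here. The stdlib-only sketch below reproduces the same `ValueError` without Scrapy or APScheduler; the helper `try_install_handler` is a hypothetical name used only for illustration.

```python
import signal
import threading


def try_install_handler():
    """Try to (re)install Python's default SIGINT handler; return the error text or None."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        return None
    except ValueError as exc:
        return str(exc)


# From the main thread the call is allowed.
main_result = try_install_handler()

# From a worker thread (where APScheduler runs its jobs) it is rejected.
worker_result = []
t = threading.Thread(target=lambda: worker_result.append(try_install_handler()))
t.start()
t.join()

print("main thread:", main_result)
print("worker thread:", worker_result[0])
```

On recent Python versions the worker-thread message reads "signal only works in main thread of the main interpreter"; older versions stop after "main thread".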

The solution at the following link fixes it:
https://stackoverflow.com/questions/53605039/apschedulerscrapy-signal-only-works-in-main-thread

However, in a Django project the BlockingScheduler() used in that solution blocks the Django main process from starting.
Replacing BlockingScheduler() with BackgroundScheduler() or TwistedScheduler() solves this.
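The difference comes down to where the scheduler's loop runs: `BlockingScheduler.start()` runs it in the calling thread and does not return, while `BackgroundScheduler.start()` runs it in a separate (by default daemon) thread and returns at once (`TwistedScheduler` instead schedules jobs on the Twisted reactor). The sketch below imitates only the first two with plain `threading`; `run_loop` and `start` are hypothetical names, and this is an analogy, not APScheduler's actual implementation.

```python
import threading
import time


def run_loop(ticks, stop):
    """Stand-in for a scheduler main loop: wake up periodically until stopped."""
    while True:
        ticks.append(time.monotonic())  # a real scheduler would run due jobs here
        if stop.wait(0.01):  # returns True once stop has been set
            break


def start(background, ticks, stop):
    """Run the loop in the caller's thread (blocking) or in a daemon thread (background)."""
    if background:
        worker = threading.Thread(target=run_loop, args=(ticks, stop), daemon=True)
        worker.start()
        return worker  # returns immediately, like BackgroundScheduler.start()
    run_loop(ticks, stop)  # returns only after stop is set, like BlockingScheduler.start()
    return None


# Background mode: the caller (think: Django starting up) gets control back at once.
bg_ticks, bg_stop = [], threading.Event()
t0 = time.monotonic()
worker = start(background=True, ticks=bg_ticks, stop=bg_stop)
bg_returned_after = time.monotonic() - t0
time.sleep(0.05)
bg_stop.set()
worker.join()

# Blocking mode: start() holds the caller until another thread sets the stop flag.
fg_stop = threading.Event()
threading.Timer(0.05, fg_stop.set).start()
t1 = time.monotonic()
start(background=False, ticks=[], stop=fg_stop)
fg_blocked_for = time.monotonic() - t1

print(f"background start() returned after {bg_returned_after:.3f}s")
print(f"blocking start() held the caller for {fg_blocked_for:.3f}s")
```

Because the background thread is a daemon, it only lives as long as the process; inside Django the running server keeps the process alive, which is why the background variant works there.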

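from datetime import datetime

import crochet
from apscheduler.schedulers.background import BackgroundScheduler
# Alternative: from apscheduler.schedulers.twisted import TwistedScheduler
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

# Jobaispider and Jobpythonspider are the author's own spiders;
# import them from your Scrapy project here.

# Start the Twisted reactor once, in a background thread managed by crochet.
crochet.setup()

settings = get_project_settings()
configure_logging(settings)
# CrawlerRunner (unlike CrawlerProcess) neither starts the reactor nor installs signal handlers.
runner = CrawlerRunner(settings)

# Note: Removing defer here for the example
# @defer.inlineCallbacks

@crochet.run_in_reactor
def crawl():
    # Executed in crochet's reactor thread, not in the APScheduler worker thread.
    runner.crawl(Jobaispider)  # this is my spider
    runner.crawl(Jobpythonspider)  # this is my spider

# sched = BlockingScheduler()  # would block Django's startup
sched = BackgroundScheduler()
# A 'date' trigger runs the job once at run_date (use a time in the future).
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()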
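Why this works: `crochet.setup()` starts the Twisted reactor once, in its own background thread (so no signal handlers can be installed there, since it is not the main thread), and `@crochet.run_in_reactor` hands each call over to that reactor thread and immediately returns an `EventualResult`, instead of running the crawl in APScheduler's worker thread. The stdlib sketch below mimics only this hand-off, using a single-thread `ThreadPoolExecutor` as a stand-in for the reactor thread; `run_in_dedicated_thread` and `fake_crawl` are hypothetical names, and a real reactor is an event loop, not a thread pool.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor

# One long-lived worker thread plays the role of crochet's reactor thread.
_dedicated = ThreadPoolExecutor(max_workers=1, thread_name_prefix="reactor")


def run_in_dedicated_thread(func):
    """Decorator: run every call on the dedicated thread and return a Future at once."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _dedicated.submit(func, *args, **kwargs)
    return wrapper


@run_in_dedicated_thread
def fake_crawl(spider_name):
    # Report which thread actually executed the "crawl".
    return spider_name, threading.current_thread().name


def scheduler_job(out):
    """Stand-in for an APScheduler job running on a scheduler worker thread."""
    futures = [fake_crawl("Jobaispider"), fake_crawl("Jobpythonspider")]
    out.extend(f.result(timeout=5) for f in futures)


results = []
job = threading.Thread(target=scheduler_job, args=(results,), name="apscheduler-worker")
job.start()
job.join()
_dedicated.shutdown()

for spider_name, thread_name in results:
    print(spider_name, "ran on", thread_name)
```

Both "crawls" report the same dedicated thread, never the job's own thread, which is the property the crochet-based fix relies on.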

Author: MOK-BOX
Published: 2022-09-26
Updated: 2022-09-27