django+apscheduler+scrapy

Calling a Scrapy spider from APScheduler raises this error:
Error:builtins.ValueError: signal only works in main thread
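Python only lets the main thread install signal handlers. APScheduler runs jobs in worker threads. Starting a crawl the usual way (for example with `CrawlerProcess`, as in the question linked below) presumably tries to register shutdown signal handlers from that worker thread, which triggers the error. The stdlib-only sketch below reproduces the same `ValueError` without Scrapy. It is only an illustration of the restriction and is not taken from the original post.

```python
import signal
import threading


def try_install_handler():
    """Try to register a SIGINT handler; return "ok" or the error message."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        return "ok"
    except ValueError as exc:
        return str(exc)


results = {}


def worker():
    # Runs in a non-main thread, like an APScheduler job
    results["worker"] = try_install_handler()


results["main"] = try_install_handler()
t = threading.Thread(target=worker)
t.start()
t.join()
print(results)
```

Newer Python versions word the message slightly differently ("...main thread of the main interpreter"), but the cause is the same.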

The solution at the following link fixes it:
https://stackoverflow.com/questions/53605039/apschedulerscrapy-signal-only-works-in-main-thread

With Django, however, the BlockingScheduler() used in that solution blocks the Django main process from starting up.
Replacing BlockingScheduler() with BackgroundScheduler() or TwistedScheduler() solves this.
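The difference is where `start()` returns control. The stdlib-only sketch below uses `threading.Timer` as a hypothetical stand-in for the two schedulers. It is not APScheduler code. In both cases the job runs in a worker thread, which matches APScheduler's default thread-pool executor. With the blocking variant, however, the caller (here "Django startup") cannot continue until the job is done. The real `BlockingScheduler.start()` is worse: it does not return until the scheduler is shut down.

```python
import threading


def blocking_start(job, delay):
    """Stand-in for BlockingScheduler.start(): the calling thread waits for the job."""
    timer = threading.Timer(delay, job)
    timer.start()
    timer.join()  # caller is stuck here


def background_start(job, delay):
    """Stand-in for BackgroundScheduler.start(): returns at once, job runs later."""
    timer = threading.Timer(delay, job)
    timer.daemon = True
    timer.start()
    return timer


blocking_events = []
blocking_start(lambda: blocking_events.append("job"), 0.05)
blocking_events.append("django started")

background_events = []
timer = background_start(lambda: background_events.append("job"), 0.05)
background_events.append("django started")
timer.join()  # only so this demo can inspect the result

print(blocking_events)    # ['job', 'django started']
print(background_events)  # ['django started', 'job']
```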

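```python
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

import crochet
crochet.setup()  # start the Twisted reactor in a background thread

# Your own spiders; this import path is hypothetical, adjust it to your project
from myproject.spiders import Jobaispider, Jobpythonspider

settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)

# Note: Removing defer here for the example
# @defer.inlineCallbacks

@crochet.run_in_reactor
def crawl():
    # Executed in the reactor thread, not in the scheduler's worker thread
    runner.crawl(Jobaispider)  # this is my spider
    runner.crawl(Jobpythonspider)  # this is my spider

# sched = BlockingScheduler()  # would block Django startup
sched = BackgroundScheduler()
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()
```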
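Why crochet avoids the error: `@crochet.run_in_reactor` does not run `crawl()` in APScheduler's worker thread. It hands the call to the reactor that `crochet.setup()` started in its own thread, and it immediately returns an `EventualResult`. The stdlib-only sketch below imitates that hand-off with a queue and `concurrent.futures.Future`. `ReactorThread` and `run_in_reactor` here are hypothetical stand-ins, not crochet's actual implementation.

```python
import functools
import queue
import threading
from concurrent.futures import Future


class ReactorThread:
    """Hypothetical stand-in for the reactor thread started by crochet.setup()."""

    def __init__(self):
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._run, name="reactor", daemon=True)
        self._thread.start()

    def _run(self):
        # Execute submitted calls one at a time, always in this one long-lived thread
        while True:
            item = self._calls.get()
            if item is None:  # shutdown sentinel
                return
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, func, *args):
        # Safe to call from any thread; returns immediately with a Future
        future = Future()
        self._calls.put((func, args, future))
        return future

    def stop(self):
        self._calls.put(None)
        self._thread.join()


def run_in_reactor(reactor):
    """Hypothetical analogue of @crochet.run_in_reactor."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            return reactor.call(func, *args)
        return wrapper
    return decorator


reactor = ReactorThread()


@run_in_reactor(reactor)
def crawl(spider_name):
    return f"{spider_name} ran in thread {threading.current_thread().name!r}"


# Simulate APScheduler firing the job from one of its worker threads
results = []
worker = threading.Thread(target=lambda: results.append(crawl("Jobaispider").result(timeout=5)))
worker.start()
worker.join()
reactor.stop()
print(results[0])  # Jobaispider ran in thread 'reactor'
```

Here `Future.result(timeout=...)` plays the role of crochet's `EventualResult.wait(timeout)`.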
