I set DOWNLOAD_DELAY = 5, but at some point I still got a 403 error from a site whose robots.txt reads:
scrapyd_node_3:
  build: ./scrapyd_node_3
  environment:
    RESULT_DIR: "/app/results"
    SPLASH_SERVER: "splash:8050"
  ports:
    - "6802:6800"
  links:
    - splash
  volumes:
    - ./data:/var/lib/scrapyd
    - ./data/results:/app/results
  restart: unless-stopped
splash:
  image: scrapinghub/splash
  ports:
    - "8050:8050"
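import os

# Splash endpoint; falls back to a local instance when SPLASH_SERVER is unset.
SPLASH_URL = os.environ.get('SPLASH_SERVER', 'http://127.0.0.1:8050')

# Middleware priorities as listed in the scrapy-splash setup instructions.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}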
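One thing that may be worth checking in the settings above: the compose file sets SPLASH_SERVER to "splash:8050" with no scheme, while the fallback value is a full URL (http://127.0.0.1:8050). Assuming SPLASH_URL is meant to be a full URL in both cases, a minimal sketch of normalizing the value could look like this. The helper name `splash_url_from_env` is hypothetical and is not part of Scrapy or scrapy-splash.

```python
import os


def splash_url_from_env(env=None, default="http://127.0.0.1:8050"):
    """Build a Splash URL from the SPLASH_SERVER variable (hypothetical helper).

    Assumption: SPLASH_SERVER may hold either "host:port" or a full URL.
    """
    env = os.environ if env is None else env
    value = env.get("SPLASH_SERVER", "").strip()
    if not value:
        # Variable unset or empty: use the local default.
        return default
    if "://" not in value:
        # Bare "host:port" such as "splash:8050": prepend a scheme.
        value = "http://" + value
    return value


# In settings.py this could replace the direct os.environ.get call.
SPLASH_URL = splash_url_from_env()
```

With the compose file shown earlier, this would give http://splash:8050 inside the scrapyd container and keep the local default when the variable is not set.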
json.loads