2021 June 08

Михаил Синегубов... in Scrapy
Uhh, so they merged in an algorithm for fetching proxies? They used to get by with just a pause.

Alex in Scrapy
Yes, it mentions pausing when a 429 status is received, but I think something about a VPN came up there too.
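
A minimal sketch of the pause-on-429 idea mentioned here, in plain Python. The function name `pause_for_429`, the base delay, and the cap are illustrative assumptions and are not taken from the project being discussed. It honours a numeric `Retry-After` header when one is present and otherwise backs off exponentially.

```python
from typing import Optional


def pause_for_429(attempt: int, retry_after: Optional[str] = None,
                  base: float = 5.0, cap: float = 300.0) -> float:
    """Return how many seconds to wait after an HTTP 429 response.

    attempt     -- zero-based count of consecutive 429 responses (assumption)
    retry_after -- raw value of the Retry-After header, if the server sent one
    """
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds.
            return min(float(retry_after), cap)
        except ValueError:
            # Retry-After may also be an HTTP date; this sketch ignores
            # that form and falls back to exponential backoff.
            pass
    return min(base * (2 ** attempt), cap)
```

The caller would sleep for the returned number of seconds before retrying the request.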

Antonio in Scrapy
Hello everybody

Antonio in Scrapy
I have a problem with my bot, maybe someone can help me?

Antonio in Scrapy
I am scraping a web page of tech components and saving the results to compare later. For this task, I am using Scrapy and Python.

After two months of scraping the site, I am getting a 403 status error.

I have tried:

   1. Changing the bot name
   2. Changing the User-Agent to several different agents
   3. Launching the scraper from my friend's computer
   4. Launching the scraper from different IPs
   5. 3 and 4 together

These five steps make me think they have information about my scraper rather than about my computer, and that they have blocked my bot.

This is not the first time this has happened. They blocked my bot one month ago and unblocked the same bot a week later.

I cannot afford to stop scraping for more than one or two days. I am looking for fresh ideas, because everybody on forums and scraping sites just recommends changing user agents.

Antonio in Scrapy
Maybe someone can help me or recommend some fresh ideas? Thank you

(o_O) in Scrapy
This looks more like a job than a question.

Antonio in Scrapy
Sorry if it's so long, I just wanted to explain correctly haha

Antonio in Scrapy
The issue is just that I am getting 403 after one month of successful scraping

(o_O) in Scrapy
Slow down your bot

Antonio in Scrapy
a twenty-second delay does not prevent it
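
For reference, a rough sketch of how a delay like this is usually expressed in Scrapy settings. The setting names are real Scrapy settings; the dictionary name `POLITE_SETTINGS`, the helper `delay_range`, and the values are assumptions for illustration, not Antonio's actual configuration.

```python
# Hypothetical values; only the setting names come from Scrapy.
POLITE_SETTINGS = {
    "DOWNLOAD_DELAY": 20,                  # seconds between requests to one site
    "RANDOMIZE_DOWNLOAD_DELAY": True,      # jitter the delay to look less regular
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,   # never hit the site in parallel
}


def delay_range(settings: dict) -> tuple:
    """Return the (min, max) wait in seconds implied by the settings.

    With RANDOMIZE_DOWNLOAD_DELAY enabled, Scrapy waits between
    0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
    """
    delay = float(settings["DOWNLOAD_DELAY"])
    if settings.get("RANDOMIZE_DOWNLOAD_DELAY", True):
        return (0.5 * delay, 1.5 * delay)
    return (delay, delay)
```

A dictionary like this could be assigned to a spider's `custom_settings` attribute or copied into `settings.py`. As the message says, a delay alone may not be enough when the block is not rate-based.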

(o_O) in Scrapy
What's your target?

Antonio in Scrapy

Antonio in Scrapy
I can access it from my browser and with the requests library

(o_O) in Scrapy
So this site is behind Cloudflare. I searched briefly but didn't find the real server IP, so you're out of luck with simple methods. The standard bypass is to rotate IPs while avoiding tracking: use a rotating proxy service and disable cookies, referer, etc.
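
A minimal sketch of what this advice could look like in a Scrapy project. The class name `RotatingProxyMiddleware`, the `ROTATING_PROXIES` setting, and the proxy URLs are hypothetical placeholders. `COOKIES_ENABLED`, `REFERER_ENABLED`, and `request.meta["proxy"]` are real Scrapy features. Whether this is enough to get past Cloudflare on a given site is not guaranteed.

```python
import itertools

# Placeholder proxy URLs (assumption): replace with endpoints from a real provider.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]


class RotatingProxyMiddleware:
    """Downloader middleware that assigns the next proxy in the pool to each request."""

    def __init__(self, proxies):
        self._pool = itertools.cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # ROTATING_PROXIES is a hypothetical setting name used only in this sketch.
        return cls(crawler.settings.getlist("ROTATING_PROXIES") or PROXIES)

    def process_request(self, request, spider):
        # Scrapy routes the request through the URL stored in request.meta["proxy"].
        request.meta["proxy"] = next(self._pool)
        return None  # let Scrapy continue processing the request as usual


# Settings that switch off cookie handling and the Referer header.
ANTI_TRACKING_SETTINGS = {
    "COOKIES_ENABLED": False,
    "REFERER_ENABLED": False,
}
```

To try it, the class would be registered under `DOWNLOADER_MIDDLEWARES` in the project settings (with a priority below 750 so it runs before the built-in `HttpProxyMiddleware`), and the two anti-tracking settings would go into `settings.py` or the spider's `custom_settings`.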

Antonio in Scrapy
Maybe disabling cookies and referer could work. What worries me is that I already tried my bot on my friend's computer with his own IP

Antonio in Scrapy
I have disabled cookies just now; it does not work either, but I will keep this config

(o_O) in Scrapy
IP rotating is a *must*.

Antonio in Scrapy
Do you think scrapoxy could be the easiest way to do ir?

Antonio in Scrapy
do it*?