Protecting a site from unwanted bots: MJ12bot, BLEXBot, AhrefsBot and others

Large sites with thousands of pages, and online stores with a wide range of products, often face sudden spikes in server load.

Very often the cause is not DDoS attacks, viruses, or hackers, but ordinary robots of little-known search engines or various services. They send so many requests per unit of time that the load rises and exceeds the limits allowed by the hosting plan.

This problem mostly affects large online stores: if your site has 100-500 pages or fewer, even an average hosting plan will cope with such a sudden load without much trouble. VDS servers can withstand much higher loads, and as a rule online stores on a VDS notice such problems only during the New Year rush or just before holidays, when servers are already running at their limits.

Sometimes the cause of a sudden load spike can be found only by analyzing the server logs, but often Yandex.Metrica is enough, since it sometimes counts bots as users.
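For log analysis, counting requests per User-Agent is often enough to spot the culprit. The sketch below assumes the access log uses Apache's standard "combined" format (nginx's predefined "combined" format is the same); the function name `top_user_agents` is illustrative, not part of any tool.

```python
import re
from collections import Counter

# Apache "combined" log format: the User-Agent is the last quoted field.
# Assumption: your hosting writes access logs in this format.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_user_agents(lines, n=10):
    """Return the n User-Agent strings that made the most requests."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # lines in any other format are skipped, not treated as errors
            counts[m.group("agent")] += 1
    return counts.most_common(n)
```

For example, `with open("access.log") as f: print(top_user_agents(f))`, where `access.log` stands for whatever path your hosting uses.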

Signs of bot activity on a site

  • A large number of pages viewed per unit of time
  • Traffic that is unusually high for the time of day and day of the week
  • A sharp rise in traffic from other countries and from browsers that are unusual for Russian visitors
  • Unusually slow page loading
  • Slow server response
  • Complaints from the hosting provider about the site consuming too many resources
  • Many "users" behaving identically, with numerous visits to the same page
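The first sign, many pages per unit of time, can be checked mechanically with a sliding window. A minimal sketch, assuming you have already extracted (client, timestamp) pairs from the access log in chronological order, with timestamps in Apache's default `10/Oct/2023:13:55:36 +0300` form. The name `flag_fast_clients` and the limit of 60 requests per minute are hypothetical placeholders to tune for your site.

```python
from collections import defaultdict, deque
from datetime import datetime

# Apache's default timestamp format inside [...] in access logs.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def flag_fast_clients(events, max_hits=60, window_s=60):
    """Return clients that made more than max_hits requests within any
    window of window_s seconds.

    events: iterable of (client, timestamp) pairs in chronological order,
    where client is e.g. an IP address or an (IP, User-Agent) tuple.
    The default limit of 60 requests per minute is an arbitrary placeholder.
    """
    recent = defaultdict(deque)  # client -> request times inside the window
    flagged = set()
    for client, ts in events:
        t = datetime.strptime(ts, LOG_TIME_FORMAT).timestamp()
        window = recent[client]
        window.append(t)
        # Forget requests that are older than the sliding window.
        while t - window[0] > window_s:
            window.popleft()
        if len(window) > max_hits:
            flagged.add(client)
    return flagged
```

Using an (IP, User-Agent) pair as the client separates a bot from ordinary visitors who happen to share its IP address.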

Examples of unwanted bots

Unwanted bots are often not spammers or site scrapers. Very often they belong to various services or little-known search engines.

They pose no direct threat, but because of misconfiguration, an internal error, or some other reason, they can create a high load on the site by making a large number of requests per unit of time.

Bot MJ12bot

The search robot of the Majestic service, which collects data about links between sites. The robot handles canonical pages correctly, but on sites without canonical URLs it tends to "slip" badly on pages whose URLs contain parameters, crawling endless variants of the same page.
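To check whether a crawler is "slipping" on parameter URLs, you can count how many distinct query strings it requested for each path. The sketch assumes you already have the request paths made by one bot (for example, filtered from the access log by User-Agent); `parameter_variants` is a hypothetical helper name.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parameter_variants(urls):
    """Map each path to the number of distinct query strings requested for it."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # "/catalog/?sort=price" -> path "/catalog/", query "sort=price"
        variants[parts.path].add(parts.query)
    return {path: len(queries) for path, queries in variants.items()}
```

A path with hundreds of query-string variants is the typical symptom; adding a `<link rel="canonical" href="...">` element to such pages points well-behaved crawlers to the single canonical URL.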

Bot BLEXBot

BLEXBot Crawler is described as a search engine robot, but its official page does not say which search engine it belongs to.

Bot AhrefsBot

Like MJ12bot, this robot analyzes site pages for external links. The ahrefs.com site itself offers a quite good professional service for evaluating and analyzing backlinks.

Bot HubSpot Webcrawler

This is the crawler of HubSpot (hubspot.com). HubSpot states that its platform is designed for companies that need to attract visitors, so in essence the robot is a collector of content from online stores.

Other unwanted bots

I have not personally encountered the bots listed below, but some of them may well still be active:

Aboundex
80legs
360Spider
Java
Cogentbot
Alexibot
asterias
attach
BackDoorBot
Backweb
Bandit
Batchftp
Bigfoot
Black.Hole
Blackwidow
Blowfish
Botalot
Buddy
BuiltBotTough
Bullseye
Bunnyslippers
Cegbfeieh
Cheesebot
Cherry picker
ChinaClaw
Collector
Copier
CopyRightCheck
cosmos
Crescent
Custo
AIBOT
DISCo
DIIbot
DittoSpyder
Download Demon
Download Devil
Download Wonder
dragonfly
Drip
eCatch
Easydl
ebingbong
Eirgrabber
EmailCollector
Emailsiphon
Emailwolf
EroCrawler
Exabot
Express WebPictures
Extractor
EyeNetIE
Foobot
flunky
Frontpage
Go-Ahead-Got-It
gotit
Grabnet
Grafula
Harvest
hloader
HMView
HTTrack
humanlinks
Ilsebot
Image stripper
Image sucker
Indy Library
InfoNavibot
InfoTekies
Intelliseek
Interget
Internet ninja
Iria
Jakarta
JennyBot
Jetcar
Joc
Justview
Jyxobot
Kenjin.Spider
Keyword.Density
larbin
Lexibot
lftp
libWeb/clsHTTP
likse
LinkextractorPro
LinkScan/8.1a.Unix
Lnspiderguy
Linkwalker
lwp-trivial
LWP::Simple
Magnet
Mag-net
MarkWatch
Mass downloader
Mata.Hari
Memo
Microsoft.URL
Microsoft URL Control
MIDown tool
MIIxpc
Mirror
Missigua locator
Mister pix
moget
Mozilla/3.Mozilla/2.01
Mozilla.*NEWT
NAMEPROTECT
Navroad
NearSite
Netants
Netcraft
Netmechanic
NetSpider
Net vampire
Netzip
NextGenSearchBot
NG
NICErsPRO
niki-bot
NimbleCrawler
Ninja
NPbot
Octopus
Offline explorer
Offline navigator
Openfind
Outfoxbot
Pagegrabber
Papa foto
pavuk
pcBrowser
PHP version tracker
Pockey
ProPowerBot/2.14
Prowebwalker
psbot
Pump
QueryN.Metasearch
Realdownload
Reaper
Recorder
Reget
RepoMonkey
RMA
Siphon
SiteSnagger
Slysearch
Smartdownload
Snake
Snapbot
Snoopy
sogou
Spacebison
SpankBot
spanner
Sqworm
Stripper
Sucker
Superbot
SuperHTTP
Surfbot
suzuran
Szukacz/1.4
tAkeOut
Teleport
Telesoft
TurnitinBot/1.5
The.Intraformant
Thenomad
Tighttwatbot
Titan
True_bot
turingos
Turnitinbot
URLy.Warning
Vacuum
VCI
Voideye
Web Image Collector
Web sucker
WebAuto
Webbandit
Webclipping.com
Webcopier
WebEMailExtrac.*
WebEnhancer
WebFetch
WebGo IS
Web.Image.Collector
Webleacher
WebmasterWorldForumBot
WebReaper
Websauger
Website eXtractor
Website Quester
Webster
Webstripper
Webwhacker
Webzip
Whacker
Widow
WISENutbot
WWWOFFLE
WWW-Collector-E
Xaldon
Xenu
Zeus
ZmEu
Zyborg
AhrefsBot
archive.org_bot
bingbot
Wget
Acunetix
Fhscan

Limit bot activity using robots.txt

A universal first measure is to limit the request rate with the Crawl-delay directive in the robots.txt file. Its numerical value is the pause, in seconds, between requests to the site. The directive is only advisory, and not every crawler supports it (Googlebot, for example, ignores it).

A typical example suitable for most sites:

User-agent: *
Crawl-delay: 10

A 10-second pause is more than enough to rein in search engine robots.
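Python's standard `urllib.robotparser` module reads robots.txt the way a compliant crawler would, so it can be used to check the rule above and estimate its effect: with a 10-second pause, a bot that honors the directive can make at most 86,400 / 10 = 8,640 requests per day. The bot name `AnyBot` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "User-agent: *" applies to any bot name, so this returns 10.
delay = parser.crawl_delay("AnyBot")
SECONDS_PER_DAY = 24 * 60 * 60                    # 86 400 seconds
max_requests_per_day = SECONDS_PER_DAY // delay   # ceiling for a compliant bot
print(delay, max_requests_per_day)
```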

However, some unwanted bots ignore this directive, and even an explicit ban in robots.txt does not protect the site from high load:

User-agent: MJ12bot
Disallow: /
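The same standard module shows how a bot that does respect robots.txt combines a per-bot ban with the general rule. This only models compliant crawlers; the bots described above may ignore the file entirely. The bot name `SomeOtherBot` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant MJ12bot must not fetch any page...
mj12_allowed = parser.can_fetch("MJ12bot", "https://example.com/catalog/")
# ...while other crawlers may, at a 10-second pace.
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/catalog/")
print(mj12_allowed, other_allowed)
```

Pass the short bot token ("MJ12bot") rather than the full User-Agent string: the module matches only the part before the first "/", which for a full string is usually just "Mozilla".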

In such cases, the only option left is to block access either by the IP addresses the requests come from or by User-Agent.

The second option is preferable, because blocking an IP address makes the site unavailable to every device using that address, including ordinary users.

Blocking bots by User-Agent via the .htaccess file

Many sites run on Linux with Apache as the web server, which processes user requests and serves the site's pages. The .htaccess file is read by Apache; nginx ignores it, so on nginx equivalent rules have to be written in its own configuration.

To block access by User-Agent, create a .htaccess file in the site root (if there is not one yet) and add the following lines:

SetEnvIfNoCase User-Agent "Aboundex" bot
SetEnvIfNoCase User-Agent "80legs" bot
SetEnvIfNoCase User-Agent "360Spider" bot
SetEnvIfNoCase User-Agent "^Java" bot
SetEnvIfNoCase User-Agent "^Cogentbot" bot
SetEnvIfNoCase User-Agent "^Alexibot" bot
SetEnvIfNoCase User-Agent "^asterias" bot
SetEnvIfNoCase User-Agent "^attach" bot
SetEnvIfNoCase User-Agent "^BackDoorBot" bot
SetEnvIfNoCase User-Agent "^BackWeb" bot
SetEnvIfNoCase User-Agent "Bandit" bot
SetEnvIfNoCase User-Agent "^BatchFTP" bot
SetEnvIfNoCase User-Agent "^Bigfoot" bot
SetEnvIfNoCase User-Agent "^Black.Hole" bot
SetEnvIfNoCase User-Agent "^BlackWidow" bot
SetEnvIfNoCase User-Agent "^BlowFish" bot
SetEnvIfNoCase User-Agent "^BotALot" bot
SetEnvIfNoCase User-Agent "Buddy" bot
SetEnvIfNoCase User-Agent "^BuiltBotTough" bot
SetEnvIfNoCase User-Agent "^Bullseye" bot
SetEnvIfNoCase User-Agent "^BunnySlippers" bot
SetEnvIfNoCase User-Agent "^Cegbfeieh" bot
SetEnvIfNoCase User-Agent "^CheeseBot" bot
SetEnvIfNoCase User-Agent "^CherryPicker" bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bot
SetEnvIfNoCase User-Agent "Collector" bot
SetEnvIfNoCase User-Agent "Copier" bot
SetEnvIfNoCase User-Agent "^CopyRightCheck" bot
SetEnvIfNoCase User-Agent "^cosmos" bot
SetEnvIfNoCase User-Agent "^Crescent" bot
SetEnvIfNoCase User-Agent "^Custo" bot
SetEnvIfNoCase User-Agent "^AIBOT" bot
SetEnvIfNoCase User-Agent "^DISCo" bot
SetEnvIfNoCase User-Agent "^DIIbot" bot
SetEnvIfNoCase User-Agent "^DittoSpyder" bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bot
SetEnvIfNoCase User-Agent "^Download\ Devil" bot
SetEnvIfNoCase User-Agent "^Download\ Wonder" bot
SetEnvIfNoCase User-Agent "^dragonfly" bot
SetEnvIfNoCase User-Agent "^Drip" bot
SetEnvIfNoCase User-Agent "^eCatch" bot
SetEnvIfNoCase User-Agent "^EasyDL" bot
SetEnvIfNoCase User-Agent "^ebingbong" bot
SetEnvIfNoCase User-Agent "^EirGrabber" bot
SetEnvIfNoCase User-Agent "^EmailCollector" bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bot
SetEnvIfNoCase User-Agent "^EmailWolf" bot
SetEnvIfNoCase User-Agent "^EroCrawler" bot
SetEnvIfNoCase User-Agent "^Exabot" bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bot
SetEnvIfNoCase User-Agent "Extractor" bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bot
SetEnvIfNoCase User-Agent "^Foobot" bot
SetEnvIfNoCase User-Agent "^flunky" bot
SetEnvIfNoCase User-Agent "^FrontPage" bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bot
SetEnvIfNoCase User-Agent "^gotit" bot
SetEnvIfNoCase User-Agent "^GrabNet" bot
SetEnvIfNoCase User-Agent "^Grafula" bot
SetEnvIfNoCase User-Agent "^Harvest" bot
SetEnvIfNoCase User-Agent "^hloader" bot
SetEnvIfNoCase User-Agent "^HMView" bot
SetEnvIfNoCase User-Agent "^HTTrack" bot
SetEnvIfNoCase User-Agent "^humanlinks" bot
SetEnvIfNoCase User-Agent "^IlseBot" bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bot
SetEnvIfNoCase User-Agent "^Image\ Sucker" bot
SetEnvIfNoCase User-Agent "Indy\ Library" bot
SetEnvIfNoCase User-Agent "^InfoNavibot" bot
SetEnvIfNoCase User-Agent "^InfoTekies" bot
SetEnvIfNoCase User-Agent "^Intelliseek" bot
SetEnvIfNoCase User-Agent "^InterGET" bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bot
SetEnvIfNoCase User-Agent "^Iria" bot
SetEnvIfNoCase User-Agent "^Jakarta" bot
SetEnvIfNoCase User-Agent "^JennyBot" bot
SetEnvIfNoCase User-Agent "^JetCar" bot
SetEnvIfNoCase User-Agent "^JOC" bot
SetEnvIfNoCase User-Agent "^JustView" bot
SetEnvIfNoCase User-Agent "^Jyxobot" bot
SetEnvIfNoCase User-Agent "^Kenjin.Spider" bot
SetEnvIfNoCase User-Agent "^Keyword.Density" bot
SetEnvIfNoCase User-Agent "^larbin" bot
SetEnvIfNoCase User-Agent "^LexiBot" bot
SetEnvIfNoCase User-Agent "^lftp" bot
SetEnvIfNoCase User-Agent "^libWeb/clsHTTP" bot
SetEnvIfNoCase User-Agent "^likse" bot
SetEnvIfNoCase User-Agent "^LinkextractorPro" bot
SetEnvIfNoCase User-Agent "^LinkScan/8.1a.Unix" bot
SetEnvIfNoCase User-Agent "^LNSpiderguy" bot
SetEnvIfNoCase User-Agent "^LinkWalker" bot
SetEnvIfNoCase User-Agent "^lwp-trivial" bot
SetEnvIfNoCase User-Agent "^LWP::Simple" bot
SetEnvIfNoCase User-Agent "^Magnet" bot
SetEnvIfNoCase User-Agent "^Mag-Net" bot
SetEnvIfNoCase User-Agent "^MarkWatch" bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bot
SetEnvIfNoCase User-Agent "^Mata.Hari" bot
SetEnvIfNoCase User-Agent "^Memo" bot
SetEnvIfNoCase User-Agent "^Microsoft.URL" bot
SetEnvIfNoCase User-Agent "^Microsoft\ URL\ Control" bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bot
SetEnvIfNoCase User-Agent "^MIIxpc" bot
SetEnvIfNoCase User-Agent "^Mirror" bot
SetEnvIfNoCase User-Agent "^Missigua\ Locator" bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bot
SetEnvIfNoCase User-Agent "^moget" bot
SetEnvIfNoCase User-Agent "^Mozilla/3.Mozilla/2.01" bot
SetEnvIfNoCase User-Agent "^Mozilla.*NEWT" bot
SetEnvIfNoCase User-Agent "^NAMEPROTECT" bot
SetEnvIfNoCase User-Agent "^Navroad" bot
SetEnvIfNoCase User-Agent "^NearSite" bot
SetEnvIfNoCase User-Agent "^NetAnts" bot
SetEnvIfNoCase User-Agent "^Netcraft" bot
SetEnvIfNoCase User-Agent "^NetMechanic" bot
SetEnvIfNoCase User-Agent "^NetSpider" bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bot
SetEnvIfNoCase User-Agent "^NetZIP" bot
SetEnvIfNoCase User-Agent "^NextGenSearchBot" bot
SetEnvIfNoCase User-Agent "^NG" bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bot
SetEnvIfNoCase User-Agent "^niki-bot" bot
SetEnvIfNoCase User-Agent "^NimbleCrawler" bot
SetEnvIfNoCase User-Agent "^Ninja" bot
SetEnvIfNoCase User-Agent "^NPbot" bot
SetEnvIfNoCase User-Agent "^Octopus" bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bot
SetEnvIfNoCase User-Agent "^Openfind" bot
SetEnvIfNoCase User-Agent "^OutfoxBot" bot
SetEnvIfNoCase User-Agent "^PageGrabber" bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bot
SetEnvIfNoCase User-Agent "^pavuk" bot
SetEnvIfNoCase User-Agent "^pcBrowser" bot
SetEnvIfNoCase User-Agent "^PHP\ version\ tracker" bot
SetEnvIfNoCase User-Agent "^Pockey" bot
SetEnvIfNoCase User-Agent "^ProPowerBot/2.14" bot
SetEnvIfNoCase User-Agent "^ProWebWalker" bot
SetEnvIfNoCase User-Agent "^psbot" bot
SetEnvIfNoCase User-Agent "^Pump" bot
SetEnvIfNoCase User-Agent "^QueryN.Metasearch" bot
SetEnvIfNoCase User-Agent "^RealDownload" bot
SetEnvIfNoCase User-Agent "Reaper" bot
SetEnvIfNoCase User-Agent "Recorder" bot
SetEnvIfNoCase User-Agent "^ReGet" bot
SetEnvIfNoCase User-Agent "^RepoMonkey" bot
SetEnvIfNoCase User-Agent "^RMA" bot
SetEnvIfNoCase User-Agent "Siphon" bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bot
SetEnvIfNoCase User-Agent "^SlySearch" bot
SetEnvIfNoCase User-Agent "^SmartDownload" bot
SetEnvIfNoCase User-Agent "^Snake" bot
SetEnvIfNoCase User-Agent "^Snapbot" bot
SetEnvIfNoCase User-Agent "^Snoopy" bot
SetEnvIfNoCase User-Agent "^sogou" bot
SetEnvIfNoCase User-Agent "^SpaceBison" bot
SetEnvIfNoCase User-Agent "^SpankBot" bot
SetEnvIfNoCase User-Agent "^spanner" bot
SetEnvIfNoCase User-Agent "^Sqworm" bot
SetEnvIfNoCase User-Agent "Stripper" bot
SetEnvIfNoCase User-Agent "Sucker" bot
SetEnvIfNoCase User-Agent "^SuperBot" bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bot
SetEnvIfNoCase User-Agent "^Surfbot" bot
SetEnvIfNoCase User-Agent "^suzuran" bot
SetEnvIfNoCase User-Agent "^Szukacz/1.4" bot
SetEnvIfNoCase User-Agent "^tAkeOut" bot
SetEnvIfNoCase User-Agent "^Teleport" bot
SetEnvIfNoCase User-Agent "^Telesoft" bot
SetEnvIfNoCase User-Agent "^TurnitinBot/1.5" bot
SetEnvIfNoCase User-Agent "^The.Intraformant" bot
SetEnvIfNoCase User-Agent "^TheNomad" bot
SetEnvIfNoCase User-Agent "^TightTwatBot" bot
SetEnvIfNoCase User-Agent "^Titan" bot
SetEnvIfNoCase User-Agent "^True_bot" bot
SetEnvIfNoCase User-Agent "^turingos" bot
SetEnvIfNoCase User-Agent "^TurnitinBot" bot
SetEnvIfNoCase User-Agent "^URLy.Warning" bot
SetEnvIfNoCase User-Agent "^Vacuum" bot
SetEnvIfNoCase User-Agent "^VCI" bot
SetEnvIfNoCase User-Agent "^VoidEYE" bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bot
SetEnvIfNoCase User-Agent "^WebAuto" bot
SetEnvIfNoCase User-Agent "^WebBandit" bot
SetEnvIfNoCase User-Agent "^Webclipping.com" bot
SetEnvIfNoCase User-Agent "^WebCopier" bot
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" bot
SetEnvIfNoCase User-Agent "^WebEnhancer" bot
SetEnvIfNoCase User-Agent "^WebFetch" bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bot
SetEnvIfNoCase User-Agent "^Web.Image.Collector" bot
SetEnvIfNoCase User-Agent "^WebLeacher" bot
SetEnvIfNoCase User-Agent "^WebmasterWorldForumBot" bot
SetEnvIfNoCase User-Agent "^WebReaper" bot
SetEnvIfNoCase User-Agent "^WebSauger" bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bot
SetEnvIfNoCase User-Agent "^Webster" bot
SetEnvIfNoCase User-Agent "^WebStripper" bot
SetEnvIfNoCase User-Agent "^WebWhacker" bot
SetEnvIfNoCase User-Agent "^WebZIP" bot
SetEnvIfNoCase User-Agent "Whacker" bot
SetEnvIfNoCase User-Agent "^Widow" bot
SetEnvIfNoCase User-Agent "^WISENutbot" bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bot
SetEnvIfNoCase User-Agent "^WWW-Collector-E" bot
SetEnvIfNoCase User-Agent "^Xaldon" bot
SetEnvIfNoCase User-Agent "^Xenu" bot
SetEnvIfNoCase User-Agent "^Zeus" bot
SetEnvIfNoCase User-Agent "ZmEu" bot
SetEnvIfNoCase User-Agent "^Zyborg" bot
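SetEnvIfNoCase User-Agent "AhrefsBot" bot
SetEnvIfNoCase User-Agent "MJ12bot" bot
SetEnvIfNoCase User-Agent "HubSpot" bot
SetEnvIfNoCase User-Agent "BLEXBot" bot
SetEnvIfNoCase User-Agent "archive.org_bot" bot
SetEnvIfNoCase User-Agent "bingbot" bot
SetEnvIfNoCase User-Agent "^Wget" bot
# Apache 2.2 (mod_access)
<IfModule !mod_authz_core.c>
Order Allow,Deny
Allow from all
Deny from env=bot
</IfModule>
# Apache 2.4+ (mod_authz_core): deny every request marked with "bot"
<IfModule mod_authz_core.c>
<RequireAll>
Require all granted
Require not env bot
</RequireAll>
</IfModule>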

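You can use this list as is, but it is better to keep only the unwanted bots that have actually caused high load on your site in the past or are causing it now. Note that the list also includes legitimate crawlers such as bingbot (Microsoft Bing) and archive.org_bot (Internet Archive); keep them only if you deliberately want to exclude your site from those services.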
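Before deploying a trimmed list, it can help to check which User-Agent strings the patterns actually catch, since unanchored patterns such as "Collector" match the word anywhere in the string. A rough sketch, assuming Python's `re` module behaves like Apache's PCRE for simple patterns like these; only a few patterns from the list are included, and `is_blocked` is a hypothetical helper name.

```python
import re

# A few patterns copied from the .htaccess list (extend as needed).
# SetEnvIfNoCase matches case-insensitively and unanchored unless "^" is used,
# which re.search with re.IGNORECASE mirrors.
PATTERNS = [
    r"AhrefsBot",
    r"BLEXBot",
    r"bingbot",
    r"^Wget",
    r"^Java",
    r"Collector",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_blocked(user_agent):
    """Return True if any pattern matches, i.e. the 'bot' variable would be set."""
    return any(rx.search(user_agent) for rx in COMPILED)
```

Running your own log's most frequent User-Agent strings through such a check shows whether a pattern would also hit real browsers.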
