At some point you may start to wonder whether requests to your site come not only from "good" bots, such as the Yandex robot (search), the Google robot and others. You will then want a list of bad robots whose access to the site can be blocked for good.

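If you need traffic from Yandex bots OTHER THAN SEARCH (the list has no entry aimed specifically at the Yandex search robot), find and remove everything related to Yandex, such as YandexDirect, YandexVideo and YandexMarket. Keep in mind that generic entries such as Bot and Spider match any User-Agent containing those substrings, YandexBot included, so remove them as well if search crawlers must get through.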

Warning! The original list was taken from GitHub, but the original also blocks various referers, including "good" ones. Be careful with blocking by referer! It is NOT used here. Blocking is done by User-Agent only.

List for .htaccess (if you use the Apache server, this file must be in the root of your site and contain these lines; the rules rely on mod_rewrite, so RewriteEngine On must be set earlier in the file):

# Block via User Agent
RewriteCond %{HTTP_USER_AGENT} (80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfly|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTTrack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErsPRO|^niki-bot|^NimbleCrawler|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests|^QueryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanner|^SpankBot|Spinn3r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^VoidEYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD|SputnikBot|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|Twiceler|Java|CommentReader|Yeti|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Offline|DISCo|netvampire|^Copier|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|^HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|meanpathbot|statdom|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|getprismatic|Ahrefs|ApacheBench|Applebot|archive|BaiduBot|Birubot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|DomainTools|DotBot|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|gold\ crawler|InternetSeer|Jakarta|km.ru|kmSearchBot|Kraken|larbin|Lightspeedsystems|Linguee|LinkBot|EvilBot|ScumSucker|FakeAgent|LinkExchanger|bingbot|msnbot|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|majestic|Mediatoolkitbot|MetaURI|MLBot|NerdByNature|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|PHP/|pirst|PostRank|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|Ruby|SearchBot|SISTRIX|SiteBot|Sogou|solomono|Soup|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|YottosBot|Zeus|zitebot|bingot|mail.ru|tut.by|Br.by|Zubr.com|All.by|Tit.by|21.by|Rambler|Lycos|nigma.ru|Yahoo|alexa.com|archiver|LiveInternet|BegunAdvertising|vkShare|WebArtexBot|Web-Monitoring|Runet-Research-Crawler|YandexDirect|SputnikFaviconBot|CNCat|Virusdie|YoudaoBot|WorldSearch|YandexVideo|YandexMarket|Wotbox|securepoint|Facebot|AltaVista|Bot|Custo|Demon|eCatch|WebWhacker|Express|WebPictures|ExtractorPro|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|rafula|Stripper|Indy|Spider|Vampire|Foto|WebSpider|WebGo|Quester|Twengabot|perl|scan|email|Pyth|PyQ|WebCollector|WebCopy|webcraw|AcoonBot|adbeat_bot|AddThis.com|adidxbot|ADmantX|ExpertSearch|ExpertSearchSpider|extract|F2S|FastSeek|feedfinder|FeedlyBot|finbot|Flamingo_SearchEngine|FlappyBot|flicky|Flipboard|g00g1e|genieo|Genieo|GigablastOpenSource|GozaikBot|grab|GT::WWW|GTB5|Guzzle|harvest|heritrix|HomePageBot|HTTP::Lite|HubSpot|icarus6|id-search|IDBot|IlseBot|Indigonet|integromedb|IRLbot|ISC\ Systems\ iRc\ Search\ 2.1|JobdiggerSpider|JOC\ Web\ Spider|Jorgee|kanagawa|KINGSpider|kmccrew|mailto:craftbot@yahoo.com|AngloINFO|Antelope|BeetleBot|billigerbot|binlar|bitlybot|BLP_bbot|BoardReader|Bolt\ 0|BOT\ for\ JCE|Bot\ casper|CazoodleBot|checkprivacy|chromeframe|Clerkbot|Cliqzbot|clshttp|DTS.Agent|EasouSpider|ecxi|Elmer|ExaleadCloudView|CommonCrawler|comodo|crawler4j|Crawlera|CRAZYWEBCRAWLER|Curious|CWS_proxy|Default\ Browser\ 0|diavol|DigExt|DoCoMo|DotBot|DISCo\ Watchman|ahrefs.com|PubMatic\ Crawler\ Bot|seokicks.de|zgrab) [NC]
RewriteRule (.*) - [F,L]
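
Most alternatives in the pattern are unanchored, and [NC] makes the match case-insensitive, so short entries match any User-Agent that merely contains them. Before deploying the rule it may be worth checking a few User-Agent strings against it. The Python sketch below is one way to do that. The pattern is a shortened, hypothetical subset of the list, the sample strings are illustrative, and the name is_blocked is an assumption made for this example. Python's re module stands in for Apache's regex engine, which behaves the same for simple alternations like these.

```python
import re

# Hypothetical, shortened subset of the alternatives from the rule above.
BLOCKED = r"(^HTTrack|python-requests|Wget|curl|AhrefsBot|Bot|Spider)"

# re.IGNORECASE plays the role of the [NC] flag in RewriteCond.
PATTERN = re.compile(BLOCKED, re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return PATTERN.search(user_agent) is not None


# Illustrative User-Agent strings (assumed examples, not taken from real logs).
SAMPLES = [
    "Wget/1.21",
    "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print("BLOCKED" if is_blocked(ua) else "allowed", "|", ua)
```

In this sketch the second sample is reported as blocked only because of the generic Bot entry, which is the kind of side effect worth looking for before the rule goes live.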

List for the nginx configuration, placed inside the server{...HERE...} block (the list turned out to be large, so it is split into two parts; DO NOT merge them into one, or nginx -t will report an error):


server{

server_name_in_redirect off;
if ($http_user_agent ~ 80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfly|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTTrack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErsPRO|^niki-bot|^NimbleCrawler|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests|^QueryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanner|^SpankBot|Spinn3r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^VoidEYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD|SputnikBot|Crowsnest|PaperLiBot|peerindex|ia_archiver|Slurp|Aport|NING|JS-Kit|rogerbot|BLEXBot|Twiceler|Java|CommentReader|Yeti|BTWebClient|Tagoobot|Ezooms|igdeSpyder|AhrefsBot|Offline|DISCo|netvampire|^Copier|omgili|socialmediascanner|Jooblebot|SeznamBot|Scrapy|CCBot|linkfluence|veoozbot|Leikibot|Seopult|Faraday|hybrid|Go-http-client|SMUrlExpander|SNAPSHOT|getintent|ltx71|Nuzzel|SMTBot|Laserlikebot|facebookexternalhit|mfibot|OptimizationCrawler|crazy|Dispatch|ubermetrics|^HTMLParser|musobot|filterdb|InfoSeek|omgilibot|DomainSigma|SafeSearch|meanpathbot|statdom|spredbot|StatOnlineRuBot|openstat|DeuSu|semantic|postano|Embedly|NewShareCounts|linkdexbot|GrapeshotCrawler|Digincore|NetSeer|help.jp|getprismatic|Ahrefs|ApacheBench|Applebot|archive|BaiduBot|Birubot|bsalsa|Butterfly|Buzzbot|BuzzSumo|CamontSpider|curl|dataminr|DomainTools|DotBot|FairShare|FeedFetcher|FlaxCrawler|FlightDeckReportsBot|FlipboardProxy|FyberSpider|Gigabot|gold\ crawler|InternetSeer|Jakarta|km.ru|kmSearchBot|Kraken|larbin|Lightspeedsystems|Linguee|LinkBot) {
return 403;
}


if ($http_user_agent ~ LinkExchanger|bingbot|msnbot|LinkpadBot|LivelapBot|LoadImpactPageAnalyzer|majestic|Mediatoolkitbot|MetaURI|MLBot|NerdByNature|NjuiceBot|Nutch|OpenHoseBot|Panopta|pflab|PHP/|pirst|PostRank|ptd-crawler|Purebot|PycURL|Python|QuerySeekerSpider|Ruby|SearchBot|SISTRIX|SiteBot|Sogou|solomono|Soup|suggybot|Superfeedr|SurveyBot|SWeb|trendictionbot|TSearcher|ttCrawler|TurnitinBot|TweetmemeBot|UnwindFetchor|urllib|uTorrent|Voyager|WBSearchBot|Wget|WordPress|woriobot|YottosBot|Zeus|zitebot|bingot|mail.ru|tut.by|Br.by|Zubr.com|All.by|Tit.by|21.by|Rambler|Lycos|nigma.ru|Yahoo|alexa.com|archiver|LiveInternet|BegunAdvertising|vkShare|WebArtexBot|Web-Monitoring|Runet-Research-Crawler|YandexDirect|SputnikFaviconBot|CNCat|Virusdie|YoudaoBot|WorldSearch|YandexVideo|YandexMarket|Wotbox|securepoint|Facebot|YandexWebmaster|AltaVista|Bot|Custo|Demon|eCatch|WebWhacker|Express|WebPictures|ExtractorPro|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|rafula|Stripper|Indy|Spider|Vampire|Foto|WebSpider|WebGo|Quester|Twengabot|perl|scan|email|Pyth|PyQ|WebCollector|WebCopy|webcraw|AcoonBot|adbeat_bot|AddThis.com|adidxbot|ADmantX|ExpertSearch|ExpertSearchSpider|extract|F2S|FastSeek|feedfinder|FeedlyBot|finbot|Flamingo_SearchEngine|FlappyBot|flicky|Flipboard|g00g1e|genieo|Genieo|GigablastOpenSource|GozaikBot|grab|GT::WWW|GTB5|Guzzle|harvest|heritrix|HomePageBot|HTTP::Lite|HubSpot|icarus6|id-search|IDBot|IlseBot|Indigonet|integromedb|IRLbot|ISC\ Systems\ iRc\ Search\ 2.1|JobdiggerSpider|JOC\ Web\ Spider|Jorgee|kanagawa|KINGSpider|kmccrew|mailto:craftbot@yahoo.com|AngloINFO|Antelope|BeetleBot|billigerbot|binlar|bitlybot|BLP_bbot|BoardReader|Bolt\ 0|BOT\ for\ JCE|Bot\ casper|CazoodleBot|checkprivacy|chromeframe|Clerkbot|Cliqzbot|clshttp|DTS.Agent|EasouSpider|ecxi|Elmer|ExaleadCloudView|CommonCrawler|comodo|crawler4j|Crawlera|CRAZYWEBCRAWLER|Curious|CWS_proxy|Default\ Browser\ 0|diavol|DigExt|DoCoMo|DotBot) {
return 403;
}
#...... THE REST OF YOUR CONFIGURATION
}
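
The split into several if blocks can also be generated rather than done by hand. The Python sketch below groups the alternatives so that no single pattern grows beyond a chosen length, then prints one if block per group. The merged version most likely fails nginx -t because nginx limits the length of a single configuration parameter; this is an assumption, as is the 4000-character margin. The names chunk_patterns, build_if_blocks and MAX_PATTERN_LEN are hypothetical helpers introduced for this example.

```python
from typing import Iterable, List

# Assumed safety margin for one regex parameter; adjust for your nginx build.
MAX_PATTERN_LEN = 4000


def chunk_patterns(names: Iterable[str], max_len: int = MAX_PATTERN_LEN) -> List[str]:
    """Greedily join names with '|' so that no joined pattern exceeds max_len.

    Names must already be escaped the way the list above does it
    (for example: gold\\ crawler). A single name longer than max_len
    still ends up in a chunk of its own.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for name in names:
        # +1 accounts for the '|' separator when the chunk is not empty.
        extra = len(name) + (1 if current else 0)
        if current and current_len + extra > max_len:
            chunks.append("|".join(current))
            current, current_len = [], 0
            extra = len(name)
        current.append(name)
        current_len += extra
    if current:
        chunks.append("|".join(current))
    return chunks


def build_if_blocks(names: Iterable[str], max_len: int = MAX_PATTERN_LEN) -> str:
    """Render one nginx `if` block per chunk, each returning 403."""
    blocks = [
        "if ($http_user_agent ~ %s) {\n    return 403;\n}" % pattern
        for pattern in chunk_patterns(names, max_len)
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Tiny demo with a deliberately small limit to force a split.
    print(build_if_blocks(["80legs", "360Spider", "^HTTrack", "Wget"], max_len=16))
```

The generated blocks go inside server{...}, in the same place as the hand-written ones, and should still be checked with nginx -t.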

Copy the blocks, paste them into nginx.conf or into the configuration file of your domain/subdomain, then test the configuration:

nginx -t

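Then restart the server (to apply the configuration gracefully, without dropping active connections, use reload in place of restart in the commands below):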

service nginx restart

or

/etc/init.d/nginx restart