How to Block Specific User-Agent Crawlers from Your Site with Nginx

Published: 2015-02-09 11:46 | Category: Linux | Views: 4,330

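My site targets visitors in China, so I don't want foreign crawlers visiting it, especially certain junk crawlers that hit it very frequently. As this junk traffic builds up, it wastes a lot of server bandwidth and resources. Checking the User-Agent and denying these crawlers in nginx saves some of that traffic and also blocks some malicious access. Keep in mind that the User-Agent header is chosen by the client, so this only stops crawlers that identify themselves honestly.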

Steps

1. Go to the nginx configuration directory, for example: cd /usr/local/nginx/conf

2. Create a configuration file named agent_deny.conf with the following content:

# Block crawling by tools such as Scrapy (~* makes the match case-insensitive)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
        return 403;
}
# Block the listed User-Agents and requests with an empty User-Agent (~ is case-sensitive)
if ($http_user_agent ~ "FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|LinkpadBot|Ezooms|^$" )
{
        return 403;
}
# Block requests whose method is not GET, HEAD or POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
        return 403;
}
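To see at a glance which requests the three rules in agent_deny.conf would reject, their logic can be mirrored in a small shell function. This is only a sketch for experimenting, not part of the nginx setup: the name would_block is hypothetical, and grep -E stands in for nginx's PCRE engine, which behaves the same for these plain alternations and the "^$" anchor.

```shell
# Sketch: predict how the agent_deny.conf rules treat a request.
# would_block is a hypothetical helper; grep -E stands in for nginx's PCRE.

# Rule 2 pattern, copied verbatim from agent_deny.conf.
DENY_UA='FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|LinkpadBot|Ezooms|^$'

# Usage: would_block "User-Agent string" [METHOD]; prints 403 or allowed.
would_block() {
    local ua="$1" method="${2:-GET}"
    # Rule 1 uses ~* (case-insensitive), so "curl/7.38.0" matches "Curl".
    if printf '%s\n' "$ua" | grep -qiE 'Scrapy|Curl|HttpClient'; then
        echo 403; return
    fi
    # Rule 2 uses ~ (case-sensitive); "^$" matches an empty User-Agent.
    if printf '%s\n' "$ua" | grep -qE "$DENY_UA"; then
        echo 403; return
    fi
    # Rule 3: any method other than GET, HEAD or POST is rejected.
    case $method in
        GET|HEAD|POST) echo allowed ;;
        *) echo 403 ;;
    esac
}
```

For example, would_block "Java/1.8.0" prints 403 because of the broad Java entry, while would_block "java-client" prints allowed, since rule 2 is case-sensitive.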

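3. In the site's configuration file, add the line "include agent_deny.conf;". In the example below it is placed inside the PHP location block, so only requests handled by that location are filtered; static files are still served to blocked crawlers. To filter every request, put the include directly in the server block instead.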

location ~ [^/]\.php(/|$)
{
    try_files $uri =404;
    fastcgi_pass  unix:/tmp/php-cgi.sock;
    fastcgi_index index.php;
    include fastcgi.conf;
    include agent_deny.conf ;
}

4. Reload nginx:

/etc/init.d/nginx reload
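Before reloading, it can help to have nginx check the configuration first, so that a syntax error in agent_deny.conf is reported on the terminal and the reload is skipped. This is a sketch: NGINX_BIN and safe_reload are hypothetical names, and the default path assumes the source-install layout from step 1 (/usr/local/nginx). The -t and -s reload options are standard nginx command-line flags.

```shell
# Sketch: test the configuration before reloading nginx.
# NGINX_BIN is an assumption; point it at your own nginx binary.
NGINX_BIN=${NGINX_BIN:-/usr/local/nginx/sbin/nginx}

safe_reload() {
    # nginx -t parses the whole configuration (including agent_deny.conf)
    # and exits non-zero if it finds an error.
    if "$NGINX_BIN" -t; then
        # Signals the running master process to re-read its configuration,
        # which is what the init script's reload usually does as well.
        "$NGINX_BIN" -s reload
    else
        echo "nginx -t failed; configuration not reloaded" >&2
        return 1
    fi
}
```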

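Testing

Simulate crawler requests with curl: -A sets the User-Agent, and -I sends a HEAD request and prints only the response headers.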

root@vm-199:~# curl -I -A "BaiduSpider" blog.nbhao.org
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 09 Feb 2015 03:37:20 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.19
Vary: Accept-Encoding, Cookie
Cache-Control: max-age=3, must-revalidate
WP-Super-Cache: Served supercache file from PHP

root@vm-199:~# curl -I -A "JikeSpider" blog.nbhao.org           
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, 09 Feb 2015 03:37:44 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive

root@vm-199:~# curl -I -A "" blog.nbhao.org             
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, 09 Feb 2015 03:37:52 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
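The curl checks above can also be run as a batch that prints one status code per request, which is quicker when re-testing after editing agent_deny.conf. This is a sketch with hypothetical helper names (probe, probe_all); pass the URL of the site you are testing. The extra PUT request exercises the third rule, which the manual tests did not cover.

```shell
# Sketch: batch version of the curl tests above.
# probe and probe_all are hypothetical helpers.
probe() {
    local url="$1" method="$2" ua="$3" code
    # -s silences progress, -o /dev/null discards the body, and
    # -w prints only the status code (000 if no response arrived).
    # -A "" makes curl send no User-Agent header at all.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
        -X "$method" -A "$ua" "$url" || true)
    printf '%s\t%s\t%s\n' "$code" "$method" "${ua:-<empty>}"
}

probe_all() {
    for ua in "BaiduSpider" "JikeSpider" "" "Scrapy/1.0"; do
        probe "$1" GET "$ua"
    done
    # An unlisted User-Agent with a disallowed method tests rule 3.
    probe "$1" PUT "Mozilla/5.0"
}
```

For example, probe_all http://blog.nbhao.org/ should print 200 for BaiduSpider and 403 for the other requests, provided the include applies to that URL.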

The corresponding entries in the nginx log:

[Screenshot: nginx log entries for the test requests above]
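To check the same result from the log side, a short awk pipeline can tally 403 responses by User-Agent. This sketch assumes nginx's default combined log format, in which the User-Agent is the last quoted field, and a default log path of /usr/local/nginx/logs/access.log, which may differ on your server; count_403_by_ua is a hypothetical name.

```shell
# Sketch: count 403 responses per User-Agent in an nginx access log.
# Assumes the default "combined" log_format:
#   $remote_addr - $remote_user [$time_local] "$request" $status
#   $body_bytes_sent "$http_referer" "$http_user_agent"
count_403_by_ua() {
    # Splitting on double quotes puts " status bytes " in $3
    # and the User-Agent in $6.
    awk -F'"' '{
        split($3, f, " ")
        if (f[1] == "403") hits[$6]++
    }
    END {
        for (ua in hits) printf "%d\t%s\n", hits[ua], ua
    }' "${1:-/usr/local/nginx/logs/access.log}" | sort -rn
}
```

Run it as count_403_by_ua /path/to/access.log; a line with an empty second column counts requests that sent no User-Agent.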

That completes blocking crawlers by User-Agent in nginx. Add, remove, or change the entries in agent_deny.conf to suit your own traffic.

Tags: Nginx

Permalink: https://www.sijitao.net/1930.html

Comments are welcome. If you republish an original article from this blog, please credit the source. Thank you!

