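When running a website, it is important to distinguish legitimate search engine spiders from malicious crawlers. Legitimate spiders (such as Baiduspider and Googlebot) are essential to indexing and search ranking; blocking them indiscriminately can lower the site's search weight, cost traffic, and even lose customers. If the server is under resource pressure, first consider upgrading the virtual hosting plan to raise the traffic quota, or migrating to a cloud server (which supports unmetered traffic), so that capacity is addressed at the infrastructure level. Upgrade options are described at http://www.west.cn/faq/list.asp?unid=626
Blocking specific spiders requires the pseudo-static (URL rewrite) component to be configured first. If the environment was built with 网站管理助手 (the website management assistant), enable the component by following http://www.west.cn/faq/list.asp?unid=650; for a manually configured Windows Server 2003 + IIS environment, load the component as described at http://www.west.cn/faq/list.asp?unid=639. Once this prerequisite is in place, add the blocking rules to the configuration file that matches the server's operating system and web server.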
Linux (Apache)
Create a `.htaccess` file in the site root directory and add the following rules:
```apache
RewriteEngine On
#Block spider
RewriteCond %{HTTP_USER_AGENT} "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu" [NC]
RewriteRule !(^robots\.txt$) - [F]
```
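To see what the rules above would do before deploying them, the matching logic can be approximated offline. The following is a minimal Python sketch, not part of the original rules: the function name `would_be_forbidden` is hypothetical, and it assumes only that `[NC]` means case-insensitive matching, that the unanchored pattern may match anywhere in the User-Agent, and that in `.htaccess` context the path is matched without its leading slash.

```python
import re

# Same alternation as the RewriteCond above; [NC] is modelled with re.IGNORECASE.
BLOCKED_UA = re.compile(
    "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|"
    "jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|"
    "Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|"
    "perl|Python|Wget|Xenu|ZmEu",
    re.IGNORECASE,
)

# RewriteRule !(^robots\.txt$): every path except robots.txt is affected.
ROBOTS_PATH = re.compile(r"^robots\.txt$")


def would_be_forbidden(user_agent: str, path: str) -> bool:
    """Return True if the rule pair would answer this request with 403."""
    if ROBOTS_PATH.search(path):
        return False  # robots.txt stays readable, even for blocked agents
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    ahrefs = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
    baidu = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
    for ua, path in [(ahrefs, "index.html"), (ahrefs, "robots.txt"), (baidu, "index.html")]:
        print(would_be_forbidden(ua, path), path, ua)
```

Leaving `robots.txt` reachable means a blocked crawler can still read the site's crawl policy, even though every other URL is refused.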
Windows Server 2003 + IIS
Add the rules to the `httpd.conf` file used by the rewrite component:
```apache
#Block spider
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu) [NC]
RewriteRule !(^/robots\.txt$) - [F]
```
Windows Server 2008 + IIS
Add the rules to the `web.config` file:
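```xml
<!-- Sketch assuming the IIS URL Rewrite module is installed; it mirrors the
     Apache rule above: refuse the listed agents, keep robots.txt reachable. -->
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Block spider">
          <match url="^robots\.txt$" negate="true" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu" ignoreCase="true" />
          </conditions>
          <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```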
Nginx
Add the rule inside the `server` block of the site's configuration file:
```nginx
if ($http_user_agent ~ "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$") {
    return 444;
}
```
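Two details of the Nginx rule differ from the Apache rules: the `~` operator matches case-sensitively (`~*` is the case-insensitive form), and the trailing `^$` alternative also catches requests that send an empty User-Agent. The Python sketch below illustrates both points; the function name `nginx_would_drop` and its `case_insensitive` switch are hypothetical, and the code models only the regular-expression match, not Nginx itself.

```python
import re

# Pattern copied from the Nginx rule above, including the "^$" alternative.
NGINX_PATTERN = (
    "Bytespider|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|"
    "Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|"
    "WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|"
    "TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$"
)


def nginx_would_drop(user_agent: str, case_insensitive: bool = False) -> bool:
    """Model `~` (default) or `~*` (case_insensitive=True) against the pattern."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(NGINX_PATTERN, user_agent, flags) is not None


if __name__ == "__main__":
    for ua in ["", "Scrapy/2.11.0 (+https://scrapy.org)", "python-requests/2.31.0"]:
        print(repr(ua), nginx_would_drop(ua), nginx_would_drop(ua, case_insensitive=True))
```

With the rule as written, a client identifying itself in lowercase (for example `python-requests`) is not matched by the `Python` token; switching the operator to `~*` would make the Nginx behaviour match the `[NC]` flag used in the Apache rules.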
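Note: the rules above block a set of non-essential spiders by default. To block more, append the identifying User-Agent token of each additional spider to the list, following the same format.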
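Because the same alternation appears in several rule files, extending it by hand is easy to get wrong. One option is to keep the tokens in a list and generate the pattern from it. The Python sketch below takes that approach; the names `build_pattern` and `EXTRA_TOKENS` and the example token `ExampleBot` are hypothetical. It escapes regex metacharacters, so a token such as `mail.RU` matches only a literal dot, which is slightly stricter than the hand-written rules.

```python
import re

# Tokens taken from the Apache rule above.
BASE_TOKENS = [
    "SemrushBot", "Webdup", "AcoonBot", "AhrefsBot", "Ezooms", "EdisterBot",
    "EC2LinkFinder", "jikespider", "Purebot", "MJ12bot", "WangIDSpider",
    "WBSearchBot", "Wotbox", "xbfMozilla", "Yottaa", "YandexBot", "Jorgee",
    "SWEBot", "spbot", "TurnitinBot-Agent", "mail.RU", "curl", "perl",
    "Python", "Wget", "Xenu", "ZmEu",
]

# Hypothetical additions; replace with the tokens you actually want to block.
EXTRA_TOKENS = ["ExampleBot"]


def build_pattern(tokens):
    """Join tokens into one alternation, escaped and without duplicates."""
    seen = set()
    parts = []
    for token in tokens:
        key = token.lower()
        if not token or key in seen:
            # An empty alternative would match every request, so skip blanks;
            # duplicates are dropped because matching is case-insensitive.
            continue
        seen.add(key)
        parts.append(re.escape(token))
    return "|".join(parts)


if __name__ == "__main__":
    pattern = build_pattern(BASE_TOKENS + EXTRA_TOKENS)
    print('RewriteCond %{HTTP_USER_AGENT} "' + pattern + '" [NC]')
```

The generated string can be pasted into whichever rule format applies; only the surrounding directive changes between servers.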
Appendix: names of common search engine spiders
- Google: googlebot
- Baidu: baiduspider, baiduboxapp (mobile)
- Yahoo: slurp
- Alexa: ia_archiver
- Bing: bingbot
- Youdao: YodaoBot, OutfoxBot
- Sogou: sogou spider
- SOSO: sosospider
- 360: 360spider
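Since the blocking rules match substrings of the User-Agent, it is worth confirming that no blocked token accidentally appears in the name of a spider that should stay allowed. The Python sketch below checks the names listed above against the tokens from the Apache rule; the function name `conflicts` is hypothetical, and because real User-Agent strings are longer than these bare names, this is only a first sanity check.

```python
import re

# Blocked tokens from the Apache rule, matched case-insensitively as with [NC].
BLOCKED_UA = re.compile(
    "SemrushBot|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|"
    "jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|"
    "Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|curl|"
    "perl|Python|Wget|Xenu|ZmEu",
    re.IGNORECASE,
)

# Spider names from the appendix list.
ALLOWED_SPIDERS = [
    "googlebot", "baiduspider", "baiduboxapp", "slurp", "ia_archiver",
    "bingbot", "YodaoBot", "OutfoxBot", "sogou spider", "sosospider",
    "360spider",
]


def conflicts(names):
    """Return the names that the blocklist pattern would also match."""
    return [name for name in names if BLOCKED_UA.search(name)]


if __name__ == "__main__":
    clashing = conflicts(ALLOWED_SPIDERS)
    print("conflicts:", clashing if clashing else "none")
```

Rerunning a check like this after adding new tokens helps catch an overly short token that would also match a spider you depend on.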
Source: 西部数码 (West.cn)