<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Kaizeku Ban &#187; Web Crawler</title>
	<atom:link href="http://blog.kaizeku.com/topics/search_engine/webcrawler/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.kaizeku.com</link>
	<description>So many evil plans, so little time...</description>
	<pubDate>Sat, 13 Dec 2008 17:01:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />
	<image>
		<link>http://blog.kaizeku.com/</link>
		<url>http://i.istalker.net/1.6.2/stamp.png</url>
		<title>Kaizeku Ban</title>
	</image>
		<item>
		<title>Most Common Trackback Spammer</title>
		<link>http://blog.kaizeku.com/wordpress/most-common-trackback-spammer/</link>
		<comments>http://blog.kaizeku.com/wordpress/most-common-trackback-spammer/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 21:55:44 +0000</pubDate>
		<dc:creator>Avice</dc:creator>
		
		<category><![CDATA[Web Crawler]]></category>

		<category><![CDATA[WordPress]]></category>

		<category><![CDATA[ranting]]></category>

		<category><![CDATA[env_if]]></category>

		<category><![CDATA[htaccess]]></category>

		<category><![CDATA[tips]]></category>

		<category><![CDATA[wpistalker]]></category>

		<guid isPermaLink="false">http://blog.kaizeku.com/?p=513</guid>
		<description><![CDATA[If you use the latest version of the WP-iStalker theme, it shows you the user agent of the client that sent each trackback or pingback. The small signature is displayed in the comment list &#038; Akismet.

 I actually made this little feature to tell WordPress client pingbacks and trackback comments apart. I&#8217;m positive that you can also use it for [...]]]></description>
			<content:encoded><![CDATA[<p>If you use the latest version of the <a href="http://wp.istalker.net/" rel="external" title="WP-iStalker">WP-iStalker</a> theme, it shows you the user agent of the client that sent each trackback or pingback. The small signature is displayed in the comment list &#038; <a href="http://akismet.com" title="Akismet" rel="external">Akismet</a>.</p>
<p><img src="http://blog.kaizeku.com/wp-content/uploads/2008/11/trackback-wpistalker.png" alt="" title="trackback-wpistalker" width="500" height="48" class="alignleft size-full wp-image-530" longdesc="/" /></p>
<p class="mgt pdt cb"> I actually made this little feature to tell WordPress client pingbacks and trackback comments apart. I&#8217;m positive that you can also use it for telling real trackbacks from spam.</p>
<p><span id="more-513"></span></p>
<div style="width: 410px;" class="wp-caption aligncenter" id="attachment_523"><img height="207" width="400" class="size-full wp-image-523" title="code-injection" alt="Most shell bots use a Perl HTTP package" src="http://blog.kaizeku.com/wp-content/uploads/2008/11/code-injection.gif"/>
<p class="wp-caption-text">Most shell bots use a Perl HTTP package</p>
</div>
<h6 class="pdt">Here come the stats</h6>
<p>According to my made-in-France bot tracker, <a href="http://ftp.ics.uci.edu/pub/websoft/libwww-perl/" rel="external" title=" libwww-perl@ics.uci.edu">libwww-perl</a> is the most abused client for sending code injection attempts, and <a href="http://jakarta.apache.org/" rel="external" title="Jakarta commons">Jakarta Commons-HttpClient</a> is widely used for sending trackback <abbr title="SPAM | Simply Pointless Annoying Message" class="ttip">SPAM</abbr>. </p>
<div style="width: 410px;" class="wp-caption aligncenter" id="attachment_521"><img height="234" width="400" class="size-full wp-image-521" title="trackback" alt="90% are trackback spam" src="http://blog.kaizeku.com/wp-content/uploads/2008/11/trackback.gif"/>
<p class="wp-caption-text">90% are trackback spam</p>
</div>
<h6 class="mgt">Simple rules</h6>
<p>A user agent string can easily be spoofed, so the target will not know the actual client. If the client is not blogging software or a CMS, it is 90% likely to be spam. And if the client identifies itself as a generic HTTP package such as libwww-perl, Larbin, Nutch, et cetera, it should be ignored. </p>
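<p>As a rough sketch of that last rule, and assuming <tt>mod_setenvif</tt> is loaded, the generic HTTP packages named above could each be flagged with one pattern. The variable name <tt>generic_client</tt> is made up for this illustration, and the patterns are deliberately broad, so check them against your own logs before denying anything.</p>
<pre class="prebox">&#60;IfModule mod_setenvif.c&#62;
# Hypothetical variable name; unanchored, case-insensitive substring matches
SetEnvIfNoCase User-Agent &#34;libwww-perl&#34; generic_client=1
SetEnvIfNoCase User-Agent &#34;larbin&#34; generic_client=1
SetEnvIfNoCase User-Agent &#34;nutch&#34; generic_client=1
&#60;/IfModule&#62;</pre>
<p>Setting the variable blocks nothing by itself; it only takes effect once an access rule such as <tt>Deny from env=generic_client</tt> refers to it.</p>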
<h6 class="mgt">How to ban user agents with htaccess</h6>
<p>Banning these two user agents reduced my daily trackback spam to a bearable amount. I can gladly say these two user agents are the cause of much bandwidth misery. Both clients are now honored in my <tt>htaccess</tt> rules &darr;</p>
<pre class="prebox" style="height:186px">&#60;IfModule mod_setenvif&#46;c&#62;
SetEnvIfNoCase User&#45;Agent &#34;&#94;libwww&#45;perl&#34; shell_bots&#61;1
SetEnvIfNoCase User&#45;Agent &#34;&#94;Jakarta&#34; shell_bots&#61;1
&#60;&#47;IfModule&#62;

&#60;FilesMatch &#34;&#40;&#46;&#42;&#41;&#34;&#62;
Order Allow&#44;Deny
Allow from all
Deny from env&#61;shell_bots
&#60;&#47;FilesMatch&#62;</pre>]]></content:encoded>
			<wfw:commentRss>http://blog.kaizeku.com/wordpress/most-common-trackback-spammer/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Kaizeku Crawler lists</title>
		<link>http://blog.kaizeku.com/search_engine/webcrawler/kaizeku-crawler-lists/</link>
		<comments>http://blog.kaizeku.com/search_engine/webcrawler/kaizeku-crawler-lists/#comments</comments>
		<pubDate>Mon, 02 Jul 2007 02:58:32 +0000</pubDate>
		<dc:creator>Avice</dc:creator>
		
		<category><![CDATA[Web Crawler]]></category>

		<guid isPermaLink="false">http://blog.kaizeku.com/animepaper/kaizeku-crawler-lists/</guid>
		<description><![CDATA[A Web crawler (also known as a Web spider or Web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner. Other less frequently used names for Web crawlers are ants, automatic indexers, bots, and worms (Kobayashi and Takeda, 2000).

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.]]></description>
			<content:encoded><![CDATA[<h2>July 2007 - Crawler List</h2>
<ul>
<li><strong>Xirq</strong> xirq/0.1-beta (xirq; http://www.xirq.com; xirq@xirq.com)</li>
<li><strong>WebSearchBench</strong> WebSearchBench WebCrawler V1.0 (Beta), Prof. Dr.-Ing. Christoph Lindemann, Universität Dortmund, cl@cs.uni-dortmund.de, http://websearchbench.cs.uni-dortmund.de/</li>
<li>Yahoo Search Japan robot Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)</li>
<li>NimbleCrawler Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.7) NimbleCrawler 1.11 obeys UserAgent NimbleCrawler For problems contact: crawler_at_dataalchemy.com</li>
<li>Fastbot fastbot crawler beta 2.0 (+http://www.fastbot.de)</li>
<li>Gigabot Gigabot/2.0/gigablast.com/spider.html</li>
<li>Jambot Jambot/0.1.1 (Jambot; http://www.jambot.com/blog; crawler@jambot.com)</li>
</ul>
<p><span id="more-46"></span></p>
<ul>
<li>Netluchs Netluchs/0.8-dev ( ; http://www.netluchs.de/; ___don&#8217;t___spam_me_@netluchs.de)</li>
<li>NutchEC2Test NutchEC2Test/Nutch-0.9-dev (Testing Nutch on Amazon EC2.; http://lucene.apache.org/nutch/bot.html; ec2test at lucene.com)</li>
<li>Bigsearch Bigsearch.ca/Nutch-0.9-dev (Bigsearch.ca Internet Spider; http://www.bigsearch.ca/; info@enhancededge.com)</li>
<li>UKWizz UKWizz/Nutch-0.8.1 (UKWizz Nutch crawler; http://www.ukwizz.com/)</li>
<li>Ilial/Nutch ilial/Nutch-0.9 (Ilial, Inc. is a Los Angeles based Internet startup company. For more information please visit http://www.ilial.com/crawler; http://www.ilial.com/crawler; crawl@ilial.com)</li>
<li>Pmoz Mozilla/5.0 (compatible; pmoz.info ODP link checker; +http://pmoz.info/doc/botinfo.htm)</li>
<li>Holmes holmes/3.11 (OnetSzukaj/5.0; +http://szukaj.onet.pl)</li>
<li>Flatlandbot flatlandbot/flatlandbot (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot.php; jason@flatlandindustries.com)</li>
<li>IDBot Mozilla/5.0 (compatible; IDBot/1.0; +http://www.id-search.org/bot.html)</li>
<li>Spam Bot Mozilla/2.0 (compatible; NEWT ActiveX; Win32)</li>
<li>Greaterera Mozilla/5.0 (compatible; heritrix/1.7.0 +http://www.greaterera.com/)</li>
<li>GEXTEST-00393 gsa-crawler (Enterprise; GEXTEST-00393; gsasymbiosys@gmail.com,xeonbox4@gmail.com)</li>
<li>Pagebull Pagebull http://www.pagebull.com/</li>
<li>RSS One Engine RSS One Engine/0.72 (+http://www.rss-one.com)</li>
<li>Dodgebot dodgebot/experimental</li>
<li>Bot bot/1.0 (bot; http://; bot@bot.bot)</li>
<li>Bigsearch Bigsearch.ca/Nutch-1.0-dev (Bigsearch.ca Internet Spider; http://www.bigsearch.ca/; info@enhancededge.com)</li>
<li>FindLinks findlinks/1.1.4-beta1 ( http://wortschatz.uni-leipzig.de/findlinks/)</li>
<li>ConveraCrawler ConveraCrawler/0.9e ( http://www.authoritativeweb.com/crawl)</li>
<li>Blaiz-Bee Blaiz-Bee/2.00.5622 ( http://www.blaiz.net)</li>
<li>KIT_Fireball KIT_Fireball/2.0</li>
<li>ICC-Crawler ICC-Crawler(Mozilla-compatible;http://kc.nict.go.jp/icc/crawl.html;icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp)</li>
<li>Pubblisito info@pubblisito.com- (http://www.pubblisito.com) il Sud dei Motori di Ricerca</li>
<li>SkreemRBot Mozilla/5.0 (compatible; SkreemRBot +http://skreemr.com)</li>
<li>WebAlta Crawler WebAlta Crawler/1.3.33 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)</li>
<li>Pumpkin blogsearchbot-pumpkin-3</li>
<li>Mail.Ru Mail.Ru/1.0</li>
<li>Mammoth Mozilla/5.0 (+http://www.eurekster.com/mammoth) Mammoth/0.1</li>
<li>Attentio Attentio/Nutch-0.9-dev (Attentio&#8217;s beta blog crawler; www.attentio.com; info@attentio.com)</li>
<li>GurujiBot GurujiBot/1.0 (+http://www.guruji.com/en/WebmasterFAQ.html)</li>
<li>Gigabot Gigabot/3.0 (http://www.gigablast.com/spider.html)</li>
<li>Jobs.de-Robot Mozilla/5.0 (compatible; jobs.de-Robot http://www.jobs.de; jobsde@jobscout24.de) ( newsexpress e-mail: newsexpress-l@neofonie.de http://www.neofonie.de/loesungen/search/robot.html )</li>
<li>ArabyBot ArabyBot (compatible; Mozilla/5.0; GoogleBot; FAST Crawler 6.4; http://www.araby.com;)</li>
<li>VWBOT VWBOT/Nutch-0.9-dev (VWBOT Nutch Crawler; http://vwbot.cs.uiuc.edu;+vwbot@cs.uiuc.edu</li>
<li>IWAgent IWAgent/ 1.0 - www.brandprotect.com</li>
<li>Sirketcebot Sirketcebot/v.01 (http://www.sirketce.com/bot.html)</li>
<li>Spock Crawler Spock Crawler (http://www.spock.com/crawler)</li>
<li>Flatlandbot great-plains-web-spider/flatlandbot (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot.php; jason@flatlandindustries.com)</li>
<li>Nebulla Nebullabot/2.2 (http://bot.nebulla.de)</li>
<li>EasyDL EasyDL/3.04 http://keywen.com/Encyclopedia/Bot</li>
<li>LapozzBot LapozzBot/1.4 (+http://robot.lapozz.hu)</li>
<li>WWW.fi crawler www.fi crawler, contact crawler@www.fi</li>
<li>Uni-koblenz http://www.uni-koblenz.de/~flocke/robot-info.txt</li>
<li>NimbleCrawler Mozilla/5.0 (Windows;) NimbleCrawler 2.0.1 obeys UserAgent NimbleCrawler For problems contact: crawler@healthline.com</li>
<li>YodaoBot Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )</li>
<li>DAUM RSS Robot ELI/20070402:2.0 (DAUM RSS Robot, Daum Communications Corp.; +http://ws.daum.net/aboutkr.html)</li>
<li>DAUM Web Robot Mozilla/4.0 (compatible; MSIE enviable; DAUMOA/1.0.1; DAUM Web Robot; Daum Communications Corp., Korea; +http://ws.daum.net/aboutkr.html)</li>
<li>Changedetection Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )</li>
<li>ICC-Crawler ICC-Crawler(Mozilla-compatible; http://kc.nict.go.jp/icc/crawl.html; icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp)</li>
<li>Semager Semager/1.1 (http://www.semager.de/blog/semager-bots/)</li>
<li>Multicrawler multicrawler ( http://sw.deri.org/2006/04/multicrawler/robots.html)</li>
<li>NetinfoBot NetinfoBot/1.0 (http://netinfo.bg/netinfobot.html)</li>
<li>Envolkspider envolk/1.7 (+http://www.envolk.com/envolkspiderinfo.html)</li>
<li>CazoodleBot CazoodleBot/CazoodleBot-0.1 (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; cazoodlebot@cazoodle.com)</li>
<li>RutterBot RutterBot(+http://www.aktienbetreuer.de/bot.html)</li>
<li>Worio bot Mozilla/5.0 (compatible; woriobot heritrix/1.10.0 +http://worio.com)</li>
<li>Tags2dir tags2dir.com/0.8 (+http://tags2dir.com/directory/)</li>
<li>Combine Combine/3 http://combine.it.lth.se/</li>
<li>Lawinfo-crawler lawinfo-crawler/Nutch-0.9-dev (Crawler for lawinfo.com pages; http://www.lawinfo.com; webmaster@lawinfo.com)</li>
<li>FuseBulb FuseBulb.Com</li>
<li>Earthcom Mozilla/5.0 (compatible; EARTHCOM/2.2; +http://enter4u.eu)</li>
<li>Askpeter_bot Mozilla/5.0 (compatible; askpeter_bot/3.2; +http://www.askpeter.info)</li>
<li>LapozzBot LapozzBot/1.5 (+http://robot.lapozz.hu)</li>
<li>FAST-WebCrawler FAST Enterprise Crawler/6.4.18 (crawler@fast.no)</li>
<li>BuiltWith Mozilla/5.0 (compatible; BuiltWith/0.1; +http://builtwith.com/bot.html)</li>
<li>Hiiglespider Hiiglespider/0.1, Hiigle.com, http://hiigle.com/spider</li>
<li>Page-store Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)</li>
<li>Metacarta Mozilla/5.0 (compatible; heritrix/1.5 +http://www.metacarta.com)</li>
<li>Multicrawler multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)</li>
<li>LibertyW LibertyW (+http://www.libertyw.eu)</li>
<li>BlogRefsBot Mozilla/5.0 (compatible; BlogRefsBot/0.1; http://www.blogrefs.com/about/bloggers)</li>
<li>Holmes holmes/3.11 (http://morfeo.centrum.cz/bot)</li>
<li>DataparkSearch DataparkSearch/4.47 (+http://dataparksearch.org/bot)</li>
<li>ImageWalker ImageWalker/2.0 (www.bdbrandprotect.com)</li>
<li>SeznamBot SeznamBot/2.0-test (+http://fulltext.sblog.cz/)</li>
<li>Entireweb Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)</li>
<li>BrightCrawler BrightCrawler (http://www.brightcloud.com/brightcrawler.asp)</li>
<li>BabalooSpider BabalooSpider/1.2 (BabalooSpider; http://www.babaloo.si; spider@babaloo.si)</li>
</ul>
<h2>February 2008 - Crawler List</h2>
<table cellpadding="0" cellspacing="0" width="750">
<tbody>
<tr>
<td class="tableau1">Crawlers</td>
<td class="tableau2">User agent</td>
</tr>
<tr>
<td class="tableau3">Xirq</td>
<td class="tableau5">xirq/0.1-beta (xirq; http://www.xirq.com; xirq@xirq.com)</td>
</tr>
<tr>
<td class="tableau30">WebSearchBench</td>
<td class="tableau50">WebSearchBench WebCrawler V1.0 (Beta), Prof. Dr.-Ing. Christoph Lindemann, Universität Dortmund, cl@cs.uni-dortmund.de, http://websearchbench.cs.uni-dortmund.de/</td>
</tr>
<tr>
<td class="tableau3">Yahoo Search Japan robot</td>
<td class="tableau5">Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)</td>
</tr>
<tr>
<td class="tableau30">NimbleCrawler</td>
<td class="tableau50">Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.7) NimbleCrawler 1.11 obeys UserAgent NimbleCrawler For problems contact: crawler_at_dataalchemy.com</td>
</tr>
<tr>
<td class="tableau3">Fastbot</td>
<td class="tableau5">fastbot crawler beta 2.0 (+http://www.fastbot.de)</td>
</tr>
<tr>
<td class="tableau30">Gigabot</td>
<td class="tableau50">Gigabot/2.0/gigablast.com/spider.html</td>
</tr>
<tr>
<td class="tableau3">Jambot</td>
<td class="tableau5">Jambot/0.1.1 (Jambot; http://www.jambot.com/blog; crawler@jambot.com)</td>
</tr>
<tr>
<td class="tableau30">Netluchs</td>
<td class="tableau50">Netluchs/0.8-dev ( ; http://www.netluchs.de/; ___don&#8217;t___spam_me_@netluchs.de)</td>
</tr>
<tr>
<td class="tableau3">NutchEC2Test</td>
<td class="tableau5">NutchEC2Test/Nutch-0.9-dev (Testing Nutch on Amazon EC2.; http://lucene.apache.org/nutch/bot.html; ec2test at lucene.com)</td>
</tr>
<tr>
<td class="tableau30">Bigsearch</td>
<td class="tableau50">Bigsearch.ca/Nutch-0.9-dev (Bigsearch.ca Internet Spider; http://www.bigsearch.ca/; info@enhancededge.com)</td>
</tr>
<tr>
<td class="tableau3">UKWizz</td>
<td class="tableau5">UKWizz/Nutch-0.8.1 (UKWizz Nutch crawler; http://www.ukwizz.com/)</td>
</tr>
<tr>
<td class="tableau30">Ilial/Nutch</td>
<td class="tableau50">ilial/Nutch-0.9 (Ilial, Inc. is a Los Angeles based Internet startup company. For more information please visit http://www.ilial.com/crawler; http://www.ilial.com/crawler; crawl@ilial.com)</td>
</tr>
<tr>
<td class="tableau3">Pmoz</td>
<td class="tableau5">Mozilla/5.0 (compatible; pmoz.info ODP link checker; +http://pmoz.info/doc/botinfo.htm)</td>
</tr>
<tr>
<td class="tableau30">Holmes</td>
<td class="tableau50">holmes/3.11 (OnetSzukaj/5.0; +http://szukaj.onet.pl)</td>
</tr>
<tr>
<td class="tableau3">Flatlandbot</td>
<td class="tableau5">flatlandbot/flatlandbot (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot.php; jason@flatlandindustries.com)</td>
</tr>
<tr>
<td class="tableau30">IDBot</td>
<td class="tableau50">Mozilla/5.0 (compatible; IDBot/1.0; +http://www.id-search.org/bot.html)</td>
</tr>
<tr>
<td class="tableau3">Spam Bot</td>
<td class="tableau5">Mozilla/2.0 (compatible; NEWT ActiveX; Win32)</td>
</tr>
<tr>
<td class="tableau30">Greaterera</td>
<td class="tableau50">Mozilla/5.0 (compatible; heritrix/1.7.0 +http://www.greaterera.com/)</td>
</tr>
<tr>
<td class="tableau3">GEXTEST-00393</td>
<td class="tableau5">gsa-crawler (Enterprise; GEXTEST-00393; gsasymbiosys@gmail.com,xeonbox4@gmail.com)</td>
</tr>
<tr>
<td class="tableau30">Pagebull</td>
<td class="tableau50">Pagebull http://www.pagebull.com/</td>
</tr>
<tr>
<td class="tableau3">RSS One Engine</td>
<td class="tableau5">RSS One Engine/0.72 (+http://www.rss-one.com)</td>
</tr>
<tr>
<td class="tableau30">Dodgebot</td>
<td class="tableau50">dodgebot/experimental</td>
</tr>
<tr>
<td class="tableau3">Bot</td>
<td class="tableau5">bot/1.0 (bot; http://; bot@bot.bot)</td>
</tr>
<tr>
<td class="tableau30">Bigsearch</td>
<td class="tableau50">Bigsearch.ca/Nutch-1.0-dev (Bigsearch.ca Internet Spider; http://www.bigsearch.ca/; info@enhancededge.com)</td>
</tr>
<tr>
<td class="tableau3">FindLinks</td>
<td class="tableau5">findlinks/1.1.4-beta1 ( http://wortschatz.uni-leipzig.de/findlinks/)</td>
</tr>
<tr>
<td class="tableau30">ConveraCrawler</td>
<td class="tableau50">ConveraCrawler/0.9e ( http://www.authoritativeweb.com/crawl)</td>
</tr>
<tr>
<td class="tableau3">Blaiz-Bee</td>
<td class="tableau5">Blaiz-Bee/2.00.5622 ( http://www.blaiz.net)</td>
</tr>
<tr>
<td class="tableau30">KIT_Fireball</td>
<td class="tableau50">KIT_Fireball/2.0</td>
</tr>
<tr>
<td class="tableau3">ICC-Crawler</td>
<td class="tableau5">ICC-Crawler(Mozilla-compatible;http://kc.nict.go.jp/icc/crawl.html;icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp)</td>
</tr>
<tr>
<td class="tableau30">Pubblisito</td>
<td class="tableau50">info@pubblisito.com- (http://www.pubblisito.com) il Sud dei Motori di Ricerca</td>
</tr>
<tr>
<td class="tableau3">SkreemRBot</td>
<td class="tableau5">Mozilla/5.0 (compatible; SkreemRBot +http://skreemr.com)</td>
</tr>
<tr>
<td class="tableau30">WebAlta Crawler</td>
<td class="tableau50">WebAlta Crawler/1.3.33 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)</td>
</tr>
<tr>
<td class="tableau3">Pumpkin</td>
<td class="tableau5">blogsearchbot-pumpkin-3</td>
</tr>
<tr>
<td class="tableau30">Mail.Ru</td>
<td class="tableau50">Mail.Ru/1.0</td>
</tr>
<tr>
<td class="tableau3">Mammoth</td>
<td class="tableau5">Mozilla/5.0 (+http://www.eurekster.com/mammoth) Mammoth/0.1</td>
</tr>
<tr>
<td class="tableau30">Attentio</td>
<td class="tableau50">Attentio/Nutch-0.9-dev (Attentio&#8217;s beta blog crawler; www.attentio.com; info@attentio.com)</td>
</tr>
<tr>
<td class="tableau3">GurujiBot</td>
<td class="tableau5">GurujiBot/1.0 (+http://www.guruji.com/en/WebmasterFAQ.html)</td>
</tr>
<tr>
<td class="tableau30">Gigabot</td>
<td class="tableau50">Gigabot/3.0 (http://www.gigablast.com/spider.html)</td>
</tr>
<tr>
<td class="tableau3">Jobs.de-Robot</td>
<td class="tableau5">Mozilla/5.0 (compatible; jobs.de-Robot http://www.jobs.de; jobsde@jobscout24.de) ( newsexpress e-mail: newsexpress-l@neofonie.de http://www.neofonie.de/loesungen/search/robot.html )</td>
</tr>
<tr>
<td class="tableau30">ArabyBot</td>
<td class="tableau50">ArabyBot (compatible; Mozilla/5.0; GoogleBot; FAST Crawler 6.4; http://www.araby.com;)</td>
</tr>
<tr>
<td class="tableau3">VWBOT</td>
<td class="tableau5">VWBOT/Nutch-0.9-dev (VWBOT Nutch Crawler; http://vwbot.cs.uiuc.edu;+vwbot@cs.uiuc.edu</td>
</tr>
<tr>
<td class="tableau30">IWAgent</td>
<td class="tableau50">IWAgent/ 1.0 - www.brandprotect.com</td>
</tr>
<tr>
<td class="tableau3">Sirketcebot</td>
<td class="tableau5">Sirketcebot/v.01 (http://www.sirketce.com/bot.html)</td>
</tr>
<tr>
<td class="tableau30">Spock Crawler</td>
<td class="tableau50">Spock Crawler (http://www.spock.com/crawler)</td>
</tr>
<tr>
<td class="tableau3">Flatlandbot</td>
<td class="tableau5">great-plains-web-spider/flatlandbot (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot.php; jason@flatlandindustries.com)</td>
</tr>
<tr>
<td class="tableau30">Nebulla</td>
<td class="tableau50">Nebullabot/2.2 (http://bot.nebulla.de)</td>
</tr>
<tr>
<td class="tableau3">EasyDL</td>
<td class="tableau5">EasyDL/3.04 http://keywen.com/Encyclopedia/Bot</td>
</tr>
<tr>
<td class="tableau30">LapozzBot</td>
<td class="tableau50">LapozzBot/1.4 (+http://robot.lapozz.hu)</td>
</tr>
<tr>
<td class="tableau3">WWW.fi crawler</td>
<td class="tableau5">www.fi crawler, contact crawler@www.fi</td>
</tr>
<tr>
<td class="tableau30">Uni-koblenz</td>
<td class="tableau50">http://www.uni-koblenz.de/~flocke/robot-info.txt</td>
</tr>
<tr>
<td class="tableau3">NimbleCrawler</td>
<td class="tableau5">Mozilla/5.0 (Windows;) NimbleCrawler 2.0.1 obeys UserAgent NimbleCrawler For problems contact: crawler@healthline.com</td>
</tr>
<tr>
<td class="tableau30">YodaoBot</td>
<td class="tableau50">Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )</td>
</tr>
<tr>
<td class="tableau3">DAUM RSS Robot</td>
<td class="tableau5">ELI/20070402:2.0 (DAUM RSS Robot, Daum Communications Corp.; +http://ws.daum.net/aboutkr.html)</td>
</tr>
<tr>
<td class="tableau30">DAUM Web Robot</td>
<td class="tableau50">Mozilla/4.0 (compatible; MSIE enviable; DAUMOA/1.0.1; DAUM Web Robot; Daum Communications Corp., Korea; +http://ws.daum.net/aboutkr.html)</td>
</tr>
<tr>
<td class="tableau3">Changedetection</td>
<td class="tableau5">Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; http://www.changedetection.com/bot.html )</td>
</tr>
<tr>
<td class="tableau30">ICC-Crawler</td>
<td class="tableau50">ICC-Crawler(Mozilla-compatible; http://kc.nict.go.jp/icc/crawl.html; icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp)</td>
</tr>
<tr>
<td class="tableau3">Semager</td>
<td class="tableau5">Semager/1.1 (http://www.semager.de/blog/semager-bots/)</td>
</tr>
<tr>
<td class="tableau30">Multicrawler</td>
<td class="tableau50">multicrawler ( http://sw.deri.org/2006/04/multicrawler/robots.html)</td>
</tr>
<tr>
<td class="tableau3">NetinfoBot</td>
<td class="tableau5">NetinfoBot/1.0 (http://netinfo.bg/netinfobot.html)</td>
</tr>
<tr>
<td class="tableau30">Envolkspider</td>
<td class="tableau50">envolk/1.7 (+http://www.envolk.com/envolkspiderinfo.html)</td>
</tr>
<tr>
<td class="tableau3">CazoodleBot</td>
<td class="tableau5">CazoodleBot/CazoodleBot-0.1 (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; cazoodlebot@cazoodle.com)</td>
</tr>
<tr>
<td class="tableau30">RutterBot</td>
<td class="tableau50">RutterBot(+http://www.aktienbetreuer.de/bot.html)</td>
</tr>
<tr>
<td class="tableau3">Worio bot</td>
<td class="tableau5">Mozilla/5.0 (compatible; woriobot heritrix/1.10.0 +http://worio.com)</td>
</tr>
<tr>
<td class="tableau30">Tags2dir</td>
<td class="tableau50">tags2dir.com/0.8 (+http://tags2dir.com/directory/)</td>
</tr>
<tr>
<td class="tableau3">Combine</td>
<td class="tableau5">Combine/3 http://combine.it.lth.se/</td>
</tr>
<tr>
<td class="tableau30">Lawinfo-crawler</td>
<td class="tableau50">lawinfo-crawler/Nutch-0.9-dev (Crawler for lawinfo.com pages; http://www.lawinfo.com; webmaster@lawinfo.com)</td>
</tr>
<tr>
<td class="tableau3">FuseBulb</td>
<td class="tableau5">FuseBulb.Com</td>
</tr>
<tr>
<td class="tableau30">Earthcom</td>
<td class="tableau50">Mozilla/5.0 (compatible; EARTHCOM/2.2; +http://enter4u.eu)</td>
</tr>
<tr>
<td class="tableau3">Askpeter_bot</td>
<td class="tableau5">Mozilla/5.0 (compatible; askpeter_bot/3.2; +http://www.askpeter.info)</td>
</tr>
<tr>
<td class="tableau30">LapozzBot</td>
<td class="tableau50">LapozzBot/1.5 (+http://robot.lapozz.hu)</td>
</tr>
<tr>
<td class="tableau3">FAST-WebCrawler</td>
<td class="tableau5">FAST Enterprise Crawler/6.4.18 (crawler@fast.no)</td>
</tr>
<tr>
<td class="tableau30">BuiltWith</td>
<td class="tableau50">Mozilla/5.0 (compatible; BuiltWith/0.1; +http://builtwith.com/bot.html)</td>
</tr>
<tr>
<td class="tableau3">Hiiglespider</td>
<td class="tableau5">Hiiglespider/0.1, Hiigle.com, http://hiigle.com/spider</td>
</tr>
<tr>
<td class="tableau30">Page-store</td>
<td class="tableau50">Mozilla/5.0 (compatible; heritrix/1.12.1 +http://www.page-store.com)</td>
</tr>
<tr>
<td class="tableau3">Metacarta</td>
<td class="tableau5">Mozilla/5.0 (compatible; heritrix/1.5 +http://www.metacarta.com)</td>
</tr>
<tr>
<td class="tableau30">Multicrawler</td>
<td class="tableau50">multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)</td>
</tr>
<tr>
<td class="tableau3">LibertyW</td>
<td class="tableau5">LibertyW (+http://www.libertyw.eu)</td>
</tr>
<tr>
<td class="tableau30">BlogRefsBot</td>
<td class="tableau50">Mozilla/5.0 (compatible; BlogRefsBot/0.1; http://www.blogrefs.com/about/bloggers)</td>
</tr>
<tr>
<td class="tableau3">Holmes</td>
<td class="tableau5">holmes/3.11 (http://morfeo.centrum.cz/bot)</td>
</tr>
<tr>
<td class="tableau30">DataparkSearch</td>
<td class="tableau50">DataparkSearch/4.47 (+http://dataparksearch.org/bot)</td>
</tr>
<tr>
<td class="tableau3">ImageWalker</td>
<td class="tableau5">ImageWalker/2.0 (www.bdbrandprotect.com)</td>
</tr>
<tr>
<td class="tableau30">SeznamBot</td>
<td class="tableau50">SeznamBot/2.0-test (+http://fulltext.sblog.cz/)</td>
</tr>
<tr>
<td class="tableau3">Entireweb</td>
<td class="tableau5">Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)</td>
</tr>
<tr>
<td class="tableau30">BrightCrawler</td>
<td class="tableau50">BrightCrawler (http://www.brightcloud.com/brightcrawler.asp)</td>
</tr>
<tr>
<td class="tableau3">BabalooSpider</td>
<td class="tableau5">BabalooSpider/1.2 (BabalooSpider; http://www.babaloo.si; spider@babaloo.si)</td>
</tr>
<tr>
<td class="tableau30">WebRankSpider</td>
<td class="tableau50">WebRankSpider/1.37 (+http://ulm191.server4you.de/crawler/)</td>
</tr>
<tr>
<td class="tableau3">Gungho-crawler</td>
<td class="tableau5">Gungho/0.08004 (http://code.google.com/p/gungho-crawler/wiki/Index)</td>
</tr>
<tr>
<td class="tableau30">PWeBot</td>
<td class="tableau50">Mozilla/5.0 (compatible; PWeBot/3.1; http://www.programacionweb.net/robot.php)</td>
</tr>
<tr>
<td class="tableau3">PWeBot</td>
<td class="tableau5">PWeBot/1.2 Inspector (http://www.programacionweb.net/robot.php)</td>
</tr>
<tr>
<td class="tableau30">Exabot</td>
<td class="tableau50">Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)</td>
</tr>
<tr>
<td class="tableau3">Bloglines-Images</td>
<td class="tableau5">Bloglines-Images/0.1 (http://www.bloglines.com)</td>
</tr>
<tr>
<td class="tableau30">Doubanbot</td>
<td class="tableau50">Doubanbot/1.0 (bot@douban.com http://www.douban.com)</td>
</tr>
<tr>
<td class="tableau3">Disco-crawl</td>
<td class="tableau5">disco/Nutch-0.9 (experimental crawler; www.discoveryengine.com; disco-crawl@discoveryengine.com)</td>
</tr>
<tr>
<td class="tableau30">Disco-crawl</td>
<td class="tableau50">disco/Nutch-1.0-dev (experimental crawler; www.discoveryengine.com; disco-crawl@discoveryengine.com)</td>
</tr>
<tr>
<td class="tableau3">BotSeer</td>
<td class="tableau5">Mozilla 4.0(compatible; BotSeer/1.0; +http://botseer.ist.psu.edu)</td>
</tr>
<tr>
<td class="tableau30">ForAll.pl-Crawler</td>
<td class="tableau50">ForAll.pl-Crawler/1.0</td>
</tr>
<tr>
<td class="tableau3">Podtech</td>
<td class="tableau5">Mozilla/5.0 (compatible; MSIE 6.0; Podtech Network; crawler_admin@podtech.net)</td>
</tr>
<tr>
<td class="tableau30">MSRBot</td>
<td class="tableau50">MSRBOT (http://research.microsoft.com/research/sv/msrbot/</td>
</tr>
<tr>
<td class="tableau3">Nsyght</td>
<td class="tableau5">nsyght.com/Nutch-0.9 (nsyght.com; search.nsyght.com)</td>
</tr>
<tr>
<td class="tableau30">Backlink-Check</td>
<td class="tableau50">Backlink-Check.de (+http://www.backlink-check.de/bot.html)</td>
</tr>
<tr>
<td class="tableau3">ASAHA</td>
<td class="tableau5">ASAHA Search Engine Turkey V.001 (http://www.asaha.com/)</td>
</tr>
<tr>
<td class="tableau30">Sphsearch</td>
<td class="tableau50">FAST Enterprise Crawler 6 used by Singapore Press Holdings (crawler@sphsearch.sg)</td>
</tr>
<tr>
<td class="tableau3">Google-Adsense</td>
<td class="tableau5">Mediapartners-Google</td>
</tr>
<tr>
<td class="tableau30">SAIT</td>
<td class="tableau50">sait/Nutch-0.9 (SAIT Research; http://www.samsung.com)</td>
</tr>
<tr>
<td class="tableau3">Teemer</td>
<td class="tableau5">Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; http://www.netseer.com/crawler.html; crawler@netseer.com)</td>
</tr>
<tr>
<td class="tableau30">Euro-spider</td>
<td class="tableau50">Euro-Spider Shopping 1.0</td>
</tr>
<tr>
<td class="tableau3">Lovel</td>
<td class="tableau5">Lovel as 1.0 ( +http://www.everatom.com)</td>
</tr>
<tr>
<td class="tableau30">Hermits Search</td>
<td class="tableau50">Mozilla/5.0 (compatible; Hermit Search. Com; +http://www.hermitsearch.com)</td>
</tr>
<tr>
<td class="tableau3">ScoutAnt</td>
<td class="tableau5">ScoutAnt/0.1; +http://www.ant.com/what_is_ant.com/</td>
</tr>
<tr>
<td class="tableau30">Voyager</td>
<td class="tableau50">voyager-hc/1.0</td>
</tr>
<tr>
<td class="tableau3">De.com</td>
<td class="tableau5">Mozilla/5.0 (compatible; de/1.13.2 +http://www.de.com)</td>
</tr>
<tr>
<td class="tableau30">Yahoo Japan robot</td>
<td class="tableau50">DoCoMo/2.0 SH902i (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)</td>
</tr>
<tr>
<td class="tableau3">LijitSpider</td>
<td class="tableau5">LijitSpider/Nutch-0.9 (Reports crawler; http://www.lijit.com/; info(a)lijit(d)com)</td>
</tr>
<tr>
<td class="tableau30">Acoon-Robot</td>
<td class="tableau50">Acoon-Robot v3.00 (http://www.acoon.de and http://www.acoon.com)</td>
</tr>
<tr>
<td class="tableau3">KAIST AITrc Crawler</td>
<td class="tableau5">KAIST AITrc Crawler</td>
</tr>
<tr>
<td class="tableau30">DAUM Web Robot</td>
<td class="tableau50">Mozilla/4.0 (compatible; MSIE enviable; DAUMOA 2.0; DAUM Web Robot; Daum Communications Corp., Korea; +http://ws.daum.net/aboutkr.html)</td>
</tr>
<tr>
<td class="tableau3">Folkd.com Spider</td>
<td class="tableau5">Folkd.com Spider/0.1 beta 1 (www.folkd.com)</td>
</tr>
<tr>
<td class="tableau30">Yahoo-MMAudVid</td>
<td class="tableau50">Yahoo-MMAudVid/2.0(mms dash mm aud vid crawler dash support at yahoo dash inc.com ;Mozilla 4.0 compatible; MSIE 7.0;Windows NT 5.0; .NET CLR 2.0)</td>
</tr>
<tr>
<td class="tableau3">Hbtronix.spider</td>
<td class="tableau5">hbtronix.spider.2 &#8212; http://hbtronix.de/spider.php</td>
</tr>
<tr>
<td class="tableau30">Slurp Inktomi (Yahoo)</td>
<td class="tableau50">Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071214 BonEcho/2.0.0.4</td>
</tr>
<tr>
<td class="tableau3">Slurp Inktomi (Yahoo)</td>
<td class="tableau5">Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5</td>
</tr>
<tr>
<td class="tableau30">Gonzo2</td>
<td class="tableau50">gonzo2[P] +http://www.suchen.de/faq.html</td>
</tr>
<tr>
<td class="tableau3">SummizeBot</td>
<td class="tableau5">Mozilla/5.0 (compatible; SummizeBot +http://www.summize.com)</td>
</tr>
<tr>
<td class="tableau30">MSNBOT_Mobile</td>
<td class="tableau50">MSNBOT_Mobile MSMOBOT Mozilla/2.0 (compatible; MSIE 4.02; Windows CE; Default)</td>
</tr>
<tr>
<td class="tableau3">Sphere Scout</td>
<td class="tableau5">Sphere Scout&amp;v4.0 - scout at sphere dot com</td>
</tr>
<tr>
<td class="tableau30">Jambot</td>
<td class="tableau50">Jambot/0.2.1 (Jambot; http://www.jambot.com/blog/static.php?page=webmaster-robot; crawler@jambot.com)</td>
</tr>
<tr>
<td class="tableau3">R6_CommentReader</td>
<td class="tableau5">R6_CommentReader_(www.radian6.com/crawler)</td>
</tr>
<tr>
<td class="tableau30">R6_FeedFetcher</td>
<td class="tableau50">R6_FeedFetcher_(www.radian6.com/crawler)</td>
</tr>
<tr>
<td class="tableau3">MSN Bot</td>
<td class="tableau5">msnbot/1.1 (+http://search.msn.com/msnbot.htm)</td>
</tr>
<tr>
<td class="tableau30">Slurp Inktomi (Yahoo)</td>
<td class="tableau50">Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)</td>
</tr>
</tbody>
</table>]]></content:encoded>
			<wfw:commentRss>http://blog.kaizeku.com/search_engine/webcrawler/kaizeku-crawler-lists/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
