
100+ Alternative Search Engines You Should Know

Google is the most powerful search engine available, but when you are searching for something specific, it may churn out overly general results. For instance, a search for a song on Google may return the singer, the lyrics and even fan sites, depending on the keywords you entered.

This is where niche-specific search engines come into the picture. These search engines let you search for exactly the kind of thing you are looking for, and because they are more focused, their results tend to be more accurate.

E-books & PDF files, audio & music, videos, etc. are probably the most commonly searched items every day, and with this list you’ll be able to locate them easily from now on. Enjoy!

E-Book & PDF Search Engine

PDF Search.org

A book search engine that searches sites, forums and message boards for PDF, Word and PowerPoint files.


PDF Searcher

Pdf Searcher is a pdf and ebook search engine. You can search and download every ebook or pdf for free!


PDFGeni

PDFGeni is a dedicated pdf search engine for PDF ebooks, sheets, forms and documents.


Data-Sheet

Data-Sheet is a search engine for PDF ebooks, datasheets and manuals, where you can search for any ebook or datasheet PDF file and download it.


PDFDatabase

PDF Database is a new search engine which uses a unique algorithm to search for pdf and doc files from all over the web quickly and efficiently.


PDF Search Engine

PDF Search Engine is a book search engine that searches sites, forums and message boards for PDF files.


More

  • AddAll – AddALL is a free service that searches for the best deal in books anywhere on-line.
  • BookFinder – A one-stop ecommerce search engine that searches over 150 million books for sale: new, used, rare, out-of-print, and textbooks.
  • Boocu – Access thousands of eBooks and other digital products.
  • Brupt – Brupt.com is based on Google Custom Search Engine; it uses the filetype parameter offered by Google to filter results and show matching pages.
  • CCEbook – A free ebook search engine, find rapidshare/mihd/ebooks by search.
  • ComicSeeker – Search the Internet’s most popular online comic book stores and auctions.
  • DocJax – DocJax.com is a search engine for documents that allows you to search documents and e-books from everywhere, preview them and even download them for free.
  • Docstoc – All documents on docstoc can be easily searched, previewed and downloaded for free.
  • Ebook-Search-Engine – Ebook-Search-Engine.com was established to make finding a free ebook on the internet a lot easier by providing visitors with the best quality free ebook sites and pages on the web.
  • Ebookee – Free download ebooks search engine.
  • Flazx – This is a well organized and searchable eBook directory.
  • FeedBooks – A universal e-reading platform compatible with all mobile devices where you can download thousands of free e-books.
  • Gigle.ws – A PDF search engine with 1 150 000+ items indexed.
  • Google Books – Find the perfect book for your purposes and discover new ones that interest you.
  • Issuu – Collect, share and publish in a format designed to make your documents look their very best.
  • LazyLibrary – Find books on any topic without having to worry about high page counts.
  • Myplick – Myplick is a free service that lets you share, embed and discover presentations and slide shows online.
  • Printfu – Search engine for free ebooks.
  • PDFster – PDFster.net is a meta search engine for PDF documents.
  • PDFSe – Users can search for an ebook by entering a book title, author name or topic.
  • Scribd – Scribd is a social publishing site, where tens of millions of people share original writings and documents.
  • Search-PDF-Books – PDF search engine for free PDF books.
  • The-Manuals – Search engine for free manuals online.
  • TooDoc – Through this search engine, you’ll be able to search the web for PDF files, and nothing else.
  • Yudu – Explore the Library to find magazines, newspapers, eBooks and more. Over 1 million publications read each month.

Audio/Music Search Engine

BeeMp3

BeeMP3 is a music search engine for locating MP3 audio files on the Internet. Today it has 800 000 MP3 files in its search database, and approximately 10 000 files are added daily.


MP3Realm

Mp3Realm is an evolving search engine focused specifically on audio in MP3 format. Mp3Realm allows you to search for your favorite songs and clips and filter them by file duration. Your searches can be based on artist, title, genre or album. Mp3Realm also indexes lyrics, so you can sing along to your favorite songs.


WuZam

WuZAM’s powerful engine allows users to search for free music downloads from all over the internet. Unlike P2P programs, you don’t have to wait for users to upload the file to you. Many of the free music downloads found on WuZAM are stored on high speed servers so your free mp3 downloads are quick and flawless.


SKreemR

SkreemR is a search engine for locating audio files on the web. SkreemR currently indexes over 6 million mp3 files from over 100,000 web sites.


FindSounds

FindSounds.com is a free site where you can search the Web for sound effects and musical instrument samples.


More

  • AirMP3 – AirMP3 is a music search engine. It searches for free mp3 downloads on several music sites, and lists all results on a single page.
  • CaptainCrawl – A massive music blog search engine.
  • DesMP3 – Listen to the music you want; it’s all in DesMp3! Search 500 million results.
  • Dilandau – On Dilandau.com you can listen to and download all your MP3 music.
  • Dorble – Dorble is another application that lets you search, listen and download music you like to your computer for free.
  • Gig-Listing – Gig Listing is an independent web search engine for live music events. You can look for music events, bands or venues by typing a few keywords in a text box.
  • HuntMyMusic – HuntMyMusic is primarily a music search engine that provides a convenient way for users to find, preview, and download songs.
  • Liveplasma – Liveplasma is a new way to broaden your cultural horizons according to your taste in music.
  • MP3Raid – Mp3Raid.com is dedicated to provide its visitors the best mp3 search engine on the net.
  • Mix Turtle – Web-based music search tool Mix Turtle creates playlists of songs you find online.
  • MP3-Search Engine – A comprehensive MP3 search engine.
  • Muscorch – Music Score Search Engine, searching for free music score in various formats.
  • MusicBiatch – MusicBiatch is a music search engine designed for locating audio files on various file sharing and uploading sites.
  • Musgle – To see Musgle in action, just type a song title, the artist name, or both in the search bar and hit ‘Enter’; you will be redirected to the Google page with relevant search results.
  • Midomi – Midomi is the ultimate music search tool because it is powered by your voice.
  • MP3Search.Mobi – Your free music search engine.
  • MP3Fountain – Download Hip-Hop mp3. Download Hip-Hop music.
  • MP3Series – The Ultimate Audio Search Engine
  • OOnly – Free MP3 search engine.
  • SharingMusics – Audio, mp3 and music search engine.
  • Songza – Songza lets you listen to any song or band. Search for it.
  • SeekMP3 – SeekMp3 is worth a try. SeekMp3 is an Mp3 search engine, pure and simple. Music is pulled from all over the web—blogs, music sites, personal websites, etc.
  • SeekASong – Mp3s and lyrics search engine.
  • Woonz – Woonz crawls the internet for audio tracks and then provides the information here for you to listen, download, or find out more about audio that you are interested in.

Video Search Engine

blinkx

blinkx is the world’s largest and most advanced video search engine. blinkx pioneered video search on the Internet, developing an engine based on technology.


VideoSurf

VideoSurf has created a better way for users to search, discover and watch online videos. VideoSurf’s computer vision video search engine provides more relevant results and a better experience to let users find and discover the videos they really want to watch.


HelloMovies

HelloMovies helps you find a movie to watch. See what’s playing on Hulu, iTunes, Netflix, and more. Narrow your choices. Watch movies.


MetaTube

MetaTube is an easy way to browse 100 of the most popular video sharing sites around the world. The phenomenal benefit of MetaTube: You need to enter your search term just once for all 100 sites!


ClipBlast

ClipBlast! technology helps Viewers search, navigate, watch and personalize their experience with the Video Web, and helps Video Content Providers, Advertisers and Marketers monetize their investments.


More

  • Cleepr – Cleepr, the music video search engine.
  • ChizMax – A music video search engine.
  • Google Videos – Search and watch millions of videos. Includes forum and personalized recommendations.
  • IsInvideo – IsInVideo.com is a search engine that provides real-time results as links to videos, developed by Javier Guillen from Malaga.
  • myMovo – The myMovo server provides the addresses of the media matching your search.
  • Nanocrowd – Nanocrowd has magical search algorithms that interpret comments people like you have written about movies.
  • Pixsy – The Pixsy Index is a massive library of video and image content that can be integrated into approved premium websites and applications.
  • Podscope – Podscope lets you search the spoken word for audio and video that interests you.
  • SearchForVideo – Searchforvideo.com is a leading video search engine that indexes online video clips from over 10,000 sources.
  • ScoopVid – Track down the most popular videos by the hour, day or even month. You can also search the web for video results.
  • Truveo – Truveo video search lets you search and find videos from across the Web.
  • VidSea – VidSea, short for Video Search, is a vertical search engine specifically for searching video clips on the Internet.
  • Vdoogle – This search engine will search for videos from popular video sharing websites like YouTube and Google Videos.

RapidShare Search Engine

Rapidshare-Search-Engine

Rapidshare-search-engine.com – Find any file on rapidshare.


RapidShare1

Rapidshare1.com is a Rapidshare search engine that searches sites, forums and message boards for Rapidshare file links.


FilesPump

FilesPump.com is a search engine designed to search for files on various file sharing and uploading sites. FilesPump is a “multi-hoster” search engine and currently lists 16 different popular file hosters. As a file sharing search engine, FilesPump.com finds files matching your search criteria among the files its search spider has recently seen on uploading sites.


FileCrop

Search RapidShare and Megaupload files easily and fast. Over 3 million files stored in the database.


RSFind

RSFind.com, a powerful free rapidshare search engine, and one of the best search engines for rapid share.


  • Front Address – Search software, ebooks, games, photos, documents, audio, videos etc. located on RapidShare.
  • FilesTube – FilesTube.com is a search engine designed to search files in various file sharing and uploading sites, like rapidshare, megaupload, mediafire.
  • FileOnFire – This site offers a RapidShare and Megaupload search engine with more than 19,000,000 files.
  • MegaDownload – MegaDownload.net – megaupload.com and rapidshare.com search engine.
  • MegaRapidSearch – MegaRapidSearch helps you to find anything on RapidShare.
  • Rapidsharesearch – Search Rapidshare with their fast online search engine. All files are automatically checked to ensure what you’re looking for is there.
  • Rapid4Shared – Rapid4shared.com search engine that provides results from a hand-selected list of sites.
  • RSearch – Rapidshare search engine with simple interface.
  • RapidShareData – RapidshareData.com is a file search project aimed at rapidshare.com.
  • RapidShareSearcher – A massive RapidShare search engine! With thousands of links, there’s nothing you won’t be able to find!
  • RapidAll – RapidAll Rapidshare Download Search is a website that searches for files on hosting sites that have been uploaded by Rapidshare members for sharing with their friends, family or on forum sites.
  • RapidSearch.In – Free Search Engine for Rapidshare, Badongo, Sendspace, MediaFire and 4Shared Files.
  • RapidZearch – RapidZearch.com is a search engine to search on sites, forums, message boards for shared file links.
  • Rapidshare-Provider – Rapidshare search engine.
  • RarDir – A powerful RapidShare search engine.
  • Rapid TvPHP – A Rapidshare library with 1069578 files in their database.
  • ShareMiner – Using ShareMiner.com you can search Rapidshare, Megaupload and many other uploading sites and file hosting services.
  • ShareMiner – Using ShareMiner.com you can search Rapidshare with an easy to use interface.
  • TotMe – World’s Best Rapidshare Search Engine! 1,256,339 Rapidshare Link Available To Search!

Install 64bit versions of Apache, PHP and MySQL on Windows 64bit

Currently no official 64bit versions of Apache and PHP exist for Microsoft Windows; only MySQL officially supports 64bit Windows. If you have a 64bit version of Windows (2003/XP/Vista) and want to keep your system pure 64bit, here is the solution! In this guide I will show you how to install and set up the Apache 2.2 x64 web server, PHP 5.2 x64 and MySQL 5.0/5.1 x64 on Windows 2003/XP/Vista 64bit using unofficial binaries. Although this setup has been tested successfully on Windows Vista 64bit Home Premium, I am not responsible for any damage that may occur to your computer by following this guide. Proceed at your own risk.

Download needed software

Download unofficial binaries for Apache x64 from blackdot.be:
http://www.blackdot.be/?inc=apache/binaries
Current version (November 2008): httpd-2.2.10-win64.zip

Download PHP x64 from fusionxlan.com:
http://www.fusionxlan.com/PHPx64.php
Current version (September 2008): 5.2.5

Download latest official MySQL 64bit binaries for Windows:
http://dev.mysql.com/downloads/mysql/

Install Apache 64bit

Create a folder on your C drive and name it something like apache64. Unzip the contents of the Apache zip package you previously downloaded to the folder C:/apache64.
Edit Apache configuration file C:/apache64/conf/httpd.conf and change paths to match your system.

ServerRoot "C:/apache64"
ServerName localhost:80
DocumentRoot "C:/apache64/htdocs"
<Directory "C:/apache64/htdocs">
DirectoryIndex index.html index.htm index.php
ScriptAlias /cgi-bin/ "C:/apache64/cgi-bin/"

If you want to set up virtual hosts, uncomment the line below (remove the “#” symbol) and edit conf/extra/httpd-vhosts.conf accordingly. See: Setting up virtual hosts on Windows.
#Include conf/extra/httpd-vhosts.conf

Uncomment the following line to load the mod_rewrite module needed by Elxis SEO PRO. Also uncomment the lines for any other Apache modules you wish to load.
LoadModule rewrite_module modules/mod_rewrite.so

Open the Windows command prompt (Start -> Run/Search -> cmd) and navigate to folder C:/apache64 (CD C:\apache64). Execute the following commands:

bin\httpd.exe -k install
bin\httpd.exe -k start

Your Apache should now work. Open the bin folder and double-click ApacheMonitor.exe. An icon will appear in your Windows taskbar, from which you can start, stop and restart Apache easily. We set DocumentRoot to C:/apache64/htdocs, so this is the folder where you should put your web files (Elxis CMS, for example). Open your browser and go to http://localhost/ to make sure Apache is running.

Install PHP 64bit

Create a folder on your C drive and name it “php”. Unzip the PHP zip package you previously downloaded and copy the contents of the “php-5.2.5 (x64)” folder (or whatever version you downloaded) to C:/php. We will install PHP as an Apache module. Open your Apache configuration file (C:/apache64/conf/httpd.conf) to tell Apache to load the PHP module. Under the existing LoadModule directives add the following:
LoadModule php5_module "C:/php/php5apache2_2.dll"
AddType application/x-httpd-php .php

Also add these lines to tell Apache where PHP is located:
# configure the path to php.ini
PHPIniDir "C:/php"

Copy the following files to your Windows system folder (C:/Windows/system32):
C:/php/php5ts.dll
C:/php/php5isapi.dll
C:/php/php5apache2_2.dll
C:/php/ext/php_mysql.dll

In C:/php (the folder PHPIniDir points to), make a copy of php.ini-dist and rename it to php.ini. Open this file to edit the PHP configuration parameters.
extension_dir = "C:/php/ext/"
allow_url_fopen = Off

Load at least the following PHP extensions by removing the semicolon (;) in front of each line:
extension=php_gd2.dll
extension=php_mysql.dll
extension=php_oci8.dll (if you have an Oracle database installed)
extension=php_pgsql.dll (if you have a PostgreSQL database installed)
extension=php_zip.dll
extension=php_zip.dll

Set sendmail from e-mail address:
sendmail_from = me@example.com

Some settings for MySQL:
mysql.default_port = 3306
mysql.default_host = localhost

Set the session save path to a writable (by anyone) folder in your computer. You can set this to any existing path you wish (For example C:/tmp).
session.save_path = "C:/tmp"

Restart Apache to test if your PHP is working properly.
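If Apache refuses to start after adding the PHP module, one possible cause is an architecture mismatch: a 64-bit Apache process cannot load a 32-bit DLL such as a 32-bit php5apache2_2.dll. The guide does not describe a way to check this, so here is a small Python sketch (our addition; it assumes Python is installed, and the script and function names are hypothetical) that reads the Machine field from a DLL's PE header, where 0x8664 marks an x64 build and 0x014C a 32-bit x86 build.

```python
import struct
import sys

# Machine field values from the PE/COFF file header.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE image (EXE or DLL) given its bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # The 4-byte little-endian offset of the PE signature is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: PE signature not found")
    # The Machine field is the first 2 bytes after the "PE\0\0" signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def describe(path: str) -> str:
    """Human-readable architecture of the PE file at path."""
    with open(path, "rb") as f:
        machine = pe_machine(f.read())
    return MACHINE_NAMES.get(machine, f"other (0x{machine:04X})")


if __name__ == "__main__":
    # Example: python checkpe.py C:/php/php5apache2_2.dll C:/php/php5ts.dll
    for path in sys.argv[1:]:
        print(f"{path}: {describe(path)}")
```

Run it against php5ts.dll, php5apache2_2.dll and the extension DLLs; any file reported as x86 (32-bit) came from a 32-bit package and should be replaced with its x64 counterpart.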

Install MySQL 64bit

This is the easiest part of the overall procedure as we have downloaded an official 64bit msi package from MySQL. Just double click it to run the installer. Install MySQL as a service.

PHPMyAdmin

Nikos Timiopoulos reported to us on February 14, 2009, that he had problems connecting to phpMyAdmin. The solution for him was to copy libmysql.dll to the C:/Windows/ directory. An alternative, and much more recommended, solution is to use the MySQL GUI Tools (Query Browser and Administrator).

Finish

Unless I have forgotten something :-) your system is ready. You have a pure 64bit WAMP system, congratulations! You can now copy Elxis to C:/apache64/htdocs and run the Elxis installation wizard. If you wish to set up virtual hosts, follow this guide: Setting up virtual hosts on Windows.

Written by Ioannis Sannos (datahell),
September 12, 2008
Last updated: February 14, 2009

Play piano online, try to use your keyboard to play music! It’s really very relaxing~~~

Ode To Joy 

JJKLLKJI   HHIJJIIJJKLLKJI 

HHIJIHH    IIJHIJKJH  IJKJIHIE

JJKLLKJ    IHHIJIHH

 

A Wedding March

HKKK HLJK HKNNMLKJKL HKKK HLJK

HKMOMKILMKNMLII JKLL NMLII JKLL

HKKK HLJK HKMOMKILMKILMKK

 

Persian Market

OOOMLMLJJ MMMLJLJII

OOOMOMOJJ IMJJLMMM

 

To Alice

QPQPQNPO MHJMN JLNO JQPQPQNPOMHJM NJONM NOPQLRQP KQPO

JPON NJQQQ QPQPQNPOM HJMNJLNOJ QPQPQNPO MHJMN JONM ORQQPPRTSR

QPONMMLMNOPPQRMOPNOPQSPNOPQSPNQQQQQP

 

http://www.pdf-search.org/Play_piano_online.html

 

The disadvantage of this game is that notes cannot be sustained for long. Does the game remind you of your real piano? I often print e-books of music scores and practice at home; it’s really fantastic~~

Search bots behavior analyzed

“A large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than a hundred thousand different URLs.”

On Bots

Introduction

In the previous edition – Binary Search Tree 2 – a large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than a hundred thousand different URLs. The home page of drunkmenworkhere.org grew from 1.6 kB to over 4 MB due to the visit log and the comment spam displayed there.

This edition presents the results of the experiment.

Setup

2,147,483,647 web pages (’nodes’) were numbered and arranged in a binary search tree. In such a tree, the branch to the left of each node contains only values less than the node’s value, while the right branch contains only values higher than the node’s value. So the leftmost node in this tree has value 1 and the rightmost node has value 2,147,483,647.

The depth of the tree is the number of nodes you have to traverse from the root to the most remote leaf. Since you can arrange 2^(n+1) - 1 numbers in a tree of depth n, the resulting tree has a depth of 30 (2^31 = 2,147,483,648). The value at the root of the tree is 1,073,741,824 (2^30).
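The numbering is only described in words here. As an illustration (our own sketch, not code from the experiment), the Python below recovers a node's level, children and parent from its value alone, assuming exactly the layout stated above: values 1 to 2^31 - 1 with the root at 2^30. A node whose value ends in t zero bits sits at level 30 - t, and its children lie 2^(t-1) below and above it. The function names are hypothetical.

```python
from typing import Optional, Tuple

DEPTH = 30                 # depth of the tree
ROOT = 1 << DEPTH          # 1,073,741,824, the value at the root
MAX_NODE = (1 << 31) - 1   # 2,147,483,647, the rightmost node


def trailing_zeros(v: int) -> int:
    """Number of trailing zero bits of a positive integer."""
    return (v & -v).bit_length() - 1


def level(v: int) -> int:
    """Level of node v: 0 for the root, 30 for the deepest leaves (odd values)."""
    if not 1 <= v <= MAX_NODE:
        raise ValueError(f"{v} is not a node of the tree")
    return DEPTH - trailing_zeros(v)


def children(v: int) -> Optional[Tuple[int, int]]:
    """(left, right) children of v, or None for a leaf."""
    t = trailing_zeros(v)
    if t == 0:
        return None
    half = 1 << (t - 1)
    return v - half, v + half


def parent(v: int) -> Optional[int]:
    """Parent of v, or None for the root."""
    if v == ROOT:
        return None
    t = trailing_zeros(v)
    step = 1 << t
    # v = k * 2**t with k odd. The parent is v - 2**t or v + 2**t, whichever
    # has exactly t + 1 trailing zeros; bit t + 1 of v decides which one.
    return v - step if (v >> (t + 1)) & 1 else v + step
```

For example, parent(1) returns 2, level(2045877824) returns 24 and level(1073872896) returns 13, so nodes mentioned later in the article can be placed in the tree without walking it.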

For each page the traffic of the three major search bots (Yahoo! Slurp, Googlebot and msnbot) was monitored over a period of one year (between 2005-4-13 and 2006-4-13).

To make the content of each page more interesting for the search engines, the value of each node is written out in American English (short scale), and each page request from a search bot is displayed in reverse chronological order. To enrich the zero-content even more, a comment box was added to each page (it was removed on 2006-4-13). These measures were improvements over the initial Binary Search Tree, which used inconveniently long URLs.

Every node shows an image of three trees. Each tree in the image visualises which nodes were crawled by one search engine. Each line in the image represents a node; the number of times a search bot visited the node determines the length of the line. The tree images below are modified, larger versions of the original image, without the very long root node and with disconnected (wild) branches.

Overall results

From the start Yahoo! Slurp was by far the most active search bot. In one year it requested more than one million pages and crawled more than a hundred thousand different nodes. Although this is a large number, it is still only 0.0049% of all nodes. The overall statistics of all bots are shown in the table below.

overall statistics by search engine
                              Yahoo!       Google      MSN
total number of pageviews     1,030,396    20,633      4,699
number of nodes crawled       105,971      7,556       1,390
percentage of tree crawled    0.0049%      0.00035%    0.000065%
number of indexed nodes       120,000      554         1
indexed/crawled ratio         113.23%      7.33%       0.07%

The growth of the number of pageviews and the number of crawled nodes over the year the experiment lasted is shown in figures 1 and 2. The way the bots crawled the tree is visualised in detail with the animations for each bot in the sections below.

Fig. 1 – The cumulative number of pageviews by the search bots in time.

Fig. 2 – The cumulative number of nodes crawled by the search bots in time.

The graph below (fig. 3) shows how many nodes of each level of the tree were crawled by the bots (on a logarithmic scale). The root of the tree is at level 0, while the most remote nodes (e.g. node 1) are at level 30. Since there are 2^n nodes at level n (there is only 1 root and there are 2^30 nodes at level 30), crawling the entire tree would result in a straight line.

Fig. 3 – The number of nodes crawled after 1 year, grouped by node level.

 

Google closely follows this straight line, until it breaks down after level 12. Most nodes at level 12 or less were crawled (5524 out of 8191), but only very few nodes at higher levels were crawled by Googlebot. MSN shows similar behaviour, but breaks down much earlier, at level 9 (656 out of 1023 nodes were crawled). Yahoo, however, does not break down. At high levels it gradually fails to request all nodes.

The nodes at high levels that were crawled by Yahoo were requested quite often compared to the other bots: at levels 14 to 30 each page was requested 10 times on average (see fig. 4).

Fig. 4 – The average number of pageviews per node after 1 year, grouped by node level.

 

Yahoo! Slurp


  • large version (4273×3090, 1.5MB)
  • animated version over 1 year (2005-04-13 – 2006-04-13, 13MB)
  • animated version of the first 2 hours (2006-04-14 00:40:00-02:40:00, 2.2MB)

Fig. 5 – The Yahoo! Slurp tree.

Yahoo! Slurp was the first search engine to discover Binary Search Tree 2. In the first hours after discovery it crawled the tree vigorously, at a speed of over 2.3 nodes per second (see the short animation). The first day it crawled approximately 30,000 nodes.

In the following month Slurp’s activity was low, but after exactly one month it requested all the pages it had visited before for a second time. In the animation you can see the size of the tree double on 2005-05-14. This phenomenon repeated a month later: on 2005-06-13 the tree grew to three times its original size. The number of pageviews was then almost 90,000, while the number of crawled nodes was still 30,000. Figure 6 shows this stepwise increase in the number of pageviews during the first months.

Fig. 6 – The cumulative number of pageviews by Yahoo! Slurp in time.

 

After four months Slurp requested a large number of ‘new’ nodes for the first time since the initial round. It simply requested all the URLs it had. Since it had already indexed 30,000 pages, each of which links to two pages at a deeper level, it requested 60,000 pages at the end of August (the number of pageviews jumps from 100,000 to 160,000 in fig. 6), and it doubled the number of nodes it had crawled (see fig. 7).

After 5 months Yahoo! Slurp started requesting nodes more regularly. It still had periods of ‘discovery’ (e.g. after 10 months).

Fig. 7 – The cumulative number of nodes crawled by Yahoo! Slurp in time.

 

Yahoo reported 120,000 pages in its index. This may seem impossible since it only visited 105,971 nodes, but every node is available on two different domain names: www.drunkmenworkhere.org and drunkmenworkhere.org.

Note: the query submitted to Google and MSN yielded 35,600 pages on Yahoo. Yahoo is the only search engine that returns results with the query used above.

 

Googlebot

 

  • large version (4067×4815, 180kB)
  • animated version (2005-04-13 – 2006-04-13, 1.2MB)

Fig. 8 – The Googlebot tree.

In comparison with Yahoo’s tree, Google’s tree looks more like a natural tree. This is because Google visited nodes at deeper levels less frequently than their parent nodes. Yahoo only visited the nodes at the first three levels more frequently, while Google did so for the first 12 levels (see fig. 4).

The form of the tree follows from Google’s PageRank algorithm. PageRank is defined as follows:

 

“We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) “

Since most nodes in the tree are not linked to by other sites, the PageRank of a node can be calculated with this formula (ignoring links in the comments):

PR(node) = 0.15 + 0.85 (PR(parent) + PR(left child) + PR(right child))/3

The only unknown when applying this formula iteratively, is the PageRank of the root node of the tree. Since this node was the homepage of drunkmenworkhere.org for a year, a high rank may be assumed. The calculated PageRank tree (fig. 9) shows similar proportions as Googlebot’s real tree, so the frequency of visiting a page seems to be related to the PageRank of a page.

Fig. 9 – A binary tree of depth 17 visualising calculated PageRank as length of each line, when the PageRank of the root node is set to 100.
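Applying the simplified formula iteratively is easy to sketch. Because every node on a given level has the same arrangement of parent and children, all nodes on a level end up with the same PageRank, so the Python below (our illustration, not the author's code; the function name is hypothetical) keeps one value per level instead of one per node. It assumes, as in fig. 9, a tree of depth 17 with the root's PageRank held fixed at 100, and it simply drops the child terms for the leaves, which have no children.

```python
from typing import List


def level_pagerank(depth: int, root_rank: float = 100.0, d: float = 0.85,
                   tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """PageRank of a node at each level 0..depth of a complete binary tree.

    Iterates PR(node) = (1 - d) + d * (PR(parent) + PR(left) + PR(right)) / 3
    with the root held fixed at root_rank. By symmetry all nodes on a level
    share one value: a node's parent is on the level above and both of its
    children are on the level below.
    """
    pr = [1.0] * (depth + 1)
    pr[0] = root_rank
    for _ in range(max_iter):
        new = pr[:]
        for lvl in range(1, depth + 1):
            inbound = pr[lvl - 1]            # the parent links to this node
            if lvl < depth:
                inbound += 2 * pr[lvl + 1]   # both children link back to it
            new[lvl] = (1 - d) + d * inbound / 3
        change = max(abs(a - b) for a, b in zip(new, pr))
        pr = new
        if change < tol:
            break
    return pr


if __name__ == "__main__":
    for lvl, rank in enumerate(level_pagerank(17)):
        print(f"level {lvl:2d}: {rank:8.3f}")
```

Each update is a weighted average with total weight d = 0.85 < 1, so the iteration converges. Printing level_pagerank(17) shows the rank falling off steeply over the first few levels below the root.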

 

The animation of the Googlebot tree shows some interesting erratic behaviour, that cannot be explained with PageRank.

The rightmost branch
From the start Googlebot crawled more nodes on the right-hand side of the tree. On 2005-07-04 it tried to visit the rightmost node, i.e. the node with the highest value. After following the right branch from the root for 20 levels, Googlebot stopped. This produced the arc at the right end of the tree.
Searching node 1
On 2005-06-30 Googlebot visited node 1, the leftmost node. It did not crawl the path from the root to this node, so how did it find the page? Did it guess the URL or did it follow some external link?
A few hours later, Googlebot crawled node 2, which node 1 links to as its parent. These two nodes are displayed as a tiny dot in the animation on 2005-06-30, floating above the left branch. Then, a week later, on 2005-07-06 (two days after the attempt to find the rightmost node), between 06:39:39 and 06:39:59, Googlebot found the path to these disconnected nodes by visiting the 24 missing nodes in 20 seconds. It started at the root and found its way up to node 2 without ever selecting a right branch. In the large version of the Googlebot tree this path is clearly visible. The nodes halfway along the path were not requested a second time and are represented by thin, short line segments, hence the steep curve.
Yahoo-like subtree
On 2005-07-23 Google suddenly spent some hours crawling 600 new nodes near node 1073872896. Most of these nodes were never visited again.
This subtree is the reason the number of nodes crawled by Googlebot, grouped by level, increases again from level 18 to level 30 in fig. 3.

Over the last six months Googlebot requested pages at a fixed rate (about 260 pages per month, fig. 10). Like Yahoo! Slurp, it seems to alternate between periods of discovery (see fig. 11) and periods of refreshing its cache.


Fig. 10 – The cumulative number of pageviews by Googlebot in time.

 

Fig. 11 – The cumulative number of nodes crawled by Googlebot in time.

Google returned 554 results when searching for nodes. The first nodes reported by Google are nodes 1 and 2, which are very deep inside the tree, at levels 30 and 29 respectively. Their higher rank is also reflected in the curve shown above (Searching node 1), which indicates a high number of pageviews. They probably appear first because of their short URLs. The other nodes on the first result page are all at level 4, probably because the first three levels are penalised for comment spam.

 

MSNbot


  • large version (4200×2795, 123kB)
  • animated version (2005-04-13 – 2006-04-13, 846kB)

Fig. 12 – The msnbot tree

 

The Msnbot tree is much smaller than Yahoo’s and Google’s. The most interesting feature is the disconnected large branch to the right of the tree. It appears on 2005-04-29, when msnbot visits node 2045877824. This node contains one comment, posted two weeks before:

I hereby claim this name in the name of…well, mine. Paul Pigg.

A week before msnbot requested this node, Googlebot had already visited it. This random node at level 24 was crawled because of a link from Paul Pigg’s website masterpigg.com (now dead). All three search engines visited the node via this link, and all three failed to connect it to the rest of the tree. You can check this by clicking the ‘to trunk’ links starting at node 2045877824.

Msnbot crawled from the disconnected node in upward and downward direction, creating a large subtree. This subtree caused the upward line between level 18 and 30 in figure 3.

The second large disconnected branch, at the top in the middle, originated from a link on uu-dot.com. Both disconnected branches are clearly visible in the Googlebot tree as well.


Fig. 13 – The cumulative number of pageviews by msnbot in time.

Fig. 14 – The cumulative number of nodes crawled by msnbot in time.

 

As the graphs above show, msnbot virtually ceased to crawl Binary Search Tree 2 after five months. How the number of results MSN Search returns relates to these graphs is unclear.

 

Spam bots

In one year 5265 comments were posted to 103 different nodes. 32 of these nodes were never visited by any of the search bots. Most comments (3652) were posted to the root node (the home page). The word frequency of the submitted comments was calculated.

top 50 of most frequently spammed words
rank count word
1 32743 http
2 23264 com
3 12375 url
4 8636 www
5 5541 info
6 4631 viagra
7 4570 online
8 4533 phentermine
9 4512 buy
10 4469 html
11 3531 org
12 3346 blogstudio
13 3194 drunkmenworkhere
14 2801 free
15 2772 cialis
16 2371 to
17 2241 u
18 2169 generic
19 2054 cheap
20 1921 ringtones
21 1914 view
22 1835 a
23 1818 net
24 1756 the
25 1658 buddy4u
26 1633 of
27 1633 lelefa
28 1580 xanax
29 1572 blogspot
30 1570 tramadol
31 1488 mp3sa
32 1390 insurance
33 1379 poker
34 1310 cgi
35 1232 sex
36 1198 teen
37 1193 in
38 1158 content
39 1105 aol
40 1099 mime
41 1095 and
42 1081 home
43 1034 us
44 1022 valium
45 1020 josm
46 1012 order
47 992 is
48 948 de
49 908 ringtone
50 907 i

complete list (360 kB)
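The article does not say how the words were counted. A minimal Python sketch, assuming the comments were lowercased and split on every character that is not a letter or digit (which would explain why URL fragments such as http, com and www top the list), could look like this; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: a "word" is a maximal run of ASCII letters or digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def word_frequencies(comments: Iterable[str]) -> Counter:
    """Count how often each lowercased word occurs across all comments."""
    counts: Counter = Counter()
    for text in comments:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


if __name__ == "__main__":
    sample = ["Buy cheap viagra http://example.com",
              "[url=http://spam.info]viagra[/url]"]
    for rank, (word, count) in enumerate(word_frequencies(sample).most_common(5), 1):
        print(rank, count, word)
```

The top 50 contains words such as a, to and the, so no stop words appear to have been removed; the sketch keeps them as well.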

As the top 50 clearly shows, most spam was related to pharmaceutical products. The pie chart below shows the share of each medicine.

Fig. 15 – The share of various medicines in comment spam.

 

Submitted domain names were filtered from the text. All top-level domain names are shown in figure 16, ordered by frequency.


Fig. 16 – Number of spammed domains by top level domain
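The article only says that domain names were filtered from the text. One way to reproduce counts like those behind fig. 16 is sketched below in Python. It assumes a domain is either the host part of an http(s) URL or the part after the @ of an e-mail address, and takes the last dot-separated label as the top-level domain. The regular expressions and function names are rough, hypothetical approximations, not the ones used in the experiment.

```python
import re
from collections import Counter
from typing import Iterable, List

# Rough patterns (assumptions): the host of an http(s) URL and the host of an e-mail address.
URL_HOST_RE = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)
EMAIL_HOST_RE = re.compile(r"[\w.+-]+@([a-z0-9.-]+\.[a-z]{2,})", re.IGNORECASE)


def extract_hosts(text: str) -> List[str]:
    """Lowercased host names found in URLs and e-mail addresses in text."""
    hosts = URL_HOST_RE.findall(text) + EMAIL_HOST_RE.findall(text)
    # Drop a trailing dot and skip dotless hosts such as "localhost".
    return [h.lower().rstrip(".") for h in hosts if "." in h.strip(".")]


def tld_counts(comments: Iterable[str]) -> Counter:
    """Count top-level domains (last label of each host) across all comments."""
    counts: Counter = Counter()
    for text in comments:
        for host in extract_hosts(text):
            counts[host.rsplit(".", 1)[-1]] += 1
    return counts
```

Counting whole host names instead of their last label (for example with Counter(extract_hosts(text))) gives a per-domain ranking in the style of fig. 17.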

Many e-mail addresses submitted by the spam bots were non-existent addresses @drunkmenworkhere.org, which explains the high rank of this domain in the chart of most frequently spammed domains (fig. 17).


Fig. 17 – Most frequently spammed domains
