<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pdf Search colection &#187; SEO soso..</title>
	<atom:link href="http://blog.pdf-search.org/category/seo-soso/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.pdf-search.org</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 22 Nov 2009 06:01:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Google Adsense Hacks: Tips to Improve your eCPM with Google Adsense</title>
		<link>http://blog.pdf-search.org/seo-soso/google-adsense-hacks-tips-to-improve-your-ecpm-with-google-adsense/704</link>
		<comments>http://blog.pdf-search.org/seo-soso/google-adsense-hacks-tips-to-improve-your-ecpm-with-google-adsense/704#comments</comments>
		<pubDate>Wed, 28 Oct 2009 03:37:04 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[SEO soso..]]></category>

		<guid isPermaLink="false">http://blog.pdf-search.org/?p=704</guid>
		<description><![CDATA[Getting the most out of Google Adsense doesn’t involve anything dirty or deceptive.  It involves implementing Adsense the way Google intended.  These “hacks” will help you get the most money out of your content, and will do so within the TOS of Adsense.
Google Adsense Hack #1:  Write Good Content
This would seem like a no-brainer but [...]


Related posts:<ol><li><a href='http://blog.pdf-search.org/tech-news/google-goes-for-speed-security-in-chrome-os/752' rel='bookmark' title='Permanent Link: Google goes for speed, security in Chrome OS'>Google goes for speed, security in Chrome OS</a></li><li><a href='http://blog.pdf-search.org/tech-news/googles-chrome-os-hits-bittorrent/741' rel='bookmark' title='Permanent Link: Google&#8217;s Chrome OS hits BitTorrent'>Google&#8217;s Chrome OS hits BitTorrent</a></li></ol>]]></description>
			<content:encoded><![CDATA[<p>Getting the most out of Google Adsense doesn’t involve anything dirty or deceptive.  It involves implementing Adsense the way Google intended.  These “hacks” will help you get the most money out of your content, and will do so within the TOS of Adsense.</p>
<p>Google Adsense Hack #1:  Write Good Content</p>
<p>This would seem like a no-brainer, but a lot of people forget that the ads are contextual.  If you don’t write long enough posts, or posts with sufficient keyword density, you will not get ads that make sense for your content.  So good SEO is also good for eCPMs.</p>
<p>Google Adsense Hack #2:  Write Good Clean Content</p>
<p>Sure, you write to an urban hip hop audience and the F-bomb is just part of the gait of your speech, but those words on the negative keyword list are driving your eCPMs down and getting you more PSAs than ads.  There are a whole host of words I’m not going to list which you should avoid.  Most are four letters, but some of them are just words that are negative, so try to avoid implying something is less than stellar unless you can do it in long phrases that the robots can’t understand.<span id="more-704"></span></p>
<p>Google Adsense Hack #3:  Put Ads Where They Will Be Seen</p>
<p>Putting one of your 3 Google Adsense blocks in the footer, where it isn’t likely to get seen, isn’t going to make you any money.  Put your ads in or near your content.  Don’t try to put them next to the scroll bar hoping for mis-clicks; just put them where they can be seen.  And give them a color scheme that complements your site but makes them stand out.  Not stick out like a sore thumb; just make sure they aren’t invisible.</p>
<p>Google Adsense Hack #4:  Help Google Find Context</p>
<p>Most people don’t know that they can tell Google what part of their content is important, and relevant.</p>
<p>Using the tags <code>google_ad_section_start</code> and <code>google_ad_section_end</code>, you can tell Google which part of your page is the content.  This helps a lot if you, like me, have related content on the page.  I wouldn’t want Google to look at my category list to pick what to run ads for: you are reading a post about SEO and SEM, so those posts on camping are probably not what the ads should target.</p>
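<p>As a minimal sketch of how these tags are placed: the two tag names are written as HTML comments wrapped around the part of the page you want emphasized.  The <code>div</code> ids and the surrounding markup here are hypothetical, and how much weight Google gives the marked section is up to Google.</p>
<pre>&lt;div id="sidebar"&gt;
  &lt;!-- Category list and related posts (hypothetical markup, left unmarked) --&gt;
&lt;/div&gt;

&lt;!-- google_ad_section_start --&gt;
&lt;div id="post"&gt;
  &lt;p&gt;The article text that the ads should match goes here.&lt;/p&gt;
&lt;/div&gt;
&lt;!-- google_ad_section_end --&gt;</pre>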
<p>Google Adsense Hack #5:  Black Lists</p>
<p>See an ad which you know will never convert with your audience? Blacklist it.  I even blacklist high-paying ads if I don’t agree with what they are selling.  Like the daily horoscopes sent by SMS to your cell phone for only $1 a day… I’m sorry, but no one should pay $30 a month for a computer-generated horoscope.</p>
<p>Google Adsense Hack #6:  Use Link Units Sparingly</p>
<p>Link Units don’t pay as much per click.  As a result, you have to consider when they are a good investment of space.  On a page which is less content-dense, a link unit may perform really well.  I also find that they do really well on pages which are driven by search traffic that doesn’t really relate to what is on the page.  I have a page about how I’m an Omniscient Deity of Video.  It is the top hit for “Omniscient Deity”.  People who arrive on this page are rarely looking for me, or things related to video.  Poof, a great choice for a link unit.</p>
<p>Link units also do well on pages which are about things that don’t monetize well.  Or pages that are only getting PSA’s. </p>
<p>Google Adsense Hack #7:  Guard the Golden Egg</p>
<p>Say another ad company which doesn’t do CPC ads offers you 75-cent CPMs to add their ad to your page.  I’m getting $29 eCPMs on my pages, meaning each ad block is worth $9.50 or so eCPM.  I could take that 75 cents, and if the ad was placed somewhere no one would see it, I might make an extra 50-cent CPM, because it wouldn’t hurt my CPC ads that much.  But if I put it somewhere that made sense for the advertiser, it likely wouldn’t make sense for me.</p>
<p>I am careful to weigh risk against reward when using other advertisers.  Having a backup so that you don’t get PSAs is a good idea, because a 75-cent CPM beats the 0 cents a PSA pays.  But be careful not to cannibalize the goose in hopes of getting something that will actually lower your overall performance.</p>
<p>Google Adsense Hack #8:  Experiment</p>
<p>It takes a couple of tries to get your ad layout and colors right.  So change them up from time to time to see what works.</p>
<p>Google Adsense Hack #9:  Change Them Up</p>
<p>Just because you did what I said in #8 doesn’t mean you are done.  If you have loyal readership, you need to move the ads every so often so that they aren’t always in the same place, or your readers will learn to ignore the ads.  They get so used to the layout of your page that they stop looking at the places that aren’t the content.</p>
<p>It’s too Cliche to do 10</p>
<p>I’m only doing 9 Hacks/Tips for Google Adsense because everyone does top 10’s.  This should help you out, and while it is not comprehensive it is a really good start and I didn’t see any articles that laid things out this simply, and this completely. </p>


<p>Related posts:<ol><li><a href='http://blog.pdf-search.org/tech-news/google-goes-for-speed-security-in-chrome-os/752' rel='bookmark' title='Permanent Link: Google goes for speed, security in Chrome OS'>Google goes for speed, security in Chrome OS</a></li><li><a href='http://blog.pdf-search.org/tech-news/googles-chrome-os-hits-bittorrent/741' rel='bookmark' title='Permanent Link: Google&#8217;s Chrome OS hits BitTorrent'>Google&#8217;s Chrome OS hits BitTorrent</a></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://blog.pdf-search.org/seo-soso/google-adsense-hacks-tips-to-improve-your-ecpm-with-google-adsense/704/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Best Practices for Speeding Up Your Web Site</title>
		<link>http://blog.pdf-search.org/seo-soso/best-practices-for-speeding-up-your-web-site/513</link>
		<comments>http://blog.pdf-search.org/seo-soso/best-practices-for-speeding-up-your-web-site/513#comments</comments>
		<pubDate>Thu, 10 Sep 2009 06:23:27 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[SEO soso..]]></category>

		<guid isPermaLink="false">http://blog.pdf-search.org/?p=513</guid>
		<description><![CDATA[Minimize HTTP Requests
tag: content
80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the [...]


Related posts:<ol><li><a href='http://blog.pdf-search.org/tech-news/opera-in-top-secret-iphone-talks/727' rel='bookmark' title='Permanent Link: Opera in top secret iPhone talks?'>Opera in top secret iPhone talks?</a></li></ol>]]></description>
			<content:encoded><![CDATA[<h3 id="num_http" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Minimize HTTP Requests</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One way to reduce the number of components in the page is to simplify the page&#8217;s design. But is there a way to build pages with richer content while also achieving fast response times? Here are some techniques for reducing the number of HTTP requests, while still supporting rich page designs.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><strong style="font-style: normal; font-weight: bold;">Combined files</strong> are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all CSS into a single stylesheet. Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.</p>
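<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">As a rough illustration of combined files, a page that loads two stylesheets and three scripts separately can be reduced to one request of each type. All file names below are hypothetical, and the concatenation itself is assumed to happen in your build or release process, keeping the scripts in their original order.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;!-- Before: five HTTP requests (hypothetical file names) --&gt;
&lt;link rel="stylesheet" type="text/css" href="reset.css"&gt;
&lt;link rel="stylesheet" type="text/css" href="layout.css"&gt;
&lt;script type="text/javascript" src="library.js"&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="menu.js"&gt;&lt;/script&gt;
&lt;script type="text/javascript" src="tracking.js"&gt;&lt;/script&gt;

&lt;!-- After: two HTTP requests --&gt;
&lt;link rel="stylesheet" type="text/css" href="combined.css"&gt;
&lt;script type="text/javascript" src="combined.js"&gt;&lt;/script&gt;</pre>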
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://alistapart.com/articles/sprites"><strong style="font-style: normal; font-weight: bold;">CSS Sprites</strong></a> are the preferred method for reducing the number of image requests. Combine your background images into a single image and use the CSS <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">background-image</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">background-position</code> properties to display the desired image segment.</p>
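<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A minimal sketch of the sprite technique, assuming a hypothetical <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">icons.png</code> that stacks two 16x16 icons vertically. The class names are illustrative only; each class shifts the shared background so that a different segment shows through.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;style type="text/css"&gt;
  /* One image request serves every icon (icons.png is hypothetical) */
  .icon      { display: block; width: 16px; height: 16px;
               background-image: url(icons.png);
               background-repeat: no-repeat; }
  /* First icon sits at the top of the sprite */
  .icon-home { background-position: 0 0; }
  /* Second icon starts 16px down, so shift the image up by 16px */
  .icon-mail { background-position: 0 -16px; }
&lt;/style&gt;

&lt;span class="icon icon-home"&gt;&lt;/span&gt;
&lt;span class="icon icon-mail"&gt;&lt;/span&gt;</pre>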
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://www.w3.org/TR/html401/struct/objects.html#h-13.6"><strong style="font-style: normal; font-weight: bold;">Image maps</strong></a> combine multiple images into a single image. The overall size is about the same, but reducing the number of HTTP requests speeds up the page. Image maps only work if the images are contiguous in the page, such as a navigation bar. Defining the coordinates of image maps can be tedious and error prone. Using image maps for navigation is also not accessible, so it&#8217;s not recommended.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><strong style="font-style: normal; font-weight: bold;">Inline images</strong> use the <a style="color: #006ca2; text-decoration: none;" href="http://tools.ietf.org/html/rfc2397"><code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">data:</code> URL scheme</a> to embed the image data in the actual page. This can increase the size of your HTML document. Combining inline images into your (cached) stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages. Inline images are not yet supported across all major browsers.</p>
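<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For illustration, an inline image might look like the sketch below. The base64 payload is a widely used 1x1 transparent GIF chosen only to keep the example short; the class name is hypothetical, and the second form assumes the rule lives in a cached external stylesheet.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;!-- Inline in the page: no extra request, but the HTML grows --&gt;
&lt;img alt="" width="1" height="1"
     src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"&gt;

&lt;!-- In a stylesheet rule: the image data is cached along with the CSS --&gt;
&lt;style type="text/css"&gt;
  .spacer { background-image: url(data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7); }
&lt;/style&gt;</pre>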
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Reducing the number of HTTP requests in your page is the place to start. This is the most important guideline for improving performance for first time visitors. As described in Tenni Theurer&#8217;s blog post <a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/01/04/performance-research-part-2/">Browser Cache Usage &#8211; Exposed!</a>, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these first time visitors is key to a better user experience.</p>
<h3 id="cdn" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Use a Content Delivery Network</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The user&#8217;s proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user&#8217;s perspective. But where should you start?</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">As a first step to implementing geographically dispersed content, don&#8217;t attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the <em style="font-style: italic; font-weight: normal;">Performance Golden Rule</em>. Rather than starting with the difficult task of redesigning your application architecture, it&#8217;s better to first disperse your static content. This not only achieves a bigger reduction in response times, but it&#8217;s easier thanks to content delivery networks.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Some large Internet companies own their own CDN, but it&#8217;s cost-effective to use a CDN service provider, such as <a style="color: #006ca2; text-decoration: none;" href="http://www.akamai.com/">Akamai Technologies</a>, <a style="color: #006ca2; text-decoration: none;" href="http://www.mirror-image.com/">Mirror Image Internet</a>, or <a style="color: #006ca2; text-decoration: none;" href="http://www.limelightnetworks.com/">Limelight Networks</a>. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times. At Yahoo!, properties that moved static content off their application web servers to a CDN improved end-user response times by 20% or more. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.</p>
<h3 id="expires" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Add an Expires or a Cache-Control Header</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">There are two things in this rule:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">For static components: implement &#8220;Never expire&#8221; policy by setting far future <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Expires</code> header</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">For dynamic components: use an appropriate <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Cache-Control</code> header to help the browser with conditional requests</li>
</ul>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on <em style="font-style: italic; font-weight: normal;">all</em> components including scripts, stylesheets, and Flash components.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster. A web server uses the Expires header in the HTTP response to tell the client how long a component can be cached. The following is a far future Expires header, telling the browser that this response won&#8217;t be stale until April 15, 2010.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      Expires: Thu, 15 Apr 2010 20:00:00 GMT</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">If your server is Apache, use the ExpiresDefault directive to set an expiration date relative to the current date. This example of the ExpiresDefault directive sets the Expires date 10 years out from the time of the request.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      ExpiresDefault "access plus 10 years"</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Keep in mind, if you use a far future Expires header you have to change the component&#8217;s filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component&#8217;s filename, for example, yahoo_2.0.6.js.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser&#8217;s cache is empty. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. (A &#8220;primed cache&#8221; already contains all of the components in the page.) We <a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/01/04/performance-research-part-2/">measured this at Yahoo!</a> and found the number of page views with a primed cache is 75-85%. By using a far future Expires header, you increase the number of components that are cached by the browser and re-used on subsequent page views without sending a single byte over the user&#8217;s Internet connection.</p>
<h3 id="gzip" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Gzip Components</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It&#8217;s true that the end-user&#8217;s bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Starting with HTTP/1.1, web clients indicate support for compression with the Accept-Encoding header in the HTTP request.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      Accept-Encoding: gzip, deflate</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      Content-Encoding: gzip</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Gzip is the most popular and effective compression method at this time. It was developed by the GNU project and standardized by <a style="color: #006ca2; text-decoration: none;" href="http://www.ietf.org/rfc/rfc1952.txt">RFC 1952</a>. The only other compression format you&#8217;re likely to see is deflate, but it&#8217;s less effective and less popular.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Gzipping generally reduces the response size by about 70%. Approximately 90% of today&#8217;s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses <a style="color: #006ca2; text-decoration: none;" href="http://sourceforge.net/projects/mod-gzip/">mod_gzip</a> while Apache 2.x uses <a style="color: #006ca2; text-decoration: none;" href="http://httpd.apache.org/docs/2.0/mod/mod_deflate.html">mod_deflate</a>.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content. Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Servers choose what to gzip based on file type, but are typically too limited in what they decide to compress. Most web sites gzip their HTML documents. It&#8217;s also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity. In fact, it&#8217;s worthwhile to compress any text response including XML and JSON. Image and PDF files should not be gzipped because they are already compressed. Trying to gzip them not only wastes CPU but can potentially increase file sizes.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Gzipping as many file types as possible is an easy way to reduce page weight and accelerate the user experience.</p>
<h3 id="css_top" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Put Stylesheets at the Top</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages <em style="font-style: italic; font-weight: normal;">appear</em> to be loading faster. This is because putting stylesheets in the HEAD allows the page to render progressively.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Front-end engineers who care about performance want a page to load progressively; that is, we want the browser to display whatever content it has as soon as possible. This is especially important for pages with a lot of content and for users on slower Internet connections. The importance of giving users visual feedback, such as progress indicators, has been well researched and <a style="color: #006ca2; text-decoration: none;" href="http://www.useit.com/papers/responsetime.html">documented</a>. In our case the HTML page is the progress indicator! When the browser loads the page progressively, the header, the navigation bar, the logo at the top, etc. all serve as visual feedback for the user who is waiting for the page. This improves the overall user experience.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers, including Internet Explorer. These browsers block rendering to avoid having to redraw elements of the page if their styles change. The user is stuck viewing a blank white page.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The <a style="color: #006ca2; text-decoration: none;" href="http://www.w3.org/TR/html4/struct/links.html#h-12.3">HTML specification</a> clearly states that stylesheets are to be included in the HEAD of the page: &#8220;Unlike A, [LINK] may only appear in the HEAD section of a document, although it may appear any number of times.&#8221; Neither of the alternatives, the blank white screen or the flash of unstyled content, is worth the risk. The optimal solution is to follow the HTML specification and load your stylesheets in the document HEAD.</p>
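<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A minimal page skeleton following this rule might look like the sketch below. The stylesheet name and page content are hypothetical; the point is only that the LINK element sits inside HEAD, before any body content.</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;html&gt;
&lt;head&gt;
  &lt;title&gt;Example page&lt;/title&gt;
  &lt;!-- Stylesheet is requested before the browser reaches any content --&gt;
  &lt;link rel="stylesheet" type="text/css" href="site.css"&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div id="header"&gt;Logo and navigation can render as soon as they arrive&lt;/div&gt;
  &lt;div id="content"&gt;Page content&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt;</pre>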
<h3 id="js_bottom" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Put Scripts at the Bottom</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The problem caused by scripts is that they block parallel downloads. The <a style="color: #006ca2; text-decoration: none;" href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4">HTTP/1.1 specification</a> suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. While a script is downloading, however, the browser won&#8217;t start any other downloads, even on different hostnames.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In some situations it&#8217;s not easy to move scripts to the bottom. If, for example, the script uses <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">document.write</code> to insert part of the page&#8217;s content, it can&#8217;t be moved lower in the page. There might also be scoping issues. In many cases, there are ways to work around these situations.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">An alternative suggestion that often comes up is to use deferred scripts. The <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DEFER</code> attribute indicates that the script does not contain document.write, and is a clue to browsers that they can continue rendering. Unfortunately, Firefox doesn&#8217;t support the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DEFER</code> attribute. In Internet Explorer, the script may be deferred, but not as much as desired. If a script can be deferred, it can also be moved to the bottom of the page. That will make your web pages load faster.</p>
<h3 id="css_expressions" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Avoid CSS Expressions</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They&#8217;re supported in Internet Explorer, starting with <a style="color: #006ca2; text-decoration: none;" href="http://msdn.microsoft.com/workshop/author/dhtml/overview/recalc.asp">version 5</a>. As an example, the background color could be set to alternate every hour using CSS expressions.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      background-color: expression( (new Date()).getHours()%2 ? "#B8D4FF" : "#F08A00" );</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">As shown here, the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">expression</code> method accepts a JavaScript expression. The CSS property is set to the result of evaluating the JavaScript expression. The <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">expression</code> method is ignored by other browsers, so it is useful for setting properties in Internet Explorer needed to create a consistent experience across browsers.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The problem with expressions is that they are evaluated more frequently than most people expect. Not only are they evaluated when the page is rendered and resized, but also when the page is scrolled and even when the user moves the mouse over the page. Adding a counter to the CSS expression allows us to keep track of when and how often a CSS expression is evaluated. Moving the mouse around the page can easily generate more than 10,000 evaluations.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One way to reduce the number of times your CSS expression is evaluated is to use one-time expressions, where the first time the expression is evaluated it sets the style property to an explicit value, which replaces the CSS expression. If the style property must be set dynamically throughout the life of the page, using event handlers instead of CSS expressions is an alternative approach. If you must use CSS expressions, remember that they may be evaluated thousands of times and could affect the performance of your page.</p>
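<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">As a rough illustration of the script-driven alternative, the sketch below computes the same hourly color in plain JavaScript and assigns it explicitly, so nothing is re-evaluated on mouse movement. The function names and the one-minute refresh interval are assumptions made for this example, not part of the original rule.</p>

```javascript
// Pure helper: makes the same choice as the CSS expression shown earlier.
function hourlyBackground(date) {
  return date.getHours() % 2 ? "#B8D4FF" : "#F08A00";
}

// Hypothetical wiring: assign an explicit value to the style property.
// `element` is any object with a `style` property (for example document.body).
function applyHourlyBackground(element, now) {
  element.style.backgroundColor = hourlyBackground(now);
}

// Assumed browser usage: set the color once, then refresh it on a timer
// instead of letting the browser re-evaluate an expression on every event.
//   applyHourlyBackground(document.body, new Date());
//   setInterval(function () {
//     applyHourlyBackground(document.body, new Date());
//   }, 60 * 1000);
```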
<h3 id="external" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Make JavaScript and CSS External</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript, css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Using external files in the real world generally produces faster pages because the JavaScript and CSS files are cached by the browser. JavaScript and CSS that are inlined in HTML documents get downloaded every time the HTML document is requested. This reduces the number of HTTP requests that are needed, but increases the size of the HTML document. On the other hand, if the JavaScript and CSS are in external files cached by the browser, the size of the HTML document is reduced without increasing the number of HTTP requests.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The key factor, then, is the frequency with which external JavaScript and CSS components are cached relative to the number of HTML documents requested. This factor, although difficult to quantify, can be gauged using various metrics. If users on your site have multiple page views per session and many of your pages re-use the same scripts and stylesheets, there is a greater potential benefit from cached external files.</p>
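<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A back-of-the-envelope model can make this trade-off concrete. The sketch below counts only bytes transferred in one session and assumes a perfectly warm cache after the first page view; it ignores the latency of the extra HTTP requests, and every size passed in is an illustrative assumption.</p>

```javascript
// All sizes are in bytes. This is a simplified model, not a measurement.
function bytesPerSession(pageViews, htmlBytes, assetBytes) {
  return {
    // Inlined: scripts and styles travel with every HTML response.
    inlined: pageViews * (htmlBytes + assetBytes),
    // External: assets are fetched once, then read from the browser cache.
    external: pageViews * htmlBytes + assetBytes
  };
}
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">With a single page view the two totals are equal, so only the extra requests separate the approaches; with several page views the external total grows much more slowly.</p>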
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Many web sites fall in the middle of these metrics. For these sites, the best solution generally is to deploy the JavaScript and CSS as external files. The only exception where inlining is preferable is with home pages, such as <a style="color: #006ca2; text-decoration: none;" href="http://www.yahoo.com/">Yahoo!&#8217;s front page</a> and <a style="color: #006ca2; text-decoration: none;" href="http://my.yahoo.com/">My Yahoo!</a>. Home pages that have few (perhaps only one) page view per session may find that inlining JavaScript and CSS results in faster end-user response times.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For front pages that are typically the first of many page views, there are techniques that leverage the reduction of HTTP requests that inlining provides, as well as the caching benefits achieved through using external files. One such technique is to inline JavaScript and CSS in the front page, but dynamically download the external files after the page has finished loading. Subsequent pages would reference the external files that should already be in the browser&#8217;s cache.</p>
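<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One naive way to sketch the post-load download is shown below. The function name and parameters are hypothetical, and the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">win</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">doc</code> arguments stand in for the browser&#8217;s window and document. Appending a script element also executes the file, so this form assumes the external code is safe to run next to the inlined copy; a production version would more likely use a technique that only warms the cache.</p>

```javascript
// Requests each external file only after the page's load event has fired,
// so the downloads do not compete with the initial rendering.
function downloadAfterLoad(win, doc, urls) {
  win.addEventListener("load", function () {
    for (const url of urls) {
      const script = doc.createElement("script");
      script.src = url;
      doc.body.appendChild(script);
    }
  });
}

// Assumed browser usage (file name is illustrative):
//   downloadAfterLoad(window, document, ["menu_1.0.17.js"]);
```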
<h3 id="dns_lookups" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Reduce DNS Lookups</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people&#8217;s names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server&#8217;s IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can&#8217;t download anything from this hostname until the DNS lookup is completed.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">DNS lookups are cached for better performance. This caching can occur on a special caching server, maintained by the user&#8217;s ISP or local area network, but there is also caching that occurs on the individual user&#8217;s computer. The DNS information remains in the operating system&#8217;s DNS cache (the &#8220;DNS Client service&#8221; on Microsoft Windows). Most browsers have their own caches, separate from the operating system&#8217;s cache. As long as the browser keeps a DNS record in its own cache, it doesn&#8217;t bother the operating system with a request for the record.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DnsCacheTimeout</code> registry setting. Firefox caches DNS lookups for 1 minute, controlled by the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">network.dnsCacheExpiration</code> configuration setting. (Fasterfox changes this to 1 hour.)</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">When the client&#8217;s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page&#8217;s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.</p>
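<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">To estimate the worst-case number of lookups for a page, you could count the distinct hostnames among its component URLs. The helper below is a minimal sketch that assumes every URL is absolute; the function name is made up for this example.</p>

```javascript
// Returns the distinct hostnames found in a list of absolute URLs.
// With empty DNS caches, each one costs the browser one lookup.
function uniqueHostnames(urls) {
  const hosts = new Set();
  for (const url of urls) {
    hosts.add(new URL(url).hostname);
  }
  return Array.from(hosts);
}
```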
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times. My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.</p>
<h3 id="minify" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Minify JavaScript and CSS</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript, css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified, all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are <a style="color: #006ca2; text-decoration: none;" href="http://crockford.com/javascript/jsmin">JSMin</a> and <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/compressor/">YUI Compressor</a>. The YUI compressor can also minify CSS.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Obfuscation is an alternative optimization that can be applied to source code. It&#8217;s more complex than minification and thus more likely to generate bugs as a result of the obfuscation step itself. In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation. Although obfuscation has a higher size reduction, minifying JavaScript is less risky.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In addition to minifying external scripts and styles, inlined <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;script&gt;</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;style&gt;</code> blocks can and should also be minified. Even if you gzip your scripts and styles, minifying them will still reduce the size by 5% or more. As the use and size of JavaScript and CSS increases, so will the savings gained by minifying your code.</p>
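<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">To show what removing comments and white space means in practice, here is a deliberately tiny CSS minifier. It is a toy written for this example and does not reflect how JSMin or YUI Compressor work internally; in particular it would mangle string values that contain comment markers or punctuation.</p>

```javascript
// Toy CSS minifier: strips comments and unneeded white space only.
function minifyCss(css) {
  return css
    .replace(/\/\*[\s\S]*?\*\//g, "")   // drop /* ... */ comments
    .replace(/\s+/g, " ")               // collapse runs of white space
    .replace(/\s*([{};,])\s*/g, "$1")   // no spaces around braces, ; and ,
    .trim();
}
```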
<h3 id="redirects" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Avoid Redirects</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Redirects are accomplished using the 301 and 302 status codes. Here&#8217;s an example of the HTTP headers in a 301 response:</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      HTTP/1.1 301 Moved Permanently
      Location: http://example.com/newuri
      Content-Type: text/html</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The browser automatically takes the user to the URL specified in the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Location</code> field. All the information necessary for a redirect is in the headers. The body of the response is typically empty. Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Expires</code> or <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Cache-Control</code>, indicate it should be. The meta refresh tag and JavaScript are other ways to direct users to a different URL, but if you must do a redirect, the preferred technique is to use the standard 3xx HTTP status codes, primarily to ensure the back button works correctly.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The main thing to remember is that redirects slow down the user experience. Inserting a redirect between the user and the HTML document delays everything in the page since nothing in the page can be rendered and no components can start being downloaded until the HTML document has arrived.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One of the most wasteful redirects happens frequently and web developers are generally not aware of it. It occurs when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to <a style="color: #006ca2; text-decoration: none;" href="http://astrology.yahoo.com/astrology">http://astrology.yahoo.com/astrology</a> results in a 301 response containing a redirect to <a style="color: #006ca2; text-decoration: none;" href="http://astrology.yahoo.com/astrology/">http://astrology.yahoo.com/astrology/</a> (notice the added trailing slash). This is fixed in Apache by using <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Alias</code> or <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">mod_rewrite</code>, or the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DirectorySlash</code> directive if you&#8217;re using Apache handlers.</p>
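<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Besides the server-side fixes, the links a site generates can already carry the trailing slash so the 301 is never triggered. The helper below is a sketch under one loud assumption: a final path segment without a dot is treated as a directory. The function name is hypothetical.</p>

```javascript
// Appends a trailing slash to directory-style URLs before they are
// written into a page, so the server has no reason to redirect.
function withTrailingSlash(href) {
  const url = new URL(href);
  const lastSegment = url.pathname.split("/").pop();
  // Assumption: a last segment containing a dot names a file, not a directory.
  const looksLikeFile = lastSegment.indexOf(".") !== -1;
  if (lastSegment === "" || looksLikeFile) {
    return url.toString();
  }
  url.pathname += "/";
  return url.toString();
}
```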
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Connecting an old web site to a new one is another common use for redirects. Others include connecting different parts of a website and directing the user based on certain conditions (type of browser, type of user account, etc.). Using a redirect to connect two web sites is simple and requires little additional coding. Although using redirects in these situations reduces the complexity for developers, it degrades the user experience. Alternatives for this use of redirects include using <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Alias</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">mod_rewrite</code> if the two code paths are hosted on the same server. If a domain name change is the cause of using redirects, an alternative is to create a CNAME (a DNS record that creates an alias pointing from one domain name to another) in combination with <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Alias</code> or <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">mod_rewrite</code>.</p>
<h3 id="js_dupes" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Remove Duplicate Scripts</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">It hurts performance to include the same JavaScript file twice in one page. This isn&#8217;t as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In addition to generating wasteful HTTP requests, time is wasted evaluating the script multiple times. This redundant JavaScript execution happens in both Firefox and Internet Explorer, regardless of whether the script is cacheable.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One way to avoid accidentally including the same script twice is to implement a script management module in your templating system. The typical way to include a script is to use the SCRIPT tag in your HTML page.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      &lt;script type="text/javascript" src="menu_1.0.17.js"&gt;&lt;/script&gt;</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">An alternative in PHP would be to create a function called <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">insertScript</code>.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      &lt;?php insertScript("menu.js") ?&gt;</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In addition to preventing the same script from being inserted multiple times, this function could handle other issues with scripts, such as dependency checking and adding version numbers to script filenames to support far future Expires headers.</p>
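<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The article only names the PHP function, so the following is a hypothetical JavaScript rendering of the same idea for a server-side template, not the author&#8217;s implementation. The manager remembers which scripts were already requested and maps a logical name to a versioned filename; the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">versions</code> table is an assumed input.</p>

```javascript
// Hypothetical script manager for a templating layer.
function createScriptManager(versions) {
  const seen = new Set();
  return {
    // Returns the src to emit the first time a script is requested, and
    // null on repeats so the template can skip a second SCRIPT tag.
    insertScript: function (name) {
      if (seen.has(name)) {
        return null;
      }
      seen.add(name);
      const version = versions[name];
      // Versioned filenames let the file carry a far future Expires header.
      return version ? name.replace(/\.js$/, "_" + version + ".js") : name;
    }
  };
}
```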
<h3 id="etags" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Configure ETags</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser&#8217;s cache matches the one on the origin server. (An &#8220;entity&#8221; is another word for a &#8220;component&#8221;: images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component&#8217;s ETag using the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">ETag</code> response header.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      HTTP/1.1 200 OK
      Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
      ETag: "10c24bc-4ab-457e1c1f"
      Content-Length: 12195</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Later, if the browser has to validate a component, it uses the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">If-None-Match</code> header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned, reducing the response by 12195 bytes for this example.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      GET /i/yahoo.gif HTTP/1.1
      Host: us.yimg.com
      If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
      If-None-Match: "10c24bc-4ab-457e1c1f"

      HTTP/1.1 304 Not Modified</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"> </p>
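<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The server side of this exchange can be sketched as a small function. This is a simplification for illustration: it compares a single ETag by exact string match and ignores lists of tags, weak validators and the wildcard form. The function name and the shape of the objects are assumptions.</p>

```javascript
// Decides between a full 200 response and an empty 304 response.
// `requestHeaders` uses lower-case header names; `component` holds the
// current ETag and body of the requested file.
function respondToConditionalGet(requestHeaders, component) {
  if (requestHeaders["if-none-match"] === component.etag) {
    // The cached copy is still valid: send headers only, no body.
    return { status: 304, headers: { ETag: component.etag }, body: "" };
  }
  return {
    status: 200,
    headers: {
      ETag: component.etag,
      "Content-Length": String(component.body.length)
    },
    body: component.body
  };
}
```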
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won&#8217;t match when a browser gets the original component from one server and later tries to validate that component on a different server, a situation that is all too common on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The ETag format for Apache 1.3 and 2.x is <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">inode-size-timestamp</code>. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.</p>
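<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">To see why the inode breaks validation across a cluster, the sketch below builds a tag in the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">inode-size-timestamp</code> layout from hexadecimal fields. It only mirrors the layout described here; the exact encoding used by a particular Apache version may differ, and the function and field names are assumptions.</p>

```javascript
// Builds a quoted tag in the inode-size-timestamp layout, hex-encoded.
function apacheStyleEtag(file) {
  const fields = [file.inode, file.size, file.mtimeSeconds];
  const hex = fields.map(function (n) { return n.toString(16); });
  return '"' + hex.join("-") + '"';
}

// The same file on two servers: identical size and timestamp,
// but a different inode, so the two tags never match.
const onServerA = apacheStyleEtag({ inode: 0x10c24bc, size: 0x4ab, mtimeSeconds: 0x457e1c1f });
const onServerB = apacheStyleEtag({ inode: 0x2f00a11, size: 0x4ab, mtimeSeconds: 0x457e1c1f });
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The inode value for the second server is invented for the comparison; the first set of values reproduces the sample ETag shown earlier.</p>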
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Filetimestamp:ChangeNumber</code>. A <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">ChangeNumber</code> is a counter used to track configuration changes to IIS. It&#8217;s unlikely that the <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">ChangeNumber</code> is the same across all IIS servers behind a web site.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The end result is that ETags generated by Apache and IIS for the exact same component won&#8217;t match from one server to another. If the ETags don&#8217;t match, the user doesn&#8217;t receive the small, fast 304 response that ETags were designed for; instead, they&#8217;ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn&#8217;t a problem. But if you have multiple servers hosting your web site, and you&#8217;re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you&#8217;re consuming greater bandwidth, and proxies aren&#8217;t caching your content efficiently. Even if your components have a far future <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Expires</code> header, a conditional GET request is still made whenever the user hits Reload or Refresh.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">If you&#8217;re not taking advantage of the flexible validation model that ETags provide, it&#8217;s better to just remove the ETag altogether. The <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">Last-Modified</code> header validates based on the component&#8217;s timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This <a style="color: #006ca2; text-decoration: none;" href="http://support.microsoft.com/?id=922733">Microsoft Support article</a> describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      FileETag none</pre>
<h3 id="cacheajax" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Make Ajax Cacheable</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One of the cited benefits of Ajax is that it provides instantaneous feedback to the user because it requests information asynchronously from the backend web server. However, using Ajax is no guarantee that the user won&#8217;t be twiddling his thumbs waiting for those asynchronous JavaScript and XML responses to return. In many applications, whether or not the user is kept waiting depends on how Ajax is used. For example, in a web-based email client the user will be kept waiting for the results of an Ajax request to find all the email messages that match their search criteria. It&#8217;s important to remember that &#8220;asynchronous&#8221; does not imply &#8220;instantaneous&#8221;.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">To improve performance, it&#8217;s important to optimize these Ajax responses. The most important way to improve the performance of Ajax is to make the responses cacheable, as discussed in <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#expires">Add an Expires or a Cache-Control Header</a>. Some of the other rules also apply to Ajax:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#gzip">Gzip Components</a></li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#dns_lookups">Reduce DNS Lookups</a></li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#minify">Minify JavaScript</a></li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#redirects">Avoid Redirects</a></li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/performance/rules.html#etags">Configure ETags</a></li>
</ul>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Let&#8217;s look at an example. A Web 2.0 email client might use Ajax to download the user&#8217;s address book for autocompletion. If the user hasn&#8217;t modified her address book since the last time she used the email web app, the previous address book response could be read from cache if that Ajax response was made cacheable with a future Expires or Cache-Control header. The browser must be informed when to use a previously cached address book response versus requesting a new one. This could be done by adding a timestamp to the address book Ajax URL indicating the last time the user modified her address book, for example, <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&amp;t=1190241612</code>. If the address book hasn&#8217;t been modified since the last download, the timestamp will be the same and the address book will be read from the browser&#8217;s cache eliminating an extra HTTP roundtrip. If the user has modified her address book, the timestamp ensures the new URL doesn&#8217;t match the cached response, and the browser will request the updated address book entries.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Even though your Ajax responses are created dynamically, and might only be applicable to a single user, they can still be cached. Doing so will make your Web 2.0 apps faster.</p>
<h3 id="flush" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Flush the Buffer Early</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">When users request a page, it can take anywhere from 200 to 500 ms for the backend server to stitch together the HTML page. During this time, the browser is idle as it waits for the data to arrive. In PHP you have the function <a style="color: #006ca2; text-decoration: none;" href="http://php.net/flush">flush()</a>. It allows you to send your partially ready HTML response to the browser so that the browser can start fetching components while your backend is busy with the rest of the HTML page. The benefit is mainly seen on busy backends or light frontends.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A good place to consider flushing is right after the HEAD because the HTML for the head is usually easier to produce and it allows you to include any CSS and JavaScript files for the browser to start fetching in parallel while the backend is still processing.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Example:</p>
<pre style="font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">      ... &lt;!-- css, js --&gt;
    &lt;/head&gt;
    <span style="font-weight: bold; color: red;">&lt;?php flush(); ?&gt;</span>
    &lt;body&gt;
      ... &lt;!-- content --&gt;</pre>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://search.yahoo.com/">Yahoo! search</a> pioneered research and real user testing to prove the benefits of using this technique.</p>
<h3 id="ajax_get" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Use GET for AJAX Requests</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: server</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The <a style="color: #006ca2; text-decoration: none;" href="http://mail.yahoo.com/">Yahoo! Mail</a> team found that when using <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">XMLHttpRequest</code>, POST is implemented in the browsers as a two-step process: sending the headers first, then sending data. So it&#8217;s best to use GET, which only takes one TCP packet to send (unless you have a lot of cookies). The maximum URL length in IE is 2K, so if you send more than 2K data you might not be able to use GET.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">An interesting side effect is that POST without actually posting any data behaves like GET. Based on the <a style="color: #006ca2; text-decoration: none;" href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html">HTTP specs</a>, GET is meant for retrieving information, so it makes sense (semantically) to use GET when you&#8217;re only requesting data, as opposed to sending data to be stored server-side.</p>
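<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One way to apply this rule is to prefer GET and fall back to POST only when the URL would grow too long. The sketch below is an assumption-laden illustration: the function name <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">planRequest</code> and the 2048-character threshold (standing in for the &#8220;2K&#8221; figure quoted above) are chosen for the example and are not taken from the Yahoo! Mail code.</p>

```javascript
// Assumed limit, standing in for the "2K" IE figure quoted in the text.
const MAX_GET_URL_LENGTH = 2048;

// Decide how to send an Ajax request: GET when the full URL fits within
// the limit, POST with the parameters in the body otherwise.
function planRequest(endpoint, params) {
  const query = new URLSearchParams(params).toString();
  const getUrl = query ? endpoint + "?" + query : endpoint;
  if (getUrl.length > MAX_GET_URL_LENGTH) {
    return { method: "POST", url: endpoint, body: query };
  }
  return { method: "GET", url: getUrl, body: null };
}
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The returned object maps directly onto <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">XMLHttpRequest</code>: pass <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">method</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">url</code> to <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">open()</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">body</code> to <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">send()</code>.</p>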
<h3 id="postload" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Post-load Components</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">You can take a closer look at your page and ask yourself: &#8220;What&#8217;s absolutely required in order to render the page initially?&#8221;. The rest of the content and components can wait.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">JavaScript is an ideal candidate for splitting before and after the onload event. For example if you have JavaScript code and libraries that do drag and drop and animations, those can wait, because dragging elements on the page comes after the initial rendering. Other places to look for candidates for post-loading include hidden content (content that appears after a user action) and images below the fold.</p>
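<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A minimal sketch of post-loading scripts follows. The helper names and the script file names are hypothetical, and the code assumes a browser that supports <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">addEventListener</code>. The document and window objects are passed in as arguments so the helpers can also be exercised outside a browser.</p>

```javascript
// Append a script element to the head so the file is fetched on demand.
function postLoadScript(doc, src) {
  const script = doc.createElement("script");
  script.src = src;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// Register the non-essential scripts to be fetched once onload has fired.
function postLoadAfterOnload(win, sources) {
  win.addEventListener("load", () => {
    for (const src of sources) {
      postLoadScript(win.document, src);
    }
  });
}

// Browser-only wiring; the file names are hypothetical.
if (typeof window !== "undefined") {
  postLoadAfterOnload(window, ["/js/dragdrop.js", "/js/animation.js"]);
}
```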
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Tools to help you out in your effort: <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/imageloader/">YUI Image Loader</a> allows you to delay images below the fold and the <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/get/">YUI Get utility</a> is an easy way to include JS and CSS on the fly. For an example in the wild take a look at <a style="color: #006ca2; text-decoration: none;" href="http://www.yahoo.com/">Yahoo! Home Page</a> with Firebug&#8217;s Net Panel turned on.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">It&#8217;s good when the performance goals are inline with other web development best practices. In this case, the idea of progressive enhancement tells us that JavaScript, when supported, can improve the user experience but you have to make sure the page works even without JavaScript. So after you&#8217;ve made sure the page works fine, you can enhance it with some post-loaded scripts that give you more bells and whistles such as drag and drop and animations.</p>
<h3 id="preload" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Preload Components</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Preload may look like the opposite of post-load, but it actually has a different goal. By preloading components you can take advantage of the time the browser is idle and request components (like images, styles and scripts) you&#8217;ll need in the future. This way when the user visits the next page, you could have most of the components already in the cache and your page will load much faster for the user.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">There are actually several types of preloading:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><em style="font-style: italic; font-weight: normal;">Unconditional</em> preload &#8211; as soon as onload fires, you go ahead and fetch some extra components. Check google.com for an example of how a sprite image is requested onload. This sprite image is not needed on the google.com homepage, but it is needed on the subsequent search results page.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><em style="font-style: italic; font-weight: normal;">Conditional</em> preload &#8211; based on a user action you make an educated guess where the user is headed next and preload accordingly. On <a style="color: #006ca2; text-decoration: none;" href="http://search.yahoo.com/">search.yahoo.com</a> you can see how some extra components are requested after you start typing in the input box.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;"><em style="font-style: italic; font-weight: normal;">Anticipated</em> preload &#8211; preload in advance before launching a redesign. It often happens after a redesign that you hear: &#8220;The new site is cool, but it&#8217;s slower than before&#8221;. Part of the problem could be that the users were visiting your old site with a full cache, but the new one is always an empty cache experience. You can mitigate this side effect by preloading some components before you even launch the redesign. Your old site can use the time the browser is idle to request images and scripts that will be used by the new site.</li>
</ul>
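<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The unconditional case can be sketched as below. This is an assumed, simplified approach (not how google.com or search.yahoo.com actually implement it): the function name <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">preloadImages</code> and the sprite path are hypothetical. The image factory is passed in so the function does not depend on browser globals.</p>

```javascript
// Request each image so it lands in the browser cache for a later page.
// "createImage" is a factory; in a browser it would return "new Image()".
function preloadImages(urls, createImage) {
  return urls.map((url) => {
    const img = createImage();
    img.src = url; // in a browser, assigning src starts the download
    return img;
  });
}

// Browser-only wiring: preload once onload fires. The path is hypothetical.
if (typeof window !== "undefined") {
  window.addEventListener("load", () => {
    preloadImages(["/img/results-sprite.png"], () => new Image());
  });
}
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A conditional preload could call the same function from an input or focus handler instead of from onload.</p>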
<h3 id="min_dom" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Reduce the Number of DOM Elements</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A complex page means more bytes to download and it also means slower DOM access in JavaScript. It makes a difference if you loop through 500 or 5000 DOM elements on the page when you want to add an event handler for example.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A high number of DOM elements can be a symptom that there&#8217;s something that should be improved with the markup of the page without necessarily removing content. Are you using nested tables for layout purposes? Are you throwing in more <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;div&gt;</code>s only to fix layout issues? Maybe there&#8217;s a better and more semantically correct way to do your markup.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/">YUI CSS utilities</a> are a great help with layouts: grids.css can help you with the overall layout, while fonts.css and reset.css can help you strip away the browser&#8217;s default formatting. This is a chance to start fresh and think about your markup; for example, use <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;div&gt;</code>s only when it makes sense semantically, and not because it renders a new line.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The number of DOM elements is easy to test, just type in Firebug&#8217;s console:<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">document.getElementsByTagName('*').length</code></p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">And how many DOM elements are too many? Check other similar pages that have good markup. For example, the <a style="color: #006ca2; text-decoration: none;" href="http://www.yahoo.com/">Yahoo! Home Page</a> is a pretty busy page and still has under 700 elements (HTML tags).</p>
<h3 id="split" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Split Components Across Domains</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Splitting components allows you to maximize parallel downloads. Make sure you&#8217;re using no more than 2-4 domains because of the DNS lookup penalty. For example, you can host your HTML and dynamic content on <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">www.example.org</code> and split static components between <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">static1.example.org</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">static2.example.org</code>.</p>
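<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One possible way to split static components between the two example hosts is to pick the host from a hash of the file path. This is a sketch under assumptions: the helper names and the simple character-sum hash are illustrative choices, not something the text prescribes. Choosing the host deterministically means a given file is always requested from the same hostname, so it is not downloaded and cached twice under two different URLs.</p>

```javascript
// Hostnames taken from the example in the text.
const STATIC_HOSTS = ["static1.example.org", "static2.example.org"];

// Simple deterministic hash: the sum of the character codes in the path.
function pathHash(path) {
  let sum = 0;
  for (const ch of path) {
    sum += ch.codePointAt(0);
  }
  return sum;
}

// Map a static file path to a full URL on one of the static hosts.
function staticUrl(path) {
  const host = STATIC_HOSTS[pathHash(path) % STATIC_HOSTS.length];
  return "http://" + host + path;
}
```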
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For more information check &#8220;<a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/04/11/performance-research-part-4/">Maximizing Parallel Downloads in the Carpool Lane</a>&#8221; by Tenni Theurer and Patty Chi.</p>
<h3 id="iframes" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Minimize the Number of iframes</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Iframes allow an HTML document to be inserted in the parent document. It&#8217;s important to understand how iframes work so they can be used effectively.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;iframe&gt;</code> pros:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Helps with slow third-party content like badges and ads</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Security sandbox</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Download scripts in parallel</li>
</ul>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;iframe&gt;</code> cons:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Costly even if blank</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Blocks page onload</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Non-semantic</li>
</ul>
<h3 id="no404" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">No 404s</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: content</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">HTTP requests are expensive so making an HTTP request and getting a useless response (i.e. 404 Not Found) is totally unnecessary and will slow down the user experience without any benefit.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Some sites have helpful 404s (&#8220;Did you mean X?&#8221;), which is great for the user experience but also wastes server resources (the database, for example). Particularly bad is when the link to an external JavaScript file is wrong and the result is a 404. First, this download will block parallel downloads. Next, the browser may try to parse the 404 response body as if it were JavaScript code, trying to find something usable in it.</p>
<h3 id="cookie_size" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Reduce Cookie Size</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: cookie</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">HTTP cookies are used for a variety of reasons such as authentication and personalization. Information about cookies is exchanged in the HTTP headers between web servers and browsers. It&#8217;s important to keep the size of cookies as low as possible to minimize the impact on the user&#8217;s response time.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For more information check <a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/03/01/performance-research-part-3/">&#8220;When the Cookie Crumbles&#8221;</a> by Tenni Theurer and Patty Chi. The take-home of this research:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Eliminate unnecessary cookies</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Keep cookie sizes as low as possible to minimize the impact on the user response time</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Be mindful of setting cookies at the appropriate domain level so other sub-domains are not affected</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Set an Expires date appropriately. An earlier Expires date or none removes the cookie sooner, improving the user response time</li>
</ul>
<h3 id="cookie_free" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Use Cookie-free Domains for Components</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: cookie</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">When the browser makes a request for a static image and sends cookies together with the request, the server doesn&#8217;t have any use for those cookies. So they only create network traffic for no good reason. You should make sure static components are requested with cookie-free requests. Create a subdomain and host all your static components there.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">If your domain is <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">www.example.org</code>, you can host your static components on <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">static.example.org</code>. However, if you&#8217;ve already set cookies on the parent domain <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">example.org</code> as opposed to <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">www.example.org</code>, then all the requests to <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">static.example.org</code> will include those cookies. In this case, you can buy a whole new domain, host your static components there, and keep this domain cookie-free. Yahoo! uses <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">yimg.com</code>, YouTube uses <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">ytimg.com</code>, Amazon uses <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">images-amazon.com</code> and so on.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Another benefit of hosting static components on a cookie-free domain is that some proxies might refuse to cache the components that are requested with cookies. On a related note, if you wonder if you should use example.org or www.example.org for your home page, consider the cookie impact. Omitting www leaves you no choice but to write cookies to <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">*.example.org</code>, so for performance reasons it&#8217;s best to use the www subdomain and write the cookies to that subdomain.</p>
<h3 id="dom_access" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Minimize DOM Access</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Accessing DOM elements with JavaScript is slow so in order to have a more responsive page, you should:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Cache references to accessed elements</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Update nodes &#8220;offline&#8221; and then add them to the tree</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Avoid fixing layout with JavaScript</li>
</ul>
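<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The first two points can be sketched with a document fragment. The function name <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">appendItems</code> and the idea of filling a list with labels are assumptions made for the example; the DOM methods used are standard. The document is passed in as an argument so the function does not rely on a global.</p>

```javascript
// Add one list item per label while touching the live DOM tree only twice:
// one lookup and one insertion.
function appendItems(doc, listId, labels) {
  const list = doc.getElementById(listId); // cached reference, looked up once
  const fragment = doc.createDocumentFragment();
  for (const label of labels) {
    const item = doc.createElement("li");
    item.appendChild(doc.createTextNode(label));
    fragment.appendChild(item); // built "offline", outside the live tree
  }
  list.appendChild(fragment); // single insertion into the live tree
  return list;
}
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In a browser this would be called as <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">appendItems(document, "contacts", names)</code>, where the element id and the array are whatever your page uses.</p>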
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For more information check the YUI theatre&#8217;s <a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/12/20/video-lecomte/">&#8220;High Performance Ajax Applications&#8221;</a> by Julien Lecomte.</p>
<h3 id="events" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Develop Smart Event Handlers</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: javascript</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Sometimes pages feel less responsive because of too many event handlers attached to different elements of the DOM tree which are then executed too often. That&#8217;s why using <em style="font-style: italic; font-weight: normal;">event delegation</em> is a good approach. If you have 10 buttons inside a <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">div</code>, attach only one event handler to the div wrapper, instead of one handler for each button. Events bubble up so you&#8217;ll be able to catch the event and figure out which button it originated from.</p>
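<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Event delegation for the ten-button case might look like the sketch below. The function names are hypothetical, and the code assumes a browser that supports <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">addEventListener</code> and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">event.target</code>.</p>

```javascript
// Walk up from the clicked node until a button is found or the
// container is reached. Returns null when the click was not on a button.
function findButton(target, container) {
  let node = target;
  while (node) {
    if (node === container) {
      return null;
    }
    if (node.tagName === "BUTTON") {
      return node;
    }
    node = node.parentNode;
  }
  return null;
}

// Attach a single click handler to the wrapper instead of one per button.
function delegateButtonClicks(container, onButtonClick) {
  container.addEventListener("click", (event) => {
    const button = findButton(event.target, container);
    if (button) {
      onButtonClick(button, event);
    }
  });
}
```

<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Walking up through <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">parentNode</code> covers the case where the click lands on an element nested inside a button, such as an icon or a label.</p>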
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">You also don&#8217;t need to wait for the onload event in order to start doing something with the DOM tree. Often all you need is the element you want to access to be available in the tree. You don&#8217;t have to wait for all images to be downloaded. <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DOMContentLoaded</code> is the event you might consider using instead of onload, but until it&#8217;s available in all browsers, you can use the <a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/event/">YUI Event</a> utility, which has an <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://developer.yahoo.com/yui/event/#onavailable">onAvailable</a></code> method.</p>
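<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Where <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">DOMContentLoaded</code> is supported, a small wrapper along these lines runs code as soon as the tree is parsed. This is a sketch, not the YUI implementation: <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">onDomReady</code> is a hypothetical name, and the document object is passed in as a parameter only to keep the function easy to test.</p>

```javascript
// Run `callback` once the DOM tree is parsed, without waiting for images
// and other sub-resources the way the load event does.
function onDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: wait for the parser to finish.
    doc.addEventListener("DOMContentLoaded", () => callback(), { once: true });
  } else {
    // "interactive" or "complete": the tree is already available.
    callback();
  }
}

// Browser usage:
// onDomReady(document, () => console.log("DOM is ready"));
```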
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For more information check the YUI theatre&#8217;s <a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2007/12/20/video-lecomte/">&#8220;High Performance Ajax Applications&#8221;</a> by Julien Lecomte.</p>
<h3 id="csslink" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Choose &lt;link&gt; over @import</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">One of the previous best practices states that CSS should be at the top in order to allow for progressive rendering.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">In IE <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">@import</code> behaves the same as using <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;link&gt;</code> at the bottom of the page, so it&#8217;s best not to use it.</p>
<h3 id="no_filters" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Avoid Filters</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: css</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The IE-proprietary <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">AlphaImageLoader</code> filter aims to fix a problem with semi-transparent true color PNGs in IE versions &lt; 7. The problem with this filter is that it blocks rendering and freezes the browser while the image is being downloaded. It also increases memory consumption and is applied per element, not per image, so the problem is multiplied.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The best approach is to avoid <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">AlphaImageLoader</code> completely and use gracefully degrading PNG8 images instead, which are fine in IE. If you absolutely need <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">AlphaImageLoader</code>, use the underscore hack <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">_filter</code> so as not to penalize your IE7+ users.</p>
<h3 id="opt_images" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Optimize Images</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: images</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">After a designer is done with creating the images for your web page, there are still some things you can try before you FTP those images to your web server.</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">You can check the GIFs and see if they are using a palette size corresponding to the number of colors in the image. Using <a style="color: #006ca2; text-decoration: none;" href="http://www.imagemagick.org/">imagemagick</a> it&#8217;s easy to check using<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">identify -verbose image.gif</code><br />
When you see an image using 4 colors and 256 color &#8220;slots&#8221; in the palette, there is room for improvement.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Try converting GIFs to PNGs and see if there is a saving. More often than not, there is. Developers often hesitate to use PNGs due to the limited support in browsers, but this is now a thing of the past. The only real problem is alpha-transparency in true color PNGs, but then again, GIFs are not true color and don&#8217;t support variable transparency either. So anything a GIF can do, a palette PNG (PNG8) can do too (except for animations). This simple imagemagick command results in totally safe-to-use PNGs:<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">convert image.gif image.png</code><br />
&#8220;All we are saying is: Give PiNG a Chance!&#8221;</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Run <a style="color: #006ca2; text-decoration: none;" href="http://pmt.sourceforge.net/pngcrush/">pngcrush</a> (or any other PNG optimizer tool) on all your PNGs. Example:<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">pngcrush image.png -rem alla -reduce -brute result.png</code></li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Run jpegtran on all your JPEGs. This tool does lossless JPEG operations such as rotation and can also be used to optimize and remove comments and other useless information (such as EXIF information) from your images.<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">jpegtran -copy none -optimize -perfect src.jpg dest.jpg</code></li>
</ul>
<h3 id="opt_sprites" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Optimize CSS Sprites</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: images</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Arranging the images in the sprite horizontally as opposed to vertically usually results in a smaller file size.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">Combining similar colors in a sprite helps you keep the color count low, ideally under 256 colors so as to fit in a PNG8.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">&#8220;Be mobile-friendly&#8221; and don&#8217;t leave big gaps between the images in a sprite. This doesn&#8217;t affect the file size as much but requires less memory for the user agent to decompress the image into a pixel map. A 100&#215;100 image is 10 thousand pixels, whereas a 1000&#215;1000 image is 1 million pixels.</li>
</ul>
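<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The memory argument in the last point is plain arithmetic, sketched below. The 4 bytes per pixel (RGBA) default is an assumption, since the actual in-memory format depends on the user agent, and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">decodedSpriteBytes</code> is a hypothetical helper name.</p>

```javascript
// Rough size of a decoded sprite in memory: every pixel of the canvas is
// stored, including the empty gaps between the individual images.
function decodedSpriteBytes(width, height, bytesPerPixel = 4) {
  return width * height * bytesPerPixel;
}

const compact = decodedSpriteBytes(100, 100);   // tightly packed sprite
const sparse = decodedSpriteBytes(1000, 1000);  // same images, big gaps
console.log(compact, sparse, sparse / compact);
```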
<h3 id="no_scale" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Don&#8217;t Scale Images in HTML</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: images</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Don&#8217;t use a bigger image than you need just because you can set the width and height in HTML. If you need<br />
<code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">&lt;img width="100" height="100" src="mycat.jpg" alt="My Cat" /&gt;</code><br />
then your image (mycat.jpg) should be 100&#215;100px rather than a scaled down 500&#215;500px image.</p>
<h3 id="favicon" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Make favicon.ico Small and Cacheable</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: images</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">The favicon.ico is an image that stays in the root of your server. It&#8217;s a necessary evil because even if you don&#8217;t care about it the browser will still request it, so it&#8217;s better not to respond with a <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">404 Not Found</code>. Also since it&#8217;s on the same server, cookies are sent every time it&#8217;s requested. This image also interferes with the download sequence, for example in IE when you request extra components in the onload, the favicon will be downloaded before these extra components.</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">So to mitigate the drawbacks of having a favicon.ico make sure:</p>
<ul style="padding: 0px; margin: 0px;">
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">It&#8217;s small, preferably under 1K.</li>
<li style="margin-top: 0px; margin-right: 0px; margin-bottom: 0.3em; margin-left: 2em; list-style-type: disc; list-style-position: initial; list-style-image: initial; padding: 0px;">It has an Expires header you feel comfortable with (since you cannot rename the file if you decide to change it). You can probably safely set the Expires header a few months in the future. You can check the last modified date of your current favicon.ico to make an informed decision.</li>
</ul>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;"><a style="color: #006ca2; text-decoration: none;" href="http://www.imagemagick.org/">Imagemagick</a> can help you create small favicons.</p>
<h3 id="under25" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Keep Components under 25K</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: mobile</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">This restriction is related to the fact that the iPhone won&#8217;t cache components bigger than 25K. Note that this is the <em style="font-style: italic; font-weight: normal;">uncompressed</em> size. This is where minification is important because gzip alone may not be sufficient.</p>
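<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">A build step could check the uncompressed size with something like the sketch below. It assumes that &#8220;25K&#8221; means 25 &#215; 1024 bytes, which the text does not spell out, and <code style="font-style: normal; font-weight: normal; font-family: monospace; line-height: 13px; padding: 0px; margin: 0px;">fitsIphoneCache</code> is a hypothetical name.</p>

```javascript
// Assumption: "25K" is read as 25 * 1024 bytes of uncompressed content.
const IPHONE_CACHE_LIMIT = 25 * 1024;

// Measure the component as the bytes it occupies before gzip is applied.
function fitsIphoneCache(source) {
  const size = new TextEncoder().encode(source).length; // UTF-8 byte count
  return IPHONE_CACHE_LIMIT >= size;
}

// Usage with a minified script or stylesheet read into a string:
// fitsIphoneCache(minifiedSource)
```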
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">For more information check &#8220;<a style="color: #006ca2; text-decoration: none;" href="http://yuiblog.com/blog/2008/02/06/iphone-cacheability/">Performance Research, Part 5: iPhone Cacheability &#8211; Making it Stick</a>&#8221; by Wayne Shea and Tenni Theurer.</p>
<h3 id="multipart" style="margin-top: 1em; margin-right: 0px; margin-bottom: 0.4em; margin-left: 0px; font-size: 21px; font-weight: normal; color: #ff8800; padding: 0px;">Pack Components into a Multipart Document</h3>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">tag: mobile</p>
<p style="padding-top: 0px; padding-right: 0px; padding-bottom: 1em; padding-left: 0px; line-height: 1.49em; margin: 0px;">Packing components into a multipart document is like sending an email with attachments: it helps you fetch several components with one HTTP request (remember: HTTP requests are expensive). When you use this technique, first check if the user agent supports it (iPhone does not).</p>


]]></content:encoded>
			<wfw:commentRss>http://blog.pdf-search.org/seo-soso/best-practices-for-speeding-up-your-web-site/513/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search bots behavior analyzed</title>
		<link>http://blog.pdf-search.org/seo-soso/search-bots-behavior-analyzed/7</link>
		<comments>http://blog.pdf-search.org/seo-soso/search-bots-behavior-analyzed/7#comments</comments>
		<pubDate>Fri, 29 May 2009 15:07:17 +0000</pubDate>
		<dc:creator>john</dc:creator>
				<category><![CDATA[SEO soso..]]></category>

		<guid isPermaLink="false">http://blog.pdf-search.org/?p=7</guid>
		<description><![CDATA[&#8220;A large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than a hundred thousand different URLs.&#8221;
On Bots
Introduction
In the previous edition [...]


No related posts.]]></description>
			<content:encoded><![CDATA[<p><span>&#8220;A large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than a hundred thousand different URLs.&#8221;</span></p>
<p>On Bots</p>
<h2 id="introduction">Introduction</h2>
<p>In the previous edition &#8211; Binary Search Tree 2 &#8211; a large scale experiment on search engine behaviour was staged with more than two billion different web pages. This experiment lasted exactly one year, until April 13th. In this period the three major search engines requested more than one million pages of the tree, from more than a hundred thousand different URLs. The home page of drunkmenworkhere.org grew from 1.6 kB to over 4 MB due to the visit log and the comment spam displayed there.</p>
<p>This edition presents the results of the experiment.</p>
<h2>Setup</h2>
<p>2,147,483,647 web pages (&#8217;nodes&#8217;) were numbered and arranged in a binary search tree. In such a tree, the branch to the left of each node contains only values less than the node&#8217;s value, while the right branch contains only values higher than the node&#8217;s value. So the leftmost node in this tree has value 1 and the rightmost node has value 2,147,483,647.</p>
<p>The depth of the tree is the number of nodes you have to traverse from the root to the most remote leaf. Since you can arrange 2<sup>n+1</sup> &#8211; 1 numbers in a tree of depth n, the resulting tree has a depth of 30 (2<sup>31</sup> = 2,147,483,648). The value at the root of the tree is 1073741824 (2<sup>30</sup>).</p>
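<p>The numbering scheme can be sketched in a few lines of JavaScript. This is an illustration under the assumptions stated in the text (values 1 to 2<sup>31</sup> &#8211; 1, root 2<sup>30</sup>), not the site's actual code; <code>pathFromRoot</code> and <code>levelOf</code> are hypothetical names.</p>

```javascript
// Values 1 .. 2^31 - 1 arranged as a perfect binary search tree.
const DEPTH = 30;
const NODE_COUNT = 2 ** (DEPTH + 1) - 1; // 2,147,483,647
const ROOT = 2 ** DEPTH;                 // 1,073,741,824

// Descend from the root like an ordinary BST lookup and return every
// node value visited on the way to `target` (root first, target last).
function pathFromRoot(target) {
  if (!Number.isInteger(target) || target > NODE_COUNT || 1 > target) {
    throw new RangeError("target must be an integer between 1 and 2^31 - 1");
  }
  const path = [];
  let node = ROOT;
  let step = ROOT / 2; // distance from the current node to its children
  while (node !== target) {
    path.push(node);
    node = target > node ? node + step : node - step;
    step /= 2;
  }
  path.push(node);
  return path;
}

// Level of a node: 0 for the root, 30 for the most remote leaves.
function levelOf(target) {
  return pathFromRoot(target).length - 1;
}

console.log(levelOf(ROOT), levelOf(1), levelOf(NODE_COUNT));
```

<p>Because the tree is perfect, each step down halves the distance to the children, so any node is reached in at most 30 steps.</p>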
<p>For each page the traffic of the three major search bots (Yahoo! Slurp, Googlebot and msnbot) was monitored over a period of one year (between 2005-4-13 and 2006-4-13).</p>
<p>To make the content of each page more interesting for the search engines, the value of each node is written out in American English (short scale) and each page request from a search bot is displayed in reverse chronological order. To enrich the zero-content even more, a comment box was added to each page (it was removed on 2006-4-13). These measures were improvements over the initial Binary Search Tree, which uses inconveniently long URLs.</p>
<p>Every node shows an image of three trees. Each tree in the image visualises which nodes are crawled by each search engine. Each line in the image represents a node, the number of times a search bot visited the node determines the length of the line. The tree images below are modified large versions of the original image, without the very long root node and with disconnected (wild) branches.</p>
<h2>Overall results</h2>
<p>From the start Yahoo! Slurp was by far the most active search bot. In one year it requested more than one million pages and crawled more than a hundred thousand different nodes. Although this is a large number, it is still only 0.0049% of all nodes. The overall statistics for all bots are shown in the table below.</p>
<table style="width: 450px; text-align: right;" border="0">
<caption>overall statistics by search engine</caption>
<tbody>
<tr>
<th> </th>
<th style="text-align: right;">Yahoo!</th>
<th style="text-align: right;">Google</th>
<th style="text-align: right;">MSN</th>
</tr>
<tr>
<th>total number of pageviews</th>
<td>1,030,396</td>
<td>20,633</td>
<td>4,699</td>
</tr>
<tr>
<th>number of nodes crawled</th>
<td>105,971</td>
<td>7,556</td>
<td>1,390</td>
</tr>
<tr>
<th>percentage of tree crawled</th>
<td>0.0049%</td>
<td>0.00035%</td>
<td>0.000065%</td>
</tr>
<tr>
<th>number of indexed nodes</th>
<td>120,000</td>
<td>554</td>
<td>1</td>
</tr>
<tr>
<th>indexed/crawled ratio</th>
<td>113.23%</td>
<td>7.33%</td>
<td>0.07%</td>
</tr>
</tbody>
</table>
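<p>The two derived rows of the table follow from the raw counts. The sketch below recomputes them from the figures in the table; the function names are hypothetical, and small differences in the last digit can come from rounding versus truncation.</p>

```javascript
const TOTAL_NODES = 2 ** 31 - 1; // 2,147,483,647 pages in the tree

// Figures copied from the table above.
const bots = {
  yahoo: { crawled: 105971, indexed: 120000 },
  google: { crawled: 7556, indexed: 554 },
  msn: { crawled: 1390, indexed: 1 },
};

// Share of the whole tree that a bot requested at least once, in percent.
function percentCrawled(crawled) {
  return (crawled / TOTAL_NODES) * 100;
}

// Indexed pages per crawled node, in percent.
function indexedRatio(indexed, crawled) {
  return (indexed / crawled) * 100;
}

for (const [name, { crawled, indexed }] of Object.entries(bots)) {
  console.log(
    name,
    percentCrawled(crawled).toPrecision(2) + "%",
    indexedRatio(indexed, crawled).toFixed(2) + "%"
  );
}
```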
<p>The growth of the number of pageviews and of the number of crawled nodes over the year the experiment lasted is shown in figures 1 and 2. The way the bots crawled the tree is visualised in detail in the animations for each bot in the sections below.</p>
<p><img class="alignnone size-full wp-image-10" title="pageviews" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/pageviews.png" alt="pageviews" width="457" height="301" /><br />
<strong>Fig. 1</strong> &#8211; The cumulative number of pageviews by the search bots in time.</p>
<p> <img class="alignnone size-full wp-image-11" title="nodes_crawled" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/nodes_crawled.png" alt="nodes_crawled" width="457" height="301" /><br />
<strong>Fig. 2</strong> &#8211; The cumulative number of nodes crawled by the search bots in time.</p>
<p> The graph below (fig. 3) shows how many nodes of each level of the tree were crawled by the bots (on a logarithmic scale). The root of the tree is at level 0, while the most remote nodes (e.g. node 1) are at level 30. Since there are 2<sup>n</sup> nodes at level n (there is only 1 root and there are 2<sup>30</sup> nodes at level 30) crawling the entire tree would result in a straight line.</p>
<p><img class="alignnone size-full wp-image-12" title="nodes_crawled_depth" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/nodes_crawled_depth.png" alt="nodes_crawled_depth" width="457" height="301" /><br />
<strong>Fig. 3</strong> &#8211; The number of nodes crawled after 1 year, grouped by node level.</p>
<p>Google closely follows this straight line, until it breaks down after level 12. Most nodes at level 12 or less were crawled (5524 out of 8191), but only very few nodes at higher levels were crawled by Googlebot. MSN shows similar behaviour, but breaks down much earlier, at level 9 (656 out of 1023 nodes were crawled). Yahoo, however, does not break down. At high levels it gradually fails to request all nodes.</p>
<p>The nodes at high levels that were crawled by Yahoo were requested quite often compared to the other bots: at levels 14 to 30 each page was requested 10 times on average (see fig. 4).</p>
<p><img class="alignnone size-full wp-image-14" title="pageviews_depth" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/pageviews_depth.png" alt="pageviews_depth" width="457" height="301" /><br />
<strong>Fig. 4</strong> &#8211; The average number of pageviews per node after 1 year, grouped by node level.</p>
<h2 id="yahoo">Yahoo! Slurp</h2>
<p><img class="alignnone size-full wp-image-15" title="yahoo_small" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/yahoo_small.png" alt="yahoo_small" width="501" height="362" /></p>
<ul>
<li>large version (4273&#215;3090, 1.5MB)</li>
<li>animated version over 1 year (2005-04-13 &#8211; 2006-04-13, 13MB)</li>
<li>animated version of the first 2 hours (2005-04-14 00:40:00-02:40:00, 2.2MB)</li>
</ul>
<p><strong>Fig. 5</strong> &#8211; The Yahoo! Slurp tree.</p>
<p>Yahoo! Slurp was the first search engine to discover Binary Search Tree 2. In the first hours after discovery it crawled the tree vigorously, at a speed of over 2.3 nodes per second (see the short animation). The first day it crawled approximately 30,000 nodes.</p>
<p>In the following month Slurp&#8217;s activity was low, but after exactly one month it requested, for the second time, all the pages it had visited before. In the animation you can see the size of the tree double on 2005-05-14. This phenomenon is repeated a month later: on 2005-06-13 the tree grows to three times its original size. The number of pageviews is then almost 90,000 while the number of crawled nodes is still 30,000. Figure 6 shows this stepwise increase in the number of pageviews during the first months.</p>
<p><img class="alignnone size-full wp-image-16" title="yahoo_pageviews1" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/yahoo_pageviews1.png" alt="yahoo_pageviews1" width="485" height="301" /><br />
<strong>Fig. 6</strong> &#8211; The cumulative number of pageviews by Yahoo! Slurp in time.</p>
<p>After four months Slurp requested a large number of &#8216;new&#8217; nodes, for the first time since the initial round. It simply requested all URLs it had. Since it had already indexed 30,000 pages, each of which links to two pages at a deeper level, it requested 60,000 pages at the end of August (the number of pageviews jumps from 100,000 to 160,000 in fig. 6) and it doubled the number of nodes it had crawled (see fig. 7).</p>
<p>After 5 months Yahoo! Slurp started requesting nodes more regularly. It still had periods of &#8216;discovery&#8217; (e.g. after 10 months).</p>
<p><img class="alignnone size-full wp-image-17" title="yahoo_nodes_crawled" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/yahoo_nodes_crawled.png" alt="yahoo_nodes_crawled" width="485" height="301" /><br />
<strong>Fig. 7</strong> &#8211; The cumulative number of nodes crawled by Yahoo! Slurp in time.</p>
<p>Yahoo reported 120,000 pages in its index (current value). This may seem impossible since it only visited 105,971 nodes, but every node is available on two different domain names: www.drunkmenworkhere.org and drunkmenworkhere.org.</p>
<p>Note: the query submitted to Google and MSN yielded 35,600 pages on Yahoo. Yahoo is the only search engine that returns results with the query used above.</p>
<h2 id="google">Googlebot</h2>
<p><img class="alignnone size-full wp-image-18" title="google_small" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_small.png" alt="google_small" width="500" height="608" /></p>
<ul>
<li>large version (4067&#215;4815, 180kB)</li>
<li>animated version (2005-04-13 &#8211; 2006-04-13, 1.2MB)</li>
</ul>
<p><strong>Fig. 8</strong> &#8211; The Googlebot tree.</p>
<p>In comparison with Yahoo&#8217;s tree, Google&#8217;s tree looks more like a natural tree. This is because Google visited nodes at deeper levels less frequently than their parent nodes. Yahoo only visited the nodes at the first three levels more frequently, while Google did so for the first 12 levels (see fig. 4).</p>
<p>The form of the tree follows from Google&#8217;s PageRank algorithm. PageRank is defined as follows:</p>
<blockquote style="width: 450px;"><p>&#8220;We assume page A has pages T1&#8230;Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:</p>
<p>PR(A) = (1-d) + d (PR(T1)/C(T1) + &#8230; + PR(Tn)/C(Tn)) &#8220;</p></blockquote>
<p>Since most nodes in the tree are not linked to by other sites, the PageRank of a node can be calculated with this formula (ignoring links in the comments):</p>
<blockquote><p>PR(node) = 0.15 + 0.85 (PR(parent) + PR(left child) + PR(right child))/3</p></blockquote>
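<p>Applying this formula iteratively can be sketched as follows. This is an illustration, not the calculation behind the figure: the tree is stored in heap order, the root rank is held fixed at a chosen value, missing children of leaves are assumed to contribute 0, and <code>treePageRank</code> is a hypothetical name.</p>

```javascript
// Iterate PR(node) = 0.15 + 0.85 * (PR(parent) + PR(left) + PR(right)) / 3
// on a complete binary tree stored in heap order (root at index 0,
// children of index i at 2i + 1 and 2i + 2).
function treePageRank(depth, rootRank, iterations = 200) {
  const size = 2 ** (depth + 1) - 1;
  let ranks = new Array(size).fill(1);
  ranks[0] = rootRank;
  for (let remaining = iterations; remaining > 0; remaining -= 1) {
    // Build each new array from the previous one (Jacobi-style update).
    ranks = ranks.map((_, i) => {
      if (i === 0) return rootRank; // the root rank is the fixed unknown
      const parent = ranks[Math.floor((i - 1) / 2)];
      // Assumption: a missing child (below a leaf) contributes 0.
      const left = ranks[2 * i + 1] ?? 0;
      const right = ranks[2 * i + 2] ?? 0;
      return 0.15 + (0.85 * (parent + left + right)) / 3;
    });
  }
  return ranks;
}

// A small tree with the root rank set to 100; indices 1, 3 and 7 are the
// leftmost nodes of levels 1, 2 and 3.
const ranks = treePageRank(6, 100);
console.log(ranks[1], ranks[3], ranks[7]);
```

<p>In this sketch the rank falls off quickly with each level below the root, which is the shape the text compares with Googlebot&#8217;s visiting frequency.</p>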
<p>The only unknown when applying this formula iteratively is the PageRank of the root node of the tree. Since this node was the homepage of drunkmenworkhere.org for a year, a high rank may be assumed. The calculated PageRank tree (fig. 9) shows similar proportions to Googlebot&#8217;s real tree, so the frequency with which a page is visited seems to be related to its PageRank.</p>
<p><img class="alignnone size-full wp-image-19" title="pagerank" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/pagerank.png" alt="pagerank" width="500" height="515" /><br />
<strong>Fig. 9</strong> &#8211; A binary tree of depth 17 visualising calculated PageRank as length of each line, when the PageRank of the root node is set to 100.</p>
<p>The animation of the Googlebot tree shows some interesting erratic behaviour that cannot be explained with PageRank.</p>
<dl style="width: 460px;">
<dt>The rightmost branch </dt>
<dd>From the start Googlebot crawled more nodes on the right hand side of the tree. On 2005-07-04 it tried to visit the rightmost node, i.e. the node with the highest value. After selecting the right branch for 20 levels, starting at the root, Googlebot stopped. This produced the arc at the right end of the tree. <img class="alignnone size-full wp-image-20" title="google_right" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_right.png" alt="google_right" width="450" height="269" /> </dd>
<dt>Searching node 1 </dt>
<dd>On 2005-06-30 Googlebot visited node 1, the leftmost node. It did not crawl the path from the root to this node, so how did it find the page? Did it guess the URL or did it follow some external link?<br />
A few hours later, Googlebot crawled node 2, which is linked as a parent node by node 1. These two nodes are displayed as a tiny dot in the animation on 2005-06-30, floating above the left branch. Then, a week later, on 2005-07-06 (two days after the attempt to find the rightmost node), between 06:39:39 and 06:39:59, Googlebot found the path to these disconnected nodes by visiting the 24 missing nodes in 20 seconds. It started at the root and found its way to node 2 without ever selecting a right branch. In the large version of the Googlebot tree, this path is clearly visible. The nodes halfway along the path were not requested a second time and are represented by thin short line segments, hence the steep curve.<br />
 <img class="alignnone size-full wp-image-22" title="google_to_node1" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_to_node1.png" alt="google_to_node1" width="450" height="252" /></dd>
<dt>Yahoo-like subtree </dt>
<dd>On 2005-07-23 Googlebot suddenly spent some hours crawling 600 new nodes near node 1073872896. Most of these nodes were never visited again.<br />
This subtree is the reason the number of nodes crawled by Googlebot, grouped by level, increases again from level 18 to level 30 in fig. 3.<br />
 <img class="alignnone size-full wp-image-21" title="google_subtree" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_subtree.png" alt="google_subtree" width="450" height="250" /></dd>
</dl>
<p>Over the last six months Googlebot requested pages at a fixed rate (about 260 pages per month, fig. 10). Like Yahoo! Slurp it seems to alternate between periods of discovery (see fig. 11) and periods of refreshing its cache.</p>
<p><img class="alignnone size-full wp-image-23" title="google_pageviews" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_pageviews.png" alt="google_pageviews" width="485" height="301" /></p>
<p><strong>Fig. 10</strong> &#8211; The cumulative number of pageviews by Googlebot in time.</p>
<p> </p>
<p><img class="alignnone size-full wp-image-24" title="google_nodes_crawled" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/google_nodes_crawled.png" alt="google_nodes_crawled" width="485" height="301" /><br />
<strong>Fig. 11</strong> &#8211; The cumulative number of nodes crawled by Googlebot in time.</p>
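<p>The difference between the two curves can be illustrated with a minimal Python sketch. It assumes the access log has already been reduced to (date, node) pairs in chronological order; the function name <code>cumulative_counts</code> and the sample requests are invented for illustration and are not taken from the real log. When the pageview total rises while the distinct node total stays flat, the bot is refreshing pages it already knows; when both rise, it is discovering new nodes.</p>

```python
def cumulative_counts(requests):
    """Return (date, cumulative pageviews, cumulative distinct nodes) per date.

    requests: iterable of (date_string, node_id) pairs in chronological order.
    """
    seen = set()      # nodes requested at least once so far
    pageviews = 0     # every request counts, including repeats
    per_day = {}      # dicts keep insertion order, so dates stay sorted
    for day, node in requests:
        pageviews += 1
        seen.add(node)
        per_day[day] = (pageviews, len(seen))
    return [(day, views, nodes) for day, (views, nodes) in per_day.items()]


# Made-up sample data, for illustration only.
sample = [
    ("2005-06-30", 1),
    ("2005-06-30", 2),
    ("2005-07-06", 2),   # repeat visit: pageviews grow, nodes do not
    ("2005-07-06", 4),
    ("2005-07-07", 1),   # repeat visit again
]
for row in cumulative_counts(sample):
    print(row)
```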
<p>Google returned 554 results when searching for nodes. The first nodes reported by Google are nodes 1 and 2, which are very deep inside the tree at levels 30 and 29. Their higher rank is also reflected in the curve shown above (Searching node 1), which indicates a high number of pageviews. They probably appear first because of their short URLs. The other nodes on the first result page are all at level 4, probably because the first three levels are penalised for comment spam. The current number of results can be checked here.</p>
<p> </p>
<h2 id="msn">MSNbot</h2>
<p><img class="alignnone size-full wp-image-25" title="msn_small" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/msn_small.png" alt="msn_small" width="500" height="338" /></p>
<ul>
<li>large version (4200&#215;2795, 123kB)</li>
<li>animated version (2005-04-13 &#8211; 2006-04-13, 846kB)</li>
</ul>
<p><strong>Fig. 12</strong> &#8211; The msnbot tree</p>
<p> </p>
<p>The msnbot tree is much smaller than Yahoo&#8217;s and Google&#8217;s. The most interesting feature is the large disconnected branch to the right of the tree. It appeared on 2005-04-29, when msnbot visited node 2045877824. This node contains one comment, posted two weeks earlier:</p>
<blockquote><p>I hereby claim this name in the name of&#8230;well, mine. Paul Pigg.</p></blockquote>
<p>A week before msnbot requested this node, Googlebot had already visited it. This random node at level 24 was crawled because of a link from Paul Pigg&#8217;s website masterpigg.com (now dead, Google cache). All three search engines visited the node via this link, and all three failed to connect it to the rest of the tree. You can check this by clicking the &#8216;to trunk&#8217; links starting at node 2045877824.</p>
<p>Msnbot crawled both upward and downward from the disconnected node, creating a large subtree. This subtree caused the upward line between levels 18 and 30 in figure 3.</p>
<p>The second large disconnected branch, at the top in the middle, originated from a link on uu-dot.com. Both disconnected branches are clearly visible in the Googlebot tree as well.</p>
<p><img class="alignnone size-full wp-image-26" title="msn_pageviews" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/msn_pageviews.png" alt="msn_pageviews" width="485" height="301" /></p>
<p><strong>Fig. 13</strong> &#8211; The cumulative number of pageviews by msnbot in time.</p>
<p> <img class="alignnone size-full wp-image-27" title="msn_nodes_crawled" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/msn_nodes_crawled.png" alt="msn_nodes_crawled" width="485" height="301" /><br />
<strong>Fig. 14</strong> &#8211; The cumulative number of nodes crawled by msnbot in time.</p>
<p> </p>
<p>As the graphs above show, msnbot virtually ceased to crawl Binary Search Tree 2 after five months. How the number of results returned by MSN Search relates to these graphs is unclear.</p>
<p> </p>
<h2 id="spam">Spam bots</h2>
<p>In one year 5265 comments were posted to 103 different nodes. 32 of these nodes were never visited by any of the search bots. Most comments (3652) were posted to the root node (the home page). The word frequency of the submitted comments was calculated.</p>
<table style="width: 450px;" border="0">
<caption>Top 50 most frequently spammed words</caption>
<tbody>
<tr>
<th> </th>
<th>count</th>
<th>word</th>
</tr>
<tr>
<td>1</td>
<td>32743</td>
<td>http</td>
</tr>
<tr>
<td>2</td>
<td>23264</td>
<td>com</td>
</tr>
<tr>
<td>3</td>
<td>12375</td>
<td>url</td>
</tr>
<tr>
<td>4</td>
<td>8636</td>
<td>www</td>
</tr>
<tr>
<td>5</td>
<td>5541</td>
<td>info</td>
</tr>
<tr>
<td>6</td>
<td>4631</td>
<td>viagra</td>
</tr>
<tr>
<td>7</td>
<td>4570</td>
<td>online</td>
</tr>
<tr>
<td>8</td>
<td>4533</td>
<td>phentermine</td>
</tr>
<tr>
<td>9</td>
<td>4512</td>
<td>buy</td>
</tr>
<tr>
<td>10</td>
<td>4469</td>
<td>html</td>
</tr>
<tr>
<td>11</td>
<td>3531</td>
<td>org</td>
</tr>
<tr>
<td>12</td>
<td>3346</td>
<td>blogstudio</td>
</tr>
<tr>
<td>13</td>
<td>3194</td>
<td>drunkmenworkhere</td>
</tr>
<tr>
<td>14</td>
<td>2801</td>
<td>free</td>
</tr>
<tr>
<td>15</td>
<td>2772</td>
<td>cialis</td>
</tr>
<tr>
<td>16</td>
<td>2371</td>
<td>to</td>
</tr>
<tr>
<td>17</td>
<td>2241</td>
<td>u</td>
</tr>
<tr>
<td>18</td>
<td>2169</td>
<td>generic</td>
</tr>
<tr>
<td>19</td>
<td>2054</td>
<td>cheap</td>
</tr>
<tr>
<td>20</td>
<td>1921</td>
<td>ringtones</td>
</tr>
<tr>
<td>21</td>
<td>1914</td>
<td>view</td>
</tr>
<tr>
<td>22</td>
<td>1835</td>
<td>a</td>
</tr>
<tr>
<td>23</td>
<td>1818</td>
<td>net</td>
</tr>
<tr>
<td>24</td>
<td>1756</td>
<td>the</td>
</tr>
<tr>
<td>25</td>
<td>1658</td>
<td>buddy4u</td>
</tr>
<tr>
<td>26</td>
<td>1633</td>
<td>of</td>
</tr>
<tr>
<td>27</td>
<td>1633</td>
<td>lelefa</td>
</tr>
<tr>
<td>28</td>
<td>1580</td>
<td>xanax</td>
</tr>
<tr>
<td>29</td>
<td>1572</td>
<td>blogspot</td>
</tr>
<tr>
<td>30</td>
<td>1570</td>
<td>tramadol</td>
</tr>
<tr>
<td>31</td>
<td>1488</td>
<td>mp3sa</td>
</tr>
<tr>
<td>32</td>
<td>1390</td>
<td>insurance</td>
</tr>
<tr>
<td>33</td>
<td>1379</td>
<td>poker</td>
</tr>
<tr>
<td>34</td>
<td>1310</td>
<td>cgi</td>
</tr>
<tr>
<td>35</td>
<td>1232</td>
<td>sex</td>
</tr>
<tr>
<td>36</td>
<td>1198</td>
<td>teen</td>
</tr>
<tr>
<td>37</td>
<td>1193</td>
<td>in</td>
</tr>
<tr>
<td>38</td>
<td>1158</td>
<td>content</td>
</tr>
<tr>
<td>39</td>
<td>1105</td>
<td>aol</td>
</tr>
<tr>
<td>40</td>
<td>1099</td>
<td>mime</td>
</tr>
<tr>
<td>41</td>
<td>1095</td>
<td>and</td>
</tr>
<tr>
<td>42</td>
<td>1081</td>
<td>home</td>
</tr>
<tr>
<td>43</td>
<td>1034</td>
<td>us</td>
</tr>
<tr>
<td>44</td>
<td>1022</td>
<td>valium</td>
</tr>
<tr>
<td>45</td>
<td>1020</td>
<td>josm</td>
</tr>
<tr>
<td>46</td>
<td>1012</td>
<td>order</td>
</tr>
<tr>
<td>47</td>
<td>992</td>
<td>is</td>
</tr>
<tr>
<td>48</td>
<td>948</td>
<td>de</td>
</tr>
<tr>
<td>49</td>
<td>908</td>
<td>ringtone</td>
</tr>
<tr>
<td>50</td>
<td>907</td>
<td>i</td>
</tr>
</tbody>
</table>
<p>complete list (360 kB)</p>
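<p>A word count of this kind can be sketched in a few lines of Python. The tokenisation rule is an assumption: text is lowercased and split into runs of letters and digits, which would be consistent with entries such as <code>http</code>, <code>com</code> and <code>buddy4u</code> appearing as separate words in the table, but the original script may have worked differently. The function name <code>word_frequencies</code> and the two sample comments are made up.</p>

```python
import re
from collections import Counter


def word_frequencies(comments):
    """Count how often each word occurs across all submitted comments."""
    counts = Counter()
    for text in comments:
        # Assumed rule: a word is a run of ASCII letters and digits,
        # so URLs fall apart into pieces such as "http", "www" and "com".
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


# Made-up sample comments, for illustration only.
sample = [
    "Buy viagra online http://www.example.com/viagra.html",
    "[url=http://example.com]cheap viagra[/url]",
]
for word, count in word_frequencies(sample).most_common(5):
    print(count, word)
```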
<p>As the top 50 clearly shows, most spam was related to pharmaceutical products. The pie chart below shows the share of each medicine.</p>
<p><img class="alignnone size-full wp-image-28" title="spam_pie" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/spam_pie.png" alt="spam_pie" width="500" height="500" /><br />
<strong>Fig. 15</strong> &#8211; The share of various medicines in comment spam.</p>
<p> </p>
<p>Submitted domain names were filtered from the text. All top-level domain names are shown in figure 16, ordered by frequency.</p>
<p><img class="alignnone size-full wp-image-29" title="spam_tld" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/spam_tld.png" alt="spam_tld" width="500" height="700" /></p>
<p><strong>Fig. 16</strong> &#8211; Number of spammed domains by top level domain</p>
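<p>One way to obtain such a count is sketched below in Python. How the domain names were actually filtered is not described here, so the rule used is an assumption: a domain is the host part that follows <code>://</code> in a URL or <code>@</code> in an email address, with a leading <code>www.</code> removed, and each distinct domain is counted once for its top-level domain. The names <code>HOST_RE</code> and <code>tld_counts</code> and the sample comments are illustrative.</p>

```python
import re
from collections import Counter

# Assumption: a domain is the host part that follows "://" in a URL
# or "@" in an email address.
HOST_RE = re.compile(r"(?:://|@)((?:[a-z0-9-]+\.)+[a-z]{2,})")


def tld_counts(comments):
    """Count distinct domains per top-level domain."""
    domains = set()
    for text in comments:
        for host in HOST_RE.findall(text.lower()):
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    # The top-level domain is whatever follows the last dot.
    return Counter(host.rsplit(".", 1)[-1] for host in domains)


# Made-up sample comments, for illustration only.
sample = [
    "see http://www.example.com/page.html",
    "mail bob@example.org",
    "visit http://shop.example.com and http://example.net/x",
]
for tld, count in tld_counts(sample).most_common():
    print(count, tld)
```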
<p>Many email addresses submitted by the spam bots were nonexistent addresses @drunkmenworkhere.org, which explains the high rank of this domain in the chart of most frequently spammed domains (fig. 17).</p>
<p><img class="alignnone size-full wp-image-30" title="spam_domain" src="http://blog.pdf-search.org/wp-content/uploads/2009/05/spam_domain.png" alt="spam_domain" width="500" height="714" /></p>
<p><strong>Fig. 17</strong> &#8211; Most frequently spammed domains</p>


]]></content:encoded>
			<wfw:commentRss>http://blog.pdf-search.org/seo-soso/search-bots-behavior-analyzed/7/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
