<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:series="http://unfoldingneurons.com/"
	>

<channel>
	<title>Stephen Foskett, Pack Rat &#187; SEO Archives  &#8211; Stephen Foskett, Pack Rat</title>
	<atom:link href="http://blog.fosketts.net/tag/seo/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.fosketts.net</link>
	<description>Understanding the accumulation of data</description>
	<lastBuildDate>Fri, 10 Feb 2012 17:40:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
			<item>
		<title>How To Force Apache To Redirect To Canonical Hostnames, or ServerAlias Is Not Your Friend</title>
		<link>http://blog.fosketts.net/2010/08/01/force-apache-redirect-canonical-hostnames-serveralias-friend/</link>
		<comments>http://blog.fosketts.net/2010/08/01/force-apache-redirect-canonical-hostnames-serveralias-friend/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 15:19:42 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Enterprise storage]]></category>
		<category><![CDATA[Everything]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[lighttpd]]></category>
		<category><![CDATA[redirect]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[ServerAlias]]></category>
		<category><![CDATA[VirtualHost]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=3475</guid>
		<description><![CDATA[Today I came up against a frustrating realization: Apache doesn't have a satisfying way to redirect multiple domains to canonical hostnames! In other words, it's fairly easy to redirect one domain's content from "www.example.com" to just plain "example.com" or to make both hostnames work, but there's no one-stop solution to do this with a dozen domains. I've hit on a method that correctly redirects alternate hostnames and will save you aggravation in the long run.]]></description>
			<content:encoded><![CDATA[<p>Yesterday I discussed <a href="http://blog.fosketts.net/2010/07/30/high-performance-memory-apache-php-virtual-private-server/"  target="_blank">how to set up a lightweight PHP web server using Apache</a>. Next we have to get everything running smoothly, and I came up against a frustrating realization: Apache doesn&#8217;t have a satisfying way to redirect multiple domains to canonical hostnames! In other words, it&#8217;s fairly easy to redirect one domain&#8217;s content from &#8220;www.example.com&#8221; to just plain &#8220;example.com&#8221; or to make both hostnames work, but there&#8217;s no one-stop solution to do this with a dozen domains. But I&#8217;ve hit on a method that correctly redirects alternate hostnames and will save you aggravation in the long run.</p>
<h3>The Easy Way</h3>
<p>Apache is one amazingly flexible web server. It handles multiple domains with ease using VirtualHost blocks, and the ServerAlias directive lets a single VirtualHost answer to several permutations of a hostname. Consider the following totally made-up example:</p>
<pre>&lt;VirtualHost *:80&gt;
  ServerName blog.fosketts.net
  ServerAlias www.blog.fosketts.net
  ServerAlias fosketts.net
  ServerAlias www.fosketts.net
  DocumentRoot /var/www/
&lt;/VirtualHost&gt;</pre>
<p>This looks great, right? It tells Apache to watch any IP on port 80 for an HTTP request for a server called blog.fosketts.net and to serve it the content from /var/www/. It also tells Apache to accept plain old &#8220;fosketts.net&#8221;, &#8220;www.fosketts.net&#8221;, and even &#8220;www.blog.fosketts.net&#8221;.</p>
<h3>What&#8217;s Wrong With Easy?</h3>
<p>Although accepting all these hostnames seems like the friendly and correct thing to do, it&#8217;s not in your best interest. It tells web clients that the exact same content lives on four different servers, and they&#8217;ll start linking to your content every which way. Pretty soon you&#8217;ll have incoming links for all four hostnames. So what&#8217;s wrong with this?</p>
<ol>
<li><strong>It&#8217;s confusing for users</strong> &#8211; They&#8217;ll start asking, &#8220;is your site www.fosketts.net or blog.fosketts.net?&#8221; It&#8217;s fine to segment things, but confusing to do it unnecessarily.</li>
<li><strong>It&#8217;s hard to configure and maintain</strong> &#8211; Once your site starts getting linked to and shared around, you&#8217;re stuck supporting all possible combinations. When you switch hosts or server platforms (<a href="http://blog.fosketts.net/2010/07/30/high-performance-memory-apache-php-virtual-private-server/"  target="_blank">ahem</a>) you have to make sure everything still works.</li>
<li><strong>It hurts your search ranking</strong> &#8211; You might not be all that concerned with search engine placement, and <a rel="nofollow" href="http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html"  target="_blank">it&#8217;s not as bad</a> <a href="http://www.seomoz.org/blog/gonna-set-it-straight-this-watergate"  target="_blank">as some say</a>, but splitting your traffic between multiple sites also splits your &#8220;SEO juice&#8221;.</li>
<li><strong>Web crawls overload your servers</strong> &#8211; Search engines treat each host name as a different server. If you allow links to multiple names without a proper redirect, you&#8217;ll get multiple crawls, often at the same time.</li>
</ol>
<p>In summary, the easy way isn&#8217;t good. ServerAlias looks friendly, but it&#8217;s not a friend when used this way.</p>
<p>Let&#8217;s say your name is Stephen, but some people call you Steve. Rather than insist on one or the other, you could just go through life accepting either. But imprecision can lead to issues, even in the real world. Will people know to look up Stephen in the company directory when they know you as Steve? You might start getting duplicate junk mail for both names as they find their way onto mailing lists. Then there&#8217;s the embarrassing &#8220;I always called him Steve&#8221; moment at the company party, when someone feels like they&#8217;re not part of the &#8220;in crowd&#8221; that knows your real name. It&#8217;s best to be friendly and accept either name, but politely suggest that everyone use just one in the interest of sanity.</p>
<h3>Redirection is Right</h3>
<p>The best approach in life is also the correct method on the web. Your server should accept any number of possible names in case someone comes in with the wrong one. But rather than blithely accepting the name, your server should issue a proper HTTP &#8220;redirect&#8221; response, instructing the browser or crawler to reload the page using the correct name from that point on.</p>
<p>This is simple when using Lighttpd. I just added the following lines to my lighttpd.conf file and it magically issued a proper redirect whenever someone came in using the &#8220;www&#8221; name:</p>
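<pre># Requires mod_redirect in server.modules.
# %1 is the hostname captured by the $HTTP["host"] regex (minus "www."),
# $1 is the request URI captured by the url.redirect pattern.
$HTTP["host"] =~ "^www\.(.*)$" {
  url.redirect  = (
    "^/(.*)" =&gt; "http://%1/$1",
  )
}</pre>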
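<p>To make the two capture groups in that rule concrete, here is a small Python sketch that applies the same two regular expressions outside of lighttpd. In lighttpd, %1 refers to a group captured by the surrounding conditional (the hostname) and $1 to a group captured by the url.redirect pattern (the request URI). The function name lighttpd_style_redirect is hypothetical; this illustrates the rule and is not lighttpd code.</p>

```python
import re

# The two patterns from the lighttpd.conf rule.
HOST_PATTERN = re.compile(r"^www\.(.*)$")  # group 1 plays the role of %1
PATH_PATTERN = re.compile(r"^/(.*)")       # group 1 plays the role of $1


def lighttpd_style_redirect(host, uri):
    """Return the redirect target for a "www." hostname, or None to serve normally."""
    host_match = HOST_PATTERN.match(host)
    if host_match is None:
        return None  # hostname does not start with "www.": no redirect
    uri_match = PATH_PATTERN.match(uri)
    if uri_match is None:
        return None  # request URI does not start with "/": no redirect
    # Build "http://%1/$1" from the two captured groups.
    return "http://%s/%s" % (host_match.group(1), uri_match.group(1))
```

<p>Because the new hostname comes from a capture group rather than being typed in, a single rule covers every domain the server hosts.</p>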
<p>I was amazed that I could locate no such universal redirect option in Apache. You can use all the RedirectMatch directives you want, but their regular expressions only operate on the path part of the URL, not the hostname. This is great for adding a &#8220;www&#8221; but makes it impossible to create a generic rule to remove one!</p>
<p>Instead, we have to use RedirectMatch on each VirtualHost domain individually. This also opens the possibility of dealing with other conditions we might come across, but it&#8217;s not as simple and clean as the Lighttpd method.</p>
<p>Here&#8217;s where the magic is. Each VirtualHost configuration you add (in /etc/apache2/sites-available on Ubuntu) should include rules to deal with the incorrect names as well as the single correct one. Here&#8217;s the correct redirect rule for the example above:</p>
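<pre># Redirect-only host: catches every alternate name and sends a permanent (301) redirect
&lt;VirtualHost *:80&gt;
  ServerName fosketts.net
  ServerAlias www.fosketts.net
  ServerAlias www.blog.fosketts.net
  RedirectMatch 301 (.*) http://blog.fosketts.net$1
&lt;/VirtualHost&gt;

# Canonical host: the only name that actually serves content
&lt;VirtualHost *:80&gt;
  DocumentRoot /var/www/
  ServerName blog.fosketts.net
&lt;/VirtualHost&gt;</pre>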
<p>The first VirtualHost block matches all the incorrect hostnames and redirects them (with a code of 301 for &#8220;Moved Permanently&#8221;) to the correct hostname. The &#8220;(.*)&#8221; part matches any and all paths and arguments, and the &#8220;$1&#8221; part appends them to the new hostname. Then we set up another VirtualHost block for only the correct hostname and put any and all rules in there.</p>
<p>This way, any clients or crawlers that hit &#8220;www.fosketts.net&#8221; or any of the other alternatives will get a proper 301 redirect to &#8220;blog.fosketts.net&#8221; and go about their business. It tells Google that there is only one proper server name for this content and encourages users (who will likely copy and paste from the address bar) to use it, too. Neat and tidy, and very friendly.</p>
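<p>The two-block arrangement boils down to a lookup: each alternate hostname maps to one canonical name and gets a 301, and only the canonical name serves content. The Python sketch here models that decision for more than one domain. The CANONICAL table, the route function, and the www.example.com entry are hypothetical, and the sketch only mimics the Apache configuration rather than replacing it. One difference: Apache hands a request for an unlisted hostname to the first VirtualHost defined for that address and port, while this sketch simply reports a 404.</p>

```python
# Hypothetical table of alternate hostname -> canonical hostname.
# Each key corresponds to a ServerName/ServerAlias in a redirect-only VirtualHost.
CANONICAL = {
    "fosketts.net": "blog.fosketts.net",
    "www.fosketts.net": "blog.fosketts.net",
    "www.blog.fosketts.net": "blog.fosketts.net",
    "www.example.com": "example.com",  # hypothetical second site
}
CANONICAL_HOSTS = set(CANONICAL.values())


def route(host, path):
    """Decide how to answer a request for host and path.

    Returns (301, location) for an alternate name, (200, None) for a
    canonical name, and (404, None) for a hostname not in the table.
    """
    host = host.lower().split(":", 1)[0]  # hostnames are case-insensitive; drop any :port
    if host in CANONICAL:
        # Equivalent of: RedirectMatch 301 (.*) http://blog.fosketts.net$1
        return 301, "http://" + CANONICAL[host] + path
    if host in CANONICAL_HOSTS:
        return 200, None  # the canonical VirtualHost serves the content
    return 404, None
```

<p>Adding another domain means adding table entries, which is the same per-domain bookkeeping the Apache approach requires, just kept in one place.</p>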
<p>I&#8217;d love to hear alternative methods of doing this. Please leave a comment if you have a suggestion that uses a 301 redirect and works across multiple domains!</p>
<div id="crp_related"><h3>You might also want to read these other posts...</h3><ul><li><a href="http://blog.fosketts.net/2010/07/30/high-performance-memory-apache-php-virtual-private-server/"  rel="bookmark" class="crp_title">A High-Performance, Low-Memory Apache/PHP Virtual Private Server</a></li><li><a href="http://blog.fosketts.net/2009/06/26/multiserver-web-host-environment/"  rel="bookmark" class="crp_title">Setting Up a Multi-Server Web Hosting Environment</a></li><li><a href="http://blog.fosketts.net/guides/ipad-exchange-activesync/ipad-exchange-activesync-troubleshooting-guide/"  rel="bookmark" class="crp_title">iPad Exchange ActiveSync Troubleshooting Guide</a></li><li><a href="http://blog.fosketts.net/2009/06/29/tuning-lighttpd-linux/"  rel="bookmark" class="crp_title">Tuning Lighttpd For Linux</a></li></ul></div><hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2010. |
<a href="http://blog.fosketts.net/2010/08/01/force-apache-redirect-canonical-hostnames-serveralias-friend/">How To Force Apache To Redirect To Canonical Hostnames, or ServerAlias Is Not Your Friend</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/enterprisestorage/" title="View all posts in Enterprise storage" rel="category tag">Enterprise storage</a>, <a href="http://blog.fosketts.net/category/everything/" title="View all posts in Everything" rel="category tag">Everything</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2010/08/01/force-apache-redirect-canonical-hostnames-serveralias-friend/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<series:name><![CDATA[Web Hosting]]></series:name>
	</item>
		<item>
		<title>Google Is Heading For A Cliff; What Will They Do?</title>
		<link>http://blog.fosketts.net/2009/05/22/google-nofollow/</link>
		<comments>http://blog.fosketts.net/2009/05/22/google-nofollow/#comments</comments>
		<pubDate>Fri, 22 May 2009 14:02:26 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Computer History]]></category>
		<category><![CDATA[Personal]]></category>
		<category><![CDATA[Baidu]]></category>
		<category><![CDATA[Bit.ly]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[Cuil]]></category>
		<category><![CDATA[Digg]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[Friendfeed]]></category>
		<category><![CDATA[Gmail]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Apps]]></category>
		<category><![CDATA[Google Reader]]></category>
		<category><![CDATA[IRC]]></category>
		<category><![CDATA[LAMP]]></category>
		<category><![CDATA[LinkedIn]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Nofollow]]></category>
		<category><![CDATA[PageRank]]></category>
		<category><![CDATA[Picasa]]></category>
		<category><![CDATA[Reader]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Slashdot]]></category>
		<category><![CDATA[social networks]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[StumbleUpon]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[USENET]]></category>
		<category><![CDATA[wiki]]></category>
		<category><![CDATA[Wikipedia]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=1892</guid>
		<description><![CDATA[Google is the most important company to the Internet. Hyperbole? I think not! Without Google, the Internet that we all know and love would be a very different place, as would the business of IT. Along with Microsoft and the supporting community around LAMP, Google is the very foundation of modern computing. But the foundation of Google itself, its ability to rank Internet content and present relevant information to its users, is at risk. What will they do to fix it?
]]></description>
			<content:encoded><![CDATA[<p>Google is the most important company to the Internet. Hyperbole? I think not! <strong>Without Google, the Internet that we all know and love would be a very different place</strong>, as would the business of IT. Along with Microsoft and the supporting community around LAMP, Google is the very foundation of modern computing. But the foundation of Google itself, its ability to rank Internet content and present relevant information to its users, is at risk. What will they do to fix it?</p>
<blockquote><p>Note: This post is about Google, because it is by far the dominant search engine, advertiser, and &#8220;portal&#8221; in the English-speaking world. Nearly everything mentioned here applies equally to other search engines and advertising providers.</p></blockquote>
<h3 class="post-subhead">Ranking Pages</h3>
<p>Google&#8217;s relevance comes from their historical ability to present a quality searchable portal to the entire Internet. The majority of <a href="http://www.thestandard.com/news/2009/01/22/picture-guess-where-google-gets-97-its-revenue"  target="_blank">Google&#8217;s revenue</a> also depends on this quality information, which gives them the ability to present more-compelling advertising to web users.</p>
<p><strong>Google&#8217;s core success is based on its ability to discover and rank the quality of Internet content</strong>. Gmail, Reader, Picasa, Apps, and the rest of the Google properties are surely excellent sources of information on the preferences of individual users, but they contribute only slightly to the other side of the coin: Information about Internet content. For that, they still rely on the core technology invented at Stanford a decade ago: <a rel="nofollow" href="http://en.wikipedia.org/wiki/PageRank"  target="_blank">PageRank</a>.</p>
<p>Every time it encounters a link, Google&#8217;s software &#8220;spider&#8221; follows it, adding the content of the linked web page to an index. Google, like other early search engines, counts each link as a vote for the quality of the page. The genius of PageRank is that Google weights each vote based on the quality of the page it comes from. Although PageRank is not the entirety of Google, it is a key element.</p>
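<p>The weighted-vote idea is easier to see in a toy calculation. This Python sketch uses the textbook power-iteration form of PageRank with the commonly cited damping factor of 0.85 and a made-up link graph; it illustrates the published algorithm and makes no claim about how Google computes rankings today. The function name and parameters are illustrative.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to. Every page
    starts with an equal share of rank; on each pass a page splits its
    rank evenly among the pages it links to, so a vote from a
    highly-ranked page is worth more than a vote from an obscure one.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of inbound links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A dead-end page spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "spam": ["blog"],  # links out, but nothing links to it
}
ranks = pagerank(web)
```

<p>In a graph with no dead-end pages, a page that nothing links to ends up with only the baseline share of (1 - 0.85) / N, no matter how many links it sends out, which is why spammers care about collecting inbound links from pages that already rank well.</p>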
<p>Put simply, <strong>Google&#8217;s success depends on its ability to gather and rank the links we all make and match them to the data we provide about ourselves</strong>. Without this, Google will fail.</p>
<h3 class="post-subhead">The Changing Web</h3>
<p><strong>The graphical Web is not the Internet</strong>. My first experiences online came well before graphical hypertext clients (what we now call browsers) dominated the user experience and became the web. Although the network we call the Internet now supports a very wide variety of traffic, <strong>Google&#8217;s preeminence comes only from the Web</strong>. They have little or no reach into the massive streams of corporate data, multimedia, and other non-hypertext content flowing across the &#8216;net.</p>
<p>When it was first developed, <strong>the web was manual and links were hand-selected and carefully put into context</strong>. It was difficult to put together a web page, and those pages that were developed were static. The social networks of the time (mostly USENET, IRC, and email) were not integrated into the web and did not generally include links. So the first search engines, and later ones like Google, focused on this relatively small pool of pages and links.</p>
<p>But <strong>the web soon became automated</strong>, subsuming most other interactive services. Social (user-generated) interaction moved into the web in a big way, with blogs, wikis, and discussion forums enabling rapid content creation and reference by users. Sharing links in the social web, and through social bookmarking services, generally replaced the manual pages of old.</p>
<p>At first, this explosion of user-generated content was a dream scenario for Google. They could harvest the collective intelligence of us all to identify and rank content. But as the number of pages and links exploded, <strong>the notion of a &#8220;web page&#8221; shifted radically from a stable and predictable set of data to a dynamic portal into a vast store of content</strong>. Where everyone once saw the same content at a given URL, now each of us sees something different.</p>
<p>Spammers and scammers realized the value of Google placement and <strong>flooded this dynamic social web with links</strong>. This threatened not only to undermine the relevance that supports Google&#8217;s search (and advertising) business, but it also threatened these new social services themselves. Each honest, relevant link added to a Wikipedia article, included in a Slashdot comment, or shared on a service like Digg was dwarfed by the thousands or millions of spam links injected to boost the PageRank of &#8220;client&#8221; sites.</p>
<h3 class="post-subhead">I Don&#8217;t Follow</h3>
<p>Google and the social net fought valiantly against this wave of link spam, but it became clear that something more radical was needed. <strong>The only way to fight spam was to make it useless to the spammers</strong>. Thus was born a simple but highly-effective tool: <a rel="nofollow" href="http://en.wikipedia.org/wiki/Nofollow"  target="_blank">Nofollow</a>.</p>
<p>Webmasters long had the ability to tell the Google spider to ignore a certain set of hosted pages through a server-side list called robots.txt. But spammers wanted the exact opposite: to have their links spidered and counted. What was needed was a client-side way, set in the link itself rather than on the server, to specify that a link was not worthy of being spidered and ranked by the search engines. This would eliminate the primary benefit of link spam.</p>
<p>Implementing client-side spider blocking was trivial: <strong>A simple attribute, &#8220;rel=nofollow&#8221;, was added alongside the URL in a web link</strong>. Google&#8217;s spider would then ignore every &#8220;nofollow&#8221; link it encountered, so those links would never be followed or counted in the index.</p>
<p>But spammers would never put the nofollow tag in their own links. So sites quickly began implementing <a href="http://www.seomoz.org/blog/nofollow-is-dying-the-impact-of-microblogging-and-nofollow-on-seo"  target="_blank">blanket nofollow policies</a>: Every link submitted by users in any form would receive the tag by default. The idea was that links not yet vetted by users would get the nofollow tag and those deemed acceptable would not. But most sites never figured out the right process to allow the nofollow tag to be removed. Today, <strong>nearly every social service, from Facebook to Twitter to Digg to StumbleUpon, permanently marks nearly every link this way</strong>. Even Wikipedia, a long-time holdout, finally switched to a <a rel="nofollow" href="http://meta.wikimedia.org/wiki/Nofollow"  target="_blank">default nofollow on all but the English site</a>.</p>
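<p>The difference between the two mechanisms shows up clearly in code: a site's robots.txt says which of its own URLs may be fetched, while rel="nofollow" is read from each individual link. This Python sketch of a well-behaved crawler step uses the standard library html.parser and urllib.robotparser modules. The LinkCollector class, the links_to_crawl function, and the ExampleBot user agent are hypothetical; for simplicity it applies one robots.txt to every URL, as if all links pointed at the same site, and real search-engine spiders are far more involved.</p>

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags, setting aside rel="nofollow" links."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.skipped.append(href)  # link-level opt-out: do not follow or count
        else:
            self.followed.append(href)


def links_to_crawl(page_html, robots_lines, user_agent="ExampleBot"):
    """Return followable links that robots.txt also allows this agent to fetch."""
    robots = RobotFileParser()
    robots.parse(robots_lines)  # server-side rules, normally fetched from /robots.txt
    collector = LinkCollector()
    collector.feed(page_html)
    return [url for url in collector.followed if robots.can_fetch(user_agent, url)]
```
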
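<p>A blanket policy with a vetting step, as described here, amounts to a very small rendering rule. This Python sketch assumes a hypothetical render_user_link function and a vetted flag set by some moderation process: unvetted links get rel="nofollow", and vetted ones are rendered as ordinary links that count as votes.</p>

```python
from html import escape


def render_user_link(url, text, vetted=False):
    """Render a user-submitted link as HTML.

    Unvetted links carry rel="nofollow" so search engines give them no
    ranking credit; once a link is marked as vetted, the attribute is
    dropped and the link is rendered normally.
    """
    rel = "" if vetted else ' rel="nofollow"'
    # Escape the URL and text so user input cannot inject markup.
    return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel, escape(text))
```

<p>The hard part, as noted, is not this rule but the process that ever flips the flag to vetted.</p>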
<h3 class="post-subhead">The Nofollow War</h3>
<p>What does this mean for Google? If the vast majority of user-generated links are tossed into the spam category as far as the search engine is concerned, it means <strong>that their entire system of discovering and ranking links is in jeopardy</strong>. The major social services, which attract the majority of end-user traffic, content, and links, are rendered useless in generating relevancy.</p>
<p>But these are the exact sources that Google ought to be focusing on the most. Many have noted that they hear about news more rapidly through real-time sources like Twitter than through less-dynamic traditional news sites and blogs. <strong>Even if Google had the ability to spider a service like Twitter in real time, </strong><a href="http://news.digitaltrends.com/news-article/19978/twitter-beating-google-on-real-time-information"  target="_blank"><strong>which is doubtful</strong></a><strong>, they would gain no insight from the links included in these sources</strong>. Social bookmarking sites like Digg are chock full of user-vetted links and should be gold mines for Google, but the nofollow tag makes them invisible.</p>
<p>This scarcity of user-generated links has <strong>made the links that are followable even more valuable</strong>. Scammers constantly create fake blogs of scraped (read &#8220;stolen&#8221;) content and users are paid to include followable links anywhere they can. Sites with a high PageRank value are constantly inundated with offers and attacked by hackers to siphon off high-value &#8220;votes&#8221;.</p>
<p><strong>High-profile content providers are circling their wagons</strong>, drastically cutting down on <a href="http://louisgray.com/live/2007/09/internal-linking-on-some-tech-blogs-is.html"  target="_blank">outside links</a> in order to focus PageRank on their own properties. <strong>Smaller publishers and blogs are striking back at the big guys</strong>, decrying their dearth of external links. Some even go so far as to initiate <a href="http://www.inverudio.com/programs/WordPressBlog/NofollowReciprocity.php"  target="_blank">blanket nofollow policies against these big, respected, but non-linking sites</a>.</p>
<p>This leaves Google with even fewer useful links with which to examine the Web. It also leaves the biggest content providers and networks and the savviest search engine optimization (SEO) pros with a bigger slice of the <a href="http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/"  target="_blank">valuable top-of-Google result real estate</a>.</p>
<h3 class="post-subhead">The Fix Is In</h3>
<p>Google is left with a looming nightmare scenario: <strong>As smaller, alternative, social, and real-time content providers disappear from the search engine, its overall relevance and value decline</strong>. Soon, a tipping point will be reached when users would rather rely on Twitter, Facebook, and the rest for their Internet interactions than the old-fashioned search engine, email, and RSS readers that Google currently dominates. <strong>This house-of-cards collapse can only be avoided by including user-generated content in the Google index</strong>.</p>
<p><strong>Search engines could simply ignore the nofollow tag</strong>, wading into the social stream and combatting spam in other ways. But this would lead to another rapid upswing of link spam, shifting the burden to content providers once again. And it might also expose links that actually should not be followed, leading to technical and even legal trouble.</p>
<p>The best solution would see the <strong>social networks designing in some method of removing the nofollow attribute</strong> once links are verified to be relevant and correct. But there is no incentive for them to help drive Google traffic to other sites. Indeed, Twitter recently took the next step, <a href="http://www.techcrunch.com/2009/03/24/twitter-tweaks-its-title-tags-for-better-google-juice/"  target="_blank">rearranging the titles of user pages</a> in an attempt to SEO their way to the top page of Google searches for users&#8217; names. Only altruistic systems like Wikipedia are likely to design in this type of response.</p>
<p>Another possible scenario (to be explored another day) is <strong>the usurpation of today&#8217;s social web and its content by a new next-generation service</strong>. A web-based social client like <a href="http://www.louisgray.com/live/2009/05/friendfeed-simplifies-joining-process.html"  target="_blank">FriendFeed could rapidly siphon away</a> both existing and net-new content and users in the guise of openness and interoperability. Although new web spiders like Cuil have failed, perhaps old-fashioned crawling capability is no longer all that valuable in the social web.</p>
<p>The most likely fix is both predictable and pragmatic: <strong>Google must buy every successful source of social links</strong> (like Twitter, Bit.ly, StumbleUpon, and even Facebook) and integrate them into their search system. Owning Twitter would enable Google to decide which links to follow and which to ignore. The reward of improving search results would be the incentive needed to add &#8220;re-follow&#8221; capability. <strong>Buying these services would also give Google an open pipe of the real-time traffic flowing through them</strong>, a critical resource that they currently lack.</p>
<p><strong>Google simply cannot afford not to own the real-time web</strong>, and they must continue to buy up similar sources of content as they appear. Yahoo was unable to extract value from StumbleUpon, but Google&#8217;s other competitors will certainly try to undermine the search giant. Frankly, I&#8217;m shocked that Microsoft, Facebook, or even Baidu have not yet snapped up services like Twitter, LinkedIn, and Digg, even if only to keep them and the information they contain out of Google&#8217;s hands.</p>
<blockquote><p>If you enjoyed reading this, you&#8217;ll probably also like <a href="http://foskettservices.com"  target="_blank">my Foskett Services blog</a>!</p></blockquote>
<div id="crp_related"><h3>You might also want to read these other posts...</h3><ul><li><a href="http://blog.fosketts.net/2009/05/27/google-recalculated-pagerank/"  rel="bookmark" class="crp_title">Google Just Recalculated PageRank!</a></li><li><a href="http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/"  rel="bookmark" class="crp_title">Measuring the Importance of Google&#8217;s First Page</a></li><li><a href="http://blog.fosketts.net/2010/01/20/vendor-twitter/"  rel="bookmark" class="crp_title">Vendor Non-Blogs</a></li><li><a href="http://blog.fosketts.net/2009/07/15/google-reader-social/"  rel="bookmark" class="crp_title">Google Reader Gets More Social</a></li><li><a href="http://blog.fosketts.net/2010/02/12/googles-evil-buzz-building/"  rel="bookmark" class="crp_title">Google&#8217;s Evil Buzz Is Building</a></li></ul></div><hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2009. |
<a href="http://blog.fosketts.net/2009/05/22/google-nofollow/">Google Is Heading For A Cliff; What Will They Do?</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/computerhistory/" title="View all posts in Computer History" rel="category tag">Computer History</a>, <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2009/05/22/google-nofollow/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Measuring the Importance of Google&#8217;s First Page</title>
		<link>http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/</link>
		<comments>http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/#comments</comments>
		<pubDate>Fri, 16 Jan 2009 01:00:15 +0000</pubDate>
		<dc:creator>Stephen</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Analytics]]></category>
		<category><![CDATA[Google Webmaster Tools]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[Yoast]]></category>

		<guid isPermaLink="false">http://blog.fosketts.net/?p=1348</guid>
		<description><![CDATA[I'm not blogging to get traffic; I'm blogging because I have something to say. But, being a curious person, I do measure my blog traffic, and I've become interested in how the search engine optimization gurus ply their trade. So a recent tip on using a custom Google Analytics filter to determine "front page" placement on that search engine piqued my curiosity.]]></description>
			<content:encoded><![CDATA[<div id="attachment_1349" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2009/01/visits-from-all-sources.png" ><img class="size-medium wp-image-1349" title="visits-from-all-sources" src="http://blog.fosketts.net/wp-content/uploads/2009/01/visits-from-all-sources-300x262.png" alt="Google's first page accounts for more than 2/3 of my web site traffic!" width="300" height="262" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Google&#39;s first page accounts for more than two thirds of my web site traffic!</p></div>
<p>I&#8217;m not blogging to get traffic; I&#8217;m blogging <a rel="nofollow" href="http://bhc3.wordpress.com/2008/12/08/why-professionals-should-continue-to-blog-in-the-era-of-twitter/"  target="_blank">because</a> I have something to say. But, being a curious person, I do measure my blog traffic, and I&#8217;ve become interested in how the search engine optimization gurus ply their trade. So a recent tip on <a href="http://yoast.com/track-seo-rankings-google-analytics/"  target="_blank">using a custom Google Analytics filter</a> to determine &#8220;front page&#8221; placement on that search engine piqued my curiosity.</p>
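<p>The linked tip builds its report with a custom Google Analytics filter. One way to approach the same front-page question is to read the result position out of a Google referrer URL and treat positions 1 through 10 as the first page, which can be sketched in Python. This is not the tip's filter: it assumes the position arrives in a query parameter named cd and that a results page holds ten entries, and the function names are hypothetical.</p>

```python
from urllib.parse import parse_qs, urlparse

RESULTS_PER_PAGE = 10  # assumption: ten results on each Google results page


def google_result_position(referrer):
    """Return the result position carried in a Google referrer URL, or None.

    Assumes the position is passed in a "cd" query parameter; treat that
    parameter name as an assumption for this sketch.
    """
    parts = urlparse(referrer)
    host_labels = (parts.hostname or "").split(".")
    if "google" not in host_labels:
        return None  # not a visit from Google
    values = parse_qs(parts.query).get("cd")
    if not values or not values[0].isdigit():
        return None  # no usable position in this referrer
    return int(values[0])


def is_first_page(referrer):
    """True when the referrer shows a position within the first results page."""
    position = google_result_position(referrer)
    return position is not None and 1 <= position <= RESULTS_PER_PAGE
```
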
<p>After just one week of monitoring, the results speak for themselves: <strong>Google&#8217;s first page of results accounts for more than two thirds of my web site traffic</strong>, and less than a quarter of my visits come from a source other than Google! It should come as no surprise that the search giant&#8217;s first page has come to dominate Internet traffic, but it was amazing for me to see it demonstrated so graphically using data from my own blog.</p>
<p>Since the first of the year, I&#8217;ve noticed that my traffic has been up by about 50%, too. I was puzzled, since the uptick spans all topics and posts. My friend <a href="http://www.louisgray.com/live/index.html"  target="_blank">Louis Gray</a> suggests that Google might be responsible for this as well: <a href="http://www.searchenginegenie.com/pagerank-10-sites.htm"  target="_blank">Google updated its PageRank results</a>, potentially moving many of my pages onto the first page of results.</p>
<p>Of course, I also recently switched to <a href="http://www.winextra.com/index.php/2008/12/10/dont-you-get-it-yet-partial-feeds-kill-readership/"  target="_blank">a full-text RSS feed</a>, so perhaps fewer people are visiting my site to read longer articles. But my examination of direct and referring traffic in Google Analytics is inconclusive on this matter.</p>
<h3 class="post-subhead">Using Google Analytics</h3>
<p>Google is notoriously cagey about releasing information on its ranking system. The company doesn&#8217;t want SEO experts gaming the system to get their pages on top. Of course, this hasn&#8217;t stopped anyone, but it can be difficult to determine exactly where one&#8217;s site is ranked.</p>
<p>The best information I&#8217;ve found is in <a rel="nofollow" href="https://www.google.com/webmasters/tools"  target="_blank">Google&#8217;s Webmaster Tools</a>. Here, one can view the top search queries for one&#8217;s own (verified) site, including the absolute position for both impressions (views in the search results) and traffic generated. Although this information is useful, it is not as quantitative as one might desire: I can see that a search for &#8220;<a rel="nofollow" href="http://www.google.com/search?client=safari&amp;rls=en-us&amp;q=iphone+exchange+setup&amp;ie=UTF-8&amp;oe=UTF-8"  target="_blank">iphone exchange setup</a>&#8221; puts my <a href="http://blog.fosketts.net/2008/07/10/how-to-set-up-iphone-exchange-activesync/"  target="_blank">iPhone ActiveSync setup</a> page on top, but I can&#8217;t compare this to Yahoo or direct visits.</p>
<p><a rel="nofollow" href="https://www.google.com/analytics"  target="_blank">Google&#8217;s Analytics service</a> is another great resource, with excellent recording and reporting of web site metrics over time. But the company seems reluctant to link this information to their search data, perhaps worried that they&#8217;ll make life too easy for those who are trying to game the system to get more traffic.</p>
<h3 class="post-subhead">The Google Analytics SEO Filter</h3>
<p><strong>The SEO filter</strong> <a href="http://andrescholten.nl/index.php/seo-rankings-meten-met-google-analytics/"  target="_blank">proposed by André Scholten</a> certainly is clever: Although Google doesn&#8217;t include absolute ranking data in Analytics, its own search URLs do indicate <strong>which page of results</strong> a user was viewing when they clicked through to a site. A user who isn&#8217;t satisfied with the first page of Google&#8217;s results clicks &#8220;Next&#8221; and is taken to another page. Google tracks which page to display using a URL key called &#8220;start&#8221;, the offset of the first result shown. For example, the second page of results is requested by including &#8220;&amp;start=10&#8221; in the URL, while the tenth page includes &#8220;&amp;start=90&#8221;.</p>
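<p>The page arithmetic is easy to check. This short Python sketch (my own illustration, not part of André&#8217;s recipe) reads the &#8220;start&#8221; key from a Google referrer and converts it to a results-page number. It assumes ten results per page, as in the examples above, and the referrer URLs are made up for illustration.</p>

```python
from urllib.parse import urlparse, parse_qs

RESULTS_PER_PAGE = 10  # assumption: ten results per page, as in the examples

def results_page(referrer: str) -> int:
    """Return the 1-based results page encoded in a Google referrer URL.

    A missing "start" key means the visitor clicked from the first page.
    """
    params = parse_qs(urlparse(referrer).query)
    start = int(params.get("start", ["0"])[0])
    return start // RESULTS_PER_PAGE + 1

# Hypothetical referrers
print(results_page("http://www.google.com/search?q=iphone+exchange+setup"))           # 1
print(results_page("http://www.google.com/search?q=iphone+exchange+setup&start=10"))  # 2
print(results_page("http://www.google.com/search?q=iphone+exchange+setup&start=90"))  # 10
```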
<p>Since Google Analytics normally records the full referring URL, we can use an advanced feature called filtering to tag Google referrals with an indication of which page the user was looking at when they clicked. André doesn&#8217;t include <strong>the whole recipe</strong>, but it&#8217;s not difficult for a novice to implement:</p>
<ol>
<li>Get Google Analytics up and running for your site. I&#8217;m not going to go into how to do this here.</li>
<li>In the main Analytics page listing Website Profiles for your user, click &#8220;+ Add new profile&#8221; in the gray bar. It is important that all of this is <strong>done in a new profile</strong>, because filters remove (filter out) any data that doesn&#8217;t match.</li>
<li>Select &#8220;Add a Profile for an existing domain&#8221; (since you already set this domain up in step 1) and give it a Profile Name like &#8220;blog SEO&#8221;.</li>
<li>Click Finish, and now we have a profile to work with without messing up our main Analytics results.</li>
<li>Find this new profile in the Website Profiles list and click &#8220;Edit&#8221; under the &#8220;Actions&#8221; heading.</li>
<li>The third box is &#8220;Filters&#8221; &#8211; this is where the magic happens. We will create three filters so that this profile collects information only on Google searches. Once you create them, you can re-apply them in other profiles for other sites without reinventing the wheel.</li>
<li>Click &#8220;+ Add Filter&#8221; and add a new one called &#8220;Ranking 1&#8221;. &#8220;Filter Type&#8221; will be &#8220;Custom filter&#8221;, and you will select &#8220;Include&#8221; under this. Then use &#8220;Campaign Source&#8221; as the &#8220;Filter Field&#8221; and enter &#8220;Google&#8221; as the &#8220;Filter Pattern&#8221;. This will filter out any traffic that didn&#8217;t come from a <strong>Google referral</strong>.</li>
<li>Click &#8220;+ Add Filter&#8221; and add a new one called &#8220;Ranking 2&#8221;. &#8220;Filter Type&#8221; will be &#8220;Custom filter&#8221;, and you will select &#8220;Include&#8221; under this. Then use &#8220;Campaign Medium&#8221; as the &#8220;Filter Field&#8221; and enter &#8220;organic&#8221; as the &#8220;Filter Pattern&#8221;. This will filter out any <strong>paid traffic from Google ads</strong>.</li>
<div id="attachment_1350" class="wp-caption alignright" style="width: 310px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2009/01/picture-32.png" ><img class="size-medium wp-image-1350" title="google-seo-filter" src="http://blog.fosketts.net/wp-content/uploads/2009/01/picture-32-300x219.png" alt="The magic Google Analytics SEO filter" width="300" height="219" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">The magic Google Analytics SEO filter</p></div>
<li>Now comes the big one. Click &#8220;+ Add Filter&#8221; and add a new one called &#8220;Ranking 3&#8221;. &#8220;Filter Type&#8221; will be &#8220;Custom filter&#8221;, and you will select &#8220;Advanced&#8221; under this.</li>
<li>For &#8220;Field A -&gt; Extract A&#8221;, select &#8220;Referral&#8221; and enter &#8220;(\?|&amp;)q=([^&amp;]*)&#8221; in the box (no quotes!) This captures the search terms, the value of the &#8220;q&#8221; key, from the referring URL.</li>
<li>Next, for &#8220;Field B -&gt; Extract B&#8221;, select &#8220;Referral&#8221; again and enter &#8220;(\?|&amp;)start=([^&amp;]*)&#8221; in the box (no quotes again!) This extracts the value for the &#8220;start&#8221; key, if present.</li>
<li>Finally, for &#8220;Output To -&gt; Constructor&#8221;, select &#8220;User Defined&#8221; and enter &#8220;$A2 (page: $B2)&#8221; in the box (think you include quotes? Think again!) This <strong>adds a custom tag</strong> combining the search terms ($A2) with the value of the &#8220;start&#8221; key ($B2).</li>
<li>Finish it off by checking &#8220;Yes&#8221; for &#8220;Field A Required&#8221;, &#8220;No&#8221; for &#8220;Field B Required&#8221;, and &#8220;Yes&#8221; for &#8220;Override Output Field&#8221;.</li>
<li>Now <strong>give it some time</strong> &#8211; at least a day for Google to start adding data to this profile. Assuming you did everything correctly, the statistics recorded in your main profile won&#8217;t be affected, since the filters apply only to the new profile.</li>
</ol>
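<p>If you want to sanity-check the patterns before waiting a day for data, the following Python sketch models what the three filters do to a single visit. It is only a model under stated assumptions: it treats the Filter Patterns as case-insensitive regular expressions and assumes that an unmatched optional Field B leaves $B2 empty. The function name and sample visits are hypothetical, and the real Analytics processing may differ in its details.</p>

```python
import re
from typing import Optional

# Field A and Field B patterns from the Ranking 3 filter, with the HTML
# entity "&amp;" written as a plain ampersand.
FIELD_A = re.compile(r"(\?|&)q=([^&]*)")      # group 2: the search terms
FIELD_B = re.compile(r"(\?|&)start=([^&]*)")  # group 2: the start offset

def user_defined_value(source: str, medium: str, referral: str) -> Optional[str]:
    """Model the Ranking 1-3 filters for one visit (hypothetical helper).

    Returns None if the visit is dropped from the profile, "(not set)" if it
    is kept but Field A does not match, or else the "$A2 (page: $B2)" string.
    """
    # Ranking 1: include only visits whose campaign source matches "google"
    # (assumption: the pattern match is case-insensitive)
    if not re.search("google", source, re.IGNORECASE):
        return None
    # Ranking 2: include only organic (unpaid) visits
    if not re.search("organic", medium, re.IGNORECASE):
        return None
    # Ranking 3: Field A is required, so without a "q" key nothing is written
    a = FIELD_A.search(referral)
    if a is None:
        return "(not set)"
    # Field B is optional; assumption: $B2 is empty when "start" is absent
    b = FIELD_B.search(referral)
    b2 = b.group(2) if b else ""
    return f"{a.group(2)} (page: {b2})"

# Hypothetical visits
print(user_defined_value("google", "organic",
                         "http://www.google.com/search?q=iphone+exchange+setup&start=10"))
# iphone+exchange+setup (page: 10)
print(user_defined_value("google", "organic",
                         "http://www.google.com/search?q=iphone+exchange+setup"))
# iphone+exchange+setup (page: )
print(user_defined_value("google", "cpc",
                         "http://www.google.com/search?q=iphone+exchange+setup"))
# None
```

<p>Under these assumptions a first-page visit comes out as &#8220;(page: )&#8221;, with nothing after the colon, which is worth remembering when you search the report.</p>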
<h3 class="post-subhead">Watching the Results</h3>
<p>Assuming it worked, you can now <strong>add the correct chart</strong> to Analytics in the new profile:</p>
<div id="attachment_1352" class="wp-caption alignright" style="width: 304px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; float: right;"><a href="http://blog.fosketts.net/wp-content/uploads/2009/01/visits-by-google-results-page.png" ><img class="size-medium wp-image-1352" title="visits-by-google-results-page" src="http://blog.fosketts.net/wp-content/uploads/2009/01/visits-by-google-results-page-294x300.png" alt="Google's first page dominates their referrals" width="294" height="300" /></a><p style=' padding: 0 4px 5px; margin: 0;'  class="wp-caption-text">Google&#39;s first page dominates their referrals</p></div>
<ol>
<li>Enter the new profile&#8217;s Dashboard by clicking &#8220;View report&#8221;.</li>
<li>Select &#8220;Traffic Sources&#8221; on the left side.</li>
<li>Click &#8220;google (organic)&#8221; in the &#8220;Top Traffic Sources&#8221; table &#8211; it should be the only item there!</li>
<li>In the &#8220;Dimension&#8221; drop-down box, select &#8220;User Defined Value&#8221;. <strong>This is the payoff</strong> for all this work!</li>
<li>Add this to the dashboard by clicking the &#8220;Add to Dashboard&#8221; button at the top.</li>
<li>Now use your creativity to examine the data! For example, enter &#8220;page: 10&#8221; in the &#8220;Find User Defined Value&#8221; box. These are search terms that showed up on the second page for some reason, so you might try some SEO kung fu to improve their ranking&#8230;</li>
</ol>
<p>You will notice a number of visits coming up as &#8220;(not set)&#8221;. This means that the Google referral didn&#8217;t include the long URL with its keys and values, so the filter had nothing to extract and you don&#8217;t know which page the user was on when they clicked.</p>
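<p>To turn those values into a page-by-page breakdown like the one in the chart, you can export the User Defined Value report and tally it. This Python sketch assumes the values look like the ones the Ranking 3 filter produces (&#8220;terms (page: 10)&#8221;, or &#8220;(page: )&#8221; for the first page) and that there are ten results per page. The sample data is invented, and this is an illustration rather than a feature of Analytics.</p>

```python
import re
from collections import Counter

PAGE_TAG = re.compile(r"\(page: (\d*)\)$")  # the suffix added by Ranking 3

def page_label(value: str) -> str:
    """Map a user-defined value to a results-page label."""
    m = PAGE_TAG.search(value)
    if m is None:
        return "(not set)"  # the filter never wrote a value for this visit
    start = m.group(1)
    # An empty start means there was no "start" key: the first page of results
    return "page 1" if start == "" else f"page {int(start) // 10 + 1}"

def page_shares(values, exclude_not_set=False):
    """Return each page label's share of the visits as a fraction."""
    counts = Counter(page_label(v) for v in values)
    if exclude_not_set:
        counts.pop("(not set)", None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Invented sample of four visits
sample = [
    "iphone+exchange+setup (page: )",
    "iphone+exchange+setup (page: )",
    "example+query (page: 10)",
    "(not set)",
]
print(page_shares(sample))  # {'page 1': 0.5, 'page 2': 0.25, '(not set)': 0.25}
print(page_shares(sample, exclude_not_set=True))
```

<p>Whether the &#8220;(not set)&#8221; visits belong in the denominator is a judgment call, which is why the sketch makes it optional.</p>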
<p>With some creativity, you could <strong>adapt this filter</strong> to gather the same information for Yahoo, Microsoft, or other search engines. But Google is definitely the 800 lb gorilla right now, and page 1 is the target &#8211; it accounts for 90 to 95% of Google referrals to my site!</p>
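<p>One way to approach that adaptation is to parameterize the patterns by search engine. In this Python sketch only Google&#8217;s keys (&#8220;q&#8221; and &#8220;start&#8221;) come from the recipe above. The Yahoo (&#8220;p&#8221;, &#8220;b&#8221;) and Bing (&#8220;q&#8221;, &#8220;first&#8221;) keys, and the idea that their offsets count from 1, are my own assumptions, so check them against real referrers in your logs before relying on them. All names here are hypothetical.</p>

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urlparse

@dataclass(frozen=True)
class Engine:
    name: str        # substring expected in the referrer's host name
    query_key: str   # key carrying the search terms
    offset_key: str  # key carrying the position of the first result shown
    one_based: bool  # True if that position counts from 1 rather than 0

ENGINES = [
    Engine("google", "q", "start", one_based=False),  # from the recipe above
    Engine("yahoo", "p", "b", one_based=True),        # assumption: verify
    Engine("bing", "q", "first", one_based=True),     # assumption: verify
]

def key_pattern(key: str) -> "re.Pattern[str]":
    """Build a pattern shaped like the Field A and Field B patterns."""
    return re.compile(r"(\?|&)" + re.escape(key) + r"=([^&]*)")

def tag_referral(referral: str, per_page: int = 10) -> Optional[Tuple[str, str, int]]:
    """Return (engine, search terms, results page), or None if unrecognized."""
    host = urlparse(referral).netloc
    for engine in ENGINES:
        if engine.name not in host:
            continue
        q = key_pattern(engine.query_key).search(referral)
        if q is None:
            return None  # the "(not set)" case: no search terms in the URL
        off = key_pattern(engine.offset_key).search(referral)
        if off and off.group(2).isdigit():
            index = int(off.group(2)) - (1 if engine.one_based else 0)
        else:
            index = 0  # no offset key: the visitor was on the first page
        return engine.name, q.group(2), index // per_page + 1
    return None

print(tag_referral("http://www.google.com/search?q=apache+serveralias&start=10"))
# ('google', 'apache+serveralias', 2)
print(tag_referral("http://www.bing.com/search?q=apache+serveralias&first=11"))
# ('bing', 'apache+serveralias', 2)
```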
<blockquote><p>If you enjoyed reading this, you&#8217;ll probably also like <a href="http://foskettservices.com"  target="_blank">my Foskett Services blog</a>!</p></blockquote>
<hr />
<p><small>© sfoskett for <a href="http://blog.fosketts.net">Stephen Foskett, Pack Rat</a>, 2009. |
<a href="http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/">Measuring the Importance of Google&#8217;s First Page</a>
<br/>
This post was categorized as <a href="http://blog.fosketts.net/category/everything/personal/" title="View all posts in Personal" rel="category tag">Personal</a>. Each of my categories has its own feed if you'd like to filter out or focus on posts like this.<br/>
</small></p>]]></content:encoded>
			<wfw:commentRss>http://blog.fosketts.net/2009/01/15/googles-analytics-measuring-page-seo/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

