Yesterday I discussed how to set up a lightweight PHP web server using Apache. Next we have to get everything running smoothly, and I came up against a frustrating realization: Apache doesn’t have a satisfying way to redirect multiple domains to canonical hostnames! In other words, it’s fairly easy to redirect one domain’s content from “www.example.com” to just plain “example.com” or to make both hostnames work, but there’s no one-stop solution to do this with a dozen domains. But I’ve hit on a method that correctly redirects alternate hostnames and will save you aggravation in the long run.
The Easy Way
Apache is one amazingly flexible web server. It handles multiple domains with ease, using the VirtualHost method, and the ServerAlias directive allows you to host permutations easily. Consider the following totally made-up example:
<VirtualHost *:80>
    ServerName blog.fosketts.net
    ServerAlias www.blog.fosketts.net
    ServerAlias fosketts.net
    ServerAlias www.fosketts.net
    DocumentRoot /var/www/
</VirtualHost>
This looks great, right? It tells Apache to watch any IP on port 80 for an HTTP request for a server called blog.fosketts.net and to serve it the content from /var/www/. It also tells Apache to accept plain old “fosketts.net”, “www.fosketts.net”, and even “www.blog.fosketts.net”.
What’s Wrong With Easy?
Although accepting all these hostnames seems like the friendly and correct thing to do, it’s not in your best interest. It tells web clients that the exact same content lives on four different servers, and they’ll start linking to your content every which way. Pretty soon you’ll have incoming links for all four hostnames. So what’s wrong with this?
- It’s confusing for users – They’ll start asking, “is your site www.fosketts.net or blog.fosketts.net?” It’s fine to segment things, but confusing to do it unnecessarily.
- It’s hard to configure and maintain – Once your site starts getting linked to and shared around, you’re stuck supporting all possible combinations. When you switch hosts or server platforms (ahem) you have to make sure everything still works.
- It hurts your search ranking – You might not be all that concerned with search engine placement, and it’s not as bad as some say, but splitting your traffic between multiple sites also splits your “SEO juice”.
- Web crawls overload your servers – Search engines treat each host name as a different server. If you allow links to multiple names without a proper redirect, you’ll get multiple crawls, often at the same time.
In summary, the easy way isn’t good. ServerAlias looks friendly, but it’s not a friend when used this way.
Let’s say your name was Stephen, but some people call you Steve. Rather than insist on one or the other, you could just go through life accepting either. But imprecision can lead to issues, even in the real world. Will people know to look up Stephen in the company directory when they know you as Steve? You might start getting duplicate junk mail for both names as they find their way onto mailing lists. Then there’s the embarrassing “I always called him Steve” moment at the company party, when someone feels like they’re not part of the “in crowd” that knows your real name. It’s best to be friendly and accept anything but politely suggest that everyone uses just one name in the interest of sanity.
Redirection is Right
The best approach in life is also the correct method on the web. Your server should be set to accept any number of possible names in case someone comes in with the wrong one. But rather than blithely accepting the name, your server should issue a proper “redirect” call, instructing the browser or crawler to reload the page using the correct name from that point on.
This is simple when using Lighttpd. I just added the following lines to my lighttpd.conf file and it magically issued a proper redirect whenever someone came in using the “www” name:
$HTTP["host"] =~ "^www\.(.*)$" {
    url.redirect = ( "^/(.*)" => "http://%1/$1", )
}
I was amazed that I could locate no such universal redirect option in Apache. You can do all the RedirectMatch calls you want, but their regular expressions only operate on the path part of the URL, not the hostname. This is great for adding a “www” but makes it impossible to create a generic rule that strips it!
Instead, we have to use RedirectMatch on each VirtualHost domain individually. This also opens the possibility to deal with other conditions we might come across, but it’s not as simple and clean as the Lighttpd method.
Here’s where the magic is. Each VirtualHost configuration you add (in /etc/apache2/sites-available on Ubuntu) should include rules to deal with the incorrect names as well as the single correct one. Here’s the correct redirect rule for the example above:
<VirtualHost *:80>
    ServerName fosketts.net
    ServerAlias www.fosketts.net
    ServerAlias www.blog.fosketts.net
    RedirectMatch 301 (.*) http://blog.fosketts.net$1
</VirtualHost>
<VirtualHost *:80>
    DocumentRoot /var/www/
    ServerName blog.fosketts.net
</VirtualHost>
The first VirtualHost block matches all the incorrect hostnames and redirects them (with a code of 301 for “Permanent”) to the correct hostname. The “(.*)” part matches any and all paths and arguments and the “$1” part appends them to the new hostname. Then we set up another VirtualHost block for only the correct hostname and put any and all rules in there.
This way, any clients or crawlers that hit “www.fosketts.net” or any of the other alternatives will get a proper 301 redirect to “blog.fosketts.net” and go about their business. It tells Google that there is only one proper server name for this content and encourages users (who will likely copy and paste from the address bar) to use it, too. Neat and tidy, and very friendly.
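As an aside, the redirect block can also be sketched with mod_alias’s plain Redirect directive in place of RedirectMatch. This is an alternative form under the same assumptions (same hostnames, same port), not the configuration I actually used:

```apache
# Sketch only: a prefix-based Redirect instead of the regex-based RedirectMatch.
# Redirect matches on a URL-path prefix, so "/" covers every request, and
# whatever follows the prefix is appended to the target URL.
<VirtualHost *:80>
    ServerName fosketts.net
    ServerAlias www.fosketts.net www.blog.fosketts.net
    # "permanent" is the keyword form of status 301
    Redirect permanent / http://blog.fosketts.net/
</VirtualHost>
```

The canonical VirtualHost for blog.fosketts.net stays exactly as it was; only the block that catches the incorrect names changes.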
I’d love to hear alternative methods of doing this. Please leave a comment if you have a suggestion that uses a 301 redirect and works across multiple domains!
Guest says
Why can’t you use the mod_rewrite engine in Apache to do this?
RewriteEngine On
RewriteCond %{HTTP_HOST} !^blog\.fosketts\.net$ [NC]
RewriteRule ^(.*)$ http://blog.fosketts.net/$1 [R=301,L]
(In English: If not blog.fosketts.net then send an HTTP 301 status, a permanent redirect, to the browser telling it to request it as such.)
sfoskett says
Yes, you can do this as well. In fact that's the official/correct method documented. But it only works with one hostname, sending all traffic there. If you've got multiple VirtualHosts it won't work. Unless you put it in the VirtualHost configuration file surrounded by <VirtualHost> blocks and then we're back where we started!
Or am I terribly mistaken? I guess one could also do this in a .htaccess file, but I hate them…
Guest says
Yeah it would have to be in <VirtualHost> blocks. I’ll admit it’s not as elegant as the Lighttpd solution you found if all you want to do is drop the www, but I’ve always loved the power available to you through an embedded rewriting engine. 🙂
Berto Martin says
why do you have Servername in the first virtual host block?
theMusician5044 says
That is awesome. I found that you can use any servername in the first virtual host directive and it still works properly. Thanks for doing the research.
Yannick says
Great trick.
Just adding this in case someone else would be in the same situation I was: if you use a reverse proxy and try to optimize the config by pre-translating the calls to the backend, it will cancel the effect of the redirect here. Make sure you don’t have any pre-transformation in your reverse proxy settings.
Bachsau says
ServerAlias accepts multiple arguments and wildcards. You could use something like “ServerAlias *.fosketts.* otherdomain.com”.
You could even use mod_rewrite in a default vHost with wildcard matching, which works exactly like in Lighttpd. I’m using this to add www. to any domain on my server with just one general rule. I could make it remove the www. in the same way.
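A minimal sketch of the first suggestion, applied to the article’s redirect block. The choice of wildcard is an assumption for illustration: “*.blog.fosketts.net” requires at least a dot before “blog”, so it catches www.blog.fosketts.net without also matching the canonical blog.fosketts.net.

```apache
# Sketch only: the redirect VirtualHost from the article, condensed.
<VirtualHost *:80>
    ServerName fosketts.net
    # One ServerAlias line can carry several names, and "*" acts as a wildcard.
    ServerAlias www.fosketts.net *.blog.fosketts.net
    RedirectMatch 301 (.*) http://blog.fosketts.net$1
</VirtualHost>
```

A broader pattern such as “*.fosketts.*” would also match blog.fosketts.net, so whether the canonical VirtualHost still receives its own requests would then depend on which block Apache matches first.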
choffee says
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
Should be a rule that you can get to drop the www. from all requests I think.
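One way such a rule might be wired up so it applies to every domain at once is a single catch-all VirtualHost. This is a sketch under assumptions: the ServerName is a made-up placeholder, mod_rewrite is enabled, and no other VirtualHost claims the “www” names. Inside a VirtualHost the path that RewriteRule matches starts with a slash, so the sketch appends %{REQUEST_URI} rather than “/$1” to avoid a doubled slash.

```apache
# Sketch only: strip a leading "www." from any hostname with one rule.
<VirtualHost *:80>
    # Placeholder name (hypothetical); the wildcard alias does the real matching.
    ServerName www-redirect.invalid
    ServerAlias www.*

    RewriteEngine On
    # %1 is the hostname with the leading "www." removed
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    # REQUEST_URI already begins with "/", so no extra slash is added
    RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
</VirtualHost>
```

Each real site then only needs a VirtualHost for its bare hostname, which is close in spirit to the Lighttpd rule in the article.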
ecreality says
Thanks!
Skippy says
Thanks for the tip. Note that if you use Internationalized Domain Names (IDN) you have to use the encoded domain name in the VirtualHost directives.
That is: ServerName xn--blah, ServerAlias www.xn--blah, etc. Otherwise it won’t work but drive you nuts for some time, trust me. 😉
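To illustrate, here is a sketch that assumes a hypothetical domain, bücher.example, whose ASCII-encoded (Punycode) form is xn--bcher-kva.example. The domain and DocumentRoot are placeholders, not anything from the article:

```apache
# Sketch only: Apache sees the encoded hostname, so that is what goes in the config.
<VirtualHost *:80>
    ServerName www.xn--bcher-kva.example
    RedirectMatch 301 (.*) http://xn--bcher-kva.example$1
</VirtualHost>

<VirtualHost *:80>
    ServerName xn--bcher-kva.example
    DocumentRoot /var/www/
</VirtualHost>
```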
jeremyclarke says
Thanks! This was a really useful article. Even though it seems like something that should be obvious it really isn’t.
I’ll add another note which is that you need to do it twice if you want to support both HTTP and HTTPS, though you should probably do HTTPS-only if you’re starting fresh.
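A rough sketch of what “doing it twice” might look like with an HTTPS-only canonical site. It assumes mod_ssl is enabled; the certificate paths are hypothetical placeholders, and the certificate would have to cover the alternate names as well for their HTTPS redirect to work without a browser warning.

```apache
# Port 80: every name, including the canonical one, is sent to HTTPS.
<VirtualHost *:80>
    ServerName blog.fosketts.net
    ServerAlias fosketts.net www.fosketts.net www.blog.fosketts.net
    Redirect permanent / https://blog.fosketts.net/
</VirtualHost>

# Port 443: the alternate names need their own redirect block.
<VirtualHost *:443>
    ServerName fosketts.net
    ServerAlias www.fosketts.net www.blog.fosketts.net
    SSLEngine on
    # Hypothetical paths
    SSLCertificateFile /etc/ssl/certs/fosketts.example.pem
    SSLCertificateKeyFile /etc/ssl/private/fosketts.example.key
    Redirect permanent / https://blog.fosketts.net/
</VirtualHost>

# Port 443: the single canonical site.
<VirtualHost *:443>
    ServerName blog.fosketts.net
    DocumentRoot /var/www/
    SSLEngine on
    # Hypothetical paths
    SSLCertificateFile /etc/ssl/certs/fosketts.example.pem
    SSLCertificateKeyFile /etc/ssl/private/fosketts.example.key
</VirtualHost>
```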
Bishop Clark says
Seems this is a job for https://httpd.apache.org/docs/2.4/mod/core.html#usecanonicalname . It’s in version 2.0 at least, so it should be available some time after 2002.