regular expressions in lighttpd host redirects

So, you have a rails app installed at, say, blog.site.com, and you want http://www.site.com and site.com to redirect to it. Here’s how to do it in lighttpd.

rewrite incoming urls

$HTTP["host"] =~ "^(www\.|)site\.com$" {
  url.redirect = (
    "^/(.*)" => "http://blog.site.com/$1",
    ""       => "http://blog.site.com/"
  )
}

This will forward visitors; their browsers will show blog.site.com. If you want Rails to listen on a group of virtual hosts (so that the hostname the user sees for the site is not changed), you can do a similar thing in the fastcgi.server section for that Rails instance, but leave out the rewrite rules.

listen on multiple hosts

$HTTP["host"] =~ "^(blog\.|www\.|)site\.com$" {
  # the rest of your site configuration here
}

Usually you would want the first, so that you can refer to a canonical hostname without confusing the user (is there one site or are there really two?). But there are some situations where you would have a number of virtually identical sites all run from the same Rails instance, and let Rails check ENV[‘HOST’] to make some minor customizations. A webmail provider who offers multiple hostnames would probably want this.

How does this work? Here’s a character-by-character explanation.

breakdown
$HTTP["host"]             # we will compare the regex to the
                          # requested hostname HTTP environment variable
=~                        # the regex match operator
"                         # the regex string must be quoted for lightty
^                         # match the beginning of the string
(                         # start a submatch
www\.                     # match "www."
|                         # logical OR
                          # match nothing at all (remember, the
                          # character to our left must be the beginning
                          # of the string, thus matching site.com but
                          # not mysite.com)
)                         # close the submatch
site\.com                 # match "site.com"
$                         # match the end of the string (for instance,
                          # "site.com.au" won't match, but would if you
                          # left out the $)
"                         # end the regex

If the above expression matches and returns true, then the block is executed:

cont.
 {                        # start block
  url.redirect = (        

The last line there asks to perform the comparison on the incoming url string, then assigns the result, below, to the redirect. This acts more or less as a function call, redirecting the client via a 301-moved (I think it’s 301) HTTP header. Anyway the important part is that it does it through official HTTP headers and not some wacky way or via an HTML META tag.

Now we begin what is effectively a SWITCH/CASE statement. If the string matches this on the left, return that on the right.

cont.
"^                        # start of string (the hostname is excluded;
                          # lightty treats it seperately from the URL)
/                         # match a regular forward slash
(.*)                      # match everything that comes after
" =>                      # return the following value
"http://blog.site.com/    # nothing unusual here
$1                        # what is this? 

Regarding the $1: notice that the “match everything” line above has parentheses around it. Another function of parens, borrowed from Perl, is to save the enclosed match to a variable. The first set goes into $1, the second into $2, etc. So what we end up doing is appending the /somewhere/on/your/site part of the requested URL to blog.site.com.

cont.
",
""                        # or match nothing (this only will be true
                          # when the site is accessed without a slash:
                          # "http://blog.site.com"
=> "http://blog.site.com/" # redirect to the home
  )
}

That’s all. Regular expressions are powerful.

3 responses

  1. You could simplify the first one to:

    $HTTP["host"] =~ "^(www\.|)site\.com$" {
      url.redirect = (
        "^(.*)" => "http://blog.site.com$1"
      )
    }