The blog of makandra, a Ruby on Rails development team

How to hide your beta site from Google and not shoot yourself in the foot

When we work on a new application for a client, we give them a beta site where they can follow the project's progress. This beta site should remain hidden from crawlers, so it doesn't accidentally appear in Google before the application has launched.

The standard way to dismiss crawlers is a robots.txt file. The problem with robots.txt is that it is easy to forget to remove it when the site launches. Since the beta and production sites are often deployed from the same repository, this can easily happen amid the excitement of a product launch. Now you have screwed up big time: the production site will remain hidden from Google until someone notices the missing traffic, and then it can take weeks to get back into Google's index.

Here is a suggestion for how to fail less. Instead of adding a robots.txt to your document root, name the file robots.beta.txt. To dismiss all crawlers, it should look like this:

User-Agent: *
Disallow: /
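Since the whole point is to never ship a blocking robots.txt to production, it can help to check what a host actually serves around launch time. Below is a minimal Ruby sketch, assuming you already have the robots.txt contents as a string. `disallows_all?` is a hypothetical helper, not part of any library, and it deliberately ignores Allow lines and wildcard path patterns.

```ruby
# Returns true if the robots.txt body tells every crawler (User-agent: *)
# to stay away from the whole site (Disallow: /).
# Simplified sketch: Allow lines and wildcard patterns are ignored.
def disallows_all?(robots_txt)
  applies_to_all = false  # does the current group include "User-agent: *"?
  in_agent_run = false    # are we inside consecutive User-agent lines?

  robots_txt.each_line do |raw|
    line = raw.sub(/#.*/, "").strip # drop comments and surrounding whitespace
    next if line.empty?

    field, value = line.split(":", 2).map(&:strip)
    next if value.nil? # not a "field: value" line

    case field.downcase
    when "user-agent"
      # Consecutive User-agent lines share one group; a new run starts a new group.
      applies_to_all = false unless in_agent_run
      applies_to_all ||= (value == "*")
      in_agent_run = true
    when "disallow"
      in_agent_run = false
      return true if applies_to_all && value == "/"
    else
      in_agent_run = false
    end
  end

  false
end
```

The beta file above yields `true`, while an empty `Disallow:` line or a rule aimed only at a specific crawler yields `false`.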

We can now tell our web server to serve robots.beta.txt whenever robots.txt is requested, but only for requests to the beta site. The production site should rightfully respond with 404 Not Found when asked for robots.txt. To achieve this using Apache, add the following lines to the virtual host of the beta site, with the RewriteCond pattern matching the beta site's host name:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^$
RewriteRule ^/robots\.txt$ /robots.beta.txt [L]
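If your app isn't served through Apache, the same idea could be sketched as Rack middleware in front of a Rails app. This is a swapped-in technique, not what the post describes: `BetaRobots`, `beta_host:` and `beta_robots:` are hypothetical names, and for brevity the middleware receives the file's contents as a string (in a Rails app you might pass `File.read("public/robots.beta.txt")`).

```ruby
# Minimal Rack middleware sketch (hypothetical, not from the post):
# answer /robots.txt with the beta file's contents, but only for the beta host.
class BetaRobots
  def initialize(app, beta_host:, beta_robots:)
    @app = app
    @beta_host = beta_host      # host name of the beta site
    @beta_robots = beta_robots  # contents of robots.beta.txt
  end

  def call(env)
    # Host header without an optional ":port" suffix (IPv6 literals not handled)
    host = env["HTTP_HOST"].to_s.split(":").first

    if env["PATH_INFO"] == "/robots.txt" && host == @beta_host
      [200, { "content-type" => "text/plain" }, [@beta_robots]]
    else
      # The production host and all other paths reach the app unchanged,
      # so production returns whatever the app normally does for
      # /robots.txt (a 404 if no such file or route exists).
      @app.call(env)
    end
  end
end
```

As with the Apache rule, nothing has to be removed at launch: the production host never matches, so it never sees the beta file.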

Now you won't have to remember to remove the robots.txt file when the product launches.


Our address:
makandra GmbH
Werner-von-Siemens-Str. 6
86159 Augsburg
Contact us:
+49 821 58866 180
Commercial register court:
Augsburg Municipal Court
Register number:
HRB 24202
Sales tax identification number:
Chief executive officers:
Henning Koch
Thomas Eisenbarth