Suffix

Published by Simon Schoeters

Sitemaps in Ruby on Rails

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
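To make the format concrete, here is a minimal sketch in plain Ruby (no Rails involved) that prints a one-entry Sitemap. The `xml_escape` and `build_sitemap` helpers and the example.com URL are illustrative assumptions, not part of any library.

```ruby
# Characters that must be escaped inside XML text nodes.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper: escapes a value for use as XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: turns an array of hashes into a Sitemap XML string.
# Only :loc is required; the other keys are emitted when present.
def build_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << "  <url>"
    [:loc, :lastmod, :changefreq, :priority].each do |key|
      next unless entry[key]
      lines << "    <#{key}>#{xml_escape(entry[key])}</#{key}>"
    end
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n")
end

puts build_sitemap([
  { loc: "http://www.example.com/about/", lastmod: "2008-01-15",
    changefreq: "weekly", priority: "0.7" }
])
```

Escaping matters mostly for ‘loc’: a URL with a query string usually contains an ampersand, which has to appear as `&amp;` in the XML.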

Sitemaps are widely supported, including by Google, Yahoo! and Microsoft, so I thought it would be a good idea to add one to my website. A few posts later I was tired of updating the XML file by hand every time I changed something in the URL scheme, so why not pass the task to Ruby on Rails and build the file automatically?

The controller

First you'll need an action to collect the data. I chose the application controller because the Sitemap doesn't really belong anywhere else. A pages controller would be a better choice if you have one that manages all your site's URLs, but that's entirely up to you.

class ApplicationController < ActionController::Base
  # Collects every page and renders the Sitemap template without a layout.
  def sitemap
    @pages = Page.find(:all)
    render_without_layout :template => "layouts/sitemap"
  end
end

The ‘render_without_layout’ call renders the view without wrapping it in a layout. My view is in the views/layouts folder, but again, you can put it anywhere you want.

The view

As defined in the sitemap method above, we need a view that renders the data as XML. Create the Sitemap XML template in the folder you chose above (views/layouts in my case) and call it ‘sitemap.rxml’. Now build the structure of the Sitemap:

xml.instruct!
xml.urlset('xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
           'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
           'xsi:schemaLocation' => 'http://www.sitemaps.org/schemas/sitemap/0.9 ' \
                                   'http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd') do
  @pages.each do |page|
    xml.url do
      xml.loc("http://" + request.env["HTTP_HOST"] + "/" + page.permalink + "/")
      xml.lastmod(page.updated_at.strftime('%Y-%m-%d'))
      xml.changefreq("weekly")
      xml.priority("0.7")
    end
  end
end

This snippet assumes your page object has permalink and updated_at attributes; change these if your model looks different.
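If your model uses other attribute names, it can help to isolate the two conversions the template performs. The following is only a sketch: `Page` is a Struct standing in for a real model, and `sitemap_loc` and `sitemap_lastmod` are hypothetical helper names.

```ruby
# Stand-in for an ActiveRecord model; a real Page would come from your app.
Page = Struct.new(:permalink, :updated_at)

# Hypothetical helper: joins host and permalink into an absolute URL,
# tolerating leading or trailing slashes on the permalink.
def sitemap_loc(host, permalink)
  "http://#{host}/#{permalink.to_s.gsub(%r{\A/+|/+\z}, "")}/"
end

# Hypothetical helper: formats a Time as the date-only W3C Datetime form.
def sitemap_lastmod(time)
  time.strftime("%Y-%m-%d")
end

page = Page.new("about", Time.utc(2008, 1, 15, 9, 30))
puts sitemap_loc("www.example.com", page.permalink)  # http://www.example.com/about/
puts sitemap_lastmod(page.updated_at)                # 2008-01-15
```

With helpers like these, a model that exposes, say, a slug instead of a permalink only changes the arguments you pass in, not the template logic.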

There are a few things you need to know about Sitemaps: ‘loc’ is the only required element, so you can drop the ‘lastmod’, ‘changefreq’ and ‘priority’ elements if you don't have any useful data for them. In more detail:

  • loc - required: the full URL of the page, including your domain.
  • lastmod - optional: last modification date for that page in the W3C Datetime format, for example YYYY-MM-DD.
  • changefreq - optional: how frequently the page is likely to change. Valid values are: always, hourly, daily, weekly, monthly, yearly or never.
  • priority - optional: the priority of this URL relative to other URLs in your site. Valid values range from 0.0 to 1.0.

See the official Sitemap protocol definition site for a full description.
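The rules above can be turned into a small sanity check before the values reach the template. This is a sketch under assumptions: `sitemap_entry_errors` and `VALID_CHANGEFREQS` are hypothetical names, and the ‘lastmod’ check only accepts the date-only YYYY-MM-DD form, which is stricter than W3C Datetime itself.

```ruby
VALID_CHANGEFREQS = %w[always hourly daily weekly monthly yearly never].freeze

# Hypothetical helper: returns a list of problems with one Sitemap entry,
# given as a hash with :loc, :lastmod, :changefreq and :priority keys.
def sitemap_entry_errors(entry)
  errors = []
  errors << "loc is required" if entry[:loc].to_s.empty?

  # Assumption: only the date-only form is accepted in this sketch.
  if entry[:lastmod] && entry[:lastmod].to_s !~ /\A\d{4}-\d{2}-\d{2}\z/
    errors << "lastmod should look like YYYY-MM-DD"
  end

  if entry[:changefreq] && !VALID_CHANGEFREQS.include?(entry[:changefreq].to_s)
    errors << "changefreq is not a valid value"
  end

  if entry[:priority]
    # Float(..., exception: false) returns nil for non-numeric input.
    priority = Float(entry[:priority], exception: false)
    unless priority && (0.0..1.0).cover?(priority)
      errors << "priority must be between 0.0 and 1.0"
    end
  end

  errors
end

good = { loc: "http://www.example.com/", changefreq: "weekly", priority: "0.7" }
bad  = { changefreq: "sometimes", priority: "1.5" }
p sitemap_entry_errors(good)
p sitemap_entry_errors(bad)
```

The first call returns an empty list; the second reports the missing ‘loc’, the unknown ‘changefreq’ and the out-of-range ‘priority’.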

The route

You have an automatically generated Sitemap but no way to get there. Tell Rails how to reach your Sitemap by adding the following mapping to the routes.rb file (change the controller if you chose a different one above):

map.connect 'sitemap.xml', :controller => 'application', :action => 'sitemap'

Request your new Sitemap with http://www.example.com/sitemap.xml. You may need to restart your Rails server to enable the new route.

The robots.txt

Almost done. The Sitemap should be working by now, but how does a crawler (like Googlebot) know where to look for your Sitemap? That's what the robots.txt file is for. Every crawler should request the robots.txt file first to see what it may or may not index, so this is the ideal place to advertise our Sitemap. Add the following line to the robots.txt file in your public directory and make sure to use the full URL to your Sitemap (including the domain):

Sitemap: http://www.example.com/sitemap.xml
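If you would rather script this step, for example as part of a deploy, something like the following could work. It is only a sketch: `advertise_sitemap` is a hypothetical helper, and the path and URL you pass in depend on your setup.

```ruby
# Hypothetical helper: appends a Sitemap line to robots.txt unless an
# identical line is already there. Returns true when the file was changed.
def advertise_sitemap(robots_path, sitemap_url)
  line = "Sitemap: #{sitemap_url}"
  existing = File.exist?(robots_path) ? File.read(robots_path) : ""
  return false if existing.each_line.any? { |l| l.strip == line }

  File.open(robots_path, "a") do |file|
    # Start on a fresh line if the file does not end with a newline.
    file.puts unless existing.empty? || existing.end_with?("\n")
    file.puts line
  end
  true
end
```

Calling `advertise_sitemap("public/robots.txt", "http://www.example.com/sitemap.xml")` from the application root adds the line once; running it again leaves the file untouched.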

That's it for today! The next time a crawler visits your site it will find the Sitemap and index it.
