= BigSitemap
== DESCRIPTION
BigSitemap is a Sitemap (http://sitemaps.org) generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
== INSTALL
Via git:
  git clone git://github.com/alexrabarts/big_sitemap.git
Via gem:
  gem install alexrabarts-big_sitemap -s http://gems.github.com
== SYNOPSIS
The minimum required to generate a sitemap is:
  BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. You can add more models with further calls to the <tt>add</tt> method. The methods are chainable, although you can also call them one at a time on a variable holding the instance if you prefer:
  sitemap = BigSitemap.new(:base_url => 'http://example.com')
  sitemap.add(:model => Posts, :path => 'articles')
  sitemap.add(:model => Comments, :path => 'comments')
  sitemap.generate
=== Find Methods
Your models must provide either a <tt>find_for_sitemap</tt> or <tt>all</tt> class method that returns the instances to be included in the sitemap.
Additionally, your models must provide a <tt>count_for_sitemap</tt> or <tt>count</tt> class method that returns a count of the instances to be included.
If you're using ActiveRecord (Rails) or DataMapper then <tt>all</tt> and <tt>count</tt> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <tt>find_for_sitemap</tt> or <tt>all</tt> method then it should be able to handle the <tt>:offset</tt> and <tt>:limit</tt> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
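As a rough sketch of the contract described above, a plain Ruby class without any ORM could expose the two class methods like this. The <tt>Page</tt> class, its in-memory <tt>RECORDS</tt> array and the <tt>sitemap_scope</tt> helper are hypothetical stand-ins for a real data store, and the sketch assumes the options arrive as a single hash, as with ActiveRecord-style finders. Only the method names and the <tt>:offset</tt>/<tt>:limit</tt> options come from the text above.

```ruby
# Hypothetical model backed by an in-memory array instead of a database.
class Page
  attr_reader :id, :published

  def initialize(id, published)
    @id = id
    @published = published
  end

  # Stand-in data store: ten pages, of which the even ids are published.
  RECORDS = (1..10).map { |i| new(i, i.even?) }.freeze

  # Hypothetical helper: the subset of records that belongs in the sitemap.
  def self.sitemap_scope
    RECORDS.select(&:published)
  end

  # Returns the instances to include, honouring :offset and :limit
  # so that the caller can page through the records in batches.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options[:limit]
    rows   = sitemap_scope.drop(offset)
    limit ? rows.first(limit) : rows
  end

  # Must count the same subset that find_for_sitemap returns.
  def self.count_for_sitemap
    sitemap_scope.size
  end
end
```

The important design point is that both methods work from the same subset, so the count used to plan the batches matches what the finder actually returns.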
=== URL Format
To generate the URLs, BigSitemap combines <tt>:base_url</tt> and <tt>:path</tt> with the result of calling <tt>to_param</tt> on each instance returned (this method is provided by ActiveRecord but not DataMapper). If the method is not present, <tt>id</tt> is used instead. The URL is constructed as:
  :base_url/:path/:to_param (if to_param exists)
  :base_url/:path/:id (if to_param does not exist)
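The rule above can be illustrated with a small standalone helper. This is only a sketch of the described behaviour, not BigSitemap's own code: <tt>sitemap_url</tt>, <tt>WithParam</tt> and <tt>PlainRecord</tt> are hypothetical names introduced for the example.

```ruby
# Hypothetical helper mirroring the URL rule described above.
def sitemap_url(base_url, path, record)
  # Prefer to_param when the record defines it, otherwise fall back to id.
  slug = record.respond_to?(:to_param) ? record.to_param : record.id
  "#{base_url}/#{path}/#{slug}"
end

# A record that defines to_param (as ActiveRecord models do).
WithParam = Struct.new(:id, :title) do
  def to_param
    "#{id}-#{title}"
  end
end

# A record with only an id (as with a plain DataMapper resource).
PlainRecord = Struct.new(:id)

puts sitemap_url('http://example.com', 'articles', WithParam.new(7, 'hello'))
puts sitemap_url('http://example.com', 'comments', PlainRecord.new(42))
```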
=== Sitemap Location
BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <tt>:document_root</tt> option, e.g.:
  BigSitemap.new(:base_url => 'http://example.com', :document_root => "#{FOO_ROOT}/httpdocs")
By default, the sitemap files are created under <tt>/sitemaps</tt>. You can modify this with the <tt>:path</tt> option:
  BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
=== Cleaning the Sitemaps Directory
Calling the <tt>clean</tt> method will remove all files from the Sitemaps directory.
=== Maximum Number of URLs
Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <tt>:max_per_sitemap</tt> option:
  BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
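To get a feel for how many files a given limit produces, the arithmetic is a ceiling division of the record count by the per-file limit. The helper below is a hypothetical illustration of that calculation only; it does not account for any additional files the library may write.

```ruby
# Hypothetical helper: number of Sitemap files needed for total_urls records.
def sitemap_file_count(total_urls, max_per_sitemap = 50_000)
  # Integer ceiling division: round up whenever there is a remainder.
  (total_urls + max_per_sitemap - 1) / max_per_sitemap
end

puts sitemap_file_count(120_000)      # default limit of 50,000 per file
puts sitemap_file_count(2_500, 1_000) # custom limit of 1,000 per file
```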
=== Batched Database Queries
The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper). You can customize the batch size with the <tt>:batch_size</tt> option:
  BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000
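The general idea of batching with <tt>:offset</tt> and <tt>:limit</tt> can be sketched as follows. This is an assumption about the overall shape of the technique, not the library's actual implementation, and <tt>each_batch</tt> is a hypothetical name. Each yielded pair would be passed on to the model's finder as its <tt>:offset</tt> and <tt>:limit</tt> options.

```ruby
# Hypothetical sketch: walk through `total` records in windows of at most
# `batch_size`, yielding the offset and limit for each query.
def each_batch(total, batch_size = 1001)
  return enum_for(:each_batch, total, batch_size) unless block_given?

  offset = 0
  while offset < total
    # The last window is trimmed so it does not run past the counted total.
    limit = [batch_size, total - offset].min
    yield offset, limit
    offset += limit
  end
end

each_batch(2_500) do |offset, limit|
  puts "query with :offset => #{offset}, :limit => #{limit}"
end
```

Because the total is counted once up front, the final window is only as accurate as that initial count.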
=== Search Engine Notification
Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:
  BigSitemap.new(
    :base_url => 'http://example.com',
    :ping_google => false,
    :ping_yahoo => false,
    :ping_msn => false,
    :ping_ask => false
  )
You must provide an App ID in order to ping Yahoo! (more info at http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html):
  BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
== LIMITATIONS
If your database is likely to shrink during the time it takes to create the sitemap then you might run into problems (the final, batched SQL select will overrun by setting a limit that is too large since it is calculated from the count, which is queried at the very beginning). Patches welcome!
== TODO
* Support for <tt>priority</tt>
* Support for <tt>changefreq</tt> (currently hard-coded to <tt>weekly</tt>)
== CREDITS
Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
== COPYRIGHT
Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.