README.rdoc in alexrabarts-big_sitemap-0.2.1 vs README.rdoc in alexrabarts-big_sitemap-0.3.0
- old
+ new
@@ -1,104 +1,110 @@
= BigSitemap
-== DESCRIPTION
+BigSitemap is a Sitemap (http://sitemaps.org) generator suitable for applications with more than 50,000 URLs. It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries to minimize memory usage, can be set up with just a few lines of code, and is compatible with just about any framework.
-BigSitemap is a Sitemap (http://sitemaps.org) generator specifically designed for large sites (although it works equally well with small sites). It splits large Sitemaps into multiple files, gzips the files to minimize bandwidth usage, batches database queries so it doesn't take your site down, can be set up with just a few lines of code and is compatible with just about any framework.
+BigSitemap is best run periodically through a Rake/Thor task.
-== INSTALL
+ sitemap = BigSitemap.new(:url_options => {:host => 'example.com'})
-Via git:
+ # Add a model
+ sitemap.add Product
- git clone git://github.com/alexrabarts/big_sitemap.git
+ # Add another model with some options
+ sitemap.add(Post, {
+ :conditions => {:published => true},
+ :path => 'articles',
+ :change_frequency => 'daily',
+ :priority => 0.5
+ })
-Via gem:
+ # Generate the files
+ sitemap.generate
- gem install alexrabarts-big_sitemap -s http://gems.github.com
+The code above will create a minimum of three files:
-== SYNOPSIS
+1. public/sitemaps/sitemap_index.xml.gz
+2. public/sitemaps/sitemap_products.xml.gz
+3. public/sitemaps/sitemap_posts.xml.gz
-The minimum required to generate a sitemap is:
+If your sitemaps grow beyond 50,000 URLs (this limit can be overridden with the <code>:max_per_sitemap</code> option), the sitemap files will be partitioned into multiple files (<code>sitemap_products_1.xml.gz</code>, <code>sitemap_products_2.xml.gz</code>, …).
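As a rough illustration of the partitioning described above, the sketch below lists the file names one would expect for a given record count. The helper <code>sitemap_filenames</code> is hypothetical and not part of BigSitemap; the naming pattern is taken from the examples above, and numbering from 1 is an assumption.

```ruby
# Hypothetical helper, not part of BigSitemap: lists the file names one
# would expect for a model, given a record count and a per-file limit.
def sitemap_filenames(table_name, count, max_per_sitemap = 50_000)
  base = "sitemap_#{table_name}"
  # Everything fits in one file: no numeric suffix.
  return ["#{base}.xml.gz"] if count <= max_per_sitemap

  # Ceiling division gives the number of partitions needed.
  parts = (count + max_per_sitemap - 1) / max_per_sitemap
  (1..parts).map { |n| "#{base}_#{n}.xml.gz" }
end

puts sitemap_filenames('products', 120_000)
```

With 120,000 products and the default limit, this yields three numbered files; at or below the limit it yields the single unnumbered file.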
- BigSitemap.new(:base_url => 'http://example.com').add(:model => MyModel, :path => 'my_controller').generate
+If you're using Rails then the URLs for each database record are generated with the <code>polymorphic_url</code> helper. That means that the URL for a record will be exactly what you would expect: generated with respect to the routing setup of your app. In other contexts where this helper isn't available, the URLs are generated in the form:
-You can put this in a rake/thor task and create a cron job to run it periodically. It should be enough for most Rails/Merb applications. You can add more models by further calls to the <code>add</code> method. Note that the methods are chainable, although you can call them on an instance variable if you prefer:
+ :base_url/:path/:to_param
- sitemap = BigSitemap.new(:base_url => 'http://example.com')
- sitemap.add(:model => Posts, :path => 'articles')
- sitemap.add(:model => Comments, :path => 'comments')
- sitemap.generate
+If the <code>to_param</code> method does not exist, then <code>id</code> will be used.
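The fallback rule can be sketched in plain Ruby as follows. This is not BigSitemap's own code: <code>record_url</code>, <code>SluggedPost</code> and <code>PlainComment</code> are hypothetical names used only to show the <code>:base_url/:path/:to_param</code> form with the <code>id</code> fallback.

```ruby
# Hypothetical helper, not BigSitemap's own code: builds a record URL in the
# :base_url/:path/:to_param form, falling back to id when to_param is missing.
def record_url(base_url, path, record)
  param = record.respond_to?(:to_param) ? record.to_param : record.id
  # Strip surrounding slashes so the parts join with exactly one "/".
  [base_url.chomp('/'), path.gsub(%r{\A/+|/+\z}, ''), param.to_s].join('/')
end

# A record that defines to_param (as ActiveRecord models do).
SluggedPost = Struct.new(:id, :slug) do
  def to_param
    "#{id}-#{slug}"
  end
end

# A record without to_param, so its id is used instead.
PlainComment = Struct.new(:id)

puts record_url('http://example.com/', 'articles', SluggedPost.new(7, 'hello'))
puts record_url('http://example.com', '/comments/', PlainComment.new(42))
```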
-=== Find Methods
+== Install
-Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
+Via gem:
-Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
+ gem install alexrabarts-big_sitemap -s http://gems.github.com
-If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
+== Advanced
-=== URL Format
+=== Options
-To generate the URLs, BigSitemap will combine the constructor arguments with the <code>to_param</code> method of each instance returned (provided by ActiveRecord but not DataMapper). If this method is not present, <code>id</code> will be used. The URL is constructed as:
+ * <code>:url_options</code> -- hash with <code>:host</code>, optionally <code>:port</code> and <code>:protocol</code>
+ * <code>:base_url</code> -- string alternative to <code>:url_options</code>, e.g. "https://example.com:8080/"
+ * <code>:document_root</code> -- string, defaults to <code>Rails.root</code> or <code>Merb.root</code> if available
+ * <code>:path</code> -- string, defaults to 'sitemaps', which places sitemap files under the <code>/sitemaps</code> directory
+ * <code>:max_per_sitemap</code> -- defaults to <code>50000</code>, the maximum number of URLs the Sitemap protocol allows per file; can be set lower
+ * <code>:batch_size</code> -- <code>1001</code> (not <code>1000</code> due to a bug in DataMapper)
+ * <code>:gzip</code> -- <code>true</code>
+ * <code>:ping_google</code> -- <code>true</code>
+ * <code>:ping_yahoo</code> -- <code>false</code>, needs <code>:yahoo_app_id</code>
+ * <code>:ping_msn</code> -- <code>false</code>
+ * <code>:ping_ask</code> -- <code>false</code>
- :base_url/:path/:to_param (if to_param exists)
- :base_url/:path/:id (if to_param does not exist)
+=== Chaining
-=== Sitemap Location
+You can chain methods together. You could even get away with as little code as:
-BigSitemap knows about the document root of Rails and Merb. If you are using another framework then you can specify the document root with the <code>:document_root</code> option. e.g.:
+ BigSitemap.new(:url_options => {:host => 'example.com'}).add(Post).generate
- BigSitemap.new(:base_url => 'http://example.com', :document_root => "#{FOO_ROOT}/httpdocs")
+=== Pinging Search Engines
-By default, the sitemap files are created under <code>/sitemaps</code>. You can modify this with the <code>:path</code> option:
+To ping search engines, call <code>ping_search_engines</code> after you generate the sitemap:
- BigSitemap.new(:base_url => 'http://example.com', :path => 'google-sitemaps') # places Sitemaps under /google-sitemaps
+ sitemap.generate
+ sitemap.ping_search_engines
-=== Cleaning the Sitemaps Directory
+=== Change Frequency and Priority
-Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
+You can control "changefreq" and "priority" values for each record individually by passing lambdas instead of fixed values:
-=== Maximum Number of URLs
+ sitemap.add(Post,
+ :change_frequency => lambda {|post| ... },
+ :priority => lambda {|post| ... }
+ )
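For illustration, here is one way such lambdas might look, together with a small helper that treats fixed values and lambdas alike. Everything in this sketch is an assumption made for the example: the <code>Article</code> struct, its <code>updated_at</code> and <code>featured</code> attributes, and the <code>resolve_option</code> helper are hypothetical and not part of BigSitemap.

```ruby
# Hypothetical stand-in for a model instance.
Article = Struct.new(:updated_at, :featured)

ONE_WEEK = 7 * 24 * 3600 # seconds

# Recently updated articles are advertised as changing daily, others monthly.
CHANGE_FREQUENCY = lambda do |article|
  (Time.now - article.updated_at) < ONE_WEEK ? 'daily' : 'monthly'
end

# Featured articles get a higher priority (valid range is 0.0 to 1.0).
PRIORITY = lambda { |article| article.featured ? 0.9 : 0.5 }

# Hypothetical helper: calls the option if it is callable, otherwise
# returns it unchanged, so fixed values and lambdas can be mixed freely.
def resolve_option(option, record)
  option.respond_to?(:call) ? option.call(record) : option
end

fresh = Article.new(Time.now, true)
puts resolve_option(CHANGE_FREQUENCY, fresh)
puts resolve_option(PRIORITY, fresh)
```

Lambdas like these would be passed as the <code>:change_frequency</code> and <code>:priority</code> options in place of fixed values.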
-Sitemaps will be split across several files if more than 50,000 records are returned. You can customize this limit with the <code>:max_per_sitemap</code> option:
+=== Find Methods
- BigSitemap.new(:base_url => 'http://example.com', :max_per_sitemap => 1000) # Max of 1000 URLs per Sitemap
+Your models must provide either a <code>find_for_sitemap</code> or <code>all</code> class method that returns the instances that are to be included in the sitemap.
-=== Batched Database Queries
+Additionally, your models must provide a <code>count_for_sitemap</code> or <code>count</code> class method that returns a count of the instances to be included.
-The database is queried in batches to prevent large SQL select statements from locking the database for too long. By default, the batch size is 1001 (not 1000 due to an obscure bug in DataMapper). You can customize the batch size with the <code>:batch_size</code> option:
+If you're using ActiveRecord (Rails) or DataMapper then <code>all</code> and <code>count</code> are already provided and you don't need to do anything unless you want to include a subset of records. If you provide your own <code>find_for_sitemap</code> or <code>all</code> method then it should be able to handle the <code>:offset</code> and <code>:limit</code> options, in the same way that ActiveRecord and DataMapper handle them. This is especially important if you have more than 50,000 URLs.
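For a model that is not backed by ActiveRecord or DataMapper, the two class methods might look something like the sketch below. The <code>StaticPage</code> class and its in-memory storage are hypothetical, and it assumes the options arrive as a single hash containing <code>:offset</code> and <code>:limit</code>, as they would with ActiveRecord-style finders.

```ruby
# Hypothetical plain-Ruby model backed by an in-memory array.
class StaticPage
  attr_reader :id, :published

  RECORDS = []

  def initialize(id, published)
    @id = id
    @published = published
  end

  # Only published pages belong in the sitemap.
  def self.sitemap_scope
    RECORDS.select(&:published)
  end

  # Honors :offset and :limit so records can be fetched in batches.
  def self.find_for_sitemap(options = {})
    offset = options.fetch(:offset, 0)
    limit  = options[:limit]
    rows   = sitemap_scope.drop(offset)
    limit ? rows.first(limit) : rows
  end

  # Must count the same set of records that find_for_sitemap returns.
  def self.count_for_sitemap
    sitemap_scope.size
  end
end

# Ten pages; only those with odd ids are published.
(1..10).each { |i| StaticPage::RECORDS << StaticPage.new(i, i.odd?) }

batch = StaticPage.find_for_sitemap(:offset => 2, :limit => 2)
puts batch.map(&:id).inspect
puts StaticPage.count_for_sitemap
```

Keeping both methods on top of one shared scope is a design choice that keeps the count and the fetched records in agreement.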
- BigSitemap.new(:base_url => 'http://example.com', :batch_size => 5000) # Database is queried in batches of 5,000
+=== Cleaning the Sitemaps Directory
-=== Search Engine Notification
+Calling the <code>clean</code> method will remove all files from the Sitemaps directory.
-Google, Yahoo!, MSN and Ask are pinged once the Sitemap files are generated. You can turn one or more of these off:
+== Limitations
- BigSitemap.new(
- :base_url => 'http://example.com',
- :ping_google => false,
- :ping_yahoo => false,
- :ping_msn => false,
- :ping_ask => false
- )
-
-You must provide an App ID in order to ping Yahoo! (more info at http://developer.yahoo.com/search/siteexplorer/V1/updateNotification.html):
-
- BigSitemap.new(:base_url => 'http://example.com', :yahoo_app_id => 'myYahooAppId') # Yahoo! will now be pinged
-
-== LIMITATIONS
-
If your database is likely to shrink while the sitemap is being generated then you might run into problems: the record count is queried once at the very beginning, so the final batched SQL select can be given a limit that is too large. Patches welcome!
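To make the limitation concrete, the sketch below shows how batch windows could be derived from a count taken up front. It is a reading of the description above, not the gem's actual implementation, and <code>batch_windows</code> is a hypothetical helper.

```ruby
# Hypothetical helper, not BigSitemap's code: derives the :offset/:limit
# pairs for each batched query from a record count taken once, up front.
def batch_windows(count, batch_size)
  (0...count).step(batch_size).map do |offset|
    # The last window is trimmed to the rows the initial count says remain.
    { :offset => offset, :limit => [batch_size, count - offset].min }
  end
end

batch_windows(2_500, 1_001).each { |window| puts window.inspect }
```

Because every window is fixed before the first query runs, rows deleted in the meantime leave the later windows asking for more records than still exist.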
== TODO
-* Support for <code>priority</code>
-* Support for <code>changefreq</code> (currently hard-coded to <code>weekly</code>)
+Tests for Rails components.
-== CREDITS
+== Credits
Thanks to Alastair Brunton and Harry Love, whose work provided a starting point for this library.
http://scoop.cheerfactory.co.uk/2008/02/26/google-sitemap-generator/
-== COPYRIGHT
+Thanks to Mislav Marohnić for contributing patches.
-Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.
+== Copyright
+Copyright (c) 2009 Stateless Systems (http://statelesssystems.com). See LICENSE for details.