### 0.4.0 / 2011-08-07 * Added {Spidr::Headers#content_charset}. * Pass the Page `url` and `content_charset` to Nokogiri in {Spidr::Body#doc}. This ensures that Nokogiri will preserve the body encoding. * Made {Spidr::Headers#is_content_type?} public. * Allow {Spidr::Headers#is_content_type?} to match the full Content-Type or the sub-type. ### 0.3.2 / 2011-06-20 * Added separate intitialize methods for {Spidr::Actions}, {Spidr::Events}, {Spidr::Filters} and {Spidr::Sanitizers}. * Aliased {Spidr::Events#urls_like} to {Spidr::Events#every_url_like}. * Reduce usage of `self.included` and `module_eval`. * Reduce usage of nested-blocks. * Reduce usage of `return`. ### 0.3.1 / 2011-04-22 * Require `set` in `spidr/headers.rb`. ### 0.3.0 / 2011-04-14 * Switched from Jeweler to [Ore](http://github.com/ruby-ore/ore). * Split all header related methods out of {Spidr::Page} and into {Spidr::Headers}. * Split all body related methods out of {Spidr::Page} and into {Spidr::Body}. * Split all link related methods out of {Spidr::Page} and into {Spidr::Links}. * Added {Spidr::Headers#directory?}. * Added {Spidr::Headers#json?}. * Added {Spidr::Links#each_url}. * Added {Spidr::Links#each_link}. * Added {Spidr::Links#each_redirect}. * Added {Spidr::Links#each_meta_redirect}. * Aliased {Spidr::Headers#raw_cookie} to {Spidr::Headers#cookie}. * Aliased {Spidr::Body#to_s} to {Spidr::Body#body}. * Also check for `application/xml` in {Spidr::Headers#xml?}. * Catch all exceptions when merging URIs in {Spidr::Links#to_absolute}. * Always prepend a `/` to all FTP URI paths. Fixes a Ruby 1.8 specific bug, where it expects an absolute path for all FTP URIs. * Refactored {URI.expand_path}. * Start the session in {Spidr::SessionCache#[]} to prevent multiple `CONNECT` commands being sent to HTTP Proxies (thanks falaise). ### 0.2.7 / 2010-08-17 * Added {Spidr::CookieJar#cookies_for_host} (thanks zapnap). * Renamed `Spidr::Page#cookie` to `Spidr::Page#raw_cookie`. * Rescue `URI::InvalidComponentError` exceptions in `Spidr::Page#to_absolute` (thanks zapnap). ### 0.2.6 / 2010-07-05 * Fixed a bug in `Spidr::Page#meta_redirect`, by calling `Nokogiri::XML::Element#get_attribute` instead of `attr`. ### 0.2.5 / 2010-07-02 * Added `Spidr::Page#meta_redirect`. * Added `Spidr::Page#meta_redirect?`. * Manage development dependencies with Bundler. * Support following "old-school" meta-refresh redirects (thanks zapnap). * Allow {Spidr::CookieJar} inherit cookies set by a parent domain. * Fixed a constant lookup issue in {Spidr::Agent}. * Use `yield` instead of `block.call` when necessary. ### 0.2.4 / 2010-05-05 * Added {Spidr::Filters#visit_urls}. * Added {Spidr::Filters#visit_urls_like}. * Added {Spidr::Filters#ignore_urls}. * Added {Spidr::Filters#ignore_urls_like}. * Added `Spidr::Page#is_content_type?`. * Default `Spidr::Page#body` to an empty String. * Default `Spidr::Page#content_type` to an empty String. * Default `Spidr::Page#content_types` to an empty Array. * Improved reliability of {Spidr::Page#is_redirect?}. * Improved content type detection in {Spidr::Page} to handle `Content-Type` headers containing charsets (thanks Josh Lindsey). ### 0.2.3 / 2010-02-27 * Migrated to Jeweler, for the packaging and releasing RubyGems. * Switched to MarkDown formatted YARD documentation. * Added {Spidr::Events#every_link}. * Added {Spidr::SessionCache#active?}. * Added specs for {Spidr::SessionCache}. ### 0.2.2 / 2010-01-06 * Require Web Spider Obstacle Course (WSOC) >= 0.1.1. * Integrated the new WSOC into the specs. * Removed the built-in Web Spider Obstacle Course. * Added `Spidr::Page#content_types`. * Added `Spidr::Page#cookie`. * Added `Spidr::Page#cookies`. * Added `Spidr::Page#cookie_params`. * Added {Spidr::Sanitizers}. * Added {Spidr::SessionCache}. * Added {Spidr::CookieJar} (thanks Nick Plante). * Added {Spidr::AuthStore} (thanks Nick Plante). * Added {Spidr::Agent#post_page} (thanks Nick Plante). * Renamed `Spidr::Agent#get_session` to {Spidr::SessionCache#[]}. * Renamed `Spidr::Agent#kill_session` to {Spidr::SessionCache#kill!}. ### 0.2.1 / 2009-11-25 * Added {Spidr::Events#every_ok_page}. * Added {Spidr::Events#every_redirect_page}. * Added {Spidr::Events#every_timedout_page}. * Added {Spidr::Events#every_bad_request_page}. * Added {Spidr::Events#every_unauthorized_page}. * Added {Spidr::Events#every_forbidden_page}. * Added {Spidr::Events#every_missing_page}. * Added {Spidr::Events#every_internal_server_error_page}. * Added {Spidr::Events#every_txt_page}. * Added {Spidr::Events#every_html_page}. * Added {Spidr::Events#every_xml_page}. * Added {Spidr::Events#every_xsl_page}. * Added {Spidr::Events#every_doc}. * Added {Spidr::Events#every_html_doc}. * Added {Spidr::Events#every_xml_doc}. * Added {Spidr::Events#every_xsl_doc}. * Added {Spidr::Events#every_rss_doc}. * Added {Spidr::Events#every_atom_doc}. * Added {Spidr::Events#every_javascript_page}. * Added {Spidr::Events#every_css_page}. * Added {Spidr::Events#every_rss_page}. * Added {Spidr::Events#every_atom_page}. * Added {Spidr::Events#every_ms_word_page}. * Added {Spidr::Events#every_pdf_page}. * Added {Spidr::Events#every_zip_page}. * Fixed a bug where {Spidr::Agent#delay} was not being used to delay requesting pages. * Spider `link` and `script` tags in HTML pages (thanks Nick Plante). ### 0.2.0 / 2009-10-10 * Added {URI.expand_path}. * Added `Spidr::Page#search`. * Added `Spidr::Page#at`. * Added `Spidr::Page#title`. * Added {Spidr::Agent#failures=}. * Added a HTTP session cache to {Spidr::Agent}, per suggestion of falter. * Added `Spidr::Agent#get_session`. * Added `Spidr::Agent#kill_session`. * Added {Spidr.proxy=}. * Added {Spidr.disable_proxy!}. * Aliased `Spidr::Page#txt?` to `Spidr::Page#plain_text?`. * Aliased `Spidr::Page#ok?` to `Spidr::Page#is_ok?`. * Aliased `Spidr::Page#redirect?` to `Spidr::Page#is_redirect?`. * Aliased `Spidr::Page#unauthorized?` to `Spidr::Page#is_unauthorized?`. * Aliased `Spidr::Page#forbidden?` to `Spidr::Page#is_forbidden?`. * Aliased `Spidr::Page#missing?` to `Spidr::Page#is_missing?`. * Split URL filtering code out of {Spidr::Agent} and into {Spidr::Filters}. * Split URL / Page event code out of {Spidr::Agent} and into {Spidr::Events}. * Split pause! / continue! / skip_link! / skip_page! methods out of {Spidr::Agent} and into {Spidr::Actions}. * Fixed a bug in `Spidr::Page#code`, where it was not returning an Integer. * Make sure `Spidr::Page#doc` returns `Nokogiri::XML::Document` objects for RSS/RDF/Atom pages as well. * Fixed the handling of the Location header in `Spidr::Page#links` (thanks falter). * Fixed a bug in `Spidr::Page#to_absolute` where trailing `/` characters on URI paths were not being preserved (thanks falter). * Fixed a bug where the URI query was not being sent with the request in {Spidr::Agent#get_page} (thanks Damian Steer). * Fixed a bug where SSL sessions were not being properly setup (thanks falter). * Switched {Spidr::Agent#history} to be a Set, to improve search-time of the history (thanks falter). * Switched {Spidr::Agent#failures} to a Set. * Allow a block to be passed to {Spidr::Agent#run}, which will receive all pages visited. * Allow `Spidr::Agent#start_at` and `Spidr::Agent#continue!` to pass blocks to {Spidr::Agent#run}. * Made {Spidr::Agent#visit_page} public. * Moved to YARD based documentation. ### 0.1.9 / 2009-06-13 * Upgraded to Hoe 2.0.0. * Use Hoe.spec instead of Hoe.new. * Use the Hoe signing task for signed gems. * Added the `Spidr::Agent#schemes` and `Spidr::Agent#schemes=` methods. * Added a warning message if 'net/https' cannot be loaded. * Allow the list of acceptable URL schemes to be passed into {Spidr::Agent#initialize}. * Allow history and queue information to be passed into {Spidr::Agent#initialize}. * {Spidr::Agent#start_at} no longer clears the history or the queue. * Fixed a bug in the sanitization of semi-escaped URLs. * Fixed a bug where https URLs would be followed even if 'net/https' could not be loaded. * Removed Spidr::Agent::SCHEMES. ### 0.1.8 / 2009-05-27 * Added the `Spidr::Agent#pause!` and `Spidr::Agent#continue!` methods. * Added the `Spidr::Agent#running?` and `Spidr::Agent#paused?` methods. * Added an alias for pending_urls to the queue methods. * Added {Spidr::Agent#queue} to provide read access to the queue. * Added {Spidr::Agent#queue=} and {Spidr::Agent#history=} for setting the queue and history. * Added {Spidr::Agent#to_hash} which returns a Hash of the agents queue and history. * Made {Spidr::Agent#enqueue} and {Spidr::Agent#queued?} public. * Added more specs. ### 0.1.7 / 2009-04-24 * Added `Spidr::Agent#all_headers`. * Fixed a bug where {Spidr::Page#headers} was always `nil`. * {Spidr::Agent} will now follow the Location header in HTTP 300, 301, 302, 303 and 307 Redirects. * {Spidr::Agent} will now follow iframe and frame tags. ### 0.1.6 / 2009-04-14 * Added {Spidr::Agent#failures}, a list of URLs which could not be visited. * Added {Spidr::Agent#failed?}. * Added `Spidr::Agent#every_failed_url`. * Added {Spidr::Agent#clear}, which clears the history and failures URL lists. * Improved fault tolerance in {Spidr::Agent#get_page}. * If a Network or HTTP error is encountered, the URL will be added to the failures list and the next URL will be visited. * Fixed a typo in `Spidr::Agent#ignore_exts_like`. * Updated the Web Spider Obstacle Course with links that always fail to be visited. ### 0.1.5 / 2009-03-22 * Catch malformed URIs in `Spidr::Page#to_absolute` and return `nil`. * Filter out `nil` URIs in `Spidr::Page#urls`. ### 0.1.4 / 2009-01-15 * Use Nokogiri for HTML and XML parsing. ### 0.1.3 / 2009-01-10 * Added the `:host` options to {Spidr::Agent#initialize}. * Added the Web Spider Obstacle Course files to the Manifest. * Aliased {Spidr::Agent#visited_urls} to {Spidr::Agent#history}. ### 0.1.2 / 2008-11-06 * Fixed a bug in `Spidr::Page#to_absolute` where URLs with no path were not receiving a default path of `/`. * Fixed a bug in `Spidr::Page#to_absolute` where URL paths were not being expanded, in order to remove `..` and `.` directories. * Fixed a bug where absolute URLs could have a blank path, thus causing {Spidr::Agent#get_page} to crash when it performed the HTTP request. * Added RSpec spec tests. * Created a Web-Spider Obstacle Course (http://spidr.rubyforge.org/course/start.html) which is used in the spec tests. ### 0.1.1 / 2008-10-04 * Added a reader method for the response instance variable in Page. * Fixed a bug in {Spidr::Page#method_missing}. ### 0.1.0 / 2008-05-23 * Initial release. * Black-list or white-list URLs based upon: * Host name * Port number * Full link * URL extension * Provides call-backs for: * Every visited Page. * Every visited URL. * Every visited URL that matches a specified pattern.