### 0.2.3 / 2010-02-27 * Migrated to Jeweler, for the packaging and releasing RubyGems. * Switched to MarkDown formatted YARD documentation. * Added {Spidr::Events#every_link}. * Added {Spidr::SessionCache#active?}. * Added specs for {Spidr::SessionCache}. ### 0.2.2 / 2010-01-06 * Require Web Spider Obstacle Course (WSOC) >= 0.1.1. * Integrated the new WSOC into the specs. * Removed the built-in Web Spider Obstacle Course. * Added {Spidr::Page#content_types}. * Added {Spidr::Page#cookie}. * Added {Spidr::Page#cookies}. * Added {Spidr::Page#cookie_params}. * Added {Spidr::Sanitizers}. * Added {Spidr::SessionCache}. * Added {Spidr::CookieJar} (thanks Nick Plante). * Added {Spidr::AuthStore} (thanks Nick Plante). * Added {Spidr::Agent#post_page} (thanks Nick Plante). * Renamed `Spidr::Agent#get_session` to {Spidr::SessionCache#[]}. * Renamed `Spidr::Agent#kill_session` to {Spidr::SessionCache#kill!}. ### 0.2.1 / 2009-11-25 * Added {Spidr::Events#every_ok_page}. * Added {Spidr::Events#every_redirect_page}. * Added {Spidr::Events#every_timedout_page}. * Added {Spidr::Events#every_bad_request_page}. * Added {Spidr::Events#every_unauthorized_page}. * Added {Spidr::Events#every_forbidden_page}. * Added {Spidr::Events#every_missing_page}. * Added {Spidr::Events#every_internal_server_error_page}. * Added {Spidr::Events#every_txt_page}. * Added {Spidr::Events#every_html_page}. * Added {Spidr::Events#every_xml_page}. * Added {Spidr::Events#every_xsl_page}. * Added {Spidr::Events#every_doc}. * Added {Spidr::Events#every_html_doc}. * Added {Spidr::Events#every_xml_doc}. * Added {Spidr::Events#every_xsl_doc}. * Added {Spidr::Events#every_rss_doc}. * Added {Spidr::Events#every_atom_doc}. * Added {Spidr::Events#every_javascript_page}. * Added {Spidr::Events#every_css_page}. * Added {Spidr::Events#every_rss_page}. * Added {Spidr::Events#every_atom_page}. * Added {Spidr::Events#every_ms_word_page}. * Added {Spidr::Events#every_pdf_page}. * Added {Spidr::Events#every_zip_page}. * Fixed a bug where {Spidr::Agent#delay} was not being used to delay requesting pages. * Spider `link` and `script` tags in HTML pages (thanks Nick Plante). ### 0.2.0 / 2009-10-10 * Added {URI.expand_path}. * Added {Spidr::Page#search}. * Added {Spidr::Page#at}. * Added {Spidr::Page#title}. * Added {Spidr::Agent#failures=}. * Added a HTTP session cache to {Spidr::Agent}, per suggestion of falter. * Added `Spidr::Agent#get_session`. * Added `Spidr::Agent#kill_session`. * Added {Spidr.proxy=}. * Added {Spidr.disable_proxy!}. * Aliased `Spidr::Page#txt?` to {Spidr::Page#plain_text?}. * Aliased `Spidr::Page#ok?` to {Spidr::Page#is_ok?}. * Aliased `Spidr::Page#redirect?` to {Spidr::Page#is_redirect?}. * Aliased `Spidr::Page#unauthorized?` to {Spidr::Page#is_unauthorized?}. * Aliased `Spidr::Page#forbidden?` to {Spidr::Page#is_forbidden?}. * Aliased `Spidr::Page#missing?` to {Spidr::Page#is_missing?}. * Split URL filtering code out of {Spidr::Agent} and into {Spidr::Filters}. * Split URL / Page event code out of {Spidr::Agent} and into {Spidr::Events}. * Split pause! / continue! / skip_link! / skip_page! methods out of {Spidr::Agent} and into {Spidr::Actions}. * Fixed a bug in {Spidr::Page#code}, where it was not returning an Integer. * Make sure {Spidr::Page#doc} returns `Nokogiri::XML::Document` objects for RSS/RDF/Atom pages as well. * Fixed the handling of the Location header in {Spidr::Page#links} (thanks falter). * Fixed a bug in {Spidr::Page#to_absolute} where trailing `/` characters on URI paths were not being preserved (thanks falter). * Fixed a bug where the URI query was not being sent with the request in {Spidr::Agent#get_page} (thanks Damian Steer). * Fixed a bug where SSL sessions were not being properly setup (thanks falter). * Switched {Spidr::Agent#history} to be a Set, to improve search-time of the history (thanks falter). * Switched {Spidr::Agent#failures} to a Set. * Allow a block to be passed to {Spidr::Agent#run}, which will receive all pages visited. * Allow `Spidr::Agent#start_at` and `Spidr::Agent#continue!` to pass blocks to {Spidr::Agent#run}. * Made {Spidr::Agent#visit_page} public. * Moved to YARD based documentation. ### 0.1.9 / 2009-06-13 * Upgraded to Hoe 2.0.0. * Use Hoe.spec instead of Hoe.new. * Use the Hoe signing task for signed gems. * Added the `Spidr::Agent#schemes` and `Spidr::Agent#schemes=` methods. * Added a warning message if 'net/https' cannot be loaded. * Allow the list of acceptable URL schemes to be passed into {Spidr::Agent#initialize}. * Allow history and queue information to be passed into {Spidr::Agent#initialize}. * {Spidr::Agent#start_at} no longer clears the history or the queue. * Fixed a bug in the sanitization of semi-escaped URLs. * Fixed a bug where https URLs would be followed even if 'net/https' could not be loaded. * Removed Spidr::Agent::SCHEMES. ### 0.1.8 / 2009-05-27 * Added the `Spidr::Agent#pause!` and `Spidr::Agent#continue!` methods. * Added the `Spidr::Agent#running?` and `Spidr::Agent#paused?` methods. * Added an alias for pending_urls to the queue methods. * Added {Spidr::Agent#queue} to provide read access to the queue. * Added {Spidr::Agent#queue=} and {Spidr::Agent#history=} for setting the queue and history. * Added {Spidr::Agent#to_hash} which returns a Hash of the agents queue and history. * Made {Spidr::Agent#enqueue} and {Spidr::Agent#queued?} public. * Added more specs. ### 0.1.7 / 2009-04-24 * Added `Spidr::Agent#all_headers`. * Fixed a bug where {Spidr::Page#headers} was always `nil`. * {Spidr::Spidr::Agent} will now follow the Location header in HTTP 300, 301, 302, 303 and 307 Redirects. * {Spidr::Agent} will now follow iframe and frame tags. ### 0.1.6 / 2009-04-14 * Added {Spidr::Agent#failures}, a list of URLs which could not be visited. * Added {Spidr::Agent#failed?}. * Added `Spidr::Agent#every_failed_url`. * Added {Spidr::Agent#clear}, which clears the history and failures URL lists. * Improved fault tolerance in {Spidr::Agent#get_page}. * If a Network or HTTP error is encountered, the URL will be added to the failures list and the next URL will be visited. * Fixed a typo in `Spidr::Agent#ignore_exts_like`. * Updated the Web Spider Obstacle Course with links that always fail to be visited. ### 0.1.5 / 2009-03-22 * Catch malformed URIs in {Spidr::Page#to_absolute} and return `nil`. * Filter out `nil` URIs in {Spidr::Page#urls}. ### 0.1.4 / 2009-01-15 * Use Nokogiri for HTML and XML parsing. ### 0.1.3 / 2009-01-10 * Added the `:host` options to {Spidr::Agent#initialize}. * Added the Web Spider Obstacle Course files to the Manifest. * Aliased {Spidr::Agent#visited_urls} to {Spidr::Agent#history}. ### 0.1.2 / 2008-11-06 * Fixed a bug in {Spidr::Page#to_absolute} where URLs with no path were not receiving a default path of `/`. * Fixed a bug in {Spidr::Page#to_absolute} where URL paths were not being expanded, in order to remove `..` and `.` directories. * Fixed a bug where absolute URLs could have a blank path, thus causing {Spidr::Agent#get_page} to crash when it performed the HTTP request. * Added RSpec spec tests. * Created a Web-Spider Obstacle Course (http://spidr.rubyforge.org/course/start.html) which is used in the spec tests. ### 0.1.1 / 2008-10-04 * Added a reader method for the response instance variable in Page. * Fixed a bug in {Spidr::Page#method_missing}. ### 0.1.0 / 2008-05-23 * Initial release. * Black-list or white-list URLs based upon: * Host name * Port number * Full link * URL extension * Provides call-backs for: * Every visited Page. * Every visited URL. * Every visited URL that matches a specified pattern.