[![Ruby Style Guide](https://img.shields.io/badge/code_style-rubocop-brightgreen.svg)](https://github.com/rubocop/rubocop)
[![Gem Version](https://badge.fury.io/rb/spn2.svg)](https://badge.fury.io/rb/spn2)

# Spn2

Spn2 is a gem for interacting with the [Wayback Machine](https://web.archive.org/)'s Save Page Now 2 (SPN2) REST API. The API (draft) specification is [here](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit).

## Installation

Install the gem and add it to the application's Gemfile by executing:

    $ bundle add spn2

If bundler is not being used to manage dependencies, install the gem by executing:

    $ gem install spn2

## Usage

To load the Spn2 namespace:

```rb
require 'spn2'
```

### Authentication

The API requires authentication, so you will need an account at [archive.org](https://archive.org).

There are two methods of authentication: cookies and API key. Presently only the latter is implemented. API keys may be generated at https://archive.org/account/s3.php.

Ensure your access key and secret key are set in the environment variables SPN2_ACCESS_KEY and SPN2_SECRET_KEY respectively.

```rb
> Spn2.access_key
=> <your access key>
> Spn2.secret_key
=> <your secret key>
```

### Save a page

Save (capture) a url in the Wayback Machine. This method returns the job_id in a hash.

```rb
> Spn2.save(url: 'example.com') # returns a job_id
=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"} # may include a "message" key too
```

Various options are available, as detailed in the [specification](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit) in the section "Capture request".
These may be passed like so:

```rb
> Spn2.save(url: 'example.com', opts: { capture_all: 1, capture_outlinks: 1 })
=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"}
```

A failed page save raises an error that looks like this:

```rb
=> {"status"=>"error", "status_ext"=>"error:too-many-daily-captures", "message"=>"This URL has been already captured 10 times today. Please try again tomorrow. Please email us at \"info@archive.org\" if you would like to discuss this more."} (Spn2::Spn2ErrorFailedCapture)
```

The key "status_ext" contains an explanatory message - see the API [specification](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit).

### View the status of a job

Use the job_id.

```rb
> Spn2.status_job_id(job_id: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14')
=> {"counters"=>{"outlinks"=>1, "embeds"=>2}, "job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14", "original_url"=>"http://example.com/", "resources"=>["http://example.com/", "http://example.com/favicon.ico"], "duration_sec"=>6.732, "outlinks"=>["https://www.iana.org/domains/example"], "http_status"=>200, "timestamp"=>"20220622224107", "status"=>"success"}
```

"status" => "success" is what you are looking for.

Care is advised for domains/urls which are frequently saved into the Wayback Machine, as the job_id is merely "spn2-" followed by a hash of the url\*. A status request will show the status of _the most recent capture by anyone_ of the url in question.

\* Usually a SHA-1 hash of the full `http://` form of the url (scheme, domain and path), e.g.:

```sh
$ echo "http://example.com/"|tr -d "\n"|shasum
9c17e047f58f9220a7008d4f18152fee4d111d14  -
```

The status of a comma-separated list of job_ids can be obtained with:

```rb
> Spn2.status_job_ids(job_ids: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14,spn2-...')
=> [.. # an array of status hashes
```

Finally, the status of any outlinks captured by using the save option `capture_outlinks: 1` is available by supplying the parent job_id to:

```rb
> Spn2.status_job_id_outlinks(job_id: 'spn2-cce034d987e1d72d8cbf1770bcf99024fe20dddf')
=> [.. # an array of outlink job status hashes
```

### User status

Information about the user is available via:

```rb
> Spn2.user_status
=> {"daily_captures_limit"=>100000, "available"=>8, "processing"=>0, "daily_captures"=>10}
```

### System status

The status of the Wayback Machine itself is available.

```rb
> Spn2.system_status
=> {"status"=>"ok"} # if not "ok" captures may be delayed
```

### Error handling

To facilitate graceful error handling, a full list of all error classes is provided by:

```rb
> Spn2.error_classes
=> [Spn2::Spn2Error, Spn2::Spn2ErrorBadAuth,.. ..]
```

## Testing

Run `bundle exec rake` to run the test suite.

Valid API keys must be held in SPN2_ACCESS_KEY and SPN2_SECRET_KEY for testing. Go to https://archive.org/account/s3.php to set up API keys if you need them. If you have your live keys stored in these env vars, override them with the keys you want to test with by running `export SPN2_ACCESS_KEY=<access key> && export SPN2_SECRET_KEY=<secret key>` immediately before the above command.

## Development

~~After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.~~

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitLab at https://gitlab.com/matzfan/spn2. Please run `rubocop` and correct all errors before submitting PRs.

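
## Appendix: guessing a job_id in Ruby

The footnote under "View the status of a job" says a job_id is usually "spn2-" followed by a SHA-1 hash of the url. A minimal Ruby sketch of that derivation, using only the standard library, might look like this. The helper name `guess_job_id` is hypothetical and not part of the gem, and the scheme is only the "usual" one, so treat the result as a guess rather than something the API guarantees.

```ruby
require 'digest'

# Hypothetical helper, NOT part of the spn2 gem.
# Assumes the "usual" scheme from the footnote: "spn2-" + SHA-1 hex digest
# of the url in its full http:// form (e.g. "http://example.com/").
def guess_job_id(full_url)
  "spn2-#{Digest::SHA1.hexdigest(full_url)}"
end

puts guess_job_id('http://example.com/')
```

Comparing the printed value with the job_id returned by `Spn2.save` for the same url is a quick way to check whether the usual scheme applies to that url.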
## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).