README.md in site_health-0.1.0 vs README.md in site_health-0.2.0

- old
+ new

@@ -1,22 +1,24 @@ -# SiteHealth +# SiteHealth [![Build Status](https://travis-ci.org/buren/site_health.svg?branch=master)](https://travis-ci.org/buren/site_health) :warning: Project is still experimental, API will change (a lot) without notice. Crawl a site and check various health indicators, such as: -- HTTP error status -- Invalid HTML/CSS/XML -- Missing HTML page title -- Broken links +- Server errors +- HTTP errors +- Invalid HTML/XML/JSON +- Missing HTML title/description +- Missing image alt-attribute +- Google Pagespeed ## Installation Add this line to your application's Gemfile: ```ruby -gem 'site_health' +gem "site_health" ``` And then execute: $ bundle @@ -25,40 +27,182 @@ $ gem install site_health ## Usage +[CLI usage](#cli). + +Crawl and check site + ```ruby -journal = SiteHealth.check('https://example.com') +nurse = SiteHealth.check("https://example.com") +``` -# HTML -journal.missing_html_title # List of URLs that are missing the HTML title -journal.html_error_urls # List of URLs with HTML errors in them +Check list of URLs +```ruby +nurse = SiteHealth.check_urls(["https://example.com"]) +``` -# CSS -journal.css_error_urls # List of URLs with CSS errors in them +Write raw JSON result to file +```ruby +nurse = SiteHealth.check("https://example.com") +json = JSON.pretty_generate(nurse.journal) -# XML -journal.xml_error_urls # List of URLs with XML errors in them +File.write("result.json", json) +``` -# Broken URLs -broken = journal.broken_urls.first -broken.url # The URL that failed -broken.exists_on # Array of URLs where the broken URL was present +Each issue -# HTTP -journal.http_error_urls # All URLs with HTTP status code >= 400 +```ruby +SiteHealth.check_urls(urls) do |nurse| + nurse.clerk do |clerk| + clerk.every_issue { |issue| puts "#{issue.severity}, #{issue.title}" } + end +end ``` +Simple issue reports +```ruby +nurse = SiteHealth.check("https://example.com") +report = SiteHealth::IssuesReport.new(nurse.issue) do |r| + r.fields = %i[url title detail] # issue fields + r.select { |issue| issue.url.include?('blog/') } +end + +report.to_a +report.to_csv +report.to_json +``` + +Event handlers + +```ruby +urls = ["https://example.com"] +nurse = SiteHealth.check_urls(urls) do |nurse| + nurse.clerk do |clerk| + clerk.every_journal do |journal, page| + time_in_seconds = journal[:runtime_in_seconds] + puts "Found page #{page.title} - #{page.url} (checks took #{time_in_seconds})" + end + + clerk.every_check do |check| + puts "Ran check: #{check.name}" + end + + clerk.every_failed_url do |url| + puts "Failed to fetch: #{url}" + end + end +end +``` + +Write page speed summary CSV + +```ruby +nurse = SiteHealth.check("https://example.com") +summary = SiteHealth::PageSpeedSummarizer.new(nurse.journal) +File.write("page_size_summary.csv", summary.to_csv) +``` + +## Configuration + +All configuration is optional. + +```ruby +SiteHealth.configure do |config| + # Override default checkers + config.checkers = [:json_syntax, :html] + + # Configure logger + config.logger = Logger.new(STDOUT).tap do |logger| + logger.progname = 'SiteHealth' + logger.level = Logger::INFO + end + + # Configure HTMLProofer + config.html_proofer do |proofer_config| + proofer_config.log_level = :info + proofer_config.check_opengraph = false + end + + # Configure W3C HTML/CSS validator + config.w3c_validators do |w3c_config| + w3c_config.css_uri = 'http://localhost:8888/check' + w3c_config.html_uri = 'http://localhost:8888/check' + end +end +``` + +__Load non-default checkers__: + +A few of the non-default checkers available in this gem require 3rd-party dependencies which aren't installed by default. + +| Checker name | Gem | +| ------------------ | ------------------ | +| google_page_speed | google-api-client | +| html_proofer | html-proofer | +| w3c_html | w3c_validators | +| w3c_css | w3c_validators | + +If you intend to use any of those checkers make sure to install the gem first. For example to use the `google_page_speed` checker add `google-api-client` to your Gemfile or install it manually with `gem install google-api-client`. Then you register the checker for use. + +```ruby +SiteHealth.config.register_checker :google_page_speed +# LoadError is raised if google-api-client is *not* installed +``` + +__Add your own checker__: + +```ruby +class ProfanityChecker < SiteHealth::Checker + name "profanity" + types %i[html json xml css javascript] + + def check + add_data(profanity: { + damn: page.body.include?(" damn "), + shit: page.body.include?(" shit ") + }) + end +end + +# Then register it +SiteHealth.configure do |config| + config.register_checker ProfanityChecker +end +``` + +## CLI + +``` +Usage: site_health --help + --url=val0 + --fields=priority,title,url Issue fields to include - by default all fields are included + --output=result.csv Output format, .csv or .json + --[no-]progress + -h, --help How to use +``` + ## Development -After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. +After checking out the repo, run `bin/setup` to install dependencies. Then, run `bundle exec rake` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org). ## Contributing Bug reports and pull requests are welcome on GitHub at https://github.com/buren/site_health. ## License The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT). + +--- + +## TODO + +- Good way to render result/reports data +- Improve logger support +- Checkers + * canonical URL + * http vs https links + * links matching a pattern