README.md in site_health-0.1.0 vs README.md in site_health-0.2.0
- old
+ new
@@ -1,22 +1,24 @@
-# SiteHealth
+# SiteHealth [![Build Status](https://travis-ci.org/buren/site_health.svg?branch=master)](https://travis-ci.org/buren/site_health)
:warning: Project is still experimental; the API will change (a lot) without notice.
Crawl a site and check various health indicators, such as:
-- HTTP error status
-- Invalid HTML/CSS/XML
-- Missing HTML page title
-- Broken links
+- Server errors
+- HTTP errors
+- Invalid HTML/XML/JSON
+- Missing HTML title/description
+- Missing image alt-attribute
+- Google Pagespeed
## Installation
Add this line to your application's Gemfile:
```ruby
-gem 'site_health'
+gem "site_health"
```
And then execute:
$ bundle
@@ -25,40 +27,182 @@
$ gem install site_health
## Usage
+[CLI usage](#cli).
+
+Crawl and check site
+
```ruby
-journal = SiteHealth.check('https://example.com')
+nurse = SiteHealth.check("https://example.com")
+```
-# HTML
-journal.missing_html_title # List of URLs that are missing the HTML title
-journal.html_error_urls # List of URLs with HTML errors in them
+Check list of URLs
+```ruby
+nurse = SiteHealth.check_urls(["https://example.com"])
+```
-# CSS
-journal.css_error_urls # List of URLs with CSS errors in them
+Write raw JSON result to file
+```ruby
+nurse = SiteHealth.check("https://example.com")
+json = JSON.pretty_generate(nurse.journal)
-# XML
-journal.xml_error_urls # List of URLs with XML errors in them
+File.write("result.json", json)
+```
-# Broken URLs
-broken = journal.broken_urls.first
-broken.url # The URL that failed
-broken.exists_on # Array of URLs where the broken URL was present
+Each issue
-# HTTP
-journal.http_error_urls # All URLs with HTTP status code >= 400
+```ruby
+urls = ["https://example.com"]
+SiteHealth.check_urls(urls) do |nurse|
+  nurse.clerk do |clerk|
+    clerk.every_issue { |issue| puts "#{issue.severity}, #{issue.title}" }
+  end
+end
```
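The `every_issue` callback above receives one issue at a time. A minimal sketch of summarizing such issues per severity, assuming only that each issue responds to `severity` and `title` as in the example. The `SampleIssue` struct, its sample values, and `count_by_severity` are hypothetical stand-ins, not part of the gem:

```ruby
# Hypothetical stand-in for the objects yielded by `every_issue`;
# only `severity` and `title` are taken from the example above.
SampleIssue = Struct.new(:severity, :title)

# Count how many issues exist for each severity value.
def count_by_severity(issues)
  issues.map(&:severity).tally
end

# Sample data; the severity values are made up for illustration.
issues = [
  SampleIssue.new(:major, "Missing title"),
  SampleIssue.new(:minor, "Missing image alt-attribute"),
  SampleIssue.new(:major, "Server error")
]

counts = count_by_severity(issues)
counts.each { |severity, count| puts "#{severity}: #{count}" }
```

In a real run, the `issues` array could be filled from inside the `every_issue` block instead of being built by hand.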
+Simple issue reports
+```ruby
+nurse = SiteHealth.check("https://example.com")
+report = SiteHealth::IssuesReport.new(nurse.issue) do |r|
+  r.fields = %i[url title detail] # issue fields
+  r.select { |issue| issue.url.include?("blog/") }
+end
+
+report.to_a
+report.to_csv
+report.to_json
+```
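To make the idea of "fields" and "select" concrete, here is a rough, gem-independent sketch of what picking fields and filtering issues amounts to. It is not how `SiteHealth::IssuesReport` is implemented; `ReportIssue`, `build_rows`, and the sample data are hypothetical, and only the field names `url`, `title`, and `detail` come from the example above:

```ruby
require "json"

# Hypothetical issue shape using the field names from the example above.
ReportIssue = Struct.new(:url, :title, :detail, keyword_init: true)

# Turn each issue into a hash containing only the requested fields.
def build_rows(issues, fields)
  issues.map do |issue|
    fields.to_h { |field| [field, issue.public_send(field)] }
  end
end

# Sample data, made up for illustration.
issues = [
  ReportIssue.new(url: "https://example.com/blog/post", title: "Missing title", detail: "No <title> tag"),
  ReportIssue.new(url: "https://example.com/about", title: "Missing description", detail: "No meta description")
]

blog_issues = issues.select { |issue| issue.url.include?("blog/") }
rows = build_rows(blog_issues, %i[url title])
puts JSON.pretty_generate(rows)
```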
+
+Event handlers
+
+```ruby
+urls = ["https://example.com"]
+nurse = SiteHealth.check_urls(urls) do |nurse|
+  nurse.clerk do |clerk|
+    clerk.every_journal do |journal, page|
+      time_in_seconds = journal[:runtime_in_seconds]
+      puts "Found page #{page.title} - #{page.url} (checks took #{time_in_seconds})"
+    end
+
+    clerk.every_check do |check|
+      puts "Ran check: #{check.name}"
+    end
+
+    clerk.every_failed_url do |url|
+      puts "Failed to fetch: #{url}"
+    end
+  end
+end
+```
+
+Write page speed summary CSV
+
+```ruby
+nurse = SiteHealth.check("https://example.com")
+summary = SiteHealth::PageSpeedSummarizer.new(nurse.journal)
+File.write("page_speed_summary.csv", summary.to_csv)
+```
+
+## Configuration
+
+All configuration is optional.
+
+```ruby
+SiteHealth.configure do |config|
+  # Override default checkers
+  config.checkers = [:json_syntax, :html]
+
+  # Configure logger
+  config.logger = Logger.new(STDOUT).tap do |logger|
+    logger.progname = 'SiteHealth'
+    logger.level = Logger::INFO
+  end
+
+  # Configure HTMLProofer
+  config.html_proofer do |proofer_config|
+    proofer_config.log_level = :info
+    proofer_config.check_opengraph = false
+  end
+
+  # Configure W3C HTML/CSS validator
+  config.w3c_validators do |w3c_config|
+    w3c_config.css_uri = 'http://localhost:8888/check'
+    w3c_config.html_uri = 'http://localhost:8888/check'
+  end
+end
+```
+
+__Load non-default checkers__:
+
+A few of the non-default checkers available in this gem require 3rd-party dependencies which aren't installed by default.
+
+| Checker name | Gem |
+| ------------------ | ------------------ |
+| google_page_speed | google-api-client |
+| html_proofer | html-proofer |
+| w3c_html | w3c_validators |
+| w3c_css | w3c_validators |
+
+If you intend to use any of those checkers, make sure to install the corresponding gem first. For example, to use the `google_page_speed` checker, add `google-api-client` to your Gemfile or install it manually with `gem install google-api-client`. Then register the checker for use.
+
+```ruby
+SiteHealth.config.register_checker :google_page_speed
+# LoadError is raised if google-api-client is *not* installed
+```
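One way to avoid the `LoadError` is to check that a dependency can be loaded before registering its checker. A minimal sketch: `library_available?` is a hypothetical helper, not part of the gem, and the `require` path to pass for a given gem is something you need to look up in that gem's documentation:

```ruby
# Hypothetical helper: returns true if `require path` succeeds,
# false if the library cannot be found.
def library_available?(path)
  require path
  true
rescue LoadError
  false
end

puts library_available?("json")                  # part of Ruby's standard library
puts library_available?("no_such_library_12345") # a name that should not exist
```

With a helper like this, the `register_checker` call can be made conditional on the dependency being present.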
+
+__Add your own checker__:
+
+```ruby
+class ProfanityChecker < SiteHealth::Checker
+  name "profanity"
+  types %i[html json xml css javascript]
+
+  def check
+    add_data(profanity: {
+      damn: page.body.include?(" damn "),
+      shit: page.body.include?(" shit ")
+    })
+  end
+end
+
+# Then register it
+SiteHealth.configure do |config|
+  config.register_checker ProfanityChecker
+end
+```
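The `include?(" damn ")` checks above only match a lowercase word surrounded by spaces, so they miss words next to punctuation or written in a different case. A sketch of a stricter match using regular-expression word boundaries, written as a plain method so it runs without the gem. `profanity_flags` and `PROFANITY_WORDS` are hypothetical names:

```ruby
PROFANITY_WORDS = %w[damn shit].freeze

# Returns a hash with one boolean per word, true when the word occurs
# in `body` as a whole word, ignoring case.
def profanity_flags(body)
  PROFANITY_WORDS.to_h do |word|
    [word.to_sym, body.match?(/\b#{Regexp.escape(word)}\b/i)]
  end
end

p profanity_flags("Damn, that page is slow.")
```

Inside a checker, the result could be passed on with `add_data(profanity: profanity_flags(page.body))`, following the example above.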
+
+## CLI
+
+```
+Usage: site_health --help
+ --url=val0
+ --fields=priority,title,url Issue fields to include - by default all fields are included
+ --output=result.csv Output format, .csv or .json
+ --[no-]progress
+ -h, --help How to use
+```
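The flags listed above follow the conventions of Ruby's standard `OptionParser`. As an illustration of how such flags behave, here is a small sketch that parses the same option names. It is not the gem's actual CLI code; `parse_cli_options` and the default values are assumptions made for this example:

```ruby
require "optparse"

# Hypothetical parser for the flags shown in the help text.
# Defaults (progress on, everything else nil) are assumptions.
def parse_cli_options(argv)
  options = { url: nil, fields: nil, output: nil, progress: true }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: site_health --help"
    opts.on("--url=URL", String) { |value| options[:url] = value }
    # A comma-separated list is split into an array by OptionParser.
    opts.on("--fields=LIST", Array) { |value| options[:fields] = value }
    opts.on("--output=FILE", String) { |value| options[:output] = value }
    # Boolean switch: --progress gives true, --no-progress gives false.
    opts.on("--[no-]progress") { |value| options[:progress] = value }
  end

  parser.parse(argv)
  options
end

options = parse_cli_options(
  %w[--url=https://example.com --fields=title,url --output=result.json --no-progress]
)
puts options.inspect
```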
+
## Development
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `bundle exec rake` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/buren/site_health.
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+---
+
+## TODO
+
+- A good way to render result/report data
+- Improve logger support
+- Checkers
+ * canonical URL
+ * http vs https links
+ * links matching a pattern