README.md in app_status-0.1.1 vs README.md in app_status-1.0.0
- old
+ new
@@ -1,22 +1,53 @@
# AppStatus
AppStatus is a Rails engine which makes it easy to expose application status
data in a way easily consumed by Nagios or other monitoring packages.
+## Why?
+
+Defining health checks outside of your application (like in Nagios)
+has a few different problems.
+
+ 1. The people who maintain Nagios aren't necessarily
+ the same people who maintain the application.
+ 1. Keeping the two systems in sync can be non-trivial with a fast-changing
+ application.
+ 1. Failing to monitor new features, or monitoring the wrong things, leads
+ to a false sense of security.
+
+Instead, app_status lets you define your health checks right in the application
+itself and expose the results as a JSON service which is easy for Nagios
+to consume.
+
+The benefits come down to one major thing: Nagios doesn't need to know
+anything about your application. All Nagios needs is a 'healthy/not healthy'
+status report.
+
+This is good because:
+
+ 1. As your app's feature set changes, you can deploy updated health checks
+ at the same time. No need for coordinated updates between the app and
+ the monitoring system.
+ 1. Credentials for external services (like databases) can stay with your
+ app. Nagios doesn't need them.
+ 1. You don't need nrpe to do local process checks. Your application can do
+ them for itself.
+ 1. Your health checks can be testable methods just like all your other code.
+ 1. You don't need to duplicate complex queries & other business logic over
+ to Nagios.
+
## Installation
### `Gemfile`
```ruby
gem 'app_status'
```
### `config/routes.rb`
-Wire it up.
-
```ruby
mount AppStatus::Engine, at: "/status"
```
This exposes the following URLs
@@ -28,69 +59,84 @@
### `config/initializers/app_status.rb`
This is where you set up the checks which you want to be run when
someone hits the URL above. Set up some calls which evaluate the health
-of your application and call `add` for each one.
+of your application and call `add_check` for each one.
+`add_check` expects a service name, plus a block to be evaluated to determine
+the health of that service. The block should return either a status value, or
+a 2-element array with status and some details.
+
```ruby
AppStatus::CheckCollection.configure do |c|
- value = some_service_check
- c.add(:name => 'some_service', :status => :ok, :details => value)
+
+ c.add_check('some_service') do
+ details = do_something_to_check_your_service
+ status = (details != "FAIL") ? :ok : :critical
+ [status, details]
+ end
+
+ c.add_check('failing_service') do
+ :critical # you can return just a status if desired.
+ end
end
```
-The checks that you set up here are not run when you configure them. They're
-run whenever someone hits the check URL.
+The details string should be concise. `app_status` does its best to provide
+readable output, and Nagios does its best to make this impossible to actually
+do well.
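
One way to keep a check testable is to put its logic in an ordinary method and call that method from the `add_check` block. The sketch below is only an illustration: `QueueHealth`, its thresholds, and `current_queue_depth` are hypothetical names, not part of `app_status`.

```ruby
# Hypothetical check logic kept in a plain module so it can be unit tested.
module QueueHealth
  WARNING_DEPTH  = 100   # assumed thresholds, tune these for your app
  CRITICAL_DEPTH = 1000

  # Returns the 2-element [status, details] array that add_check accepts.
  def self.check(depth)
    status =
      if depth >= CRITICAL_DEPTH then :critical
      elsif depth >= WARNING_DEPTH then :warning
      else :ok
      end
    [status, "#{depth} jobs queued"]
  end
end

# In config/initializers/app_status.rb this might be wired up as:
#   c.add_check('job_queue') { QueueHealth.check(current_queue_depth) }
# where current_queue_depth is whatever your app uses to measure the queue.
```

Because `QueueHealth.check` takes its input as an argument, a test can exercise each threshold without a running queue.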
-Status values (in ascending order of seriousness)
+Valid status values (in ascending order of seriousness) are:
- :ok
- :warning
- :critical
- :unknown
These are set up to be compatible with Nagios.
-Details doesn't have to be a string. It can be anything which is serializable
-as JSON.
+Keep in mind that anyone who hits your status URL can cause your checks to run,
+so if they expose sensitive data or are a potential DoS vector you should
+probably protect them with some kind of authentication.
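
As a rough sketch of one possible approach, a small Rack-style middleware could require a shared token on requests to the status path. Nothing here comes from `app_status` itself: `StatusTokenGuard`, the `X-Status-Token` header, and the `/status` prefix are all assumptions made for illustration.

```ruby
# Minimal Rack-style middleware sketch: rejects requests under a path prefix
# unless they carry the expected token header. All names are hypothetical.
class StatusTokenGuard
  def initialize(app, token, path_prefix = "/status")
    @app = app
    @token = token
    @path_prefix = path_prefix
  end

  def call(env)
    guarded = env["PATH_INFO"].to_s.start_with?(@path_prefix)
    # Plain != is used for brevity; a constant-time comparison is safer.
    if guarded && env["HTTP_X_STATUS_TOKEN"] != @token
      return [401, { "Content-Type" => "text/plain" }, ["Unauthorized\n"]]
    end
    @app.call(env)
  end
end
```

In a Rails app this could be registered with `config.middleware.use StatusTokenGuard, ENV.fetch("STATUS_TOKEN")`. Whatever polls the URL would then need to send the same header.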
## Usage
`$ curl -H 'Accept: application/json' http://localhost:3000/status`
Output will look something like this:
```json
{
"status": "critical",
"status_code": 2,
- "run_time_ms": 52,
+ "ms": 52,
"finished": "2013-10-03T21:28:10Z",
"checks": {
"some_service": {
"status": "ok",
"status_code": 0,
- "details": "Looks good!"
+ "details": "Looks good!",
+ "ms": 30
},
"failing_service": {
"status": "critical",
"status_code": 2,
- "details": "Oh noes!"
+ "details": "",
+ "ms": 20
}
}
}
```
-The overall status will be the worst status which is actually observed in your
-individual checks.
+The overall status will be the worst value observed in your individual checks.
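
To illustrate what "worst value" means, the sketch below picks the most serious status from a list using the ordering given earlier. It is not the gem's implementation; `STATUS_CODES` and `worst_status` are names invented for this example, and the handling of an empty list is an assumption.

```ruby
# Status codes follow the Nagios plugin convention seen in the JSON output.
STATUS_CODES = { ok: 0, warning: 1, critical: 2, unknown: 3 }.freeze

# Returns the most serious status among the given check statuses.
# An empty list is treated as :unknown here (an assumption of this sketch).
def worst_status(statuses)
  return :unknown if statuses.empty?
  statuses.max_by { |status| STATUS_CODES.fetch(status) }
end

worst_status([:ok, :critical])  # => :critical
```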
## Nagios Integration
-[bin/check_app_status.rb](https://github.com/alexdean/app_status/blob/master/bin/check_app_status.rb)
+[check_app_status.rb](check_app_status.rb)
is a Nagios check script which can be used to monitor the output from `app_status`.
```
-$ bin/check_app_status.rb --help
+$ ./check_app_status.rb --help
Nagios check script for app_status. See https://github.com/alexdean/app_status
-v, --verbose Output more information
-V, --version Output version information
-h, --help Display this screen
-u, --url VAL Url to monitor
@@ -100,16 +146,20 @@
The script's exit status is derived from the overall status returned by the
server. Individual detail items will be grouped by status for display.
(Unknowns are displayed together, then criticals, then warnings, then OKs.)
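
The grouping and exit-status behavior described above can be sketched roughly as follows. This is an illustration written against the JSON shape shown under Usage, not the code in `check_app_status.rb`; `summarize`, `DISPLAY_ORDER`, and `EXIT_CODES` are hypothetical names, and the line format is simplified.

```ruby
require "json"

# Display order from the README: unknowns first, then criticals, warnings, OKs.
DISPLAY_ORDER = %w[unknown critical warning ok].freeze
# Exit codes follow the Nagios plugin convention.
EXIT_CODES = { "ok" => 0, "warning" => 1, "critical" => 2, "unknown" => 3 }.freeze

# Takes the JSON body returned by the status URL and returns
# [exit_code, lines], with one summary line per status that is present.
def summarize(body)
  report = JSON.parse(body)
  by_status = report.fetch("checks").group_by { |_name, check| check["status"] }
  lines = DISPLAY_ORDER.map do |status|
    names = by_status.fetch(status, []).map(&:first)
    "#{status.upcase} #{names.join(', ')}" unless names.empty?
  end.compact
  # An unrecognized overall status falls back to 3 (unknown) in this sketch.
  [EXIT_CODES.fetch(report["status"], 3), lines]
end
```

A caller would print the returned lines and pass the code to `exit`, which is how Nagios learns the result.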
-Sample output (using verbose mode)
+Sample output
```
-$ bin/check_app_status.rb --url http://localhost:3000/status -v
-2013-10-03T20:54:16-05:00 options: {:timeout=>10, :url=>"http://localhost:3000/status"}
-2013-10-03T20:54:16-05:00 timeout: 10s
-2013-10-03T20:54:16-05:00 response body: {"status":"warning","status_code":1,"run_time_ms":0,"finished":"2013-10-04T01:54:16Z","details":{"some_service":{"status":"ok","status_code":0,"details":"Looks good!"},"failing_service":{"status":"warning","status_code":1,"details":"Oh noes!"}}}
+$ ./check_app_status.rb --url http://localhost:3000/status
-WARN: failing_service:'Oh noes!'
-OK: some_service:'Looks good!'
+CRIT failed_service
+--- failed_service: shit's on fire yo, 501ms
+
+WARN problematic_service
+--- problematic_service: not looking good, 2001ms
+
+OK ok_process, ok_process_2
+--- ok_process: these are some details, 0ms
+--- ok_process_2: more details on another process, 0ms
```