Visibility into your application is one of the most critical parts of software development. At best, visibility is typically an afterthought and this is a problem. So what do you do? ![Instrument all the things](./logo.jpg?raw=true) # InstrumentAllTheThings At Terminus we use DataDog for our application visibility. InstrumentAllTheThings provides simple ways to quickly and unobtrusively add detailed instrumentation to Datadog metrics and Datadog APM. ## Installation Add this line to your application's Gemfile: ```ruby gem 'instrument_all_the_things' ``` And then execute: $ bundle Or install it yourself as: $ gem install instrument_all_the_things ## Usage *Note:* For convenience the InstrumentAllTheThings constant is aliased to IATT unless the constant is already defined. ## Method Instrumentation Method instrumentation both in APM as well as in simple counts & timings is the bread and butter of visibility, and it is trivial to implement with IATT. Each measured metric may be individually disabled, and some may be provided additional configuration. All measurments default to on, unless otherwise specified. You may disable the specified measurement by providing a falsy value to the configuration key when calling `instrument` *Example* ```ruby class Foo include InstrumentAllTheThings instrument config_key: {configuration_option: 123} def foo end end ``` ### Garbage Collection Stats _Configuration Key `gc_stats`_ Collects the difference between the specified keys during the execution of the method. Stat diffs are added to the active trace span as a tag, and a stat is emitted with the following format `klass_name.(instance|class)_methods.(stat_name)_change` #### Description of default stats _GC Stats are not thread local, if your app is multi threaded other threads may be contributing to these stats_ | Option | Description | ----- | ---- | total_allocated_pages | Total number of memory pages owned by this ruby process. Mature processes tend to see a slowdown in page allocations | total_allocated_objects | Total number of objects which have not been garbage collected yet | count | Total number of GC runs during this method's exuection #### Options | Option | Description | Default | ----- | ---- | ----- | diffed_stats | Stats to diff and record | [:total_allocated_pages, :total_allocated_objects, :count] ### Error Logging _Configuration Key `log_errors`_ When set to a non falsy value all errors raised in a span will be logged to the configured IATT logger and re-emitted. The first traces span where they are seen logs them only, they will not be re-logged by IATT at any future level By default call stacks are logged without all gem paths (as defined by the `Bundler.bundle_path`) | Option | Description | Default | ----- | ---- | ----- | rescue_class | The parent error which should be caught and logged | StandardError | exclude_bundle_path | If truthy, remove all entries from the error backtrace which are in the bundle path | true ### Tracing _Configuration Key `trace`_ When set to a non falsy value, a span for this method will be created. The defaults are listed below. This hash will also be passed to the DataDog tracer, and their [options](https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md#manual-instrumentation) should also be understood. | Option | Description | Default | ----- | ---- | ----- | service | This is the value which shows up on the [first page of the APM screen](https://app.datadoghq.com/apm/home) this should be set at the entry point of your application or process | `nil` | resource | How this method will show up when viewing the service in APM. | For instance methods `ClassName.method_name`
For class methods `ClassName#method_name` | span_name | You probably don't want to change this | `method.execution` | span_type | See DD Docs. | `nil` | tags | Set of tags to be added to the span, expected to be a hash | {} #### Dynamic Tags If a trace's tags need to reference either an instance variable or a parameter to the method being traced, you must pass in a proc for the tag value. For instance variables, this will look like this: ```ruby instrument trace: { tags: [-> { "some_stat:#{some_instance_var}" }]} def my_instance_method ... end ``` For parameter references, your proc will need to to have a parameter of `args` or `kwargs` (or both), depending on if the parameter you need to reference is a normal parameter or a keyword parameter. Examples: ```ruby instrument trace: { tags: [->(args) { "log_args:#{args[0]}" }]} def my_instance_method(my_var) ... end instrument trace: { tags: [->(kwargs) { "log_args:#{kwargs[:my_arg]}" }]} def my_instance_method(my_arg:) ... end ``` ## Testing Support You can setup your test environment by running the following setup: ```ruby require 'instrument_all_the_things/testing/stat_tracker' require 'instrument_all_the_things/testing/trace_tracker' require 'instrument_all_the_things/testing/rspec_matchers' Datadog.configure do |c| c.tracing.transport_options = proc { |t| t.adapter :test, IATT::Testing::TraceTracker.new } end IATT.stat_reporter = IATT::Testing::StatTracker.new RSpec.configure do |config| config.include InstrumentAllTheThings::Testing::RSpecMatchers config.before(:each) do IATT::Testing::TraceTracker.tracker.reset! IATT.stat_reporter.reset! end end ``` This injects middleware and in the StatsD interface as well as in the Tracer output. By doing this you can start using some awesome rspec helpers like so: ```ruby let(:klass) do Class.new do include InstrumentAllTheThings::Helpers instrument def foo end def self.inspect 'KlassName' end end end it 'traces' do expect { klass.new.foo # Datadog writes trace to the wire and to the test harness asynchronously # This helper is provided to force the flush before expectations are stated flush_traces }.to change{ emitted_spans( filtered_by: {resource: 'KlassName.foo'} ) }.by(1) end end ``` ## Stat Transmission InstrumentAllTheThings provides no real functionality on top of the build in Datadog statsd utility. The primary goal of the IATT library is to allow for easy testing. The following method are provided on the InstrumentAllTheThings module. ### Increment Examples: ```ruby expect{ InstrumentAllTheThings.increment('my.counter') }.to change{ counter_value('my.counter') }.from(0).to(1) expect{ InstrumentAllTheThings.increment('my.counter', by: 5) }.to change { counter_value('my.counter') }.from(0).to(5) # You can also filter by tags expect { InstrumentAllTheThings.increment('my.counter', by: 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.increment('my.counter', by: 2, tags: ['a:b', 'foo:baz']) }.to change { counter_value('my.counter') }.from(0).to(5) .and change { counter_value('my.counter', with_tags: ['a:b']) }.from(0).to(5) .and change { counter_value('my.counter', with_tags: ['foo:bar']) }.from(0).to(3) .and change { counter_value('my.counter', with_tags: ['foo:baz']) }.from(0).to(2) ``` ### Decrement See Increment for all examples ```ruby expect{ InstrumentAllTheThings.increment('my.counter') }.to change{ counter_value('my.counter') }.from(0).to(-1) ``` ### Count Count underlies both increment and decrement, and works in a similar way. ```ruby expect { InstrumentAllTheThings.count('my.counter', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.count('my.counter', 2, tags: ['a:b', 'foo:baz']) }.to change { counter_value('my.counter') }.from(0).to(5) .and change { counter_value('my.counter', with_tags: ['a:b']) }.from(0).to(5) .and change { counter_value('my.counter', with_tags: ['foo:bar']) }.from(0).to(3) .and change { counter_value('my.counter', with_tags: ['foo:baz']) }.from(0).to(2) ``` ### Gauge ```ruby expect { InstrumentAllTheThings.gauge('my.gauge', 1) InstrumentAllTheThings.gauge('my.gauge', 2) }.to change { gauge_value('my.gauge') }.from(nil).to(2) expect { InstrumentAllTheThings.gauge('my.gauge', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.gauge('my.gauge', 1, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.gauge('my.gauge', 2, tags: ['a:b', 'foo:baz']) InstrumentAllTheThings.gauge('my.gauge', 7, tags: ['a:b']) }.to change { gauge_value('my.gauge') }.to(7) .and change { gauge_value('my.gauge', with_tags: ['a:b']) }.to(7) .and change { gauge_value('my.gauge', with_tags: ['foo:bar']) }.to(1) .and change { gauge_value('my.gauge', with_tags: ['foo:baz']) }.to(2) ``` ### Set ```ruby expect { InstrumentAllTheThings.set('my.set', 1) InstrumentAllTheThings.set('my.set', 2) }.to change { set_value('my.set') }.from(0).to(2) expect { InstrumentAllTheThings.set('my.set', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.set('my.set', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.set('my.set', 5, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.set('my.set', 6, tags: ['a:b', 'foo:baz']) InstrumentAllTheThings.set('my.set', 9, tags: ['a:b']) }.to change { set_value('my.set') }.to(4) .and change { set_value('my.set', with_tags: ['a:b']) }.to(4) .and change { set_value('my.set', with_tags: ['foo:bar']) }.to(2) .and change { set_value('my.set', with_tags: ['foo:baz']) }.to(1) ``` ### Histogram Histogram is a pesudo metric in Datadog based on the StatsD timing metric. InstrumentAllTheThings provides a way to test the values emitted, not the statistical calculations derived. ```ruby expect { InstrumentAllTheThings.histogram('my.histogram', 1) InstrumentAllTheThings.histogram('my.histogram', 2) }.to change { histogram_values('my.histogram') }.from([]).to([1, 2]) expect { InstrumentAllTheThings.histogram('my.histogram', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.histogram('my.histogram', 5, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.histogram('my.histogram', 6, tags: ['a:b', 'foo:baz']) InstrumentAllTheThings.histogram('my.histogram', 9, tags: ['a:b']) }.to change { histogram_values('my.histogram') }.to([3, 5, 6, 9]) .and change { histogram_values('my.histogram', with_tags: ['a:b']) }.to([3, 5, 6, 9]) .and change { histogram_values('my.histogram', with_tags: ['foo:bar']) }.to([3, 5]) .and change { histogram_values('my.histogram', with_tags: ['foo:baz']) }.to([6]) ``` ### Distribution ```ruby expect { InstrumentAllTheThings.distribution('my.distribution', 1) InstrumentAllTheThings.distribution('my.distribution', 2) }.to change { distribution_values('my.distribution') }.from([]).to([1, 2]) expect { InstrumentAllTheThings.distribution('my.distribution', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.distribution('my.distribution', 5, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.distribution('my.distribution', 6, tags: ['a:b', 'foo:baz']) InstrumentAllTheThings.distribution('my.distribution', 9, tags: ['a:b']) }.to change { distribution_values('my.distribution') }.to([3, 5, 6, 9]) .and change { distribution_values('my.distribution', with_tags: ['a:b']) }.to([3, 5, 6, 9]) .and change { distribution_values('my.distribution', with_tags: ['foo:bar']) }.to([3, 5]) .and change { distribution_values('my.distribution', with_tags: ['foo:baz']) }.to([6]) ``` ### Timing ```ruby expect { InstrumentAllTheThings.timing('my.timing', 1) InstrumentAllTheThings.timing('my.timing', 2) }.to change { timing_values('my.timing') }.from([]).to([1, 2]) expect { InstrumentAllTheThings.timing('my.timing', 3, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.timing('my.timing', 5, tags: ['a:b', 'foo:bar']) InstrumentAllTheThings.timing('my.timing', 6, tags: ['a:b', 'foo:baz']) InstrumentAllTheThings.timing('my.timing', 9, tags: ['a:b']) }.to change { timing_values('my.timing') }.to([3, 5, 6, 9]) .and change { timing_values('my.timing', with_tags: ['a:b']) }.to([3, 5, 6, 9]) .and change { timing_values('my.timing', with_tags: ['foo:bar']) }.to([3, 5]) .and change { timing_values('my.timing', with_tags: ['foo:baz']) }.to([6]) ``` ### Time The time helper wraps a block with reporting to the timing metric time. As such specs should leverage that spec helper. ```ruby expect { InstrumentAllTheThings.time('my.time') {} InstrumentAllTheThings.time('my.time') {} }.to change { timing_values('my.time').length }.from(0).to(2) expect { InstrumentAllTheThings.time('my.time', tags: ['a:b', 'foo:bar']) {} InstrumentAllTheThings.time('my.time', tags: ['a:b', 'foo:bar']) {} InstrumentAllTheThings.time('my.time', tags: ['a:b', 'foo:baz']) {} InstrumentAllTheThings.time('my.time', tags: ['a:b']) {} }.to change { timing_values('my.time').length }.to(4) .and change { timing_values('my.time', with_tags: ['a:b']).length }.to(4) .and change { timing_values('my.time', with_tags: ['foo:bar']).length }.to(2) .and change { timing_values('my.time', with_tags: ['foo:baz']).length }.to(1) ``` ## Performance Is it slow? That depends quite a bit on your perspective. We keep a benchmark as part of our specs and monitor the overall cost of instrumentation. The current numbers look roughly like: ``` Calculating ------------------------------------- uninstrumetned 17.038M (± 3.3%) i/s - 85.164M in 5.004312s the_works 7.404k (±12.9%) i/s - 36.936k in 5.100630s only_trace 27.968k (±12.7%) i/s - 139.209k in 5.061907s only_error_logging 638.098k (± 4.6%) i/s - 3.231M in 5.075275s only_gc_stats 12.930k (±13.2%) i/s - 63.865k in 5.070874s only_execution_counts 9.847k (±11.1%) i/s - 49.088k in 5.073475s ``` All of these calculations are performed against an empty method, which ruby executes in 0.000059ms. If we consider this the baseline, running all of the included instrumentation in the non-error case has a cost of 0.135ms. If this is an expensive calculation is directly proportional to the execution duration of the method. If you are tracing a 1ms method, it represents a 13% overhead, but if you are tracing a 100ms method it is only 0.13%. The moral of the story, don't blindly add the works to every method in your stack, instead choose methods who's output will be diagnostically meaningful. We are also always looking for patches with will reduce execution time and allocations of the instrumentation, so please open some PRs! ## Configuration The configuration for IATT is available on the top level InstrumentAllTheThings module. | Config Name | Description | Default | ----------- | ----------- | ------- | stat_namespace | The string to add to all outbound stats (may not be changed after stat transmiter initialization) | `nil` | logger | The logger used to report errors and info | If the constant `Rails` is set, use `Rails.logger`.
If `App` and it responds to `logger` use `App.logger`. Otherwise create a new `Logger` sent to STDOUT | stat_reporter | The class which receives simple stats | If [Datadog::Statsd](https://github.com/DataDog/dogstatsd-ruby) is found, use that, otherwise the Blackhole client is used | tracer | The instance of a tracer which will handle all traces | If `Datadog` is defined and responds to `tracer`, use the value returned by that. Otherwise use the Blackhole. [Gem](https://github.com/DataDog/dd-trace-rb/blob/master/docs/GettingStarted.md) ### Stats Reporters #### Datadog The default client if the constant `Datadog::Statsd` is found. Initialized with environment variables * `DATADOG_HOST` if set, otherwise `localhost` * `DATADOG_POST` if set, otherwise `8125` ### Tracers #### Datadog The default client if the constant `Datadog` is found and has a non-null value for `tracer`. ## Development After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org). # Breaking Changes # V2 => V3 IATT no longer takes ENV var configuration for directly. All DataDog configuration is handled by their env vars. ## Contributing Bug reports and pull requests are welcome on GitHub at https://github.com/GetTerminus/instrument_all_the_things.