CSV Decision

CSV based Ruby decision tables

csv_decision is a RubyGem for CSV based decision tables. It accepts decision tables implemented as a CSV file, which can then be used to execute complex conditional logic against an input hash, producing a decision as an output hash.

Why use `csv_decision`?

Typical “business logic” is notoriously illogical - full of corner cases and one-off exceptions. A decision table can express data-based decisions in a way that comes more naturally to subject matter experts, who typically prefer spreadsheet models. Business logic may then be encapsulated, avoiding the need to write tortuous conditional expressions in Ruby that draw the ire of rubocop and its ilk.

This gem and the examples below take inspiration from rufus/decision. (That gem is no longer maintained and CSV Decision has better decision-time performance, at the expense of slower table parse times and more memory – see benchmarks/rufus_decision.rb.)

Installation

To get started, just add csv_decision to your Gemfile, and then run bundle:

```ruby
gem 'csv_decision', '~> 0.0.1'
```

or simply

```bash
gem install csv_decision
```

Simple example

This table considers two input conditions: topic and region, labeled in:. Certain combinations yield an output value for team_member, labeled out:.

in:topic | in:region  | out:team_member
---------+------------+----------------
sports   | Europe     | Alice
sports   |            | Bob
finance  | America    | Charlie
finance  | Europe     | Donald
finance  |            | Ernest
politics | Asia       | Fujio
politics | America    | Gilbert
politics |            | Henry
         |            | Zach

When the topic is finance and the region is Europe the team member Donald is selected.

This is a “first match” decision table in that as soon as a match is made execution stops and a single output row (hash) is returned.

The ordering of rows matters. Ernest, who is in charge of finance for the rest of the world, except for America and Europe, must come after his colleagues Charlie and Donald. Zach has been placed last, catching all the input combos not matching any other row.

Here is the example as code:

```ruby
# Valid CSV string
data = <<~DATA
  in :topic, in :region, out :team_member
  sports,    Europe,     Alice
  sports,    ,           Bob
  finance,   America,    Charlie
  finance,   Europe,     Donald
  finance,   ,           Ernest
  politics,  Asia,       Fujio
  politics,  America,    Gilbert
  politics,  ,           Henry
  ,          ,           Zach
DATA

table = CSVDecision.parse(data)

table.decide(topic: 'finance', region: 'Europe') #=> { team_member: 'Donald' }
table.decide(topic: 'sports', region: nil) #=> { team_member: 'Bob' }
table.decide(topic: 'culture', region: 'America') #=> { team_member: 'Zach' }
```

An empty in: cell means “matches any value”, even nils.
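To make the first-match and empty-cell rules concrete, here is a minimal plain-Ruby sketch that uses only the standard `csv` library. The helpers `parse_sketch` and `first_match` are hypothetical names invented for this illustration; they are not part of csv_decision and say nothing about how the gem is implemented. The sketch handles plain string cells only.

```ruby
require 'csv'

# Illustrative only: these helpers are NOT part of csv_decision's API.
SIMPLE_TABLE = <<~DATA
  in :topic, in :region, out :team_member
  sports,    Europe,     Alice
  sports,    ,           Bob
  finance,   Europe,     Donald
  finance,   ,           Ernest
  ,          ,           Zach
DATA

# Split the CSV text into [[kind, name], ...] column descriptors and data rows.
def parse_sketch(csv_text)
  header, *rows = CSV.parse(csv_text).map { |row| row.map { |cell| cell.to_s.strip } }
  columns = header.map do |cell|
    kind, name = cell.split(':', 2).map(&:strip)
    [kind.to_sym, name.to_sym]
  end
  [columns, rows]
end

# Return the out: cells of the first row whose in: cells all match the input.
# An empty in: cell matches any value, including nil.
def first_match(csv_text, input)
  columns, rows = parse_sketch(csv_text)
  rows.each do |row|
    cells = columns.zip(row)
    matched = cells.all? do |(kind, name), cell|
      kind == :out || cell.empty? || input[name].to_s == cell
    end
    next unless matched

    return cells.select { |(kind, _name), _cell| kind == :out }
                .to_h { |(_kind, name), cell| [name, cell] }
  end
  nil
end

first_match(SIMPLE_TABLE, { topic: 'sports', region: nil }) # team_member is 'Bob'
```

Because the scan stops at the first matching row, moving the catch-all Zach row to the top would make every lookup return Zach, which is why row order matters.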

Note that all column header names are symbolized, so it's actually more accurate to write in :topic; however spaces before and after the : do not matter.

If you have cloned this gem's git repo, then the example can also be run by loading the table from a CSV file:

```ruby
table = CSVDecision.parse(Pathname('spec/data/valid/simple_example.csv'))
```

We can also load this same table using the option: first_match: false, which means that all matching rows will be accumulated into an array of hashes.

```ruby
table = CSVDecision.parse(data, first_match: false)
table.decide(topic: 'finance', region: 'Europe') #=> { team_member: %w[Donald Ernest Zach] }
```
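The accumulating behaviour can be pictured with a short plain-Ruby sketch. It is an illustration only: `TEAM_ROWS` and `accumulate_matches` are hypothetical names, the table is hard-coded as an array of arrays, and the sketch always returns an array, which may differ from what the gem does when only one row matches.

```ruby
# Illustrative only: rows are [topic, region, team_member]; '' is an empty cell.
TEAM_ROWS = [
  ['sports',   'Europe',  'Alice'],
  ['sports',   '',        'Bob'],
  ['finance',  'America', 'Charlie'],
  ['finance',  'Europe',  'Donald'],
  ['finance',  '',        'Ernest'],
  ['politics', 'Asia',    'Fujio'],
  ['politics', 'America', 'Gilbert'],
  ['politics', '',        'Henry'],
  ['',         '',        'Zach']
].freeze

# Collect the team member from every matching row instead of stopping at the first.
def accumulate_matches(rows, topic:, region:)
  members = rows.select do |row_topic, row_region, _member|
    (row_topic.empty? || row_topic == topic) &&
      (row_region.empty? || row_region == region)
  end.map(&:last)
  { team_member: members }
end

accumulate_matches(TEAM_ROWS, topic: 'finance', region: 'Europe')
# team_member is %w[Donald Ernest Zach]
```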

For more examples see spec/csv_decision/table_spec.rb. Complete documentation of all table parameters is in the code - see lib/csv_decision/parse.rb and lib/csv_decision/table.rb.

CSV Decision features

* Either returns the first matching row as a hash (default), or accumulates all matches as an array of hashes (i.e., parse option first_match: false or CSV file option accumulate).
* Fast decision-time performance (see benchmarks folder).
* In addition to simple strings, csv_decision can match basic Ruby constants (e.g., =nil), regular expressions (e.g., =~ on|off), comparisons (e.g., > 100.0) and Ruby-style ranges (e.g., 1..10).
* Can compare an input column versus another input hash key - e.g., > :column.
* Any cell starting with # is treated as a comment, and comments may appear anywhere in the table. (Comment cells are always interpreted as the empty string.)
* Can use column symbol expressions or Ruby methods (0-arity) in input columns for matching - e.g., :column.zero? or :column == 0.
* May also use Ruby methods in output columns - e.g., :column.length.
* Accepts data as a file, CSV string or an array of arrays. (For safety all input data is force encoded to UTF-8, and non-ascii strings are converted to empty strings.)
* All CSV cells are parsed for correctness, and helpful error messages generated for bad input.
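As a rough picture of how the different cell styles listed above could each become a test on an input value, here is a plain-Ruby sketch. The method `cell_matcher` and the three pattern constants are hypothetical, written only for this illustration; the gem's real parser and its exact matching rules (for example, whether regular expressions are anchored) may differ.

```ruby
# Illustrative only: NOT csv_decision's parser.
REGEXP_CELL     = /\A=~\s*(.+)\z/
COMPARISON_CELL = /\A(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\z/
RANGE_CELL      = /\A(-?\d+)\.\.(-?\d+)\z/

# Turn a cell string into a lambda that tests a single input value.
def cell_matcher(cell)
  text = cell.strip
  if text.empty?
    ->(_value) { true } # an empty cell matches anything
  elsif (m = REGEXP_CELL.match(text))
    pattern = Regexp.new(m[1])
    ->(value) { value.is_a?(String) && value.match?(pattern) }
  elsif (m = COMPARISON_CELL.match(text))
    operator = m[1]
    limit = Float(m[2])
    ->(value) { value.is_a?(Numeric) && value.public_send(operator, limit) }
  elsif (m = RANGE_CELL.match(text))
    range = (Integer(m[1])..Integer(m[2]))
    ->(value) { value.is_a?(Numeric) && range.cover?(value) }
  else
    ->(value) { value.to_s == text } # plain string comparison
  end
end

cell_matcher('> 100.0').call(150)   # true
cell_matcher('1..10').call(11)      # false
cell_matcher('=~ on|off').call('on') # true
```

Building each matcher once when the table is parsed, rather than re-reading the cell text on every decision, is one plausible way to trade slower parsing for faster decisions.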

Constants other than strings

Although csv_decision is string oriented, it does recognize other types of constant present in the input hash. Specifically, the following classes are recognized: Integer, BigDecimal, NilClass, TrueClass and FalseClass.

This is accomplished by prefixing the value with one of the operators =, == or :=. (The syntax is intentionally lax.)

For example:

```ruby
data = <<~DATA
  in :constant, out :value
  :=nil,        :=nil
  ==false,      ==false
  =true,        =true
  = 0,          = 0
  :=100.0,      :=100.0
DATA

table = CSVDecision.parse(data)
table.decide(constant: nil) # returns value: nil
table.decide(constant: 0) # returns value: 0
table.decide(constant: BigDecimal('100.0')) # returns value: BigDecimal('100.0')
```
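The lax operator prefix can be pictured as a small conversion step from cell text to a Ruby value. The following plain-Ruby sketch assumes that integers map to Integer and decimals to BigDecimal, as the class list above suggests; `parse_constant` and `CONSTANT_CELL` are hypothetical names, not the gem's API.

```ruby
require 'bigdecimal'

# Illustrative only: NOT csv_decision's parser.
# Accepts the prefixes :=, == and =, followed by optional whitespace.
CONSTANT_CELL = /\A(?::=|==|=)\s*(\S.*)\z/

# Convert a cell such as ":=nil" or "= 0" into the Ruby constant it names.
# Cells without a recognized constant are returned as plain strings.
def parse_constant(cell)
  text = cell.strip
  match = CONSTANT_CELL.match(text)
  return text unless match

  literal = match[1].strip
  case literal
  when 'nil'            then nil
  when 'true'           then true
  when 'false'          then false
  when /\A-?\d+\z/      then Integer(literal)
  when /\A-?\d+\.\d+\z/ then BigDecimal(literal)
  else text
  end
end

parse_constant('= 0')     # 0
parse_constant(':=100.0') # BigDecimal('100.0')
```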

Column header symbols

All input and output column names are symbolized, and those symbols may be used to form simple expressions that refer to values in the input hash.

For example:

```ruby
data = <<~DATA
  in :node, in :parent, out :top?
  ,         == :node,   yes
  ,         ,           no
DATA

table = CSVDecision.parse(data)
table.decide(node: 0, parent: 0) # returns top?: 'yes'
table.decide(node: 1, parent: 0) # returns top?: 'no'
```

Note that there is no need to include an input column for :node in the decision table - it just needs to be present in the input hash. The expression, == :node should be read as :parent == :node. It can also be shortened to just :node, so the above decision table may be simplified to:

```ruby
data = <<~DATA
  in :parent, out :top?
  :node,      yes
  ,           no
DATA
```

These comparison operators are also supported: !=, >, >=, <, <=. For more simple examples see spec/csv_decision/examples_spec.rb.
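One way to read a cell such as `== :node` is as a two-argument test: the value of the cell's own column on the left, and the named input hash key on the right. The plain-Ruby sketch below illustrates that reading. `symbol_matcher` is a hypothetical helper, and restricting the ordering operators to numeric values is an assumption of this sketch rather than a documented rule of the gem.

```ruby
# Illustrative only: NOT csv_decision's parser.
# Optional comparison operator, then a :symbol naming an input hash key.
SYMBOL_EXPRESSION = /\A(==|!=|>=|<=|>|<)?\s*:(\w+)\z/

# Build a lambda comparing the column's value with another input hash value.
# A bare ":node" is treated as shorthand for "== :node".
def symbol_matcher(cell)
  match = SYMBOL_EXPRESSION.match(cell.strip)
  return nil unless match

  operator = match[1] || '=='
  key = match[2].to_sym
  lambda do |value, input|
    other = input[key]
    case operator
    when '==' then value == other
    when '!=' then value != other
    else
      # Assumption: ordering comparisons only make sense for numbers here.
      value.is_a?(Numeric) && other.is_a?(Numeric) && value.public_send(operator, other)
    end
  end
end

input = { node: 0, parent: 0 }
symbol_matcher('== :node').call(input[:parent], input) # true
symbol_matcher(':node').call(1, { node: 0 })           # false
```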

Input guard conditions

Sometimes it's more convenient to write guard expressions in a single column specialized for that purpose. For example:

```ruby
data = <<~DATA
  in :country, guard:,          out :ID, out :ID_type, out :len
  US,          :CUSIP.present?, :CUSIP,  CUSIP,        :ID.length
  GB,          :SEDOL.present?, :SEDOL,  SEDOL,        :ID.length
  ,            :ISIN.present?,  :ISIN,   ISIN,         :ID.length
  ,            :SEDOL.present?, :SEDOL,  SEDOL,        :ID.length
  ,            :CUSIP.present?, :CUSIP,  CUSIP,        :ID.length
  ,            ,                := nil,  := nil,       := nil
DATA

table = CSVDecision.parse(data)
table.decide(country: 'US', CUSIP: '123456789') #=> { ID: '123456789', ID_type: 'CUSIP', len: 9 }
table.decide(country: 'EU', CUSIP: '123456789', ISIN: '123456789012')
  #=> { ID: '123456789012', ID_type: 'ISIN', len: 12 }
```

Input guard: columns may be anonymous, and must contain non-constant expressions. In addition to 0-arity Ruby methods, the following comparison operators are allowed: ==, !=, >, >=, < and <=. Also, regular expressions are supported - i.e., =~ and !~.
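A guard cell such as `:CUSIP.present?` can be thought of as a predicate on the whole input hash rather than on a single column. The plain-Ruby sketch below illustrates the idea with a small whitelist of predicates, so that no arbitrary method named in a CSV file is ever called. `guard_for` and `GUARD_METHODS` are hypothetical, and the `present?` check only approximates the ActiveSupport method of the same name; how the gem itself evaluates these calls is not shown here.

```ruby
# Illustrative only: NOT csv_decision's implementation.
# Whitelisted 0-arity predicates that a guard cell may name.
GUARD_METHODS = {
  'present?' => ->(value) { !(value.nil? || (value.respond_to?(:empty?) && value.empty?)) },
  'nil?'     => ->(value) { value.nil? },
  'zero?'    => ->(value) { value.is_a?(Numeric) && value.zero? }
}.freeze

GUARD_EXPRESSION = /\A:(\w+)\.(\w+\?)\z/

# Build a lambda that tests the input hash; an empty guard cell always passes.
def guard_for(cell)
  text = cell.strip
  return ->(_input) { true } if text.empty?

  match = GUARD_EXPRESSION.match(text)
  unless match && GUARD_METHODS.key?(match[2])
    raise ArgumentError, "unsupported guard expression: #{text}"
  end

  key = match[1].to_sym
  check = GUARD_METHODS.fetch(match[2])
  ->(input) { check.call(input[key]) }
end

guard_for(':CUSIP.present?').call({ CUSIP: '123456789' }) # true
guard_for(':ISIN.present?').call({ CUSIP: '123456789' })  # false
```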

Output if conditions

In some situations it is useful to apply filter conditions after all the output columns have been derived. For example:

```ruby
data = <<~DATA
  in :country, guard:,          out :ID, out :ID_type, out :len,   if:
  US,          :CUSIP.present?, :CUSIP,  CUSIP8,       :ID.length, :len == 8
  US,          :CUSIP.present?, :CUSIP,  CUSIP9,       :ID.length, :len == 9
  US,          :CUSIP.present?, :CUSIP,  DUMMY,        :ID.length,
  ,            :ISIN.present?,  :ISIN,   ISIN,         :ID.length, :len == 12
  ,            :ISIN.present?,  :ISIN,   DUMMY,        :ID.length,
  ,            :CUSIP.present?, :CUSIP,  DUMMY,        :ID.length,
DATA

table = CSVDecision.parse(data)
table.decide(country: 'US', CUSIP: '123456789') #=> {ID: '123456789', ID_type: 'CUSIP9', len: 9}
table.decide(CUSIP: '12345678', ISIN: '1234567890') #=> {ID: '1234567890', ID_type: 'DUMMY', len: 10}
```

Output if: columns may be anonymous, and must contain non-constant expressions. In addition to 0-arity Ruby methods, the following comparison operators are allowed: ==, !=, >, >=, < and <=. Also, regular expressions are supported - i.e., =~ and !~.
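The key point about if: columns is ordering: a row's outputs are derived first, and only then is the condition tested against those outputs. The plain-Ruby sketch below mimics the three US rows of the table above under that reading. `CUSIP_ROWS` and `decide_cusip_type` are hypothetical names, and the conditions are written as lambdas rather than parsed from cell text.

```ruby
# Illustrative only: NOT csv_decision's implementation.
# Each row names an ID_type and a condition evaluated on the derived outputs.
CUSIP_ROWS = [
  { id_type: 'CUSIP8', condition: ->(out) { out[:len] == 8 } },
  { id_type: 'CUSIP9', condition: ->(out) { out[:len] == 9 } },
  { id_type: 'DUMMY',  condition: ->(_out) { true } } # empty if: cell
].freeze

# Derive the outputs for each row in turn, then keep the first row whose
# condition accepts them.
def decide_cusip_type(cusip)
  CUSIP_ROWS.each do |row|
    output = { ID: cusip, ID_type: row[:id_type], len: cusip.length }
    return output if row[:condition].call(output)
  end
  nil
end

decide_cusip_type('123456789') # ID_type is 'CUSIP9', len is 9
```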

Testing

csv_decision includes thorough RSpec tests:

```bash
# Execute within a clone of the csv_decision Git repository:
bundle install
rspec
```

Planned features

csv_decision is still a work in progress, and will be enhanced to support the following features:

* Text-only input columns may be indexed for faster lookup performance.
* Input hash values may be (conditionally) defaulted with a constant or a function call.
* Output columns may construct interpolated strings referencing column symbols.
* Supply a pre-defined library of functions that can be called within input columns to implement matching logic or from the output columns to formulate the final decision.
* Available functions may be extended with a user-supplied library of Ruby methods for tailored logic.

Reasons for the limitations of column expressions

The simple column expressions allowed by csv_decision are purposely limited for reasons of understandability and maintainability. The whole point of this gem is to make decision rules easier to express and comprehend as declarative, tabular logic. While Ruby makes it easy to execute arbitrary code embedded within a CSV file, this could easily result in hard to debug logic that also poses safety risks.

Changelog

See CHANGELOG.md for a list of changes.
