CSV Decision
CSV based Ruby decision tables
csv_decision
is a RubyGem for CSV based decision tables. It
accepts decision tables implemented as a CSV file,
which can then be used to execute complex conditional logic against an
input hash, producing a decision as an output hash.
Why use csv_decision
?
Typical “business logic” is notoriously illogical - full of corner cases and one-off exceptions. A decision table can express data-based decisions in a way that comes naturally to subject matter experts, who typically use spreadsheet models. Business logic can be encapsulated in a table, avoiding the need for tortuous conditional expressions.
This gem and the examples below take inspiration from rufus/decision.
(That gem is no longer maintained and CSV Decision has better decision-time
performance, at the expense of slower table parse times and more memory –
see benchmarks/rufus_decision.rb
.)
Installation
To get started, just add csv_decision
to your
Gemfile
, and then run bundle
:
ruby gem 'csv_decision'
or simply bash gem install csv_decision
Simple example
This table considers two input conditions: topic
and
region
, labeled in:
. Certain combinations yield
an output value for team_member
, labeled out:
.
in:topic | in:region | out:team_member
---------+------------+----------------
sports | Europe | Alice
sports | | Bob
finance | America | Charlie
finance | Europe | Donald
finance | | Ernest
politics | Asia | Fujio
politics | America | Gilbert
politics | | Henry
| | Zach
When the topic is finance
and the region is
Europe
the team member Donald
is selected. This
is a “first match” decision table in that as soon as a match is made
execution stops and a single output row (hash) is returned.
The ordering of rows matters. Ernest
, who is in charge of
finance
for the rest of the world, except for
America
and Europe
, must come after his
colleagues Charlie
and Donald
. Zach
has been placed last, catching all the input combos not matching any other
row.
Here's the example as code:
“`ruby # Valid CSV string data = <<~DATA in :topic, in :region, out :team_member sports, Europe, Alice sports, , Bob finance, America, Charlie finance, Europe, Donald finance, , Ernest politics, Asia, Fujio politics, America, Gilbert politics, , Henry , , Zach DATA
table = CSVDecision.parse(data)
table.decide(topic: 'finance', region: 'Europe') #=> { team_member: 'Donald' } table.decide(topic: 'sports', region: nil) #=> { team_member: 'Bob' } table.decide(topic: 'culture', region: 'America') #=> { team_member: 'Zach' } “`
An empty in:
cell means “matches any value”.
Note that all column header names are symbolized, so it's actually more
accurate to write in :topic
; however spaces before and after
the :
do not matter.
If you have cloned this gem's git repo, then the example can also be run by loading the table from a CSV file:
ruby table =
CSVDecision.parse(Pathname('spec/data/valid/simple_example.csv'))
We can also load this same table using the option: first_match:
false
, which means that all matching rows will be
accumulated into an array of hashes.
ruby table = CSVDecision.parse(data, first_match: false)
table.decide(topic: 'finance', region: 'Europe') #=> {
team_member: %w[Donald Ernest Zach] }
For more examples see spec/csv_decision/table_spec.rb
.
Complete documentation of all table parameters is in the code - see
lib/csv_decision/parse.rb
and
lib/csv_decision/table.rb
.
CSV Decision features
-
Either returns the first matching row as a hash (default), or accumulates all matches as an array of hashes (i.e.,
parse
optionfirst_match: false
or CSV file optionaccumulate
). -
Fast decision-time performance (see
benchmarks
folder). Automatically indexes all constants-only columns that do not contain any empty strings. -
In addition to strings, can match basic Ruby constants (e.g.,
=nil
), regular expressions (e.g.,=~ on|off
), comparisons (e.g.,> 100.0
) and Ruby-style ranges (e.g.,1..10
) -
Can compare an input column versus another input hash key - e.g.,
> :column
. -
Any cell starting with
#
is treated as a comment, and comments may appear anywhere in the table. -
Column symbol expressions or Ruby methods (0-arity) may be used in input columns for matching - e.g.,
:column.zero?
or:column == 0
. -
May also use Ruby methods in output columns - e.g.,
:column.length
. -
Accepts data as a file, CSV string or an array of arrays.
Constants other than strings
Although csv_decision
is string oriented, it does recognise
other types of constant present in the input hash. Specifically, the
following classes are recognized: Integer
,
BigDecimal
, NilClass
, TrueClass
and
FalseClass
.
This is accomplished by prefixing the value with one of the operators
=
, ==
or :=
. (The syntax is
intentionally lax.)
For example: “`ruby data = <<~DATA in :constant, out :value :=nil, :=nil ==false, ==false =true, =true = 0, = 0 :=100.0, :=100.0 DATA
table = CSVDecision.parse(data) table.decide(constant: nil) # returns
value: nil
table.decide(constant: 0) # returns value: 0
table.decide(constant: BigDecimal('100.0')) # returns value:
BigDecimal('100.0')
“`
Column header symbols
All input and output column names are symbolized, and those symbols may be used to form simple expressions that refer to values in the input hash.
For example: “`ruby data = <<~DATA in :node, in :parent, out :top? , == :node, yes , , no DATA
table = CSVDecision.parse(data)
table.decide(node: 0, parent: 0) # returns top?: 'yes'
table.decide(node: 1, parent: 0) # returns top?: 'no'
“`
Note that there is no need to include an input column for
:node
in the decision table - it just needs to be present in
the input hash. The expression, == :node
should be read as
:parent == :node
. It can also be shortened to just
:node
, so the above decision table may be simplified to:
ruby data = <<~DATA in :parent, out :top?
:node, yes , no DATA
These comparison
operators are also supported: !=
, >
,
>=
, <
, <=
. In addition, you
can also apply a Ruby 0-arity method - e.g., .present?
or
.nil?
. Negation is also supported - e.g., !.nil?
.
Note that .nil?
can also be written as := nil?
,
and !.nil?
as := !nil?
, depending on preference.
For more simple examples see
spec/csv_decision/examples_spec.rb
.
Input guard
conditions
Sometimes it's more convenient to write guard expressions in a single column specialized for that purpose. For example:
data = <<~DATA
in :country, guard:, out :ID, out :ID_type, out :len
US, :CUSIP.present?, :CUSIP, CUSIP, :ID.length
GB, :SEDOL.present?, :SEDOL, SEDOL, :ID.length
, :ISIN.present?, :ISIN, ISIN, :ID.length
, :SEDOL.present?, :SEDOL, SEDOL, :ID.length
, :CUSIP.present?, :CUSIP, CUSIP, :ID.length
, , := nil, := nil, := nil
DATA
table = CSVDecision.parse(data)
table.decide(country: 'US', CUSIP: '123456789') #=> { ID: '123456789', ID_type: 'CUSIP', len: 9 }
table.decide(country: 'EU', CUSIP: '123456789', ISIN:'123456789012')
#=> { ID: '123456789012', ID_type: 'ISIN', len: 12 }
Input guard:
columns may be anonymous, and must contain
non-constant expressions. In addition to 0-arity Ruby methods, the
following comparison operators are allowed: ==
,
!=
, >
, >=
, <
and <=
. Also, regular expressions are supported - i.e.,
=~
and !~
.
Output if
conditions
In some situations it is useful to apply filter conditions after all the output columns have been derived. For example:
data = <<~DATA
in :country, guard:, out :ID, out :ID_type, out :len, if:
US, :CUSIP.present?, :CUSIP, CUSIP8, :ID.length, :len == 8
US, :CUSIP.present?, :CUSIP, CUSIP9, :ID.length, :len == 9
US, :CUSIP.present?, :CUSIP, DUMMY, :ID.length,
, :ISIN.present?, :ISIN, ISIN, :ID.length, :len == 12
, :ISIN.present?, :ISIN, DUMMY, :ID.length,
, :CUSIP.present?, :CUSIP, DUMMY, :ID.length,
DATA
table = CSVDecision.parse(data)
table.decide(country: 'US', CUSIP: '123456789') #=> {ID: '123456789', ID_type: 'CUSIP9', len: 9}
table.decide(CUSIP: '12345678', ISIN:'1234567890') #=> {ID: '1234567890', ID_type: 'DUMMY', len: 10}
Output if:
columns may be anonymous, and must contain
non-constant expressions. In addition to 0-arity Ruby methods, the
following comparison operators are allowed: ==
,
!=
, >
, >=
, <
and <=
. Also, regular expressions are supported - i.e.,
=~
and !~
.
Input set
columns
If you wish to set default values in the input hash, you can use a
set
column rather than an in
column. The data row
beneath the header is used to specify the default expression. There are
three variations: set
(unconditional default)
set/nil?
(set if nil?
true) and
set/blank?
(set if blank?
true). Note that the
decide!
method will mutate the input hash.
data = <<~DATA
set/nil? :country, guard:, set: class, out :PAID, out: len, if:
US, , :class.upcase,
US, :CUSIP.present?, != PRIVATE, :CUSIP, :PAID.length, :len == 9
!=US, :ISIN.present?, != PRIVATE, :ISIN, :PAID.length, :len == 12
US, :CUSIP.present?, PRIVATE, :CUSIP, :PAID.length,
!=US, :ISIN.present?, PRIVATE, :ISIN, :PAID.length,
DATA
table = CSVDecision.parse(data)
table.decide(CUSIP: '1234567890', class: 'Private') #=> {PAID: '1234567890', len: 10}
table.decide(ISIN: '123456789012', country: 'GB', class: 'private') #=> {PAID: '123456789012', len: 12}
Input path
columns
For hashes that contain sub-hashes, it's possible to specify a path for the purposes of matching. (Arrays are currently not supported.)
data = <<~DATA
path:, path:, out :value
header, , :source_name
header, metrics, :service_name
payload, , :amount
payload, ref_data, :account_id
DATA
table = CSVDecision.parse(data, first_match: false)
input = {
header: {
id: 1, type_cd: 'BUY', source_name: 'Client', client_name: 'AAPL',
metrics: { service_name: 'Trading', receive_time: '12:00' }
},
payload: {
tran_id: 9, amount: '100.00',
ref_data: { account_id: '5010', type_id: 'BUYL' }
}
}
table.decide(input) #=> { value: %w[Client Trading 100.00 5010] }
Testing
csv_decision
includes thorough RSpec tests:
bash # Execute within a clone of the csv_decision Git repository:
bundle install rspec
Planned features
csv_decision
is still a work in progress, and will be enhanced
to support the following features: * Supply a pre-defined library of
functions that may be called within input columns to implement custom
matching logic, or from the output columns to formulate the final
decision. * Built-in lookup functions evaluate other decision tables to
implement guard conditions, or supply output values. * Available
functions may be extended with a user-supplied library of Ruby methods for
custom logic. * Output columns may construct interpolated strings
containing references to column symbols.
Reasons for the limitations of column expressions
The simple column expressions allowed by csv_decision
are
purposely limited for reasons of understandability and maintainability. The
whole point of this gem is to make decision rules easier to express and
comprehend as declarative, tabular logic. While Ruby makes it easy to
execute arbitrary code embedded within a CSV file, this could easily result
in hard to debug logic that also poses safety risks.
Changelog
See CHANGELOG.md for a list of changes.
License
CSV Decision © 2017-2018 by Brett Vickers. CSV Decision is licensed under the MIT license. Please see the LICENSE document for more information.