Machine Learning with ID3 Decision Trees in Ruby
Introduction to the ID3 algorithm
AI4R implements the ID3 algorithm (Quinlan) as one of its automatic classifiers. Given a set of preclassified examples, it induces a decision tree top-down, choosing at each node the attribute with the highest information gain, a measure based on entropy.
The good thing about this automatic learning method is that humans learn as well. Unlike other AI techniques such as neural networks, the classifier can generate Ruby code made of if/else statements. You can use these rules to evaluate parameters in real time, copy and paste them into your code, or simply read them to learn about your problem domain.
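To make the entropy and information gain measures concrete, here is a minimal sketch in plain Ruby that computes them for the marketing survey data used later in this document. It is not AI4R's implementation; the helper names entropy and information_gain are assumptions chosen for illustration.

```ruby
# Illustrative sketch only: these helpers are not part of the AI4R API.
DATA_LABELS = ['city', 'age_range', 'gender', 'marketing_target']
DATA_SET = [
  ['New York', '<30', 'M', 'Y'], ['Chicago', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'F', 'Y'], ['New York', '<30', 'M', 'Y'],
  ['New York', '<30', 'M', 'Y'], ['Chicago', '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'], ['New York', '[50-80]', 'F', 'N'],
  ['Chicago', '>80', 'F', 'Y']
]

# Shannon entropy (in bits) of the class column, which is the last
# element of every row.
def entropy(rows)
  total = rows.size.to_f
  rows.group_by(&:last).values.sum do |group|
    p = group.size / total
    -p * Math.log2(p)
  end
end

# Information gain of splitting rows on the attribute at attr_index:
# entropy before the split minus the weighted entropy of each subset.
def information_gain(rows, attr_index)
  total = rows.size.to_f
  remainder = rows.group_by { |row| row[attr_index] }.values.sum do |subset|
    (subset.size / total) * entropy(subset)
  end
  entropy(rows) - remainder
end

# Gain for every attribute except the class column itself.
gains = DATA_LABELS[0...-1].each_with_index.map do |label, index|
  [label, information_gain(DATA_SET, index)]
end.to_h

gains.each { |label, gain| puts format('%-10s %.4f', label, gain) }

best = gains.max_by { |_label, gain| gain }.first
puts "Best attribute for the root node: #{best}"
```

On this data the attribute with the highest gain is age_range, which is why the generated rules shown later test age_range first.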
Marketing target strategy example using ID3 Decision Trees in Ruby
Let's suppose that you are writing an application that must identify people as relevant marketing targets or not. The only information you have is a collection of examples provided by a marketing survey:
DATA_LABELS = [ 'city', 'age_range', 'gender', 'marketing_target' ]

DATA_SET = [
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]
You can create an ID3 decision tree to do the dirty work for you:
id3 = ID3.new(DATA_SET, DATA_LABELS)
The decision tree will automatically create the "rules" to parse new data and identify new possible marketing targets:
id3.to_s
# => if age_range=='<30' then marketing_target='Y'
#    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
#    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
#    elsif age_range=='[50-80]' then marketing_target='N'
#    elsif age_range=='>80' then marketing_target='Y'
#    else raise 'There was not enough information during training to do a proper induction for this data element' end

id3.eval(['New York', '<30', 'M'])
# => 'Y'
Better data loading
In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.
require 'csv'

# CSV::Reader only exists in Ruby 1.8; CSV.foreach is the Ruby 1.9+ equivalent.
data_set = []
CSV.foreach(File.join(File.dirname(__FILE__), 'data_set.csv')) do |row|
  data_set << row
end

# The first row of the file holds the attribute labels.
data_labels = data_set.shift
id3 = ID3.new(data_set, data_labels)
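The loader assumes that the first row of the CSV file contains the labels and that every following row is one training example with the class in the last column. The sketch below shows that layout using an in-memory string in place of data_set.csv; the three sample rows are taken from the survey data above, and the CSV_TEXT name is only illustrative.

```ruby
require 'csv'

# Stand-in for the contents of data_set.csv (hypothetical file layout):
# a header row with the labels, then one example per line.
CSV_TEXT = <<~DATA
  city,age_range,gender,marketing_target
  New York,<30,M,Y
  Chicago,[30-50),F,Y
  New York,[50-80],F,N
DATA

# CSV.parse returns an array of rows, each row an array of strings.
rows = CSV.parse(CSV_TEXT)

# Split the header off, leaving only the training examples.
data_labels = rows.shift
data_set = rows

puts data_labels.inspect
puts data_set.size
```

All values arrive as strings, which matches the string-valued attributes used in the examples in this document.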
A good tip for data evaluation
The ID3 class provides a method to evaluate new data.
id3.eval(['New York', '<30', 'M']) # => 'Y'
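But instead of walking the tree every time, you can take advantage of the fact that the "to_s" method generates valid Ruby code!
id3 = ID3.new(DATA_SET, DATA_LABELS)

# The generated rules refer to these local variables by name.
age_range = '<30'
city = 'New York'
gender = 'M'

# Declare the result before calling eval; a variable first assigned
# inside eval would not be visible afterwards.
marketing_target = nil

eval id3.to_s
puts marketing_target
# prints Y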
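If you classify many records, one option is to eval the generated rules once into a lambda and reuse it, so the rule string is not parsed again on every call. The sketch below runs without AI4R by using a hard-coded RULES string copied from the to_s output shown earlier as a stand-in for id3.to_s; the classify name is an assumption. Only eval rule strings that you generated yourself, never untrusted input.

```ruby
# Stand-in for id3.to_s, copied from the rules shown earlier in this document.
RULES = <<~RULES_TEXT
  if age_range=='<30' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  elsif age_range=='[50-80]' then marketing_target='N'
  elsif age_range=='>80' then marketing_target='Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
RULES_TEXT

# Wrap the rules in a lambda whose parameters match the attribute labels.
# The eval happens once, here; later calls only run the compiled lambda.
source = "lambda do |city, age_range, gender|\n" \
         "marketing_target = nil\n" \
         "#{RULES}\n" \
         "marketing_target\n" \
         "end"
classify = eval(source)

puts classify.call('New York', '<30', 'M')
puts classify.call('New York', '[30-50)', 'F')

# Values never seen during training reach the final else branch and raise.
begin
  classify.call('Chicago', 'unknown', 'F')
rescue RuntimeError => e
  puts "Could not classify: #{e.message}"
end
```

The parameter order mirrors the DATA_LABELS order, so the lambda can be called with the same values you would pass to id3.eval.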