module Stamina class Automaton # # Provides useful automaton walking methods. This module is automatically # included in Automaton and is not intended to be used directly. # # == Examples # # Building an automaton for the regular language a(ba)* # s0, s1 = nil # fa = Automaton.new do # s0 = add_state(:initial => true) # s1 = add_state(:accepting => true) # connect(0,1,'a') # connect(1,0,'b') # end # # # some examples with reached # fa.dfa_reached('? a b') # -> s0 dfa variant method # fa.dfa_reached('? a a') # -> nil # fa.dfa_reached('? b a', s1) # -> s1 from an explicit init state # # fa.reached('? a b') # -> [s0] generic method on automaton # fa.reached('? a a') # -> [] # fa.reached('? b a', s1) # -> [s1] # # # some examples with split (the most powerful one!) # fa.dfa_split('? a b a b') # [['a','b','a','b'], s0, []] # fa.dfa_split('? a b b a') # [['a','b'], s0, ['b','a']] # # fa.split('? a b a b') # [['a','b','a','b'], [s0], []] # fa.split('? a b b a') # [['a','b'], [s0], ['b','a']] # # # All of this works on non-deterministic automata as well (and epsilon # # symbols are taken into account), but you'll probably need to read # # the following section to master the power of this module in this case! # # == Using this module # This section fully details the design choices that has been made for the # implementation of the Walking module used by Stamina on Automaton. It is provided # because Walking is one of the core classes of Stamina, that probably all # users (and contributors) will use. Walking usage is really user-friendly, # so you are normally not required to read this section in the first # place ! Read it only if of interest for you, or if you experiment unexpected # results. # # Methods defined by this module respect common conventions that you must be # aware of: # # === Generic methods vs. dfa variants # The convention is simple: methods whose name starts with 'dfa_' are expected # to be used on deterministic automata only (that is, automata answering _true_ # to the deterministic? method invocation). We refer to those methods as # 'dfa variants'. Strange results may occur if invoked on non-deterministic # automata. Other methods are called 'generic methods' and can be used on any # automaton. Generic methods and dfa variants sometimes use different conventions # according to arguments and returned values, as explained below. # # === Argument conventions # - all methods taking a _symbol_ argument expect it to be a valid instance of # the class used for representing input symbols on edges of your automaton # (that is, the mark you've installed under :symbol key on the edge, see # Automaton documentation for details). # - all methods taking an _input_ argument support the following objects for it: # - InputString instance: an real input string typically coming from a Sample. # - Array of symbols: where by symbol, we mean an input symbol as explained # above (and not a Ruby Symbol instance). The array is never modified by the # methods, so that you don't have to worry about where this array comes from. # - String (a real Ruby one): in this case, the input is expected to be an ADL # input string, which is parsed using ADL::parse_string. Note that 'a b a b' # is NOT a valid ADL input string, so that you typically have to use the '?' # sign to indicate that the tested string is basically unlabeled. # - all methods taking a _from_ argument support the following objects for it: # - ommited: _from_ is interpreted as the set of initial states by generic # methods and the last rule applies. _from_ is interpreted as the unique initial # state of the deterministic automaton by dfa method variants (dfa_xxx), # and the third rule applies. # - Integer: _from_ is interpreted as a state index, and the next rule applies # on the index-th state of the automaton. # - State: _from_ is interpreted by the generic methods as a singleton set # containing the state and the last rule applies. Deterministic method # variants interpret it as the start state from which the walk must start. # In this case, they always return a State instance (or _nil_) instead of # an array of states. # - Array: _from_ is interpreted as a set of states (duplicates are supported # so it's actually a bag) from which the walk must start. Indexes of states # are also supported, see Automaton documentation about indexes. # # === Returned value conventions # Moreover, (unless stated explicitely) all methods returning states as (part of) # their returned value respect the following _return_ conventions (which somewhat # summarizes the _from_ conventions above): # - generic methods *always* return an array of states (without duplicates) which # can be modified. This array is *never* sorted by state index. To insist: # even when invoked on a deterministic automaton with a State argument as # _from_, they will return an array of states as show by the code excerpt # below. Lastly, the returned value is *never* _nil_, but an empty array may # be returned when it makes sense (no reached states for example). # # fa = Automaton.new do ... end # build a(ba)* automaton # s0 = fa.initial_state # fa.reached('? a b a b', s0) # returns [s0], not s0 ! # # - dfa variant methods respond to your query using the same language as you: # if _from_ is ommitted, is a State or an Integer, the the result will be a # single State instance, or _nil_ if it makes sense (no reached state for # example). Otherwise, they behaves exactly as generic methods (*always* return # an array of states, ...) # # === Epsilon symbol aware methods # Stamina does not allow epsilon symbols on deterministic automata; thus, this # subsection only applies to generic methods. # # Methods documented as 'epsilon aware' (almost all generic methods) *always* # take epsilon symbols into account in their computations (Stamina uses _nil_ as # epsilon symbol, by convention), in a natural way. For example: # # fa = Automaton.new do ... end # build a non-deterministic automaton # # with epsilon symbols # # # the line below computes the set of reached states # # (from the set of initial states) by walking the dfa # # with a string. # # # # The actual computation is in fact the set of reached # # states with the string (regex) 'eps* a eps* b eps*', # # where eps is the epsilon symbol. # reached = fa.reached('? a b') # # == Detailed API module Walking # # Returns reachable states from _from_ states with an input _symbol_. This # method is not epsilon symbol aware. # def step(from, symbol) from = walking_to_from(from) from.collect{|s| s.step(symbol)}.flatten.uniq end # # Returns the state reached from _from_ states with an input _symbol_. Returns # nil or the empty array (according to _from_ conventions) if no state can be # reached with the given symbol. # def dfa_step(from, symbol) step = walking_to_from(from).collect{|s| s.dfa_step(symbol)}.flatten.uniq walking_to_dfa_result(step, from) end # # Computes an array representing the set of states that can be reached from # _from_ states with the given input _symbol_. # # This method is epsilon symbol aware (represented with nil) on non # deterministic automata, meaning that it actually computes the set of # reachable states through strings respecting the eps* symbol eps* # regular expression, where eps is the epsilon symbol. # def delta(from, symbol) walking_to_from(from).collect{|s| s.delta(symbol)}.flatten.uniq end # # Returns the target state (or the target states, according to _from_ # conventions) that can be reached from _from_ states with a given input # _symbol_. Returns nil (or an empty array, according to the same conventions) # if no such state exists. # def dfa_delta(from, symbol) if from.is_a?(Automaton::State) from.dfa_delta(symbol) else delta = walking_to_from(from).collect{|s| s.dfa_delta(symbol)}.flatten.uniq walking_to_dfa_result(delta, from) end end # # Splits a given input and returns a triplet [parsed,reached,remaining] # where _parsed_ is an array of parsed symbols, _reached_ is the set of reached # states with the _parsed_ input string and _remaining_ is an array of symbols # with the unparsable part of the string. This method is epsilon symbol aware. # # By construction, the following properties are verified: # - parsed + remaining == input (assuming input is an array of symbols), # which means that atring concatenation of parsed and remaining symbols is # is the input string. # - reached.empty? == false, because at least initial states (or # _from_ if provided) are reached. # - remaining.empty? == parses?(input,from), meaning that the automaton # parses the whole input if there is no remaining symol. # - delta(reached, remaining[0]).empty? unless remaining.empty?, # which express the splitting stop condition: splitting continues while at # least one state can be reached with the next symbol. # def split(input, from=nil, sort=false) if deterministic? parsed, reached, remaining = dfa_split(input, from) [parsed, walking_from_dfa_to_nfa_result(reached), remaining] else # the three elements of the triplet parsed = [] reached = walking_to_from(from) remaining = walking_to_modifiable_symbols(input) # walk now until remaining.empty? symb = remaining[0] next_reached = delta(reached, symb) # stop it if no reached state break if next_reached.empty? # otherwise, update triplet parsed << remaining.shift reached = next_reached end reached.sort! if sort [parsed, reached, remaining] end end # Same as split, respecting dfa conventions. def dfa_split(input, from=nil) from = initial_state if from.nil? from = ith_state(from) if from.is_a?(Integer) if from.is_a?(Automaton::State) # the three elements of the triplet parsed = [] reached = from remaining = walking_to_modifiable_symbols(input) # walk now until remaining.empty? symb = remaining[0] next_reached = reached.dfa_delta(symb) # stop it if no reached state break if next_reached.nil? # otherwise, update triplet parsed << remaining.shift reached = next_reached end [parsed, reached, remaining] else # the three elements of the triplet parsed = [] reached = walking_to_from(from) remaining = walking_to_modifiable_symbols(input) # walk now until remaining.empty? symb = remaining[0] next_reached = dfa_delta(reached, symb) # stop it if no reached state break if next_reached.nil? or next_reached.empty? # otherwise, update triplet parsed << remaining.shift reached = next_reached end [parsed, walking_to_dfa_result(reached, from), remaining] end end # # Walks the automaton with an input string, starting at states _from_, # collects the set of all reached states and returns it. Unlike split, # returned array is empty if the string is not parsable by the automaton. # This method is epsilon symbol aware. # def reached(input, from=nil) parsed, reached, remaining = split(input, from) remaining.empty? ? reached : [] end # Same as reached, respecting dfa conventions. def dfa_reached(input, from=nil) walking_to_dfa_result(reached(input,from),from) end # # Checks if the automaton is able to parse an input string. Returns true if # at least one state can be reached, false otherwise. Unlike accepts?, the # labeling of the reached state does not count. # def parses?(input, from=nil) not(reached(input,from).empty?) end # # Checks if the automaton accepts an input string. Returns true if at least # one accepting state can be reached, false otherwise. # def accepts?(input, from=nil) not reached(input,from).select{|s| s.accepting? and not s.error?}.empty? end # # Checks if the automaton rejects an input string. Returns true if no # accepting state can be reached, false otherwise. # def rejects?(input, from=nil) not(accepts?(input, from)) end # Returns '1' if the string is accepted by the automaton, # '0' otherwise. def label_of(str) accepts?(str) ? '1' : '0' end ### protected section ######################################################## protected # # Converts an input to a modifiable array of symbols. # # If _input_ is an array, it is simply duplicated. If an InputString, # InputString#symbols is invoked and result is duplicated. If _input_ is a # ruby String, it is split using input.split(' '). Raises an # ArgumentError otherwise. # def walking_to_modifiable_symbols(input) case input when Array input.dup when InputString input.symbols.dup when String ADL::parse_string(input).symbols.dup else raise(ArgumentError, "#{input} cannot be converted to a array of symbols", caller) end end # Implements _from_ conventions. def walking_to_from(from) return initial_states if from.nil? Array===from ? from.collect{|s| to_state(s)} : [to_state(from)] end # Implements _return_ conventions of dfa_xxx methods. def walking_to_dfa_result(result, from) result.compact! # methods are allowed to return [nil] Array===from ? result : (result.empty? ? nil : result[0]) end # Implements _return_ conventions of standard methods that uses dfa_xxx ones. def walking_from_dfa_to_nfa_result(result) Array===result ? result : (result.nil? ? [] : [result]) end end # end Walking include Stamina::Automaton::Walking include Stamina::Classifier end # class Automaton end # end Stamina