# = HTMLTokenizer
#
# Author::    Ben Giddings (mailto:bg-rubyforge@infofiend.com)
# Copyright:: Copyright (c) 2004 Ben Giddings
# License::   Distributes under the same terms as Ruby
#
#
# This is a partial port of the functionality behind Perl's TokeParser.
# Provided a page, it progressively returns tokens from that page.
#
# $Id: htmltokenizer.rb,v 1.7 2005/06/07 21:05:53 merc Exp $
#
# A class to tokenize HTML.
#
# Example:
#
#   page = "
#
#
# This is the paragraph, it contains
# links,
# . Ok, here is some more text and
# another link.
#