Sha256: a7c82c8375a7ddc5a8c59c13b5a4fb90a354d4ea7361b1ec8b8672c030a77f6d

Contents?: true

Size: 1.03 KB

Versions: 1

Stored size: 1.03 KB

Contents

require 'wordlist/builder'

require 'spidr'

module Wordlist
  module Builders
    class Website < Builder

      # Host to spider
      attr_accessor :host

      #
      # Creates a new Website builder object with the specified _path_
      # and _host_. If a _block_ is given, it will be passed the newly
      # created Website builder object.
      #
      def initialize(path,host,&block)
        @host = host

        super(path,&block)
      end

      #
      # Builds the word-list file by spidering the +host+ and parsing the
      # inner-text from all HTML pages. If a _block_ is given, it will be
      # called before all HTML pages on the +host+ have been parsed.
      #
      def build!(&block)
        super(&block)

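        # Spider every page reachable on the host.
        Spidr.host(@host) do |spidr|
          spidr.every_page do |page|
            # Only HTML pages carry text worth parsing.
            if page.html?
              # Feed the inner text of headings, paragraphs and spans to the parser.
              page.doc.search('//h1|//h2|//h3|//h4|//h5|//p|//span').each do |element|
                parse(element.inner_text)
              end
            end
          end
        end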
      end

    end
  end
end
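
The builder above hands the inner text of each matched element to `parse`, which is inherited from `Builder` and is not shown in this file. As a rough illustration of what that step might look like, here is a minimal sketch using only the Ruby standard library. `SketchBuilder`, its `parse` method, and the sample strings are hypothetical stand-ins, not the wordlist gem's actual implementation; the sketch assumes words are lowercased and written once each.

```ruby
require 'set'
require 'stringio'

# Hypothetical stand-in for Wordlist::Builder. The real class's internals
# are not shown in this file, so the behaviour here is an assumption.
class SketchBuilder
  attr_reader :words

  def initialize(io)
    @io    = io        # destination for the word list
    @seen  = Set.new   # words already written
    @words = []        # words in first-seen order
  end

  # Splits text into lowercase alphabetic words and writes each
  # previously unseen word to the output, one per line.
  def parse(text)
    text.scan(/[[:alpha:]]+/).each do |raw|
      word = raw.downcase
      next unless @seen.add?(word) # add? returns nil if already present

      @words << word
      @io.puts(word)
    end
    self
  end
end

# Stand-ins for the inner text of elements matched by the XPath query.
inner_texts = ['Welcome to Example', 'An example page', 'welcome back']

output  = StringIO.new
builder = SketchBuilder.new(output)
inner_texts.each { |text| builder.parse(text) }
```

With the real gem, the equivalent flow would presumably be to construct `Wordlist::Builders::Website` with an output path and a host and then call `build!`; that requires network access and the `spidr` gem, so it is left out of the sketch.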

Version data entries

1 entry across 1 version and 1 rubygem

Version Path
wordlist-0.1.0 lib/wordlist/builders/website.rb