Sha256: f786d4fcf627b861dc0d4fb0d2a3399a06fe60fb4ec01ea19ac935c4533911ef

Contents?: true

Size: 878 Bytes

Versions: 5

Compression:

Stored size: 878 Bytes

Contents

module Jkl
  module Text
    class << self

      def sanitize(text)
        remove_short_lines(strip_all_tags(remove_script_tags(text)))
      end
      alias :clean :sanitize

      def strip_all_tags(text)
        text.gsub(/<\/?[^>]*>/, "")
      end

      # Despite the name, this removes every line break, not only blank lines.
      def remove_blank_lines(text)
        text.gsub(/[\r\n]+/, "")
      end

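      def remove_html_comments(text)
        # /m lets "." match line breaks, so multi-line comments are removed
        # without the backtracking-prone (.|\s) alternation
        text.gsub(/<!--.*?-->/m, "")
      end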

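      def remove_script_tags(text)
        text = remove_html_comments(text)
        # non-greedy multiline match, so script bodies containing ">" or
        # line breaks are removed along with their tags
        text.gsub(/<\s*script\b[^>]*>.*?<\s*\/\s*script\s*>/im, "")
      end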

      def remove_short_lines(text)
        text = text.gsub(/\s\s/, "\n")
        str = ""
        # remove short lines - usually just navigation
        text.split("\n").each do |l|
          str << l unless l.count(" ") < 5
        end
        str
      end
      
    end
  end
end
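
Usage sketch (not part of the gem): the snippet below illustrates how the sanitize pipeline above behaves on a small HTML fragment. So that it runs without installing jakal, it restates the three steps in a hypothetical stand-in module named TextSketch; the module name, the sample HTML, and the multiline script pattern are assumptions made for illustration, not the gem's own code.

```ruby
# Hypothetical stand-in for Jkl::Text so the example runs on its own.
module TextSketch
  module_function

  # Drop HTML comments, then <script> elements together with their bodies.
  def remove_script_tags(text)
    text = text.gsub(/<!--.*?-->/m, "")
    text.gsub(/<script\b[^>]*>.*?<\/script\s*>/im, "")
  end

  # Drop remaining opening and closing tags, keeping the text between them.
  def strip_all_tags(text)
    text.gsub(/<\/?[^>]*>/, "")
  end

  # Break the text at pairs of whitespace characters, then keep only the
  # chunks that contain at least five spaces (short chunks tend to be menus).
  def remove_short_lines(text)
    text.gsub(/\s\s/, "\n").split("\n").reject { |line| line.count(" ") < 5 }.join
  end

  # Same order as the listing: scripts and comments, then tags, then short lines.
  def sanitize(text)
    remove_short_lines(strip_all_tags(remove_script_tags(text)))
  end
end

html = <<~HTML
  <ul><li>Home</li><li>About us</li></ul>
  <script>if (a > b) { track(); }</script>
  <!-- tracking pixel -->
  <p>The quick brown fox jumps over the lazy dog near the river bank.</p>
HTML

puts TextSketch.sanitize(html)
```

In this sketch the navigation list collapses to a chunk with a single space and is dropped, while the paragraph has more than five spaces and is kept. Kept chunks are concatenated without a separator, which mirrors the listing's behavior.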

Version data entries

5 entries across 5 versions & 1 rubygem

Version Path
jakal-0.1.92 lib/jkl/text_client.rb
jakal-0.1.91 lib/jkl/text_client.rb
jakal-0.1.9 lib/jkl/text_client.rb
jakal-0.1.8 lib/jkl/text_client.rb
jakal-0.1.7 lib/jkl/text_client.rb