Class: CodeZauker::Util
- Inherits: Object (Object > CodeZauker::Util)
- Defined in: lib/code_zauker.rb
Overview
Basic utility class
Instance Method Summary
- (Object) ensureUTF8(untrusted_string)
  Ensure data are correctly imported. See blog.grayproductions.net/articles/ruby_19s_string. This code tries to "guess" the right encoding, switching to ISO-8859-1 if the string is not valid UTF-8.
- (Object) get_lines(filename)
  Obtain the lines of a file. It also works with PDF files.
- (Boolean) is_pdf?(filename)
- (Object) mixCase(trigram)
  Compute all possible case-mixed variants of a trigram. It works for strings of any length. TODO: very bad implementation, needs improvement.
Instance Method Details
- (Object) ensureUTF8(untrusted_string)
Ensure data are correctly imported.
See blog.grayproductions.net/articles/ruby_19s_string. This code tries to "guess" the right encoding, switching to ISO-8859-1 if the string is not valid UTF-8. Typical use case: Italian source code wrongly interpreted as UTF-8 when it is actually an ISO-8859-1 file written on Windows.
    # File 'lib/code_zauker.rb', line 57
    def ensureUTF8(untrusted_string)
      # Already valid in its declared encoding: return it unchanged.
      return untrusted_string if untrusted_string.valid_encoding?

      # Otherwise try ISO-8859-1, typical of files written on Windows.
      # Note: force_encoding relabels the caller's string in place.
      untrusted_string.force_encoding("ISO-8859-1")
      # Options are passed as keywords; a positional Hash is rejected by Ruby 3.
      untrusted_string.encode("UTF-8", undef: :replace, invalid: :replace)
    end
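The fallback described above can be tried without loading the library. The following is a minimal standalone sketch, not the library's own code: `to_utf8_sketch` is a hypothetical helper that mirrors the same logic, and it works on a copy so the input string is left untouched (an assumption made for the example).

```ruby
# Hypothetical standalone helper mirroring the ISO-8859-1 fallback.
def to_utf8_sketch(untrusted_string)
  return untrusted_string if untrusted_string.valid_encoding?

  # Work on a copy, relabel the bytes as ISO-8859-1, then transcode to UTF-8.
  untrusted_string.dup
                  .force_encoding("ISO-8859-1")
                  .encode("UTF-8", undef: :replace, invalid: :replace)
end

# The Italian word "perche" with an accented final e, stored as a single
# ISO-8859-1 byte (0xE9) but labelled as UTF-8.
mislabelled = "perch\xE9".dup.force_encoding("UTF-8")
puts mislabelled.valid_encoding?   # => false

fixed = to_utf8_sketch(mislabelled)
puts fixed.valid_encoding?
puts fixed.encoding
```

Because every byte value is defined in ISO-8859-1, the transcoding step always succeeds. A wrong guess therefore shows up as misread characters, not as an exception.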
- (Object) get_lines(filename)
Obtain the lines of a file. It also works with PDF files.
    # File 'lib/code_zauker.rb', line 82
    # The PDF branch needs the pdf-reader gem (require 'pdf-reader').
    def get_lines(filename)
      lines = []
      if is_pdf?(filename)
        # PDF: extract the text of each page and split it into stripped lines.
        File.open(filename, "rb") do |io|
          reader = PDF::Reader.new(io)
          reader.pages.each do |page|
            page.text.split("\n").each { |l| lines.push(l.strip) }
          end
        end
      else
        # Any other file: read it line by line (line endings are kept).
        File.open(filename, "r") { |f| lines = f.readlines }
      end
      lines
    end
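The two branches return lines in slightly different shapes: PDF lines are stripped, while plain-text lines keep their trailing newline. The sketch below illustrates the plain-text branch only, because the PDF branch depends on the pdf-reader gem. `plain_lines_sketch` is a hypothetical helper, and the temporary file exists only for the demonstration.

```ruby
require "tempfile"

# Hypothetical helper reproducing the non-PDF branch of get_lines.
def plain_lines_sketch(filename)
  File.open(filename, "r") { |f| f.readlines }
end

lines = nil
Tempfile.create(["sample", ".txt"]) do |tmp|
  tmp.write("first line\nsecond line\n")
  tmp.flush
  lines = plain_lines_sketch(tmp.path)
end

p lines                 # each element still ends with "\n"
p lines.map(&:strip)    # stripped, matching the shape of the PDF branch
```

A caller that wants uniform output from both branches could apply `strip` (or `chomp`) to the plain-text result, as in the last line.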
- (Boolean) is_pdf?(filename)
    # File 'lib/code_zauker.rb', line 76
    def is_pdf?(filename)
      filename.downcase.end_with?(".pdf")
    end
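The check looks only at the file name, case-insensitively, and never inspects the file contents. A small sketch of that behaviour, using a hypothetical standalone copy named `pdf_name?`:

```ruby
# Hypothetical standalone copy of the extension check.
def pdf_name?(filename)
  filename.downcase.end_with?(".pdf")
end

puts pdf_name?("Report.PDF")      # => true  (case is ignored)
puts pdf_name?("notes.pdf.txt")   # => false (only the final extension counts)
puts pdf_name?("archive.tar.gz")  # => false
```

A file that is a PDF but carries another extension would therefore be read as plain text.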
- (Object) mixCase(trigram)
Compute all possible case-mixed variants of a trigram. It works for strings of any length, producing 2**n variants for a string of n characters. TODO: very bad implementation, needs improvement.
    # File 'lib/code_zauker.rb', line 20
    def mixCase(trigram)
      caseMixedElements = []
      lx = trigram.length
      combos = 2**lx
      startString = trigram.downcase
      (0...combos).each do |c|
        # Binary form of c, zero-padded to lx digits: one bit per character.
        maskForStuff = c.to_s(2).rjust(lx, "0")
        currentMix = ""
        maskForStuff.each_char.with_index do |x, p|
          # "1" upcases the character at position p, "0" keeps it lowercase.
          if x == "1"
            currentMix += startString[p].upcase
          else
            currentMix += startString[p].downcase
          end
        end
        caseMixedElements.push(currentMix)
      end
      caseMixedElements
    end
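As an illustration of what the method returns, and of one possible direction for the TODO, here is a hedged sketch built on `Array#product` in place of the binary mask. `mix_case_sketch` is a hypothetical name, not part of the library. Unlike the listing above, it emits each variant once even when a character has no case (digits, punctuation), and it returns `[""]` for an empty string; both choices are assumptions made for the example.

```ruby
# Hypothetical alternative to mixCase based on a Cartesian product.
def mix_case_sketch(str)
  # Per character: its distinct case variants (a single entry for digits etc.).
  options = str.downcase.chars.map { |ch| [ch, ch.upcase].uniq }
  return [""] if options.empty?

  head, *tail = options
  # Cartesian product of the per-character options, joined back into strings.
  head.product(*tail).map(&:join)
end

p mix_case_sketch("ab")          # => ["ab", "aB", "Ab", "AB"]
p mix_case_sketch("foo").length  # => 8
p mix_case_sketch("a1")          # => ["a1", "A1"]
```

For purely alphabetic input the result has the same elements in the same order as the bitmask version, since both enumerate the lowercase choice before the uppercase one at every position.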