= rTika

A JRuby wrapper around the excellent Apache Tika content extraction library. Feed rTika your files and get the extracted text and metadata in return.

== Usage

Make sure you're running on JRuby first.

  require 'rubygems'
  require 'rtika'

  result = RTika::FileParser.parse("mywordfile.doc")
  puts result.content
  puts result.title
  puts result.author

  result = RTika::StringParser.parse("<html><body>this is my very ... long ... string</body></html>")
  puts result.content
  puts result.title

== Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, but do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull.)
* Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 Pradeep Elankumaran. See LICENSE for details.
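The Usage section asks you to make sure you're on JRuby before requiring rtika, since it wraps a Java library. Below is a minimal sketch of such a check, assuming you want a clear error instead of a confusing load failure on other Ruby engines. The helper names `jruby?` and `ensure_jruby!` are hypothetical and are not part of rtika. The check relies only on the standard `RUBY_ENGINE` constant, which is `"jruby"` under JRuby.

```ruby
# Hypothetical helper (not part of rtika): true when running under JRuby.
# RUBY_ENGINE may be missing on very old MRI 1.8 builds, so guard with defined?.
def jruby?
  defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" ? true : false
end

# Hypothetical helper (not part of rtika): raise a descriptive LoadError
# before `require 'rtika'` is attempted on a non-JRuby interpreter.
def ensure_jruby!
  return if jruby?
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
  raise LoadError, "rtika needs JRuby; current engine is #{engine} #{RUBY_VERSION}"
end
```

In a script you would call `ensure_jruby!` first and only then `require 'rtika'`, so a run under MRI stops with a readable message.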