= rTika

A JRuby wrapper around the excellent Apache Tika content extraction library. Feed rTika your files and get extracted text and metadata in return.

== Usage

Make sure you're on JRuby first.

  require 'rubygems'
  require 'rtika'

  result = RTika::FileParser.parse("mywordfile.doc")
  puts result.content  # prints out the document's contents
  puts result.title    # fetches the title from the document's metadata
  puts result.author   # fetches the author from the document's metadata

  result = RTika::StringParser.parse("<html> <head><title>MYTITLE</title></head> <body>this is my very ... long ... string</body></html>")
  puts result.content  # returns the <body> contents
  puts result.title    # returns the <title> contents

Options

  :remove_boilerplate => true  # uses the Boilerpipe library that ships with Tika to remove headers & footers

== Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, but do not mess with the rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
* Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 Pradeep Elankumaran. See LICENSE for details.
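The Usage section asks you to make sure you're on JRuby first, because rTika wraps the Java library Apache Tika. One way to enforce that is a load-time guard that fails with a readable message instead of an obscure error from inside `require`. This is a minimal sketch: `jruby?` and `require_rtika!` are hypothetical helper names, not part of rTika, and the check relies on JRuby reporting `RUBY_PLATFORM` as `"java"`.

```ruby
# Hypothetical helpers (not part of rTika): fail fast when rTika is
# loaded outside JRuby.

# JRuby reports RUBY_PLATFORM as "java"; MRI reports an OS/CPU string.
def jruby?
  RUBY_PLATFORM == "java"
end

# Loads rTika, or raises LoadError explaining why it cannot be loaded.
def require_rtika!
  unless jruby?
    raise LoadError,
          "rTika requires JRuby; current platform is #{RUBY_PLATFORM} (Ruby #{RUBY_VERSION})"
  end
  require 'rubygems'
  require 'rtika'
end
```

Call `require_rtika!` in place of the two `require` lines from the Usage section; under JRuby it behaves the same, and elsewhere it stops with a message that names the running platform.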
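The Usage section parses one file at a time with `RTika::FileParser.parse`. For a whole directory, the same calls can be wrapped in a loop. This is a minimal sketch, assuming you want a Hash keyed by file path: `extract_all` is a hypothetical helper, not part of rTika, and it uses only `RTika::FileParser.parse`, `#title`, and `#content` from the Usage section. The parser is injectable so the loop can be exercised without JRuby.

```ruby
# Hypothetical helper (not part of rTika): parse every file matching a glob
# pattern and collect its title and content, keyed by path.
#
# Defaults to RTika::FileParser. Any object whose .parse(path) returns
# something responding to #title and #content can be passed instead.
def extract_all(pattern, parser = nil)
  unless parser
    require 'rubygems'
    require 'rtika'   # only loads under JRuby
    parser = RTika::FileParser
  end

  results = {}
  # Sort so the output order does not depend on the filesystem.
  Dir.glob(pattern).sort.each do |path|
    result = parser.parse(path)
    results[path] = { :title => result.title, :content => result.content }
  end
  results
end
```

Under JRuby, `extract_all("reports/*.doc")` would return one entry per matching file; the directory name here is only an example. Injecting the parser keeps the iteration logic separate from Tika itself, which makes it easy to test with a stub object.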