= rTika

A JRuby wrapper around the excellent Apache Tika content extraction library. Feed rTika your files and get extracted text and metadata in return.

== Usage

Make sure you're running on JRuby first.

  require 'rubygems'
  require 'rtika'

  result = RTika::FileParser.parse("mywordfile.doc")
  puts result.content  # prints the document's contents
  puts result.title    # prints the title from the document's metadata
  puts result.author   # prints the author from the document's metadata

  result = RTika::StringParser.parse("<html>
    <head><title>MYTITLE</title></head>
    <body>this is my very ... long ... string</body></html>")
  puts result.content  # prints the <body> contents
  puts result.title    # prints the <title> contents

== Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a future version unintentionally.
* Commit, but do not modify the Rakefile, version, or history. (If you want your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
* Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 Pradeep Elankumaran. See LICENSE for details.
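== Batch extraction sketch

To extract text from many files, one option is to loop over them and call RTika::FileParser.parse on each one. The sketch below relies only on the three readers the usage example shows (content, title, author); the helper name extract_all and the hash layout it returns are hypothetical, not part of rTika. The parser is passed in as an argument, so the loop itself can be exercised without JRuby or Tika installed.

```ruby
# Hypothetical helper (not part of rTika): parse each path with the given
# parser and collect the extracted fields into a Hash keyed by path.
# `parser` is anything that responds to `parse(path)` and returns an object
# with `content`, `title` and `author` readers, such as RTika::FileParser.
def extract_all(paths, parser)
  paths.each_with_object({}) do |path, results|
    result = parser.parse(path)
    results[path] = {
      content: result.content,
      title: result.title,
      author: result.author
    }
  end
end

# Assumed usage under JRuby with the rtika gem installed:
#   require 'rubygems'
#   require 'rtika'
#   docs = extract_all(Dir.glob("docs/*.doc"), RTika::FileParser)
#   docs.each { |path, doc| puts "#{path}: #{doc[:title]}" }
```

Injecting the parser rather than hard-coding RTika::FileParser keeps the helper testable with a stub object and would let you swap in another parser that exposes the same readers.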