= rTika

A JRuby wrapper around the excellent Apache Tika content extraction library.
Feed rTika your files and get extracted text and metadata in return.

== Usage
rTika runs only on JRuby, because Apache Tika is a Java library, so make sure you are on JRuby first.

  require 'rubygems'
  require 'rtika'

  # Parse a file on disk and print its extracted text and metadata.
  result = RTika::FileParser.parse("mywordfile.doc")
  puts result.content
  puts result.title
  puts result.author

  # Parse markup that is already in memory as a string.
  result = RTika::StringParser.parse("<html><body>this is my very ... long ... string</body></html>")
  puts result.content
  puts result.title
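
If you want to hand the extracted fields to other code, one option is to fold a result into a plain Hash. The sketch below is only an illustration and not part of rTika: +summarize+, +blank_to_nil+ and +FakeResult+ are hypothetical names, and +FakeResult+ is a stand-in so the example runs without JRuby or the gem. It assumes only that a result responds to +content+, +title+ and +author+, as in the usage above.

```ruby
# Hypothetical stand-in for the object returned by RTika's parsers, so this
# sketch runs on any Ruby without JRuby or the rtika gem.
FakeResult = Struct.new(:content, :title, :author)

# Turn nil, empty, or whitespace-only values into nil; otherwise return the
# stripped string.
def blank_to_nil(value)
  str = value.to_s.strip
  str.empty? ? nil : str
end

# Build a Hash from any object that responds to #content, #title and #author.
# The preview is the first preview_length characters of the stripped content.
def summarize(result, preview_length = 40)
  text = result.content.to_s.strip
  {
    :title   => blank_to_nil(result.title),
    :author  => blank_to_nil(result.author),
    :preview => text[0, preview_length],
    :length  => text.length
  }
end

summary = summarize(FakeResult.new("  Hello from a Word file.  ", "Report", ""))
puts summary.inspect
```

Under JRuby with the gem installed, you would presumably pass the object returned by <tt>RTika::FileParser.parse("mywordfile.doc")</tt> in place of +FakeResult+.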

== Note on Patches/Pull Requests
 
* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't unintentionally break it in a
  future version.
* Commit, and do not change the Rakefile, version, or history.
  (If you want your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
* Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 Pradeep Elankumaran. See LICENSE for details.
