## Ruby Tika Parser

### Introduction

This gem is a simple Ruby frontend to the command-line jar (app) of the Java Tika parser.

It is the same as running: 

    java -server -Djava.awt.headless=true -Dfile.encoding=UTF-8 -jar tika-app-1.24.1.jar FileToParse.pdf

with options such as `--xml` or `--text`.
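
As a rough illustration of what such a frontend wraps, the command above could be assembled and run from Ruby with the standard `Open3` module. This is only a sketch, not the gem's actual implementation: the helper names `tika_command` and `run_tika`, the default jar path, and the error handling are all assumptions.

```ruby
require 'open3'

# Hypothetical helper: builds the argument list for the command shown above.
# The jar path is an assumption; point it at wherever your tika-app jar lives.
def tika_command(file, option: '--text', jar: 'tika-app-1.24.1.jar')
  [
    'java', '-server',
    '-Djava.awt.headless=true',
    '-Dfile.encoding=UTF-8',
    '-jar', jar,
    option, file
  ]
end

# Hypothetical helper: runs the command and returns what Tika printed to stdout.
def run_tika(file, option: '--text', jar: 'tika-app-1.24.1.jar')
  stdout, stderr, status = Open3.capture3(*tika_command(file, option: option, jar: jar))
  raise "Tika exited with status #{status.exitstatus}: #{stderr}" unless status.success?

  stdout
end
```

Passing the arguments as an array avoids shell interpolation, so file names with spaces are handled safely. With Java and the jar available, `run_tika('FileToParse.pdf', option: '--xml')` would return the XML output as a string.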

### Installation

To install, add ruby_tika_app to your _Gemfile_ and run `bundle install`:

    gem 'ruby_tika_app'


### Note about installation

RubyTikaApp is a fairly large gem because it bundles the Tika app jar file,
so it might take a while to install.

### Usage

First, you need Java installed and available on your `$PATH`.

Then:

```ruby
require 'ruby_tika_app'

rta = RubyTikaApp.new("sample_file.pdf")

puts rta.to_xml # <xml output>

# You also get to_json, to_text, to_text_main, and to_metadata

```
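
Since the gem depends on a `java` executable being on `$PATH`, it can be handy to verify that up front and fail with a clear message. The sketch below is one way to do it; `executable_on_path?` is a hypothetical helper, not part of the gem, and it assumes the executable is named exactly `java` (on Windows you would look for `java.exe`).

```ruby
# Hypothetical helper: returns true if an executable called `name` exists
# in one of the directories listed in `path` (defaults to the current $PATH).
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end
```

Calling `executable_on_path?('java')` before `RubyTikaApp.new` lets you raise a readable error instead of a failure from the underlying process.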

### Testing

Run:

    bundle exec rspec spec/

*NOTE*: Because an underlying Java library makes the connections to external
URLs, we can't use a standard mocking library. Instead, the test suite starts
a Rack-based web server.
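
To give a feel for what a Rack-based fixture server involves, here is a minimal sketch. It is not the gem's actual spec helper: the method name `fixture_app`, the `/sample_file.txt` path, and the response bodies are all assumptions chosen for illustration.

```ruby
# Hypothetical fixture server: a Rack application is any object that responds
# to `call(env)` and returns a [status, headers, body] triple.
def fixture_app
  lambda do |env|
    if env['PATH_INFO'] == '/sample_file.txt'
      [200, { 'content-type' => 'text/plain' }, ["Hello from the fixture server\n"]]
    else
      [404, { 'content-type' => 'text/plain' }, ["Not found\n"]]
    end
  end
end
```

Mounted in any Rack-compatible server during the spec run, an app like this gives the Java process a real local URL to fetch, which an in-process Ruby stub cannot provide.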

### Contributing

Fork the repository on GitHub and, once you've committed tested patches, send a pull request.
