# Proselytism

Document converter and text and image extractor built on a headless OpenOffice server, pdf_tools and net_pbm.

## Note

This gem was originally written as a Rails 3.2 engine running on Ruby 1.8.7.
It should be framework-agnostic and has been tested on Ubuntu and Mac OS X.

Because it depends on system_timer, it does not work with Ruby 1.9.x.

## Installation

Install the required external libraries:

    # aptitude install netpbm
    # aptitude install xpdf
    # aptitude install libreoffice

Add this line to your application's Gemfile:

    gem 'proselytism', :git => "https://github.com/itkin/proselytism.git"

And then execute:

    $ bundle

Generate the config file and/or an initializer:

    $ rails g proselytism:config
    $ rails g proselytism:initializer

As an engine, Proselytism automatically loads and configures itself from /config/proselytism.yml if that file exists.
You can override these configuration parameters with an initializer. This is especially useful when you want a custom log file:
    
```ruby
#/config/initializers/proselytism.rb
Proselytism.config do |config|
  config.logger = ActiveSupport::BufferedLogger.new(File.join(Rails.root, 'log', 'proselytism.log'))
end
```

## Usage

```ruby
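# Convert a document; the block receives the path of the converted file
Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
  # use or copy converted_file_path here
end

# Extract the text of a document; the block receives the extracted text
Proselytism.extract_text source_file_path do |extracted_text|
  # use extracted_text here
end

# Extract the images of a document; the block receives the image file paths
Proselytism.extract_images source_file_path do |image_files_paths|
  # use or copy image_files_paths here
end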
```
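As a minimal sketch of what such a block might do, the helper below copies a converted file to a permanent location. This matters because the yielded path lives in a temporary folder and should only be used inside the block. `keep_copy` and the `public/exports` directory are hypothetical names chosen for illustration, not part of Proselytism, and the call to `Proselytism.convert` is guarded so the snippet also loads without the gem.

```ruby
require 'fileutils'

# Hypothetical helper (not part of Proselytism): copy a file out of its
# temporary folder into target_dir and return the path of the copy.
def keep_copy(tmp_file_path, target_dir)
  FileUtils.mkdir_p(target_dir)
  kept_path = File.join(target_dir, File.basename(tmp_file_path))
  FileUtils.cp(tmp_file_path, kept_path)
  kept_path
end

# Block form: copy the result while the converted file still exists.
# source_file_path is the same placeholder as in the usage example.
if defined?(Proselytism)
  Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
    keep_copy(converted_file_path, File.join('public', 'exports'))
  end
end
```

The copy keeps the base name of the converted file, so a later conversion of a file with the same name overwrites it.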

Proselytism creates its converted files in temporary folders.
  - If you pass a block to the method, the folders are automatically deleted after the block returns, so use or copy the file content within the block.
  - If you don't pass a block, don't forget to safely remove the temp folder yourself.

```ruby
pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
FileUtils.remove_entry_secure File.dirname(pdf_file_path)
```
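If the code that runs between the conversion and the cleanup raises, the temporary folder is left behind. One way to guard against that, sketched here as an assumption rather than as part of the gem, is to put the removal in an `ensure` clause. `with_temp_cleanup` is a hypothetical helper name, not part of Proselytism.

```ruby
require 'fileutils'

# Hypothetical helper (not part of Proselytism): yield file_path to the
# block, then remove the folder containing it, even if the block raises.
def with_temp_cleanup(file_path)
  yield file_path
ensure
  FileUtils.remove_entry_secure File.dirname(file_path)
end

# Guarded so the snippet also loads without the gem; source_file_path is
# the same placeholder as in the example above.
if defined?(Proselytism)
  pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
  with_temp_cleanup(pdf_file_path) do |path|
    puts File.size(path)
  end
end
```

The helper returns the value of the block, so a result computed from the file can be handed back to the caller before the folder disappears.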
    
## Add your own converter

Add your own converter by extending `Proselytism::Converters::Base`:
  - Your converter is selected and used automatically according to its `form` and `to` extension lists.
  - Add a `perform` method which
    - defines a text command
    - calls `execute`
    - returns the path(s) of the converted file(s)

```ruby
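class MyConverter < Proselytism::Converters::Base
  # Source extensions handled by this converter
  form :ext1, :ext2
  # Target extensions it can produce
  to :ext3, :ext4

  def perform(origin, options={})
    destination = destination_file_path(origin, options)
    # Define the text command, redirecting stderr to stdout
    command = "pdftotext #{origin} #{destination} 2>&1"
    # Run the command
    execute command
    # Return the converted file path
    destination
  end
end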
```
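File paths containing spaces or shell metacharacters would break the interpolated command above. One way to guard against that, sketched here with Ruby's standard `Shellwords` module, is to escape each path before building the command string. `build_command` is a hypothetical helper name, not part of Proselytism.

```ruby
require 'shellwords'

# Hypothetical helper (not part of Proselytism): build a shell command
# string with every path escaped, redirecting stderr to stdout.
def build_command(program, *paths)
  escaped = paths.map { |path| Shellwords.escape(path.to_s) }
  ([program] + escaped).join(' ') + ' 2>&1'
end

puts build_command('pdftotext', 'my report.pdf', 'my report.txt')
# prints: pdftotext my\ report.pdf my\ report.txt 2>&1
```

Inside `perform`, the command could then be built as `build_command('pdftotext', origin, destination)` and passed to `execute` as before.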

## Contributing

1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request