# PDFBox text extraction

This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the [Apache PDFBox](https://pdfbox.apache.org/) library.

## Installation

Add this line to your application's Gemfile:

    gem 'pdfbox_text_extraction'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install pdfbox_text_extraction

## Usage

To extract all text on every page:

    extracted_text = PdfboxTextExtraction.run(path_to_pdf)
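
If you need to process several files, the call above can be wrapped in a small helper. The following is only a sketch: `extract_all` is a hypothetical name that is not part of this gem, and it takes the extractor as an argument so the helper does not depend on JRuby being present.

```ruby
# Hypothetical helper (not part of the gem): run an extractor over several
# PDF paths and collect the results in a hash of { path => extracted text }.
#
# `extractor` is any object that responds to `call(path)` and returns a String.
# Under JRuby with the gem loaded, PdfboxTextExtraction.method(:run) should fit.
def extract_all(pdf_paths, extractor)
  pdf_paths.each_with_object({}) do |path, results|
    results[path] = extractor.call(path)
  end
end
```

Under JRuby with the gem loaded, this could be called as `extract_all(paths, PdfboxTextExtraction.method(:run))`.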

To extract text inside a crop area:

    extracted_text = PdfboxTextExtraction.run(
      path_to_pdf,
      {
        crop_x: 0, # crop area top left corner x-coordinate
        crop_y: 1.0, # crop area top left corner y-coordinate
        crop_width: 8.5, # crop area width
        crop_height: 9.4, # crop area height
      }
    )
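
The options hash above can also be built with a small helper that checks the values before they are passed on. This is a sketch under stated assumptions: `crop_options` is a hypothetical name, and the checks (numeric values, positive width and height) are illustrative rather than something the gem is known to enforce.

```ruby
# Hypothetical helper (not part of the gem): build the crop options hash
# using the four keys shown in the usage example above.
def crop_options(x:, y:, width:, height:)
  values = { crop_x: x, crop_y: y, crop_width: width, crop_height: height }

  # Assumption: all four values should be numbers.
  values.each do |key, value|
    raise ArgumentError, "#{key} must be numeric" unless value.is_a?(Numeric)
  end

  # Assumption: a crop area with zero or negative size is not meaningful.
  unless width.positive? && height.positive?
    raise ArgumentError, "crop_width and crop_height must be positive"
  end

  values
end
```

The returned hash can then be passed as the second argument, as in `PdfboxTextExtraction.run(path_to_pdf, crop_options(x: 0, y: 1.0, width: 8.5, height: 9.4))`.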

## Contributing

1. Fork it ( https://github.com/jhund/pdfbox_text_extraction/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

### Resources

* [Source code (github)](https://github.com/jhund/pdfbox_text_extraction)
* [Issues](https://github.com/jhund/pdfbox_text_extraction/issues)
* [Rubygems.org](http://rubygems.org/gems/pdfbox_text_extraction)

### License

[MIT licensed](https://github.com/jhund/pdfbox_text_extraction/blob/master/LICENSE.txt).

### Copyright

Copyright (c) 2016 Jo Hund. See [(MIT) LICENSE](https://github.com/jhund/pdfbox_text_extraction/blob/master/LICENSE.txt) for details.
