README.md in ocr-file-0.0.3 vs README.md in ocr-file-0.0.4
- old
+ new
@@ -42,16 +42,18 @@
type_of_ocr: OcrFile::OcrEngines::CloudVision::DOCUMENT_TEXT_DETECTION,
ocr_engine: 'tesseract', # 'cloud-vision'
# Image Pre-Processing
image_preprocess: true,
effects: ['despeckle', 'deskew', 'enhance', 'sharpen', 'remove_shadow', 'bw'], # Applies effects as listed. 'norm' is also available
+ automatic_reprocess: true, # Will possibly do double the operations but can produce better results automatically
# PDF to Image Processing
optimise_pdf: true,
extract_pdf_images: true, # if false will screenshot each PDF page
temp_filename_prefix: 'image',
# Console Output
verbose: true,
+ timing: true,
}
doc = OcrFile::Document.new(
original_file_path: '/path-to-original-file/', # supports PDFs and images
save_file_path: '/folder-to-save-to/',
@@ -83,10 +85,12 @@
### Notes / Tips
Set `extract_pdf_images` to `false` for higher-quality OCR. However, this consumes more temporary space per PDF page and is considerably slower.
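Because the options are a plain Ruby hash, one way to try the slower, higher-quality mode is to derive a copy with `Hash#merge` and leave the original untouched. This is a minimal sketch: `config` and `high_quality_config` are hypothetical names, and only keys that appear in the configuration example above are used.

```ruby
# Hypothetical starting options, reusing keys from the README's config example.
config = {
  ocr_engine: 'tesseract',
  image_preprocess: true,
  optimise_pdf: true,
  extract_pdf_images: true, # faster: OCR the images embedded in the PDF
  verbose: true,
}

# Derive a slower, higher-quality variant without mutating the original hash.
# With extract_pdf_images: false, each PDF page is screenshotted instead.
high_quality_config = config.merge(extract_pdf_images: false)

puts high_quality_config[:extract_pdf_images] # prints "false"
puts config[:extract_pdf_images]              # prints "true"
```

Keeping both hashes around makes it easy to compare the two modes on the same file before committing to the extra time and temporary disk space.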
Image pre-processing only thresholds (`bw`), normalises the colour space, removes speckles, removes shadows and tries to straighten the image. This makes the end result black and white, but gives far more accurate OCR (especially for PDFs). The order of operations is important, but steps can be removed when necessary. Expanding the colour dynamic range with `'norm'` is also possible but isn't recommended.
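To illustrate why the order of effects matters, here is a toy sketch that models a greyscale image as an array of pixel intensities and applies effects in the listed order. The `EFFECTS` table and `apply_effects` helper are hypothetical stand-ins, not ocr-file's actual implementation, and `'norm'` appears here only to demonstrate ordering, not as a recommendation.

```ruby
# Toy model: a greyscale "image" is an array of pixel intensities (0..255).
# These lambdas are hypothetical stand-ins, NOT ocr-file's real effects.
EFFECTS = {
  # 'norm': stretch the intensity range so it spans 0..255
  'norm' => lambda { |pixels|
    min, max = pixels.minmax
    if max == min
      pixels.dup # flat image: nothing to stretch
    else
      pixels.map { |px| ((px - min) * 255.0 / (max - min)).round }
    end
  },
  # 'bw': threshold every pixel to pure black (0) or pure white (255)
  'bw' => ->(pixels) { pixels.map { |px| px >= 128 ? 255 : 0 } },
}.freeze

# Apply effects in the order listed, feeding each output into the next effect.
def apply_effects(pixels, effect_names)
  effect_names.reduce(pixels) { |img, name| EFFECTS.fetch(name).call(img) }
end

faded_scan = [60, 80, 100, 120] # a low-contrast scan: every pixel is dark

p apply_effects(faded_scan, ['bw', 'norm']) # prints [0, 0, 0, 0]
p apply_effects(faded_scan, ['norm', 'bw']) # prints [0, 0, 255, 255]
```

Thresholding first throws away all the detail in a faded scan, so nothing later in the chain can recover it; the same effects in the other order keep some contrast.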
+`automatic_reprocess` is much slower, as in some cases it has to redo the operations for each image, but it selects the best result for each page.
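The "redo some pages and keep the best result" idea can be sketched as follows. This is a hypothetical illustration, not ocr-file's implementation: `PageResult`, `best_result`, the confidence score and the `threshold` of 80 are all assumptions, and the two OCR passes are stubbed with lambdas so the sketch runs without any OCR engine installed.

```ruby
# Hypothetical per-page result: recognised text plus a confidence score (0..100).
PageResult = Struct.new(:text, :confidence)

# Run the first pass; only when it looks weak, run a second pass and keep
# whichever result scored higher. The threshold is an assumed value.
def best_result(page, first_pass:, second_pass:, threshold: 80)
  first = first_pass.call(page)
  return first if first.confidence >= threshold # good enough: skip the extra work

  second = second_pass.call(page) # the "double the operations" case
  [first, second].max_by(&:confidence)
end

# Stub passes standing in for OCR on the raw image and on a reprocessed image.
raw_ocr = ->(page) { PageResult.new("raw #{page}", page == 1 ? 92 : 55) }
processed_ocr = ->(page) { PageResult.new("processed #{page}", 88) }

results = [1, 2].map do |page|
  best_result(page, first_pass: raw_ocr, second_pass: processed_ocr)
end

results.each { |r| puts "#{r.text} (#{r.confidence})" }
# prints:
# raw 1 (92)
# processed 2 (88)
```

Page 1 is accepted after one pass, while page 2 costs two passes; this is why the extra time varies from document to document.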
+
### Simple CLI
Once installed, you can use `ocr-file` as a CLI. It currently supports a reduced set of options, which are subject to change in future versions.
```
# Basic Usage with console output
@@ -106,19 +110,17 @@
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
### TODOs
- input validation
- Better CLI
-- image processing
- password
- Base64 encoding
- requirements checking (installed dependencies etc ...)
- Tests
- Configurable temp folder cleanup
- Improve console output
- Fix spaces in file names
- Better verbosity
-- Timing
### Tests
To run tests execute:
$ rake test