README.md in ocr-file-0.0.1 vs README.md in ocr-file-0.0.2
- old
+ new
@@ -40,13 +40,12 @@
# Cloud-Vision OCR
image_annotator: nil, # Needed for Cloud-Vision
type_of_ocr: OcrFile::OcrEngines::CloudVision::DOCUMENT_TEXT_DETECTION,
ocr_engine: 'tesseract', # 'cloud-vision'
# Image Pre-Processing
- image_pre_preprocess: true,
- effects: ['bw', 'norm'],
- threshold: 0.25,
+ image_preprocess: true,
+ effects: ['despeckle', 'deskew', 'enhance', 'sharpen', 'bw'], # Applies effects as listed. 'norm' is also available
# PDF to Image Processing
optimise_pdf: true,
extract_pdf_images: true, # if false will screenshot each PDF page
temp_filename_prefix: 'image',
# Console Output
@@ -82,27 +81,44 @@
```
### Notes / Tips
Set `extract_pdf_images` to `false` for higher-quality OCR. However, this consumes more temporary space per PDF page and is considerably slower.
-Image pre-processing is not yet implemented.
+Image pre-processing currently only thresholds the image (`bw`), normalises the colour space, removes speckles and tries to straighten the image. The end result is black and white, but the OCR is far more accurate (PDFs). The order of operations matters, but steps can be removed when necessary.
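As a rough illustration of removing steps while keeping the order of the rest, here is a minimal sketch. The `preprocess_config` helper is hypothetical and not part of `ocr-file`; only the `image_preprocess` and `effects` keys and the effect names come from the example config above.

```ruby
# Hypothetical helper, not part of ocr-file: builds the pre-processing part
# of the config hash while leaving out the effects named in `skip`.
def preprocess_config(skip: [])
  # Default order as listed in the README's example config.
  effects = ['despeckle', 'deskew', 'enhance', 'sharpen', 'bw']
  {
    image_preprocess: true,
    # Array#- keeps the remaining effects in their original order.
    effects: effects - skip
  }
end

config = preprocess_config(skip: ['despeckle', 'sharpen'])
# config[:effects] is now ['deskew', 'enhance', 'bw']
```

The resulting hash would presumably be merged into the full config; how the gem reacts to unknown effect names is not covered here.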
+### Simple CLI
+Once installed, you can use `ocr-file` as a CLI. It currently supports a reduced set of options, which are subject to change in future versions.
+
+```
+# Basic Usage with console output
+ocr-file input_file_path output_folder_path
+
+# Output to PDF
+ocr-file input_file_path output_folder_path pdf
+
+# Output to TXT
+ocr-file input_file_path output_folder_path txt
+```
+
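If you want to drive the CLI from a Ruby script, one possible approach is to build the argument list and pass it to `Kernel#system` in its multi-argument form. The `ocr_file_command` helper below is hypothetical and not part of the gem; it assumes only the three invocation shapes shown above (no format, `pdf`, `txt`).

```ruby
# Hypothetical helper, not part of ocr-file: builds the argument list for
# the three CLI invocations documented above.
def ocr_file_command(input_path, output_folder, format = nil)
  allowed = [nil, 'pdf', 'txt']
  unless allowed.include?(format)
    raise ArgumentError, "unsupported format: #{format.inspect}"
  end

  # compact drops the nil format for the console-output form.
  ['ocr-file', input_path, output_folder, format].compact
end

cmd = ocr_file_command('scan.pdf', 'out', 'txt')
# cmd is ['ocr-file', 'scan.pdf', 'out', 'txt']

# Multi-argument system skips the shell, so each path is passed as one
# argument. Uncomment once the gem is installed:
# system(*cmd)
```

Skipping the shell avoids quoting problems on the calling side only; whether the gem itself copes with spaces in file names is a separate matter.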
## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
### TODOs
- input validation
-- CLI
+- Better CLI
- image processing
- password
- Base64 encoding
- requirements checking (installed dependencies etc ...)
- Tests
- Configurable temp folder cleanup
- Improve console output
+- Fix spaces in file names
+- Better verbosity
+- Timing
### Tests
To run tests execute:
$ rake test