README.md in ocr-file-0.0.4 vs README.md in ocr-file-0.0.6

- old
+ new

@@ -47,10 +47,11 @@
  automatic_reprocess: true, # Will possibly do double + the operations but can produce better results automatically
  # PDF to Image Processing
  optimise_pdf: true,
  extract_pdf_images: true, # if false will screenshot each PDF page
  temp_filename_prefix: 'image',
+ spelling_correction: true, # Will attempt to fix text at the end (not used for searchable pdf output)
  # Console Output
  verbose: true,
  timing: true,
  }
@@ -74,10 +75,11 @@
  )
  doc.to_pdf
  # How to merge files into a single PDF:
+ # The files can be images or other PDFs
  filepaths = []
  documents = file_paths.map { |path| OcrFile::ImageEngines::PdfEngine.open_pdf(path, password: '') }
  merged_document = OcrFile::ImageEngines::PdfEngine.merge(documents)
  OcrFile::ImageEngines::PdfEngine.save_pdf(merged_document, save_file_path, optimise: true)
  ```
@@ -118,9 +120,14 @@
 - Tests
 - Configurable temp folder cleanup
 - Improve console output
 - Fix spaces in file names
 - Better verbosity
+- Docker
+- pdftk / pdf merge for text and bookmarks etc ...
+  - https://github.com/tesseract-ocr/tesseract/issues/660
+  - tesseract -c naked_pdf=true
+-
 ### Tests
 To run tests execute:
 $ rake test
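
Note on the merge snippet in the `@@ -74,10 +75,11 @@` hunk: it assigns `filepaths` but then maps over `file_paths`, so it would raise a `NameError` if run exactly as written in either version. The sketch below shows one way the same three calls could be wrapped with a single consistent name. The helper `merge_to_pdf` and its `engine:` parameter are hypothetical and not part of ocr-file; only `open_pdf`, `merge`, and `save_pdf` with the arguments shown come from the README text in the diff.

```ruby
# Hypothetical helper (not part of the ocr-file gem): wraps the three
# PdfEngine calls shown in the README so one variable name is used throughout.
# `engine` is assumed to respond to open_pdf, merge and save_pdf with the
# arguments that appear in the README snippet.
def merge_to_pdf(file_paths, save_file_path, engine:, password: '')
  # Open every input file as a document.
  documents = file_paths.map { |path| engine.open_pdf(path, password: password) }
  # Combine the opened documents into one.
  merged_document = engine.merge(documents)
  # Write the merged document to disk with optimisation enabled.
  engine.save_pdf(merged_document, save_file_path, optimise: true)
end

# Usage with the gem loaded (the paths are placeholders):
#   require 'ocr-file'
#   merge_to_pdf(['scan.png', 'report.pdf'], 'merged.pdf',
#                engine: OcrFile::ImageEngines::PdfEngine)
```

Passing the engine in as an argument keeps the sketch independent of the gem's load path and makes it easy to exercise with a stand-in object.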