README in ruby-ole-1.2.7 vs README in ruby-ole-1.2.8

- old
+ new

@@ -1,27 +1,110 @@ = Introduction -For now, see the docs for the Ole::Storage class. +The ruby-ole library provides a variety of functions primarily for +working with OLE2 structured storage files, such as those produced by +Microsoft Office - eg *.doc, *.msg etc. += Example Usage + +Here are some examples of how to use the library functionality, +categorised roughly by purpose. + +1. Reading and writing files within an OLE container + + The recommended way to manipulate the contents is via the + "file_system" API, whereby you use Ole::Storage instance methods + similar to the regular File and Dir class methods. + + ole = Ole::Storage.open('oleWithDirs.ole', 'rb+') + p ole.dir.entries('.') # => [".", "..", "dir1", "dir2", "file1"] + p ole.file.read('file1')[0, 25] # => "this is the entry 'file1'" + ole.dir.mkdir('newdir') + +2. Accessing OLE meta data + + Some convenience functions are provided for (currently read only) + access to OLE property sets and other sources of meta data. + + ole = Ole::Storage.open('test_word_95.doc') + p ole.meta_data.file_format # => "MSWordDoc" + p ole.meta_data.mime_type # => "application/msword" + p ole.meta_data.doc_author.split.first # => "Charles" + +3. Raw access to underlying OLE internals + + This is probably of little interest to most developers using the + library, but for some use cases you may need to drop down to the + lower level API on which the "file_system" API is constructed, + which exposes more of the format details. + + <tt>Ole::Storage</tt> files can have multiple files with the same name, + or with a slash in the name, and other things that are probably + strictly invalid. This API is the only way to access those files. + + You can access the header object directly: + + p ole.header.num_sbat # => 1 + p ole.header.magic.unpack('H*') # => ["d0cf11e0a1b11ae1"] + + You can directly access the array of all Dirent objects, + including the root: + + p ole.dirents.length # => 5 + puts ole.root.to_tree + # => + - #<Dirent:"Root Entry"> + |- #<Dirent:"\001Ole" size=20 data="\001\000\000\002\000..."> + |- #<Dirent:"\001CompObj" size=98 data="\001\000\376\377\003..."> + |- #<Dirent:"WordDocument" size=2574 data="\334\245e\000-..."> + \- #<Dirent:"\005SummaryInformation" size=54788 data="\376\377\000\000\001..."> + + You can access (through RangesIO methods, or by using the + relevant Dirent and AllocationTable methods) information like where within + the container a stream is located (these are offset/length pairs): + + p ole.root["\001CompObj"].open { |io| io.ranges } # => [[0, 64], [64, 34]] + +See the documentation for each class for more details. + += Thanks + +* The code contained in this project was initially based on chicago's libole + (source available at http://prdownloads.sf.net/chicago/ole.tgz). + +* It was later augmented with some corrections by inspecting pole, and (purely + for header definitions) gsf. + +* The property set parsing code came from the apache java project POIFS. + +* The excellent idea for using a pseudo file system style interface by providing + #file and #dir methods which mimic File and Dir, was borrowed (along with almost + unchanged tests!) from Thomas Sondergaard's rubyzip. + = TODO -== 1.2.8 +== 1.2.9 -* fix property sets a bit more. see TODO in Ole::Storage::MetaData +* add buffering to rangesio so that performance for small reads and writes + isn't so awful. maybe try and remove the bottlenecks of unbuffered first + with more profiling, then implement the buffering on top of that. * fix mode strings - like truncate when using 'w+', supporting append 'a+' modes etc. done? * make ranges io obey readable vs writeable modes. * more RangesIO completion. ie, doesn't support #<< at the moment. -* ability to zero out padding and unused blocks -* case insensitive mode for ole/file_system? == 1.3.1 -* fix this README :). maybe move todo out, and put something useful here. +* fix property sets a bit more. see TODO in Ole::Storage::MetaData +* ability to zero out padding and unused blocks +* case insensitive mode for ole/file_system? +* better tests for mbat support. +* further doc cleanup == Longer term * more benchmarking, profiling, and speed fixes. was thinking vs other ruby filesystems (eg, vs File/Dir itself, and vs rubyzip), and vs other ole implementations (maybe perl's, and poifs) just to check its in the ballpark, with no remaining silly bottlenecks. * supposedly vba does something weird to ole files. test that. +