#--
# ===============================================================================
# Copyright (c) 2005, Christopher Kleckner
# All rights reserved
#
# This file is part of the Rio library for ruby.
#
# Rio is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Rio is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Rio; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
# ===============================================================================
#++
#
# To create the documentation for Rio run the command
#  rake rdoc
# from the distribution directory. Then point your browser at the 'doc/rdoc' directory.
#
# Suggested Reading
# * RIO::Doc::SYNOPSIS
# * RIO::Doc::INTRO
# * RIO::Doc::HOWTO
# * RIO::Rio
#
# Rio is pre-alpha software.
# The documented interface and behavior are subject to change without notice.

module RIO
module Doc
=begin rdoc

The following examples are provided without comment:

  array = rio('afile').readlines

  rio('afile') > rio('acopy')

  array = rio('afile').chomp.lines[0...10]

  rio('adir').rename.all.files('*.htm') do |file|
    file.ext = '.html'
  end

A basic familiarity with ruby and shell operations should allow a casual
reader to guess what these examples will do. How they are being performed
may not be what a casual reader might expect. I will explain these examples
to illustrate the Rio basics.

For many more examples please read the HOWTO document and the rdoc
documentation.

== Example 1

  array = rio('afile').readlines

This uses IO#readlines to read the lines of 'afile' into an array.

=== Creating a Rio

Rio extends the module Kernel by adding one function, _rio_, which acts as a
constructor returning a Rio. This constructor builds a description of the
resource the Rio will access (usually a path). It does not open the
resource, check for its existence, or do anything except remember its
specification. _rio_ returns the Rio, which can be chained to a Rio method
as in this example or stored in a variable. This could have been written:

  ario = rio('afile')
  array = ario.readlines

  ario = rio('afile')

In this case the resource specified is a relative path. After the first line
the Rio does not know or care whether it is a path to a file, nor whether it
exists. Rio provides many methods that deal with a resource only at this
level, much as the standard library classes Pathname and URI do.

It should be noted at this point that Rio paths are stored internally as
URLs, as specified in RFC 1738, and therefore use slashes as separators. A
resource can also be specified without separators, because _rio_ interprets
multiple arguments as parts of a path to be joined, and an array as an array
of parts to be joined. So the following all specify the same resource:

  rio('adir/afile')
  rio('adir','afile')
  rio(%w/adir afile/)

The rio constructor can be used to specify non-file-system resources, but
for this example we will restrict our discussion to paths to entities on
file-systems.

  array = ario.readlines

Now that we have a Rio, we can call one of its methods; in this case
_readlines_. This is an example of using a Rio as a proxy for the builtin
IO#readlines. Given the method _readlines_, the Rio opens 'afile' for
reading, calls readlines on the resulting IO object, closes the IO object,
and returns the lines read.

== Example 2

  rio('afile') > rio('acopy')

This copies the file 'afile' into the file 'acopy'.

The first thing that happens here is the creation of the Rios.
As described in Example 1, when created a Rio simply remembers the
specification of its resource: in this case, a relative path 'afile' on the
left and a relative path 'acopy' on the right.

Next the Rio#> (copy-to) method is called on the 'afile' Rio with the
'acopy' Rio as its argument. If that looks like a greater-than operator to
you, think Unix shell: with Rios, '>' is the copy-to operator.

Upon seeing the copy-to operator, the Rio has all the information it needs
to proceed. It determines that it must be opened for reading, that its
argument must be opened for writing, and that its contents must be copied to
the resource referenced by its argument. That is exactly what it does. Then
it closes itself and its argument.

Consider if we had written this example this way:

  afile = rio('afile')
  acopy = rio('acopy')
  afile > acopy

In this case we would still have variables referencing the Rios, and perhaps
we would like to do things a little differently than described above. Be
assured that the selection of mode and the automatic closing of files are
the default behaviour and can be changed. Say we wanted 'afile' to remain
open so that we could rewind it and make a second copy; we might do
something like this:

  afile = rio('afile').nocloseoneof
  afile > rio('acopy1')
  afile.rewind > rio('acopy2')
  afile.close

Actually, the 'thinking process' of the Rio when it sees a copy-to operator
is much more complex than described above. If its argument had been a Rio
referencing a directory, it would not have opened itself for reading, but
instead would have used FileUtils.cp to copy itself; if its argument had
been a string, its contents would have ended up in the string; if its
argument had been an array, its lines would have been elements of that
array; if its argument had been a socket, its contents would have been
copied to the socket. See the documentation for details.

== Example 3

  array = rio('afile').chomp.lines[0...10]

This fills +array+ with the first ten lines of 'afile', with each line
chomped.

The casual observer mentioned above might think that +lines+ returns an
array of lines and that this is a simple rewording of

  array = rio('afile').readlines[0...10]

or even of

  array = File.new('afile').readlines[0...10]

They would be wrong.

+chomp+ is a configuration method which turns on chomp-mode and returns the
Rio. Chomp-mode causes all line-oriented read operations to perform a
String#chomp on each line.

=== Reading files

Rio provides four methods to select which part of the file is read and how
the file is divided. They are +lines+, +records+, +rows+ and +bytes+.
Briefly, +lines+ specifies that the file should be read line by line, and
+bytes(n)+ specifies that the file should be read in _n_-byte chunks. All
four take arguments which can be used to filter lines or chunks in or out.
For simple Rios, +records+ and +rows+ only specify the filter arguments and
are provided for use by extensions. For example, the CSV extension returns an
array of the columns in a line when +records+ is used. In the absence of an
extension, +records+ and +rows+ behave like +lines+.

First let's rewrite our example as:

  array = rio('afile').chomp.lines(0...10).to_a

The arguments to +lines+ specify which records are to be read. Arguments are
interpreted based on their class as follows:

* Range - interpreted as a range of record numbers to be read
* Integer - interpreted as a one-element range
* Regexp - only matching records are processed
* Symbol - sent to each record, which is processed unless the result is false or nil
* Proc - called for each record; the record is processed unless the return value is false or nil

See the documentation for details and examples.

In our example we have specified the Range (0...10). The +lines+ method is
just configuring the Rio; it does not trigger any IO operation.
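
The selection rules listed above can be illustrated with a small plain-Ruby
sketch. It is not Rio's implementation: the helper name +select_records+ is
hypothetical, it accepts a single selector (how Rio combines several
selectors is not covered here), and it works on an in-memory array of
already-chomped lines instead of reading a file.

```ruby
# Hypothetical helper, NOT part of Rio: keeps the records that a single
# selector accepts, following the class-based rules described above.
def select_records(records, selector)
  keep = lambda do |record, index|
    case selector
    when Range   then selector.include?(index)      # range of record numbers
    when Integer then selector == index             # one-element range
    when Regexp  then selector.match?(record)       # only matching records
    when Symbol  then record.public_send(selector)  # truthy result keeps the record
    when Proc    then selector.call(record)         # truthy return keeps the record
    else raise ArgumentError, "unsupported selector: #{selector.inspect}"
    end
  end
  # Pair each record with its record number, filter, then drop the numbers.
  records.each_with_index.select { |record, index| keep.call(record, index) }.map(&:first)
end

lines = ["alpha", "beta", "", "gamma"]
select_records(lines, 0...2)   # => ["alpha", "beta"]
select_records(lines, :empty?) # => [""]
```

Passing 0...10 to this helper would keep records 0 through 9, the same
selection our example makes with +lines+.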

The fact that +lines+ was called, and the arguments it was called with, are
stored away, and the Rio is returned for further configuration or for an
actual IO operation. When an IO operation is called, the Range will be used
to limit processing to the first ten records. For example:

  rio('afile').lines(0...10).each { |line| ... } # block will be called for the first 10 records
  rio('afile').lines(0...10).to_a # the first 10 records will be returned in an array
  rio('afile').lines(0...10) > rio('acopy') # the first 10 records will be copied to 'acopy'

"But wait", you say, "in our original example the range was an argument to
the subscript operator, not to +lines+." This works because the subscript
operator processes its arguments as if they had been arguments to the
most-recently-called selection method, and then calls +to_a+ on the Rio. So
our rewrite of the example does precisely the same thing as the original.

The big difference between the original example and the casual observer's
solution is that hers creates an array of the entire contents and only
returns the first 10, while the original only puts 10 records into the
array.

As a side note, Rios also have an optimization that can really help in
certain situations. If records are only selected using Ranges, the Rio stops
iterating once it is beyond the point where any record could possibly match.
This can make a dramatic difference when one is only interested in the first
few lines of very large files.

== Example 4

  rio('adir').rename.all.files('*.htm') do |file|
    file.ext = '.html'
  end

This changes the extension of all .htm files below 'adir' to '.html'.

First we create the Rio as always.

Next we process the +rename+ method. When used as it is here, without
arguments, it just turns on rename-mode and returns the Rio.

+all+ is another configuration method, which causes directories to be
processed recursively.

+files+ is another configuration method. In Example 3 we used +lines+ to
select what to process when iterating through a file.
+files+ is used to select what to process when iterating through
directories. The arguments to +files+ can be the same as those for +lines+,
except that Ranges cannot be used and globs can.

In our example, the argument to +files+ is a string, which is treated as a
glob. As with +lines+, +files+ does not trigger any IO; it just configures
the Rio.

=== There's no action

The previous examples had something that triggered IO: +readlines+, +to_a+,
+each+, > (copy-to). This example does not. This example illustrates Rio's
'implied each'. All the configuration methods will call +each+ for you if a
block is given. So, because a block follows the +files+ method, it calls
+each+ and passes it the block.

Let's recap. At this point we have a Rio with a resource specified. We have
configured it with a couple of modes, 'rename' and 'all', and we have
limited the elements we want to process to entries that are files and match
the glob '*.htm'.

+each+ causes the Rio to open the directory and call the block for each
entry that is both a file and matches the glob. It was also configured with
+all+, so it descends into subdirectories to find further matches and calls
the block for each of them. The argument passed to the block is a Rio
referencing the entry on the file-system.

The rename-mode we set has had no effect on our iteration at all, so why is
it there? In general, configuration options that are not applicable to a
Rio are silently ignored; however, for directories some of them are passed
on to the Rios for each entry when iterating. Since +rename+ is one such
option, the example could have been written:

  rio('adir').all.files('*.htm') do |file|
    file.rename.ext = '.html'
  end

The rename-with-no-args method affects the behaviour of the <tt>ext=</tt>
method. In this case, setting it for the directory rather than for each file
in the block seems to make the intent of the code clearer, but that is a
matter of personal taste.
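
For comparison, the work Example 4 performs can be approximated with only
Ruby's standard library. This is a minimal sketch, not Rio's implementation:
the helper name +rename_htm_to_html+ is hypothetical, and it assumes that a
'**' glob is an acceptable stand-in for +all+ and a File.file? check for
+files+.

```ruby
# Hypothetical helper, NOT part of Rio: a plain-Ruby approximation of
#   rio(dir).rename.all.files('*.htm') { |file| file.ext = '.html' }
def rename_htm_to_html(dir)
  # '**/' recurses into subdirectories (like +all+); base: keeps glob
  # metacharacters in +dir+ itself from being interpreted.
  Dir.glob('**/*.htm', base: dir).each do |relative|
    path = File.join(dir, relative)
    next unless File.file?(path)                     # like +files+: skip directories
    File.rename(path, path.sub(/\.htm\z/, '.html'))  # change the extension on disk
  end
end
```

Dir.glob returns its full list before the loop starts, so renaming entries
while iterating does not disturb the traversal.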

See the documentation for more information on the rename-with-no-args
method.

== Suggested Reading

* RIO::Doc::SYNOPSIS
* RIO::Doc::INTRO
* RIO::Doc::HOWTO
* RIO::Rio

=end
module MISC
end
end
end