Module | RIO::Doc::MISC | lib/rio/doc/MISC.rb |
The following examples are provided without comment:
array = rio('afile').readlines
rio('afile') > rio('acopy')
ary = rio('afile').chomp.lines[0...10]
rio('adir').rename.all.files('*.htm') do |file|
  file.ext = '.html'
end
A basic familiarity with Ruby and shell operations should allow a casual reader to guess what these examples will do. How they are performed may not be what a casual reader expects. I will explain these examples to illustrate the Rio basics.
For many more examples please read the HOWTO document and the rdoc documentation.
array = rio('afile').readlines
This uses IO#readlines to read the lines of ‘afile’ into an array.
Rio extends the module Kernel by adding one function, rio, which acts as a constructor returning a Rio. This constructor builds a description of the resource the Rio will access (usually a path). It does not open the resource, check for its existence, or do anything except remember its specification. rio returns the Rio, which can be chained to a Rio method as in this example or stored in a variable. This could have been written:
ario = rio('afile')
array = ario.readlines

ario = rio('afile')
In this case the resource specified is a relative path. After the first line the Rio does not know or care whether it is a path to a file or whether it exists. Rio provides many methods that only deal with a resource at this level, much as the standard library classes Pathname and URI do. It should be noted at this point that Rio paths are stored internally as a URL as specified in RFC 1738 and therefore use slashes as separators. A resource can also be specified without separators, because rio interprets multiple arguments as parts of a path to be joined, and an array as an array of parts to be joined. So the following all specify the same resource:
rio('adir/afile')
rio('adir','afile')
rio(%w/adir afile/)
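For readers who know the standard library better than Rio, the same three spellings can be imitated with File.join. This is only an analogy, assuming nothing about how Rio joins parts internally; the helper name path_from is made up for illustration.

```ruby
require 'pathname'

# Hypothetical helper (not part of Rio): accept one string, several parts,
# or an array of parts, and join them with '/' as the separator.
def path_from(*parts)
  File.join(parts)   # File.join flattens nested arrays of parts
end

puts path_from('adir/afile')
puts path_from('adir', 'afile')
puts path_from(%w[adir afile])

# Like a freshly built Rio, a Pathname only remembers its path;
# constructing one does not touch the file system.
path = Pathname.new(path_from('adir', 'afile'))
puts path.basename
```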
The rio constructor can be used to specify non-file-system resources, but for this example we will restrict our discussion to paths to entities on file-systems.
array = ario.readlines
Now that we have a Rio, we can call one of its methods; in this case readlines. This is an example of using a Rio as a proxy for the builtin IO#readlines. Given the method readlines, the Rio opens ‘afile’ for reading, calls readlines on the resulting IO object, closes the IO object, and returns the lines read.
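For comparison, here is roughly what that open, read, close sequence looks like when written against the standard library alone. It is a sketch of the behaviour described above, not Rio's implementation; the helper name read_all_lines is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Rio): open, read every line, close, return.
def read_all_lines(path)
  io = File.open(path, 'r')   # open the resource for reading
  begin
    io.readlines              # delegate to the builtin IO#readlines
  ensure
    io.close                  # always close the IO object
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'afile')
  File.write(path, "first\nsecond\n")
  p read_all_lines(path)      # each line keeps its trailing newline
end
```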
rio('afile') > rio('acopy')
This copies the file ‘afile’ into the file ‘acopy’.
The first things that happen here are the creation of the Rios. As described in Example 1, when created a Rio simply remembers the specification of its resource. In this case, that is a relative path ‘afile’ on the left and a relative path ‘acopy’ on the right.
Next the Rio#> (copy-to) method is called on the ‘afile’ Rio with the ‘acopy’ Rio as its argument. If that looks like a greater-than operator to you, think Unix shell: with Rios, ’>’ is the copy-to operator.
Upon seeing the copy-to operator, the Rio has all the information it needs to proceed. It determines that it must be opened for reading, that its argument must be opened for writing, and that its contents must be copied to the resource referenced by its argument, and that is what it does. Then it closes itself and its argument.
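The steps just listed (open the source for reading, open the argument for writing, copy, close both) can be spelled out with the standard library. This is an illustrative sketch only; copy_contents is a hypothetical name and the code makes no claim about how Rio performs the copy.

```ruby
require 'tmpdir'

# Hypothetical helper (not Rio's code): copy one file's contents to another.
def copy_contents(src_path, dst_path)
  src = File.open(src_path, 'rb')   # the source is opened for reading
  dst = File.open(dst_path, 'wb')   # the argument is opened for writing
  IO.copy_stream(src, dst)          # the contents are copied across
ensure
  src&.close                        # then both ends are closed
  dst&.close
end

Dir.mktmpdir do |dir|
  afile = File.join(dir, 'afile')
  acopy = File.join(dir, 'acopy')
  File.write(afile, "hello\nworld\n")
  copy_contents(afile, acopy)
  puts File.read(acopy) == File.read(afile)
end
```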
Consider if we had written this example this way.
afile = rio('afile')
acopy = rio('acopy')
afile > acopy
In this case we would still have variables referencing the Rios, and perhaps we would like to do things a little differently than described above. Be assured that the selection of mode and the automatic closing of files are the default behaviour and can be changed. Say we wanted ‘afile’ to remain open so that we could rewind it and make a second copy. We might do something like this:
afile = rio('afile').nocloseoneof
afile > rio('acopy1')
afile.rewind > rio('acopy2')
afile.close
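The same keep-open, rewind, copy-again pattern can be written with plain File objects. The sketch below assumes nothing about Rio's internals; copy_twice is a hypothetical helper used only to show the sequence of operations.

```ruby
require 'tmpdir'

# Hypothetical helper: copy one open source to two destinations,
# rewinding between the copies and closing the source only at the end.
def copy_twice(src_path, first_dst, second_dst)
  src = File.open(src_path, 'rb')
  File.open(first_dst, 'wb')  { |dst| IO.copy_stream(src, dst) }
  src.rewind                        # back to the start for the second pass
  File.open(second_dst, 'wb') { |dst| IO.copy_stream(src, dst) }
ensure
  src&.close                        # the explicit close from the example
end

Dir.mktmpdir do |dir|
  afile = File.join(dir, 'afile')
  File.write(afile, "some text\n")
  copy_twice(afile, File.join(dir, 'acopy1'), File.join(dir, 'acopy2'))
  puts Dir.children(dir).sort.inspect
end
```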
Actually the ‘thinking-process’ of the Rio when it sees a copy-to operator is much more complex than described above. If its argument had been a rio referencing a directory, it would not have opened itself for reading, but instead used FileUtils#cp to copy itself; if its argument had been a string, its contents would have ended up in the string; if its argument had been an array, its lines would have become elements of that array; if its argument had been a socket, its contents would have been copied to the socket. See the documentation for details.
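To make the idea of choosing a strategy from the argument's type concrete, here is a small dispatcher written with the standard library. It is a loose analogy, not Rio's logic: copy_to is a hypothetical function, and a Pathname stands in for a Rio that references a path.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Hypothetical dispatcher, loosely modelled on the cases listed above.
def copy_to(src_path, dest)
  case dest
  when Pathname
    if dest.directory?
      FileUtils.cp(src_path, dest.to_s)        # copy the file into the directory
    else
      IO.copy_stream(src_path, dest.to_s)      # plain file-to-file copy
    end
  when String then dest << File.read(src_path)          # contents end up in the string
  when Array  then dest.concat(File.readlines(src_path)) # lines become elements
  when IO     then IO.copy_stream(src_path, dest)        # e.g. an open file or socket
  else raise ArgumentError, "unsupported destination: #{dest.class}"
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, 'afile')
  File.write(src, "one\ntwo\n")

  buffer = String.new
  copy_to(src, buffer)
  lines = []
  copy_to(src, lines)
  p buffer, lines
end
```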
array = rio('afile').chomp.lines[0...10]
This fills array with the first ten lines of ‘afile’, with each line chomped.
The casual observer mentioned above might think that lines returns an array of lines and that this is a simple rewording of array = rio('afile').readlines or even of array = File.new('afile').readlines. They would be wrong.
chomp is a configuration method which turns on chomp-mode and returns the Rio. Chomp-mode causes all line-oriented read operations to perform a String#chomp on each line.
Rio provides four methods to select which part of the file is read and how the file is divided: lines, records, rows and bytes. Briefly, lines specifies that the file should be read line by line, and bytes(n) specifies that the file should be read in n-byte chunks. All four take arguments which can be used to filter lines or chunks in or out. For simple Rios, records and rows only specify the filter arguments and are provided for use by extensions. For example, the CSV extension returns an array of the columns in a line when records is used. In the absence of an extension, records and rows behave like lines.
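The difference between dividing input by lines and dividing it into n-byte chunks can be seen with ordinary IO calls. The two helpers below are hypothetical and use StringIO so the sketch needs no files; they illustrate the two ways of splitting, not Rio's lines and bytes methods themselves.

```ruby
require 'stringio'

# Hypothetical helper: divide the input on line boundaries.
def read_by_lines(io)
  io.each_line.to_a
end

# Hypothetical helper: divide the input into chunks of at most n bytes.
def read_by_bytes(io, n)
  chunks = []
  while (chunk = io.read(n))   # read(n) returns nil at end of input
    chunks << chunk
  end
  chunks
end

data = "ab\ncdef\n"
p read_by_lines(StringIO.new(data))
p read_by_bytes(StringIO.new(data), 3)
```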
First let's rewrite our example as:
array = rio('afile').chomp.lines(0...10).to_a
The arguments to lines specify which records are to be read. Arguments are interpreted based on their class; see the documentation for the full list, details and examples.
In our example we have specified the Range (0...10). The lines method only configures the Rio; it does not trigger any IO operation. The fact that it was called and the arguments it was called with are stored away, and the Rio is returned for further configuration or an actual IO operation. When an IO operation is called, the Range will be used to limit processing to the first ten records. For example:
rio('afile').lines(0...10).each { |line| ... } # block will be called for the first 10 records
rio('afile').lines(0...10).to_a # the first 10 records will be returned in an array
rio('afile').lines(0...10) > rio('acopy') # the first 10 records will be copied to 'acopy'
"But wait", you say, "in our original example the range was an argument to the subscript operator, not to lines". This works because the subscript operator processes its arguments as if they had been arguments to the most-recently-called selection method and then calls to_a on the rio. So our rewrite of the example does precisely the same thing as the original.
The big difference between the original example and the casual-observer’s solution is that hers creates an array of the entire contents and only returns the first 10 while the original only puts 10 records into the array.
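The deferred configuration and the subscript shortcut described in this example can be sketched in miniature. The LazyLines class below is entirely hypothetical and works on an in-memory string; it only shows the pattern of configuration methods that record settings and return self, with reading postponed until each runs. Unlike a Rio, it makes no attempt to stop early.

```ruby
require 'stringio'

# Hypothetical miniature of the idea, not Rio's implementation.
class LazyLines
  include Enumerable

  def initialize(text)
    @text  = text
    @chomp = false
    @range = nil
  end

  # Configuration: remember that lines should be chomped, read nothing.
  def chomp
    @chomp = true
    self
  end

  # Configuration: remember which record numbers to keep, read nothing.
  def lines(range = nil)
    @range = range
    self
  end

  # Subscript: treat the argument as if it had been given to lines, then to_a.
  def [](range)
    lines(range).to_a
  end

  # The only place where the data is actually read.
  def each
    return enum_for(:each) unless block_given?

    StringIO.new(@text).each_line.with_index do |line, i|
      next if @range && !@range.cover?(i)
      yield(@chomp ? line.chomp : line)
    end
  end
end

text = "zero\none\ntwo\nthree\n"
p LazyLines.new(text).chomp.lines[0...2]
p LazyLines.new(text).chomp.lines(0...2).to_a
```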
As a side note, Rios also have an optimization that can really help in certain situations. If records are selected using only Ranges, a Rio stops iterating once it is beyond the point where it could possibly match. This can make a dramatic difference when one is only interested in the first few lines of very large files.
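The effect of stopping early can be observed with the standard library, where Enumerable#first ends the iteration as soon as it has enough elements. The helper first_lines is hypothetical; it returns the lines together with the number of lines that were actually read, so the saving is visible. It illustrates the principle only and is not how Rio implements its optimization.

```ruby
require 'tmpdir'

# Hypothetical helper: return the first n chomped lines and the number of
# lines that were read from the file to obtain them.
def first_lines(path, n)
  File.open(path) do |f|
    lines = f.each_line(chomp: true).first(n)   # iteration stops after n lines
    [lines, f.lineno]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'afile')
  File.write(path, (0...1000).map { |i| "line #{i}\n" }.join)

  lines, lines_read = first_lines(path, 10)
  puts lines.length
  puts lines_read
end
```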
rio('adir').rename.all.files('*.htm') do |file| file.ext = '.html' end
This changes the extension of all .htm files below ‘adir’ to ’.html’.
First we create the rio as always.
Next we process the rename method. When used as it is here — without arguments — it just turns on rename-mode and returns the Rio.
all is another configuration method, which causes directories to be processed recursively.
files is another configuration method. In example 3 we used lines to select what to process when iterating through a file. files is used to select what to process when iterating through directories. The arguments to files can be the same as those for lines, except that Ranges cannot be used and globs can.
In our example, the argument to files is a string which is treated as a glob. As with lines, files does not trigger any IO, it just configures the Rio.
The previous examples had something that triggered IO: readlines, to_a, each, > (copy-to). This example does not. This example illustrates Rio’s ‘implied each’. All the configuration methods will call each for you if a block is given. So, because a block follows the files method, it calls each and passes it the block.
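The ‘implied each’ pattern is easy to imitate in plain Ruby. The Entries class below is hypothetical and iterates over a list of names rather than a real directory; it shows a configuration method that records its argument and then runs each itself when a block is supplied.

```ruby
# Hypothetical miniature, not Rio's code.
class Entries
  def initialize(names)
    @names = names
    @glob  = '*'
  end

  # Configuration method: store the glob, and iterate only if given a block.
  def files(glob = '*', &block)
    @glob = glob
    return each(&block) if block   # the 'implied each'
    self                           # otherwise return self for more chaining
  end

  # Calls the block for every name that matches the stored glob.
  def each
    @names.each { |name| yield name if File.fnmatch(@glob, name) }
    self
  end
end

seen = []
Entries.new(%w[a.htm b.txt c.htm]).files('*.htm') { |name| seen << name }
p seen
```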
Let’s recap. At this point we have a Rio with a resource specified. We have configured it with a couple of modes, ‘rename’ and ‘all’, and we have limited the elements we want to process to entries that are files and match the glob ’*.htm’. each causes the Rio to open the directory and call the block for each entry that is both a file and matches the glob. It was also configured with all, so it descends into subdirectories to find further matches and calls the block for each of them. The argument passed to the block is a Rio referencing the entry on the file-system.
The rename-mode we set has had no effect on our iteration at all, so why is it there? In general, configuration options that are not applicable to a Rio are silently ignored; for directories, however, some of them are passed on to the Rios for each entry when iterating. Since rename is one such option, the example could have been written:
rio('adir').all.files('*.htm') do |file|
  file.rename.ext = '.html'
end
The rename-with-no-args method affects the behaviour of the ext= option. In this case, setting it for the directory, rather than for each file in the block, seems to make the intent of the code clearer, but that is a matter of personal taste. See the documentation for more information on the rename-with-no-args method.
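For a sense of what this example accomplishes, here is a standard-library version of the same task: a recursive search for regular files matching a glob, followed by a rename of each one. The helper change_extension is hypothetical, and the sketch says nothing about how Rio carries out the rename.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: rename every regular file below dir whose name ends
# in old_ext so that it ends in new_ext instead.
def change_extension(dir, old_ext, new_ext)
  Dir.glob(File.join(dir, '**', "*#{old_ext}")).each do |path|
    next unless File.file?(path)   # skip directories that match the glob
    File.rename(path, path.delete_suffix(old_ext) + new_ext)
  end
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'sub'))
  File.write(File.join(dir, 'a.htm'), '')
  File.write(File.join(dir, 'sub', 'b.htm'), '')

  change_extension(dir, '.htm', '.html')
  puts Dir.glob(File.join(dir, '**', '*.html')).length
end
```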
Copyright © 2005 Christopher Kleckner. All rights reserved.