#--
# ===============================================================================
# Copyright (c) 2005, Christopher Kleckner
# All rights reserved
#
# This file is part of the Rio library for ruby.
#
# Rio is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Rio is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Rio; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
# ===============================================================================
#++
#
# To create the documentation for Rio run the command
# rake rdoc
# from the distribution directory. Then point your browser at the 'doc/rdoc' directory.
#
# Suggested Reading
# * RIO::Doc::SYNOPSIS
# * RIO::Doc::INTRO
# * RIO::Doc::HOWTO
# * RIO::Rio
#
# Rio is pre-alpha software.
# The documented interface and behavior are subject to change without notice.
module RIO
module Doc
=begin rdoc
The following examples are provided without comment:
array = rio('afile').readlines
rio('afile') > rio('acopy')
ary = rio('afile').chomp.lines[0...10]
rio('adir').rename.all.files('*.htm') do |file|
file.ext = '.html'
end
A basic familiarity with Ruby and shell operations should allow a casual reader to guess what
these examples will do. How they do it may not be what a casual reader expects.
I will explain these examples to illustrate the Rio basics.
For many more examples, please read the HOWTO document and the rdoc documentation.
== Example 1.
array = rio('afile').readlines
This uses IO#readlines to read the lines of 'afile' into an array.
=== Creating a Rio
Rio extends the module Kernel by adding one function, _rio_, which acts as a constructor returning a Rio. This
constructor builds a description of the resource the Rio will access (usually a path). It does not open the
resource, check for its existence, or do anything except remember its specification. _rio_ returns the Rio,
which can be chained to a Rio method, as in this example, or stored in a variable. This could have been written:
ario = rio('afile')
array = ario.readlines
ario = rio('afile')
In this case the resource specified is a relative path. After the first line, the Rio neither knows nor
cares whether it is a path to a file or whether that file exists. Rio provides many methods that deal with a
resource only at this level, much as the standard library classes Pathname and URI do. It should be noted at this
point that Rio paths are stored internally as URLs, as specified in RFC 1738, and therefore use slashes as
separators. A resource can also be specified without separators, because _rio_ interprets multiple arguments
as parts of a path to be joined, and an array as an array of parts to be joined. So the following
all specify the same resource:
rio('adir/afile')
rio('adir','afile')
rio(%w/adir afile/)
The rio constructor can be used to specify non-file-system resources, but for this example we will restrict
our discussion to paths to entities on file-systems.
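The way path parts are joined is comparable to File.join in the standard library. The snippet below is only
an analogy under that assumption; it does not use Rio and says nothing about how Rio builds its internal URL.

```ruby
# Standard-library analogy only; Rio itself is not used here.
single   = 'adir/afile'
multiple = File.join('adir', 'afile')   # several parts joined with '/'
from_ary = File.join(%w[adir afile])    # an array of parts is flattened and joined

# All three spellings describe the same relative path.
puts [single, multiple, from_ary].uniq.length == 1
```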
array = ario.readlines
Now that we have a Rio, we can call one of its methods; in this case _readlines_. This is an example of using
a Rio as a proxy for the builtin IO#readlines. Given the method _readlines_, the Rio opens 'afile' for reading,
calls readlines on the resulting IO object, closes the IO object, and returns the lines read.
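The sequence just described (open, read, close, return) can be sketched with the standard library alone.
The helper name +read_all_lines+ is hypothetical, and this is an approximation of the behaviour, not Rio's
implementation.

```ruby
require 'tempfile'

# Hypothetical helper: roughly the steps described above, written with
# the standard library only. It is not Rio's implementation.
def read_all_lines(path)
  io = File.open(path, 'r')   # open the resource for reading
  begin
    io.readlines              # delegate to IO#readlines
  ensure
    io.close                  # close the IO object afterwards
  end
end

# Usage: write two lines to a scratch file and read them back.
Tempfile.create('afile') do |tmp|
  tmp.write("one\ntwo\n")
  tmp.flush
  p read_all_lines(tmp.path)
end
```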
== Example 2.
rio('afile') > rio('acopy')
This copies the file 'afile' into the file 'acopy'.
The first thing that happens here is the creation of the Rios. As described in Example 1, when created
a Rio simply remembers the specification of its resource: in this case, a relative path 'afile' on the
left and a relative path 'acopy' on the right.
Next the Rio#> (copy-to) method is called on the 'afile' Rio with the 'acopy' Rio as its argument. If that
looks like a greater-than operator to you, think Unix shell: with Rios, '>' is the copy-to operator.
Upon seeing the copy-to operator, the Rio has all the information it needs to proceed. It determines that
it must be opened for reading, that its argument must be opened for writing, and that its contents must
be copied to the resource referenced by its argument -- and that is what it does. Then it closes itself and
its argument.
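For comparison, the same steps can be written out by hand with the standard library. The helper name
+copy_to+ is hypothetical and the sketch only mirrors the description above; Rio's own implementation may
well differ.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the steps described above with the
# standard library; Rio's own implementation may differ.
def copy_to(src_path, dst_path)
  File.open(src_path, 'rb') do |src|      # source opened for reading
    File.open(dst_path, 'wb') do |dst|    # destination opened for writing
      IO.copy_stream(src, dst)            # contents copied across
    end
  end                                     # both files are closed when the blocks exit
end

# Usage: copy a scratch file and confirm the contents match.
Dir.mktmpdir do |dir|
  afile = File.join(dir, 'afile')
  acopy = File.join(dir, 'acopy')
  File.write(afile, "hello\n")
  copy_to(afile, acopy)
  puts File.read(acopy) == File.read(afile)
end
```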
Suppose we had written this example this way:
afile = rio('afile')
acopy = rio('acopy')
afile > acopy
In this case we would still have variables referencing the Rios, and perhaps we would like to do things a little
differently than described above. Be assured that the selection of mode and the automatic closing of files are
default behaviour and can be changed. Say we wanted 'afile' to remain open so that we could rewind it and make
a second copy; we might do something like this:
afile = rio('afile').nocloseoneof
afile > rio('acopy1')
afile.rewind > rio('acopy2')
afile.close
Actually, the 'thinking process' of the Rio when it sees a copy-to operator is much more complex than described above.
If its argument had been a Rio referencing a directory, it would not have opened itself for reading,
but instead used FileUtils#cp to copy itself; if its argument had been a string, its contents would have ended up
in the string; if its argument had been an array, its lines would have been elements of that array; if its argument had
been a socket, its contents would have been copied to the socket. See the documentation for details.
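The idea of choosing a copy strategy from the class of the destination can be illustrated with a small
dispatcher. Everything here is an assumption made for illustration: the name +copy_lines_to+ is hypothetical,
the directory case is left out, and Rio's real decision logic is more involved.

```ruby
require 'stringio'

# Hypothetical dispatcher: picks a copy strategy from the class of the
# destination, in the spirit of the cases listed above. Not Rio's code.
def copy_lines_to(lines, dest)
  case dest
  when String
    lines.each { |line| dest << line }       # contents end up in the string
  when Array
    dest.concat(lines)                       # each line becomes an element
  when IO, StringIO
    lines.each { |line| dest.write(line) }   # stream-like target (sockets are IOs)
  else
    raise ArgumentError, "unsupported destination: #{dest.class}"
  end
  dest
end

lines = ["first\n", "second\n"]
p copy_lines_to(lines, String.new)
p copy_lines_to(lines, [])
```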
== Example 3.
array = rio('afile').chomp.lines[0...10]
This fills +array+ with the first ten lines of 'afile', with each line chomped.
The casual observer mentioned above might think that +lines+ returns an array of lines and that this
is a simple rewording of array = rio('afile').readlines[0...10] or even of
array = File.new('afile').readlines[0...10]. They would be wrong.
+chomp+ is a configuration method which turns on chomp-mode and returns the Rio. Chomp-mode causes all
line-oriented read operations to perform a String#chomp on each line.
=== Reading files
Rio provides four methods to select which part of the file is read and how the file is divided. They are +lines+,
+records+, +rows+ and +bytes+. Briefly, +lines+ specifies that the file should be read line by line and +bytes(n)+
specifies that the file should be read in _n_-byte chunks. All four take arguments which can be used to
filter lines or chunks in or out. For simple Rios, +records+ and +rows+ only specify the filter arguments and
are provided for use by extensions. For example, the CSV extension returns an array of the columns in a line
when +records+ is used. In the absence of an extension, +records+ and +rows+ behave like +lines+.
First, let's rewrite our example as:
array = rio('afile').chomp.lines(0...10).to_a
The arguments to lines specify which records are to be read.
Arguments are interpreted based on their class as follows:
* Range - interpreted as a range of record numbers to be read
* Integer - interpreted as a one-element range
* Regexp - only matching records are processed
* Symbol - sent to each record, which is processed unless the result is false or nil
* Proc - called for each record, the record is processed unless the return value is false or nil
See the documentation for details and examples.
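The class-based interpretation listed above can be pictured as a single predicate. This is a sketch under
stated assumptions: the name +selected?+ is hypothetical, and the arguments Rio actually passes to a Proc
are not specified here, so the sketch passes only the record.

```ruby
# Hypothetical predicate: decides whether a record is selected, based on
# the class of the selector. The name and the arguments handed to a Proc
# are assumptions, not Rio's API.
def selected?(record, index, selector)
  case selector
  when Range   then selector.include?(index)      # range of record numbers
  when Integer then index == selector             # one-element range
  when Regexp  then selector.match?(record)       # only matching records
  when Symbol  then !!record.__send__(selector)   # sent to the record
  when Proc    then !!selector.call(record)       # called with the record
  else false
  end
end

# Usage: keep the records whose numbers fall in 0...2.
records = ["alpha\n", "\n", "beta\n"]
kept = records.each_with_index.select { |rec, i| selected?(rec, i, 0...2) }.map(&:first)
p kept
```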
In our example we have specified the Range (0...10). The +lines+ method just configures the Rio; it does
not trigger any IO operation. The fact that it was called and the arguments it was called with are stored away, and the Rio
is returned for further configuration or an actual IO operation. When an IO operation is called, the Range will be
used to limit processing to the first ten records. For example:
rio('afile').lines(0...10).each { |line| ... } # block will be called for the first 10 records
rio('afile').lines(0...10).to_a # the first 10 records will be returned in an array
rio('afile').lines(0...10) > rio('acopy') # the first 10 records will be copied to 'acopy'
"But wait", you say, "in our original example the range was an argument to the subscript operator, not to +lines+".
This works because the subscript operator processes its arguments as if they had been arguments to the
most-recently-called selection method and then calls +to_a+ on the Rio. So our rewrite of the example
does precisely the same thing as the original.
The big difference between the original example and the casual observer's solution is that the latter
creates an array of the entire contents and only returns the first 10 lines, while the original only puts
10 records into the array.
As a side note, Rios also have an optimization that can really help in certain situations. If records are
selected using only Ranges, a Rio stops iterating once it is beyond the point where any Range could possibly match. This
can make a dramatic difference when one is only interested in the first few lines of very large files.
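Both ideas, deferred configuration and stopping early, can be sketched in a few lines of plain Ruby. The class
+LazyLines+ is hypothetical and handles only a single finite Integer Range; it is meant to show the shape of
the technique, not to reproduce Rio.

```ruby
require 'tempfile'

# Hypothetical stand-in for the idea described above: selection is only
# recorded, and nothing is read until to_a (or []) is called.
class LazyLines
  def initialize(path)
    @path = path
    @range = nil
    @chomp = false
  end

  # Configuration only: remember that lines should be chomped.
  def chomp
    @chomp = true
    self
  end

  # Configuration only: remember which record numbers are wanted.
  def lines(range = nil)
    @range = range
    self
  end

  # Treat the subscript argument as if it had been given to lines.
  def [](range)
    lines(range).to_a
  end

  # The actual IO happens here. Assumes a finite Integer Range, if any.
  def to_a
    last = @range && @range.max
    return [] if @range && last.nil?          # an empty range selects nothing
    result = []
    File.foreach(@path).with_index do |line, i|
      break if last && i > last               # past the range: stop reading
      next if @range && !@range.include?(i)
      result << (@chomp ? line.chomp : line)
    end
    result
  end
end

# Usage: only the first ten lines of a 1000-line file are ever collected.
Tempfile.create('afile') do |tmp|
  tmp.write((1..1000).map { |n| "line #{n}\n" }.join)
  tmp.flush
  first_ten = LazyLines.new(tmp.path).chomp.lines[0...10]
  puts first_ten.length
end
```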
== Example 4.
rio('adir').rename.all.files('*.htm') do |file|
file.ext = '.html'
end
This changes the extension of all .htm files below 'adir' to '.html'.
First we create the rio as always.
Next we process the +rename+ method. When used as it is here -- without arguments -- it just turns on rename-mode
and returns the Rio.
+all+ is another configuration method, which causes directories to be processed recursively.
+files+ is another configuration method. In Example 3 we used +lines+ to select what to process when
iterating through a file. +files+ is used to select what to process when iterating through
directories. The arguments to +files+ can be the same as those for +lines+, except that Ranges cannot
be used and globs can.
In our example, the argument to +files+ is a string, which is treated as a glob. As with +lines+, +files+
does not trigger any IO; it just configures the Rio.
=== There's no action
The previous examples had something that triggered IO: +readlines+, +to_a+, +each+, > (copy-to). This example
does not. Instead, it illustrates Rio's 'implied each'. All the configuration methods will call +each+ for you
if a block is given. So, because a block follows the +files+ method, it calls +each+ and passes it the block.
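The 'implied each' pattern itself is easy to reproduce in plain Ruby. The class +EntryList+ below is invented
for this sketch and works on a list of names rather than a real directory; it only shows how a configuration
method can hand a block on to +each+.

```ruby
# Hypothetical illustration of the 'implied each' pattern; the class and
# its data are invented for this sketch and are not part of Rio.
class EntryList
  include Enumerable

  def initialize(names)
    @names = names
    @glob = '*'
  end

  # Configuration method: records the glob and, if a block was given,
  # hands that block straight to each.
  def files(glob = '*', &block)
    @glob = glob
    return each(&block) if block
    self
  end

  # The iteration proper: yields only names matching the recorded glob.
  def each
    return to_enum(:each) unless block_given?
    @names.each { |name| yield name if File.fnmatch(@glob, name) }
    self
  end
end

# No explicit each: the block after files triggers the iteration.
EntryList.new(%w[a.htm b.txt c.htm]).files('*.htm') { |name| puts name }
```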
Let's recap. At this point we have a Rio with a resource specified. We have configured it with a couple of modes,
'rename' and 'all', and we have limited the elements we want to process to entries that are files and
match the glob '*.htm'. +each+ causes the Rio to open the directory and call the block for each entry that is
both a file and matches the glob. It was also configured with +all+, so it descends into subdirectories to
find further matches and calls the block for each of them. The argument passed to the block is a Rio
referencing the entry on the file-system.
The rename-mode we set has had no effect on our iteration at all, so why is it there? In general,
configuration options that are not applicable to a Rio are silently ignored; however, for directories
some of them are passed on to the Rios for each entry when iterating. Since +rename+ is one such option,
the example could have been written:
rio('adir').all.files('*.htm') do |file|
file.rename.ext = '.html'
end
The rename-with-no-args method affects the behaviour of the ext= method. In this case,
setting it for the directory, rather than for each file in the block, seems to make the intent
of the code clearer, but that is a matter of personal taste. See the documentation for more
information on the rename-with-no-args method.
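For comparison, the overall effect of Example 4 can be approximated with the standard library alone. The helper
name +htm_to_html+ is hypothetical, and the sketch reproduces only the end result (recursive rename of matching
files), not the way Rio carries it out.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: renames every regular file matching *.htm below
# dir to *.html, using only the standard library. This approximates the
# effect of the example, not the way Rio performs it.
def htm_to_html(dir)
  Dir.glob(File.join(dir, '**', '*.htm')).each do |path|
    next unless File.file?(path)                     # skip directories
    File.rename(path, path.sub(/\.htm\z/, '.html'))  # swap the extension
  end
end

# Usage: build a small scratch tree and rename within it.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'sub'))
  File.write(File.join(dir, 'index.htm'), '')
  File.write(File.join(dir, 'sub', 'page.htm'), '')
  htm_to_html(dir)
  puts Dir.glob(File.join(dir, '**', '*.html')).length
end
```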
== Suggested Reading
* RIO::Doc::SYNOPSIS
* RIO::Doc::INTRO
* RIO::Doc::HOWTO
* RIO::Rio
=end
module MISC
end
end
end