= BinData
A declarative way to read and write structured binary data.
== What is it for?
Do you ever find yourself writing code like this?
io = File.open(...)
len = io.read(2).unpack("v")
name = io.read(len)
width, height = io.read(8).unpack("VV")
puts "Rectangle #{name} is #{width} x #{height}"
It's ugly, violates DRY and feels like you're writing Perl, not Ruby.
There is a better way.
class Rectangle < BinData::MultiValue
uint16le :len
string :name, :read_length => :len
uint32le :width
uint32le :height
end
io = File.open(...)
r = Rectangle.read(io)
puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
BinData makes it easy to specify the structure of the data you are
manipulating.
Read on for the tutorial, or go straight to the
download[http://rubyforge.org/frs/?group_id=3252] page.
== Syntax
BinData declarations are easy to read. Here's an example.
class MyFancyFormat < BinData::MultiValue
stringz :comment
uint8 :count, :check_value => lambda { (value % 2) == 0 }
array :some_ints, :type => :int32be, :initial_length => :count
end
The structure of the data in this example is
1. A zero terminated string
2. An unsigned 8bit integer which must by even
3. A sequence of unsigned 32bit integers in big endian form, the total
number of which is determined by the value of the 8bit integer.
The BinData declaration matches the english description closely. Just for
fun, lets look at how we'd implement this using #pack and #unpack. Here's
the writing code, have a go at the reading code.
comment = "this is a comment"
some_ints = [2, 3, 8, 9, 1, 8]
File.open(...) do |io|
io.write([comment, some_ints.size, *some_ints].pack("Z*CN*"))
end
The general format of a BinData declaration is a class containing one or more
fields.
class MyName < BinData::MultiValue
type field_name, :param1 => "foo", :param2 => bar, ...
...
end
*type* is the name of a supplied type (e.g. uint32be, +string+)
or a user defined type. For user defined types, convert the class name
from CamelCase to lowercase underscore_style.
*field_name* is the name by which you can access the data. Use either a
String or a Symbol. You may specify a name as nil, but this is described
later in the tutorial.
Each field may have *parameters* for how to process the data. The
parameters are passed as a Hash using Symbols for keys.
== Handling dependencies between fields
A common occurance in binary file formats is one field depending upon the
value of another. e.g. A string preceded by it's length.
As an example, let's assume a Pascal style string where the byte preceding
the string contains the string's length.
# reading
io = File.open(...)
len = io.getc
str = io.read(len)
puts "string is " + str
# writing
io = File.open(...)
str = "this is a string"
io.putc(str.length)
io.write(str)
Here's how we'd implement the same example with BinData.
class PascalString < BinData::MultiValue
uint8 :len, :value => lambda { data.length }
string :data, :read_length => :len
end
# reading
io = File.open(...)
ps = PascalString.new
ps.read(io)
puts "string is " + ps.data
# writing
io = File.open(...)
ps = PascalString.new
ps.data = "this is a string"
ps.write(io)
This syntax needs explaining. Let's simplify by examining reading and
writing separately.
class PascalStringReader < BinData::MultiValue
uint8 :len
string :data, :read_length => :len
end
This states that when reading the string, the initial length of the string
(and hence the number of bytes to read) is determined by the value of the
+len+ field.
Note that :read_length => :len is syntactic sugar for
:read_length => lambda { len }, but more on that later.
class PascalStringWriter < BinData::MultiValue
uint8 :len, :value => lambda { data.length }
string :data
end
This states that the value of +len+ is always equal to the length of +data+.
+len+ may not be manually modified.
Combining these two definitions gives the definition for +PascalString+ as
previously defined.
Once thing to note with dependencies, is that a field can only depend on one
before it. You can't have a string which has the characters first and the
length afterwards.
== Predefined Types
These are the predefined types. Custom types can be created by composing
these types.
BinData::String:: A sequence of bytes.
BinData::Stringz:: A zero terminated sequence of bytes.
BinData::Array:: A list of objects of the same type.
BinData::Choice:: A choice between several objects.
BinData::Struct:: An ordered collection of named objects.
BinData::Int8:: Signed 8 bit integer.
BinData::Int16le:: Signed 16 bit integer (little endian).
BinData::Int16be:: Signed 16 bit integer (big endian).
BinData::Int32le:: Signed 32 bit integer (little endian).
BinData::Int32be:: Signed 32 bit integer (big endian).
BinData::Int64le:: Signed 64 bit integer (little endian).
BinData::Int64be:: Signed 64 bit integer (big endian).
BinData::Uint8:: Unsigned 8 bit integer.
BinData::Uint16le:: Unsigned 16 bit integer (little endian).
BinData::Uint16be:: Unsigned 16 bit integer (big endian).
BinData::Uint32le:: Unsigned 32 bit integer (little endian).
BinData::Uint32be:: Unsigned 32 bit integer (big endian).
BinData::Uint64le:: Unsigned 64 bit integer (little endian).
BinData::Uint64be:: Unsigned 64 bit integer (big endian).
BinData::FloatLe:: Single precision floating point number (little endian).
BinData::FloatBe:: Single precision floating point number (big endian).
BinData::DoubleLe:: Double precision floating point number (little endian).
BinData::DoubleBe:: Double precision floating point number (big endian).
BinData::Rest:: Consumes the rest of the input stream.
== Parameters
class PascalStringWriter < BinData::MultiValue
uint8 :len, :value => lambda { data.length }
string :data
end
Revisiting the Pascal string writer, we see that a field can take
parameters. Parameters are passed as a Hash, where the key is a symbol.
It should be noted that parameters are designed to be lazily evaluated,
possibly multiple times. This means that any parameter value must not have
side effects.
Here are some examples of legal values for parameters.
* :param => 5
* :param => lambda { 5 + 2 }
* :param => lambda { foo + 2 }
* :param => :foo
The simplest case is when the value is a literal value, such as 5.
If the value is not a literal, it is expected to be a lambda. The lambda
will be evaluated in the context of the parent, in this case the parent is
an instance of +PascalStringWriter+.
If the value is a symbol, it is taken as syntactic sugar for a lambda
containing the value of the symbol.
e.g :param => :foo is :param => lambda { foo }
== Saving Typing
The endianess of numeric types must be explicitly defined so that the code
produced is independent of architecture. Explicitly specifying the
endianess of each numeric type can become tedious, so the following
shortcut is provided.
class A < BinData::MultiValue
endian :little
uint16 :a
uint32 :b
double :c
uint32be :d
array :e, :type => :int16
end
is equivalent to:
class A < BinData::MultiValue
uint16le :a
uint32le :b
double_le :c
uint32be :d
array :e, :type => :int16le
end
Using the endian keyword improves the readability of the declaration as well
as reducing the amount of typing necessary. Note that the endian keyword will
cascade to nested types, as illustrated with the array in the above example.
== Creating custom types
Custom types should be created by subclassing BinData::MultiValue or
BinData::SingleValue. Ocassionally it may be useful to subclass
BinData::Single. Subclassing other classes may have unexpected results
and is unsupported.
Let us revisit the Pascal String example.
class PascalString < BinData::MultiValue
uint8 :len, :value => lambda { data.length }
string :data, :read_length => :len
end
We'd like to make PascalString a custom type that behaves like a
BinData::Single object so we can use :initial_value etc. Here's an
example usage of what we'd like:
class Favourites < BinData::MultiValue
pascal_string :language, :initial_value => "ruby"
pascal_string :os, :initial_value => "unix"
end
f = Favourites.new
f.os = "freebsd"
f.to_s #=> "\004ruby\007freebsd"
We create this type of custom string by inheriting from BinData::SingleValue
and implementing the #get and #set methods.
class PascalString < BinData::SingleValue
uint8 :len, :value => lambda { data.length }
string :data, :read_length => :len
def get; self.data; end
def set(v) self.data = v; end
end
If the type we are creating represents a single value then inherit from
BinData::SingleValue, otherwise inherit from BinData::MultiValue.
== License
BinData is released under the same license as Ruby.
Copyright (c) 2007, 2008 Dion Mendel