= BinData A declarative way to read and write structured binary data. == What is it for? Do you ever find yourself writing code like this? io = File.open(...) len = io.read(2).unpack("v") name = io.read(len) width, height = io.read(8).unpack("VV") puts "Rectangle #{name} is #{width} x #{height}" It's ugly, violates DRY and feels like you're writing Perl, not Ruby. There is a better way. class Rectangle < BinData::Struct uint16le :len string :name, :initial_length => :len uint32le :width uint32le :height end io = File.open(...) r = Rectangle.read(io) puts "Rectangle #{r.name} is #{r.width} x #{r.height}" BinData makes it easy to specify the structure of the data you are manipulating. Read on for the tutorial, or go straight to the download[http://rubyforge.org/frs/?group_id=3252] page. == Syntax BinData declarations are easy to read. Here's an example. class MyFancyFormat < BinData::Struct stringz :comment uint8 :count, :check_value => lambda { (value % 2) == 0 } array :some_ints, :type => :int32be, :initial_length => :count end The structure of the data in this example is 1. A zero terminated string 2. An unsigned 8bit integer which must by even 3. A sequence of unsigned 32bit integers in big endian form, the total number of which is determined by the value of the 8bit integer. The BinData declaration matches the english description closely. Just for fun, lets look at how we'd implement this using #pack and #unpack. Here's the writing code, have a go at the reading code. comment = "this is a comment" some_ints = [2, 3, 8, 9, 1, 8] File.open(...) do |io| io.write([comment, some_ints.size, *some_ints].pack("Z*CN*")) end The general format of a BinData declaration is a class containing one or more fields. class MyName < BinData::Struct type field_name, :param1 => "foo", :param2 => bar, ... ... end *type* is the name of a supplied type (e.g. uint32be, +string+) or a user defined type. For user defined types, convert the class name from CamelCase to lowercase underscore_style. *field_name* is the name by which you can access the data. Use either a String or a Symbol. You may specify a name as nil, but this is described later in the tutorial. Each field may have *parameters* for how to process the data. The parameters are passed as a Hash using Symbols for keys. == Handling dependencies between fields A common occurance in binary file formats is one field depending upon the value of another. e.g. A string preceded by it's length. As an example, let's assume a Pascal style string where the byte preceding the string contains the string's length. # reading io = File.open(...) len = io.getc str = io.read(len) puts "string is " + str # writing io = File.open(...) str = "this is a string" io.putc(str.length) io.write(str) Here's how we'd implement the same example with BinData. class PascalString < BinData::Struct uint8 :len, :value => lambda { data.length } string :data, :initial_length => :len end # reading io = File.open(...) ps = PascalString.new ps.read(io) puts "string is " + ps.data # writing io = File.open(...) ps = PascalString.new ps.data = "this is a string" ps.write(io) This syntax needs explaining. Let's simplify by examining reading and writing separately. class PascalStringReader < BinData::Struct uint8 :len string :data, :initial_length => :len end This states that when reading the string, the initial length of the string (and hence the number of bytes to read) is determined by the value of the +len+ field. Note that :initial_length => :len is syntactic sugar for :initial_length => lambda { len }, but more on that later. class PascalStringWriter < BinData::Struct uint8 :len, :value => lambda { data.length } string :data end This states that the value of +len+ is always equal to the length of +data+. +len+ may not be manually modified. Combining these two definitions gives the definition for +PascalString+ as previously defined. Once thing to note with dependencies, is that a field can only depend on one before it. You can't have a string which has the characters first and the length afterwards. == Predefined Types These are the predefined types. Custom types can be created by composing these types. BinData::Int8:: Signed 8 bit integer. BinData::Int16le:: Signed 16 bit integer (little endian). BinData::Int16be:: Signed 16 bit integer (big endian). BinData::Int32le:: Signed 32 bit integer (little endian). BinData::Int32be:: Signed 32 bit integer (big endian). BinData::Uint8:: Unsigned 8 bit integer. BinData::Uint16le:: Unsigned 16 bit integer (little endian). BinData::Uint16be:: Unsigned 16 bit integer (big endian). BinData::Uint32le:: Unsigned 32 bit integer (little endian). BinData::Uint32be:: Unsigned 32 bit integer (big endian). BinData::FloatLe:: Single precision floating point number (little endian). BinData::FloatBe:: Single precision floating point number (big endian). BinData::DoubleLe:: Double precision floating point number (little endian). BinData::DoubleBe:: Double precision floating point number (big endian). BinData::String:: A sequence of bytes. BinData::Stringz:: A zero terminated sequence of bytes. BinData::Array:: A list of objects of the same type. BinData::Choice:: A choice between several objects. BinData::Struct:: An ordered collection of named objects. == Parameters class PascalStringWriter < BinData::Struct uint8 :len, :value => lambda { data.length } string :data end Revisiting the Pascal string writer, we see that a field can take parameters. Parameters are passed as a Hash, where the key is a symbol. It should be noted that parameters are designed to be lazily evaluated, possibly multiple times. This means that any parameter value must not have side effects. Here are some examples of legal values for parameters. * :param => 5 * :param => lambda { 5 + 2 } * :param => lambda { foo + 2 } * :param => :foo The simplest case is when the value is a literal value, such as 5. If the value is not a literal, it is expected to be a lambda. The lambda will be evaluated in the context of the parent, in this case the parent is an instance of +PascalStringWriter+. If the value is a symbol, it is taken as syntactic sugar for a lambda containing the value of the symbol. e.g :param => :foo is :param => lambda { foo } == Saving Typing The endianess of numeric types must be explicitly defined so that the code produced is independent of architecture. Explicitly specifying the endianess of each numeric type can become tedious, so the following shortcut is provided. class A < BinData::Struct endian :little uint16 :a uint32 :b double :c uint32be :d array :e, :type => :int16 end is equivalent to: class A < BinData::Struct uint16le :a uint32le :b double_le :c uint32be :d array :e, :type => :int16le end Using the endian keyword improves the readability of the declaration as well as reducing the amount of typing necessary. Note that the endian keyword will cascade to nested types, as illustrated with the array in the above example. == Creating custom types Custom types should be created by subclassing BinData::Struct. Ocassionally it may be useful to subclass BinData::Single. Subclassing other classes may have unexpected results and is unsupported. == License BinData is released under the same license as Ruby. Copyright (c) 2007 Dion Mendel