README in bindata-0.11.1 vs README in bindata-1.0.0
- old
+ new
@@ -1,300 +1,1143 @@
-= BinData
+Title: BinData Reference Manual
+{:ruby: lang=ruby html_use_syntax=true}
+
+# BinData
+
A declarative way to read and write structured binary data.
-== What is it for?
+## What is it for?
Do you ever find yourself writing code like this?
- io = File.open(...)
- len = io.read(2).unpack("v")
- name = io.read(len)
- width, height = io.read(8).unpack("VV")
- puts "Rectangle #{name} is #{width} x #{height}"
+ io = File.open(...)
+ len = io.read(2).unpack("v")[0]
+ name = io.read(len)
+ width, height = io.read(8).unpack("VV")
+ puts "Rectangle #{name} is #{width} x #{height}"
+{:ruby}
It's ugly, violates DRY and feels like you're writing Perl, not Ruby.
+
There is a better way.
- class Rectangle < BinData::Record
- uint16le :len
- string :name, :read_length => :len
- uint32le :width
- uint32le :height
- end
+ class Rectangle < BinData::Record
+ endian :little
+ uint16 :len
+ string :name, :read_length => :len
+ uint32 :width
+ uint32 :height
+ end
- io = File.open(...)
- r = Rectangle.read(io)
- puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
+ io = File.open(...)
+ r = Rectangle.read(io)
+ puts "Rectangle #{r.name} is #{r.width} x #{r.height}"
+{:ruby}
BinData makes it easy to specify the structure of the data you are
manipulating.
Read on for the tutorial, or go straight to the
-download[http://rubyforge.org/frs/?group_id=3252] page.
+[download](http://rubyforge.org/frs/?group_id=3252) page.
-== Syntax
+## License
+BinData is released under the same license as Ruby.
+
+Copyright © 2007 - 2009 [Dion Mendel](mailto:dion@lostrealm.com)
+
+---------------------------------------------------------------------------
+
+# Overview
+
BinData declarations are easy to read. Here's an example.
- class MyFancyFormat < BinData::Record
- stringz :comment
- uint8 :count, :check_value => lambda { (value % 2) == 0 }
- array :some_ints, :type => :int32be, :initial_length => :count
- end
+ class MyFancyFormat < BinData::Record
+ stringz :comment
+ uint8 :num_ints, :check_value => lambda { value.even? }
+ array :some_ints, :type => :int32be, :initial_length => :num_ints
+ end
+{:ruby}
-The structure of the data in this example is
-1. A zero terminated string
-2. An unsigned 8bit integer which must by even
-3. A sequence of unsigned 32bit integers in big endian form, the total
- number of which is determined by the value of the 8bit integer.
+This fancy format describes the following collection of data:
-The BinData declaration matches the english description closely. Just for
-fun, lets look at how we'd implement this using #pack and #unpack. Here's
-the writing code, have a go at the reading code.
+1. A zero terminated string
+2. An unsigned 8bit integer which must be even
+3. A sequence of signed 32bit integers in big endian form, the total
+ number of which is determined by the value of the 8bit integer.
- comment = "this is a comment"
- some_ints = [2, 3, 8, 9, 1, 8]
- File.open(...) do |io|
- io.write([comment, some_ints.size, *some_ints].pack("Z*CN*"))
- end
+The BinData declaration matches the English description closely.
+Compare the above declaration with the equivalent `#unpack` code to read
+such a data record.
+ def read_fancy_format(io)
+ comment, num_ints, rest = io.read.unpack("Z*Ca*")
+ raise ArgumentError, "num_ints must be even" unless num_ints.even?
+ some_ints = rest.unpack("l>#{num_ints}")
+ {:comment => comment, :num_ints => num_ints, :some_ints => some_ints}
+ end
+{:ruby}
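For the writing direction, a plain-Ruby counterpart might look like the sketch below. The method name `fancy_format_to_binary` is hypothetical, and the `l>` directive (signed 32 bit big endian) assumes Ruby 1.9.3 or later.

```ruby
# Hypothetical plain-Ruby writer for the MyFancyFormat layout (no BinData).
# Layout: zero terminated comment, uint8 count, then that many signed
# 32 bit big endian integers.
def fancy_format_to_binary(comment, some_ints)
  # Mirror the :check_value on num_ints: the count must be even.
  raise ArgumentError, "num_ints must be even" unless some_ints.size.even?
  # "C" would silently wrap counts above 255, so reject them explicitly.
  raise ArgumentError, "too many ints for a uint8 count" if some_ints.size > 255
  [comment, some_ints.size, *some_ints].pack("Z*Cl>*")
end
```

The byte layout here lives entirely in the `"Z*Cl>*"` format string, while the BinData declaration names each field.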
-The general format of a BinData declaration is a class containing one or more
-fields.
+The BinData declaration clearly shows the structure of the record. The
+`#unpack` code makes this structure opaque.
- class MyName < BinData::Record
- type field_name, :param1 => "foo", :param2 => bar, ...
- ...
- end
+The general usage of BinData is to declare a structured collection of
+data as a user defined record. This record can be instantiated, read,
+written and manipulated without the user having to be concerned with the
+underlying binary representation of the data.
-*type* is the name of a supplied type (e.g. <tt>uint32be</tt>, +string+)
-or a user defined type. For user defined types, convert the class name
-from CamelCase to lowercase underscore_style.
+---------------------------------------------------------------------------
-*field_name* is the name by which you can access the data. Use either a
-String or a Symbol.
+# Common Operations
-Each field may have *parameters* for how to process the data. The
-parameters are passed as a Hash using Symbols for keys.
+There are operations common to all BinData types, including user defined
+ones. These are summarised here.
-== Handling dependencies between fields
+## Reading and writing
-A common occurance in binary file formats is one field depending upon the
-value of another. e.g. A string preceded by it's length.
+`::read(io)`
-As an example, let's assume a Pascal style string where the byte preceding
-the string contains the string's length.
+: Creates a BinData object and reads its value from the given string
+ or `IO`. The newly created object is returned.
- # reading
- io = File.open(...)
- len = io.getc
- str = io.read(len)
- puts "string is " + str
+ str = BinData::Stringz::read("string1\0string2")
+ str.snapshot #=> "string1"
+ {:ruby}
- # writing
- io = File.open(...)
- str = "this is a string"
- io.putc(str.length)
- io.write(str)
+`#read(io)`
-Here's how we'd implement the same example with BinData.
+: Reads and assigns binary data read from `io`.
- class PascalString < BinData::Record
- uint8 :len, :value => lambda { data.length }
- string :data, :read_length => :len
- end
+ obj = BinData::Uint16be.new
+ obj.read("\022\064")
+ obj.value #=> 4660
+ {:ruby}
- # reading
- io = File.open(...)
- ps = PascalString.new
- ps.read(io)
- puts "string is " + ps.data
+`#write(io)`
- # writing
- io = File.open(...)
- ps = PascalString.new
- ps.data = "this is a string"
- ps.write(io)
+: Writes the binary representation of the object to `io`.
-This syntax needs explaining. Let's simplify by examining reading and
-writing separately.
+ File.open("...", "wb") do |io|
+ obj = BinData::Uint64be.new
+ obj.value = 568290145640170
+ obj.write(io)
+ end
+ {:ruby}
- class PascalStringReader < BinData::Record
- uint8 :len
- string :data, :read_length => :len
- end
+`#to_binary_s`
-This states that when reading the string, the initial length of the string
-(and hence the number of bytes to read) is determined by the value of the
-+len+ field.
+: Returns the binary representation of this object as a string.
-Note that <tt>:read_length => :len</tt> is syntactic sugar for
-<tt>:read_length => lambda { len }</tt>, but more on that later.
+ obj = BinData::Uint16be.new
+ obj.assign(4660)
+ obj.to_binary_s #=> "\022\064"
+ {:ruby}
- class PascalStringWriter < BinData::Record
- uint8 :len, :value => lambda { data.length }
- string :data
- end
+## Manipulating
-This states that the value of +len+ is always equal to the length of +data+.
-+len+ may not be manually modified.
+`#assign(value)`
-Combining these two definitions gives the definition for +PascalString+ as
-previously defined.
+: Assigns the given value to this object. `value` can be of the same
+ format as produced by `#snapshot`, or it can be a compatible data
+ object.
+
+ arr = BinData::Array.new(:type => :uint8)
+ arr.assign([1, 2, 3, 4])
+ arr.snapshot #=> [1, 2, 3, 4]
+ {:ruby}
-Once thing to note with dependencies, is that a field can only depend on one
-before it. You can't have a string which has the characters first and the
-length afterwards.
+`#clear`
-== Predefined Types
+: Resets this object to its initial state.
-These are the predefined types. Custom types can be created by composing
-these types.
+ obj = BinData::Int32be.new(:initial_value => 42)
+ obj.assign(50)
+ obj.clear
+ obj.value #=> 42
+ {:ruby}
-BinData::String:: A sequence of bytes.
-BinData::Stringz:: A zero terminated sequence of bytes.
+`#clear?`
-BinData::Array:: A list of objects of the same type.
-BinData::Choice:: A choice between several objects.
-BinData::Struct:: An ordered collection of named objects.
+: Returns whether this object is in its initial state.
-BinData::Int8:: Signed 8 bit integer.
-BinData::Int16le:: Signed 16 bit integer (little endian).
-BinData::Int16be:: Signed 16 bit integer (big endian).
-BinData::Int32le:: Signed 32 bit integer (little endian).
-BinData::Int32be:: Signed 32 bit integer (big endian).
-BinData::Int64le:: Signed 64 bit integer (little endian).
-BinData::Int64be:: Signed 64 bit integer (big endian).
+ arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
+ arr[3] = 42
+ arr.clear? #=> false
-BinData::Uint8:: Unsigned 8 bit integer.
-BinData::Uint16le:: Unsigned 16 bit integer (little endian).
-BinData::Uint16be:: Unsigned 16 bit integer (big endian).
-BinData::Uint32le:: Unsigned 32 bit integer (little endian).
-BinData::Uint32be:: Unsigned 32 bit integer (big endian).
-BinData::Uint64le:: Unsigned 64 bit integer (little endian).
-BinData::Uint64be:: Unsigned 64 bit integer (big endian).
+ arr[3].clear
+ arr.clear? #=> true
+ {:ruby}
-BinData::Bit1:: 1 bit unsigned integer (big endian).
-BinData::Bit2:: 2 bit unsigned integer (big endian).
-...
-BinData::Bit63:: 63 bit unsigned integer (big endian).
+## Inspecting
-BinData::Bit1le:: 1 bit unsigned integer (little endian).
-BinData::Bit2le:: 2 bit unsigned integer (little endian).
-...
-BinData::Bit63le:: 63 bit unsigned integer (little endian).
+`#num_bytes`
-BinData::FloatLe:: Single precision floating point number (little endian).
-BinData::FloatBe:: Single precision floating point number (big endian).
-BinData::DoubleLe:: Double precision floating point number (little endian).
-BinData::DoubleBe:: Double precision floating point number (big endian).
+: Returns the number of bytes required for the binary representation
+ of this object.
-BinData::Rest:: Consumes the rest of the input stream.
+ arr = BinData::Array.new(:type => :uint16be, :initial_length => 5)
+ arr[0].num_bytes #=> 2
+ arr.num_bytes #=> 10
+ {:ruby}
-== Parameters
+`#snapshot`
- class PascalStringWriter < BinData::Record
- uint8 :len, :value => lambda { data.length }
- string :data
- end
+: Returns the value of this object as primitive Ruby objects
+ (numerics, strings, arrays and hashes). The output of `#snapshot`
+ may be useful for serialization or as a reduced memory usage
+ representation.
-Revisiting the Pascal string writer, we see that a field can take
-parameters. Parameters are passed as a Hash, where the key is a symbol.
-It should be noted that parameters are designed to be lazily evaluated,
-possibly multiple times. This means that any parameter value must not have
-side effects.
+ obj = BinData::Uint8.new
+ obj.assign(3)
+ obj + 3 #=> 6
+ obj.snapshot #=> 3
+ obj.snapshot.class #=> Fixnum
+ {:ruby}
+
+`#offset`
+
+: Returns the offset of this object with respect to the most distant
+ ancestor structure it is contained within. This is most likely to
+ be used with arrays and records.
+
+ class Tuple < BinData::Record
+ int8 :a
+ int8 :b
+ end
+
+ arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
+ arr[2].b.offset #=> 5
+ {:ruby}
+
+`#rel_offset`
+
+: Returns the offset of this object with respect to the parent
+ structure it is contained within. Compare this to `#offset`.
+
+ class Tuple < BinData::Record
+ int8 :a
+ int8 :b
+ end
+
+ arr = BinData::Array.new(:type => :tuple, :initial_length => 3)
+ arr[2].b.rel_offset #=> 1
+ {:ruby}
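When every element has a fixed size, the offsets in the two examples above reduce to simple arithmetic. The sketch below reproduces them in plain Ruby. The constant and method names are hypothetical, and it only handles fixed-size fields.

```ruby
# Byte sizes of the Tuple fields, in declaration order (int8, int8).
TUPLE_FIELD_SIZES = { :a => 1, :b => 1 }
TUPLE_SIZE = TUPLE_FIELD_SIZES.values.inject(0, :+)

# Offset of a field from the start of its Tuple (cf. #rel_offset).
def tuple_rel_offset(field)
  TUPLE_FIELD_SIZES.take_while { |name, _size| name != field }
                   .inject(0) { |sum, (_name, size)| sum + size }
end

# Offset of a field from the start of the whole array (cf. #offset).
def tuple_offset(index, field)
  index * TUPLE_SIZE + tuple_rel_offset(field)
end
```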
+
+`#inspect`
+
+: Returns a human readable representation of this object. This is a
+ shortcut to `#snapshot.inspect`.
+
+---------------------------------------------------------------------------
+
+# Records
+
+The general format of a BinData record declaration is a class containing
+one or more fields.
+
+ class MyName < BinData::Record
+ type field_name, :param1 => "foo", :param2 => bar, ...
+ ...
+ end
+{:ruby}
+
+`type`
+: is the name of a supplied type (e.g. `uint32be`, `string`, `array`)
+ or a user defined type. For user defined types, the class name is
+ converted from `CamelCase` to lowercased `underscore_style`.
+
+`field_name`
+: is the name by which you can access the data. Use either a
+ `String` or a `Symbol`.
+
+Each field may have optional *parameters* for how to process the data.
+The parameters are passed as a `Hash` with `Symbols` for keys.
+Parameters are designed to be lazily evaluated, possibly multiple times.
+This means that any parameter value must not have side effects.
+
Here are some examples of legal values for parameters.
- * :param => 5
- * :param => lambda { 5 + 2 }
- * :param => lambda { foo + 2 }
- * :param => :foo
+* `:param => 5`
+* `:param => lambda { 5 + 2 }`
+* `:param => lambda { foo + 2 }`
+* `:param => :foo`
-The simplest case is when the value is a literal value, such as 5.
+The simplest case is when the value is a literal value, such as `5`.
-If the value is not a literal, it is expected to be a lambda. The lambda
-will be evaluated in the context of the parent, in this case the parent is
-an instance of +PascalStringWriter+.
+If the value is not a literal, it is expected to be a lambda. The
+lambda will be evaluated in the context of the parent, in this case the
+parent is an instance of `MyName`.
If the value is a symbol, it is taken as syntactic sugar for a lambda
containing the value of the symbol.
-e.g <tt>:param => :foo</tt> is <tt>:param => lambda { foo }</tt>
+e.g. `:param => :foo` is `:param => lambda { foo }`.
-== Saving Typing
+## Specifying default endian
-The endianess of numeric types must be explicitly defined so that the code
-produced is independent of architecture. Explicitly specifying the
-endianess of each numeric type can become tedious, so the following
-shortcut is provided.
+The endianness of numeric types must be explicitly defined so that the
+code produced is independent of architecture. However, explicitly
+specifying the endian for each numeric field can result in a bloated
+declaration that can be difficult to read.
- class A < BinData::Record
- endian :little
+ class A < BinData::Record
+ int16be :a
+ int32be :b
+ int16le :c # <-- Note little endian!
+ int32be :d
+ float_be :e
+ array :f, :type => :uint32be
+ end
+{:ruby}
- uint16 :a
- uint32 :b
- double :c
- uint32be :d
- array :e, :type => :int16
- end
+The `endian` keyword can be used to set the default endian. This makes
+the declaration easier to read. Any numeric field that doesn't use the
+default endian can explicitly override it.
-is equivalent to:
+ class A < BinData::Record
+ endian :big
- class A < BinData::Record
- uint16le :a
- uint32le :b
- double_le :c
- uint32be :d
- array :e, :type => :int16le
- end
+ int16 :a
+ int32 :b
+ int16le :c # <-- Note how this little endian now stands out
+ int32 :d
+ float :e
+ array :f, :type => :uint32
+ end
+{:ruby}
-Using the endian keyword improves the readability of the declaration as well
-as reducing the amount of typing necessary. Note that the endian keyword will
-cascade to nested types, as illustrated with the array in the above example.
+The increase in clarity can be seen with the above example. The
+`endian` keyword will cascade to nested types, as illustrated with the
+array in the above example.
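To see why the byte order has to be stated, here is the same 16 bit value packed both ways with Ruby's own `Array#pack`. This is plain Ruby rather than BinData, and it only illustrates the two byte layouts.

```ruby
value = 0x1234

big_endian    = [value].pack("n")  # "n": 16 bit unsigned, big endian (like uint16be)
little_endian = [value].pack("v")  # "v": 16 bit unsigned, little endian (like uint16le)
# A native directive such as "S" would follow the host CPU instead. That
# architecture dependence is what an explicit endian avoids.
```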
-== Creating custom types
+## Optional fields
-Custom types should be created by subclassing BinData::Record or
-BinData::Primitive. Ocassionally it may be useful to subclass
-BinData::BasePrimitive. Subclassing other classes may have unexpected results
-and is unsupported.
+A record may contain optional fields. The optional state of a field is
+decided by the `:onlyif` parameter. If the value of this parameter is
+`false`, then the field is treated as if it did not exist in the record.
+
+ class RecordWithOptionalField < BinData::Record
+ # ...
+ uint8 :comment_flag
+ string :comment, :length => 20, :onlyif => :has_comment?
+
+ def has_comment?
+ comment_flag.nonzero?
+ end
+ end
+{:ruby}
+
+In the above example, the `comment` field is only included in the record
+if the value of the `comment_flag` field is non zero.
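When reading, the effect of `:onlyif` can be approximated in plain Ruby as shown below. This is a hypothetical helper, not BinData code. It skips the fields elided by `# ...` in the declaration and reads only `comment_flag` and the optional 20 byte `comment`.

```ruby
require "stringio"

# Hypothetical reader for the two fields shown in RecordWithOptionalField.
def read_optional_comment(io)
  comment_flag = io.read(1).unpack("C")[0]
  record = { :comment_flag => comment_flag }
  # Like :onlyif => :has_comment?, the comment bytes are consumed only
  # when the flag is non zero. Otherwise the field is skipped entirely.
  record[:comment] = io.read(20) if comment_flag.nonzero?
  record
end
```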
+
+## Handling dependencies between fields
+
+A common occurrence in binary file formats is one field depending upon
+the value of another, e.g. a string preceded by its length.
+
+As an example, let's assume a Pascal style string where the byte
+preceding the string contains the string's length.
+
+ # reading
+ io = File.open(...)
+ len = io.getbyte
+ str = io.read(len)
+ puts "string is " + str
+
+ # writing
+ io = File.open(...)
+ str = "this is a string"
+ io.putc(str.length)
+ io.write(str)
+{:ruby}
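Here is a runnable version of the hand-written approach that uses `StringIO` in place of a file. The method names are hypothetical. The sketch also rejects strings that cannot fit in a one byte length.

```ruby
require "stringio"

# Write a Pascal style string: one length byte, then the string's bytes.
def write_pascal_string(io, str)
  raise ArgumentError, "longer than 255 bytes" if str.bytesize > 255
  io.write([str.bytesize].pack("C"))
  io.write(str)
end

# Read a Pascal style string: the first byte gives the number of bytes to read.
def read_pascal_string(io)
  len = io.read(1).unpack("C")[0]
  io.read(len)
end
```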
+
+Here's how we'd implement the same example with BinData.
+
+ class PascalString < BinData::Record
+ uint8 :len, :value => lambda { data.length }
+ string :data, :read_length => :len
+ end
+
+ # reading
+ io = File.open(...)
+ ps = PascalString.new
+ ps.read(io)
+ puts "string is " + ps.data
+
+ # writing
+ io = File.open(...)
+ ps = PascalString.new
+ ps.data = "this is a string"
+ ps.write(io)
+{:ruby}
+
+This syntax needs explaining. Let's simplify by examining reading and
+writing separately.
+
+ class PascalStringReader < BinData::Record
+ uint8 :len
+ string :data, :read_length => :len
+ end
+{:ruby}
+
+This states that when reading the string, the initial length of the
+string (and hence the number of bytes to read) is determined by the
+value of the `len` field.
+
+Note that `:read_length => :len` is syntactic sugar for
+`:read_length => lambda { len }`, as described previously.
+
+ class PascalStringWriter < BinData::Record
+ uint8 :len, :value => lambda { data.length }
+ string :data
+ end
+{:ruby}
+
+This states that the value of `len` is always equal to the length of
+`data`. `len` may not be manually modified.
+
+Combining these two definitions gives the definition for `PascalString`
+as previously defined.
+
+It is important to note that, when reading, a field can only depend on
+fields that come before it. You can't have a string which has the
+characters first and the length afterwards.
+
+---------------------------------------------------------------------------
+
+# Primitive Types
+
+BinData provides support for the primitive types most commonly needed
+when working with binary data. Namely:
+
+* fixed size strings
+* zero terminated strings
+* byte based integers - signed or unsigned, big or little endian and
+ of any size
+* bit based integers - unsigned big or little endian integers of any
+ size
+* floating point numbers - single or double precision floats in either
+ big or little endian
+
+Primitives may be manipulated individually, but it is more common to work
+with them as part of a record.
+
+Examples of individual usage:
+
+ int16 = BinData::Int16be.new
+ int16.value = 941
+ int16.to_binary_s #=> "\003\255"
+
+ fl = BinData::FloatBe.read("\100\055\370\124") #=> 2.71828174591064
+ fl.num_bytes #=> 4
+
+ fl * int16 #=> 2557.90320057996
+{:ruby}
+
+There are several parameters that are specific to primitives.
+
+`:initial_value`
+
+: This contains the initial value that the primitive will contain
+ after initialization. This is useful for setting default values.
+
+ obj = BinData::String.new(:initial_value => "hello ")
+ obj + "world" #=> "hello world"
+
+ obj.assign("good-bye ")
+ obj + "world" #=> "good-bye world"
+ {:ruby}
+
+`:value`
+
+: The primitive will always contain this value. Reading or assigning
+ will not change the value. This parameter is used to define
+ constants or dependent fields.
+
+ pi = BinData::FloatLe.new(:value => Math::PI)
+ pi.assign(3)
+ puts pi #=> 3.14159265358979
+ {:ruby}
+
+`:check_value`
+
+: When reading, will raise a `ValidityError` if the value read does
+ not match the value of this parameter.
+
+ obj = BinData::String.new(:check_value => lambda { /aaa/ =~ value })
+ obj.read("baaa!") #=> "baaa!"
+ obj.read("bbb") #=> raises ValidityError
+
+ obj = BinData::String.new(:check_value => "foo")
+ obj.read("foo") #=> "foo"
+ obj.read("bar") #=> raises ValidityError
+ {:ruby}
+
+## Numerics
+
+There are three kinds of numeric types that are supported by BinData.
+
+### Byte based integers
+
+These are the common integers that are used in most low level
+programming languages (C, C++, Java etc). These integers can be signed
+or unsigned. The endian must be specified so that the conversion is
+independent of architecture. The bit size of these integers must be a
+multiple of 8. Examples of byte based integers are:
+
+`uint16be`
+: unsigned 16 bit big endian integer
+
+`int8`
+: signed 8 bit integer
+
+`int32le`
+: signed 32 bit little endian integer
+
+`uint40be`
+: unsigned 40 bit big endian integer
+
+The `be` | `le` suffix may be omitted if the `endian` keyword is in use.
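`Array#pack` has no directive for sizes such as 40 bits. To illustrate what a type like `uint40be` encodes, the bytes can be built by hand. This is a sketch, not BinData's implementation, and the helper names are hypothetical.

```ruby
# Encode a non-negative integer as num_bytes big endian bytes.
def uint_be_to_bytes(value, num_bytes)
  raise ArgumentError, "value out of range" unless value >= 0 && value < 256**num_bytes
  # Most significant byte first.
  (0...num_bytes).map { |i| (value >> (8 * (num_bytes - 1 - i))) & 0xff }.pack("C*")
end

# Decode big endian bytes back into an integer.
def bytes_to_uint_be(str)
  str.unpack("C*").inject(0) { |acc, byte| (acc << 8) | byte }
end
```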
+
+### Bit based integers
+
+These unsigned integers are used to define bitfields in records.
+Bitfields are big endian by default but little endian may be specified
+explicitly. Little endian bitfields are rare, but do occur in older
+file formats (e.g. the file allocation table for FAT12 filesystems is
+stored as an array of 12bit little endian integers).
+
+An array of bit based integers will be packed according to their endian.
+
+In a record, adjacent bitfields will be packed according to their
+endian. All other fields are byte aligned.
+
+Examples of bit based integers are:
+
+`bit1`
+: 1 bit big endian integer (may be used as boolean)
+
+`bit4_le`
+: 4 bit little endian integer
+
+`bit32`
+: 32 bit big endian integer
+
+The difference between byte and bit based integers of the same number of
+bits (e.g. `uint8` vs `bit8`) is one of alignment.
+
+This example is packed as 3 bytes
+
+ class A < BinData::Record
+ bit4 :a
+ uint8 :b
+ bit4 :c
+ end
+
+ Data is stored as: AAAA0000 BBBBBBBB CCCC0000
+{:ruby}
+
+Whereas this example is packed into only 2 bytes
+
+ class B < BinData::Record
+ bit4 :a
+ bit8 :b
+ bit4 :c
+ end
+
+ Data is stored as: AAAABBBB BBBBCCCC
+{:ruby}
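Plain bit arithmetic reproduces both layouts and makes the alignment difference concrete. The method names are hypothetical, and the sketch assumes big endian bitfields, as in the declarations above.

```ruby
# Record A: bit4, uint8, bit4. The byte based uint8 forces alignment, so
# each 4 bit field sits in the high nibble of its own byte: 3 bytes.
def pack_record_a(a, b, c)
  [(a & 0xf) << 4, b & 0xff, (c & 0xf) << 4].pack("C3")
end

# Record B: bit4, bit8, bit4. Adjacent bitfields are packed back to back,
# most significant bits first, into one 16 bit big endian value: 2 bytes.
def pack_record_b(a, b, c)
  [((a & 0xf) << 12) | ((b & 0xff) << 4) | (c & 0xf)].pack("n")
end
```

With `a = 0xA`, `b = 0xBB` and `c = 0xC`, the hex digits line up with the `AAAA BBBB CCCC` diagrams above.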
+
+### Floating point numbers
+
+BinData supports 32 and 64 bit floating point numbers, in both big and
+little endian format. These types are:
+
+`float_le`
+: single precision 32 bit little endian float
+
+`float_be`
+: single precision 32 bit big endian float
+
+`double_le`
+: double precision 64 bit little endian float
+
+`double_be`
+: double precision 64 bit big endian float
+
+The `_be` | `_le` suffix may be omitted if the `endian` keyword is in use.
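For comparison, Ruby's `Array#pack` and `String#unpack` expose the same four layouts through single-letter directives. The sketch below decodes the `float_be` bytes from the earlier primitives example in plain Ruby.

```ruby
# Array#pack / String#unpack directives matching each BinData float type:
#   float_le  => "e"    float_be  => "g"
#   double_le => "E"    double_be => "G"
e_bytes = "\x40\x2D\xF8\x54".b       # single precision, big endian
e_value = e_bytes.unpack("g")[0]     # about 2.71828174591064

pi_bytes = [Math::PI].pack("G")      # double precision, big endian: 8 bytes
```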
+
+### Example
+
+Here is an example declaration for an Internet Protocol network packet.
+
+ class IP_PDU < BinData::Record
+ endian :big
+
+ bit4 :version, :value => 4
+ bit4 :header_length
+ uint8 :tos
+ uint16 :total_length
+ uint16 :ident
+ bit3 :flags
+ bit13 :frag_offset
+ uint8 :ttl
+ uint8 :protocol
+ uint16 :checksum
+ uint32 :src_addr
+ uint32 :dest_addr
+ string :options, :read_length => :options_length_in_bytes
+ string :data, :read_length => lambda { total_length - header_length_in_bytes }
+
+ def header_length_in_bytes
+ header_length * 4
+ end
+
+ def options_length_in_bytes
+ header_length_in_bytes - 20
+ end
+ end
+{:ruby}
+
+Three of the fields have parameters.
+* The version field always has the value 4, as per the standard.
+* The options field is read as a raw string, but not processed.
+* The data field contains the payload of the packet. Its length is
+ calculated as the total length of the packet minus the length of
+ the header.
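As a contrast with the declaration, here is roughly what reading the same header looks like with `String#unpack`. The method name is hypothetical and the sketch does no checksum validation. Note that the 3 and 13 bit fields have to be split out of a 16 bit word by hand.

```ruby
# Hypothetical unpack-based reader for the IP_PDU layout above.
def parse_ip_header(packet)
  ver_ihl, tos, total_length, ident, flags_frag, ttl, protocol, checksum,
    src_addr, dest_addr = packet.unpack("CCnnnCCnNN")   # fixed 20 bytes
  header_length = ver_ihl & 0x0f                        # in 32 bit words
  header_length_in_bytes = header_length * 4
  {
    :version       => ver_ihl >> 4,                     # bit4
    :header_length => header_length,                    # bit4
    :tos           => tos,
    :total_length  => total_length,
    :ident         => ident,
    :flags         => flags_frag >> 13,                 # bit3
    :frag_offset   => flags_frag & 0x1fff,              # bit13
    :ttl           => ttl,
    :protocol      => protocol,
    :checksum      => checksum,
    :src_addr      => src_addr,
    :dest_addr     => dest_addr,
    :options       => packet[20, header_length_in_bytes - 20],
    :data          => packet[header_length_in_bytes, total_length - header_length_in_bytes]
  }
end
```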
+
+## Strings
+
+BinData supports two types of strings - fixed size and zero terminated.
+Strings are treated as a sequence of 8bit bytes. This is the same as
+strings in Ruby 1.8. The issue of character encoding is ignored by
+BinData.
+
+### Fixed Sized Strings
+
+Fixed sized strings may have a set length. If an assigned value is
+shorter than this length, it will be padded to this length. If no
+length is set, the length is taken to be the length of the assigned
+value.
+
+There are several parameters that are specific to fixed sized strings.
+
+`:read_length`
+
+: The length to use when reading a value.
+
+ obj = BinData::String.new(:read_length => 5)
+ obj.read("abcdefghij")
+ obj.value #=> "abcde"
+ {:ruby}
+
+`:length`
+
+: The fixed length of the string. If a shorter string is set, it
+ will be padded to this length. Longer strings will be truncated.
+
+ obj = BinData::String.new(:length => 6)
+ obj.read("abcdefghij")
+ obj.value #=> "abcdef"
+
+ obj = BinData::String.new(:length => 6)
+ obj.value = "abcd"
+ obj.value #=> "abcd\000\000"
+
+ obj = BinData::String.new(:length => 6)
+ obj.value = "abcdefghij"
+ obj.value #=> "abcdef"
+ {:ruby}
+
+`:pad_char`
+
+: The character to use when padding a string to a set length. Valid
+ values are `Integers` and `Strings` of length 1.
+ `"\0"` is the default.
+
+ obj = BinData::String.new(:length => 6, :pad_char => 'A')
+ obj.value = "abcd"
+ obj.value #=> "abcdAA"
+ obj.to_binary_s #=> "abcdAA"
+ {:ruby}
+
+`:trim_padding`
+
+: Boolean, default `false`. If set, the value of this string will
+ have all `:pad_char` characters trimmed from the end of the string.
+ The value will not be trimmed when writing.
+
+ obj = BinData::String.new(:length => 6, :trim_padding => true)
+ obj.value = "abcd"
+ obj.value #=> "abcd"
+ obj.to_binary_s #=> "abcd\000\000"
+ {:ruby}
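The padding, truncation and trimming rules described above can be summarised in plain Ruby. These helpers are hypothetical illustrations of the semantics, not BinData's implementation. They only accept a one-character `String` as the pad character, whereas BinData also accepts an `Integer`.

```ruby
# Pad with pad_char or truncate so the result is exactly `length` characters.
def fixed_length_string(value, length, pad_char = "\0")
  value[0, length].ljust(length, pad_char)
end

# Remove every trailing pad_char, as :trim_padding does for the value.
def trim_padding(str, pad_char = "\0")
  end_index = str.length
  end_index -= 1 while end_index > 0 && str[end_index - 1] == pad_char
  str[0, end_index]
end
```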
+
+### Zero Terminated Strings
+
+These strings are modelled on the C style of string - a sequence of
+bytes terminated by a null (`"\0"`) character.
+
+ obj = BinData::Stringz.new
+ obj.read("abcd\000efgh")
+ obj.value #=> "abcd"
+ obj.num_bytes #=> 5
+ obj.to_binary_s #=> "abcd\000"
+{:ruby}
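Outside BinData, the same zero terminated layout can be handled with `gets` (using `"\0"` as the separator) and the `Z*` pack directive. The helper names below are hypothetical.

```ruby
require "stringio"

# Read one zero terminated string. The NUL is consumed but not returned.
def read_stringz(io)
  str = io.gets("\0")          # reads up to and including the first NUL, or to EOF
  str && str.chomp("\0")
end

# Writing: the "Z*" directive appends the terminating NUL.
def stringz_to_binary(str)
  [str].pack("Z*")
end
```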
+
+## User Defined Primitive Types
+
+Most user defined types will be Records, but occasionally we'd like to
+create a custom type of primitive.
+
Let us revisit the Pascal String example.
- class PascalString < BinData::Record
- uint8 :len, :value => lambda { data.length }
- string :data, :read_length => :len
- end
+ class PascalString < BinData::Record
+ uint8 :len, :value => lambda { data.length }
+ string :data, :read_length => :len
+ end
+{:ruby}
-We'd like to make PascalString a custom type that behaves like a
-BinData::BasePrimitive object so we can use :initial_value etc. Here's an
-example usage of what we'd like:
+We'd like to make `PascalString` a user defined type that behaves like a
+`BinData::BasePrimitive` object so we can use `:initial_value` etc.
+Here's an example usage of what we'd like:
- class Favourites < BinData::Record
- pascal_string :language, :initial_value => "ruby"
- pascal_string :os, :initial_value => "unix"
- end
+ class Favourites < BinData::Record
+ pascal_string :language, :initial_value => "ruby"
+ pascal_string :os, :initial_value => "unix"
+ end
- f = Favourites.new
- f.os = "freebsd"
- f.to_binary_s #=> "\004ruby\007freebsd"
+ f = Favourites.new
+ f.os = "freebsd"
+ f.to_binary_s #=> "\004ruby\007freebsd"
+{:ruby}
-We create this type of custom string by inheriting from BinData::Primitive
-and implementing the #get and #set methods.
+We create this type of custom string by inheriting from
+`BinData::Primitive` (instead of `BinData::Record`) and implementing the
+`#get` and `#set` methods.
- class PascalString < BinData::Primitive
- uint8 :len, :value => lambda { data.length }
- string :data, :read_length => :len
+ class PascalString < BinData::Primitive
+ uint8 :len, :value => lambda { data.length }
+ string :data, :read_length => :len
- def get; self.data; end
- def set(v) self.data = v; end
- end
+ def get; self.data; end
+ def set(v) self.data = v; end
+ end
+{:ruby}
-If the type we are creating represents a primitive value then inherit from
-BinData::Primitive, otherwise inherit from BinData::Record.
+### Advanced User Defined Primitive Types
-== License
+Sometimes a user defined primitive type cannot easily be declaratively
+defined. In this case you should inherit from `BinData::BasePrimitive`
+and implement the following three methods:
-BinData is released under the same license as Ruby.
+* `value_to_binary_string(value)`
+* `read_and_return_value(io)`
+* `sensible_default()`
-Copyright (c) 2007 - 2009 Dion Mendel
+Here is an example of a big integer implementation.
+
+ # A custom big integer format. Binary format is:
+ # 1 byte : 0 for positive, non zero for negative
+ # x bytes : Little endian stream of 7 bit bytes representing the
+ # positive form of the integer. The upper bit of each byte
+ # is set when there are more bytes in the stream.
+ class BigInteger < BinData::BasePrimitive
+ def value_to_binary_string(value)
+ negative = (value < 0) ? 1 : 0
+ value = value.abs
+ bytes = [negative]
+ loop do
+ seven_bit_byte = value & 0x7f
+ value >>= 7
+ has_more = value.nonzero? ? 0x80 : 0
+ byte = has_more | seven_bit_byte
+ bytes.push(byte)
+
+ break if has_more.zero?
+ end
+
+ bytes.collect { |b| b.chr }.join
+ end
+
+ def read_and_return_value(io)
+ negative = read_uint8(io).nonzero?
+ value = 0
+ bit_shift = 0
+ loop do
+ byte = read_uint8(io)
+ has_more = byte & 0x80
+ seven_bit_byte = byte & 0x7f
+ value |= seven_bit_byte << bit_shift
+ bit_shift += 7
+
+ break if has_more.zero?
+ end
+
+ negative ? -value : value
+ end
+
+ def sensible_default
+ 0
+ end
+
+ def read_uint8(io)
+ io.readbytes(1).unpack("C").at(0)
+ end
+ end
+{:ruby}
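The encoding itself does not depend on BinData. The standalone sketch below implements the same format with `StringIO` so it can be tried in isolation. The method names are hypothetical.

```ruby
require "stringio"

# Sign byte, then little endian 7 bit groups with 0x80 as the "more" flag.
def encode_big_integer(value)
  bytes = [value < 0 ? 1 : 0]
  value = value.abs
  loop do
    seven_bits = value & 0x7f
    value >>= 7
    more = value.nonzero? ? 0x80 : 0
    bytes << (more | seven_bits)
    break if more.zero?
  end
  bytes.pack("C*")
end

def decode_big_integer(io)
  negative = io.getbyte.nonzero?
  value = 0
  shift = 0
  loop do
    byte = io.getbyte
    value |= (byte & 0x7f) << shift   # lowest 7 bit group comes first
    shift += 7
    break if (byte & 0x80).zero?
  end
  negative ? -value : value
end
```

For example, 300 is `0b10_0101100`, so it encodes as a sign byte of 0, then `0xAC` (the low 7 bits with the "more" flag set), then `0x02`.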
+
+---------------------------------------------------------------------------
+
+# Arrays
+
+A BinData array is a list of data objects of the same type. It behaves
+much the same as the standard Ruby array, supporting most of the common
+methods.
+
+When instantiating an array, the type of object it contains must be
+specified.
+
+ arr = BinData::Array.new(:type => :uint8)
+ arr[3] = 5
+ arr.snapshot #=> [0, 0, 0, 5]
+{:ruby}
+
+Parameters can be passed to this object with a slightly clumsy syntax.
+
+ arr = BinData::Array.new(:type => [:uint8, {:initial_value => :index}])
+ arr[3] = 5
+ arr.snapshot #=> [0, 1, 2, 5]
+{:ruby}
+
+There are two different parameters that specify the length of the array.
+
+`:initial_length`
+
+: Specifies the initial length of a newly instantiated array.
+ The array may grow as elements are inserted.
+
+ obj = BinData::Array.new(:type => :int8, :initial_length => 4)
+ obj.read("\002\003\004\005\006\007")
+ obj.snapshot #=> [2, 3, 4, 5]
+ {:ruby}
+
+`:read_until`
+
+: While reading, elements are read until this condition is true. This
+ is typically used to read an array until a sentinel value is found.
+ The variables `index`, `element` and `array` are made available to
+ any lambda assigned to this parameter. If the value of this
+ parameter is the symbol `:eof`, then the array will read as much
+ data from the stream as possible.
+
+ obj = BinData::Array.new(:type => :int8,
+ :read_until => lambda { index == 1 })
+ obj.read("\002\003\004\005\006\007")
+ obj.snapshot #=> [2, 3]
+
+ obj = BinData::Array.new(:type => :int8,
+ :read_until => lambda { element >= 3.5 })
+ obj.read("\002\003\004\005\006\007")
+ obj.snapshot #=> [2, 3, 4]
+
+ obj = BinData::Array.new(:type => :int8,
+ :read_until => lambda { array[index] + array[index - 1] == 9 })
+ obj.read("\002\003\004\005\006\007")
+ obj.snapshot #=> [2, 3, 4, 5]
+
+ obj = BinData::Array.new(:type => :int8, :read_until => :eof)
+ obj.read("\002\003\004\005\006\007")
+ obj.snapshot #=> [2, 3, 4, 5, 6, 7]
+ {:ruby}
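The behaviour of `:read_until` can be sketched in plain Ruby. The sketch reads one element at a time and stops when the condition holds for the element just read, or at the end of the input. The helper is hypothetical and hard-wired to `int8` elements. Its block receives `index`, `element` and `array` as arguments instead of as implicit variables.

```ruby
require "stringio"

# Read int8 values until the block returns true, or until EOF when no
# block is given (the equivalent of :read_until => :eof).
def read_int8s_until(io, &condition)
  array = []
  loop do
    byte = io.read(1)
    break if byte.nil?                      # end of input
    element = byte.unpack("c")[0]           # signed 8 bit
    array << element
    break if condition && condition.call(array.size - 1, element, array)
  end
  array
end
```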
+
+---------------------------------------------------------------------------
+
+# Choices
+
+A Choice is a collection of data objects of which only one is active at
+any particular time. Method calls will be delegated to the active
+choice. The possible types of objects that a choice contains are
+controlled by the `:choices` parameter, while the `:selection` parameter
+specifies the active choice.
+
+`:choices`
+
+: Either an array or a hash specifying the possible data objects. The
+ format of the array/hash.values is a list of symbols representing
+ the data object type. If a choice is to have params passed to it,
+ then it should be provided as `[type_symbol, hash_params]`. An
+ implementation constraint is that the hash may not contain symbols
+ as keys.
+
+`:selection`
+
+: An index/key into the `:choices` array/hash which specifies the
+ currently active choice.
+
+`:copy_on_change`
+
+: If set to `true`, copy the value of the previous selection to the
+ current selection whenever the selection changes. Default is
+ `false`.
+
+Examples
+
+ type1 = [:string, {:value => "Type1"}]
+ type2 = [:string, {:value => "Type2"}]
+
+ choices = {5 => type1, 17 => type2}
+ obj = BinData::Choice.new(:choices => choices, :selection => 5)
+ obj.value # => "Type1"
+
+ choices = [ type1, type2 ]
+ obj = BinData::Choice.new(:choices => choices, :selection => 1)
+ obj.value # => "Type2"
+
+ choices = [ nil, nil, nil, type1, nil, type2 ]
+ obj = BinData::Choice.new(:choices => choices, :selection => 3)
+ obj.value # => "Type1"
+
+ class MyNumber < BinData::Record
+ int8 :is_big_endian
+ choice :data, :choices => { true => :int32be, false => :int32le },
+ :selection => lambda { is_big_endian != 0 },
+ :copy_on_change => true
+ end
+
+ obj = MyNumber.new
+ obj.is_big_endian = 1
+ obj.data = 5
+ obj.to_binary_s #=> "\001\000\000\000\005"
+
+ obj.is_big_endian = 0
+ obj.to_binary_s #=> "\000\005\000\000\000"
+{:ruby}
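+
+To make the byte layout in the `MyNumber` example concrete, here is a
+plain Ruby sketch that produces the same bytes with `Array#pack`. It is
+not BinData code, and `MyNumberSketch` is a hypothetical class. Because
+it stores a single `data` value for both encodings, changing the
+selection keeps the value, which mimics `:copy_on_change => true`.
+
```ruby
# Hypothetical class, not BinData: hand-written equivalent of MyNumber.
class MyNumberSketch
  # Mirrors the :choices hash. "l>" and "l<" are Array#pack directives
  # for signed 32-bit integers, big endian and little endian.
  CHOICES = { true => "l>", false => "l<" }.freeze

  attr_accessor :is_big_endian, :data

  def initialize
    @is_big_endian = 0
    @data = 0
  end

  # Equivalent of :selection => lambda { is_big_endian != 0 }
  def selection
    is_big_endian != 0
  end

  # One signed byte for is_big_endian, then data in the selected encoding.
  def to_binary_s
    [is_big_endian].pack("c") + [data].pack(CHOICES[selection])
  end
end

obj = MyNumberSketch.new
obj.is_big_endian = 1
obj.data = 5
p obj.to_binary_s == "\001\000\000\000\005"   # prints true

obj.is_big_endian = 0   # selection becomes false; data keeps the value 5
p obj.to_binary_s == "\000\005\000\000\000"   # prints true
```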
+
+---------------------------------------------------------------------------
+
+# Advanced Topics
+
+## Wrappers
+
+Sometimes you wish to create a new type that is simply an existing type
+with some predefined parameters. Examples could be an array with a
+specified type, or an integer with an initial value.
+
+This can be achieved with a wrapper. A wrapper creates a new type based
+on an existing type which has predefined parameters. These parameters
+can of course be overridden at initialisation time.
+
+Here we define an array that contains big endian 16 bit integers. The
+array has a preferred initial length.
+
+ class IntArray < BinData::Wrapper
+ endian :big
+ array :type => :uint16, :initial_length => 5
+ end
+
+ arr = IntArray.new
+ arr.size #=> 5
+{:ruby}
+
+The initial length can be overridden at initialisation time.
+
+ arr = IntArray.new(:initial_length => 8)
+ arr.size #=> 8
+{:ruby}
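+
+Conceptually, a wrapper is an existing type plus a hash of predefined
+parameters, where parameters given at initialisation take precedence.
+The plain Ruby sketch below illustrates only that merging rule with
+`Hash#merge`. `new_int_array` and `INT_ARRAY_PARAMS` are hypothetical
+names, and this is not how `BinData::Wrapper` is implemented.
+
```ruby
# Hypothetical illustration, not BinData code: the predefined parameters
# of the IntArray wrapper. :type is carried along but unused here.
INT_ARRAY_PARAMS = { :type => :uint16, :initial_length => 5 }.freeze

# Builds an array of zeros from the predefined parameters, letting any
# caller-supplied parameter replace the predefined one of the same name.
def new_int_array(overrides = {})
  params = INT_ARRAY_PARAMS.merge(overrides)  # overrides win on key clashes
  Array.new(params[:initial_length], 0)
end

p new_int_array.size                        # prints 5
p new_int_array(:initial_length => 8).size  # prints 8
```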
+
+## Parameterizing User Defined Types
+
+All BinData types have parameters that allow the behaviour of an object
+to be specified at initialization time. User defined types may also
+specify parameters. There are two types of parameters: mandatory and
+default.
+
+### Mandatory Parameters
+
+Mandatory parameters must be specified when creating an instance of the
+type. The `:type` parameter of `Array` is an example of a mandatory
+parameter.
+
+ class IntArray < BinData::Wrapper
+ mandatory_parameter :half_count
+
+ array :type => :uint8, :initial_length => lambda { half_count * 2 }
+ end
+
+ arr = IntArray.new
+ #=> raises ArgumentError: parameter 'half_count' must be specified in IntArray
+
+ arr = IntArray.new(:half_count => lambda { 1 + 2 })
+ arr.snapshot #=> [0, 0, 0, 0, 0, 0]
+{:ruby}
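+
+The plain Ruby sketch below illustrates the two behaviours shown in the
+`IntArray` example: a missing mandatory parameter raises `ArgumentError`,
+and a parameter supplied as a lambda is called to obtain its value.
+`check_mandatory`, `resolve_param` and `new_half_count_array` are
+hypothetical helpers, not BinData methods; the error message is copied
+from the example above.
+
```ruby
# Hypothetical helper: raise if any mandatory parameter is missing.
def check_mandatory(params, names, type_name)
  names.each do |name|
    unless params.key?(name)
      raise ArgumentError, "parameter '#{name}' must be specified in #{type_name}"
    end
  end
end

# Hypothetical helper: lambdas are called, plain values are used as-is.
def resolve_param(value)
  value.respond_to?(:call) ? value.call : value
end

# Hypothetical stand-in for IntArray: half_count * 2 zero elements.
def new_half_count_array(params = {})
  check_mandatory(params, [:half_count], "IntArray")
  Array.new(resolve_param(params[:half_count]) * 2, 0)
end

begin
  new_half_count_array
rescue ArgumentError => e
  puts e.message  # parameter 'half_count' must be specified in IntArray
end

p new_half_count_array(:half_count => lambda { 1 + 2 })  # prints [0, 0, 0, 0, 0, 0]
```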
+
+### Default Parameters
+
+Default parameters are optional. These parameters have a default value
+that may be overridden when an instance of the type is created.
+
+ class Phrase < BinData::Primitive
+ default_parameter :number => "three"
+ default_parameter :adjective => "blind"
+ default_parameter :noun => "mice"
+
+ stringz :a, :initial_value => :number
+ stringz :b, :initial_value => :adjective
+ stringz :c, :initial_value => :noun
+
+ def get; "#{a} #{b} #{c}"; end
+ def set(v)
+ if /(.*) (.*) (.*)/ =~ v
+ self.a, self.b, self.c = $1, $2, $3
+ end
+ end
+ end
+
+ obj = Phrase.new(:number => "two", :adjective => "deaf")
+ obj.to_s #=> "two deaf mice"
+{:ruby}
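+
+Here is a plain Ruby sketch of the `Phrase` example, assuming `stringz`
+writes each string followed by a single zero byte. Default values fill
+in any parameter the caller omits, and `get`/`set` convert between the
+three fields and one phrase. `PhraseSketch` and its `to_binary_s` are
+hypothetical and only approximate what BinData does.
+
```ruby
# Hypothetical class, not BinData: a hand-written analogue of Phrase.
class PhraseSketch
  DEFAULTS = { :number => "three", :adjective => "blind", :noun => "mice" }.freeze

  # Caller-supplied parameters override the defaults of the same name.
  def initialize(params = {})
    opts = DEFAULTS.merge(params)
    @a, @b, @c = opts[:number], opts[:adjective], opts[:noun]
  end

  def get
    "#{@a} #{@b} #{@c}"
  end

  def set(v)
    if /(.*) (.*) (.*)/ =~ v
      @a, @b, @c = $1, $2, $3
    end
  end

  # Assumed stringz layout: each field followed by one NUL byte.
  def to_binary_s
    [@a, @b, @c].map { |s| s + "\0" }.join
  end
end

obj = PhraseSketch.new(:number => "two", :adjective => "deaf")
p obj.get                             # prints "two deaf mice"
p obj.to_binary_s == "two\0deaf\0mice\0"  # prints true
obj.set("four white cats")
p obj.get                             # prints "four white cats"
```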
+
+## Debugging
+
+BinData includes several features to make it easier to debug
+declarations.
+
+### Tracing
+
+BinData has the ability to trace the results of reading a data
+structure.
+
+ class A < BinData::Record
+ int8 :a
+ bit4 :b
+ bit2 :c
+ array :d, :initial_length => 6, :type => :bit1
+ end
+
+ BinData::trace_reading do
+ A.read("\373\225\220")
+ end
+{:ruby}
+
+Results in the following being written to `STDERR`.
+
+ obj.a => -5
+ obj.b => 9
+ obj.c => 1
+ obj.d[0] => 0
+ obj.d[1] => 1
+ obj.d[2] => 1
+ obj.d[3] => 0
+ obj.d[4] => 0
+ obj.d[5] => 1
+{:ruby}
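+
+The traced values can be checked by decoding the same three bytes by
+hand in plain Ruby. This sketch assumes, consistent with the trace
+above, that bit fields are read most significant bit first and that
+the `bit1` array continues straight on from the `bit2` field.
+
```ruby
# "c" unpacks one signed 8-bit integer; "B*" turns the remaining bytes
# into a string of '0'/'1' characters, most significant bit first.
a, bits = "\373\225\220".b.unpack("cB*")   # bits is "1001010110010000"

b = bits[0, 4].to_i(2)             # bit4 field
c = bits[4, 2].to_i(2)             # bit2 field
d = bits[6, 6].chars.map(&:to_i)   # six bit1 array elements

puts "obj.a => #{a}"
puts "obj.b => #{b}"
puts "obj.c => #{c}"
d.each_with_index { |v, i| puts "obj.d[#{i}] => #{v}" }
```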
+
+### Rest
+
+The `rest` keyword will consume the input stream from the current position
+to the end of the stream.
+
+ class A < BinData::Record
+ string :a, :read_length => 5
+ rest :rest
+ end
+
+ obj = A.read("abcdefghij")
+ obj.a #=> "abcde"
+ obj.rest #=> "fghij"
+{:ruby}
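+
+In plain Ruby terms, `rest` behaves much like calling `IO#read` with no
+length argument once the earlier fields have been read. The sketch
+below uses `StringIO` as a stand-in for the input stream; it is an
+analogy, not BinData's implementation.
+
```ruby
require 'stringio'

io = StringIO.new("abcdefghij")

a = io.read(5)  # fixed-length field, like :read_length => 5
rest = io.read  # no length argument: read up to the end of the stream

p a        # prints "abcde"
p rest     # prints "fghij"
p io.eof?  # prints true
```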
+
+### Hidden fields
+
+The typical way to view the contents of a BinData record is to call
+`#snapshot` or `#inspect`. This gives all fields and their values. The
+`hide` keyword can be used to prevent certain fields from appearing in
+this output. This removes clutter and allows the developer to focus on
+what they are currently interested in.
+
+ class Testing < BinData::Record
+ hide :a, :b
+ string :a, :read_length => 10
+ string :b, :read_length => 10
+ string :c, :read_length => 10
+ end
+
+ obj = Testing.read(("a" * 10) + ("b" * 10) + ("c" * 10))
+ obj.snapshot #=> {"c"=>"cccccccccc"}
+ obj.to_binary_s #=> "aaaaaaaaaabbbbbbbbbbcccccccccc"
+{:ruby}
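+
+The idea behind `hide` can be sketched in plain Ruby: every field is
+still read and written, but hidden names are filtered out when the
+snapshot is built. The `fields` hash and `hidden` list are hypothetical
+stand-ins for the `Testing` record, not BinData internals.
+
```ruby
# Hypothetical field table for the Testing record, in declaration order,
# plus the names passed to `hide`.
fields = { "a" => "a" * 10, "b" => "b" * 10, "c" => "c" * 10 }
hidden = ["a", "b"]

# Hidden fields are left out of the snapshot...
snapshot = fields.reject { |name, _value| hidden.include?(name) }

# ...but every field, hidden or not, is still serialised in order.
binary = fields.values.join

p snapshot  # only the "c" field remains
p binary    # prints "aaaaaaaaaabbbbbbbbbbcccccccccc"
```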
+
+---------------------------------------------------------------------------
+
+# Alternatives
+
+There are several alternatives to BinData. Below is a comparison
+between BinData and its alternatives.
+
+The short form is that BinData is the best choice for most cases. If
+decoding / encoding speed is very important and the binary formats are
+simple then BitStruct may be a good choice. (Though if speed is
+important, perhaps you should investigate a language other than Ruby.)
+
+### [BitStruct](http://rubyforge.org/projects/bit-struct)
+
+BitStruct is the most complete of all the alternatives. It is
+declarative and supports all the same primitive types as BinData. In
+addition it includes a self-documenting feature to make it easy to write
+reports.
+
+The major limitation of BitStruct is that it does not support
+variable-length fields or dependent fields. The simple PascalString example
+used previously is not possible with BitStruct. This limitation is due
+to the design choice to favour speed over flexibility.
+
+Most non-trivial file formats rely on dependent and variable-length
+fields. It is difficult to use BitStruct with these formats as code
+must be written to explicitly handle the dependencies.
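+
+For illustration, a dependent, variable-length field in plain Ruby
+looks like the sketch below: a Pascal-style string whose one-byte
+length prefix determines how many bytes to read next. The one-byte
+prefix is an assumption made for this sketch, and `read_pascal_string`
+is a hypothetical helper. The size of the text is only known after the
+prefix has been read, which is what a fixed layout cannot express.
+
```ruby
require 'stringio'

# Hypothetical helper: read a one-byte length, then that many bytes.
def read_pascal_string(io)
  len_byte = io.read(1) or raise EOFError, "missing length byte"
  io.read(len_byte.unpack("C").first)
end

io = StringIO.new("\005helloworld")
p read_pascal_string(io)  # prints "hello"
p io.read                 # prints "world" (bytes after the dependent field)
```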
+
+BitStruct does not currently support little endian bit fields, or
+bitfields that span more than 2 bytes. BitStruct is actively maintained
+so these limitations may be removed in a future release.
+
+If speed is important and you are only dealing with simple binary data
+types, then BitStruct is a good choice. For non-trivial data types,
+BinData is the better choice.
+
+### [BinaryParse](http://rubyforge.org/projects/binaryparse)
+
+BinaryParse is a declarative style packer / unpacker. It provides the
+same primitives as Ruby's `#pack`, with the addition of date and time.
+Like BitStruct, it doesn't provide dependent or variable length fields.
+
+### [BinStruct](http://rubyforge.org/projects/metafuzz)
+
+BinStruct is an imperative approach to unpacking binary data. It does
+provide some declarative style syntax sugar. It provides support for
+the most common primitive types, as well as arbitrary length bitfields.
+
+Its main focus is as a binary fuzzer, rather than as a generic decoding
+/ encoding library.
+
+### [Packable](http://github.com/marcandre/packable/tree/master)
+
+Packable makes it much nicer to use Ruby's `#pack` and `#unpack`
+methods. Instead of having to remember that, for example, `"n"` is the
+code to pack a 16 bit big endian integer, Packable provides many
+convenient shortcuts. In the case of `"n"`, `{:bytes => 2, :endian => :big}`
+may be used instead.
+
+Using Packable improves the readability of `#pack` and `#unpack`
+methods, but explicit calls to `#pack` and `#unpack` aren't as
+readable as a declarative approach.
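+
+For comparison, this is how the `"n"` directive mentioned above is used
+with Ruby's built-in `Array#pack` and `String#unpack`, without Packable.
+The `"S>"` directive (16 bit unsigned with an explicit big endian
+modifier) produces the same bytes. The value `0x0102` is an arbitrary
+example.
+
```ruby
# "n": 16-bit unsigned integer, big endian (network byte order).
packed = [0x0102].pack("n")
p packed.bytes                    # prints [1, 2]

# "S>": 16-bit unsigned integer with an explicit big endian modifier.
p packed == [0x0102].pack("S>")   # prints true

# Unpacking with the same directive recovers the number.
p packed.unpack("n").first        # prints 258
```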
+
+### [Bitpack](http://rubyforge.org/projects/bitpack)
+
+Bitpack provides methods to extract big endian integers of arbitrary bit
+length from an octet stream.
+
+The extraction code is written in `C`, so if speed is important and bit
+manipulation is all the functionality you require then this may be an
+alternative.
+
+---------------------------------------------------------------------------