# CASTY

Casty is a fork of CAST, a C parser and abstract syntax tree for Ruby.

## Example

    require 'casty'

    source = File.read('file.c')
    ast = C.parse(source)
    ast.entities.each do |declaration|
      declaration.declarators.each do |declarator|
        puts "#{declarator.name}: #{declarator.type}"
      end
    end

Or in irb:

    irb> ast = C.parse('int main(void) { return 0; }')
     => TranslationUnit
        entities:
            - FunctionDef
                type: Function
                    type: Int
                    params: []
                name: "main"
                def: Block
                    stmts:
                        - Return
                            expr: IntLiteral
                                val: 0

    irb> puts ast
    int main(void) {
        return 0;
    }
     => nil

## Nodes

`C.parse` returns a tree of `Node` objects. Here's the class hierarchy:
The highlighted ones are abstract. The last 2 (`NodeList`s) represent lists of `Node`s. They quack like standard Ruby `Array`s. `NodeChain` is a doubly linked list; `NodeArray` is an array.

### Node Methods

* `parent`: return the parent in the tree (a `Node` or nil).
* `pos`, `pos=`: the position in the source file (a `Node::Pos`).
* `to_s`: return the code for the tree (a `String`).
* `inspect`: return a pretty string for inspection, makes irb fun.
* `match?(str)`, `=~(str)`: return true iff str parses as a `Node` equal to this one.
* `detach`: remove this node from the tree (parent becomes nil) and return it.
* `detached?`, `attached?`: return true if parent is nil or non-nil respectively.
* `replace_with(node)`: replace this node with node in the tree.
* `swap_with(node)`: exchange this node with node in their trees.
* `insert_prev(*nodes)`, `insert_next(*nodes)`: insert nodes before/after this node in the parent list. Parent must be a `NodeList`! Useful for adding statements before a node in a block, for example.
* `Foo?`: (where `Foo` is a module name) return `self.is_a?(Foo)`. This is a convenience for a common need. Example:

        # print all global variables
        ast.entities.each do |node|
          node.Declaration? or next
          node.declarators.each do |decl|
            unless decl.type.Function?
              puts "#{decl.name}: #{decl.type}"
            end
          end
        end

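For readers curious how a `Foo?`-style predicate can be provided for every module at once, here is a minimal sketch using `method_missing`. It is a stand-in written for illustration, not CAST's actual implementation, and every name in it (`Shape`, `Base`, `Circle`, `Label`, `module_for`) is hypothetical.

```ruby
# Hypothetical stand-ins; none of these names come from CAST.
module Shape; end

class Base
  # Answer `Foo?` with is_a?(Foo) when Foo names a top-level module or class.
  def method_missing(name, *args)
    mod = module_for(name)
    mod ? is_a?(mod) : super
  end

  def respond_to_missing?(name, include_private = false)
    !module_for(name).nil? || super
  end

  private

  # Map :Foo? to the module Foo, or nil if the name does not fit the pattern.
  def module_for(name)
    match = /\A([A-Z]\w*)\?\z/.match(name.to_s)
    return nil unless match && Object.const_defined?(match[1])
    const = Object.const_get(match[1])
    const.is_a?(Module) ? const : nil
  end
end

class Circle < Base
  include Shape
end

class Label < Base; end

circle_is_shape = Circle.new.Shape?   # Circle includes Shape
label_is_shape  = Label.new.Shape?    # Label does not
p circle_is_shape
p label_is_shape
```

The predicate reads well in guard clauses such as `node.Declaration? or next`, which is the usage shown in the example above.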
The `=~` method lets you do:

    if declarator.type =~ 'const int *'
      puts "Ooh, a const int pointer!"
    end

This is not the same as `declarator.type.to_s == 'const int *'`; that'd require you to guess how `to_s` formats its strings (most notably, the whitespace).

### Fields and Children

The big table down below lists the *fields* of each `Node`. A field is an attribute which:

* is used in equality checks (`==` and `eql?`).
* is copied recursively by `dup` and `clone`.

Fields listed as *children* form the tree structure. They only have a `Node` or `nil` value, and are yielded/returned/affected by the traversal methods:

* `next`, `prev`: return the next/prev sibling.
* `list_next`, `list_prev`: like `next`/`prev`, but also requires the parent to be `NodeList`. I'll be honest; I don't remember why I added these methods. They may well suddenly disappear.
* `each`, `reverse_each`: Yield all (non-nil) children. `Node` includes `Enumerable`, so, you know.
* `depth_first`, `reverse_depth_first`: Walk the tree in that order, yielding two args (event, node) at each node. event is `:down` on the way down, `:up` on the way up. If the block throws `:prune`, it won't descend any further.
* `preorder`, `reverse_preorder`, `postorder`, `reverse_postorder`: Walk the tree depth first, yielding nodes in the given order. For the preorders, if the block throws `:prune`, it won't descend any further.
* `node_after(child)`, `node_before(child)`: return the node after/before child (same as `child.next`/`child.prev`).
* `remove_node(child)`: remove child from this node (same as `child.detach`).
* `replace_node(child, new_child)`: replace child with yeah you guessed it (same as `child.replace_with(new_child)`).

Note: don't modify the tree during traversal!

Other notes about the table:

* Field names that end in '?' are always true-or-false.
* If no default is listed:
  * it is false if the field name ends in a '?'
  * it is a `NodeArray` if it is a `NodeList`.
  * it is `nil` otherwise.

| Class | Field | Type / values | Default | Comments |
|---|---|---|---|---|
| TranslationUnit | entities * | NodeList | NodeChain[] | The root of a parsed file. |
| Declaration | storage | :typedef, :extern, :static, :auto, :register | | Also: `#typedef?`, `#extern?`, `#static?`, `#auto?`, `#register?`: each is true iff `storage` is the corresponding symbol (e.g. `#typedef?` iff `storage == :typedef`). |
| | type * | DirectType | | |
| | declarators * | NodeList | NodeArray[] | |
| | inline? | true, false | | |
| Declarator | indirect_type * | IndirectType | | Also: `#type`. See the note on declarators below the table. |
| | name | String | | |
| | init * | Expression | | |
| | num_bits * | Integer | | |
| FunctionDef | storage | :extern, :static | | Also: `#extern?`, `#static?`: true iff `storage` is the corresponding symbol. `#prototype?`: same as `!no_prototype?`. `#prototype=(val)`: same as `no_prototype = !val`. See the note on `no_prototype?` below the table. |
| | inline? | true, false | | |
| | type * | Type | | |
| | name | String | | |
| | def * | Block | Block.new | |
| | no_prototype? | true, false | | |
| Parameter | register? | true, false | | Used in Functions. |
| | type * | Type | | |
| | name | String | | |
| Enumerator | name | String | | Used in Enums. |
| | val * | Expression | | |
| MemberInit | member * | NodeList of (Member or Expression) | | Used in CompoundLiterals. |
| | init * | Expression | | |
| Member | name | String | | Used in MemberInits. |
| Block | labels * | NodeList of Label | NodeArray[] | |
| | stmts * | NodeList of (Statement or Declaration or Comment) | NodeArray[] | |
| If | labels * | NodeList of Label | NodeArray[] | |
| | cond * | Expression | | |
| | then * | Statement | | |
| | else * | Statement | | |
| Switch | labels * | NodeList of Label | NodeArray[] | |
| | cond * | Expression | | |
| | stmt * | Statement | | |
| While | labels * | NodeList of Label | NodeArray[] | `do?` means it's a do-while loop. |
| | do? | true, false | | |
| | cond * | Expression | | |
| | stmt * | Statement | | |
| For | labels * | NodeList of Label | NodeArray[] | |
| | init * | Expression or Declaration | | |
| | cond * | Expression | | |
| | iter * | Expression | | |
| | stmt * | Statement | | |
| Goto | labels * | NodeList of Label | NodeArray[] | |
| | target | String | | |
| Continue | labels * | NodeList of Label | NodeArray[] | |
| Break | labels * | NodeList of Label | NodeArray[] | |
| Return | labels * | NodeList of Label | NodeArray[] | |
| | expr * | Expression | | |
| ExpressionStatement | labels * | NodeList of Label | NodeArray[] | |
| | expr * | Expression | | |
| PlainLabel | name | String | | |
| Default | | | | |
| Case | expr * | Expression | | |
| Comma | exprs * | NodeList of Expression | | |
| Conditional | cond * | Expression | | |
| | then * | Expression | | |
| | else * | Expression | | |
| Variable | name | String | | |
| Index | expr * | Expression | | |
| | index * | Expression | | |
| Call | expr * | Expression | | |
| | args * | NodeList of (Expression or Type) | | |
| Dot | expr * | Expression | | |
| | member * | String | | |
| Arrow | expr * | Expression | | |
| | member * | String | | |
| PostInc | expr * | Expression | | |
| PostDec | expr * | Expression | | |
| Cast | type * | Type | | |
| | expr * | Expression | | |
| Address | expr * | Expression | | |
| Dereference | expr * | Expression | | |
| Sizeof | expr * | Type or Expression | | |
| Positive | expr * | Expression | | |
| Negative | expr * | Expression | | |
| PreInc | expr * | Expression | | |
| PreDec | expr * | Expression | | |
| BitNot | expr * | Expression | | |
| Not | expr * | Expression | | |
| Add | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Subtract | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Multiply | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Divide | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Mod | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Equal | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| NotEqual | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Less | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| More | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| LessOrEqual | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| MoreOrEqual | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| BitAnd | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| BitOr | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| BitXor | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| ShiftLeft | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| ShiftRight | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| And | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Or | expr1 * | Expression | | |
| | expr2 * | Expression | | |
| Assign | lval * | Expression | | |
| | rval * | Expression | | |
| MultiplyAssign | lval * | Expression | | |
| | rval * | Expression | | |
| DivideAssign | lval * | Expression | | |
| | rval * | Expression | | |
| ModAssign | lval * | Expression | | |
| | rval * | Expression | | |
| AddAssign | lval * | Expression | | |
| | rval * | Expression | | |
| SubtractAssign | lval * | Expression | | |
| | rval * | Expression | | |
| ShiftLeftAssign | lval * | Expression | | |
| | rval * | Expression | | |
| ShiftRightAssign | lval * | Expression | | |
| | rval * | Expression | | |
| BitAndAssign | lval * | Expression | | |
| | rval * | Expression | | |
| BitXorAssign | lval * | Expression | | |
| | rval * | Expression | | |
| BitOrAssign | lval * | Expression | | |
| | rval * | Expression | | |
| StringLiteral | val | String | | The String in `val` is the literal string entered. `"\n"` isn't converted to a newline, for instance. |
| CharLiteral | val | String | | The String in `val` is the literal string entered. `'\n'` isn't converted to a newline, for instance. |
| CompoundLiteral | type * | Type | | See the example below the table. |
| | member_inits * | NodeList of MemberInit | NodeArray[] | |
| IntLiteral | val | Integer | | Also: `#dec?`, `#hex?`, `#oct?`: each is true iff `format` is the corresponding symbol. |
| | format | :dec, :hex, :oct | :dec | |
| FloatLiteral | val | Float | | |
| Pointer | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | type * | Type | | |
| Array | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | type * | Type | | |
| | length * | Expression | | |
| Function | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | type * | Type | | |
| | params * | NodeList of Parameter | NodeArray[] | |
| | var_args? | true, false | | |
| Struct | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | name | String | | |
| | members * | NodeList of Member | NodeArray[] | |
| Union | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | name | String | | |
| | members * | NodeList of Member | NodeArray[] | |
| Enum | const? | true, false | | |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | name | String | | |
| | members * | NodeList of Enumerator | | |
| CustomType | const? | true, false | | For typedef'd names. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | name | String | | |
| Void | const? | true, false | | `const` is for things like `const void *`. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| Int | const? | true, false | | Also: `#short?`, `#plain?`, `#long?`, `#long_long?`: true iff `longness` is -1, 0, 1, 2 respectively. `#signed?`: same as `!unsigned?`. `#signed=(val)`: same as `unsigned = !val`. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | longness | -1, 0, 1, 2 | 0 | |
| | unsigned? | true, false | | |
| Float | const? | true, false | | Also: `#plain?`, `#double?`, `#long_double?`: true iff `longness` is 0, 1, 2 respectively. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | longness | 0, 1, 2 | 0 | |
| Char | const? | true, false | | Also: `#signed?`: true iff `signed == true`. `#unsigned?`: true iff `signed == false`. `#plain?`: true iff `signed == nil`. Yes, C99 says that `char`, `signed char`, and `unsigned char` are 3 distinct types (unlike with `int`; go figure). Like Martian chalk and Venusian cheese: completely different, but you can fit 'em each in one byte. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | signed | true, false, nil | | |
| Bool | const? | true, false | | This is the rarely seen `_Bool` type. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| Complex | const? | true, false | | This is the rarely seen `_Complex` type. Also: `#plain?`, `#double?`, `#long_double?`: true iff `longness` is 0, 1, 2 respectively. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | longness | 0, 1, 2 | 0 | |
| Imaginary | const? | true, false | | This is the rarely seen `_Imaginary` type. Also: `#plain?`, `#double?`, `#long_double?`: true iff `longness` is 0, 1, 2 respectively. |
| | restrict? | true, false | | |
| | volatile? | true, false | | |
| | longness | 0, 1, 2 | 0 | |
| BlockExpression | block * | Block | Block.new | Only if the `block_expressions` extension is enabled. See "Extensions" section below. |

What's a "declarator"? Consider `int i, *ip;`. This is a `Declaration` with two `Declarator`s:

    Declaration
        type: Int
        declarators:
            - Declarator
                name: "i"
            - Declarator
                indirect_type: Pointer
                name: "ip"

The `indirect_type` of the `ip` `Declarator` is a `Pointer` to `nil`. To get the complete type of the variable, use `#type`, which returns the complete type. This is a clone; modifying it won't modify the tree. So calling `#type` on the `ip` `Declarator` gives:

    Pointer
        type: Int

For `FunctionDef`, `no_prototype?` means that no prototype was given. That means parameter types weren't given in the parens, but in the "old-style" declaration list. Example:

No prototype:

    int main(argc, argv)
        int argc;
        char **argv;
    {
        return 0;
    }

Prototype:

    int main(int argc, char **argv) {
        return 0;
    }

Everyone tells you to use prototypes. That's because no type checking is done when calling a function declared without a prototype.

Here's an example of a `CompoundLiteral`:

    (struct S){1, .x = 2, .y [3] .z = 4}

parses as:

    CompoundLiteral
        type: Struct
            name: "S"
        member_inits:
            - MemberInit
                init: IntLiteral
                    val: 1
            - MemberInit
                member:
                    - Member
                        name: "x"
                init: IntLiteral
                    val: 2
            - MemberInit
                member:
                    - Member
                        name: "y"
                    - IntLiteral
                        val: 3
                    - Member
                        name: "z"
                init: IntLiteral
                    val: 4

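
The `:prune` convention used by the preorder traversal methods described earlier in this section can be tried out without CAST installed. The sketch below is a stand-in, not CAST's implementation: `ToyNode` and its `preorder` method are hypothetical names that only mimic the documented behaviour, where throwing `:prune` from the block stops the walk from descending below the current node.

```ruby
# Hypothetical stand-in for a tree node; not part of CAST.
class ToyNode
  attr_reader :name, :children

  def initialize(name, *children)
    @name = name
    @children = children
  end

  # Yield this node, then its descendants, depth first.
  # If the block throws :prune, the children of the current node are skipped.
  def preorder(&block)
    pruned = true
    catch(:prune) do
      block.call(self)
      pruned = false            # only reached if the block did not throw
    end
    children.each { |child| child.preorder(&block) } unless pruned
    self
  end
end

tree = ToyNode.new('unit',
                   ToyNode.new('f', ToyNode.new('f_body')),
                   ToyNode.new('g', ToyNode.new('g_body')))

visited = []
tree.preorder do |node|
  visited << node.name
  throw :prune if node.name == 'f'   # do not descend into f
end

p visited   # => ["unit", "f", "g", "g_body"]
```

With a real tree the block would look the same: inspect the node, and `throw :prune` when its subtree is of no interest, for example to skip function bodies while collecting top-level declarations.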

## Parser

`C.parse` will use the default parser (`C.default_parser`), but you can also manage your own parser(s) if you need finer control over state. Parser state consists of:

* `type_names`: a Set of Strings. As a parser eats `typedef`s, this grows.
* `pos`: the `Node::Pos` this parser will start parsing at.

A `Node::Pos` has three read-write attributes: `filename`, `line_num`, `col_num`. Default is nil, 1, 0.

Note that the type names the parser has seen affect the parser! For example, consider:

    a * b;

* If only `a` is a type, this is a declaration.
* If neither `a` nor `b` is a type, this is a multiplication statement.
* Otherwise, it's a syntax error.

You may append type names implicitly, by parsing `typedef`s, or explicitly like this:

    parser.type_names << 'Thing' << 'OtherThing'

### Parsing Snippets

`C.parse` will parse the toplevel C construct, a `C::TranslationUnit`, but you can also parse other snippets of C:

    C::Statement.parse('while (not_looking) { paint_car(); }')
    C::Type.parse('void *(*)(int *(*)[][2], ...)')

This works for both concrete and abstract `Node` subclasses. A `Parser` may be given as an optional second argument.

### Extensions to C99

* `Type`s are allowed as function arguments. This is needed to parse C99 macros like `va_arg()`.
* `Block`s in parentheses are allowed as expressions ([a gcc extension][gcc-block-expressions]). You need to call `parser.enable_block_expressions` first. They appear as `BlockExpression` nodes.

[gcc-block-expressions]: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Statement-Exprs.html#Statement-Exprs

## Parsing Full Programs

This can be tricky for a number of reasons. Here are the issues you'll likely encounter.

### Preprocessing

Directives that start with `#` are not handled by the `Parser`, as they're external to the C grammar. CAST ships with a `Preprocessor`, which wraps the preprocessor used to build your Ruby interpreter.

    cpp = C::Preprocessor.new
    cpp.include_path << '/usr/include' << '/usr/local/include'
    cpp.macros['DEBUG'] = '1'
    cpp.macros['max(a, b)'] = '((a) > (b) ? (a) : (b))'
    cpp.preprocess(code)

Note, however, that preprocessors tend to leave vendor-specific extensions in their output. GNU `cpp`, for example, leaves "linemarkers" (lines that begin with `#`) in the output, which you'll need to filter out manually before feeding it to a `Parser`.

### Built-in types

Preprocessed code may also rely on types the compiler provides without any declaration. Mac OS 10.5's system `cpp`, for instance, assumes the compiler will recognize types such as `__darwin_va_list`. Such names can be added to the parser's `type_names` before parsing.

### Syntactic Extensions

Some code may take advantage of compiler-specific extensions to the syntax. For example, `gcc` supports inline assembly via directives like:

    asm("movl %1, %%eax;"
        "movl %%eax, %0;"
        :"=r"(y)
        :"r"(x)
        :"%eax");

Such code is fairly rare, so there is no direct support in CAST for this. You'll need to manually massage such constructs out of the `Parser` input. Or send me patches. Delicious patches.

## Contributing

* [Bug reports](http://github.com/Vasfed/cast/issues)
* [Source](http://github.com/Vasfed/cast)
* Patches: Fork on GitHub, send pull request.
  * Include tests where practical.
  * Leave the version alone, or bump it in a separate commit.

## Copyright

Copyright (c) George Ogata. See LICENSE for details.