Abstract
================================================================
A common programming task is extracting data from XML and HTML documents. I introduce parsley, an embedded language (à la SQL or regular expressions) that improves the usability and/or speed of current extraction techniques.

Introduction
================================================================

Today, developers use one of a couple of toolsets for data extraction. Many use libraries such as Hpricot for Ruby and Beautiful Soup for Python, which extract XML subtrees via XPath or CSS selectors. These subtrees are then further refined in the scripting language, often with the help of regular expressions.
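To make this select-then-refine workflow concrete, here is a minimal sketch. It uses Python's standard-library `xml.etree.ElementTree` as a stand-in for Hpricot or Beautiful Soup (neither is used here). ElementTree understands only a subset of XPath, has no CSS selectors, and requires well-formed markup, so real-world HTML would usually need a more forgiving parser. The sample document, the `extract_products` function, and the price pattern are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input; ElementTree requires well-formed markup.
DOC = """<html><body>
<div class="product"><h2>Widget</h2><span class="price">Price: $19.99</span></div>
<div class="product"><h2>Gadget</h2><span class="price">Price: $5.00</span></div>
</body></html>"""

# Regex used in the "refine" step to pull a number out of free text.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def extract_products(xml_text):
    root = ET.fromstring(xml_text)
    results = []
    # Step 1: select subtrees with a path query (ElementTree's XPath subset).
    for product in root.iterfind(".//div[@class='product']"):
        # Step 2: refine each subtree with host-language code and a regex.
        name = product.findtext("h2")
        price_text = product.findtext("span[@class='price']", default="")
        match = PRICE_RE.search(price_text)
        results.append({
            "name": name,
            "price": float(match.group(1)) if match else None,
        })
    return results


print(extract_products(DOC))
```

Even in this small case, the extraction logic is split between a query language (the path expressions) and imperative host-language code (the loop and the regex).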

Other developers use XSLT.  While fast, mature, and conceptually elegant, XSLT

- current techniques
- benefits of standardization
- best of current

Features
================================================================
- integrated grammars
  - with some expression examples
- multiple elements, one pass / context switching
- EXSLT / standard library
- JSON
- language integration
- pruning
- structural parsing
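The JSON and context-switching items can be illustrated with a deliberately naive sketch: a declarative spec shaped like the JSON it produces, where nested rules are evaluated relative to each element matched by an enclosing scope. The spec format and the `evaluate` function are hypothetical and are not parsley's grammar. The sketch again uses Python's `xml.etree.ElementTree`, and unlike the one-pass design listed above it simply re-queries the tree for every rule, with no pruning or structural parsing.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical spec format (NOT parsley's actual syntax):
#   "key": "path"            -> text of the first element matching path
#   "key": ["path"]          -> texts of every element matching path
#   "key": ("scope", {...})  -> for each element matching "scope", switch
#                               context to it and evaluate the nested spec
def evaluate(spec, context):
    out = {}
    for key, rule in spec.items():
        if isinstance(rule, str):
            out[key] = context.findtext(rule)
        elif isinstance(rule, list):
            out[key] = [el.text for el in context.iterfind(rule[0])]
        elif isinstance(rule, tuple):
            scope, sub_spec = rule
            out[key] = [evaluate(sub_spec, el) for el in context.iterfind(scope)]
        else:
            raise TypeError(f"unsupported rule for {key!r}: {rule!r}")
    return out


DOC = """<html><body>
<h1>Catalog</h1>
<div class="product"><h2>Widget</h2><span class="tag">new</span><span class="tag">sale</span></div>
<div class="product"><h2>Gadget</h2></div>
</body></html>"""

SPEC = {
    "title": ".//h1",
    "products": (".//div[@class='product']", {
        "name": "h2",
        "tags": ["span[@class='tag']"],
    }),
}

print(json.dumps(evaluate(SPEC, ET.fromstring(DOC)), indent=2))
```

The spec mirrors the shape of its output, so the nested `products` rule yields a list of objects, one per matched `div`, with `name` and `tags` resolved inside that `div` only.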

Examples
================================================================
- Ruby/Python/JSON
- structural parse

Benchmarks
================================================================
- size comparison with XSLT
- speed comparison with Nokogiri, Hpricot

Conclusion
================================================================
