= Mida
* {Mida Project Page}[http://lawrencewoodman.github.com/mida]
* {Mida Github Repository}[https://github.com/LawrenceWoodman/mida]
* {Mida Bug Tracker}[https://github.com/LawrenceWoodman/mida/issues]
== Description
A Microdata[http://en.wikipedia.org/wiki/Microdata_(HTML5)] parser and
extractor library for ruby.
This is based on the latest Published version of the Microdata Specification
dated {5th April 2011}[http://www.w3.org/TR/2011/WD-microdata-20110405/].
== Installation
Mida keeps RubyGems[http://rubygems.org/gems/mida] up-to-date with its latest version, so installing is as easy as:
gem install mida
=== Requirements:
* +Nokogiri+
== Command Line Usage
To use the command line tool, supply it with the urls or filenames that you
would like to be parsed (by default each item is output as yaml):
mida http://lawrencewoodman.github.com/mida/news/
If you want to search for specific types you can use the -t switch
followed by a Regular Expression:
mida -t /person/i http://lawrencewoodman.github.com/mida/news/
For more information look at mida's help:
mida -h
== Library Usage
The following examples assume that you have required +mida+ and
open-uri.
=== Extracting Microdata from a page
All the Microdata is extracted from a page when a new Mida::Document instance
is created.
To extract all the Microdata from a webpage:
url = 'http://example.com'
open(url) {|f| doc = Mida::Document.new(f, url)}
The top-level +Items+ will be held in an array accessible via
doc.items.
To simply list all the top-level +Items+ that have been found:
puts doc.items
=== Searching
If you want to search for an +Item+ that has a specific +itemtype+/vocabulary
this can be done with the +search+ method.
To return all the +Items+ that use one of Google's Review vocabularies:
doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
=== Inspecting an +Item+
Each +Item+ is a Mida::Item instance and has four main methods of
interest: +type+, +vocabulary+, +properties+ and +id+.
To find out the +itemtype+ of the +Item+:
puts doc.items.first.type
To find out the +itemid+ of the +Item+:
puts doc.items.first.id
Properties are returned as a hash containing name/values pairs. The
values will be an array of either +String+ or Mida::Item instances.
To see the +properties+ of the +Item+:
puts doc.items.first.properties
=== Working with Vocabularies
Mida allows you to define vocabularies, so that input data can be constrained to match
expected patterns. By default a generic vocabulary (Mida::GenericVocabulary)
is registered which will match against any +itemtype+ with any number of properties.
If you want to specify a vocabulary you create a class derived from Mida::Vocabulary
and use +itemtype+, +has_one+, +has_many+ and +extract+ to describe the vocabulary.
As an example the following describes a subset of Google's Review vocabulary:
class Rating < Mida::Vocabulary
itemtype %r{http://data-vocabulary.org/rating}i
has_one 'best'
has_one 'worst'
has_one 'value'
end
class Review < Mida::Vocabulary
itemtype %r{http://data-vocabulary.org/review}i
has_one 'itemreviewed'
has_one 'rating' do
extract Rating, Mida::DataType::Text
end
end
When you create a subclass of Mida::Vocabulary it automatically
registers the Vocabulary.
Now if Mida is parsing some input and manages to match against the +Review+ +itemtype+, it
will only allow the specified properties and will reject any that don't have the correct number. It
will also set Item#vocabulary accordingly, e.g.
doc.items.first.vocabulary # => Review
== Bugs/Feature Requests
If you find a bug or want to make a feature request, please report it at the
Mida project's {issues tracker}[https://github.com/LawrenceWoodman/mida/issues]
on github.
== License
Copyright (c) 2011 Lawrence Woodman.
This software is licensed under the MIT License. Please see the file, LICENSE.rdoc, for details.