README.rdoc in muddyit_fu-0.2.11 vs README.rdoc in muddyit_fu-0.2.12

- old
+ new

@@ -1,24 +1,22 @@
 = muddyit_fu
 
 Muddy is an information extraction platform.  For further
 details see the '{Getting Started with Muddy}[http://blog.muddy.it/2009/11/getting-started-with-muddy]'
-article.  This gem provides access to the Muddy platform via it's API :
+article.  This gem provides access to the Muddy platform via it's API (see {Muddy Developer Guide}[http://muddy.it/developers/]).
 
-{Muddy Developer Guide}[http://muddy.it/developers/]
-
 == Installation
 
   sudo gem install gemcutter
   sudo gem tumble
   sudo gem install muddyit_fu
 
 == Authentication and authorisation
 
 Muddy supports OAuth and HTTP Basic auth for authentication and authorisation.
 We recommend you use OAuth wherever possible when accessing Muddy.  An example
-of using OAuth with the muddy platform is descibed in the 
+of using OAuth with the Muddy platform is described in the
 {Building with Muddy and OAuth}[http://blog.muddy.it/2010/01/building-with-muddy-and-oauth]
 article.
 
 === Example muddyit.yml for OAuth
 
@@ -57,23 +55,32 @@
 
 == Storing extraction results in a collection
 
 Muddy allows you to store the entity extraction results so aggregate operations
 can be performed over a collection of content (a 'collection' has many analysed 'pages').
-A basic muddy account provides a single 'collection' where extraction results
+A basic Muddy account provides a single 'collection' where extraction results
 can be stored.
 
 To store a page against a collection, the collection must first be found :
 
   collection = muddyit.collections.find(:all).first
 
 Once a collection has been found, entity extraction results can be stored in it:
 
   collection.pages.create('http://news.bbc.co.uk/1/hi/uk_politics/8011321.stm', {:minium_confidence => 0.2})
 
-== Viewing all analysed pages in a collection
+== Working with a collection
 
+A collection allows aggregate operations to be perfomed on itself and on it's
+members.  A collection is identified by it's 'collection token'.  This is an
+alphanumeric six character string (e.g. 'a0ret4').  A collection can be found if
+it's token is known :
+
+  collection = muddyit.collections.find('a0ret4')
+
+=== Viewing all analysed pages
+
 You can iterate through all the analysed pages in a collection, be aware that
 the Muddy API provides the pages as paginated sets, so it may take some time to
 page through a complete set of pages in a collection (due to repeated HTTP requests
 for each new paginated set of results).
 
@@ -85,29 +92,46 @@
     page.entities.each do |entity|
       puts "\t#{entity.uri}"
     end
   end
 
-== Working with a collection
+=== Finding a particular page or pages
 
-A collection allows aggregate operations to be perfomed on itself and on it's
-members.  A collection is identified by it's 'collection token'.  This is an
-alphanumeric six character string (e.g. 'a0ret4').  A collection can be found if
-it's token is known :
+Each page in a collection is assigned a unique alphanumeric identifier.  Whilst
+this can be used to find a given page in a collection, it is possible to search
+for the page using other attributes :
 
-  collection = muddyit.collections.find('a0ret4')
+  page = collection.pages.find('5d0e32b6-fd0b-400a-ac49-dae965a292df')
+  page = collection.pages.find(:all, :uri => 'http://news.bbc.co.uk/1/hi/business/8186840.stm').first
+  page = collection.pages.find(:all, :title => 'BBC NEWS | Business | ITV in 25m Friends Reunited sale').first
 
-=== View all pages containing 'Gordon Brown'
+=== Rereshing a page's results
 
-If we want to find all references to the grounded entity for 'Gordon Brown 'then
+A page can be 'refereshed' (the entity extraction is run again) by calling the
+refresh method on a page object :
+
+  page = collection.pages.find('5d0e32b6-fd0b-400a-ac49-dae965a292df')
+  updated_page = page.update
+
+=== Deleting a page from a collection
+
+A page can be removed from a collection by calling the 'destroy' method on a
+page object :
+
+  page = collection.pages.find('5d0e32b6-fd0b-400a-ac49-dae965a292df')
+  page.destroy
+
+=== View all pages containing entity 'Gordon Brown'
+
+If we want to find all pages that reference the grounded entity for 'Gordon Brown' then
 it can be searched for using it's DBpedia URI :
 
   require 'muddyit_fu'
   muddyit = Muddyit.new('./config.yml')
   collection = muddyit.collections.find('a0ret4')
   collection.pages.find_by_entity('http://dbpedia.org/resource/Gordon_Brown') do |page|
-    puts page.identifier
+    puts "#{page.identifier} - #{page.title}"
   end
 
 === Find related entities for 'Gordon Brown'
 
 To find other entities that occur frequently with 'Gordon Brown' in this
@@ -116,11 +140,11 @@
   require 'muddyit_fu'
   muddyit = Muddyit.new('./config.yml')
   collection = muddyit.collections.find('a0ret4')
   puts "Related entity\tOccurance
   collection.entities.find_related('http://dbpedia.org/resource/Gordon_Brown').each do |entry|
-    puts "#{entry[:enity].uri}\t#{entry[:count]}"
+    puts "#{entry[:entity].uri}\t#{entry[:count]}"
   end
 
 === Find related content for : http://news.bbc.co.uk/1/hi/uk_politics/7878418.stm
 
 To find other content in the collection that shares similar entities with the
@@ -132,9 +156,20 @@
   page = collection.pages.find(:all, :uri => 'http://news.bbc.co.uk/1/hi/uk_politics/7878418.stm').first
   puts "Page : #{page.title}\n\n"
   page.related_content.each do |results|
     puts "#{results[:page].title} #{results[:count]}"
   end
+
+== Batch processing content and the Muddy queue
+
+The Muddy platform runs a background job queue that allows many requests to be
+made in quick succession (rather than waiting for the full extraction request to
+complete), with analysis of the pages happening asynchronously via the queue
+and being stored in the collection at a later time.  This can be useful when trying
+to analyse large content collections.  To send a request to the queue use :
+
+  collection = muddyit.collections.find('a0ret4')
+  collection.pages.create('http://news.bbc.co.uk/1/hi/uk_politics/8011321.stm', {:realtime => false})
 
 == Contact
 
   Author: Rob Lee
   Email: support [at] muddy.it