README.md in html2rss-0.8.0 vs README.md in html2rss-0.8.1

- old
+ new

@@ -13,11 +13,11 @@
 With the _feed config_ containing the URL to scrape and CSS selectors for information extraction (like title, URL, ...) your RSS builds. [Extractors](#using-extractors) and chain-able [post processors](#using-post-processors) make information extraction, processing and sanitizing a breeze.
-[Scraping JSON](#scraping-json) responses and
+[Scraping JSON](#scraping-and-handling-json-responses) responses and
 [setting HTTP request headers](#set-any-http-header-in-the-request) is supported, too.
 ## Installation
@@ -34,14 +34,11 @@
 ```ruby
 require 'html2rss'
 rss = Html2rss.feed(
-  channel: {
-    title: 'StackOverflow: Hot Network Questions',
-    url: 'https://stackoverflow.com/questions'
-  },
+  channel: { url: 'https://stackoverflow.com/questions' },
   selectors: {
     items: { selector: '#hot-network-questions > ul > li' },
     title: { selector: 'a' },
     link: { selector: 'a', extractor: 'href' }
   }
@@ -55,17 +52,19 @@
 **Looks too complicated?** See [`html2rss-configs`](https://github.com/gildesmarais/html2rss-configs) for ready-made feed configs!
 ### The `channel`
-| attribute     |          | type    | remark                  |
-| ------------- | -------- | ------- | ----------------------- |
-| `title`       | required | String  |                         |
-| `url`         | required | String  |                         |
-| `ttl`         | optional | Integer | time to live in minutes |
-| `description` | optional | String  |                         |
-| `headers`     | optional | Hash    | See notes below.        |
+| attribute     |          | type    |        default | remark                                     |
+| ------------- | -------- | ------- | -------------: | ------------------------------------------ |
+| `url`         | required | String  |                |                                            |
+| `title`       | optional | String  | auto-generated |                                            |
+| `description` | optional | String  | auto-generated |                                            |
+| `ttl`         | optional | Integer |          `360` | TTL in _minutes_                           |
+| `time_zone`   | optional | String  |        `'UTC'` | TimeZone name                              |
+| `headers`     | optional | Hash    |           `{}` | Set HTTP request headers. See notes below. |
+| `json`        | optional | Boolean |        `false` | Handle JSON response. See notes below.     |
 ### The `selectors`
 You must provide an `items` selector hash which contains the CSS selector.
 `items` needs to return a collection of HTML tags.
@@ -76,22 +75,22 @@
 each item has to have at least a `title` or a `description`.
 Your `selectors` can contain arbitrary selector names, but only these will make it into the RSS feed:
-| RSS 2.0 tag   | name in html2rss | remark                      |
-| ------------- | ---------------- | --------------------------- |
-| `title`       | `title`          |                             |
-| `description` | `description`    | Supports HTML.              |
-| `link`        | `link`           | A URL.                      |
-| `author`      | `author`         |                             |
-| `category`    | `categories`     | See notes below.            |
-| `enclosure`   | `enclosure`      | See notes below.            |
-| `pubDate`     | `update`         | An instance of `Time`.      |
-| `guid`        | `guid`           | Generated from the `title`. |
-| `comments`    | `comments`       | A URL.                      |
-| `source`      | ~~source~~       | Not yet supported.          |
+| RSS 2.0 tag   | name in `html2rss` | remark                      |
+| ------------- | ------------------ | --------------------------- |
+| `title`       | `title`            |                             |
+| `description` | `description`      | Supports HTML.              |
+| `link`        | `link`             | A URL.                      |
+| `author`      | `author`           |                             |
+| `category`    | `categories`       | See notes below.            |
+| `enclosure`   | `enclosure`        | See notes below.            |
+| `pubDate`     | `update`           | An instance of `Time`.      |
+| `guid`        | `guid`             | Generated from the `title`. |
+| `comments`    | `comments`         | A URL.                      |
+| `source`      | ~~source~~         | Not yet supported.          |
 ### The `selector` hash
 Your selector hash can have these attributes:
@@ -223,11 +222,11 @@
 </details>
 ## Adding `<category>` tags to an item
-The `categories` selector takes an array of selector names. The value of those
+The `categories` selector takes an array of selector names. Each value of those
 selectors will become a `<category>` on the RSS item.
 <details>
 <summary>See a Ruby example</summary>
@@ -266,15 +265,15 @@
 </details>
 ## Adding an `<enclosure>` tag to an item
-An enclosure can be 'anything', e.g. a image, audio or video file.
+An enclosure can be any file, e.g. a image, audio or video.
 The `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's URL as base.
-Since html2rss does no further inspection of the enclosure, its support comes with trade-offs:
+Since `html2rss` does no further inspection of the enclosure, its support comes with trade-offs:
 1. The content-type is guessed from the file extension of the URL.
 2. If the content-type guessing fails, it will default to `application/octet-stream`.
 3. The content-length will always be undetermined and thus stated as `0` bytes.
@@ -308,11 +307,11 @@
 attribute: "src"
 ```
 </details>
-## Scraping JSON
+## Scraping and handling JSON responses
 Although this gem is called **html**​*2rss*, it's possible to scrape and process JSON.
 Adding `json: true` to the channel config will convert the JSON response to XML.
@@ -483,10 +482,10 @@
 Find a full example of a `config.yml` at [`spec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
 ## Gotchas and tips & tricks
 - Check that the channel URL does not redirect to a mobile page with a different markup structure.
-- Do not rely on your web browser's developer console. html2rss does not execute JavaScript.
+- Do not rely on your web browser's developer console. `html2rss` does not execute JavaScript.
 - Fiddling with [`curl`](https://github.com/curl/curl) and [`pup`](https://github.com/ericchiang/pup) to find the selectors seems efficient (`curl URL | pup`).
 - [CSS selectors are quite versatile, here's an overview.](https://www.w3.org/TR/selectors-4/#overview)
 ## Development
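
Note on the `<enclosure>` hunk: the behaviour it describes (relative URLs resolved against the channel URL, content-type guessed from the file extension, fallback to `application/octet-stream`, length stated as `0`) can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the gem's implementation: `enclosure_for` and the `CONTENT_TYPES` table are hypothetical names, and the `example.com` URLs are placeholders.

```ruby
require 'uri'

# Hypothetical lookup table for this sketch only; a real implementation
# would more likely delegate to a MIME type library.
CONTENT_TYPES = {
  '.jpg' => 'image/jpeg',
  '.png' => 'image/png',
  '.mp3' => 'audio/mpeg',
  '.mp4' => 'video/mp4'
}.freeze

# Hypothetical helper: builds enclosure attributes from an extracted URL.
def enclosure_for(url, channel_url:)
  # A relative URL is resolved against the channel's URL.
  absolute = URI.join(channel_url, url)
  extension = File.extname(absolute.path).downcase

  {
    url: absolute.to_s,
    # Guess from the extension, else fall back to a generic type.
    type: CONTENT_TYPES.fetch(extension, 'application/octet-stream'),
    # The file is never inspected, so the length is stated as 0.
    length: 0
  }
end

p enclosure_for('episodes/1.mp3', channel_url: 'https://example.com/podcast/')
p enclosure_for('/download', channel_url: 'https://example.com/podcast/')
```

The second call has no file extension, so it exercises the fallback content-type.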