README.md in html2rss-0.6.0 vs README.md in html2rss-0.7.0

- old
+ new

@@ -18,71 +18,151 @@ Add this line to your application's Gemfile: `gem 'html2rss'`
 Then execute: `bundle`
 
 ```ruby
-rss = Html2rss.feed(
-  channel: { title: 'StackOverflow: Hot Network Questions', url: 'https://stackoverflow.com/questions' },
-  selectors: {
-    items: { selector: '#hot-network-questions > ul > li' },
-    title: { selector: 'a' },
-    link: { selector: 'a', extractor: 'href' }
-  }
-)
+rss =
+  Html2rss.feed(
+    channel: { title: 'StackOverflow: Hot Network Questions', url: 'https://stackoverflow.com/questions' },
+    selectors: {
+      items: { selector: '#hot-network-questions > ul > li' },
+      title: { selector: 'a' },
+      link: { selector: 'a', extractor: 'href' }
+    }
+  )
 puts rss.to_s
 ```
 
 ## Usage with a YAML config file
 
-Create a YAML config file. Find an example at [`rspec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
+Create a YAML config file. Find an example at [`spec/config.test.yml`](https://github.com/gildesmarais/html2rss/blob/master/spec/config.test.yml).
 
-`Html2rss.feed_from_yaml_config(File.join(['spec', 'config.test.yml']), 'nuxt-releases')` returns
+`Html2rss.feed_from_yaml_config(File.join(['spec', 'config.test.yml']), 'nuxt-releases')`
+returns an `RSS:Rss` object.
 
-an `RSS:Rss` object.
-
 **Too complicated?** See [`html2rss-configs`](https://github.com/gildesmarais/html2rss-configs) for ready-made feed configs!
 
+## Assigning categories to an item
+
+The `categories` selector takes an array of selector names. The value of those
+selectors will become a category on the item.
+
+<details>
+  <summary>See a YAML config example</summary>
+
+```yml
+channel:
+# ... omitted
+selectors:
+  #... omitted
+  genre:
+    selector: '.genre'
+  branch:
+    selector: '.branch'
+  categories:
+    - genre
+    - branch
+```
+
+</details>
+
+## Adding an enclosure to each item
+
+An enclosure can be 'anything', e.g. a image, audio or video file.
+
+The config's `enclosure` selector needs to return a URL of the content to enclose. If the extracted URL is relative, it will be converted to an absolute one using the channel's url as a base.
+
+Since html2rss does no further inspection of the enclosure, the support of this tag comes with trade-offs:
+
+1. The content-type is guessed from the file extension of the URL.
+2. If the content-type guessing fails, it will default to `application/octet-stream`.
+3. The content-length will always be undetermined and thus stated as `0` bytes.
+
+Read the [RSS 2.0 spec](http://www.rssboard.org/rss-profile#element-channel-item-enclosure) for further information on enclosing content.
+
+<details>
+  <summary>See a YAML config example</summary>
+
+```yml
+channel:
+# ... omitted
+selectors:
+  #... omitted
+enclosure:
+  selector: 'img'
+  extractor: 'attribute'
+  attribute: 'src'
+```
+
+</details>
+
 ## Scraping JSON
 
-Since 0.5.0 it is possible to scrape and process JSON.
+Since 0.5.0 it's possible to scrape and process JSON.
 
 Adding `json: true` to the channel config will convert the JSON response to XML.
 
-Feed config:
+<details>
+  <summary>See a YAML feed config example</summary>
 
 ```yaml
 channel:
   url: https://example.com
-  title: "Example with JSON"
+  title: 'Example with JSON'
   json: true
 # ...
 ```
 
-Imagine this HTTP response:
+</details>
 
+Under the hood it uses ActiveSupport's [`Hash.to_xml`](https://apidock.com/rails/Hash/to_xml) core extension for the JSON to XML conversion.
+
+### Conversion of JSON objects
+
+This JSON object:
+
 ```json
 { "data": [{ "title": "Headline", "url": "https://example.com" }] }
 ```
 
 will be converted to:
 
 ```xml
-<html>
+<hash>
   <data>
     <datum>
       <title>Headline</title>
       <url>https://example.com</url>
     </datum>
   </data>
-</html>
+</hash>
 ```
 
-Your items selector would be `data > datum`, the item's link selector would be `url`.
+Your items selector would be `data > datum`, the item's `link` selector would be `url`.
 
-Under the hood it uses ActiveSupport's [`Hash.to_xml`](https://apidock.com/rails/Hash/to_xml) core extension for the JSON to XML conversion.
+### Conversion of JSON arrays
+
+This JSON array:
+
+```json
+[{ "title": "Headline", "url": "https://example.com" }]
+```
+
+will be converted to:
+
+```xml
+<objects>
+  <object>
+    <title>Headline</title>
+    <url>https://example.com</url>
+  </object>
+</objects>
+```
+
+Your items selector would be `objects > object`, the item's `link` selector would be `url`.
 
 ## Set any HTTP header in the request
 
 You can add any HTTP headers to the request to the channel URL.
 You can use this to e.g. have Cookie or Authorization information being sent or to overwrite the User-Agent.
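The enclosure support added in 0.7.0 is described only in prose: relative URLs are resolved against the channel's url, the content-type is guessed from the file extension, unknown extensions fall back to `application/octet-stream`, and the length is always `0`. The Ruby sketch below restates those rules using only the standard library (`URI.join`, `File.extname`). The `build_enclosure` helper and the extension table are hypothetical; this is not html2rss's actual implementation, and its real extension-to-type mapping may differ.

```ruby
require 'uri'

# Hypothetical extension table for illustration; html2rss's own mapping may differ.
CONTENT_TYPES = {
  '.jpg' => 'image/jpeg',
  '.jpeg' => 'image/jpeg',
  '.png' => 'image/png',
  '.gif' => 'image/gif',
  '.mp3' => 'audio/mpeg',
  '.mp4' => 'video/mp4'
}.freeze

# Used when the extension is missing or not in the table above.
FALLBACK_CONTENT_TYPE = 'application/octet-stream'

# Hypothetical helper: derives the attributes of an RSS <enclosure> element
# from the URL that the `enclosure` selector extracted.
def build_enclosure(extracted_url, channel_url)
  # A relative URL is resolved against the channel's url;
  # an absolute URL is returned unchanged by URI.join.
  url = URI.join(channel_url, extracted_url)

  # Guess the content-type from the path's file extension only (query strings are ignored).
  extension = File.extname(url.path.to_s).downcase
  type = CONTENT_TYPES.fetch(extension, FALLBACK_CONTENT_TYPE)

  # The enclosed file is never fetched, so its length is unknown and reported as 0.
  { url: url.to_s, type: type, length: 0 }
end
```

For example, `build_enclosure('/images/cover.png', 'https://example.com/news')` yields the absolute URL `https://example.com/images/cover.png` with type `image/png` and length `0`, while a URL such as `https://example.com/download?id=1`, which has no file extension, falls back to `application/octet-stream`.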