html2rss
Request HTML from a URL and transform it into a Ruby RSS 2.0 object.
Are you searching for a ready-to-use “website to RSS” solution? Check out html2rss-web!
Each website needs a feed config that contains the URL to scrape and the CSS selectors that extract the required information (like title, URL, …). This gem provides extractors (e.g. one that reads a value from an HTML attribute) and chainable post processors to make information retrieval even easier.
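To illustrate the extractor and post processor idea in isolation, here is a minimal plain-Ruby sketch. It is not html2rss code: Element, EXTRACTORS, POST_PROCESSORS, extract and the :strip/:absolute_url processors are hypothetical names chosen for this example, and the "element" is a simple struct rather than a parsed HTML node. It shows an extractor pulling a raw value (text or an attribute) and a list of post processors being applied to it one after another.

```ruby
require 'uri'

# Hypothetical stand-in for an HTML element: its text and its attributes.
Element = Struct.new(:text, :attributes)

# Hypothetical extractors: each pulls a raw value out of an element.
EXTRACTORS = {
  text: ->(element, _options) { element.text },
  attribute: ->(element, options) { element.attributes.fetch(options[:name]) }
}.freeze

# Hypothetical post processors: each takes a value and returns a new one.
POST_PROCESSORS = {
  strip: ->(value, _options) { value.strip },
  absolute_url: ->(value, options) { URI.join(options[:base], value).to_s }
}.freeze

# Extract a raw value, then chain it through the post processors in order.
def extract(element, extractor:, post_process: [], **options)
  raw = EXTRACTORS.fetch(extractor).call(element, options)
  post_process.reduce(raw) do |value, name|
    POST_PROCESSORS.fetch(name).call(value, options)
  end
end

link = Element.new('  Why is this slow?  ', { 'href' => '/q/12345' })

title = extract(link, extractor: :text, post_process: [:strip])
url = extract(link, extractor: :attribute, name: 'href',
                    post_process: [:absolute_url],
                    base: 'https://stackoverflow.com/questions')

puts title # prints "Why is this slow?"
puts url   # prints "https://stackoverflow.com/q/12345"
```

Keeping each step a small function with the same shape is what makes the chain easy to extend: adding a step means adding one entry to the list, not changing the extractor.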
Installation
Add this line to your application's Gemfile: gem 'html2rss'
Then execute: bundle
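Build a feed by passing a config hash with the channel and the selectors to Html2rss.feed:

```ruby
require 'html2rss'

rss = Html2rss.feed(
  channel: { title: 'StackOverflow: Hot Network Questions', url: 'https://stackoverflow.com/questions' },
  selectors: {
    # 'items' selects each feed item; 'title' and 'link' extract values from it
    items: { selector: '#hot-network-questions > ul > li' },
    title: { selector: 'a' },
    # the 'href' extractor reads the link's href attribute instead of its text
    link: { selector: 'a', extractor: 'href' }
  }
)

puts rss.to_s
```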
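The returned object comes from Ruby's standard rss library (the YAML section below names the class RSS::Rss), which is why rss.to_s serializes it to RSS 2.0 XML. For orientation only, this sketch builds a comparable RSS::Rss object by hand with RSS::Maker, without html2rss and without any scraping; the item title and link are made-up placeholders. Note that rss has been a bundled gem since Ruby 3.0, so a Bundler-managed app may need gem 'rss' in its Gemfile to require it directly.

```ruby
require 'rss'

# Build an RSS 2.0 object by hand with Ruby's rss library.
feed = RSS::Maker.make('2.0') do |maker|
  maker.channel.title = 'StackOverflow: Hot Network Questions'
  maker.channel.link = 'https://stackoverflow.com/questions'
  maker.channel.description = 'Hot Network Questions' # required by RSS 2.0

  # Placeholder item; html2rss would create one item per matched element.
  maker.items.new_item do |item|
    item.title = 'An example question'
    item.link = 'https://stackoverflow.com/q/12345'
  end
end

puts feed.class               # prints "RSS::Rss"
puts feed.channel.items.size  # number of items in the channel
puts feed.to_s                # the RSS 2.0 XML document
```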
Usage with a YAML config file
Create a YAML config file. Find an example at spec/config.test.yml.
The following call returns an RSS::Rss object:

```ruby
Html2rss.feed_from_yaml_config(File.join('spec', 'config.test.yml'), 'nuxt-releases')
```
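As a rough sketch of how the Ruby hash config maps onto YAML, the snippet below dumps such a hash with Ruby's yaml library and reads it back. The layout is an assumption: it places named feed configs under a top-level feeds key (looked up by name, like 'nuxt-releases' above), and the feed name 'stackoverflow' is hypothetical. Check spec/config.test.yml for the structure html2rss actually expects.

```ruby
require 'yaml'

# ASSUMED layout: named feed configs under a top-level 'feeds' key.
# The feed name 'stackoverflow' is hypothetical.
config = {
  'feeds' => {
    'stackoverflow' => {
      'channel' => {
        'title' => 'StackOverflow: Hot Network Questions',
        'url' => 'https://stackoverflow.com/questions'
      },
      'selectors' => {
        'items' => { 'selector' => '#hot-network-questions > ul > li' },
        'title' => { 'selector' => 'a' },
        'link' => { 'selector' => 'a', 'extractor' => 'href' }
      }
    }
  }
}

yaml = config.to_yaml
puts yaml

# Reading the YAML back yields the same hash; a feed is then looked up by name.
loaded = YAML.safe_load(yaml)
feed_config = loaded.dig('feeds', 'stackoverflow')
```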
Too complicated? See html2rss-configs for ready-made feed configs!
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
Contributing
Bug reports and pull requests are welcome on GitHub at github.com/gildesmarais/html2rss.
Releasing a new version
- git pull
- Increase the version in lib/version.rb
- bundle
- Commit the changes
- git tag v....
- git push; git push --tags
- Update the changelog, commit and push
Changelog generation
The CHANGELOG.md can be generated automatically with standard-changelog.