# Mercury Parser

A tiny Ruby wrapper for [Mercury's Web Parser](https://mercury.postlight.com/web-parser/)

[![Gem Version](https://badge.fury.io/rb/mercury_parser.png)](http://badge.fury.io/rb/mercury_parser)
[![Code Climate](https://codeclimate.com/github/moisesnarvaez/mercury_parser.png)](https://codeclimate.com/github/moisesnarvaez/mercury_parser)
[![Dependency Status](https://gemnasium.com/moisesnarvaez/mercury_parser.png)](https://gemnasium.com/moisesnarvaez/mercury_parser)
[![Build Status](https://travis-ci.org/moisesnarvaez/mercury_parser.png)](https://travis-ci.org/moisesnarvaez/mercury_parser)

## Installation

Add this line to your application's Gemfile:

    gem 'mercury_parser'

And then execute:

    bundle install

## Configuration

Set the API key:

```ruby
# Read the key from the MERCURY_API_KEY environment variable
MercuryParser.api_key = ENV['MERCURY_API_KEY']
```

Make sure to set `MERCURY_API_KEY` in your environment variables. You can get an API key by contacting Mercury's team directly; more information is available on their [web parser page](https://mercury.postlight.com/web-parser/).

For multiple tokens or multithreaded usage, create a dedicated client:

```ruby
# Each client instance carries its own API key
client = MercuryParser::Client.new(api_key: ENV['MERCURY_API_KEY'])
```

## Usage

### Parse

Parse a webpage and return its main content:

```ruby
article = MercuryParser.parse("https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed")
=> #

Awesome CMS is…an awesome list of awesome CMSes. It’s on GitHub, so anyone can add to it via a pull request. Here are some notes on how and why it came to be.

GitHub has a set of powerful commands for narrowing search results. In seeking out modern content management tools, I used queries like this:

cms OR “content management” OR admin pushed:>2016–01–01 stars:>50

Sorting by stars, I worked my way backwards. I was able to quickly spot relevant CMS projects. I also started to notice some trends.

I knew the list of all popular content management systems would be huge. I didn’t want to put that data into Markdown directly, as it would be difficult to maintain and to augment with extra data (stars on GitHub, last push date, tags, etc).

Instead, I opted to store the data in TOML, a human-friendly configuration file language. You can view all of the data that powers Awesome CMS in the data folder. Here’s WordPress’ entry in that file:

[[cms]]
name = "WordPress"
description = "WordPress is a free and open-source content management system (CMS) based on PHP and MySQL."
url = "https://wordpress.org"
github_repo = "WordPress/WordPress"
awesome_repo = "miziomon/awesome-wordpress"
language = "php"

I process this file using JavaScript in generateReadme.js. It handles processing the TOML, fetching information from GitHub, and generating the final README.md file using the Handlebars template. I’m scraping GitHub for star counts because GitHub’s API only allows for 60 requests an hour for authenticated users. We want to make it as easy as possible for anyone to contribute. Requiring users to generate a GitHub authentication token to generate the README wasn’t an option.

By storing the data in TOML at generating the README.md using JavaScript, I’ve essentially created an incredibly light-weight, GitHub backed, static CMS to power Awesome CMS.

I heard you like content management systems
", author="Jeremy Mack", date_published="2016-10-03T12:48:58.385Z", lead_image_url="https://d262ilb51hltx0.cloudfront.net/max/1200/1*zo51eqdjJ_XSU0D8Vm8P9A.png", dek=nil, next_page_url=nil, url="https://trackchanges.postlight.com/building-awesome-cms-f034344d8ed", domain="trackchanges.postlight.com", excerpt="Awesome CMS is…an awesome list of awesome CMSes. It’s on GitHub, so anyone can add to it via a pull request.", word_count=397, direction="ltr", total_pages=1, rendered_pages=1>

article.title
article.content
article.author
article.date_published
article.lead_image_url
article.dek
article.next_page_url
article.url
article.domain
article.excerpt
article.word_count
article.direction
article.total_pages
article.rendered_pages
```

## Contributing

1. Fork it
2. [Create a topic branch](http://learn.github.com/p/branching.html)
3. Add specs for your unimplemented modifications
4. Run `bundle exec rspec`. If specs pass, return to step 3.
5. Implement your modifications
6. Run `bundle exec rspec`. If specs fail, return to step 5.
7. Commit your changes and push
8. [Submit a pull request](http://help.github.com/send-pull-requests/)

## Inspiration

Based on: [ReadabilityParserGem](https://github.com/phildionne/readability_parser)

## Author

[Moises Narvaez](http://www.moisesnarvaez.com)

## Copyright

Copyright (c) 2016 Moises Narvaez

## License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.