# twitter-text ![](https://img.shields.io/gem/v/twitter-text.svg) This is the Ruby implementation of the twitter-text parsing library. The library has methods to parse Tweets and calculate length, validity, parse @mentions, #hashtags, URLs, and more. ## Setup Installation uses bundler. ``` % gem install bundler % bundle install ``` ## Conformance tests To run the Conformance test suite from the command line via rake: ``` % rake test:conformance:run ``` You can also run the rspec tests in the `spec` directory: ``` % rspec spec ``` # Length validation twitter-text 2.0 introduces configuration files that define how Tweets are parsed for length. This allows for backwards compatibility and flexibility going forward. Old-style traditional 140-character parsing is defined by the v1.json configuration file, whereas v2.json is updated for "weighted" Tweets where ranges of Unicode code points can have independent weights aside from the default weight. The sum of all code points, each weighted appropriately, should not exceed the max weighted length. Some old methods from twitter-text 1.0 have been marked deprecated, such as the `tweet_length()` method. The new API is based on the following method, `parse_tweet()` ```ruby def parse_tweet(text, options = {}) { ... } ``` This method takes a string as input and returns a results object that contains information about the string. `Twitter::Validation::ParseResults` object includes: * `:weighted_length`: the overall length of the tweet with code points weighted per the ranges defined in the configuration file. * `:permillage`: indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum. * `:valid`: indicates if input text length corresponds to a valid result. * `:display_range_start, :display_range_end`: An array of two unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet. For more information, see the description of `display_text_range` here: [Tweet updates](https://developer.twitter.com/en/docs/tweets/tweet-updates) * `:valid_range_start, :valid_range_end`: An array of two unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet. For more information on the extended Tweet payload see [Tweet updates](https://developer.twitter.com/en/docs/tweets/tweet-updates) ## Extraction Examples # Extraction ```ruby class MyClass include Twitter::Extractor usernames = extract_mentioned_screen_names("Mentioning @twitter and @jack") # usernames = ["twitter", "jack"] end ``` ### Extraction with a block argument ```ruby class MyClass include Twitter::Extractor extract_reply_screen_name("@twitter are you hiring?").do |username| # username = "twitter" end end ``` ## Auto-linking Examples ### Auto-link ```ruby class MyClass include Twitter::Autolink html = auto_link("link @user, please #request") end ``` ### For Ruby on Rails you want to add this to app/helpers/application_helper.rb ```ruby module ApplicationHelper include Twitter::Autolink end ``` ### Now the auto_link function is available in every view. So in index.html.erb: ```ruby <%= auto_link("link @user, please #request") %> ``` ### Usernames Username extraction and linking matches all valid Twitter usernames but does not verify that the username is a valid Twitter account. ### Lists Auto-link and extract list names when they are written in @user/list-name format. ### Hashtags Auto-link and extract hashtags, where a hashtag can contain most letters or numbers but cannot be solely numbers and cannot contain punctuation. ### URLs Asian languages like Chinese, Japanese or Korean may not use a delimiter such as a space to separate normal text from URLs making it difficult to identify where the URL ends and the text starts. For this reason twitter-text currently does not support extracting or auto-linking of URLs immediately followed by non-Latin characters. Example: "http://twitter.com/は素晴らしい" . The normal text is "は素晴らしい" and is not part of the URL even though it isn't space separated. ### International Special care has been taken to be sure that auto-linking and extraction work in Tweets of all languages. This means that languages without spaces between words should work equally well. ### Hit Highlighting Use to provide emphasis around the "hits" returned from the Search API, built to work against text that has been auto-linked already. ## Issues Have a bug? Please create an issue here on GitHub! ## Authors ### V2.0 * David LaMacchia () * Yoshimasa Niwa () * Sudheer Guntupalli () * Kaushik Lakshmikanth () * Jose Antonio Marquez Russo () * Lee Adams () ### Previous authors * Matt Sanford () * Raffi Krikorian () * Ben Cherry () * Patrick Ewing () * Jeff Smick () * Kenneth Kufluk () * Keita Fujii () * Jean-Philippe Bougie () * Erik Michaels-Ober () ## License Copyright 2012-2017 Twitter, Inc and other contributors Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)