# BlingFire

[BlingFire](https://github.com/microsoft/BlingFire) - high-speed text tokenization - for Ruby

## Installation

Add this line to your application’s Gemfile:

```ruby
gem 'blingfire'
```

## Getting Started

Create a model

```ruby
model = BlingFire::Model.new
```

Tokenize words

```ruby
model.text_to_words(text)
```

Tokenize sentences

```ruby
model.text_to_sentences(text)
```
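A common pattern combines the two calls: split text into sentences first, then tokenize each sentence into words. The sketch below assumes both methods return Arrays of Strings. `words_by_sentence` is a hypothetical helper, not part of the gem, and it accepts any object that responds to both methods.

```ruby
# Hypothetical helper (not part of the gem): sentence-split the text, then
# word-tokenize each sentence. Assumes text_to_sentences and text_to_words
# both return Arrays of Strings.
def words_by_sentence(model, text)
  model.text_to_sentences(text).map { |sentence| model.text_to_words(sentence) }
end
```

For example, `words_by_sentence(model, text)` returns one Array of words per sentence.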

## Pre-trained Models

BlingFire comes with a default model that follows the tokenization logic of NLTK with a few changes. You can also download other models:

- [BERT Base](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_base_tok.bin)
- [BERT Base Cased](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_base_cased_tok.bin)
- [BERT Chinese](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_chinese.bin)
- [BERT Multilingual Cased](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/bert_multi_cased.bin)
- [WBD](https://github.com/microsoft/BlingFire/blob/master/dist-pypi/blingfire/wbd_chuni.bin)
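The links above open GitHub's file pages rather than the files themselves. Here is a minimal sketch for downloading a model with Ruby's standard library. It assumes GitHub still serves the raw file when `/blob/` in the URL is replaced with `/raw/`, which redirects to the file's contents. `raw_url` and `download_model` are hypothetical helpers, not part of the gem. Downloading the file in a browser works just as well.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: turn a GitHub "blob" page URL into the "raw" form.
# Assumes GitHub answers /raw/ URLs with a redirect to the file itself.
def raw_url(blob_url)
  blob_url.sub("/blob/", "/raw/")
end

# Hypothetical helper: download a model file to `path`, following up to
# `limit` redirects. The whole file is held in memory before writing.
def download_model(blob_url, path, limit: 5)
  uri = URI(raw_url(blob_url))
  limit.times do
    response = Net::HTTP.get_response(uri)
    case response
    when Net::HTTPSuccess
      File.binwrite(path, response.body)
      return path
    when Net::HTTPRedirection
      # Location may be absolute or relative; merge handles both
      uri = uri.merge(response["location"])
    else
      raise "Download failed: #{response.code} #{response.message}"
    end
  end
  raise "Too many redirects"
end
```

For example, call `download_model(url, "bert_base_tok.bin")` with one of the URLs above, then pass the saved path to `BlingFire.load_model`.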

Load a model

```ruby
model = BlingFire.load_model("bert_base_tok.bin")
```

Convert text to ids

```ruby
model.text_to_ids(text)
```
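BERT-style models usually expect each sequence to start with `[CLS]`, end with `[SEP]`, and be padded to a fixed length. This README does not say whether `text_to_ids` adds these, so check its output first. The sketch below makes three assumptions:

- `text_to_ids` returns a plain Array of Integer ids with no special tokens or padding.
- The model uses the bert-base-uncased vocabulary, where `[PAD]` = 0, `[CLS]` = 101, and `[SEP]` = 102.
- `bert_inputs` is a hypothetical helper, not part of the gem.

```ruby
# Assumed special-token ids from the bert-base-uncased vocabulary;
# they are not read from the model file, so adjust them for other vocabularies.
CLS_ID = 101
SEP_ID = 102
PAD_ID = 0

# Hypothetical helper: wrap ids as [CLS] ids [SEP], truncating the ids so the
# result fits in max_len, then right-pad with [PAD]. Also returns an attention
# mask with 1 for real tokens and 0 for padding.
def bert_inputs(ids, max_len)
  raise ArgumentError, "max_len must be at least 2" if max_len < 2

  wrapped = [CLS_ID, *ids.first(max_len - 2), SEP_ID]
  padding = max_len - wrapped.size
  input_ids = wrapped + [PAD_ID] * padding
  mask = [1] * wrapped.size + [0] * padding
  [input_ids, mask]
end
```

For example, `input_ids, mask = bert_inputs(model.text_to_ids(text), 128)` produces two Arrays of length 128.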

## History

View the [changelog](https://github.com/ankane/blingfire/blob/master/CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/blingfire/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/blingfire/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all
bundle exec rake test
```