--- title: New release of python module author: Onur Çelebi authorURL: https://research.fb.com/people/celebi-onur/ authorFBID: 663146146 --- Today, we are happy to release a new version of the fastText python library. The main goal of this release is to merge two existing python modules: the official `fastText` module which was available on our github repository and the unofficial `fasttext` module which was available on pypi.org. We hope that this new version will address the confusion due to the previous existence of two similar, but different, python modules. The new version of our library is now available on [pypi.org](https://pypi.org/project/fasttext/) as well as on our github repository, and you can find [an overview of its API here](/docs/en/python-module.html). fastText vs fasttext: what happened? ---------------------------------- There was an ongoing confusion among our user community about the existence of both `fastText` and `fasttext` modules. When fastText was first released in 2016, it was a command line only utility. Very soon, people wanted to use fastText's capabilities from python without having to call a binary for each action. In August 2016, [Bayu Aldi Yansyah](https://github.com/pyk), a developer outside of Facebook, published a python wrapper of fastText. His work was very helpful to a lot of people in our community and he published his unofficial python library on pypi with the pretty straighforward module name `fasttext` (note the lowercase `t`). Later, our team began to work on an official python binding of fastText, that was published under the same github repository as the C++ source code. However, the module name for this official library was `fastText` (note the uppercase `T`). Last year, Bayu Aldi Yansyah gave us admin access to the pypi project so that we could merge the two libraries. To sum up, we ended up with two libraries that had: - almost the same name - different APIs - different versions - different ways to install That was a very confusing situation for the community. What actions did we take? -------------------------- Today we are merging the two python libraries. We decided to keep the official API and top level functions such as `train_unsupervised` and `train_supervised` as well as returning numpy objects. We remove `cbow`, `skipgram` and `supervised` functions from the unofficial API. However, [we bring nice ideas](#wordvectormodel-and-supervisedmodel-objects) from the unofficial API to the official one. In particular, we liked the pythonic approach of `WordVectorModel`. This new python module is named `fasttext`, and is available on both [pypi](https://pypi.org/project/fasttext/) and our [github](https://github.com/facebookresearch/fastText) repository. From now, we will refer to the tool as "fastText", however the name of the python module is `fasttext`. What is the right way to do now? -------------------------------- Before, you would either use `fastText` (uppercase `T`): ```python import fastText # and call: fastText.train_supervised fastText.train_unsupervised ``` or use `fasttext` (lowercase `t`): ```python import fasttext # and call: fasttext.cbow fasttext.skipgram fasttext.supervised ``` Now, the right way to do is to `import fasttext` (lowercase `t`) and use ```python import fasttext # and call: fasttext.train_supervised fasttext.train_unsupervised ``` We are keeping the lowercase `fasttext` module name, while we keep the `fastText` API. This is because: - the standard way to name python modules is all lowercases - the API from `fastText` is exposing numpy arrays, which is widely used by the machine learning community. You can find a more comprehensive overview of our python API [here](/docs/en/python-module.html). Should I modify my existing code? --------------------------------- Depending on the version of the python module you were using, you might need to do some little modifications on your existing code. ### 1) You were using the official `fastText` module: You don't have to do much. Just replace your `import fastText` lines by `import fasttext` and everything should work as usual. ### 2) You were using the unofficial `fasttext` module: If you were using the functions `cbow`, `skipgram`, `supervised` and/or `WordVectorModel`, `SupervisedModel` objects, you were using the unofficial `fasttext` module. Updating your code should be pretty straightforward, but it still implies some little changes. #### `cbow` function: use `train_unsupervised` instead. For example, replace: ``` fasttext.cbow("train.txt", "model_file", lr=0.05, dim=100, ws=5, epoch=5) ``` with ``` model = fasttext.train_unsupervised("train.txt", model='cbow', lr=0.05, dim=100, ws=5, epoch=5) model.save_model("model_file.bin") ``` #### `skipgram` function: use `train_unsupervised` instead. For example, replace: ``` fasttext.skipgram("train.txt", "model_file", lr=0.05, dim=100, ws=5, epoch=5) ``` with ``` model = fasttext.train_unsupervised("train.txt", model='skipgram', lr=0.05, dim=100, ws=5, epoch=5) model.save_model("model_file.bin") ``` #### `supervised` function: use `train_supervised` instead For example, replace: ``` fasttext.supervised("train.txt", "model_file", lr=0.1, dim=100, epoch=5, word_ngrams=2, loss='softmax') ``` with ``` model = fasttext.train_supervised("train.txt", lr=0.1, dim=100, epoch=5, , word_ngrams=2, loss='softmax') model.save_model("model_file.bin") ``` #### Parameters - As you can see, you can use either `word_ngrams` or `wordNgrams` as parameter name. Because the parameter names from the unofficial API are mapped to the official ones: `min_count` to `minCount`, `word_ngrams` to `wordNgrams`, `lr_update_rate` to `lrUpdateRate`, `label_prefix` to `label` and `pretrained_vectors` to `pretrainedVectors`. - `silent` parameter is not supported. Use `verbose` parameter instead. - `encoding` parameter is not supported, every input should be encoded in `utf-8`. ### `WordVectorModel` and `SupervisedModel` objects Instead of `WordVectorModel` and `SupervisedModel` objects, we return a model object that mimics some nice ideas from the unofficial API. ```python model = fasttext.train_unsupervised("train.txt", model='skipgram') print(model.words) # list of words in dictionary print(model['king']) # get the vector of the word 'king' print('king' in model) # check if a word is in dictionary ``` ```python model = fasttext.train_supervised("train.txt") print(model.words) # list of words in dictionary print(model.labels) # list of labels ``` The model object also contains the arguments of the training: ```python print(model.epoch) print(model.loss) print(model.wordNgrams) ``` Thank you! ------------ We want to thank our incredible community. We truly appreciate your feedback, a big thank you to everyone reporting issues and contributing to the project. In particular we want to express how grateful we are to [Bayu Aldi Yansyah](https://github.com/pyk) who did a great job with his python library and for giving us the ownership of the pypi `fasttext` project.