README.md in pragmatic_segmenter-0.3.0 vs README.md in pragmatic_segmenter-0.3.1
- old
+ new
@@ -708,10 +708,11 @@
* *Unsupervised Multilingual Sentence Boundary Detection* - Tibor Kiss and Jan Strunk (2005) [[pdf](http://www.linguistics.ruhr-uni-bochum.de/~strunk/ks2005FINAL.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/ks2005FINAL.pdf)]
* *An Analysis of Sentence Boundary Detection Systems for English and Portuguese Documents* - Carlos N. Silla Jr. and Celso A. A. Kaestner (2004) [[pdf](https://www.cs.kent.ac.uk/pubs/2004/2930/content.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/An+Analysis+of+Sentence+Boundary+Detection+Systems+for+English+and+Portuguese+Documents.pdf)]
* *Periods, Capitalized Words, etc.* - Andrei Mikheev (2002) [[pdf](https://s3.amazonaws.com/tm-town-nlp-resources/cl-prop.pdf)]
* *Scaled log likelihood ratios for the detection of abbreviations in text corpora* - Tibor Kiss and Jan Strunk (2002) [[pdf](http://www.linguistics.ruhr-uni-bochum.de/~kiss/publications/abbrev.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/abbrev.pdf)]
* *Viewing sentence boundary detection as collocation identification* - Tibor Kiss and Jan Strunk (2002) [[pdf](http://www.linguistics.rub.de/~kiss/publications/07v-kiss.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/07v-kiss.pdf)]
+* *Automatic Sentence Break Disambiguation for Thai* - Paisarn Charoenpornsawat and Virach Sornlertlamvanich (2001) [[pdf](http://www.cs.cmu.edu/~paisarn/papers/iccpol2001.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/iccpol2001.pdf)]
* *Sentence Boundary Detection: A Comparison of Paradigms for Improving MT Quality* - Daniel J. Walker, David E. Clements, Maki Darwin and Jan W. Amtrup (2001) [[pdf](https://www.cs.kent.ac.uk/pubs/2004/2930/content.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/walker.pdf)]
* *A Sentence Boundary Detection System* - Wendy Chen (2000) [[ppt](http://www.deg.byu.edu/presentations/SpResConf00.chen/SpResConf00.ppt) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/SpResConf00.ppt)]
* *Tagging Sentence Boundaries* - Andrei Mikheev (2000) [[pdf](http://www.aclweb.org/anthology/A00-2035) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/A00-2035.pdf)]
* *Automatic Extraction of Rules For Sentence Boundary Disambiguation* - E. Stamatatos, N. Fakotakis, and G. Kokkinakis (1999) [[pdf](https://s3.amazonaws.com/tm-town-nlp-resources/Automatic+Extraction+of+Rules+For+Sentence+Boundary+Disambiguation.pdf)]
* *A Maximum Entropy Approach to Identifying Sentence Boundaries* - Jeffrey C. Reynar and Adwait Ratnaparkhi (1997) [[pdf](https://www.aclweb.org/anthology/A/A97/A97-1004.pdf) | [mirror](https://s3.amazonaws.com/tm-town-nlp-resources/A97-1004.pdf)]
@@ -723,10 +724,11 @@
## TODO
* Add additional language support
* Add abbreviation lists for any languages that do not currently have one (only relevant for languages that have the concept of abbreviations with periods)
* Get Golden Rule #18 passing - Handling of a.m. or p.m. followed by a capitalized non-sentence starter (e.g. "At 5 p.m. Mr. Smith went to the bank. He left the bank at 6 p.m. Next he went to the store." --> ["At 5 p.m. Mr. Smith went to the bank.", "He left the bank at 6 p.m.", "Next he went to the store."])
+* Support for Thai. This is a very challenging problem due to the absence of explicit sentence markers (such as the period in English) and the ambiguity in Thai regarding what constitutes a sentence, even among native speakers. For more information, see the following research papers ([#1](http://www.cs.cmu.edu/~paisarn/papers/iccpol2001.pdf) | [#2](http://pioneer.chula.ac.th/~awirote/ling/snlp2007-wirote.pdf)).
## Change Log
**Version 0.0.1**
* Initial Release
@@ -801,10 +803,13 @@
**Version 0.3.0**
* Add support for square brackets
* Add support for continuous exclamation points or question marks or combinations of both
* Fix Roman numeral support
-* Add English abbreviations
+* Add English abbreviations
+
+**Version 0.3.1**
+* Fix undefined method 'gsub!' for nil:NilClass issue
## Contributing
If you find a text that is incorrectly segmented using this gem, please submit an issue.
\ No newline at end of file