# More sources to add

* [CJK Decomposition data](http://cjkdecomp.codeplex.com/)
* [Jun-Da Character Frequency Lists](http://lingua.mtsu.edu/chinese-computing/)
* Unihan
  * [Online lookup](http://unicode.org/charts/unihan.html)
  * [Raw data](http://www.unicode.org/Public/UNIDATA/)
  * [Single zip](http://www.unicode.org/Public/UNIDATA/Unihan.zip)
* [KanjiVG](https://github.com/kanjivg/kanjivg)
* [Wikimedia Commons: Ancient Chinese characters project](http://commons.wikimedia.org/wiki/Commons:Ancient_Chinese_characters_project)
* [Hanzim Data](http://interstitiality.net/hanziData.html)

## Corpora

* [Leiden Weibo Corpus](http://lwc.daanvanesch.nl/)
* [The Lancaster Corpus of Mandarin Chinese](http://www.ota.ox.ac.uk/headers/2474.xml)
* [Blog post: Top 5 "Linguistic Data Consortium" corpora for Mandarin](http://corplinguistics.wordpress.com/2011/10/30/top-five-ldc-corpora/)