vendor/tomotopy/README.rst in tomoto-0.2.2 vs vendor/tomotopy/README.rst in tomoto-0.2.3
- old
+ new
@@ -200,10 +200,59 @@
print("Log-likelihood of inference: ", ll)
The `infer` method accepts either a single instance of `tomotopy.Document` or a `list` of such instances.
See more at `tomotopy.LDAModel.infer`.
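
The following is a hedged sketch of both call styles on a toy model; the words are placeholders:

::

    import tomotopy as tp

    mdl = tp.LDAModel(k=2)
    for text in ("a b c d", "d e f g", "g h i j"):
        mdl.add_doc(text.split())
    mdl.train(100)

    # a single unseen document yields one topic distribution and one log-likelihood
    doc = mdl.make_doc("b c unseen".split())
    topic_dist, ll = mdl.infer(doc)

    # a list of documents yields results for each document
    docs = [mdl.make_doc(t.split()) for t in ("a b", "e f")]
    topic_dists, lls = mdl.infer(docs)
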
+Corpus and transform
+--------------------
+Every topic model in `tomotopy` has its own internal document type.
+A document suitable for each model can be created and added through that model's `add_doc` method.
+However, adding the same list of documents to several different models quickly becomes inconvenient,
+because `add_doc` has to be called on every model for the same list of documents.
+Thus, `tomotopy` provides the `tomotopy.utils.Corpus` class, which holds a list of documents.
+A `tomotopy.utils.Corpus` can be inserted into any model by passing it as the `corpus` argument to the model's `__init__` or to its `add_corpus` method.
+Inserting a `tomotopy.utils.Corpus` has the same effect as inserting every document the corpus holds.
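+
+For example, the same corpus can be inserted into two different models. The following is a minimal sketch; the model choices and word lists are only illustrative:
+
+::
+
+    import tomotopy as tp
+
+    corpus = tp.utils.Corpus()
+    corpus.add_doc("a b c d e".split())
+    corpus.add_doc("f g h i j".split())
+
+    lda = tp.LDAModel(k=10, corpus=corpus)  # pass the corpus at construction
+    hdp = tp.HDPModel()
+    hdp.add_corpus(corpus)                  # or add it to an existing model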
+
+Some topic models require different data for their documents.
+For example, `tomotopy.DMRModel` requires a `metadata` argument of type `str`,
+while `tomotopy.PLDAModel` requires a `labels` argument of type `List[str]`.
+Since `tomotopy.utils.Corpus` holds an independent set of documents rather than being tied to a specific topic model,
+the data it carries may not match the types required by the topic model the corpus is added into.
+In this case, the miscellaneous data can be transformed to fit the target topic model using the `transform` argument.
+See more details in the following code:
+
+::
+
+ from tomotopy import DMRModel
+ from tomotopy.utils import Corpus
+
+ corpus = Corpus()
+ corpus.add_doc("a b c d e".split(), a_data=1)
+ corpus.add_doc("e f g h i".split(), a_data=2)
+ corpus.add_doc("i j k l m".split(), a_data=3)
+
+ model = DMRModel(k=10)
+ model.add_corpus(corpus)
+    # The `a_data` field in `corpus` is lost here,
+    # and the `metadata` field that `DMRModel` requires is filled with its default value, an empty str.
+
+ assert model.docs[0].metadata == ''
+ assert model.docs[1].metadata == ''
+ assert model.docs[2].metadata == ''
+
+    # this function transforms the `a_data` field into the `metadata` field that `DMRModel` requires
+    def transform_a_data_to_metadata(misc: dict):
+        return {'metadata': str(misc['a_data'])}
+
+ model = DMRModel(k=10)
+ model.add_corpus(corpus, transform=transform_a_data_to_metadata)
+    # Now the docs in `model` have a non-default `metadata`, generated from the `a_data` field.
+
+ assert model.docs[0].metadata == '1'
+ assert model.docs[1].metadata == '2'
+ assert model.docs[2].metadata == '3'
+
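+The same mechanism works for other models. The following is a hedged sketch that builds the `labels` field `tomotopy.PLDAModel` expects from the same `a_data` field, reusing the `corpus` above; the label scheme is made up for illustration:
+
+::
+
+    from tomotopy import PLDAModel
+
+    def transform_a_data_to_labels(misc: dict):
+        # `PLDAModel` expects `labels` as a list of strings
+        return {'labels': ['group_' + str(misc['a_data'])]}
+
+    model = PLDAModel(latent_topics=2)
+    model.add_corpus(corpus, transform=transform_a_data_to_labels)
+    # each document now carries a label derived from its `a_data` value
+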
Parallel Sampling Algorithms
----------------------------
Since version 0.5.0, `tomotopy` allows you to choose a parallelism algorithm.
The algorithm provided in versions prior to 0.4.2 is `COPY_MERGE`, which is provided for all topic models.
The new algorithm `PARTITION`, available since 0.5.0, makes training generally faster and more memory-efficient, but it is not available for all topic models.
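
The following is a hedged sketch of selecting a scheme explicitly; the toy documents are placeholders:

::

    import tomotopy as tp

    mdl = tp.LDAModel(k=2)
    for text in ("a b c d", "d e f g", "g h i j"):
        mdl.add_doc(text.split())
    # request PARTITION explicitly; with ParallelScheme.DEFAULT,
    # tomotopy picks a suitable algorithm for the model automatically
    mdl.train(100, parallel=tp.ParallelScheme.PARTITION)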
@@ -258,9 +307,15 @@
`tomotopy` is licensed under the terms of MIT License,
meaning you can use it for any reasonable purpose and remain in complete ownership of all the documentation you produce.
History
-------
+* 0.12.1 (2021-06-20)
+ * An issue where `tomotopy.LDAModel.set_word_prior()` caused a crash has been fixed.
+ * Now `tomotopy.LDAModel.perplexity` and `tomotopy.LDAModel.ll_per_word` return accurate values when `TermWeight` is not `ONE`.
+ * `tomotopy.LDAModel.used_vocab_weighted_freq` was added, which returns term-weighted frequencies of words.
+ * Now `tomotopy.LDAModel.summary()` shows not only the entropy of words, but also the entropy of term-weighted words.
+
* 0.12.0 (2021-04-26)
* Now `tomotopy.DMRModel` and `tomotopy.GDMRModel` support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py )
* The performance of `tomotopy.GDMRModel` was improved.
* A `copy()` method has been added for all topic models to do a deep copy.
* An issue was fixed where words that are excluded from training (by `min_cf`, `min_df`) have incorrect topic id. Now all excluded words have `-1` as topic id.