
# Find the Regional Flavor of Topics Using Geolocated Wikipedia Articles

(Chapter 1 of "Big Data for Chimps")

1. Turn each article into a wordbag (its terms and their counts).
2. Join on the page data to attach each article's geolocation.
3. Use the pagelinks data to build a larger pool of implied geolocations.
4. Convert the geolocations into quadtile keys (quadkeys).
5. Aggregate topics by quadtile.
6. Compute summary statistics aggregated over term and quadkey.
7. Combine those statistics to identify terms that occur more often than the base rate would predict.
8. Explore and validate the results.
9. Filter to find strongly flavored words, and produce other reductions of the data for visualization.
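
Steps 1 and 4 can be sketched in plain Ruby. This is a minimal illustration, not the Wukong API or the chapter's own code. The tokenizer (lowercased runs of letters and digits) is an assumption, and `quadkey` assumes the Web Mercator tiling with Bing Maps-style quadkeys, where each base-4 digit picks one quadrant of the parent tile. The helper names `wordbag`, `tile_xy`, and `quadkey` are hypothetical.

```ruby
# Step 1 (sketch): article text -> wordbag, a Hash of term => count.
# Assumed tokenizer: lowercase, then keep runs of letters/digits.
def wordbag(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

# Web Mercator cannot represent the poles; latitudes are clamped to this range.
MAX_LATITUDE = 85.05112878

# Map a (lat, lng) in degrees to integer tile coordinates at the given zoom.
def tile_xy(lat, lng, zoom)
  lat = lat.clamp(-MAX_LATITUDE, MAX_LATITUDE)
  sin_lat = Math.sin(lat * Math::PI / 180)
  x = (lng + 180.0) / 360.0                                            # 0.0 (west) .. 1.0 (east)
  y = 0.5 - Math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * Math::PI)   # 0.0 (north) .. 1.0 (south)
  n = 1 << zoom                                                        # tiles per side
  [(x * n).floor.clamp(0, n - 1), (y * n).floor.clamp(0, n - 1)]
end

# Step 4 (sketch): interleave the tile's x and y bits into a base-4 quadkey,
# one digit per zoom level, most significant (coarsest) level first.
def quadkey(lat, lng, zoom)
  tx, ty = tile_xy(lat, lng, zoom)
  digits = zoom.downto(1).map do |level|
    mask = 1 << (level - 1)
    digit = 0
    digit += 1 if (tx & mask) != 0   # east half of the parent tile
    digit += 2 if (ty & mask) != 0   # south half of the parent tile
    digit.to_s
  end
  digits.join
end

# Example: quadkey(47.6, -122.3, 1) is "0" (north-west quadrant at zoom 1).
```

Because each digit refines the tile named by the digits before it, truncating a quadkey yields the key of the enclosing coarser tile. One fine zoom level can therefore be computed once and rolled up later by grouping on key prefixes.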

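Steps 5 through 7, and the filter in step 9, can be sketched the same way. The statistic here is an assumption, not necessarily the chapter's: it compares each (term, quadkey) count with the count expected if the term were spread across tiles in proportion to each tile's total word count, expected = term_total × quad_total / grand_total, so a score above 1 means the term is overrepresented in that tile. `flavor_scores` and its `min_count` cutoff (which drops rare pairs whose ratios are noisy) are hypothetical names.

```ruby
# Input (assumed shape): an Array of [quadkey, wordbag] pairs, where each
# wordbag is a Hash of term => count.
# Output: a Hash of [term, quadkey] => observed / expected count.
def flavor_scores(docs, min_count: 1)
  term_quad = Hash.new(0)  # count of each term within each tile
  term_tot  = Hash.new(0)  # count of each term over all tiles
  quad_tot  = Hash.new(0)  # count of all words within each tile
  total     = 0            # count of all words everywhere

  # Step 5/6: aggregate by tile and accumulate the marginal totals.
  docs.each do |quad, bag|
    bag.each do |term, count|
      term_quad[[term, quad]] += count
      term_tot[term] += count
      quad_tot[quad] += count
      total += count
    end
  end

  # Step 7: compare each observed count with its base-rate expectation.
  scores = {}
  term_quad.each do |(term, quad), count|
    next if count < min_count
    expected = term_tot[term] * quad_tot[quad].to_f / total
    scores[[term, quad]] = count / expected
  end
  scores
end

docs = [["0", { "rain" => 3, "the" => 5 }], ["3", { "surf" => 2, "the" => 5 }]]
strong = flavor_scores(docs).select { |_pair, score| score > 1.5 }
# strong.keys is [["rain", "0"], ["surf", "3"]]; "the" scores near 1 in both tiles.
```

The threshold used to keep "strongly flavored" pairs is a tuning choice for illustration, not a value from the source; step 8's exploration is where a sensible cutoff and `min_count` would be checked against real data.
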