Sha256: a480f5e878849300b6056adce1ed60a1f65c9cb5c9632db98afed3768275f4b0
Contents?: true
Size: 603 Bytes
Versions: 7
Compression:
Stored size: 603 Bytes
Contents
# Find the Regional Flavor of topics using Geolocated Wikipedia Articles (Chapter 1 of "Big Data for Chimps")

1. article -> wordbag
2. join on page data to get geolocation
3. use pagelinks to get larger pool of implied geolocations
4. turn geolocations into quadtile keys
5. aggregate topics by quadtile
6. take summary statistics aggregated over term and quadkey
7. combine those statistics to identify terms that occur more frequently than the base rate would predict
8. explore and validate the results
9. filter to find strongly-flavored words, and other reductions of the data for visualization
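
A few of the steps above (wordbag, quadtile keys, aggregation by quadtile, and comparison against the base rate) can be sketched in a small in-memory form. This is only an illustration under assumptions, not the chapter's actual code: every function name here is hypothetical, the quadkey scheme assumed is the common Web Mercator one (digits 0-3 per zoom level), and the "flavor" score is a plain rate ratio swapped in for whatever statistic the chapter settles on.

```python
import math
import re
from collections import Counter, defaultdict

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def wordbag(text):
    """Step 1: reduce an article body to a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of the tile x/y indices into a base-4 string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def quadkey(lat, lng, zoom):
    """Step 4: map a latitude/longitude pair to a quadtile key (assumed scheme)."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = (lng + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << zoom  # number of tiles along each axis at this zoom level
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_to_quadkey(tile_x, tile_y, zoom)


def aggregate_by_quadkey(articles, zoom):
    """Step 5: sum wordbags per tile; articles are (text, lat, lng) tuples."""
    tiles = defaultdict(Counter)
    for text, lat, lng in articles:
        tiles[quadkey(lat, lng, zoom)].update(wordbag(text))
    return tiles


def term_lift(tile_counts, global_counts, term):
    """Step 7 (simplified): rate of a term within one tile over its global rate.

    Assumes the term occurs at least once globally and the tile is non-empty.
    """
    tile_rate = tile_counts[term] / sum(tile_counts.values())
    base_rate = global_counts[term] / sum(global_counts.values())
    return tile_rate / base_rate
```

A ratio above 1 means the term is over-represented in that tile relative to the corpus as a whole. On real data the counts would come from a distributed aggregation rather than in-memory `Counter` objects, and rare terms would need smoothing or a minimum-count filter before the ratio means much.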