=========
Tablestakes
===========

[![Gem Version](https://badge.fury.io/rb/tablestakes.svg)](http://badge.fury.io/rb/tablestakes)

Tablestakes is a gem for processing tabular data.  It is for people who would rather not meddle with
a spreadsheet, or load their data into a SQL database.  You get the instant gratification of being
able to read a tab-delimited file, with header values, and then do field counts, field modifications, 
selections, joins to your heart's content. Tablestakes operates only in memory, so it
is fast.  Of course that also means that there are some size limitations -- very large tables
should be processed with another library.

Contents
--------
- [How to install](#how-to-install)
- [Philosophy and Conventions](#philosophy-and-conventions)
- [Loading and Writing Files](#loading-and-writing-files)
- [Selecting Data](#selecting-data)
- [Counting Data](#counting-data)
- [Updating Data](#updating-data)
- [Join, Union, and Intersect](#join-union-and-intersect)
- [Interacting with your Data](#interacting-with-your-data)

How to Install
--------------

1.  Install the gem

    ```shell
    gem install tablestakes
    ```

2.  Add the `tablestakes` gem to your ruby code

    ```ruby
    require 'tablestakes'
    ```
    
Now you're ready to start slicing and dicing your data tables!


Philosophy and Conventions
--------------------------

Tablestakes is meant to be fast and easy for manipulating your data. It maintains Ruby
conventions, like method chaining and mostly non-destructive methods.

Tablestakes tables also maintain some conventions for simplicity:

* Table column names are always the values in the first row of your data file.
* Fields in the table are always strings (conversion to numbers or dates is a potential enhancement).
* Methods only modify one dimension at a time.  So, for instance, `Table#select` only selects
columns and `Table#where` only selects rows. Chain them together for the desired effect.
* Tables are ordered, both columns and rows, until modified.


Loading and Writing Files
-------------------------
Tables can be created from tab-delimited data files, using the Table constructor:

```ruby
cities = Table.new('cities.txt')
```

Tables can also be created from other tables (useful in saving sub-tables), they can be
created from an Array of rows (embedded Arrays), and the `Table#new` function also
creates a blank table when no parameters (or nil) are given.

Tables are saved as tab-delimited files using the `Table#write_file` method:

```ruby
cities.write_file('new-cities.txt')
```

Tables can also be sent to your favorite I/O channel using the `Table#to_s` function,
which creates a tab-delimited string.


Selecting Data
--------------

Selecting your data happens in two dimensions - rows and columns.  First, you
can create an ordered Array of a row or column just by asking for it by header name.

```ruby
cities.column('State')     # returns ["Texas", "Tennessee", "California", ...]
```

If you're uncertain of your header names, they are accessible as an Array via the
`Table#headers` method.

```ruby
cities.headers    # returns ["2012 rank", "City", "State", ...]
```

Rows can be returned when a numeric index is known.  To return the first data row:

```ruby
cities.row(0)     # returns ["119", "Amarillo", "Texas", "195250", "190695", ...]
```

Table columns can be selected specifically with the `Table#select` method:

```ruby
cities.select("City", "State", "2010 Census")    # returns a table with only those columns
```

In order to select rows, use `Table#where`, which allows you to select rows given a ruby
statement to be evaluated against a given value in a column.  For instance:

```ruby
cities.where("State", "=~ /^N.*/")    # returns a sub-table of cities in states that begin with 'N'
```

Use single quotes when comparing your column value to a string.  Also, note that all
numeric data is stored as a string value unless explicitly converted by your selection
statement.


Counting Data
-------------

One reason to manipulate tables quickly in memory is to get counts for histograms,
pie charts, and other data analysis representations.  Tablestakes gives you simple methods
for counting.

```ruby
cities.size    # returns the number of rows in the cities table
cities.length  # same as cities.size
cities.count   # same as cities.size
cities.count('State', 'New York') # returns the number of entries that have State=='New York'
```

If you want to know the frequency of certain values in your data
set, there are a couple of methods for selecting the most and 
least frequent items.

```ruby
cities.top("State")         # returns the state with the most cities listed
cities.top("State", 5)      # returns the 5 most frequent states
cities.bottom("State", 5)   # returns the 5 least frequent states
```

Additionally, you can create a separate Table object that tallies on a given column

```ruby
cities.tally('State')  # returns a Table of States and the number of times they appear
puts cities.tally('State').to_s # print a table of the frequency that states appear
```


Updating Data
-------------

Sometimes data in a table needs to be cleaned up and modified.  
the `Table#sub` method provides a way to eliminate common garbage from 
your data such as stray characters.

```ruby
cities.sub("2012 land area", /.*sq mi/, '')     # deletes 'sq mi' from the 2012 land area field
``` 

`Table#sub` takes a regular expression and a substitute string, which 
gives some flexibility in how data is updated.  Note that this is 
a method which modifies the table object.

Join, Union, and Intersect
--------------------------

Once your tables are read into memory, it is useful to join them
with other tables or find the common elements.  Tablestakes 
provides a simple join function as follows

```ruby
capitals.join(cities, "Capital", "City")    # create a table which only contains highly populated Capital cities
```

You may also need to quickly compare the elements of one column 
in a table with the elements in another table.  `Table#union` and `Table#intersect` 
are for that situation.

```ruby
capitals.union(cities, "Capitals", "Cities")       # returns an array of all cities in both tables
capitals.intersect(cities, "Capitals", "Cities")   # returns an array of only the cities in both tables
```

Interacting with your Data
--------------------------

Typically, you can accomplish your goals with chained queries of the datatable.  Here
are some examples:

1.  Find all of the cities in New York

    ```ruby
    ny_cities = cities.where("State", "== 'New York'")
    ```
    
2.  Find all of the capitals which are not in the set of most populated cities

    ```ruby
    small_caps = capitals.column("Capital") - capitals.join(cities, 'Capital', 'City').column('Capital')
    ```
    
3.  Read a file, select the columns and rows you want, and write the subtable as a tab-delimited
file.
     
    ```ruby
    Table.new('cities.txt').select('City','State','2012 estimate').where('2012 estimate', ".to_i > 1000000").write_file('big_cities.txt')
    ```
    
Some methods, such as `Table#row` and `Table#column` return Arrays, and of course these are
readily modified using their own native methods.

Future Enhancements
-------------------

Some future enhancements that would make this gem better include:

1.  Implement Ruby Enumerators

2.  Include some concept of data type ... at least FixedNum and Date. 

3.  `Table#sort` method -- probably requires enumerators and data types to be effective.