org.apache.cassandra.hadoop
Class ColumnFamilyInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<java.lang.String,java.util.SortedMap<byte[],IColumn>>
      extended by org.apache.cassandra.hadoop.ColumnFamilyInputFormat

public class ColumnFamilyInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<java.lang.String,java.util.SortedMap<byte[],IColumn>>

Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily. At minimum, you need to set the ColumnFamily and predicate (a description of the columns to extract from each row) in your Hadoop job Configuration. The ConfigHelper class is provided to make this simple:

    ConfigHelper.setColumnFamily
    ConfigHelper.setSlicePredicate

You can also configure the number of rows per InputSplit with ConfigHelper.setInputSplitSize. This should be "as big as possible, but no bigger": each InputSplit is read from Cassandra with multiple get_range_slices queries, and the per-call overhead of get_range_slices is high, so larger split sizes are better -- but if a split is too large, the job will run out of memory. The default split size is 64k rows.
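
A minimal driver sketch putting these calls together (the keyspace "Keyspace1", column family "Standard1", and column name "text" are placeholders, and the exact ConfigHelper signatures shown are those of the 0.6-era API -- check them against your Cassandra version):

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetupSketch
    {
        public static Job configureJob() throws Exception
        {
            Job job = new Job(new Configuration(), "cassandra-example");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            // Required: the keyspace and ColumnFamily to read rows from.
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");

            // Required: the predicate naming which columns to extract from each row.
            SlicePredicate predicate = new SlicePredicate()
                    .setColumn_names(Arrays.asList("text".getBytes()));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            // Optional: rows per InputSplit; the default is 64k (65536) rows.
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 65536);

            return job;
        }
    }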


Constructor Summary
ColumnFamilyInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<java.lang.String,java.util.SortedMap<byte[],IColumn>> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
           
 java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ColumnFamilyInputFormat

public ColumnFamilyInputFormat()

Method Detail

getSplits

public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                                 throws java.io.IOException
Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<java.lang.String,java.util.SortedMap<byte[],IColumn>>
Throws:
java.io.IOException

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<java.lang.String,java.util.SortedMap<byte[],IColumn>> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
                                                                                                                         org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
                                                                                                                  throws java.io.IOException,
                                                                                                                         java.lang.InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<java.lang.String,java.util.SortedMap<byte[],IColumn>>
Throws:
java.io.IOException
java.lang.InterruptedException
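
The returned RecordReader yields one Cassandra row per record: the row key as a java.lang.String and the matching columns as a SortedMap<byte[],IColumn>. A mapper consuming those types might look like the following sketch (the class name and the emitted per-row column count are illustrative, not part of this API):

    import java.io.IOException;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input key/value types must match ColumnFamilyInputFormat:
    // String row key, SortedMap<byte[],IColumn> of the sliced columns.
    public class ColumnCountMapper
            extends Mapper<String, SortedMap<byte[], IColumn>, Text, IntWritable>
    {
        public void map(String key, SortedMap<byte[], IColumn> columns, Context context)
                throws IOException, InterruptedException
        {
            // Emit the number of columns the predicate matched for this row.
            context.write(new Text(key), new IntWritable(columns.size()));
        }
    }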


Copyright © 2010 The Apache Software Foundation